Aarhus - SBForge
Aarhus
BnF main topics – 2013 – crawling side
• Keep crawling
– Broad and focused crawls
– Limit of 100 TB
• Crawl of password-protected content
– “Press project”: PDFs of daily newspapers
– Tests with other kinds of content
• Work on direct deposit of e-books
BnF main topics – 2013 – access and preservation sides
• Merging professional and public WB
– Various optimizations
– Clickable permalink…
• Draw links between web archives and BnF indexing and
promotion tools
– general catalogue, data.bnf.fr…
• Open access to web archives in regional libraries
– Legal and technical aspects
• Start ingesting our web archives into our digital repository
Direct deposit for e-books?
• High-level discussions between National
Publishers Union and BnF
– A better international framework: IFLA
statement on legal deposit, FEP/CENL
declaration…
• Why not crawl?
– Better item-level indexing of each e-book
– No DRM problems
– Discussing directly with publishers
Direct deposit for e-books? / technical side
• A technical layer is available: the extranet for
publishers
– 2011: digital legal deposit forms
– 2012/3: direct transfer of metadata (ONIX)
– 2013/4: e-books?
• What do we need to decide?
– Who will be the main interlocutor?
– How many and what kind of formats? What
validation? Is it possible to refuse?
– What link between the paper and digital versions in the
catalogue?
– What access tool? Gallica or web archives?
RESAW project: some keywords
• Networking (researchers and heritage
institutions)
• Standards and collection quality
• Shared tools and services (storage
infrastructure, analyzing tools, portal)
• Methods and training
RESAW project: interest for BnF
• Promote the use of web archives among
researchers
• Help launch international and national
research programs
• Offer groundbreaking tools and services
• Get feedback about our collection development
policies
• Promote the building and use of web archives
to high-level decision makers
Current situation at BnF
• No current research project
– But the Web legal deposit team is involved in research frameworks:
“Labex” (“excellence laboratories”)
– Participation in the “Hypertext corpus initiative framework” (lead:
Medialab)
• Relationships with researchers
– Political science (political science institutes in Paris and
Grenoble, universities of Nancy and Cergy)
– Social sciences (university of Paris 1, Grenoble)
– Net art (Avignon)
– Web metrics (AFNIC)?
• Relationships with associations (literature, sustainable
development…)
International initiatives to follow up
• Collaborative web harvesting
– EU elections, “Olympics” project, Vaclav Havel collection
– Use of “nomination tool” provided by University of North
Texas
• Portal and shared access
– IIPC website, Memento
• Research project
– BL/IA/JISC project on .uk analysis
– 80 TB of data provided by IA
– Common crawl project (?)
• Training
– PhD sponsorship (UNT)
Questions and comments
• The network we dream about!
• Some objectives already (partially)
covered by IIPC
– standards, interoperability, shared portal
• Legal issues will be very difficult to solve
• Be cautious with the term “quality” (prefer
relevance for specific goals?)
• What will you ask for?
– Money, doctoral students, engineers…