
Pierre Bérard
Institut Fourier, CNRS–Université Joseph Fourier
&
Cellule MathDoc, CNRS–Université Joseph Fourier
Grenoble (France)
Cornell July 25, 2002
NUMDAM
Cellule MathDoc
www-mathdoc.ujf-grenoble.fr
• An institute on Scientific Information & Communication in
Mathematics, supported by Centre National de la
Recherche Scientifique (CNRS) and Ministère de la
Recherche.
• General mission: documentation issues in mathematics at
the national level in France, in cooperation with
mathematics libraries and institutes.
NUMDAM
Digitisation of Ancient Mathematics Documents
(NUMérisation de Documents Anciens Mathématiques)
A digitisation programme supported by CNRS and Ministère de la Recherche,
managed by the Cellule MathDoc.
NUMDAM: aims
• Reinforce French mathematical journals (visibility,
accessibility, durability).
• Hand down digitised archives of the French
mathematical heritage to future generations and
take part in international efforts with the same
goal.
• Strive towards making this digitised mathematical
heritage freely accessible.
Political choices
• Database freely accessible on the web.
• Full text freely accessible after a moving wall
(whose length depends on each serial).
• Planned interoperability between retro-digitised
and natively digital collections.
• National and international cooperation as far
as possible.
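The moving-wall rule above can be illustrated with a minimal sketch. The wall lengths and the function name `is_full_text_free` are hypothetical and chosen only for illustration; the slides do not state the actual embargo negotiated for each serial.

```python
from datetime import date

# Hypothetical moving-wall lengths in years, one per serial.
# These are NOT the values actually used by NUMDAM.
MOVING_WALL_YEARS = {
    "Annales de l'Institut Fourier": 5,
    "Publications mathématiques (IHÉS)": 10,
}


def is_full_text_free(serial: str, publication_year: int, today: date) -> bool:
    """Return True once the publication year lies behind the serial's moving wall."""
    wall = MOVING_WALL_YEARS[serial]
    return today.year - publication_year >= wall


# Example: with a 5-year wall, a 1997 article is open on the date of the talk,
# while a 1999 article is not yet.
talk_date = date(2002, 7, 25)
open_1997 = is_full_text_free("Annales de l'Institut Fourier", 1997, talk_date)
open_1999 = is_full_text_free("Annales de l'Institut Fourier", 1999, talk_date)
```

Because the wall "moves", the same call returns a different answer as `today` advances; no per-article flag has to be updated.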
Technical choices
• Scan from first to last page at 600 dpi.
• OCR (uncorrected, 99.9% accuracy; mathematical formulae and images
excluded).
• Multi-page files for logical units (TIFF, PDF + hidden text, DjVu).
• End-of-article bibliographies processed (corrected OCR at 99.99% accuracy, plus markup of the “author”, “title” and “year” fields).
• Database: cataloguing data for each article, summary (if present), end-of-article bibliography (if present), hidden OCRed text. Structured data
exchange in XML.
• As far as possible, links to and from the JFM, ZM and MR databases.
• Future enhancements scheduled depending on available technology.
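As a rough illustration of the bibliography markup mentioned above, the sketch below tags the author, title and year fields of one reference in XML. The element names (`reference`, `author`, `title`, `year`) and the function `markup_reference` are assumptions made for this example, not the schema actually used in the project, and the sample values are placeholders.

```python
import xml.etree.ElementTree as ET


def markup_reference(author: str, title: str, year: int) -> ET.Element:
    """Wrap the three marked-up fields of a bibliography entry in XML elements.

    The element names are hypothetical; the slides only say that the
    "author", "title" and "year" fields are marked up.
    """
    reference = ET.Element("reference")
    ET.SubElement(reference, "author").text = author
    ET.SubElement(reference, "title").text = title
    ET.SubElement(reference, "year").text = str(year)
    return reference


# Placeholder values, not taken from any real bibliography.
reference = markup_reference("PLACEHOLDER AUTHOR", "Placeholder title", 1950)
reference_xml = ET.tostring(reference, encoding="unicode")
```

Keeping each field in its own element is what makes later matching against external databases possible, since author, title and year can be compared separately.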
Production choices
• Use of an external operator for the technical processing.
• In-house study, segmentation, cataloguing, quality control,
and display.
• Quality and durability policy:
  - Prefer standard and easily convertible formats (TIFF, XML) as sources for
future processing if necessary; avoid being tied to a proprietary system.
  - Archive high-quality images, which should make it possible to regenerate the text
(formula OCR, structure recognition).
NUMDAM Phase I
Journals

Journal                                                           Period
Annales de l’Institut Fourier                                     1949 – 2000
Bulletin de la Société mathématique de France                     1872 – 2000
Mémoires de la Société mathématique de France                     1964 – 2000
Publications mathématiques (IHÉS)                                 1959 – 2000
Journées équations aux dérivées partielles (Saint-Jean-de-Monts)  1975 – 2000
About 136 000 pages and 5 500 articles

Annales scientifiques de l’École normale supérieure               1864 – 1998
About 67 000 pages and 1 750 articles
NUMDAM Phase I: Chronology
• Spring 2003: end of the industrial phase of NUMDAM Phase I,
public access to articles via the web.
• Autumn 2002: start of NUMDAM Phase II. Work on copyright (©) issues
continues.
• August 2002: first 50,000 pages delivered by the vendor.
• Feb. - May 2002: setting up the production chain (vendor) and quality
control (Cellule MathDoc). Work on copyright (©) issues.
• Dec. 2001: choice of vendor validated by CNRS.
• Nov. 2000 - Oct. 2001: cataloguing and checking the database.
• Oct. 2000 - May 2001: drafting the schedule of conditions for the vendor.
• July 2000: funding by CNRS.
NUMDAM Phase II
• Take an active part in the Digital Mathematics Library project. Cooperate with
other digitisation projects (Gallica–BnF, possibly the digitisation part of EMANI).
• Take an inventory of resources and cooperate with historians and mathematicians to make
scientific choices and establish priorities, in order to:
  - Digitise all French mathematics journals (Annales de l’Institut Henri Poincaré,
Annales de l’Université de Toulouse, Comptes Rendus de l’Académie, Journal
de l’École polytechnique, ...), and possibly some mathematically important
general science journals.
  - Digitise important seminar series (séminaires Bourbaki, Cartan, séminaire de
Probabilités de Strasbourg, ...).
  - Digitise a substantial set of important monographs.
NUMDAM programme: overview
[Flowchart: programme workflow]
• Examination of collections and setting up the database
• Schedule of technical conditions
• Vendor: digitisation, segmentation, treatments (OCR & bibliographies)
• Quality control
• Software developments: SQL → XML, quality control, author identification & ©, display (search and browsing), links
• Display
• Links: JFM, MR, ZM
• Database maintenance
• Copyright issues and negotiations with publishers
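The SQL and XML items listed under software developments suggest exporting catalogue rows from a relational database as XML. A minimal sketch of such an export follows. It uses SQLite as a stand-in (the slides mention MySQL), and the table name `articles`, its columns and the XML element names are all assumptions made for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET


def export_articles_to_xml(connection: sqlite3.Connection) -> ET.Element:
    """Turn each row of a hypothetical `articles` table into an <article> element."""
    root = ET.Element("articles")
    cursor = connection.execute(
        "SELECT id, author, title, year FROM articles ORDER BY id"
    )
    for article_id, author, title, year in cursor:
        article = ET.SubElement(root, "article", id=article_id)
        ET.SubElement(article, "author").text = author
        ET.SubElement(article, "title").text = title
        ET.SubElement(article, "year").text = str(year)
    return root


# In-memory database holding one record quoted on the slides.
connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE articles (id TEXT PRIMARY KEY, author TEXT, title TEXT, year INTEGER)"
)
connection.execute(
    "INSERT INTO articles VALUES (?, ?, ?, ?)",
    ("PMIHES_1962__13__5_0", "Shih", "Homologie des espaces fibrés", 1962),
)
exported = export_articles_to_xml(connection)
```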
Quality control procedure
[Flowchart: quality control of vendor deliveries]
• Files received from vendor (TIFF; XML, TIFF, PDF and DjVu).
• Automatic control (Perl), producing a log of errors.
• Sorting samples (Perl).
• Visual control of the samples against a check-list (PHP), producing a log of errors (files TIFF; XML, TIFF, PDF, DjVu).
• Database (MySQL).
• Synthesis, leading to rejection or validation.
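The two automated steps of the quality control procedure above (automatic control and sorting of samples) might look roughly like the sketch below. The project used Perl for these steps; Python is used here only for illustration. The file layout (one file per format, named after the article identifier), the file extensions and both function names are assumptions.

```python
import random
import tempfile
from pathlib import Path

# Formats named on the slide; the extensions and the one-file-per-format
# layout are assumptions made for this sketch.
EXPECTED_SUFFIXES = (".tif", ".xml", ".pdf", ".djvu")


def automatic_control(delivery_dir, article_ids):
    """Return a log of errors: one message per expected file that is missing or empty."""
    errors = []
    for article_id in article_ids:
        for suffix in EXPECTED_SUFFIXES:
            path = Path(delivery_dir) / f"{article_id}{suffix}"
            if not path.is_file():
                errors.append(f"{article_id}: missing {suffix} file")
            elif path.stat().st_size == 0:
                errors.append(f"{article_id}: empty {suffix} file")
    return errors


def sort_samples(article_ids, sample_size, seed=0):
    """Draw a reproducible random sample of articles for visual control."""
    rng = random.Random(seed)
    ids = list(article_ids)
    return sorted(rng.sample(ids, min(sample_size, len(ids))))


# Simulated delivery in which the DjVu file is missing.
with tempfile.TemporaryDirectory() as tmp:
    for suffix in (".tif", ".xml", ".pdf"):
        (Path(tmp) / f"PMIHES_1962__12__5_0{suffix}").write_bytes(b"x")
    error_log = automatic_control(tmp, ["PMIHES_1962__12__5_0"])
```

A non-empty `error_log` would correspond to the rejection branch of the flowchart; an empty one lets the delivery proceed to sampling and visual control.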
NUMDAM Programme
XML description of physical volumes
Publications Mathématiques
de l’Institut des Hautes Études Scientifiques
Physical volume: Year 1962, Volume 12
A paper in a physical volume
Article by Bernard Dwork in Publications Mathématiques IHÉS, 12 (1962), 5-68
Bibliographies
Cross-linking
[Diagram: cross-linking]
• Article identifier: PMIHES_1962__12__5_0
• Review numbers shown: MR 28#3039 and ZM 0173.48601; MR 10,592e and ZM 0032.39402
• SQL, EDBM: DB of articles & DB of images
• File formats: PDF, DjVu
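The article identifier PMIHES_1962__12__5_0 appears to encode the journal, year, volume and first page of the Dwork article cited earlier (Publications Mathématiques IHÉS, 12 (1962), 5-68). The sketch below splits such an identifier into fields. The field layout is inferred from that single example; in particular, reading the two empty slots as "series" and "issue" and the trailing 0 as an index is an assumption.

```python
from typing import NamedTuple


class ArticleId(NamedTuple):
    journal: str
    year: int
    series: str      # empty in the example; meaning assumed
    volume: str
    issue: str       # empty in the example; meaning assumed
    first_page: str
    index: str       # trailing "0" in the example; meaning assumed


def parse_article_id(identifier: str) -> ArticleId:
    """Split an identifier such as PMIHES_1962__12__5_0 on underscores."""
    parts = identifier.split("_")
    if len(parts) != 7:
        raise ValueError(f"expected 7 underscore-separated fields, got {len(parts)}")
    journal, year, series, volume, issue, first_page, index = parts
    return ArticleId(journal, int(year), series, volume, issue, first_page, index)


dwork = parse_article_id("PMIHES_1962__12__5_0")
```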
External databases
JFM, MR, ZM, ...
MR and NUMDAM
• Record from the NUMDAM database, submitted to MR–lookup:
|Publications IHES|Shih||13||1962||PMIHES_1962__13__5_0||
• Record returned by MR–lookup, completed with the MR number and title:
|Inst. Hautes Etudes Sci. Publ. Math.|Shih||13||1962||PMIHES_1962__13__5_0|26#1893|Homologie des espaces fibr\'es.
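The pipe-delimited lookup records shown above can be read positionally. The following sketch extracts the fields that are visibly filled in the two sample records; the field names and the function `parse_lookup_record` are assumptions, and the positions left empty in both samples are deliberately not interpreted.

```python
def parse_lookup_record(record: str) -> dict:
    """Split a pipe-delimited lookup record into named fields.

    Positions are taken from the two sample records on the slide;
    slots that are empty in both samples are ignored.
    """
    fields = record.split("|")
    if len(fields) != 11:
        raise ValueError(f"expected 11 pipe-separated fields, got {len(fields)}")
    return {
        "journal": fields[1],
        "author": fields[2],
        "volume": fields[4],
        "year": fields[6],
        "key": fields[8],
        "review_number": fields[9] or None,  # filled in by the lookup service
        "title": fields[10] or None,         # filled in by the lookup service
    }


query = parse_lookup_record(
    "|Publications IHES|Shih||13||1962||PMIHES_1962__13__5_0||"
)
answer = parse_lookup_record(
    r"|Inst. Hautes Etudes Sci. Publ. Math.|Shih||13||1962||PMIHES_1962__13__5_0|26#1893|Homologie des espaces fibr\'es."
)
```

The `key` field is what allows the returned review number to be attached to the right article, since it is sent unchanged and comes back unchanged.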
JFM & ZM and NUMDAM
New identification tool in development
in the LIMES framework (EU project)
• Record from the NUMDAM database, submitted to ZM–lookup:
|Publications IHES|Shih||13||1962||PMIHES_1962__13__5_0||
• Record returned by ZM–lookup, completed with the ZM number and title:
|Inst. Hautes Etudes Sci. Publ. Math.|Shih||13||1962||PMIHES_1962__13__5_0|0105.16903|Homologie des espaces fibr\'es.
Identification of authors: two purposes
• Improve search facilities by setting up a
reference list of authors.
• Provide a tool to help address copyright
issues.
Internal tool ...
NUMDAM: search interface based on EDBM (in development)
Abstract if available
JFM
MR
ZM
NUMDAM URLs
• Main:
www-mathdoc.ujf-grenoble.fr/NUMDAM/
• Visitors (sample files):
www-mathdoc.ujf-grenoble.fr/NUMDAM/Visitors/
Login: VISITORS Pwd: v\to\num
• LiNuM (Books at BnF, Cornell, Göttingen, Michigan):
www-mathdoc.ujf-grenoble.fr/LiNuM/
• Journal de Mathématiques Pures et Appliquées 1836 – 1880 (BnF):
www-mathdoc.ujf-grenoble.fr/JMPA/
• Search NUMDAM database:
math-sahel.ujf-grenoble.fr/NUMDAM/Public/Bd/consultation.htm
• Inventory:
math-sahel.ujf-grenoble.fr/NUMDAM/Public/Inventaire/inventaire.htm
Thank you for your attention ...