Transcript of PowerPoint slides

Information Access to Historical Documents
from the Early New High German Period
(Andreas Hauser, Markus Heller, Elisabeth Leiss, Klaus U. Schulz, Christiane Wanzeck)
Background: Recovering cultural heritage (Google Print, digital libraries)
- huge number of historical documents held in libraries
- growing number of historical documents digitized in symbolic form (machine-readable text rather than page images)
- spelling variations, language change: normalization difficult
- as yet, no good tools for searching, data mining, ...
- recent broad interest in the humanities
Challenge: Cope with many interesting spelling variations in historical corpora
- variant spellings sit somewhere between noisy text and foreign-language text
- variants depend on time and place
- special correction/normalization techniques needed
Our work
- classification of spelling variants into error classes
- design of a strategy for relating modern words to their historical variants
- experiments with various string distances
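As an illustration of how a string distance can relate a historical spelling to modern words, here is a minimal sketch assuming a plain Levenshtein distance and a weighted variant. The function names (`levenshtein`, `weighted_sub_cost`, `best_matches`), the small lexicon, and the cheap substitution pairs (u/v as in "vnd"/"und", i/y as in "seyn"/"sein", i/j) are hypothetical choices for this example, not the distances or data used in this work. Input is assumed to be lowercase.

```python
from typing import Callable, Iterable, List, Optional, Tuple

SubCost = Callable[[str, str], float]


def unit_sub_cost(x: str, y: str) -> float:
    """Every substitution costs 1 (plain Levenshtein distance)."""
    return 1.0


def levenshtein(a: str, b: str, sub_cost: Optional[SubCost] = None) -> float:
    """Edit distance between a and b with unit insertion/deletion costs.

    sub_cost(x, y) gives the cost of replacing character x by a different
    character y; identical characters always match at cost 0.
    """
    cost = sub_cost or unit_sub_cost
    prev = [float(j) for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [float(i)] + [0.0] * len(b)
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0.0 if ca == cb else cost(ca, cb))
            curr[j] = min(prev[j] + 1.0,      # delete ca
                          curr[j - 1] + 1.0,  # insert cb
                          substitute)
        prev = curr
    return prev[len(b)]


# Hypothetical: letter pairs that often alternate in older German spellings,
# e.g. u/v ("vnd" vs. "und") and i/y ("seyn" vs. "sein").
CHEAP_PAIRS = {frozenset("uv"), frozenset("iy"), frozenset("ij")}


def weighted_sub_cost(x: str, y: str) -> float:
    """Cheaper substitution for the assumed historical alternations."""
    return 0.3 if frozenset((x, y)) in CHEAP_PAIRS else 1.0


def best_matches(historical: str, lexicon: Iterable[str],
                 sub_cost: Optional[SubCost] = None,
                 k: int = 3) -> List[Tuple[float, str]]:
    """Return the k modern words closest to a historical spelling."""
    scored = sorted((levenshtein(historical, w, sub_cost), w) for w in lexicon)
    return scored[:k]
```

For example, `best_matches("seyn", ["und", "sein", "tun", "mein"], weighted_sub_cost, k=1)` ranks "sein" first. Making known historical letter alternations cheaper than arbitrary substitutions keeps such variant pairs closer than unrelated modern words of similar length.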