Tools and Standards:
The State of the Art
E-MELD Meeting 2006
Report: Working Group 3:
'Lexicon creation'
Steve Abney, Dunstan Brown, Östen Dahl,
Sebastian Drude, Susanna Imrie,
Marc Kemps-Snijder, Christopher Manning,
Mike Maxwell, Vivian Ngai
Lexical databases and tools
● General remarks, ecology
● Comments on the tools-page
● Description of needed tools and standards
General
'Lexicon creation':
There are many more issues than this, considering
lexical data in its 'environment':
- interoperability with other types of data
- search on lexical data
- presentation and archiving ...
Position with respect to documentation / description
General: ecology of lexica
[Figure: diagram of the ecology of lexica. Labels recoverable from the slide: C, LDB, St + Th, TXT, LD, output, A, U, and numbered elements 1 to 12 (including 1a and 3a).]
Comments on tool pages
Format of presentation:
● Eliminate ratings, add keywords (main functionalities)
● Using basic tasks in the workflow related to lexical databases:
- Interaction with (interlinear) texts
- Concordance etc.
- Consistency control
- Output / presentation formats
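To make the 'Concordance' task concrete, here is a minimal keyword-in-context (KWIC) sketch in Python. It is an illustration only, not a description of any tool on the tools page: the function name `kwic`, the `width` parameter and the sample sentence are all assumptions made for this example, and tokenization is deliberately naive.

```python
import re

def kwic(text, keyword, width=2):
    """Return keyword-in-context lines with `width` tokens on each side.

    Hypothetical helper for illustration; tokenization is a naive word split.
    """
    tokens = re.findall(r"\w+", text.lower())
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Toy text, invented for this example.
sample = "The dog saw the cat. The cat ran from the dog."
for line in kwic(sample, "cat"):
    print(line)
```

A real concordancer would work on interlinear text and respect morpheme boundaries; this sketch only shows the basic idea of listing each hit with its neighbours.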
Comments on tool pages
Workflow / tasks:
● Word discovery, creation of entries
● Enrichment of information on lexical units
● Revisions, cleaning up
● Merger with other databases, collaboration
● Queries, data mining / retrieval
● Output + Presentation
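The 'merger with other databases' task can be sketched as follows. This is one possible approach under simple assumptions, not a method proposed by the working group: entries are plain dictionaries keyed by headword, the function name `merge_lexica` is hypothetical, and the entries are toy data.

```python
def merge_lexica(primary, secondary):
    """Merge two lexica keyed by headword.

    Fields missing from `primary` are filled in from `secondary`.
    Where both have a field with different values, the `primary`
    value is kept and the clash is reported for manual review.
    """
    merged = {hw: dict(entry) for hw, entry in primary.items()}
    conflicts = []
    for hw, entry in secondary.items():
        target = merged.setdefault(hw, {})
        for field, value in entry.items():
            if field not in target:
                target[field] = value
            elif target[field] != value:
                conflicts.append((hw, field, target[field], value))
    return merged, conflicts

# Toy data, invented for this example.
a = {"kula": {"pos": "v"}}
b = {"kula": {"pos": "n", "gloss": "eat"},
     "maji": {"pos": "n", "gloss": "water"}}
merged, conflicts = merge_lexica(a, b)
print(merged)
print(conflicts)
```

Reporting conflicts instead of silently overwriting keeps the linguist in control, which matters when several collaborators edit the same lexicon.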
Comments on tool pages
Potentially useful additions:
● Links to some exemplary on-line dictionaries
● Word lists for elicitation
● Mention general tools
● Mention Wiktionary-technology
Needed tools and standards
Tools:
● Consistency control / management (values of data categories, structure)
● Morphological parsers (modules)
● Version control, collaboration
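As a rough illustration of what 'consistency control' over values and structure could mean in practice, the following Python sketch checks entries against a list of required fields and a controlled vocabulary. The field names, the allowed values and the function `check_entry` are all hypothetical; a real tool would load them from the project's own configuration.

```python
# Hypothetical structural rule: every entry needs these fields.
REQUIRED_FIELDS = ("headword", "pos")
# Hypothetical controlled vocabulary for one data category.
ALLOWED_VALUES = {"pos": {"n", "v", "adj"}}

def check_entry(entry):
    """Return a list of human-readable problems found in one entry."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            problems.append(f"missing field: {field}")
    for field, allowed in ALLOWED_VALUES.items():
        value = entry.get(field)
        if value and value not in allowed:
            problems.append(f"unknown value for {field}: {value}")
    return problems

# Toy entries, invented for this example.
print(check_entry({"headword": "maji", "pos": "n"}))
print(check_entry({"headword": "kula", "pos": "verb"}))
print(check_entry({"pos": "n"}))
```

Keeping the rules in data (the two constants) rather than in code is what makes such a check customizable per project.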
Needed tools and standards
Standards
● It is difficult to imagine a widely accepted
standard for a fixed microstructure for lexical
entries
● There should nevertheless be proposals /
templates, especially for specific areas
● There can be repositories of terminology /
data categories, to choose from or for
orientation
● Also, repositories for values (controlled
vocabularies -> GOLD, semantic domains...)
Needed tools and standards
There are some proposals for standards that
should be referred to for orientation:
● MDF
● OLIF
● LMF (ISO)
Nevertheless, tools should generally allow for
customization of the structure
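One way to combine an orientation standard with customizable structure is sketched below: a reader for backslash-coded records in the style of MDF, where the mapping from markers to field names is a parameter. The markers \lx (lexeme), \ps (part of speech) and \ge (English gloss) follow MDF usage; the field names, the function `parse_record` and the marker \zz are assumptions made for this example.

```python
# Marker-to-field mapping; the field names on the right are hypothetical.
DEFAULT_MARKERS = {"lx": "lexeme", "ps": "part_of_speech", "ge": "gloss_en"}

def parse_record(text, markers=DEFAULT_MARKERS):
    """Parse one backslash-coded record into a dict of field -> list of values.

    Markers not in `markers` are kept under their own name, so
    project-specific fields survive instead of being dropped.
    """
    entry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # skip blank or continuation lines in this simple sketch
        marker, _, value = line[1:].partition(" ")
        field = markers.get(marker, marker)
        entry.setdefault(field, []).append(value.strip())
    return entry

# Toy record; \zz stands for an invented project-specific marker.
record = "\n".join([r"\lx maji", r"\ps n", r"\ge water", r"\zz local note"])
print(parse_record(record))
print(parse_record(record, markers={"lx": "headword"}))
```

Passing a different `markers` mapping changes the resulting structure without touching the parser, which is the kind of customization the slide asks tools to allow.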