SIL FieldWorks Language Explorer:
The lexicon component
Gary Simons
SIL International
Lexicon Tools and Lexicon Standards
Nijmegen, 4–5 August 2010
SIL FieldWorks
FieldWorks is:
a suite of integrated software tools to help
field workers manage language and
cultural data, with support for complex
scripts.
http://fieldworks.sil.org/
The Language Explorer tool is designed to:
manage a lexical database
produce dictionaries
interlinearize texts
analyze morphology
2
Quick Tour
A short quick tour screen movie
demonstrates the look and feel
It is the first of 55 narrated screen movies
available at:
http://downloads.sil.org/FieldWorks/Movies/brief demo menu.html
3
Integration among areas
The Lexicon, Texts, and Grammar areas
all operate over the same database.
In the Lexicon area, users enter lexical
entries directly.
In the Texts area, as new morphemes are
glossed in text, new lexical entries are
created behind the scenes.
In the Grammar area, users describe the
categories and features used in lexical
description, plus the inflectional templates
that guide automatic parsing in Texts.
4
Conceptual-modeling approach
Lexicon, texts, and grammar are all stored
in a single, normalized relational database.
We began by working with domain experts
to build a conceptual model of the areas
and how they integrate.
That was modeled in UML and transformed
to a SQL relational database schema.
See the full model with over 100 classes at:
http://fieldworks.sil.org/ModelDoc/ModelDocumentation.chm
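As a rough illustration of the UML-to-SQL step, the sketch below shows how two classes joined by a one-to-many association might become two normalized tables linked by a foreign key. The table and column names (entry, sense, lexeme_form, gloss) are invented for this example and are not taken from the FieldWorks model; SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical two-class model: one entry owns many senses.
# These names are illustrative assumptions, not the FieldWorks schema.
SCHEMA = """
CREATE TABLE entry (
    id INTEGER PRIMARY KEY,
    lexeme_form TEXT NOT NULL
);
CREATE TABLE sense (
    id INTEGER PRIMARY KEY,
    entry_id INTEGER NOT NULL REFERENCES entry(id),
    gloss TEXT NOT NULL
);
"""


def build_demo():
    """Create an in-memory database holding one entry with two senses."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO entry (id, lexeme_form) VALUES (1, 'example')")
    conn.executemany(
        "INSERT INTO sense (entry_id, gloss) VALUES (?, ?)",
        [(1, "first gloss"), (1, "second gloss")],
    )
    return conn


def glosses_for(conn, lexeme_form):
    """Follow the entry-to-sense association with a join."""
    rows = conn.execute(
        "SELECT s.gloss FROM sense s "
        "JOIN entry e ON e.id = s.entry_id "
        "WHERE e.lexeme_form = ? ORDER BY s.id",
        (lexeme_form,),
    )
    return [row[0] for row in rows]
```

Because each gloss is stored once and reached through the foreign key, any tool that reads the same tables would see the same senses.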
5
Some key features
Use automatic parsing to empirically verify
morphological description within lexicon
Build the word net via lexical relations
Build richness into the lexicon by eliciting
through semantic domains
Use “bulk edit” for global cleanup
Repurpose content by developing multiple
presentation views
Clean separation between stored data and
presentation (see example in next 2 slides)
6
Root-based dictionary (Cherokee)
- Stem entries just cross-refer to root
- Root entries list stems as subentries
- Subentries give full description
7
Stem-based dictionary (Cherokee)
- Stem entries give full description
- Root entries cross-refer to stems
- No subentries
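The two dictionary layouts above can be sketched as two rendering functions over one set of stored records. This is a minimal illustration of separating data from presentation, not how FLEx implements configured views; the record fields and the placeholder forms (ROOT1, stem-a, stem-b) are invented and are not Cherokee data.

```python
# Invented placeholder data: one root with two derived stems.
ENTRIES = [
    {"form": "ROOT1", "kind": "root", "gloss": "root gloss"},
    {"form": "stem-a", "kind": "stem", "root": "ROOT1", "gloss": "gloss a"},
    {"form": "stem-b", "kind": "stem", "root": "ROOT1", "gloss": "gloss b"},
]


def _sorted(entries):
    """Alphabetical order by headword, ignoring case."""
    return sorted(entries, key=lambda e: e["form"].lower())


def root_based(entries):
    """Roots carry their stems as full subentries; stems cross-refer to the root."""
    lines = []
    for e in _sorted(entries):
        if e["kind"] == "root":
            lines.append(f'{e["form"]}: {e["gloss"]}')
            for s in _sorted(entries):
                if s.get("root") == e["form"]:
                    lines.append(f'    {s["form"]}: {s["gloss"]}')
        else:
            lines.append(f'{e["form"]}: see {e["root"]}')
    return lines


def stem_based(entries):
    """Stems are described in full; roots only cross-refer to their stems."""
    lines = []
    for e in _sorted(entries):
        if e["kind"] == "root":
            stems = [s["form"] for s in _sorted(entries) if s.get("root") == e["form"]]
            lines.append(f'{e["form"]}: see {", ".join(stems)}')
        else:
            lines.append(f'{e["form"]}: {e["gloss"]}')
    return lines
```

Calling root_based(ENTRIES) and stem_based(ENTRIES) yields two different dictionaries from records that are never modified.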
8
Pathways to publishing
First create a “configured view” to display the
lexical entries as desired
Then use the Pathway plug-in to take this
stream of configured content and lay it out
onto pages for a publishable dictionary
http://code.google.com/p/pathway/
Publishing tools supported so far:
Prince XML (to PDF)
OpenOffice (to ODF)
Adobe InDesign
9
Lexical interchange
Supports two import formats:
From Shoebox / Toolbox via SFM
“Standard Format Markers” = backslash codes
User configures the mapping of markers to
conceptual equivalents in FLEx database
The default mapping is for MDF SFM
From WeSay / Lexique Pro via LIFT
Lexicon Interchange FormaT: an XML
application for interchange of lexicons
http://code.google.com/p/lift-standard/
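To make the SFM import idea concrete, here is a minimal sketch of reading backslash-coded records and mapping each marker to a named field. The markers \lx, \ps, \ge and \de are standard MDF codes; the target field names, the parse_sfm function, and the sample records are assumptions made for illustration and do not reflect the actual FLEx importer.

```python
# Hypothetical mapping from MDF markers to illustrative field names.
MARKER_MAP = {
    "lx": "lexeme",
    "ps": "part_of_speech",
    "ge": "gloss_en",
    "de": "definition_en",
}


def parse_sfm(text, marker_map=MARKER_MAP, record_marker="lx"):
    """Split SFM text into records; a new record starts at record_marker."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # this sketch ignores blank and continuation lines
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            current = {}
            records.append(current)
        if current is None:
            continue  # skip any header fields before the first record
        # Markers the user has not mapped are kept, but flagged.
        field = marker_map.get(marker, "unmapped:" + marker)
        current.setdefault(field, []).append(value.strip())
    return records


# Invented sample data in SFM form.
SAMPLE = r"""
\lx example
\ps n
\ge sample
\lx other
\ps v
\ge different
\xx unknown field
"""
```

Keeping the mapping in a plain dictionary mirrors the slide's point that the user, not the program, decides which marker corresponds to which concept.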
10
Lexicon export
The entire database for a language project
can be dumped to FieldWorks XML
http://fieldworks.sil.org/supportdocs/FieldWorks XML model.doc
The complete lexical database (a subset of
the whole project) can be exported to:
LIFT XML
MDF-based SFM (either root- or stem-based)
http://fieldworks.sil.org/supportdocs/Export options in Flex.doc
11
More lexicon export
Any configured view can be exported to:
A streamlined version of FieldWorks XML
MDF-based SFM
XHTML + CSS for presentation
Furthermore, one can create a FieldWorks
XML Template (FXT) to define a custom
export format (XML, SFM, plain text)
http://fieldworks.sil.org/supportdocs/FXT export options.doc
12
Interoperation with GOLD
FLEx is preloaded with a grammatical categories
catalog that is based on an early version of GOLD
http://www.sil.org/computing/fieldworks/flex/categories.html
Similarly, a Morphosyntactic Gloss Assistant is
preloaded with morphosyntactic properties from
an early version of GOLD; see p. 10 of:
http://www.sil.org/~simonsg/preprint/FLExParser Preprint.pdf
Thus morphosyntactic information in lexicon and
texts is implicitly aligned with GOLD
The remaining step is for us to map to GOLD IDs
when they are standardized; then we can easily
export GOLD IDs in LIFT and other XML
13
Uptake
October 2009: FLEx 3.0 released in FieldWorks 6.0.
Free download from:
http://www.sil.org/computing/fieldworks/FW_downloads.htm
323 members of a reasonably active Google
Group (~3,000 messages)
http://groups.google.com/group/flex-list
185 language projects have registered as users
Over 30 people attended a 4-day FLEx workshop
led by Beth Bryson at InField 2010. Beth will also
lead a one-day FLEx workshop at ICLDC, Feb 2011.
14