Kirrkirr: a Bilingual Warlpiri-English Dictionary
Kirrkirr: a Bidirectional Warlpiri-English Dictionary
Kristen Parton
Kirrkirr: Objectives
Kirrkirr aims to present the contents of a dictionary in a way which is
flexible, interactive, customizable, and (especially) fun
Kirrkirr has diverse target users, with varying levels of literacy, for
example professional linguists, elementary school children,
teachers, and native speakers
Currently, Kirrkirr is used with the Australian Aboriginal language
Warlpiri, spoken by about 3,000 people in northern Australia
Kirrkirr uses a Warlpiri-English dictionary developed by linguists in
Australia, with detailed information about each word, including
glosses, definitions, dialects, grammatical comments and cross-references between words for synonyms, antonyms, “see also” and
other relationships
Unlike paper dictionaries, electronic dictionaries can provide an
interactive educational tool customizable to various audiences
Dictionary Usability
The interface has a colorful, clickable panel which links words
related in different ways, rather than just relying on the alphabetical
list of words; this also makes the dictionary more interactive
Many words are linked to pictures and sounds, which reinforce the
meaning of the words through non-textual means
The dictionary uses “fuzzy spelling” to catch spelling errors made
by the user when searching for a word
User modes tailor the appearance of the formatted entries to each
target audience:
English meaning only, for novice users with English
backgrounds
In Warlpiri, for native speakers of Warlpiri
Basic details, for intermediate users such as students
Full details, for advanced users such as teachers or linguists
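The “fuzzy spelling” search mentioned above could be approximated as follows. This is only a sketch: the slides do not say how Kirrkirr matches misspellings, so the example swaps in Python's standard difflib similarity matching, and the function name fuzzy_lookup and the short headword list are assumptions made for illustration.

```python
import difflib

# Illustrative headwords taken from this document; the real lexicon is far larger.
HEADWORDS = ["nya-nyi", "yawarrangi", "jawirdiki", "kirany-kiranypa"]

def fuzzy_lookup(query, headwords=HEADWORDS, limit=3, cutoff=0.6):
    """Return headwords that closely resemble the query, best match first."""
    # Compare without hyphens or case, so "Nyanyi" can still reach "nya-nyi".
    normalized = {}
    for word in headwords:
        normalized.setdefault(word.replace("-", "").lower(), word)
    key = query.replace("-", "").lower()
    matches = difflib.get_close_matches(key, list(normalized), n=limit, cutoff=cutoff)
    return [normalized[m] for m in matches]

print(fuzzy_lookup("yawarangi"))  # misspelling with a single "r"
print(fuzzy_lookup("Nyanyi"))     # hyphen omitted, wrong case
```

A language-specific matcher (for example one that knows which Warlpiri spellings learners commonly confuse) would probably do better than generic string similarity; the sketch only shows where such a matcher would plug in.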
Lexicon Structure
The dictionary is maintained by linguists in Australia in an ad hoc text format, which is converted to a structured XML
dictionary by a Perl script
Rather than load the large (10 MB) XML file into memory, each
headword’s XML entry is loaded individually as needed
The rich structure of the XML allows XSLT stylesheet
manipulation of the dictionary entries to produce output
formatted differently for different users
The XSLT stylesheet outputs HTML pages, which make use of
the cross-references in the dictionary by creating hyperlinks
between different words
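Loading one headword's entry at a time could be done with a byte-offset index, sketched below. The slides do not describe Kirrkirr's actual index format, so the element names ENTRY, HW and GL, the sample data, and the functions build_index and load_entry are all assumptions for illustration.

```python
import io
import re
import xml.etree.ElementTree as ET

# Hypothetical file layout; the real Kirrkirr element names may differ.
SAMPLE = (
    b"<DICTIONARY>\n"
    b"<ENTRY><HW>nya-nyi</HW><GL>see</GL></ENTRY>\n"
    b"<ENTRY><HW>yawarrangi</HW><GL>large male kangaroo</GL></ENTRY>\n"
    b"</DICTIONARY>\n"
)

ENTRY_RE = re.compile(rb"<ENTRY>.*?</ENTRY>", re.DOTALL)
HW_RE = re.compile(rb"<HW>(.*?)</HW>", re.DOTALL)

def build_index(data):
    """One pass over the raw bytes: map each headword to (offset, length)."""
    index = {}
    for match in ENTRY_RE.finditer(data):
        hw = HW_RE.search(match.group(0))
        if hw:
            length = match.end() - match.start()
            index[hw.group(1).decode("utf-8")] = (match.start(), length)
    return index

def load_entry(stream, index, headword):
    """Seek to a single entry and parse only that fragment."""
    offset, length = index[headword]
    stream.seek(offset)
    return ET.fromstring(stream.read(length))

index = build_index(SAMPLE)
entry = load_entry(io.BytesIO(SAMPLE), index, "yawarrangi")
print(entry.findtext("GL"))
```

The index pass reads the whole file once, so in practice it would be built ahead of time (for instance when the XML is generated) and stored alongside the dictionary; only load_entry would run at lookup time.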
Customizing Format with XSLT
At run-time, the XML entries are processed by an XSLT stylesheet,
which selects which elements of the entry to show, determines the
order to show them in, and formats each field differently depending on
the user mode
For example, “Meaning only” outputs the English glosses of a word
in large font, whereas “Full details” outputs all of the information in
the dictionary in a normal sized font in a specific order.
Since the XML is parsed at run-time, more information can be added to
the XML to allow “parameter passing” from the program to the XSLT
For example, the location of the images folder can only be
determined at run-time, but by adding an <IMAGE-DIR> field to the
XML at run-time, the XSLT can create an <IMG SRC> tag to display
an image in the HTML output
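The two ideas in this section (choosing fields by user mode, and passing run-time values through the XML) can be illustrated in a few lines. Note that this is a Python stand-in for the stylesheet logic, not XSLT itself and not Kirrkirr's code. Only the IMAGE-DIR element comes from the text above; the other element names, the mode names, the placeholder entry and the example path are assumptions.

```python
import copy
import xml.etree.ElementTree as ET
from html import escape

# Placeholder entry; only IMAGE-DIR (added later) is named in the source.
ENTRY_XML = (
    "<ENTRY><HW>yawarrangi</HW><GL>large male kangaroo</GL>"
    "<DEF>(definition text)</DEF><IMAGE>yawarrangi.jpg</IMAGE></ENTRY>"
)

# Assumed mapping: which fields each user mode shows, in display order.
MODE_FIELDS = {
    "meaning_only": ["GL"],
    "full_details": ["HW", "GL", "DEF"],
}

def add_runtime_params(entry, image_dir):
    """Return a copy of the entry with an <IMAGE-DIR> child added at run-time."""
    enriched = copy.deepcopy(entry)
    ET.SubElement(enriched, "IMAGE-DIR").text = image_dir
    return enriched

def render(entry, mode):
    """Stand-in for the stylesheet: pick fields by mode and emit HTML."""
    tag = "h1" if mode == "meaning_only" else "p"  # large font for glosses only
    parts = []
    for field in MODE_FIELDS[mode]:
        text = entry.findtext(field)
        if text:
            parts.append(f"<{tag}>{escape(text)}</{tag}>")
    image, image_dir = entry.findtext("IMAGE"), entry.findtext("IMAGE-DIR")
    if mode != "meaning_only" and image and image_dir:
        src = escape(f"{image_dir}/{image}", quote=True)
        parts.append(f'<img src="{src}">')
    return "\n".join(parts)

entry = add_runtime_params(ET.fromstring(ENTRY_XML), "/opt/kirrkirr/images")
print(render(entry, "meaning_only"))
print(render(entry, "full_details"))
```

Because the image directory travels inside the entry, the formatting step needs no separate argument for it, which is the point of the “parameter passing” trick described above.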
English-Warlpiri Dictionary
The original dictionary is one-way Warlpiri to English, but a
bidirectional bilingual dictionary is more useful for most users
An English index was built from glosses in the dictionary such that
each gloss links to the equivalent Warlpiri entries.
Rather than being two separate monolingual dictionaries, these
dictionaries share the same data, thus eliminating conflicting entries
and maintaining consistency
The XML entries of all the Warlpiri equivalents to an English word are
merged, and passed to an XSLT stylesheet, which creates an
HTML page for the English word
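A minimal sketch of how such a gloss index might be derived from the Warlpiri entries is shown below. The data structures and function names are assumptions, and “PLACEHOLDER-HEADWORD” is a stand-in for a second word meaning “stay put”, not a real Warlpiri word.

```python
from collections import defaultdict

# Headword -> English glosses. Glosses are taken from this document;
# PLACEHOLDER-HEADWORD is a stand-in, not a real Warlpiri word.
WARLPIRI = {
    "yawarrangi": ["large male kangaroo"],
    "jawirdiki": ["stay put"],
    "PLACEHOLDER-HEADWORD": ["stay put"],
    "kirany-kiranypa": ["spinifex lizard"],
}

def build_english_index(entries):
    """Invert the dictionary: each gloss points back to its Warlpiri headwords."""
    index = defaultdict(list)
    for headword, glosses in entries.items():
        for gloss in glosses:
            index[gloss].append(headword)
    return dict(index)

def merged_entry(gloss, index, entries):
    """Gather every Warlpiri entry sharing this gloss into one record,
    ready to hand to a single formatting step."""
    return {
        "gloss": gloss,
        "equivalents": [
            {"headword": hw, "glosses": entries[hw]} for hw in index.get(gloss, [])
        ],
    }

english_index = build_english_index(WARLPIRI)
print(merged_entry("stay put", english_index, WARLPIRI))
```

Since the English index is computed from the Warlpiri data rather than stored separately, a correction to a Warlpiri entry shows up in the English view the next time the index is rebuilt.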
English-Warlpiri Dictionary
To make the English dictionary symmetric to the Warlpiri, Kirrkirr
now has an English word list, English formatted entries, a much
faster English search, and the capability to do “fuzzy spelling” in
English
Problems arise because most Warlpiri words have several English
equivalents, and also because phrases in English might be indexed
under several different terms
For example, “yawarrangi” meaning “large male kangaroo”
should be indexed under “kangaroo” rather than “large” or “male”
However, “jawirdiki” and other words that mean “stay put”
should be indexed under “stay” and not “put”
Words like “kirany-kiranypa” meaning “spinifex lizard” should be
indexed under “spinifex” (the type) and “lizard”
Warlpiri Morphology
Warlpiri is an agglutinating language, meaning that grammatical
suffixes get added on to words:
nyangulparnangku
nya- ngu- lpa- rna- ngku
see- PAST- IPFV- 1SG.SUBJ- 2SG.OBJ
“I was looking at you.”
Root word: “nya-nyi” meaning “to see”
For lookup in the dictionary, users have to know the root word
This is difficult for learners of Warlpiri, given that morphemes are not
always separated by hyphens and verbs are indexed with non-past
tense inflections
To make Kirrkirr more usable, a morphological analyzer was
implemented to accept well-formed Warlpiri words and find the
possible root words to look up
Morphological Analysis
Suffixes from the dictionary are stored in a trie for quick lookup
Each time an affix is stripped, the remaining string is checked to
see whether it is in the dictionary
Each possible morpheme is added to a lattice structure which
holds all possible morphological decompositions of the word
Grammar rules are applied to eliminate many impossible parses
Some properties of Warlpiri make parsing more difficult, and
show the need for a different indexing system:
Verbs are stored with non-past inflections but are seen with
different inflections. For example, “nya-nyi” may show up as
“nya-ngu.” But indexing “nya-nyi” under “nya” creates more
ambiguity, since “nya” is another word.
Some words have optional suffixes, such as “l(pa)” which
may be seen as “l” or “lpa.” These words must be indexed
under both entries.
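The suffix-stripping steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not Kirrkirr's analyzer: it returns a flat list of parses instead of building a lattice, applies no grammar rules, uses only a handful of suffixes from the example word, and assumes a stem index (STEMS) that maps the bare verb stem “nya” to the headword “nya-nyi”. All function names are hypothetical.

```python
import re

def expand_optional(suffix):
    """Expand "l(pa)" into ["l", "lpa"]; plain suffixes pass through unchanged."""
    match = re.fullmatch(r"(.*)\((.*)\)(.*)", suffix)
    if not match:
        return [suffix]
    head, optional, tail = match.groups()
    return [head + tail, head + optional + tail]

def build_suffix_trie(suffixes):
    """Store each suffix reversed, so matching walks backwards from the word end."""
    root = {}
    for raw in suffixes:
        for suffix in expand_optional(raw):
            node = root
            for ch in reversed(suffix):
                node = node.setdefault(ch, {})
            node["$"] = suffix  # end-of-suffix marker
    return root

def suffixes_at_end(word, trie):
    """Yield every known suffix that the word ends with."""
    node = trie
    for ch in reversed(word):
        node = node.get(ch)
        if node is None:
            return
        if "$" in node:
            yield node["$"]

def analyze(word, trie, stems):
    """Return every (headword, [suffixes]) parse found by repeated stripping."""
    parses = []

    def strip(rest, stripped):
        if rest in stems:  # the remaining string is a known stem
            parses.append((stems[rest], stripped))
        for suffix in suffixes_at_end(rest, trie):
            remainder = rest[: -len(suffix)]
            if remainder:
                strip(remainder, [suffix] + stripped)

    strip(word, [])
    return parses

SUFFIXES = ["ngu", "l(pa)", "rna", "ngku"]  # illustrative subset only
STEMS = {"nya": "nya-nyi"}                  # assumed stem -> headword index

trie = build_suffix_trie(SUFFIXES)
print(analyze("nyangulparnangku", trie, STEMS))
```

With a realistic suffix inventory the same word would usually yield several candidate parses, which is where the lattice and the grammar-rule filtering described above come in.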
Conclusions
Making Kirrkirr a bidirectional English-Warlpiri and Warlpiri-English
dictionary increases its usability and practicality, by making it easier
for users who are more comfortable in English to browse and search
in English.
Allowing lookup of Warlpiri words from actual speech using the
morphological analysis also increases usability, especially for users
who are learning Warlpiri, since they do not have to figure out the
root word.
Future work:
Improving the morphological analysis to provide roughly ranked
possible parses of all morphemes of an entire word, using more
grammatical information and frequency information
Extending Kirrkirr to other languages