Comparability of language data and analysis
Using an ontology for linguistics
Scott Farrar, U Bremen
Terry Langendoen, U Arizona
Jan 9, 2004
Symposium on Best Practice
LSA, Boston, MA
Multiple language resources
- Symposium focus so far has been on digital preservation of the work of individual projects.
- Imagine there are 100,000 or more Web-accessible digital language archives covering most of the world’s languages.
  - annotated texts, lexicons, grammatical descriptions, research papers, typological comparisons, ...
Limits on access to content
- Metadata gets you only a little way in.
- String searching gets results, but it’s often not reliable (low “precision” and “recall”).
- Database searches typically can only be carried out one site at a time.
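“Precision” and “recall” have standard information-retrieval definitions. Precision is the share of retrieved items that are relevant. Recall is the share of relevant items that are retrieved.

The sketch below shows how a naive string search can score poorly on both. The gloss strings, archive item keys, and relevance judgements are invented for illustration. “APPL” is the Leipzig Glossing Rules abbreviation for applicative.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Precision = |retrieved & relevant| / |retrieved|; recall = |retrieved & relevant| / |relevant|."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Invented gloss lines from several hypothetical archives.
corpus = {
    "a1": "dog-PL",      # plural, glossed "PL"
    "a2": "give-APPL",   # applicative: contains the string "PL" but is not plural
    "a3": "child-plur",  # plural under a different site's convention
    "a4": "tree-pl",     # plural, lower-case gloss
}
relevant = {"a1", "a3", "a4"}  # items that actually mark plural

# Naive case-sensitive string search for "PL".
retrieved = {key for key, gloss in corpus.items() if "PL" in gloss}

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.33
```

A case-insensitive search would recover a3 and a4 in this toy data. It would still retrieve a2, because string matching alone cannot tell an applicative from a plural.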
Smart searches need smart data
- Use informational, not presentational, markup (cf. presentations by Simons and Lewis).
- XML can be used to represent linguistic analyses to any desired degree of refinement.
- Analyses in other formats (e.g. relational databases) can be migrated to XML for both archiving and smart Web searching.
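The last two points can be sketched together. The example below migrates a small interlinear-gloss table from a relational database (an in-memory SQLite table) to XML. The element and attribute names say what each piece of data is (word, morpheme, gloss) rather than how it should be displayed.

The table layout, element names, and sample data are hypothetical. They are not drawn from any particular archive or markup scheme.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical relational source: one row per morpheme of a glossed word.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE morpheme (word_id INTEGER, position INTEGER, form TEXT, gloss TEXT)")
conn.executemany(
    "INSERT INTO morpheme VALUES (?, ?, ?, ?)",
    [(1, 1, "dog", "dog"), (1, 2, "s", "PL"), (2, 1, "walk", "walk"), (2, 2, "ed", "PST")],
)

# Informational markup: names describe the linguistic content, not its layout.
text = ET.Element("text", lang="eng")
words: dict[int, ET.Element] = {}
for word_id, form, gloss in conn.execute(
    "SELECT word_id, form, gloss FROM morpheme ORDER BY word_id, position"
):
    if word_id not in words:
        words[word_id] = ET.SubElement(text, "word", id=f"w{word_id}")
    ET.SubElement(words[word_id], "morpheme", gloss=gloss).text = form

xml_string = ET.tostring(text, encoding="unicode")
print(xml_string)
```

Because the gloss is an attribute rather than, say, small capitals in a word-processor file, a search tool can query it directly.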
Smart markup isn’t enough
- The meaning and use of structural markup vary from site to site.
  - Same term used with different meanings.
  - Different terms used with the same meaning.
  - Markup element and attribute names and values, and structural content, may be in different natural languages.
- Sites are encoded at different levels of granularity.
How to say what you mean
- Markup is syntax; its meaning can only be inferred for individual sites, or for groups of sites that use a common markup scheme (e.g. TEI).
- So if markup term T means “x” in archive A and “y” in archive B, then we need:
  - A resource (called an ontology) that provides the definitions “x” and “y” in a systematic and machine-interpretable format.
  - A mechanism to link T to “x” in A and T to “y” in B.
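Here is a minimal sketch of the two requirements above. It assumes a toy ontology held as a dictionary of concept identifiers and definitions, plus per-archive link tables that map each local markup term to a concept.

The archive names, terms, concept identifiers, and definitions are hypothetical. A real system would use a published ontology and a standard linking format rather than in-memory dictionaries.

```python
from typing import Optional

# Hypothetical toy ontology: concept identifier -> definition.
ONTOLOGY = {
    "PerfectAspect": "Aspect relating a situation to a later reference time.",
    "PerfectiveAspect": "Aspect presenting a situation as a bounded whole.",
}

# Hypothetical links from each archive's local markup terms to ontology concepts.
# The same term T ("PERF") means different things in archiveA and archiveB.
LINKS = {
    "archiveA": {"PERF": "PerfectAspect"},
    "archiveB": {"PERF": "PerfectiveAspect"},
    "archiveC": {"PFV": "PerfectiveAspect"},
}

def concept_for(archive: str, term: str) -> Optional[str]:
    """Interpret a markup term in the context of the archive that uses it."""
    return LINKS.get(archive, {}).get(term)

def definition_for(archive: str, term: str) -> Optional[str]:
    """Look up the ontology definition behind an archive-local term."""
    concept = concept_for(archive, term)
    return ONTOLOGY.get(concept) if concept else None

def archives_using(concept: str) -> list[tuple[str, str]]:
    """Find every (archive, local term) pair linked to a given concept."""
    return sorted(
        (archive, term)
        for archive, mapping in LINKS.items()
        for term, linked in mapping.items()
        if linked == concept
    )

print(concept_for("archiveA", "PERF"))     # PerfectAspect
print(concept_for("archiveB", "PERF"))     # PerfectiveAspect
print(archives_using("PerfectiveAspect"))  # [('archiveB', 'PERF'), ('archiveC', 'PFV')]
```

Searching by concept rather than by string lets one query find archiveB's PERF and archiveC's PFV while excluding archiveA's PERF.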
What is an ontology?
- A computational artifact;
- A conceptualization of a domain;
- A theory of what is;
- The types in a knowledge base.
- There can be many ontologies for a given domain.
Why an ontology for linguistics?
- Language documentation
  - need to decipher markup
  - semantics and markup
  - Semantic Web implementation
- Natural language processing
  - conceptual basis for semantics (grounding)
  - as a common framework for linguistic and non-linguistic knowledge
GOLD
- General Ontology for Linguistic Description: http://emeld.org/gold
- Incorporated in EMELD’s FIELD tool.
- Built using an upper ontology (SUMO)
  - http://ontology.teknowledge.com
- Currently in a very early stage of development.
Partial SUMO taxonomy
Entity
  Abstract
    Relation
    Proposition
    SetOrClass
    Quantity
    Attribute
  Physical
    Object
      Region
      Agent
      SelfConnectedObject
      Collection
    Perdurant
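One thing a taxonomy like this supports is subsumption: a query for Object should also match data typed with any of its subclasses. The sketch below encodes the partial taxonomy as child-to-parent links and tests is-a relations by walking up the chain.

The placement of each class under Abstract, Physical, or Object is reconstructed from the slide and should be treated as an assumption. Both OWL and SUMO allow a class to have several superclasses, so a single-parent dictionary is a simplification.

```python
# Child -> parent links for the partial taxonomy (reconstructed; an assumption).
PARENT = {
    "Abstract": "Entity",
    "Physical": "Entity",
    "Relation": "Abstract",
    "Proposition": "Abstract",
    "SetOrClass": "Abstract",
    "Quantity": "Abstract",
    "Attribute": "Abstract",
    "Object": "Physical",
    "Perdurant": "Physical",
    "Region": "Object",
    "Agent": "Object",
    "SelfConnectedObject": "Object",
    "Collection": "Object",
}

def ancestors(cls: str) -> list[str]:
    """Return the superclasses of cls, nearest first, up to the root."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain

def is_a(cls: str, other: str) -> bool:
    """True if cls equals other or is a direct or indirect subclass of it."""
    return cls == other or other in ancestors(cls)

print(ancestors("Agent"))            # ['Object', 'Physical', 'Entity']
print(is_a("Agent", "Physical"))     # True
print(is_a("Quantity", "Physical"))  # False
```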
What currently is in GOLD?
Categories for:
- linguistic form
- morphosyntactic categories
  - features
  - values
- semantics for morphosyntactic categories
  - using SUMO
- documentation
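To make “features” and “values” concrete: an ontology can record which values each morphosyntactic feature permits, and annotation tools can check data against that record. The feature and value names below are hypothetical placeholders, not GOLD’s actual identifiers.

```python
# Hypothetical inventory: morphosyntactic feature -> permitted values.
FEATURES = {
    "Number": {"Singular", "Dual", "Plural"},
    "Tense": {"Past", "Present", "Future"},
}

def check_annotation(annotation: dict[str, str]) -> list[str]:
    """Return the problems found in a feature -> value annotation (empty if valid)."""
    problems = []
    for feature, value in annotation.items():
        if feature not in FEATURES:
            problems.append(f"unknown feature: {feature}")
        elif value not in FEATURES[feature]:
            problems.append(f"{value} is not a value of {feature}")
    return problems

print(check_annotation({"Number": "Plural", "Tense": "Past"}))  # []
print(check_annotation({"Number": "Past", "Mood": "Irrealis"}))
# ['Past is not a value of Number', 'unknown feature: Mood']
```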
Format of GOLD
- Semantic Web initiative
  - http://w3.org/2001/sw/
- Web Ontology Language (OWL)
  - An emerging Web standard with a growing user base
  - Extensible
  - Lots of visualization tools and APIs are available for OWL.
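OWL ontologies are commonly exchanged as RDF/XML, so even a generic XML library can read a simple class hierarchy out of one. The sketch below extracts rdfs:subClassOf links from a small inline OWL document using only Python’s standard library.

The ontology base URI and class names are hypothetical. This handles only the simple named-class pattern shown. Anything more, such as subclass axioms on anonymous class expressions or resolving references against xml:base, calls for a dedicated RDF/OWL toolkit.

```python
import xml.etree.ElementTree as ET

# A tiny OWL ontology in RDF/XML; the base URI and class names are hypothetical.
OWL_DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xml:base="http://example.org/ling">
  <owl:Class rdf:about="#MorphosyntacticFeature"/>
  <owl:Class rdf:about="#Tense">
    <rdfs:subClassOf rdf:resource="#MorphosyntacticFeature"/>
  </owl:Class>
  <owl:Class rdf:about="#Number">
    <rdfs:subClassOf rdf:resource="#MorphosyntacticFeature"/>
  </owl:Class>
</rdf:RDF>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}
RDF_ABOUT = f"{{{NS['rdf']}}}about"        # ElementTree's {namespace}name form
RDF_RESOURCE = f"{{{NS['rdf']}}}resource"

def subclass_links(owl_xml: str) -> list[tuple[str, str]]:
    """Return (subclass, superclass) pairs; references are returned as written, unresolved."""
    root = ET.fromstring(owl_xml)
    links = []
    for cls in root.findall("owl:Class", NS):
        child = cls.get(RDF_ABOUT)
        for sup in cls.findall("rdfs:subClassOf", NS):
            parent = sup.get(RDF_RESOURCE)
            if child and parent:  # skip anonymous class expressions
                links.append((child, parent))
    return links

for child, parent in subclass_links(OWL_DOC):
    print(child, "subClassOf", parent)
```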
What’s still needed
- Buildout of GOLD (and/or development of companion ontologies) to cover the entire field.
- Mechanisms to link sites to ontologies.
  - Can be done in part using metadata.
- Development of additional ontology-aware tools for data creation and migration.
- A way of ensuring that ontologies endure just like the data they help interpret.