Role of NLP in Linguistics

16-07-2010
Dipti Misra Sharma
Language Technologies Research Centre
International Institute of Information Technology
Hyderabad
India
NLP and Linguistics
• Have similar goals
– Understanding human language(s)
• NLP relies on the theoretical models provided by linguistics
– Therefore, NLP definitely needs linguistics
What about linguistics? Does it benefit from NLP?
NLP is useful
• NLP tools can be useful for certain linguistic tasks, such as
– collecting, organizing, and classifying data
– providing statistics, etc.
• This saves effort and brings out facts that help in forming generalizations
Makes life easier for linguists
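As a minimal sketch of the kind of data collection and counting meant here, the snippet below tallies tokens and groups word forms by their ending. The input is just the two romanized example sentences used later in this talk; the function names and the choice of the ending "it" are illustrative assumptions, not tools from the talk.

```python
from collections import Counter

# The two romanized example sentences from later in the talk.
sentences = [
    "dillii mein sthit qutub miinaar ek darshaniiy sthal hai",
    "unke dvaaraa kathit kahaaniyaan bahut pracalit hain",
]

def token_counts(sentences):
    """Count how often each whitespace-separated token occurs."""
    counts = Counter()
    for sentence in sentences:
        counts.update(sentence.split())
    return counts

def group_by_ending(tokens, ending):
    """Collect the distinct tokens that end in the given string, sorted."""
    return sorted({t for t in tokens if t.endswith(ending)})

counts = token_counts(sentences)
it_forms = group_by_ending(counts, "it")

print(len(counts), "distinct tokens")
print("forms ending in -it:", it_forms)
```

On a real corpus the same two steps (count, then group by a surface property) give a linguist a quick list of forms worth examining by hand.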
NLP and Linguistics Resources
• NLP techniques are useful for creating linguistic resources such as
– verb frames, transfer grammars, bilingual lexicons, etc.
• Studies in computational linguistics (CL) have shown the usefulness of NLP techniques in historical linguistics as well (e.g. phylogenetic trees)
Thus, NLP is useful not only for data-related tasks but also for the creation of linguistic resources
What else?
• NLP researchers and linguists look at language from different perspectives
• NLP researchers look for solutions which provide higher coverage
– exceptions can be dealt with later
• Linguistic researchers find exceptions more interesting
– these help identify problem areas for the theory
However
Resource creation for NLP involves a close study of large-scale real-world data (e.g. linguistic annotation)

A close look at real-world data often brings up linguistic issues which have theoretical implications

Our experience
Hindi has
• A long list of lexical items
• Historically derived from Sanskrit verb roots
But
• Categorized as adjectives in Hindi
For example,
‘sthita’ (situated), ‘swiikrita’ (accepted), ‘sviikaarya’ (acceptable), ‘likhita’ (written), ‘kathit’ (told) ...
However
These ‘adjectives’ of Hindi have modifiers which have argument-like properties, both semantically and syntactically
For example,
dillii  mein  sthit     qutub  miinaar  ek   darshaniiy      sthal  hai
Delhi   in    situated  Qutub  Minar    one  worth-watching  place  is
‘Qutub Minar, situated in Delhi, is a place worth visiting’

unke  dvaaraa  kathit  kahaaniyaan  bahut  pracalit  hain
them  by       told    stories      very   popular   are
‘The stories told by them are very popular’
The issue (1/2)
• Both ‘dillii mein’ and ‘unke dvaaraa’ have appropriate case markers
• ‘mein’ is locative and ‘dvaaraa’ agentive
• These adjectives are historically non-finite verbs
– However, Hindi grammars no longer analyze them as such
– They are not morphologically decomposable in Hindi either
The issue (2/2)
Morphological decomposition of ‘sthit’ (situated) and ‘kathit’ (told) would lead to a Sanskrit analysis and NOT a Hindi analysis

Hindi, for example, does not have ‘sthaa’ or ‘kath’ as verb roots

It doesn’t have ‘ita’ as an active participial suffix either

How do we explain the argument-like properties of their modifiers?
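The decomposition problem can be made concrete with a small sketch. It strips a final "it" from a romanized form and looks up what remains in a list of Hindi verb roots. Everything here is an assumption for illustration: the root list is a tiny hypothetical sample, and naive string stripping ignores the sound changes a real Sanskrit analysis would need (which is why it yields ‘sth’ rather than ‘sthaa’).

```python
# Hypothetical toy sample of Hindi verb roots (from karnaa, kahnaa,
# dekhnaa, jaanaa); a real analyzer would use a full lexicon.
HINDI_VERB_ROOTS = {"kar", "kah", "dekh", "jaa"}

def strip_suffix(form, suffix="it"):
    """Return the stem left after removing the suffix, or None if absent."""
    if form.endswith(suffix) and len(form) > len(suffix):
        return form[: -len(suffix)]
    return None

def hindi_root_analysis(form):
    """Return (stem, found), where found says if the stem is a listed root."""
    stem = strip_suffix(form)
    return stem, stem in HINDI_VERB_ROOTS

for form in ["sthit", "kathit"]:
    print(form, hindi_root_analysis(form))
```

Neither stem is found in the sample list, which mirrors the point above: a root-plus-suffix analysis of these forms has nothing to attach to within Hindi.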
What does it indicate?
Linguists understand the relation, but it cannot be explained through a linguistic process of Hindi

A linguistic process (or at least the roots and suffixes) from Sanskrit will have to be brought in

Is it that languages have elements which are at different stages of development/evolution?
Another example
• Indian languages show frequent use of complex predicates
Examples:
pratiikshaa karnaa (wait do), kshamaa karnaa (forgive do)
• The problem:
When is a noun-verb (NV) sequence a complex predicate, and when is it not?
Complex Predicates
The problem has long been discussed in the linguistics literature

Several diagnostics have also been proposed
However,
• Quite a few NV sequences are a single unit semantically
• Syntactically, they fail the diagnostics
The question remains:
Do we consider such cases ‘complex verbs’ or instances of ‘verb argument’?
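Candidate NV sequences can at least be collected automatically, even though deciding their status stays a linguistic question. The sketch below assumes a hypothetical input format of (word, tag) pairs with the tags "N" and "V", and uses only the two examples given earlier; it is not the annotation scheme or tool used by the author.

```python
from collections import defaultdict

# Hypothetical tagged input built from the two examples in the talk.
tagged_sentences = [
    [("pratiikshaa", "N"), ("karnaa", "V")],
    [("kshamaa", "N"), ("karnaa", "V")],
]

def nv_candidates(tagged_sentences):
    """Map each verb to the set of nouns that immediately precede it."""
    nouns_by_verb = defaultdict(set)
    for sentence in tagged_sentences:
        # Walk over adjacent pairs of tagged words.
        for (word, tag), (next_word, next_tag) in zip(sentence, sentence[1:]):
            if tag == "N" and next_tag == "V":
                nouns_by_verb[next_word].add(word)
    return nouns_by_verb

for verb, nouns in nv_candidates(tagged_sentences).items():
    print(verb, sorted(nouns))
```

Such a list overgenerates by design: it contains true complex predicates and ordinary verb-argument pairs alike, so each candidate still has to be checked against the diagnostics.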
Conclusions
• NLP tools and techniques can be useful for linguists
• NLP work throws up rich examples which need to be handled
• These examples pose challenges for linguistic theory