Transcript Lexicon

Natural Language Processing (2a)
Zhao Hai 赵海
Department of Computer Science and Engineering
Shanghai Jiao Tong University
2010-2011
http://bcmi.sjtu.edu.cn/~zhaohai/lessons/nlp2011/index.html
1
Outline

Lexicons and Lexical Analysis

Lexicon: A Language Resource

A Lexicon for English Words: WordNet
2
Lexicons and Lexical Analysis (1)
Lexicon: A Language Resource (1)
Features for Lexicons (1)
A lexicon is a machine-readable dictionary with the following features:
• It provides, in full detail, all the information that a dictionary contains;
• Based on semantic descriptions, it describes the syntagmatic and
paradigmatic relationships of each word, e.g.:
red + flower, green + leaf, big + eye (syntagmatic rel.)
red, green, and big; flower, leaf, and eye (paradigmatic rel.);
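The two relation types can be sketched in a few lines of Python. This is only an illustration built from the three example pairs on this slide; the variable names are assumptions, not part of any lexicon format.

```python
# Syntagmatic relations: words that combine in sequence (modifier + head).
# The pairs are the slide's examples; the variable names are illustrative.
syntagmatic_pairs = [("red", "flower"), ("green", "leaf"), ("big", "eye")]

# Paradigmatic relations: words that can fill the same slot.
modifier_slot = sorted({modifier for modifier, _ in syntagmatic_pairs})
head_slot = sorted({head for _, head in syntagmatic_pairs})

print(modifier_slot)  # ['big', 'green', 'red']
print(head_slot)      # ['eye', 'flower', 'leaf']
```

Reading across a pair gives a syntagmatic relation; reading down a slot gives a paradigmatic one.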
3
Lexicons and Lexical Analysis (2)
Lexicon: A Language Resource (2)
Features for Lexicons (2)

word building: fixed collocation between words;

systematization: description consistency including
morphological, syntactic and semantic description;

formalization: expression in a metalanguage, e.g.
[±noun].
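As a small sketch of what such a formalization might look like in code, binary features can be stored as booleans and rendered in the [±feature] notation. The function name and the feature bundle below are hypothetical, chosen only for illustration.

```python
from typing import Dict

def format_features(features: Dict[str, bool]) -> str:
    """Render binary features in [+x] / [-x] notation."""
    return " ".join(
        f"[{'+' if value else '-'}{name}]" for name, value in features.items()
    )

# Hypothetical feature bundle for an English noun such as "flower".
flower = {"noun": True, "verb": False}
print(format_features(flower))  # [+noun] [-verb]
```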
4
Lexicons and Lexical Analysis (3)
Lexicon: A Language Resource (3)
Construction of Lexicons
Building a lexicon involves the following critical points:
• a knowledge base, rather than a database, is built, and this
work should be carried out by domain experts;
• it can be built manually or semi-automatically;
• it can be applied to any machine platform and domain;
• it should have a general framework, so that it can
interact with other lexicons.
5
Lexicons and Lexical Analysis (4)
Lexicon: A Language Resource (4)
Types of Lexicons
Lexicons can be divided into four categories:

general lexicon (or basic lexicon);

collocation lexicon;

bilingual lexicon;

domain lexicon.
6
Lexicons and Lexical Analysis (5)
Lexicon: A Language Resource (5)
Information within Lexicons
The information of a basic lexicon may contain:
 lexical information (lexical entry etc.);
 morphological information (POS, tense, etc.);
 syntactic information (sentence pattern of verb, etc.);
 semantic information (semantic attribute, predicate frame,
etc.);
 conceptual information (conceptual mark, word meaning
explanation, etc.).
7
Lexicons and Lexical Analysis (6)
Lexicon: A Language Resource (6)
Sample (Morp., Syn. and Sem.)
“给” (give) :
Morp = [hq2, hq7, vjg, vjl, …];
Syn = [bso, bss, ksd, …];
Sem = [kyd, 240202].
e.g.: hq2 – may be followed by a numeral (verb as a quantifier);
bso – cannot act as an object on its own;
kyd – donate or bestow;
240202 – taxonomic code
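One way to hold such an entry in a program is a plain dictionary keyed by the three fields. The sketch below stores only the codes the sample lists (the elided "…" codes are left out) and only the glosses the slide explains; the names `entry_gei`, `code_gloss`, and `explain` are assumptions made for illustration.

```python
# Entry for “给” (give), copied from the sample. The "…" in the sample marks
# codes that are elided there, so only the listed codes are stored.
entry_gei = {
    "word": "给",
    "gloss": "give",
    "Morp": ["hq2", "hq7", "vjg", "vjl"],
    "Syn": ["bso", "bss", "ksd"],
    "Sem": ["kyd", "240202"],
}

# Glosses for the codes the slide explains; all other codes stay undefined.
code_gloss = {
    "hq2": "may be followed by a numeral",
    "bso": "cannot act as an object on its own",
    "kyd": "donate or bestow",
    "240202": "taxonomic code",
}

def explain(entry, field):
    """Pair each code in a field with its gloss, or None when unexplained."""
    return [(code, code_gloss.get(code)) for code in entry[field]]

print(explain(entry_gei, "Sem"))
# [('kyd', 'donate or bestow'), ('240202', 'taxonomic code')]
```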
8
Lexicons and Lexical Analysis (7)
Lexicon: A Language Resource (7)
Sample (Frame)
“给” (give) →
Syntactic Frame:
  S = NP + VP + NP1 + NP2
  NP = [AP] + [QP] + N
  VP = [ADP] + V
  NP1 = [QP] + N
  NP2 = [QP] + N
Semantic Frame:
  NP = AGT (Agent)
  NP1 = DAT (Dative)
  NP2 = OBJ (Patient)
Semantic Constraint:
  NP = human | country | society | saying
  NP1 = human | animal | collectivity | region
  NP2 = thing | a slap in the face | way out | elicitation
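The semantic frame and its constraints lend themselves to a simple lookup structure. The following is a minimal sketch, assuming each noun phrase has already been assigned one semantic category by some earlier analysis step; `FRAME_GEI` and `check_frame` are hypothetical names, and the category strings are taken directly from the slide.

```python
# Semantic roles and constraints for “给” (give), taken from the slide.
FRAME_GEI = {
    "NP": {"role": "AGT", "allowed": {"human", "country", "society", "saying"}},
    "NP1": {"role": "DAT", "allowed": {"human", "animal", "collectivity", "region"}},
    "NP2": {"role": "OBJ", "allowed": {"thing", "a slap in the face", "way out", "elicitation"}},
}

def check_frame(frame, fillers):
    """Return the slots whose filler category violates the semantic constraint.

    `fillers` maps a slot name to the semantic category of its head noun;
    how that category is obtained is outside this sketch.
    """
    return [slot for slot, spec in frame.items()
            if fillers.get(slot) not in spec["allowed"]]

ok = check_frame(FRAME_GEI, {"NP": "human", "NP1": "animal", "NP2": "thing"})
bad = check_frame(FRAME_GEI, {"NP": "thing", "NP1": "human", "NP2": "thing"})
print(ok)   # []
print(bad)  # ['NP']
```

An empty result means every slot satisfies its constraint; a missing slot is reported as a violation.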
9
Lexicons and Lexical Analysis (8)
Lexicon: A Language Resource (8)
Collocation Lexicon
Col(w) = <cat, mor, syn, msy, sen>
where: cat – multi-POS;
mor – morphology;
syn – syntax and semantics;
msy – nesting collocation;
sen – sentence modifying rule set.
10
Lexicons and Lexical Analysis (9)
Lexicon: A Language Resource (9)
Sample (Collocation Lexicon)
w: ‘大概’ (probably)
cat: ^ ‘大概’ + (‘的’; n)  @setmark(a);
cat: ^ ‘大概’ + (m; p; v; a; b; z)  @setmark(d);
cat: q + ^ ‘大概’  @setmark(n);
…
…
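The three sample rules can be read as context tests on the neighbours of the word. The sketch below rests on several assumptions that the slide does not state: that `^` marks the position of the word itself, that `+` means adjacency, that `;` separates alternatives, that `@setmark(x)` assigns the POS tag x, and that the rules are tried in the order listed. The function name and the token format are hypothetical, and the elided rules ("…") are not modelled.

```python
from typing import List, Optional, Tuple

WORD = "大概"
Token = Tuple[str, Optional[str]]  # (surface form, POS tag or None if undecided)

def mark_dagai(tokens: List[Token], i: int) -> Optional[str]:
    """Return the POS mark for tokens[i] (assumed to be WORD) under the sample rules."""
    prev_tok = tokens[i - 1] if i > 0 else None
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
    # Rule 1: followed by ‘的’ or a noun (n) -> a
    if next_tok and (next_tok[0] == "的" or next_tok[1] == "n"):
        return "a"
    # Rule 2: followed by m, p, v, a, b or z -> d
    if next_tok and next_tok[1] in {"m", "p", "v", "a", "b", "z"}:
        return "d"
    # Rule 3: preceded by q -> n
    if prev_tok and prev_tok[1] == "q":
        return "n"
    return None  # no sample rule applies

# Hypothetical tagged input: 大概 followed by a verb.
print(mark_dagai([("他", "r"), (WORD, None), ("知道", "v")], 1))  # d
```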
11
Lexicons and Lexical Analysis (10)
A Lexicon for English Words: WordNet (1)
What is WordNet ?

WordNet is an on-line lexical reference system whose design
is inspired by current psycholinguistic theories of human lexical
memory.

English nouns, verbs, adjectives and adverbs are organized
into synonym sets, each representing one underlying lexical
concept. Different relations link the synonym sets.
12
Lexicons and Lexical Analysis (11)
A Lexicon for English Words: WordNet (2)
Information within WordNet
WordNet divides the lexicon into five categories:
• Nouns
• Verbs
• Adjectives
• Adverbs
• Function words (particles)
WordNet organizes lexical information in terms of word
meanings, rather than word forms. Therefore, for organization,
semantic relations are used.
13
Lexicons and Lexical Analysis (12)
A Lexicon for English Words: WordNet (3)
Psycholinguistics

The 20th century has seen the emergence of psycholinguistics,
an interdisciplinary field of research concerned with
the cognitive bases of linguistic competence.

Both linguists and psycholinguists have explored in considerable
depth the factors determining the contemporary (synchronic, i.e.
belonging to the same time) structure of linguistic knowledge in general,
and lexical knowledge in particular.
14
Lexicons and Lexical Analysis (13)
A Lexicon for English Words: WordNet (4)
Psycholexicology
Miller and Johnson-Laird (1976) have proposed that research
concerned with the lexical component of language should be
called psycholexicology.
 As linguistic theories evolved in recent decades, linguists
became increasingly explicit about the information a lexicon
must contain in order for the phonological, syntactic, and lexical
components to work together in the everyday production and
comprehension of linguistic messages, and those proposals have
been incorporated into the work of psycholinguists.

15
Lexicons and Lexical Analysis (14)
A Lexicon for English Words: WordNet (5)
Lexicography

Beginning with word association studies at the turn of the
century and continuing down to the sophisticated experimental
tasks of the past twenty years, psycholinguists have discovered
many synchronic properties of the mental lexicon that can be
exploited in lexicography.
16
Lexicons and Lexical Analysis (15)
A Lexicon for English Words: WordNet (6)
Birth of WordNet

In 1985 a group of psychologists and linguists at Princeton
University undertook to develop a lexical database along lines
suggested by these investigations (Miller, 1985).

The initial idea was to provide an aid to use in searching
dictionaries conceptually, rather than merely alphabetically.

As the work proceeded, however, it demanded a more
ambitious formulation of its own principles and goals.
17
Lexicons and Lexical Analysis (16)
A Lexicon for English Words: WordNet (7)
Size of WordNet

http://wordnet.princeton.edu/
POS          Unique Strings    Synsets    Total Word-Sense Pairs
Noun                 117798      82115                    146312
Verb                  11529      13767                     25047
Adjective             21479      18156                     30002
Adverb                 4481       3621                      5580
Totals               155287     117659                    206941
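The figures above can be checked and summarized with a little arithmetic. The sketch below re-derives the totals row from the four per-POS rows and computes the average number of senses per unique string (word-sense pairs divided by unique strings); the variable names are illustrative.

```python
# (unique strings, synsets, word-sense pairs) per part of speech, from the table.
stats = {
    "Noun": (117798, 82115, 146312),
    "Verb": (11529, 13767, 25047),
    "Adjective": (21479, 18156, 30002),
    "Adverb": (4481, 3621, 5580),
}

# Column-wise sums reproduce the totals row.
totals = tuple(sum(column) for column in zip(*stats.values()))
print(totals)  # (155287, 117659, 206941)

# Average number of senses per unique string.
senses_per_string = {pos: pairs / strings
                     for pos, (strings, _synsets, pairs) in stats.items()}
for pos, ratio in senses_per_string.items():
    print(f"{pos}: {ratio:.2f}")
```

By this measure verbs are the most polysemous category, at roughly 2.17 senses per string, while nouns and adverbs stay near 1.24.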
18
Lexicons and Lexical Analysis (17)
A Lexicon for English Words: WordNet (8)
Some Problems

What kinds of utterances enter into these lexical associations?

What is the nature and organization of the lexicalized concepts
that words can express?

What syntactic roles do different words play?
19
Lexicons and Lexical Analysis (18)
A Lexicon for English Words: WordNet (9)
Lexical Matrix (1)

To reduce ambiguity, ‘‘word form’’ is used here to
refer to the physical utterance;

‘‘word meaning’’ refers to the lexicalized concept that a
form can be used to express;

The starting point for lexical semantics can then be said to be
the mapping between forms and meanings.
20
Lexicons and Lexical Analysis (19)
A Lexicon for English Words: WordNet (10)
Lexical Matrix (2)
Word        Word Forms
Meanings    F1      F2      F3      . . .   Fn
M1          E1,1    E1,2
M2                  E2,2
M3                          E3,3
. . .                               . . .
Mm                                          Em,n

If there are two entries in the same column, the word form is
polysemous; if there are two entries in the same row, the two word
forms are synonyms (relative to a context). Therefore, F1 and F2 are
synonyms; F2 is polysemous.
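The row and column reading of the matrix can be sketched directly in Python. This is an illustration only: the entries are the concrete cells shown in the matrix (E1,1, E1,2, E2,2, E3,3), the generic cell Em,n is left out because m and n are unspecified, and the variable names are assumptions.

```python
from collections import defaultdict

# Filled cells of the lexical matrix as (meaning, form) pairs.
entries = {("M1", "F1"), ("M1", "F2"), ("M2", "F2"), ("M3", "F3")}

forms_by_meaning = defaultdict(set)   # one row per meaning
meanings_by_form = defaultdict(set)   # one column per form
for meaning, form in entries:
    forms_by_meaning[meaning].add(form)
    meanings_by_form[form].add(meaning)

# Two or more entries in a row: the forms are synonyms.
synonym_sets = sorted(sorted(forms) for forms in forms_by_meaning.values()
                      if len(forms) > 1)
# Two or more entries in a column: the form is polysemous.
polysemous = sorted(form for form, meanings in meanings_by_form.items()
                    if len(meanings) > 1)

print(synonym_sets)  # [['F1', 'F2']]
print(polysemous)    # ['F2']
```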
21
Lexicons and Lexical Analysis (20)
A Lexicon for English Words: WordNet (11)
Polysemy and Synonymy

Mappings between forms and meanings are many-to-many: some
forms have several different meanings, and some meanings can be
expressed by several different forms.

That is to say, a listener or reader who recognizes a form must
cope with its polysemy; a speaker or writer who hopes to express a
meaning must decide between synonyms.
22
Lexicons and Lexical Analysis (21)
A Lexicon for English Words: WordNet (12)
Some of the Relations

Synonymy

Antonymy

Hyponymy / Hypernymy (Subordination / Superordination)

Meronymy / Holonymy (Part-Whole)
23
Lexicons and Lexical Analysis (22)
A Lexicon for English Words: WordNet (13)
Synonym (1)
There are several definitions of synonymy:
Two expressions are synonymous if the substitution of one for
the other never changes the truth value of a sentence in which the
substitution is made.

Two expressions are synonymous in a linguistic context C if the
substitution of one for the other in C does not alter the truth value.


…
24
Lexicons and Lexical Analysis (23)
A Lexicon for English Words: WordNet (14)
Synonym (2)
Note that the definition of synonymy in terms of substitutability
makes it necessary to partition WordNet into nouns, verbs,
adjectives, and adverbs.

That is to say, if concepts are represented by synsets, and if
synonyms must be interchangeable, then words in different
syntactic categories cannot be synonyms (cannot form synsets)
because they are not interchangeable.

25
Lexicons and Lexical Analysis (24)
A Lexicon for English Words: WordNet (15)
Antonym (1)

The antonym of a word x is sometimes not-x, but not always.
For example, rich and poor are antonyms, but to say that someone
is not rich does not imply that they must be poor; many people
consider themselves neither rich nor poor.

Antonymy is a lexical relation between word forms, not a
semantic relation between word meanings.
26
Lexicons and Lexical Analysis (25)
A Lexicon for English Words: WordNet (16)
Antonym (2)
For example, the meanings {rise, ascend} and {fall, descend} may be
conceptual opposites, but they are not antonyms; [rise / fall] are
antonyms and so are [ascend / descend], but most people hesitate and look
thoughtful when asked if rise and descend, or ascend and fall, are antonyms.
Note that synonym sets are enclosed in curly brackets, ‘{’ and ‘}’,
and other lexical relations are enclosed in square brackets, ‘[’ and ‘]’.
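The distinction between a relation on word meanings and a relation on word forms can be made concrete with a small sketch. Here synonym sets are modelled as frozensets of forms and antonymy is stored between individual forms only; the data are the slide's examples, while the names `antonym_pairs` and `are_antonyms` are hypothetical.

```python
# Synonym sets (word meanings), written {rise, ascend} and {fall, descend}.
rise_ascend = frozenset({"rise", "ascend"})
fall_descend = frozenset({"fall", "descend"})

# Antonymy holds between individual word forms, not between whole synsets:
# [rise / fall] and [ascend / descend].
antonym_pairs = {frozenset({"rise", "fall"}), frozenset({"ascend", "descend"})}

def are_antonyms(a: str, b: str) -> bool:
    """True only if the two forms are listed as an antonym pair."""
    return frozenset({a, b}) in antonym_pairs

print(are_antonyms("rise", "fall"))     # True
print(are_antonyms("rise", "descend"))  # False
```

Storing the pairs at the level of forms is what keeps rise / descend from being inferred as antonyms, even though their synsets are conceptual opposites.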
27
Lexicons and Lexical Analysis (26)
A Lexicon for English Words: WordNet (17)
Hyponymy / Hypernymy
It is a semantic relation between word meanings. It is also called
subordination / superordination, subset / superset, or the ISA
relation.

Hyponymy is transitive and asymmetrical. x is said to be a
hyponym of y if native speakers of English accept the sentence
constructed as “An x is a (kind of) y.”

Ex.: tree is a hyponym of plant
plant is a hypernym of tree
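Transitivity and asymmetry can be illustrated by following hypernym links upward. The sketch below assumes each word has at most one direct hypernym, which is a simplification; only the tree / plant link comes from the slide, and the other links and the function names are illustrative.

```python
# Direct hypernym links: key "is a kind of" value.
# "tree" -> "plant" is the slide's example; the other links are illustrative.
hypernym_of = {"oak": "tree", "tree": "plant", "plant": "organism"}

def hypernym_chain(word):
    """Follow direct links upward and collect every hypernym on the way."""
    chain = []
    while word in hypernym_of:
        word = hypernym_of[word]
        chain.append(word)
    return chain

def is_hyponym(x, y):
    """True if x is a direct or inherited hyponym of y."""
    return y in hypernym_chain(x)

print(hypernym_chain("oak"))       # ['tree', 'plant', 'organism']
print(is_hyponym("oak", "plant"))  # True  (transitive)
print(is_hyponym("plant", "oak"))  # False (asymmetrical)
```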
28
Lexicons and Lexical Analysis (27)
A Lexicon for English Words: WordNet (18)
Meronymy / Holonymy
It is a semantic relation, also called the part-whole
or HASA relation.

x is said to be a meronym of y if native speakers of English
accept the sentence constructed as “An x is a part of y”.

Ex.: a frame is a part of a car, or
a car has a frame.
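Meronymy and holonymy are two directions of the same part-whole link, which a pair of lookup tables makes explicit. In this sketch only the frame / car pair comes from the slide; the other pairs and the variable names are illustrative assumptions.

```python
from collections import defaultdict

# (part, whole) pairs; ("frame", "car") is the slide's example,
# the remaining pairs are illustrative.
part_of = [("frame", "car"), ("wheel", "car"), ("spoke", "wheel")]

holonyms = defaultdict(set)  # part  -> wholes it belongs to
meronyms = defaultdict(set)  # whole -> its parts
for part, whole in part_of:
    holonyms[part].add(whole)
    meronyms[whole].add(part)

print(sorted(meronyms["car"]))    # ['frame', 'wheel']
print(sorted(holonyms["frame"]))  # ['car']
```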
29
Lexicons and Lexical Analysis (28)
A Lexicon for English Words: WordNet (19)
User Interface
30
Lexicons and Lexical Analysis (29)
A Lexicon for English Words: WordNet (20)
References
G. Miller, R. Beckwith, C. Fellbaum, D. Gross, and K. J. Miller.
1990. Introduction to WordNet: An On-line Lexical Database.
International Journal of Lexicography, Vol. 3, No. 4, pages 235-244.
G. Miller. 1990. Nouns in WordNet: A Lexical Inheritance
System. International Journal of Lexicography, Vol. 3, No. 4, pages 245-264.
C. Fellbaum. 1990. English Verbs as a Semantic Net. International
Journal of Lexicography, Vol. 3, No. 4, pages 278-301.
31
Lexicons and Lexical Analysis (30)
Assignments (2)
1.
The text described several different example tests for distinguishing word
classes. For example, nouns can occur in sentences of the form I saw the
X, whereas adjectives can occur in sentences of the form It’s so X. Give
some additional tests to distinguish these forms and to distinguish
between count nouns and mass nouns. State whether each of the
following words can be used as an adjective, count noun, or mass noun. If
the word is ambiguous, give all its possible uses.
milk, house, liquid, green, group, concept, airborne
32