Statistical Natural Language Processing: linguistic essentials

Lecture 2: Linguistic Essentials
Wen-Hsiang Lu (盧文祥)
Department of Computer Science and Information Engineering,
National Cheng Kung University
2014/02/24
Parts of Speech and Morphology



Parts of speech correspond to syntactic or
grammatical categories such as noun, verb,
adjective, and preposition.
Word categories are systematically related by
morphological processes, such as the
formation of the plural form from the singular form.
The major types of morphological processes
are inflection, derivation, and compounding.
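
As a rough illustration of inflection, the sketch below forms English plurals from singular nouns. The suffix rules and the small table of irregular forms are simplifying assumptions made for this example, not a complete account of English morphology.

```python
# Hypothetical table of irregular plurals; a real system would need far more entries.
IRREGULAR_PLURALS = {"child": "children", "mouse": "mice"}


def pluralize(noun: str) -> str:
    """Return a plural form of an English noun using a few simple rules."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    # Sibilant endings take "-es": box -> boxes, church -> churches.
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    # Consonant + "y" becomes "-ies": city -> cities (but day -> days).
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in "aeiou":
        return noun[:-1] + "ies"
    # Default regular inflection.
    return noun + "s"


for singular in ["student", "mountain", "child", "city"]:
    print(singular, "->", pluralize(singular))
```

The irregular table is consulted first because irregular forms cannot be derived by rule.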
Words’ Syntactic Functions






Typically, nouns refer to entities in the world like people,
animals and things.
Determiners describe the particular reference of a noun
and adjectives describe the properties of nouns.
Verbs are used to describe actions, activities and states.
Adverbs modify a verb in the same way as adjectives
modify nouns.
Prepositions are typically small words that express
spatial or time relationships. Prepositions can also be
used as particles (質詞) to create phrasal verbs.
Conjunctions and complementizers (subordinating
conjunctions) link two words, phrases or clauses.
CKIP POS Tag Set
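Reduced category   Simplified tag   Corresponding CKIP POS tags               Description
N                  Na               Naa, Nab, Nac, Nad, Naea, Naeb            common noun (普通名詞)
N                  Nb               Nba, Nbc                                  proper noun (專有名稱)
N                  Nc               Nca, Ncb, Ncc, Nce                        place noun (地方詞)
N                  Ncd              Ncda, Ncdb                                localizer (位置詞)
N                  Nd               Ndaa, Ndab, Ndc, Ndd                      time noun (時間詞)
DET                Neu              Neu                                       numeral determiner (數詞定詞)
DET                Nes              Nes                                       specific determiner (特指定詞)
DET                Nep              Nep                                       anaphoric determiner (指代定詞)
DET                Neqa             Neqa                                      quantity determiner (數量定詞)
POST               Neqb             Neqb                                      postposed quantity determiner (後置數量定詞)
M                  Nf               Nfa, Nfb, Nfc, Nfd, Nfe, Nfg, Nfh, Nfi    measure word (量詞)
POST               Ng               Ng                                        postposition (後置詞)
N                  Nh               Nhaa, Nhab, Nhac, Nhb, Nhc                pronoun (代名詞)
Nv                 Nv               Nv1, Nv2, Nv3, Nv4                        nominalized verb (名物化動詞)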
CKIP POS Tag Set
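Reduced category   Simplified tag   Corresponding CKIP POS tags               Description
Vi                 VA               VA11, 12, 13, VA3, VA4                    active intransitive verb (動作不及物動詞)
Vt                 VAC              VA2                                       active causative verb (動作使動動詞)
Vi                 VB               VB11, 12, VB2                             active pseudo-transitive verb (動作類及物動詞)
Vt                 VC               VC2, VC31, 32, 33                         active transitive verb (動作及物動詞)
Vt                 VCL              VC1                                       active verb taking a locative object (動作接地方賓語動詞)
Vt                 VD               VD1, VD2                                  ditransitive verb (雙賓動詞)
Vt                 VE               VE11, VE12, VE2                           active verb taking a sentential object (動作句賓動詞)
Vt                 VF               VF1, VF2                                  active verb taking a verbal object (動作謂賓動詞)
Vt                 VG               VG1, VG2                                  classificatory verb (分類動詞)
Vi                 VH               VH11, 12, 13, 14, 15, 17, VH21            stative intransitive verb (狀態不及物動詞)
Vt                 VHC              VH16, VH22                                stative causative verb (狀態使動動詞)
Vi                 VI               VI1, 2, 3                                 stative pseudo-transitive verb (狀態類及物動詞)
Vt                 VJ               VJ1, 2, 3                                 stative transitive verb (狀態及物動詞)
Vt                 VK               VK1, 2                                    stative verb taking a sentential object (狀態句賓動詞)
Vt                 VL               VL1, 2, 3, 4                              stative verb taking a verbal object (狀態謂賓動詞)
Vt                 V_2              V_2                                       the verb 有 ("to have")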
Syntax or Phrase Structure: A simple context-free grammar

The Grammar
S --> NP VP
NP --> AT NNS | AT NN | NP PP
VP --> VP PP | VBD | VBD NP
PP --> IN NP

The Lexicon
AT --> the
NNS --> children | students | mountains
VBD --> slept | ate | saw
IN --> in | of
NN --> cake
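
To make the grammar concrete, the following sketch checks whether a word sequence can be derived from S. It uses a CKY-style chart, which is one possible recognition strategy chosen for this example rather than anything prescribed by the slides; the function name `recognize` is hypothetical.

```python
from collections import defaultdict

# Binary rules (parent, left child, right child) and the single unary rule VP --> VBD.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "AT", "NNS"), ("NP", "AT", "NN"), ("NP", "NP", "PP"),
    ("VP", "VP", "PP"), ("VP", "VBD", "NP"),
    ("PP", "IN", "NP"),
]
UNARY_RULES = [("VP", "VBD")]
LEXICON = {
    "the": {"AT"},
    "children": {"NNS"}, "students": {"NNS"}, "mountains": {"NNS"},
    "slept": {"VBD"}, "ate": {"VBD"}, "saw": {"VBD"},
    "in": {"IN"}, "of": {"IN"},
    "cake": {"NN"},
}


def _apply_unary(cell):
    """Add parents of unary rules until nothing new can be added."""
    changed = True
    while changed:
        changed = False
        for parent, child in UNARY_RULES:
            if child in cell and parent not in cell:
                cell.add(parent)
                changed = True


def recognize(sentence: str) -> bool:
    """Return True if the grammar derives the sentence from S."""
    words = sentence.lower().split()
    n = len(words)
    if n == 0:
        return False
    chart = defaultdict(set)  # chart[i, k] = categories spanning words[i:k]
    for i, word in enumerate(words):
        chart[i, i + 1] = set(LEXICON.get(word, ()))
        _apply_unary(chart[i, i + 1])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            k = i + span
            for j in range(i + 1, k):
                for parent, left, right in BINARY_RULES:
                    if left in chart[i, j] and right in chart[j, k]:
                        chart[i, k].add(parent)
            _apply_unary(chart[i, k])
    return "S" in chart[0, n]


print(recognize("The children ate the cake"))
print(recognize("children the ate"))
```

A chart is used because the rules NP --> NP PP and VP --> VP PP are left-recursive, which would send a naive top-down recursive parser into an infinite loop.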
Syntax or Phrase Structure: A Parse Tree

S
  NP
    AT   The
    NNS  children
  VP
    VBD  ate
    NP
      AT  the
      NN  cake
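
The parse tree for "The children ate the cake" can be held in a simple nested structure. The sketch below uses nested tuples of the form (label, child, ...), which is an assumed representation chosen for brevity; `leaves` and `bracketed` are hypothetical helper names.

```python
# Each node is (label, child1, child2, ...); a child is either a node or a word.
TREE = (
    "S",
    ("NP", ("AT", "The"), ("NNS", "children")),
    ("VP", ("VBD", "ate"), ("NP", ("AT", "the"), ("NN", "cake"))),
)


def leaves(tree):
    """Return the words at the leaves, left to right."""
    _label, *children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words


def bracketed(tree):
    """Render the tree in labelled-bracket notation."""
    label, *children = tree
    parts = [c if isinstance(c, str) else bracketed(c) for c in children]
    return "(" + " ".join([label] + parts) + ")"


print(leaves(TREE))
print(bracketed(TREE))
```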
Local and Non-Local Dependencies



A local dependency is a dependency between two
words expressed within the same syntactic rule.
A non-local dependency is an instance in which two
words can be syntactically dependent even though
they occur far apart in a sentence (e.g., subject-verb
agreement; long-distance dependencies such as
wh-extraction).
Non-local phenomena are a challenge for certain
statistical NLP approaches (e.g., n-grams) that model
only local dependencies.
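
The limitation of n-grams can be seen by checking whether two dependent words ever fall inside the same window. In the sketch below, the example sentence and the helper names `ngrams` and `seen_together` are made up for illustration; the subject "children" and the verb "sleep" agree in number but are separated by a prepositional phrase.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def seen_together(tokens, n, first, second):
    """Return True if some n-gram contains both words."""
    return any(first in gram and second in gram for gram in ngrams(tokens, n))


tokens = "the children in the mountains sleep".split()
for n in (2, 3, 5):
    print(n, seen_together(tokens, n, "children", "sleep"))
```

Only the 5-gram window is wide enough to cover both words here, and any longer intervening phrase would push the required window size up again.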
Semantic Roles



Most commonly, noun phrases are
arguments of verbs. These arguments have
semantic roles: the agent of an action, the
patient and other roles such as the
instrument or the goal.
In English, these semantic roles often correspond
to the grammatical notions of subject and object,
but the correspondence is complicated by
direct and indirect objects and by active and
passive voice.
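
The effect of voice on the mapping from grammatical functions to semantic roles can be sketched as follows. This is a deliberately simplified model covering only agent and patient in simple transitive clauses; the `Clause` class and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clause:
    subject: str
    verb: str
    voice: str                        # "active" or "passive"
    obj: Optional[str] = None         # direct object (active clauses)
    by_phrase: Optional[str] = None   # noun phrase after "by" (passive clauses)


def semantic_roles(clause: Clause) -> dict:
    """Map grammatical functions to agent and patient, depending on voice."""
    if clause.voice == "active":
        return {"agent": clause.subject, "patient": clause.obj}
    if clause.voice == "passive":
        return {"agent": clause.by_phrase, "patient": clause.subject}
    raise ValueError(f"unknown voice: {clause.voice}")


active = Clause(subject="the children", verb="ate", voice="active", obj="the cake")
passive = Clause(subject="the cake", verb="was eaten", voice="passive",
                 by_phrase="the children")
print(semantic_roles(active))
print(semantic_roles(passive))
```

Both clauses yield the same roles even though the subject differs, which is why subject and object alone do not determine agent and patient.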
Subcategorization



Different verbs can relate different numbers of
entities: transitive versus intransitive verbs.
Tightly related verb arguments are called
complements but less tightly related ones are
called adjuncts (修飾語). Prototypical examples
of adjuncts tell us time, place, or manner of the
action or state described by the verb.
Verbs are classified according to the type of
complements they permit. This is called
subcategorization. Subcategorization frames make it possible to
capture syntactic as well as semantic regularities.
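
A subcategorization lexicon can be sketched as a mapping from each verb to the complement sequences it permits. The frames below are illustrative assumptions for three verbs, not a linguistic resource; `SUBCAT_FRAMES` and `permits` are hypothetical names.

```python
# Each verb maps to the set of complement sequences it allows.
# () means no complement (intransitive use); ("NP",) means one noun-phrase object.
SUBCAT_FRAMES = {
    "slept": {()},
    "ate": {(), ("NP",)},
    "saw": {("NP",)},
}


def permits(verb, complements):
    """Return True if the verb allows exactly this sequence of complements."""
    return tuple(complements) in SUBCAT_FRAMES.get(verb, set())


print(permits("slept", []))      # "the children slept"
print(permits("slept", ["NP"]))  # "*the children slept the cake"
print(permits("ate", ["NP"]))    # "the children ate the cake"
```

A grammar with a single rule VP --> VBD NP for all verbs would accept "the children slept the cake"; checking frames like these is one way to rule such sentences out.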
Attachment Ambiguity and Garden-Path Sentences

Attachment ambiguities occur with
phrases that could have been generated by
two different nodes in the parse tree.


E.g.: The children ate the cake with a spoon.
Garden-Path sentences are sentences that
lead you along a path that suddenly turns
out not to work.

E.g.: The horse raced past the barn (穀倉) fell.
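
The attachment ambiguity in "The children ate the cake with a spoon" can be made visible by counting parses. The sketch below reuses the rules of the simple context-free grammar from earlier in the lecture and, as an assumption for this example, adds "with", "a" and "spoon" to the lexicon. Counting over a CKY-style chart is one possible technique; `count_parses` is a hypothetical name.

```python
from collections import defaultdict

BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "AT", "NNS"), ("NP", "AT", "NN"), ("NP", "NP", "PP"),
    ("VP", "VP", "PP"), ("VP", "VBD", "NP"),
    ("PP", "IN", "NP"),
]
UNARY_RULES = [("VP", "VBD")]
LEXICON = {
    "the": ["AT"], "a": ["AT"],               # "a" added for this example
    "children": ["NNS"], "students": ["NNS"], "mountains": ["NNS"],
    "slept": ["VBD"], "ate": ["VBD"], "saw": ["VBD"],
    "in": ["IN"], "of": ["IN"], "with": ["IN"],  # "with" added for this example
    "cake": ["NN"], "spoon": ["NN"],             # "spoon" added for this example
}


def count_parses(sentence: str, root: str = "S") -> int:
    """Count the distinct parse trees that the grammar assigns to the sentence."""
    words = sentence.lower().split()
    n = len(words)
    # chart[i, k][X] = number of ways category X can span words[i:k]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        cell = chart[i, i + 1]
        for tag in LEXICON.get(word, ()):
            cell[tag] += 1
        # VBD is a preterminal, so the unary rule only matters for single words.
        for parent, child in UNARY_RULES:
            cell[parent] += cell[child]
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            k = i + span
            for j in range(i + 1, k):
                for parent, left, right in BINARY_RULES:
                    chart[i, k][parent] += chart[i, j][left] * chart[j, k][right]
    return chart[0, n][root]


print(count_parses("The children ate the cake with a spoon"))
print(count_parses("The children ate the cake"))
```

The two analyses of the first sentence differ only in where the prepositional phrase attaches: to the verb phrase via VP --> VP PP (eating with a spoon) or to the noun phrase via NP --> NP PP (a cake that has a spoon).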
Semantics (I)

Semantics is the study of the meaning of
words, constructions, and utterances.

Semantics can be divided into two parts:
lexical semantics and compositional semantics.
Semantics (II)

Lexical semantics:

synonymy, antonymy
hypernymy (上位詞), hyponymy (下位詞)
E.g.: Animal is a hypernym of cat.
meronymy (局部詞), holonymy (總體詞)
E.g.: Leaf is a meronym of tree.
polysemy (一詞多義), homonymy (同形異義;
同音異義), and homophony (同音異義).
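
Hypernymy and meronymy can be sketched as small relation tables. The entries for "cat" and "leaf" come from the examples on this slide; the entry for "tabby" is a hypothetical addition to show that hypernymy is transitive. The table and function names are also hypothetical.

```python
# word -> its immediate hypernym (more general term)
HYPERNYM_OF = {"cat": "animal", "tabby": "cat"}  # "tabby" is an assumed entry
# part -> the whole it belongs to (its holonym)
HOLONYM_OF = {"leaf": "tree"}


def is_hypernym(general: str, specific: str) -> bool:
    """Return True if `general` is reachable from `specific` via hypernym links."""
    seen = set()
    current = HYPERNYM_OF.get(specific)
    while current is not None and current not in seen:
        if current == general:
            return True
        seen.add(current)  # guards against accidental cycles in the table
        current = HYPERNYM_OF.get(current)
    return False


def is_meronym(part: str, whole: str) -> bool:
    """Return True if `part` is recorded as a part of `whole`."""
    return HOLONYM_OF.get(part) == whole


print(is_hypernym("animal", "cat"))
print(is_hypernym("animal", "tabby"))
print(is_meronym("leaf", "tree"))
```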
Semantics (III)

Compositionality: the meaning of the whole can be
predicted from the meanings of the parts. Natural
language often violates this principle.

E.g.: White paper, white hair, white skin, white wine,
white house

Idioms correspond to cases where the
compound phrase means something
completely different from its parts.

E.g.: Kick the bucket

Collocations consist of two or more words
that correspond to some conventional way of
saying things.

E.g.: Strong tea, make up
Pragmatics



Pragmatics (語用學) is the area of study
that goes beyond the meaning of
a sentence and tries to explain what the
speaker is really expressing.
Its topics include the scope of quantifiers, speech
acts, discourse analysis, and anaphoric relations
(指代關係).
The resolution of anaphoric relations is
crucial to the task of information extraction.