Transcript Ehsan

Adaptor Grammars
Ehsan Khoddammohammadi
Recent Advances in Parsing Technology
WS 2012/13
Saarland University
1
Outline
• Definition and motivation behind
unsupervised grammar learning
• Non-parametric Bayesian statistics
• Adaptor grammars vs. PCFG
• A short introduction to Chinese Restaurant
Process
• Applications of adaptor grammars
2
Unsupervised Learning
• How many categories of objects?
• How many features does an object have?
• How many words and rules are in a language?
3
Grammar Induction
Goal:
– study how a grammar and parses can be learnt
from terminal strings alone
Motivation:
– Understanding human language acquisition
– Inducing parsers for low-resource languages
4
Nonparametric Bayesian statistics
• Learning the things people learn requires using
rich, unbounded hypothesis spaces
• Language learning is non-parametric inference: there is no (obvious)
bound on the number of words, grammatical rules, or morphemes
• Use stochastic processes to define priors on
infinite hypothesis spaces
5
Nonparametric Bayesian statistics
P(Grammar | Data) = (1/Z) · P(Data | Grammar) · P(Grammar)
• Likelihood: how well the grammar describes the data
• Prior: encodes our knowledge of, or expectations about, grammars
before seeing the data
– Universal Grammar (very specific)
– shorter grammars (general constraints)
• Posterior: the learner's uncertainty about which grammar is correct
(a distribution over grammars)
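As a toy illustration of the formula above (my own sketch in Python; the two grammars and all numbers are invented, not from the talk):

priors = {"short_grammar": 0.7, "long_grammar": 0.3}         # prior over grammars
likelihoods = {"short_grammar": 0.02, "long_grammar": 0.09}  # P(Data | Grammar)

unnormalized = {g: likelihoods[g] * priors[g] for g in priors}
Z = sum(unnormalized.values())                               # normalizing constant
posterior = {g: p / Z for g, p in unnormalized.items()}      # P(Grammar | Data)

print(posterior)   # the longer grammar wins here: its higher likelihood
                   # outweighs the prior preference for the shorter grammar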
6
Is PCFG good enough for our purpose?
• PCFGs can be learnt in a Bayesian framework, but …
• the set of rules is fixed in standard PCFG estimation
• PCFG rules are "too small" to be effective units of generalization
How can we solve this problem?
7
Two Non-parametric Bayesian
extensions to PCFGs
1. Let the set of non-terminals grow unboundedly:
– start with a short, un-lexicalized grammar
– split and join non-terminals
2. Let the set of rules grow unboundedly:
– generate new rules whenever they are needed
– learn sub-trees and their probabilities (bigger units of generalization)
8
Adaptor Grammars
• CFG rules are used to generate trees, as in an ordinary CFG
• There are two types of non-terminals:
– Un-adapted (normal) non-terminals
• pick a rule and recursively expand its children, as in a PCFG
– Adapted non-terminals
• either pick a rule and recursively expand its children,
• or regenerate a previously generated sub-tree (with probability
proportional to the number of times it has already been generated)
9
The Story of Adaptor Grammars
• In a PCFG, rules are applied independently of each other.
• The sequence of trees generated by an adaptor grammar is not
independent: if an adapted sub-tree has been used frequently in the
past, it is more likely to be used again.
• An un-adapted non-terminal A expands using rule A → β with
probability proportional to θ(A → β)
• An adapted non-terminal A expands:
– to a sub-tree τ rooted in A, with probability proportional to the
number of times τ was previously generated
– using rule A → β, with probability proportional to α_A · θ(A → β)
– α_A is the prior (concentration) parameter
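Below is a minimal sketch of this generative process in Python (my own illustration, not code from the talk; the toy grammar, the rule probabilities theta and the concentration parameter alpha are invented):

import random

# Toy adaptor-grammar sampler. Word is adapted, Syll is a normal PCFG non-terminal.
theta = {
    "Word": [(("Syll",), 0.5), (("Syll", "Syll"), 0.5)],
    "Syll": [(("b", "a"), 0.4), (("d", "o"), 0.3), (("g", "u"), 0.3)],
}
adapted = {"Word"}
alpha = {"Word": 1.0}     # concentration parameter of the Word cache
cache = {"Word": []}      # one entry per Word token generated so far

def expand(symbol):
    """Expand a non-terminal with a base rule, as in an ordinary PCFG."""
    rules = theta[symbol]
    rhs = random.choices([r for r, _ in rules], weights=[p for _, p in rules])[0]
    return (symbol,) + tuple(generate(child) for child in rhs)

def generate(symbol):
    if symbol not in theta:                # terminal symbol
        return symbol
    if symbol in adapted:
        n = len(cache[symbol])             # number of tokens generated so far
        # with prob n/(n+alpha): reuse a previous token, i.e. pick a sub-tree
        # with probability proportional to how often it was generated before
        if n > 0 and random.random() < n / (n + alpha[symbol]):
            tree = random.choice(cache[symbol])
        else:                              # with prob alpha/(n+alpha): fresh expansion
            tree = expand(symbol)
        cache[symbol].append(tree)         # "rich get richer"
        return tree
    return expand(symbol)

print([generate("Word") for _ in range(10)])   # repeated Word trees become ever more likely

Each entry in the cache plays the role of a customer in the Chinese Restaurant Process introduced on the next slides.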
10
Properties of Adaptor grammars
• In adaptor grammars:
– the probabilities of adapted sub-trees are learnt separately; they are
not just the product of the probabilities of the rules inside them
– "rich get richer" (Zipfian distribution)
– useful compound structures become more probable than their parts
– there is no recursion amongst adapted non-terminals (an adapted
non-terminal never expands to itself)
11
The Chinese Restaurant Process
12
The Chinese Restaurant Process
• n customers walk into a restaurant; customer i chooses a table z_i
with probability

P(z_i = j | z_1, ..., z_{i-1}) = n_j / (i - 1 + α)   if j is an existing table with n_j customers
P(z_i = j | z_1, ..., z_{i-1}) = α / (i - 1 + α)     if j is the next unoccupied table

• This defines an exchangeable distribution over seating arrangements
(including the counts on tables)
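A small simulation of this definition in Python (my own sketch; the number of customers and the value of alpha are arbitrary):

import random

def crp(n_customers, alpha):
    """Sample one CRP seating arrangement; returns the list of table counts."""
    counts = []                                # counts[j] = customers at table j
    for i in range(1, n_customers + 1):
        r = random.uniform(0, i - 1 + alpha)   # total mass: i-1 seated customers + alpha
        for j, n_j in enumerate(counts):
            if r < n_j:                        # join table j with prob n_j / (i-1+alpha)
                counts[j] += 1
                break
            r -= n_j
        else:                                  # open a new table with prob alpha / (i-1+alpha)
            counts.append(1)
    return counts

print(crp(20, alpha=1.0))   # table counts; usually only a handful of tables,
                            # with most customers at the largest ones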
13
CRP
z = ( )   (the restaurant is still empty)
P(z) = 1
14
CRP
z = (1)
P(z) = α/α
15
CRP
z = (1, 1)
P(z) = α/α × 1/(1+α)
16
CRP
z = (1, 1, 2)
P(z) = α/α × 1/(1+α) × α/(2+α)
17
CRP
z = (1, 1, 2, 1)
P(z) = α/α × 1/(1+α) × α/(2+α) × 2/(3+α)
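The running example can be checked numerically (my own sketch; the choice α = 1 is arbitrary):

from fractions import Fraction

def crp_prob(z, alpha):
    """Probability of a CRP seating sequence z (table labels in arrival order)."""
    p, counts = Fraction(1), {}
    for i, table in enumerate(z, start=1):                # customer i arrives
        if table in counts:
            p *= Fraction(counts[table], i - 1 + alpha)   # join an existing table
        else:
            p *= Fraction(alpha, i - 1 + alpha)           # open a new table
        counts[table] = counts.get(table, 0) + 1
    return p

print(crp_prob((1, 1, 2, 1), alpha=1))   # with alpha = 1: 1 * 1/2 * 1/3 * 2/4 = 1/12

Because the seating distribution is exchangeable, any ordering with the same table counts, e.g. (1, 2, 1, 1), gives the same probability.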
18
Application of Adaptor grammars
Adaptor grammars are not used for parsing itself, because full grammar
induction is hard. They have been applied to:
1. Word segmentation
2. Learning concatenative morphology
3. Learning the structure of named-entity NPs
4. Topic modeling
19
Unsupervised Word Segmentation
• Input: phoneme sequences with sentence
boundaries
• Task: identify words
20
Word segmentation with PCFG
Sentence → Word+
Word → Phoneme+
Phoneme → a | ... | z
21
Unigram word segmentation
Sentence → Word+
Word (adapted) → Phoneme+
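To see why adapting Word helps, here is a toy calculation (my own sketch; it uses ordinary letters instead of the Brent-corpus phoneme encoding, and the base model and alpha below are invented). Each Word token is drawn CRP-style: either reuse an already-generated word, or fall back to a letter-by-letter base distribution.

ALPHA = 1.0       # concentration of the Word cache (invented value)
P_CONT = 0.5      # base model: probability of adding another letter before stopping (invented)

def base_prob(word):
    """Base probability of spelling a word letter by letter (26 letters, geometric length)."""
    return (P_CONT / 26) ** len(word) * (1 - P_CONT)

def seq_prob(words):
    """Probability of a word sequence when Word is adapted (CRP over word tokens)."""
    p, counts = 1.0, {}
    for i, w in enumerate(words):        # i words have been generated so far
        p *= (counts.get(w, 0) + ALPHA * base_prob(w)) / (i + ALPHA)
        counts[w] = counts.get(w, 0) + 1
    return p

# Two segmentations of the same 12-letter string "thedogthedog":
print(seq_prob(["the", "dog", "the", "dog"]))   # reuses cached words   -> about 5e-13
print(seq_prob(["th", "edo", "gth", "edog"]))   # no reuse is possible  -> about 7e-24

The segmentation that can reuse cached words is many orders of magnitude more probable; this is the pressure that drives the adapted grammar towards word-like units.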
22
Collocation word segmentation
Sentence → Colloc+
Colloc (adapted) → Word+
Word (adapted) → Phoneme+
23
Performance
• Evaluated on Brent corpus
Generalization          Accuracy
Unigram                 56%
+ collocations          76%
+ syllable structure    87%
24
Morphology
• Input: raw text
• Task: identify stems and morphemes, i.e. decompose a word into its
morphological components
• Adaptor grammars can only be applied to simple concatenative
morphology.
25
CFG for morphological analysis
Word → Stem Suffix
Stem → Chars
Suffix → Chars
Chars → Char
Chars → Char Chars
Char → a | b | c | ...
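Note that a plain PCFG with these rules cannot prefer any particular split point: for a word of length n, every Stem/Suffix split uses Chars → Char Chars exactly n − 2 times and Chars → Char exactly twice, so "cat + s" and "ca + ts" receive exactly the same probability. This is why the non-terminals need to be adapted, as on the next slide.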
26
Adaptor grammar for morphological
analysis
Word (adapted) → Stem Suffix
Stem (adapted) → Phoneme+
Suffix (adapted) → Phoneme+
Generated Words:
1. cats
2. dogs
3. cats
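In CRP terms (a worked example using the notation from the "Story of Adaptor Grammars" slide, with α_Word the concentration parameter of the Word cache): after "cats" and "dogs" have been generated, two Word tokens are cached, so the third Word token reuses the cached tree for "cats" with probability 1/(2 + α_Word), reuses "dogs" with probability 1/(2 + α_Word), and is a fresh Stem + Suffix expansion with probability α_Word/(2 + α_Word). Reuse of frequent words therefore becomes progressively cheaper.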
27
Performance
• For a more sophisticated model:
• 116,129 tokens: 70% correctly segmented
• 7,170 verb types: 66% correctly segmented
28
Inference
• The distribution over adapted trees is exchangeable, so Gibbs
sampling can be used.
• A variational inference method has also been developed for learning
adaptor grammars.
Covering this part is beyond the objectives of this talk.
29
Conclusion
• We are interested in inducing grammars without
supervision for two reasons:
– Language acquisition
– Low-resource languages
• PCFG rules are too small to be effective units of generalization
• Learning the things people learn requires rich, unbounded hypothesis
spaces
• Adaptor grammars use the CRP to learn larger units from these
unbounded hypothesis spaces
30
References
• M. Johnson et al., "Adaptor Grammars: A Framework for Specifying
Compositional Nonparametric Bayesian Models," Advances in Neural
Information Processing Systems, 2007
• Mark Johnson, "Using Adaptor Grammars to Identify Synergies in the
Unsupervised Acquisition of Linguistic Structure," ACL-08: HLT, 2008
• Tom Griffiths, "Inferring Structure from Data," Machine Learning
Summer School, Sardinia, 2010
31
Thank you for your attention!
32