Statistical Natural Language Processing

What is NLP?

Natural Language Processing (NLP), or Computational Linguistics, is concerned with theoretical and practical issues in the design and implementation of computer systems for processing human languages.

It is an interdisciplinary field that draws on other areas of study such as computer science, artificial intelligence, linguistics and logic.
Applications of NLP
• natural language interfaces to databases
• programs for classifying and retrieving documents by content
• explanation generation for expert systems
• machine translation
• advanced word-processing tools
What makes NLP a computational challenge?
• Natural language is inherently ambiguous.
• Language technology has a wide variety of applications, each with its own requirements.
• Knowledge representation is a difficult task.
• Language encodes information at several different levels (sounds, words, sentence structure, meaning and context).
What is statistical NLP?
• Statistical NLP applies statistical inference to the problems of NLP.
• Statistical inference consists of taking data generated according to some unknown probability distribution and making inferences about that distribution.
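As a minimal sketch of statistical inference, assume the unknown distribution is a distribution over words and that each token in a corpus is drawn from it. The relative frequency of each word is then the maximum likelihood estimate of its probability. The toy corpus and the function name estimate_unigram are hypothetical.

```python
from collections import Counter

def estimate_unigram(tokens):
    """Maximum likelihood (relative frequency) estimate of P(word)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: c / total for word, c in counts.items()}

# A toy "corpus" (hypothetical); in practice this would be a large body of text.
corpus = "the dog saw the cat and the cat saw the dog".split()
probs = estimate_unigram(corpus)

# 4 of the 11 tokens are "the", so its estimated probability is 4/11.
print(probs["the"])
```

With a realistic corpus the same counting gives much better estimates; words that never occur get probability zero, which is why practical systems add smoothing.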
Motivations for Statistical NLP

• Cognitive modeling of human language processing has not yet reached the stage where we can give a complete mapping between the language signal and its information content.
• A complete mapping is not always required.
• The statistical approach provides the flexibility needed to model language more accurately: rather than requiring a complete mapping, it assigns probabilities to alternative interpretations and estimates them from data.
Idea behind Statistical NLP
• View language processing as the transmission of information through a noisy channel.
• The approach requires a model that characterizes the transmission by giving, for every message, the probability of the observed output.
Statistical Modeling and Classification
• Primitive acoustic features
• Quantization
• Maximum likelihood and related rules
• Class conditional density function
• Hidden Markov Model methodology
Details
Primitive acoustic features are used to estimate the speech spectrum on the basis of its statistical properties.
By means of quantization, a speech signal can be represented as a sequence of symbols: each feature vector is a point in a multidimensional acoustic feature space, and statistical decision rules map it to one of a finite set of regions of that space, thus classifying the signal.
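A minimal sketch of quantization, assuming each frame of speech has already been reduced to a feature vector and that a small codebook of reference points (centroids) exists: each vector is replaced by the symbol of its nearest centroid, turning the signal into a symbol sequence. The codebook, the two-dimensional vectors and the nearest-centroid rule (a simple stand-in for a statistical decision rule) are assumptions.

```python
import math

# Hypothetical codebook: each centroid is a point in a 2-D acoustic feature
# space, identified by a symbol. Real systems use many more dimensions.
codebook = {
    "A": (1.0, 1.0),
    "B": (5.0, 1.0),
    "C": (3.0, 4.0),
}

def quantize(frames):
    """Map each feature vector to the symbol of its nearest centroid."""
    return [min(codebook, key=lambda s: math.dist(v, codebook[s])) for v in frames]

frames = [(1.2, 0.9), (4.8, 1.3), (3.1, 3.7), (0.8, 1.1)]
print(quantize(frames))  # ['A', 'B', 'C', 'A']
```

math.dist requires Python 3.8 or later.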
Maximum Likelihood
Although there is no direct method for computing the probability of a phonetic unit given its acoustic features, we can use Bayes' rule to estimate the probability of a phonetic class given its features from the likelihood of the features given the class. This leads to the maximum likelihood classifier, which assigns an unknown vector to the class whose class-conditional probability density function has the maximum value at that vector (equivalent to choosing the most probable class when all classes are equally likely a priori).
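The sketch below, assuming one-dimensional Gaussian class-conditional densities with made-up parameters and hypothetical class names, shows the maximum likelihood classifier next to the full Bayes' rule posterior. With equal priors the two pick the same class.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution N(mean, var) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Hypothetical class-conditional densities p(x | class) for a single feature.
classes = {
    "vowel": (2.0, 1.0),       # (mean, variance)
    "fricative": (6.0, 2.0),
}

def ml_classify(x):
    """Assign x to the class whose conditional density p(x | class) is largest."""
    return max(classes, key=lambda c: gaussian_pdf(x, *classes[c]))

def posterior(x, priors):
    """Bayes' rule: P(class | x) = p(x | class) P(class) / p(x)."""
    joint = {c: gaussian_pdf(x, *classes[c]) * priors[c] for c in classes}
    evidence = sum(joint.values())  # p(x), summed over classes
    return {c: p / evidence for c, p in joint.items()}

print(ml_classify(3.0))  # vowel
print(posterior(3.0, {"vowel": 0.5, "fricative": 0.5}))
```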
Another variant of the maximum likelihood methodology
is clustering.
Hidden Markov Models
A Hidden Markov Model is a set of states (lexical categories, in our case) connected by directed edges labeled with transition probabilities; each probability gives the chance of moving to the state at the end of the edge, given that one is currently in the state at its start. Each state is also labeled with a function giving the probability of outputting each possible symbol while in that state (in a state, one outputs a single symbol before moving to the next state). In our case, the symbol output by a state (a lexical category) is a word belonging to that lexical category.
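The slides do not say how such a model is used to tag a sentence; one standard method is the Viterbi algorithm, which finds the most probable sequence of states (lexical categories) for a sequence of words. A minimal sketch for "John walks", in which the probabilities, the lowercase word forms and the function name viterbi are assumptions:

```python
# A tiny HMM whose states are lexical categories and whose outputs are words.
# All probabilities are hypothetical.
states = ["N", "V"]
start = {"N": 0.8, "V": 0.2}                  # P(first state)
trans = {"N": {"N": 0.3, "V": 0.7},           # P(next state | current state)
         "V": {"N": 0.6, "V": 0.4}}
emit = {"N": {"john": 0.5, "walks": 0.1},     # P(word | state)
        "V": {"john": 0.0, "walks": 0.6}}

def viterbi(words):
    """Return (probability, state path) of the most probable path for words."""
    # best[s] = (probability, path) of the best path so far ending in state s
    best = {s: (start[s] * emit[s].get(words[0], 0.0), [s]) for s in states}
    for w in words[1:]:
        best = {
            s: max(
                ((p * trans[prev][s] * emit[s].get(w, 0.0), path + [s])
                 for prev, (p, path) in best.items()),
                key=lambda t: t[0],
            )
            for s in states
        }
    return max(best.values(), key=lambda t: t[0])

prob, tags = viterbi(["john", "walks"])
print(tags, prob)  # ['N', 'V'] with probability approximately 0.168
```

The best path is N then V, with probability 0.8 × 0.5 × 0.7 × 0.6 (start, emit "john", transition, emit "walks").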
Class Conditional Density Function
All statistical methods of speech recognition depend on the class conditional density function.
Estimating these densities, in turn, depends on a sufficiently large, correctly labeled training set and on well-understood statistical estimation techniques.
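As an illustration of the estimation step, the sketch below fits a one-dimensional Gaussian density to each class of a labeled training set by maximum likelihood (sample mean and population variance). The Gaussian form, the data and the function name estimate_densities are assumptions; with only a handful of samples per class, as here, the estimates are unreliable, which is why a large labeled training set matters.

```python
from collections import defaultdict
from statistics import fmean, pvariance

def estimate_densities(samples):
    """Fit a 1-D Gaussian p(x | class) per class by maximum likelihood.

    samples is a list of (feature_value, class_label) pairs. The ML estimate
    of the mean is the sample mean; the ML estimate of the variance divides
    by n (population variance), not n - 1.
    """
    by_class = defaultdict(list)
    for x, label in samples:
        by_class[label].append(x)
    return {label: (fmean(xs), pvariance(xs)) for label, xs in by_class.items()}

# Hypothetical labeled training set (one acoustic feature per sample).
training = [(1.8, "vowel"), (2.2, "vowel"), (2.0, "vowel"),
            (5.0, "fricative"), (7.0, "fricative")]
params = estimate_densities(training)
for label, (mean, var) in params.items():
    print(f"{label}: mean={mean:.3f}, variance={var:.4f}")
```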
How does statistics help?
• Disambiguation can be achieved using stochastic context-free grammars.
• It provides degrees of grammaticality.
• Naturalness
• Structural preference
• Error tolerance
Example using a stochastic CFG
For example, consider the sentence “John walks”.
The grammar is as follows:
1  S  -> NP V     0.7
2  S  -> NP       0.3
3  NP -> N        0.8
4  NP -> N N      0.2
5  N  -> John     0.6
6  N  -> walks    0.4
7  V  -> walks    1.0
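The numbers on the right are the weights of the rules. The weight of an analysis is the product of the weights of the rules used in its derivation.
The interpretation that is perceived is predicted from these weights: the analysis with the highest weight is preferred. “John walks” has two analyses: walks as a verb (rules 1, 3, 5, 7: 0.7 × 0.8 × 0.6 × 1.0 = 0.336) and walks as a noun (rules 2, 4, 5, 6: 0.3 × 0.2 × 0.6 × 0.4 = 0.0144), so the verb reading is preferred.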
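A minimal sketch of the weighting scheme: each analysis of “John walks” is written as the list of rule numbers its derivation uses (an assumption made for illustration), its weight is the product of those rule weights from the grammar above, and the highest-weighted analysis is preferred.

```python
from math import prod

# Rule weights from the grammar, keyed by rule number.
weights = {1: 0.7, 2: 0.3, 3: 0.8, 4: 0.2, 5: 0.6, 6: 0.4, 7: 1.0}

# The two analyses of "John walks", as the rule numbers each derivation uses.
analyses = {
    "S -> NP V (walks as a verb)": [1, 3, 5, 7],
    "S -> NP, NP -> N N (walks as a noun)": [2, 4, 5, 6],
}

def analysis_weight(rules):
    """Weight of an analysis = product of the weights of the rules it uses."""
    return prod(weights[r] for r in rules)

scores = {name: analysis_weight(rules) for name, rules in analyses.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.4f}")
print("preferred:", best)
```

math.prod requires Python 3.8 or later.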
Degrees of grammaticality
• Traditional approaches to NLP do not accommodate gradations of grammaticality: a sentence is either correct or not.
• In some cases acceptability may vary with the structure and context of the sentence.
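The slides do not name a particular model for graded judgments. As one illustration, a bigram language model with add-one (Laplace) smoothing gives every word sequence a score, so sentences are ranked rather than simply accepted or rejected. The training sentences are hypothetical, and the score measures how typical a word order is, not grammaticality as such.

```python
import math
from collections import Counter

# Hypothetical training text; each sentence is padded with <s> and </s>.
training = ["the dog barks", "the cat sleeps", "a dog sleeps", "the dog sleeps"]

def pad(sentence):
    return ["<s>"] + sentence.split() + ["</s>"]

unigrams, bigrams = Counter(), Counter()
for s in training:
    toks = pad(s)
    unigrams.update(toks[:-1])            # how often each word is a history
    bigrams.update(zip(toks, toks[1:]))   # how often each word pair occurs
vocab = {w for s in training for w in pad(s)[1:]}  # words that can be predicted

def log_score(sentence):
    """Add-one smoothed bigram log-probability: a graded, not binary, score."""
    toks = pad(sentence)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + len(vocab)))
        for a, b in zip(toks, toks[1:])
    )

for s in ["the cat barks", "cat the barks", "barks cat the"]:
    print(f"{s!r}: {log_score(s):.2f}")
```

“the cat barks” never occurs in the training text, yet it scores highest because most of its word pairs do occur there, while the scrambled orders consist mostly of unseen pairs.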
Structural Preference
Consider the sentence
“The emergency crews hate most is domestic violence.”
Readers tend at first to take “emergency crews” as a compound noun, which makes the sentence hard to interpret. The correct interpretation is:
“The emergency [that the crews hate most] is domestic violence.”
These preferences are better seen as structural preferences than as parsing preferences.
Statistical approaches can handle such structural preferences by assigning higher weights to more common structures.
Error Tolerance
• A remarkable property of human language comprehension is error tolerance.
• Many sentences that the traditional approach classifies as ungrammatical can nevertheless be interpreted by statistical NLP techniques.
Conclusions

• Free and commercial software that provides many NLP features is now available (e.g. Microsoft XP has speech recognition software with which users can control menus and execute commands).
• A lot of research is going into developing new applications and investigating new techniques and approaches that will make statistical NLP more feasible in the near future.