LING 681 Intro to Comp Ling
Download
Report
Transcript LING 681 Intro to Comp Ling
Finite-state automata
Day 12
LING 681.02
Computational Linguistics
Harry Howard
Tulane University
Course organization
http://www.tulane.edu/~ling/NLP/
NLTK is installed on the computers in this
room!
How would you like to use the Provost's
$150?
21-Sept-2009
LING 681.02, Prof. Howard, Tulane University
2
SLP §2.2
Finite-state automata
2.2.1 Sheeptalk
The sheep language
How would you recognize this language?
baa!
baaa!
baaaa!
baaaaa!
…
Regex in Python?
'^baa+!$'
21-Sept-2009
LING 681.02, Prof. Howard, Tulane University
4
State transitions
a
b
q0
a
q1
a
q2
!
q3
q4
A directed graph of vertices/nodes connected by
links/arcs.
Each node is labeled as a state, q0 - q4
q0 is the start state;
q4 is the final or accepting state.
Each transition from state to state is labeled with
the character that it recognizes.
21-Sept-2009
LING 681.02, Prof. Howard, Tulane University
5
Equivalency to a tape
q0
b
a
a
a
!
Start at the start state.
Check the next character of the input.
If it matches the symbol on the arc leaving the current state,
then cross the arc, and move to the next state.
Return to beginning.
If it is the final state, and there is no more input, the input has been
recognized successfully.
If the final state is never reached, the input is rejected.
21-Sept-2009
LING 681.02, Prof. Howard, Tulane University
6
State-transition table
Input
: marks final state.
State
b
a
!
0 = illegal or missing
0
1
2
3
4:
1
0
0
0
0
0
2
3
3
0
0
0
0
4
0
transition.
If in state 0 and see
input b, go to state 1;
if in state 0 and see
input a or !, fail.
21-Sept-2009
LING 681.02, Prof. Howard, Tulane University
7
Definition
p. 28
The procedure is known as a finite(-state)
automaton, which consists of five parameters:
Q
q0
F
(q,i)
21-Sept-2009
a finite set of n states,
a finite input alphabet of symbols,
the start state,
the set of final states,
the transition function or matrix. For a state q
and an input symbol i, (q,i) returns a new
state q'.
LING 681.02, Prof. Howard, Tulane University
8
Algorithm
Fig. 2.12, p. 29
function D-RECOGNIZE(tape, machine) returns accept or reject
index Beginning of tape
current-state Initial state of machine
loop
if End of input has been reached then
if current-state is an accept state then
return accept
else
return reject
elseif transition-table[current-state,tape[index]] is empty then
return reject
else
current-state transition-table[current-state,tape[index]]
index index + 1
end
21-Sept-2009
LING 681.02, Prof. Howard, Tulane University
9
Python dictionaries
Use the Python data object known as a dictionary
(hash table in other languages)
your_dict = {key1: value1, key2: value2, ...}
>>> your_dict[key1]
value1
Part of speech dictionary, see NLPP pp. 190ff
pos = {'colorless': 'ADJ', 'ideas': 'N',
'furiously': 'ADV'}
>>> pos['furiously']
'ADV'
21-Sept-2009
LING 681.02, Prof. Howard, Tulane University
10
A state-transition table
Need compound keys
your_dict = {(key1a, key1a): value1, (key2a, key2b): value2, ...}
>>> your_dict[(key2a, key2a)]
value2
Sample first line for sheep language
stt = {(0,'b'): 1, (0,'a'): 0, (0,'!'): 0,
>>> stt[(0,'!')]
0
HINT: put dictionary at beginning of
function
21-Sept-2009
LING 681.02, Prof. Howard, Tulane University
11
Next time
Code up the algorithm and bring a
printout of it to class on Wed.
SLP §2.2.2-end