LING 681 Intro to Comp Ling

Download Report

Transcript LING 681 Intro to Comp Ling

Finite-state automata
Day 12
LING 681.02
Computational Linguistics
Harry Howard
Tulane University
Course organization
 http://www.tulane.edu/~ling/NLP/
 NLTK is installed on the computers in this
room!
 How would you like to use the Provost's
$150?
21-Sept-2009
LING 681.02, Prof. Howard, Tulane University
2
SLP §2.2
Finite-state automata
2.2.1 Sheeptalk
The sheep language
 How would you recognize this language?
baa!
baaa!
baaaa!
baaaaa!
…
 Regex in Python?
'^baa+!$'
21-Sept-2009
LING 681.02, Prof. Howard, Tulane University
4
State transitions
a
b
q0
a
q1
a
q2
!
q3
q4
 A directed graph of vertices/nodes connected by
links/arcs.
 Each node is labeled as a state, q0 - q4
 q0 is the start state;
 q4 is the final or accepting state.
 Each transition from state to state is labeled with
the character that it recognizes.
21-Sept-2009
LING 681.02, Prof. Howard, Tulane University
5
Equivalency to a tape
q0
b
a
a
a
!
 Start at the start state.
 Check the next character of the input.
 If it matches the symbol on the arc leaving the current state,
 then cross the arc, and move to the next state.
 Return to beginning.
 If it is the final state, and there is no more input, the input has been
recognized successfully.
 If the final state is never reached, the input is rejected.
21-Sept-2009
LING 681.02, Prof. Howard, Tulane University
6
State-transition table
Input
 : marks final state.
State
b
a
!
 0 = illegal or missing
0
1
2
3
4:
1
0
0
0
0
0
2
3
3
0
0
0
0
4
0
transition.
 If in state 0 and see
input b, go to state 1;
 if in state 0 and see
input a or !, fail.
21-Sept-2009
LING 681.02, Prof. Howard, Tulane University
7
Definition
p. 28
 The procedure is known as a finite(-state)
automaton, which consists of five parameters:
Q

 q0
F
 (q,i)
21-Sept-2009
a finite set of n states,
a finite input alphabet of symbols,
the start state,
the set of final states,
the transition function or matrix. For a state q
and an input symbol i, (q,i) returns a new
state q'.
LING 681.02, Prof. Howard, Tulane University
8
Algorithm
Fig. 2.12, p. 29
function D-RECOGNIZE(tape, machine) returns accept or reject
index  Beginning of tape
current-state  Initial state of machine
loop
if End of input has been reached then
if current-state is an accept state then
return accept
else
return reject
elseif transition-table[current-state,tape[index]] is empty then
return reject
else
current-state  transition-table[current-state,tape[index]]
index  index + 1
end
21-Sept-2009
LING 681.02, Prof. Howard, Tulane University
9
Python dictionaries
 Use the Python data object known as a dictionary
(hash table in other languages)
your_dict = {key1: value1, key2: value2, ...}
>>> your_dict[key1]
value1
 Part of speech dictionary, see NLPP pp. 190ff
pos = {'colorless': 'ADJ', 'ideas': 'N',
'furiously': 'ADV'}
>>> pos['furiously']
'ADV'
21-Sept-2009
LING 681.02, Prof. Howard, Tulane University
10
A state-transition table
 Need compound keys
your_dict = {(key1a, key1a): value1, (key2a, key2b): value2, ...}
>>> your_dict[(key2a, key2a)]
value2
 Sample first line for sheep language
stt = {(0,'b'): 1, (0,'a'): 0, (0,'!'): 0,
>>> stt[(0,'!')]
0
 HINT: put dictionary at beginning of
function
21-Sept-2009
LING 681.02, Prof. Howard, Tulane University
11
Next time
Code up the algorithm and bring a
printout of it to class on Wed.
SLP §2.2.2-end