
Algorithms of Artificial
Intelligence
Lecture 4: Search
E. Tyugu
© Enn Tyugu
Backtracking on and-or-tree
• Let us have the following specific functions describing
a search tree:
succ(p) - set of immediate descendants of the node p
terminal(p) - true iff p is a terminal node.
• We assume that the root of a search tree is an and-node.
• The following algorithm produces a subtree of an
and-or tree given by its root p. This subtree is empty
if the search is not successful. The algorithm starts with an empty
state given as the second parameter.
Backtracking on and-or-tree
A.2.17: ANDOR(p,state):
  if terminal(p) and good(p)
    then state := p;
         success(state)
  elif terminal(p) and not good(p)
    then failure()
  else
    for x ∈ succ(p) do
      solved := false;
      for y ∈ succ(x) do
        if empty(ANDOR(y,state))
          then continue
          else state := (state, y, ANDOR(y,state));
               solved := true;
               break
        fi
      od;
      if not solved then failure() fi
    od;
    success(state)
  fi
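The pseudocode above can be sketched in Python. This is a minimal illustration under assumptions of my own, not the slides' exact interface: the tree is a dict, succ, terminal and good are passed in as functions, and a solution subtree is returned in place of the state parameter (None plays the role of the empty result).

```python
def andor(p, succ, terminal, good):
    """Backtracking search on an and-or tree rooted at the and-node p.

    Returns a solution subtree as (node, list-of-child-subtrees),
    or None if the search fails. At an and-node every descendant
    must be solved; under each of them one alternative suffices."""
    if terminal(p):
        return (p, []) if good(p) else None
    solved = []
    for x in succ(p):              # every x must be satisfied (and)
        for y in succ(x):          # one successful y per x is enough (or)
            sub = andor(y, succ, terminal, good)
            if sub is not None:
                solved.append((x, [sub]))
                break
        else:
            return None            # no alternative under x succeeded
    return (p, solved)

# Hypothetical tree: 'root' is an and-node, 'or1'/'or2' its or-children.
tree = {'root': ['or1', 'or2'], 'or1': ['bad', 'ok1'], 'or2': ['ok2']}
result = andor('root',
               lambda n: tree.get(n, []),
               lambda n: n not in tree,
               lambda n: n.startswith('ok'))
```

Here the search rejects the leaf 'bad' under 'or1', backtracks to its sibling 'ok1', and succeeds because both and-branches can then be satisfied.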
Dependency-directed backtracking
[Figure: a search tree with nodes A–F explored by dependency-directed backtracking; the node visits are numbered 0–14, and the recorded contradictions are A&C and B&E.]
Branch-and-bound search
select(M) -- selects a set to be split from the set of sets M
split(A) -- splits the set A into a number of smaller sets
prune(M) -- finds the dominated sets in M and prunes them out
card(M) -- gives the maximum of the cardinalities of the sets in M
bestin(M) -- finds the best element in M.
A.2.18: M := {X};
        while card(M) > 1 do
          A := select(M);
          M := M \ {A} ∪ split(A);
          M := prune(M)
        od;
        bestin(M)
Branch-and-bound search continued
A.2.18-a:
lb(X) – lower bound of the values of elements in X
ub(X) – upper bound of the values of elements in X
prune(M):
  T := selectFrom(M);
  for X ∈ M do
    if lb(X) > lb(T) then T := X fi
  od;
  for X ∈ M do
    if lb(T) > ub(X) then M := M \ {X} fi
  od
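A.2.18 and A.2.18-a can be combined into a small runnable sketch. The interval representation, the bounding functions and the example objective below are illustrative choices of mine, not from the slides:

```python
def branch_and_bound(X, split, lb, ub, atomic):
    """Maximizing branch-and-bound: keep a set M of candidate sets,
    split them, and prune every set whose upper bound falls below
    the best known lower bound."""
    M = [X]
    while any(not atomic(S) for S in M):
        A = next(S for S in M if not atomic(S))   # select(M)
        M.remove(A)
        M.extend(split(A))                        # split(A)
        best_lb = max(lb(S) for S in M)
        M = [S for S in M if ub(S) >= best_lb]    # prune(M)
    return max(M, key=lb)                         # bestin(M)

# Example: maximize f(x) = -(x-3)^2 over the integers 0..10.
f = lambda x: -(x - 3) ** 2
lb = lambda S: max(f(S[0]), f(S[1]))              # value achievable at an endpoint
ub = lambda S: 0 if S[0] <= 3 <= S[1] else max(f(S[0]), f(S[1]))
split = lambda S: [(S[0], (S[0] + S[1]) // 2), ((S[0] + S[1]) // 2 + 1, S[1])]
best = branch_and_bound((0, 10), split, lb, ub, lambda S: S[0] == S[1])
```

Because f is unimodal with its peak at 3, an interval containing 3 has upper bound 0, and any interval whose upper bound is below the incumbent lower bound is discarded without being split.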
Stochastic branch-and-bound search
It may be difficult to estimate the maximal and minimal values of a goal
function f on a set. One can use the following idea: instead of a
deterministic search, perform a stochastic search with a probability
distribution of testing that is higher in the areas of the search space
where the values of f found so far are higher.
This gives a good optimization algorithm for nonlinear multimodal
optimization problems (see Strongin).
Stochastic branch-and-bound search
distr - probability distribution for selecting a point in the search space
random(distr) - generator of random points with distribution distr in the
search space
modify(distr,x,f(x))- procedure of adjustment of the distribution that
increases the probability of selecting a point with higher value of f.
A.2.19: distr:= even;
x:= random(distr);
while not good(x) do
x:= random(distr);
distr:= modify(distr,x,f(x))
od
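A runnable sketch of A.2.19 follows; every concrete choice (regions as the support of the distribution, the reward rule playing the role of modify, the example objective) is an illustrative assumption:

```python
import random

def stochastic_bb(f, good, regions, steps=1000, seed=0):
    """Stochastic search: the sampling probability grows for regions
    where high values of f have been observed (the modify step)."""
    rng = random.Random(seed)
    weights = [1.0] * len(regions)      # distr := even
    best = None
    for _ in range(steps):
        i = rng.choices(range(len(regions)), weights=weights)[0]
        x = rng.uniform(*regions[i])    # x := random(distr)
        if best is None or f(x) > f(best):
            best = x
        if good(best):
            break
        weights[i] += max(f(x), 0.0)    # distr := modify(distr, x, f(x))
    return best

# Multimodal example: the peak at x = 2 lies in the middle region.
f = lambda x: 1.0 - abs(x - 2.0)
best = stochastic_bb(f, lambda x: f(x) > 0.99, [(0, 1), (1, 3), (3, 4)])
```

Over the iterations the middle region accumulates weight and is sampled ever more often, which is exactly the biasing idea described above.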
Binary search as branch-and-bound
The functions left(p), right(p) and val(p) select the left or right
subtree of the node p and take the value of the node p (the root of the subtree).
A.2.19:
binsearch(x,p) =
if empty(p) then failure
elif val(p) = x then success
elif val(p) < x then binsearch(x,right(p))
else binsearch(x,left(p))
fi
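In Python, with the tree represented as nested tuples (a representation chosen here for brevity), the algorithm reads:

```python
def binsearch(x, p):
    """Binary search on a tree node p = (left, val, right);
    None stands for the empty tree."""
    if p is None:
        return False                # failure
    left, val, right = p
    if val == x:
        return True                 # success
    if val < x:
        return binsearch(x, right)  # x can only be in the right subtree
    return binsearch(x, left)       # x can only be in the left subtree

tree = ((None, 1, None), 3, ((None, 4, None), 5, None))
```

Each comparison discards one subtree, which is the branch-and-bound pruning step in its simplest form.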
Alpha-beta pruning
This algorithm performs heuristic search for the best moves on game
trees of two-player games, assuming that both players apply the best
tactics, i.e. they make the best moves possible on the basis of the
available knowledge. It is a concretisation of branch-and-bound search.
This algorithm gives the same result as the minimax procedure for
two-player games, and it underlies many game-playing programs for
chess and other games.
Example
[Figure: a three-level game tree with levels maximize / minimize / maximize; the leaves 111, ..., 222 carry the values 4, 1, 8, 5, 1, 2, 7, 5.]
Example of alpha-beta pruning
[Figure: the same game tree evaluated by alpha-beta pruning: the maximizing nodes 11, 12, 21, 22 back up the values 4, 8, 2, 7 from the leaves; the minimizing nodes 1 and 2 receive 4 and 2; the root value is 4, and the pruned branches need not be examined.]
Alpha-beta pruning
alpha - the minimal value of the end result that the maximizing player
can secure;
beta - the maximal value of the end result that the minimizing player
must concede.
A.2.20: alphabeta(p,alpha,beta) =
  if empty(succ(p)) then return(f(p))
  else m := alpha;
       for x ∈ succ(p) do
         t := -alphabeta(x,-beta,-m);
         if t > m then m := t fi;
         if m >= beta then return(beta) fi
       od;
       return(m)
  fi
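A runnable negamax version of A.2.20, including the in-loop cutoff. The example tree and leaf values are illustrative assumptions; f(p) is taken to be the value of a leaf from the viewpoint of the player to move at that leaf.

```python
def alphabeta(p, alpha, beta, succ, f):
    """Negamax alpha-beta: returns the value of position p for the
    player to move, examining only lines that can affect the result."""
    children = succ(p)
    if not children:
        return f(p)
    m = alpha
    for x in children:
        t = -alphabeta(x, -beta, -m, succ, f)
        if t > m:
            m = t
        if m >= beta:
            return beta          # cutoff: the opponent avoids this line
    return m

# Depth-2 example: minimax value of 'r' is max(min(3,5), min(2,9)) = 3.
tree = {'r': ['a', 'b'], 'a': ['a1', 'a2'], 'b': ['b1', 'b2']}
leaf = {'a1': 3, 'a2': 5, 'b1': 2, 'b2': 9}
value = alphabeta('r', float('-inf'), float('inf'),
                  lambda p: tree.get(p, []), lambda p: leaf[p])
```

In this example the leaf 'b2' triggers a cutoff in the branch 'b': once 'b1' shows the branch is worth at most 2 against the incumbent 3, the rest of the branch cannot change the root value.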
A* algorithm
The problem it solves is finding a shortest path between two nodes of a
graph, where, besides the lengths of the arcs of the graph, an estimate
of the lower bound of the distance to the end node is given for every
node of the graph. We shall use the following notation:
s -- given initial node of the path
g -- given end node of the path
p(n,i) -- length of the arc from the node n to the node i
l(n) -- length of the shortest path from the initial node s to the node n
h(n) -- length of the shortest path from the node n to the end node g
H(n) -- estimate of the lower bound of h(n)
Open -- set of nodes to be checked
succ(n) -- successor nodes of n.
A* algorithm continued
A.2.21: l(s) := 0;
        Open := {s};
        while not empty(Open) do
          x := select(Open);
          n := x;
          for i ∈ Open \ {x} do
            if l(i) + H(i) < l(n) + H(n) then n := i fi
          od;
          if n = g then success()
          else
            Open := Open ∪ succ(n) \ {n};
            for i ∈ succ(n) do
              if i is unlabelled or l(n) + p(n,i) < l(i)
                then l(i) := l(n) + p(n,i)
              fi
            od
          fi
        od;
        failure()
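A runnable Python sketch of A.2.21, using a priority queue for the select step; the example graph and the zero heuristic are illustrative assumptions of mine:

```python
import heapq

def astar(s, g, succ, p, H):
    """A*: expand the open node n minimizing l(n) + H(n) until the
    goal g is selected; returns the shortest path length or None."""
    l = {s: 0}
    open_heap = [(H(s), s)]
    closed = set()
    while open_heap:
        _, n = heapq.heappop(open_heap)       # select from Open
        if n == g:
            return l[n]                       # success
        if n in closed:
            continue
        closed.add(n)
        for i in succ(n):
            d = l[n] + p(n, i)
            if i not in l or d < l[i]:        # keep the best known l(i)
                l[i] = d
                heapq.heappush(open_heap, (d + H(i), i))
    return None                               # failure

graph = {'s': {'a': 1, 'b': 4}, 'a': {'b': 2, 'g': 5}, 'b': {'g': 1}, 'g': {}}
dist = astar('s', 'g', lambda n: graph[n], lambda n, i: graph[n][i],
             lambda n: 0)                     # H = 0 reduces A* to Dijkstra
```

The shortest path in the example is s-a-b-g of length 4; a sharper admissible H would only reduce the number of expanded nodes, not change the answer.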
A* continued
The A* algorithm has some interesting properties:
1. Under the condition that H(n) ≤ h(n) for each node n of the
graph (i.e. the estimate never overestimates the remaining distance),
this algorithm gives a precise answer -- it finds an actual
shortest path from s to g.
2. No algorithm that solves the given problem precisely using the
same heuristic information can be faster, in the sense of needing
to expand fewer nodes.
Unification
A substitution is a tuple of variable - expression pairs, for example:
((x,A),(y,F(B)),(z,W)).
We shall additionally require that no variable of a substitution can
appear in an expression of the same substitution. A substitution
describes a transformation of expressions where variables of an
expression will be replaced by their corresponding expressions.
Application of a substitution s to an expression E is denoted by E°s.
Example:
P(x,x,y,v)°((x,A),(y,F(B)),(z,W)) = P(A,A,F(B),v).
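As a sketch, substitution application can be implemented over expressions encoded as nested tuples (an encoding assumed here, not given in the slides):

```python
def apply_subst(expr, subst):
    """E ∘ s: replace every variable bound by the substitution with
    the expression its pair assigns to it."""
    s = dict(subst)
    def walk(e):
        if isinstance(e, tuple):      # compound expression, e.g. F(B)
            return tuple(walk(a) for a in e)
        return s.get(e, e)            # a bound variable is replaced
    return walk(expr)

# P(x,x,y,v) ∘ ((x,A),(y,F(B)),(z,W)) = P(A,A,F(B),v)
result = apply_subst(('P', 'x', 'x', 'y', 'v'),
                     (('x', 'A'), ('y', ('F', 'B')), ('z', 'W')))
```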
Unification continued
Substitutions s and t are independent, if their replaceable variables are
different. In this case we can build a composition s*t of the substitutions
s and t which is a substitution consisting of the pairs of both s and t .
If E,...,F are expressions, s is a substitution, and E°s = ... = F°s then s
is a unifier of E,..., F. The most general unifier of E,...,F denoted by
mgu(E,...,F) is the unifier s of E,...,F with the property that for any
unifier t of E,...,F there exists a unifier r, such that E°s*r=E°t, ... ,
F°s*r=F°t. This unifier s is in a sense the least unifier of E,...,F. If there
exists mgu(E,...,F) for expressions E,...,F, then it is unique. This follows
immediately from the definition of mgu.
Examples
Unification, i.e. finding the mgu, can be used directly for answering
queries to knowledge bases. For example, if we have a knowledge base
about people which contains the facts:
married(Peter, Ann)
married(Jan, Linda)
one can ask a question about Linda's marriage by asking to unify the
formula married(x,Linda) with the facts of the knowledge base. The
answer ((x,Jan)) will show that Linda is married to Jan.
Unification algorithm
Let us have the functions:
var(x) -- true, if x is a variable
const(x) -- true, if x is a constant
MguVar(x,y) -- gives the mgu of the variable x and the expression y
(not a variable), if such exists, and fails if x occurs in y
includes(x,y) -- true, if y includes x.
Then the unification algorithm is as follows.
Unification algorithm
A.2.22: Mgu(x,y) =
  if x = y then success()
  elif var(x) then success(MguVar(x,y))
  elif var(y) then success(MguVar(y,x))
  elif (const(x) ∨ const(y)) & x ≠ y ∨ length(x) ≠ length(y) then failure()
  else g := ();
       for i := 1 to length(x) do
         s := Mgu(part(x,i), part(y,i));
         if s = failure then failure() fi;
         g := compose(g,s);
         x := subst(x,g);
         y := subst(y,g)
       od;
       success(g)
  fi
MguVar(x,y) = if includes(x,y) then failure()
              else ((x,y)) fi
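A runnable sketch of A.2.22 under conventions chosen here: terms are strings or tuples, strings starting with '?' are variables, and the result is a dict of bindings (None on failure).

```python
def mgu(x, y):
    """Unification: returns the most general unifier of x and y as a
    dict of variable bindings, or None if none exists."""
    def is_var(t):
        return isinstance(t, str) and t.startswith('?')
    def occurs(v, t):                       # includes(v, t)
        return t == v or (isinstance(t, tuple) and any(occurs(v, a) for a in t))
    def subst(t, g):
        if isinstance(t, tuple):
            return tuple(subst(a, g) for a in t)
        return g.get(t, t)
    def unify(a, b, g):
        a, b = subst(a, g), subst(b, g)
        if a == b:
            return g
        if is_var(a):                       # MguVar with occurs check
            return None if occurs(a, b) else {**g, a: b}
        if is_var(b):
            return None if occurs(b, a) else {**g, b: a}
        if not (isinstance(a, tuple) and isinstance(b, tuple)) or len(a) != len(b):
            return None                     # constant clash or arity mismatch
        for pa, pb in zip(a, b):            # unify part(x,i) with part(y,i)
            g = unify(pa, pb, g)
            if g is None:
                return None
        return g
    return unify(x, y, {})
```

The knowledge-base query from the previous slide then reads mgu(('married', '?x', 'Linda'), ('married', 'Jan', 'Linda')), which binds ?x to Jan.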
Dictionary search
Find a given word in a dictionary where the words are ordered
lexicographically. The frequency of occurrence of the i-th letter
of the alphabet is pi. The total frequency of occurrence of the letters
preceding the i-th letter of the alphabet is

    qi = p1 + p2 + ... + p(i-1).

If the dictionary contains l pages, then an approximate value of the page
number h(w), where the word w consists of the letters with numbers
i1, i2, ..., in, can be computed from the frequencies of these letters
as follows:

    h(w) = 1 + l·(q(i1) + p(i1)·(q(i2) + p(i2)·( ... + p(i(n-1))·q(in) ... ))).
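The nested formula can be evaluated by a Horner-style loop over the letters of the word, innermost letter first. The tiny three-letter alphabet below is an invented example:

```python
def page_estimate(word, p, q, l):
    """h(w) = 1 + l*(q[i1] + p[i1]*(q[i2] + p[i2]*(...))), evaluated
    from the innermost (last) letter outwards."""
    h = 0.0
    for c in reversed(word):
        h = q[c] + p[c] * h
    return 1 + l * h

p = {'a': 0.5, 'b': 0.3, 'c': 0.2}        # letter frequencies
q = {'a': 0.0, 'b': 0.5, 'c': 0.8}        # cumulative preceding frequencies
```

For a 100-page dictionary, 'ba' lands near page 1 + 100·(0.5 + 0.3·0) = 51, i.e. just past the half of the book occupied by words starting with 'a'.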
Dictionary search
• Let any word w be selected on a given page k and the expected page
number h(w) be computed according to the formula given above. This
page number is the value of a new function denoted by f(k). In order
to find the page number j for the word v, we have to solve the
equation f(j) = h(v) with respect to j (h(v) is constant for the word v).
• We use the following variables:
  – j - currently expected page number of the word v (j is equal to
    h(v) at the beginning of the search)
  – a - number of the first page to search
  – b - number of the last page to search
• The algorithm recursively computes a better approximation to j until
the right value is found. Here good(j) means that the given word is
on the page j.
Dictionary search
A.2.23: dict(j,a,b,v) =
  if good(j) then j
  elif j = a then dict(j+1, a+1, b, v)
  elif j = b then dict(j-1, a, b-1, v)
  else s := f(j);
       if s > h(v)
         then dict(j - (j-a)·(s-h(v))/(s-f(a)), a, j, v)
         else dict(j + (b-j)·(h(v)-s)/(f(b)-s), j, b, v)
       fi
  fi
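The same interpolation idea, applied to a sorted array instead of dictionary pages, gives the classic interpolation search; the array values stand in for f and the probe index for j. This is an analogy of my own, not the slides' dict procedure itself:

```python
def interp_search(a, v):
    """Find v in the sorted list a by probing where v should fall
    proportionally between the interval ends; returns an index or -1."""
    lo, hi = 0, len(a) - 1
    while lo <= hi and a[lo] <= v <= a[hi]:
        if a[hi] == a[lo]:
            j = lo                   # all values in the interval are equal
        else:
            j = lo + (hi - lo) * (v - a[lo]) // (a[hi] - a[lo])
        if a[j] == v:
            return j
        if a[j] < v:
            lo = j + 1               # the target lies on a later "page"
        else:
            hi = j - 1               # the target lies on an earlier "page"
    return -1

pages = [1, 3, 5, 7, 9, 11]
```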
Viterbi algorithm
This is an algorithm of so-called compound decision theory that uses the
dynamic programming paradigm. It is a word recognition algorithm,
specifically for finding the most likely word for a sequence of patterns x1, ..., xm
which represent symbols (letters). The goal is to choose the assignment w1, ...,
wm of letters from a given alphabet L which has the maximum probability over
all r^m possible assignments (r is the number of letters of the alphabet L).
Assuming the first-order Markov property of the sequence of patterns, i.e. that the
probability of appearance of a pattern in the sequence depends only on the
preceding pattern, we get an algorithm with time complexity O(m·r^2) instead of
O(r^m). An improved version of the algorithm is called DVA (Dictionary Viterbi
Algorithm); it also takes a dictionary into account - only words from the
dictionary are accepted.
Viterbi algorithm
p1(x,w) - probability of the letter w with the pattern x being the first letter in a word
p(x,w,u) - probability of the letter w with the pattern x following the letter u in a word
p[w] – probability of the best prefix ending with the letter w
pref[w] – best prefix ending with the letter w
selectFrom(L) - produces an element of L
A2.24:
for w1 ∈ L do p[w1] := p1(x1,w1); pref[w1] := (w1) od;
for i := 2 to m do
  for wi ∈ L do
    zz := selectFrom(L);
    t := p[zz]·p(xi,wi,zz);
    for z ∈ L do
      if p[z]·p(xi,wi,z) > t then zz := z; t := p[z]·p(xi,wi,z) fi
    od;
    p[wi] := t; pref[wi] := (pref[zz],wi)
  od
od
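A runnable sketch of A2.24. The per-letter tables are kept in dicts and copied per stage so that the updates for position i do not overwrite the values of position i-1; the example probabilities are invented:

```python
def viterbi(xs, L, p1, p):
    """Return the most likely word (and its probability) for the
    pattern sequence xs, by dynamic programming over best prefixes."""
    prob = {w: p1(xs[0], w) for w in L}      # p[w] for prefixes of length 1
    pref = {w: w for w in L}
    for x in xs[1:]:
        nprob, npref = {}, {}
        for w in L:
            # best predecessor z for the letter w at this position
            z = max(L, key=lambda u: prob[u] * p(x, w, u))
            nprob[w] = prob[z] * p(x, w, z)
            npref[w] = pref[z] + w
        prob, pref = nprob, npref
    best = max(L, key=lambda w: prob[w])
    return pref[best], prob[best]

p1 = lambda x, w: 0.9 if w == 'a' else 0.1
trans = {('a', 'a'): 0.2, ('a', 'b'): 0.5, ('b', 'a'): 0.8, ('b', 'b'): 0.5}
word, pr = viterbi(['x1', 'x2'], ['a', 'b'], p1,
                   lambda x, w, u: trans[(w, u)])  # trans keyed by (w, u)
```

With these numbers the best first letter is 'a' (0.9), and 'b' follows 'a' with probability 0.8, so the most likely word is 'ab'.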
Dictionary Viterbi Algorithm
inDictionary(z) = true iff there is a word beginning with z in the dictionary
selectFrom(pref,L) gives a letter that suits the prefix pref, so that the prefix
stays in the dictionary.
A2.25:
for w1 ∈ L do p[w1] := p1(x1,w1); pref[w1] := (w1) od;
for i := 2 to m do
  for wi ∈ L do
    zz := selectFrom(L);
    t := p[zz]·p(xi,wi,zz);
    for z ∈ L do
      if p[z]·p(xi,wi,z) > t and inDictionary((pref[z],wi))
        then zz := z; t := p[z]·p(xi,wi,z) fi
    od;
    p[wi] := t; pref[wi] := (pref[zz],wi)
  od
od
Hierarchy of search methods
[Figure: hierarchy of search methods. Search divides into breadth-first, depth-first, best-first, beam search, backtracking (refined by dependency-directed backtracking), branch and bound (refined by binary search, A* and stochastic branch-and-bound), dictionary search, Viterbi search, unification, lru, rlu, and search-space modifications.]