Week 12 - Ken Cosh

Download Report

Transcript Week 12 - Ken Cosh

269202 ALGORITHMS FOR ISNE
DR. KENNETH COSH
WEEK 12
REVIEW
 Searching


Uninformed Searches

Breadth First

Depth First

Iterative
Informed Searches

Best First

Heuristics

Hill Climbing

Simulated Annealing
THIS WEEK
 Adversarial Searches

Investigating how to plan ahead in environments where other agents are planning against us.
 Game Theory
WHO STOLE THE FINAL EXAM?
 OK, I know one of you stole the final exam…
 You have 3 choices, “Admit it”, “Accuse someone else” or “Deny knowledge”.
 If you “Deny Knowledge” you will get a D, unless everyone denies knowledge,
when you’ll get a B
 If you “Accuse someone” you will get a C, or if everyone else accuses the same
person you will get a B, unless they admit it when you get a D.
 If you “Admit it” you get an F, unless everyone else accuses you, when you get an
A.
GAME THEORY
 That example is similar to the famous “Prisoner’s Dilemma”

“Two suspects, A and B, are arrested by the police. The police have insufficient evidence for a conviction, and,
having separated both prisoners, visit each of them to offer the same deal: if one testifies for the prosecution
against the other and the other remains silent, the betrayer goes free and the silent accomplice receives the
full 10-year sentence. If both stay silent, the police can sentence both prisoners to only six months in jail for a
minor charge. If each betrays the other, each will receive a two-year sentence. Each prisoner must make the
choice of whether to betray the other or to remain silent. However, neither prisoner knows for sure what
choice the other prisoner will make. So the question this dilemma poses is: What will happen? How will the
prisoners act?”
GAME THEORY
 A branch of economics investigates any multiagent environment where one agents
actions impact on another agent, whether co-operatively or competitively.
 Games are deterministic, turn-taking, two-player, zero-sum games with perfect
information.
 In other words a deterministic and fully observable environment with two agents who alternate
actions. Utility values at the end of the game are equal and opposite - the winner of chess receives
‘1’, the loser ‘-1’ (or possibly ‘0’ in stalemate).
GAMES
 Games fit well with A.I. as they are easy to represent and agents are
restricted to a small number of actions, the results of the actions are
well defined.
 This is in contrast to physical games such as football (or American Rugby) - here
there are imprecise rules and a wide variety of actions. Computer game
manufacturers attempts at producing such games often result in limited and
repetitive versions.
GAME SUCCESS
 AI has had some success at game playing;
 Machines are better than humans in Othello and checkers.
 Sometimes they can win at chess and backgammon.
 Why not more success?
 At Go AI only performs amateurly.
 The average game of chess has a branching factor of 35, with
roughly 50 moves per player - which is 35100 or 10154 nodes on a
search graph!
 So we have to make a decision, even if it isn’t optimal.
FORMAL DEFINITION
 Initial State - the initial board position indicating who’s move.
 Successor Function - returns a list of moves and their resulting states.
 Terminal Test - determines if it is a terminal state, i.e. if the game is
over.
 Utility Function - An objective function giving a numeric value for
states - terminal states could be -1, 0 or 1.
SEARCHING
 From that definition we can create a game tree, which can be searched for optimal
moves.
 Normal searches produce a sequence of actions leading to a terminal win state, but
in games the other agent has an impact.
 Therefore the search needs to find a contingent strategy .
 The optimal strategy is one which finds an outcome at least as good as any other
against an infallible opponent.
GAME TREES
Initial Moves (X):
X
X
Move 2 (O):
XO
Move 3 (X):
XOX XO
X
….
Terminal States:
Utility:
X OX
O
X
XO
X
….
X
O
0
O
XO XO
X
X
X
X
X
X
O
XO
X
X
O
XO
X
O
X
X
….
….
….
XOX XOX XOX
OX OOX
X
O X X O X OO
-1
X
X
1
The search tree even for a
game as simple as tic tac toe, is
v. complex
X
MINIMAX ALGORITHM
 Assuming the opponent will play optimally, the minimax algorithm is designed to maximise the worst case
outcome - in tictactoe this is a draw.
A
a1
B
b1
4
b2
7
a3
a2
C
b3
3
c1
2
D
c3
c2
12
6
d1
10
d2
11
 In this simple example - which action should A take (a1, a2 or a3)?
d3
2
MINIMAX ALGORITHM
A
a1
3
4
2
B
b1
b2
7
b3
3
a3
a2
c1
2
2
C
c3
c2
12
6
d1
10
D
d2
11
d3
2
 Even though the best utility is found with ‘a2’ (12), and the best
average is found with ‘a3’ (7.67), ‘a1’ is the choice as we assume
that the opponent will choose their optimum strategy.
MULTIPLAYER GAMES
 In games with more than 2 players, the utility is scored as a vector containing utilities for each player (5,2,3).
Minimax functions will choose an action which still leads to their best worst case scenario.
 In multiplayer games, often ‘alliances’ are formed, with more than one player joining forces in order to stop
another player from reaching a terminal win state.
MINIMAX EFFICIENCY
 If the maximum depth of the tree is ‘m’ and there are ‘b’ legal moves at each point, then time complexity is O(bm)
and space complexity is O(bm), i.e. Exponential time efficiency.
 We can improve the efficiency, for instance through Alpha-Beta Pruning;

If a player has a better choice ‘m’ either at the parent node of ‘n’ or any node higher up, then ‘n’ will never be reached, so
effectively we can prune it.
PRUNING EXAMPLE
A
a1
3
B
b1
4
b2
7
2
b3
3
c1
2
a3
a2
C
2
c3
12
6
10
first search, finding the worst case ‘3’.
D
d1
c2
 First we examine choice ‘a1’ with depth
d2
11
d3
2
A
a1
3
B
b1
4
b2
7
b3
3
a2
a3
PRUNING EXAMPLE 2
A
a1
3
b1
4
B
b2
7
2
b3
c1
3
a3
a2
C
 Examining choice ‘a2’, we immediately find a
worse worst case ‘2’, so no need to
continue searching
c3
c2
A
2
a1
3
When we look at choice ‘a3’, we
have to search all 3 leaves before
we find a worse worst case.
B
b1
4
b2
7
2
b3
3
c1
2
a3
a2
C
2
c3
D
d1
c2
10
d2
11
d3
2
PRUNING EFFICIENCY
 Pruning choices in this way has dramatic effects on efficiency.

We only need to examine O(bd/2) nodes
 This is assuming that we can examine successors that are likely to be best first. - You know what ASS U ME
means.
 Even with random selection, the efficiency improves, however, for many games this time inefficiency makes playing
the game impractical - a minimax decision for chess would take longer than the allowed 2 hour game.
IMPERFECT, REAL-TIME DECISIONS
 Our algorithm needs to be able to make a decision within a specified time period, based on imperfect
information.
 Humans use (sub-consciously) a heuristic utility function to judge the value of a position as humans are even
more limited in searching capabilities - so a similar heuristic is needed here
 I.e. Cut the search off at a non-terminal state.
DESIGNING A HEURISTIC
 A heuristic needs to comprise various features, weighted by their importance -
perhaps using probabilities of success based on previous experiences (in this
case 72% of previous instances have lead to success).
 Human experience can be used to guide the heuristic, for example in chess;
 Material Value - each piece can be given a value, pawn = 1, queen =8
 Pawn Structure - a good pawn layout might be worth more.
 King Safety - a well protected King might be worth a few points.
 2 bishops together are worth a little over 2*1bishop.
 Bishops are more valuable during the end game than at the start.
SEARCH CUT-OFF
 Given the sophisticated heuristic function, which can be applied to various
states, a cut off point must be chosen so the amount of time used is
acceptable.
The wrong cut off point could give
rise to what is known as the ‘horizon
effect’ - where a poor worst case is
just over the horizon of the search.
ALTERNATIVE HORIZONS
 An alternative vision of the horizon effect
comes from this situation.
 A computer playing as black almost
inexplicably chose to move e7-e8 - why?
GAMES WITH CHANCE
 Some games, such as Backgammon combine chance with skill, where chance is controlled by the roll of a dice.

We can’t construct a game tree as we don’t know what our opponents possible moves are - but we do know what their
likely moves are.

So the game tree needs to be expanded to include probabilities of the opponent receiving certain dice rolls.
PROBABILITY
 Probability is based on our percepts of the environment - what we know about the environment.

My doctor said golf caused my shoulder injury as soon as he knew I played golf - even though the shoulder injury dates
from before I started playing golf.

When we pick a card there is a 1/52 chance it is the Ace of Spades, after we look at it, the chance is either 0 or 1.
 Probabilities can change when more evidence is acquired.
UTILITY THEORY
 Utility Theory represents preferences for certain states - the quality of a state being useful;

Is it preferable to choose plan C1440 (leaving for the airport 24 hours early) which has a 99.999% probability of success over
plan A60, which has a 95% probability of success? Given the poor utility of the long wait, perhaps not.

Should I stop playing golf to improve my ‘lack of shoulder pain’ utility, and risk lowering my ‘playing golf pleasure’ utility?
DECISION THEORY
 Decision Theory = Probability Theory + Utility Theory.
 An agent is rational if they choose the action which yields the highest utility averaged across all possible outcomes
of the action.
 The principle of Maximum Expected Utility (MEU).
ACTING UNDER UNCERTAINTY
ACTING UNDER UNCERTAINTY
 Suppose we need to check in at the airport and need to choose a time to set off. Plan A60 involves leaving 60
minutes before check-in - we could assume this was a good plan.
 We can’t say “Plan A60 will get us to the airport in time”, only “Plan A60 will get us to the airport in time so long as we
have enough gas, and there are no accidents, and the car doesn’t break down, and check-in doesn’t close early, and…”
 Perhaps Plan B90 would be better?
ACTING UNDER UNCERTAINTY
 While Plan B90 increases the ‘degree of belief’ that we will get to the airport on time, it also introduces a likely
long unproductive wait at the airport.
 To maximise an agents performance, the relative importance of both goals needs to be considered;

‘getting to the airport’

‘avoiding a long wait’
 The rational decision is A60, but how do we draw that conclusion?
DIAGNOSIS
 Diagnosis is a task aimed at dealing with uncertainty normally using probability theory to construct a degree of
belief in various statements.

Your car won’t start, so there’s a 80% chance the battery is dead.

You’ve got pain in your left calf so there’s a 70% chance you’ve pulled a muscle, and a 2% chance your leg has fallen off.
 Probability provides a way of summarising the uncertainty that comes from our laziness and ignorance.
DOMAIN RANDOM VARIABLES
 Boolean Random Variables

Late<True, False>
 Discrete RandomVariables

Weather<Sunny, Rainy, Snowy, Cloudy>
 Continuous Random Variables

Temperature = 30.2
ATOMIC EVENTS
 Atomic Events are a particular combination of states;

Late = True, Weather = Cloudy

Late = False, Weather = Sunny
 The existence of certain atomic events can lead to certain understandings;

Late = False, Weather = Rainy, means <Rainy => Late> = False
PRIOR PROBABILITY
 Unconditional Probabilities can be assigned to each state - degree of belief with no other information;

P(Weather = Sunny) = 0.8

P(Weather = Rainy) = 0.1

P(Weather = Cloudy) = 0.0999

P(Weather = Snowy) = 0.0001
JOINT PROBABILITY
 The probabilities for a combination of random variables can be stored in a grid
- 2 or more dimensional.

Take a simple example with 3 boolean variables, Late, Male, AM.
Late
Not Late
MaleNot Male MaleNot Male
AM
not AM
0.22 0.25
0.1 0.18
0.08 0.02
0.1 0.05
JOINT PROBABILITY
 The sum of all probabilities adds up to 1.
 Suppose we want to know the probability of being male OR late;

P(Male OR Late) = 0.22+0.25+0.1+0.18+0.08+0.1 = 0.93
 Or the probability of being male AND late;

P(Male AND Late) = 0.22+0.1 = 0.32
 Thus we can deduce probabilities given certain inputs.
JOINT PROBABILITY
 Suppose we add a further random variable - the weather.

We have to expand our table, in reality adding 4 times the size of the weather, one for each weather condition.
 In doing this it is reasonable to ask how the former and latter table are related; how does P(Late, AM, Male,
Weather = sunny) relate to P(Late, AM, Male)?
PRODUCT RULE
 We can use the ‘Product Rule’
 The probability of a and b is the same as the probability of b multiplied by the probability
of a given that b is the case;
 P(a^b) = P(a|b)P(b)
 Or in our case;
 P(Late, AM, Male, Weather = sunny) = P(Weather = sunny | Late, AM, Male)P(Late, AM, Male)
 The probability of a man being late on a sunny morning is the same as the probability of it
being sunny given that a man is late in the morning, multiplied by the probability of a man
being late in the morning.
HANG ON A MINUTE!
 Let’s review that;
 “The probability of a man being late on a sunny morning is the same as the probability
of it being sunny given that a man is late in the morning, multiplied by the
probability of a man being late in the morning.”
 Unless we are a weathermonger, the probability of it being sunny isn’t
influenced by a man’s morning tardiness!
PRODUCT RULE
 The Product Rule can be stated in 2 ways;
 P(a^b) = P(a|b)P(b)
 P(a^b) = P(b|a)P(a)
 Or in our case;
 P(Late, AM, Male, Weather = sunny) = P(Late, AM, Male | Weather = sunny)P(Weather = sunny)
 The probability of a man being late on a sunny morning is the same as the probability of a
man is late in the morning given that it is sunny, multiplied by the probability of it being
sunny.
JOINT PROBABILITIES
 There is intuitively something more satisfactory about this
statement - perhaps the weather could be a factor in
determining lateness.
 We could expand our knowledge about the domain by
adding further random variables - perhaps adding
Transport<Car, Bike, Walk> or FavouriteFood<Icecream,
Steak, Fish>.
 Transport ‘might’ have a direct influence over tardiness - it
could be argued that ‘walkers’ should set off earlier, but
surely FavouriteFood can be considered ‘Independent’.
 When Independence can be found, the subset of data to
be analysed can be greatly reduced.
BAYES’ LAW
 Given
 P(a^b) = P(a|b)P(b)
 P(a^b) = P(b|a)P(a)
 Then,
 P(b|a)P(a) = P(a|b)P(b)
 And so,
 P(b|a) = P(a|b)P(b) / P(a)
 Great! - so what?
BAYES’ LAW
 Requires 2 unconditional probabilities and 1 conditional
probability, to calculate 1 further conditional probability!
 But, if we have those probabilities, then it can be very useful.
 Meningitis causes patients to have stiff necks 50% of the time.
The probability of having meningitis is 1 in 50,000 and the
probability that any patient has a stiff neck is 1 in 20.
 P(s|m) = 0.5
 P(m) = 1/50000
 P(s) = 1/20
 P(m|s) = P(s|m)P(m)/P(s) = (0.5*(1/50000))/(1/20) = 0.0002 or 1 in
5,000
WELCOME TO WUMPUS WORLD
 Wumpus World is a cave consisting of rooms connected by passageways.
Somewhere in the cave is the evil Wumpus beast who eats anyone who enters the
room. An agent can kill the Wumpus though by shooting an arrow - the arrow goes
straight until it hits the Wumpus or a wall. Wumpus World also contains deathly pits
in the floor which trap anyone who enters the room (except the Wumpus). On the
positive side there is a big heap of gold that can be won by a successful agent. In
rooms surrounding the Wumpus the agent can perceive a stench, in rooms
surrounding a pit the agent perceives a breeze, and in rooms surrounding the gold
the agent perceives a glitter. Agents can move through the 4*4 grid by turning left or
right (90 degrees) or walking forward.
WUMPUS WORLD
Stench
Glitter
WUMPUS Breeze
Stench
Glitter
Stench
Breeze
PIT
PIT
Breeze
GOLD
Glitter
Breeze
Breeze
PIT
Breeze
WUMPUS WORLD
OK
OK
WUMPUS WORLD
OK
?P
Breeze
?P
WUMPUS WORLD
W!
Stench
OK
Breeze
P!
WUMPUS WORLD
W!
OK
?G
Stench Glitter
Breeze
OK
?G
P!
ASSIGNMENT – PART 1
 To what extent does Bayes’
Law affect the probability of
locating a pit?

Consider the layout to the
right, what can you deduce
about the probability of
where the pit is?
Breeze
Breeze
ASSIGNMENT – PART II
 Assuming obvious robotic and vision constraints have been solved; how could the approaches to game playing
that we’ve discussed today be applied to robot pool. Could a robot win a game of pool?
SUMMARY
 Games are an excellent test for A.I.
 The minimax algorithm assumes the opponent is infallible and chooses the best worst case scenario for each
turn.

First developing a search tree and then performing depth first search on it.
 The search tree is pruned by alpha beta pruning, but in many cases it is still too large a search tree - so decisions
have to be made when to cut off the search.