Week 12 - Ken Cosh
Download
Report
Transcript Week 12 - Ken Cosh
269202 ALGORITHMS FOR ISNE
DR. KENNETH COSH
WEEK 12
REVIEW
Searching
Uninformed Searches
Breadth First
Depth First
Iterative
Informed Searches
Best First
Heuristics
Hill Climbing
Simulated Annealing
THIS WEEK
Adversarial Searches
Investigating how to plan ahead in environments where other agents are planning against us.
Game Theory
WHO STOLE THE FINAL EXAM?
OK, I know one of you stole the final exam…
You have 3 choices, “Admit it”, “Accuse someone else” or “Deny knowledge”.
If you “Deny Knowledge” you will get a D, unless everyone denies knowledge,
when you’ll get a B
If you “Accuse someone” you will get a C, or if everyone else accuses the same
person you will get a B, unless they admit it when you get a D.
If you “Admit it” you get an F, unless everyone else accuses you, when you get an
A.
GAME THEORY
That example is similar to the famous “Prisoner’s Dilemma”
“Two suspects, A and B, are arrested by the police. The police have insufficient evidence for a conviction, and,
having separated both prisoners, visit each of them to offer the same deal: if one testifies for the prosecution
against the other and the other remains silent, the betrayer goes free and the silent accomplice receives the
full 10-year sentence. If both stay silent, the police can sentence both prisoners to only six months in jail for a
minor charge. If each betrays the other, each will receive a two-year sentence. Each prisoner must make the
choice of whether to betray the other or to remain silent. However, neither prisoner knows for sure what
choice the other prisoner will make. So the question this dilemma poses is: What will happen? How will the
prisoners act?”
GAME THEORY
A branch of economics investigates any multiagent environment where one agents
actions impact on another agent, whether co-operatively or competitively.
Games are deterministic, turn-taking, two-player, zero-sum games with perfect
information.
In other words a deterministic and fully observable environment with two agents who alternate
actions. Utility values at the end of the game are equal and opposite - the winner of chess receives
‘1’, the loser ‘-1’ (or possibly ‘0’ in stalemate).
GAMES
Games fit well with A.I. as they are easy to represent and agents are
restricted to a small number of actions, the results of the actions are
well defined.
This is in contrast to physical games such as football (or American Rugby) - here
there are imprecise rules and a wide variety of actions. Computer game
manufacturers attempts at producing such games often result in limited and
repetitive versions.
GAME SUCCESS
AI has had some success at game playing;
Machines are better than humans in Othello and checkers.
Sometimes they can win at chess and backgammon.
Why not more success?
At Go AI only performs amateurly.
The average game of chess has a branching factor of 35, with
roughly 50 moves per player - which is 35100 or 10154 nodes on a
search graph!
So we have to make a decision, even if it isn’t optimal.
FORMAL DEFINITION
Initial State - the initial board position indicating who’s move.
Successor Function - returns a list of moves and their resulting states.
Terminal Test - determines if it is a terminal state, i.e. if the game is
over.
Utility Function - An objective function giving a numeric value for
states - terminal states could be -1, 0 or 1.
SEARCHING
From that definition we can create a game tree, which can be searched for optimal
moves.
Normal searches produce a sequence of actions leading to a terminal win state, but
in games the other agent has an impact.
Therefore the search needs to find a contingent strategy .
The optimal strategy is one which finds an outcome at least as good as any other
against an infallible opponent.
GAME TREES
Initial Moves (X):
X
X
Move 2 (O):
XO
Move 3 (X):
XOX XO
X
….
Terminal States:
Utility:
X OX
O
X
XO
X
….
X
O
0
O
XO XO
X
X
X
X
X
X
O
XO
X
X
O
XO
X
O
X
X
….
….
….
XOX XOX XOX
OX OOX
X
O X X O X OO
-1
X
X
1
The search tree even for a
game as simple as tic tac toe, is
v. complex
X
MINIMAX ALGORITHM
Assuming the opponent will play optimally, the minimax algorithm is designed to maximise the worst case
outcome - in tictactoe this is a draw.
A
a1
B
b1
4
b2
7
a3
a2
C
b3
3
c1
2
D
c3
c2
12
6
d1
10
d2
11
In this simple example - which action should A take (a1, a2 or a3)?
d3
2
MINIMAX ALGORITHM
A
a1
3
4
2
B
b1
b2
7
b3
3
a3
a2
c1
2
2
C
c3
c2
12
6
d1
10
D
d2
11
d3
2
Even though the best utility is found with ‘a2’ (12), and the best
average is found with ‘a3’ (7.67), ‘a1’ is the choice as we assume
that the opponent will choose their optimum strategy.
MULTIPLAYER GAMES
In games with more than 2 players, the utility is scored as a vector containing utilities for each player (5,2,3).
Minimax functions will choose an action which still leads to their best worst case scenario.
In multiplayer games, often ‘alliances’ are formed, with more than one player joining forces in order to stop
another player from reaching a terminal win state.
MINIMAX EFFICIENCY
If the maximum depth of the tree is ‘m’ and there are ‘b’ legal moves at each point, then time complexity is O(bm)
and space complexity is O(bm), i.e. Exponential time efficiency.
We can improve the efficiency, for instance through Alpha-Beta Pruning;
If a player has a better choice ‘m’ either at the parent node of ‘n’ or any node higher up, then ‘n’ will never be reached, so
effectively we can prune it.
PRUNING EXAMPLE
A
a1
3
B
b1
4
b2
7
2
b3
3
c1
2
a3
a2
C
2
c3
12
6
10
first search, finding the worst case ‘3’.
D
d1
c2
First we examine choice ‘a1’ with depth
d2
11
d3
2
A
a1
3
B
b1
4
b2
7
b3
3
a2
a3
PRUNING EXAMPLE 2
A
a1
3
b1
4
B
b2
7
2
b3
c1
3
a3
a2
C
Examining choice ‘a2’, we immediately find a
worse worst case ‘2’, so no need to
continue searching
c3
c2
A
2
a1
3
When we look at choice ‘a3’, we
have to search all 3 leaves before
we find a worse worst case.
B
b1
4
b2
7
2
b3
3
c1
2
a3
a2
C
2
c3
D
d1
c2
10
d2
11
d3
2
PRUNING EFFICIENCY
Pruning choices in this way has dramatic effects on efficiency.
We only need to examine O(bd/2) nodes
This is assuming that we can examine successors that are likely to be best first. - You know what ASS U ME
means.
Even with random selection, the efficiency improves, however, for many games this time inefficiency makes playing
the game impractical - a minimax decision for chess would take longer than the allowed 2 hour game.
IMPERFECT, REAL-TIME DECISIONS
Our algorithm needs to be able to make a decision within a specified time period, based on imperfect
information.
Humans use (sub-consciously) a heuristic utility function to judge the value of a position as humans are even
more limited in searching capabilities - so a similar heuristic is needed here
I.e. Cut the search off at a non-terminal state.
DESIGNING A HEURISTIC
A heuristic needs to comprise various features, weighted by their importance -
perhaps using probabilities of success based on previous experiences (in this
case 72% of previous instances have lead to success).
Human experience can be used to guide the heuristic, for example in chess;
Material Value - each piece can be given a value, pawn = 1, queen =8
Pawn Structure - a good pawn layout might be worth more.
King Safety - a well protected King might be worth a few points.
2 bishops together are worth a little over 2*1bishop.
Bishops are more valuable during the end game than at the start.
SEARCH CUT-OFF
Given the sophisticated heuristic function, which can be applied to various
states, a cut off point must be chosen so the amount of time used is
acceptable.
The wrong cut off point could give
rise to what is known as the ‘horizon
effect’ - where a poor worst case is
just over the horizon of the search.
ALTERNATIVE HORIZONS
An alternative vision of the horizon effect
comes from this situation.
A computer playing as black almost
inexplicably chose to move e7-e8 - why?
GAMES WITH CHANCE
Some games, such as Backgammon combine chance with skill, where chance is controlled by the roll of a dice.
We can’t construct a game tree as we don’t know what our opponents possible moves are - but we do know what their
likely moves are.
So the game tree needs to be expanded to include probabilities of the opponent receiving certain dice rolls.
PROBABILITY
Probability is based on our percepts of the environment - what we know about the environment.
My doctor said golf caused my shoulder injury as soon as he knew I played golf - even though the shoulder injury dates
from before I started playing golf.
When we pick a card there is a 1/52 chance it is the Ace of Spades, after we look at it, the chance is either 0 or 1.
Probabilities can change when more evidence is acquired.
UTILITY THEORY
Utility Theory represents preferences for certain states - the quality of a state being useful;
Is it preferable to choose plan C1440 (leaving for the airport 24 hours early) which has a 99.999% probability of success over
plan A60, which has a 95% probability of success? Given the poor utility of the long wait, perhaps not.
Should I stop playing golf to improve my ‘lack of shoulder pain’ utility, and risk lowering my ‘playing golf pleasure’ utility?
DECISION THEORY
Decision Theory = Probability Theory + Utility Theory.
An agent is rational if they choose the action which yields the highest utility averaged across all possible outcomes
of the action.
The principle of Maximum Expected Utility (MEU).
ACTING UNDER UNCERTAINTY
ACTING UNDER UNCERTAINTY
Suppose we need to check in at the airport and need to choose a time to set off. Plan A60 involves leaving 60
minutes before check-in - we could assume this was a good plan.
We can’t say “Plan A60 will get us to the airport in time”, only “Plan A60 will get us to the airport in time so long as we
have enough gas, and there are no accidents, and the car doesn’t break down, and check-in doesn’t close early, and…”
Perhaps Plan B90 would be better?
ACTING UNDER UNCERTAINTY
While Plan B90 increases the ‘degree of belief’ that we will get to the airport on time, it also introduces a likely
long unproductive wait at the airport.
To maximise an agents performance, the relative importance of both goals needs to be considered;
‘getting to the airport’
‘avoiding a long wait’
The rational decision is A60, but how do we draw that conclusion?
DIAGNOSIS
Diagnosis is a task aimed at dealing with uncertainty normally using probability theory to construct a degree of
belief in various statements.
Your car won’t start, so there’s a 80% chance the battery is dead.
You’ve got pain in your left calf so there’s a 70% chance you’ve pulled a muscle, and a 2% chance your leg has fallen off.
Probability provides a way of summarising the uncertainty that comes from our laziness and ignorance.
DOMAIN RANDOM VARIABLES
Boolean Random Variables
Late<True, False>
Discrete RandomVariables
Weather<Sunny, Rainy, Snowy, Cloudy>
Continuous Random Variables
Temperature = 30.2
ATOMIC EVENTS
Atomic Events are a particular combination of states;
Late = True, Weather = Cloudy
Late = False, Weather = Sunny
The existence of certain atomic events can lead to certain understandings;
Late = False, Weather = Rainy, means <Rainy => Late> = False
PRIOR PROBABILITY
Unconditional Probabilities can be assigned to each state - degree of belief with no other information;
P(Weather = Sunny) = 0.8
P(Weather = Rainy) = 0.1
P(Weather = Cloudy) = 0.0999
P(Weather = Snowy) = 0.0001
JOINT PROBABILITY
The probabilities for a combination of random variables can be stored in a grid
- 2 or more dimensional.
Take a simple example with 3 boolean variables, Late, Male, AM.
Late
Not Late
MaleNot Male MaleNot Male
AM
not AM
0.22 0.25
0.1 0.18
0.08 0.02
0.1 0.05
JOINT PROBABILITY
The sum of all probabilities adds up to 1.
Suppose we want to know the probability of being male OR late;
P(Male OR Late) = 0.22+0.25+0.1+0.18+0.08+0.1 = 0.93
Or the probability of being male AND late;
P(Male AND Late) = 0.22+0.1 = 0.32
Thus we can deduce probabilities given certain inputs.
JOINT PROBABILITY
Suppose we add a further random variable - the weather.
We have to expand our table, in reality adding 4 times the size of the weather, one for each weather condition.
In doing this it is reasonable to ask how the former and latter table are related; how does P(Late, AM, Male,
Weather = sunny) relate to P(Late, AM, Male)?
PRODUCT RULE
We can use the ‘Product Rule’
The probability of a and b is the same as the probability of b multiplied by the probability
of a given that b is the case;
P(a^b) = P(a|b)P(b)
Or in our case;
P(Late, AM, Male, Weather = sunny) = P(Weather = sunny | Late, AM, Male)P(Late, AM, Male)
The probability of a man being late on a sunny morning is the same as the probability of it
being sunny given that a man is late in the morning, multiplied by the probability of a man
being late in the morning.
HANG ON A MINUTE!
Let’s review that;
“The probability of a man being late on a sunny morning is the same as the probability
of it being sunny given that a man is late in the morning, multiplied by the
probability of a man being late in the morning.”
Unless we are a weathermonger, the probability of it being sunny isn’t
influenced by a man’s morning tardiness!
PRODUCT RULE
The Product Rule can be stated in 2 ways;
P(a^b) = P(a|b)P(b)
P(a^b) = P(b|a)P(a)
Or in our case;
P(Late, AM, Male, Weather = sunny) = P(Late, AM, Male | Weather = sunny)P(Weather = sunny)
The probability of a man being late on a sunny morning is the same as the probability of a
man is late in the morning given that it is sunny, multiplied by the probability of it being
sunny.
JOINT PROBABILITIES
There is intuitively something more satisfactory about this
statement - perhaps the weather could be a factor in
determining lateness.
We could expand our knowledge about the domain by
adding further random variables - perhaps adding
Transport<Car, Bike, Walk> or FavouriteFood<Icecream,
Steak, Fish>.
Transport ‘might’ have a direct influence over tardiness - it
could be argued that ‘walkers’ should set off earlier, but
surely FavouriteFood can be considered ‘Independent’.
When Independence can be found, the subset of data to
be analysed can be greatly reduced.
BAYES’ LAW
Given
P(a^b) = P(a|b)P(b)
P(a^b) = P(b|a)P(a)
Then,
P(b|a)P(a) = P(a|b)P(b)
And so,
P(b|a) = P(a|b)P(b) / P(a)
Great! - so what?
BAYES’ LAW
Requires 2 unconditional probabilities and 1 conditional
probability, to calculate 1 further conditional probability!
But, if we have those probabilities, then it can be very useful.
Meningitis causes patients to have stiff necks 50% of the time.
The probability of having meningitis is 1 in 50,000 and the
probability that any patient has a stiff neck is 1 in 20.
P(s|m) = 0.5
P(m) = 1/50000
P(s) = 1/20
P(m|s) = P(s|m)P(m)/P(s) = (0.5*(1/50000))/(1/20) = 0.0002 or 1 in
5,000
WELCOME TO WUMPUS WORLD
Wumpus World is a cave consisting of rooms connected by passageways.
Somewhere in the cave is the evil Wumpus beast who eats anyone who enters the
room. An agent can kill the Wumpus though by shooting an arrow - the arrow goes
straight until it hits the Wumpus or a wall. Wumpus World also contains deathly pits
in the floor which trap anyone who enters the room (except the Wumpus). On the
positive side there is a big heap of gold that can be won by a successful agent. In
rooms surrounding the Wumpus the agent can perceive a stench, in rooms
surrounding a pit the agent perceives a breeze, and in rooms surrounding the gold
the agent perceives a glitter. Agents can move through the 4*4 grid by turning left or
right (90 degrees) or walking forward.
WUMPUS WORLD
Stench
Glitter
WUMPUS Breeze
Stench
Glitter
Stench
Breeze
PIT
PIT
Breeze
GOLD
Glitter
Breeze
Breeze
PIT
Breeze
WUMPUS WORLD
OK
OK
WUMPUS WORLD
OK
?P
Breeze
?P
WUMPUS WORLD
W!
Stench
OK
Breeze
P!
WUMPUS WORLD
W!
OK
?G
Stench Glitter
Breeze
OK
?G
P!
ASSIGNMENT – PART 1
To what extent does Bayes’
Law affect the probability of
locating a pit?
Consider the layout to the
right, what can you deduce
about the probability of
where the pit is?
Breeze
Breeze
ASSIGNMENT – PART II
Assuming obvious robotic and vision constraints have been solved; how could the approaches to game playing
that we’ve discussed today be applied to robot pool. Could a robot win a game of pool?
SUMMARY
Games are an excellent test for A.I.
The minimax algorithm assumes the opponent is infallible and chooses the best worst case scenario for each
turn.
First developing a search tree and then performing depth first search on it.
The search tree is pruned by alpha beta pruning, but in many cases it is still too large a search tree - so decisions
have to be made when to cut off the search.