Lecture-09-20050914 - Kansas State University


Lecture 9 of 42
Game Tree Search II
Wednesday, 14 September 2005
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.kddresearch.org
http://www.cis.ksu.edu/~bhsu
Reading:
Chapter 6, Russell and Norvig 2e
CIS 730: Introduction to Artificial Intelligence
Kansas State University
Department of Computing and Information Sciences
Lecture Outline
• Today's Reading
  – Sections 6.1–6.4, Russell and Norvig 2e
  – Recommended references: Rich and Knight; Winston
• Reading for Next Class: Sections 6.5–6.8, Russell and Norvig 2e
• Games as Search Problems
  – Frameworks: two-player vs. multi-player; zero-sum; perfect information
  – Minimax algorithm
    • Perfect decisions
    • Imperfect decisions (based upon a static evaluation function)
  – Issues
    • Quiescence
    • Horizon effect
  – Need for pruning
• Next Lecture: Alpha-Beta Pruning, Expectiminimax, Current "Hot" Problems
• Next Week: Knowledge Representation – Logics and Production Systems
Overview
• Perfect Play
  – General framework(s)
  – What could an agent do with perfect information?
• Resource Limits
  – Search ply
  – Static evaluation: from heuristic search to heuristic game tree search
  – Examples
    • Tic-tac-toe, Connect Four, checkers, connect-five (Go-Moku / wu3 zi3 qi2)
    • Chess, go
• Games with Uncertainty
  – Explicit: games of chance (e.g., backgammon, Monopoly)
  – Implicit: see project suggestions!
Adapted from slides by S. Russell, UC Berkeley
Minimax Algorithm:
Decision and Evaluation
[Figure with two "what's this?" callouts on the minimax diagram]
Figure 5.3 p. 126 R&N
Properties of Minimax
• Complete?
  – … yes, provided the following are finite:
    • Number of possible legal moves (generative breadth of tree)
    • "Length of game" (depth of tree) – more specifically?
  – Perfect vs. imperfect information?
    • Q: What search is perfect minimax analogous to?
    • A: Bottom-up breadth-first
• Optimal?
  – … yes, provided perfect info (evaluation function) and the opponent is optimal!
  – … otherwise, guaranteed only if the evaluation function is correct
• Time Complexity?
  – Depth of tree: m
  – Legal moves at each point: b
  – O(b^m) – NB: m ≈ 100, b ≈ 35 for chess!
• Space Complexity? O(bm) – why? (depth-first exploration keeps b successors at each of m levels)
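The exhaustive procedure behind these properties can be sketched in a few lines of Python. The nested-list tree representation here is invented purely for illustration; a real game would supply a move generator and terminal test instead.

```python
def minimax_value(node, maximizing):
    """Depth-first minimax over an explicit game tree.
    Leaves are utilities (numbers); internal nodes are lists of children."""
    if not isinstance(node, list):
        return node
    # Evaluate every child with the roles swapped at the next ply.
    values = [minimax_value(child, not maximizing) for child in node]
    # MAX takes the best child for itself; MIN takes the worst for MAX.
    return max(values) if maximizing else min(values)

# A two-ply example: MAX to move at the root, MIN at the middle layer.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax_value(tree, maximizing=True))  # → 3 (the MIN values are 3, 2, 2)
```

Note the space behavior: only the current path and the sibling lists along it are live at any time, which is where the O(bm) bound comes from.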
Review:
Alpha-Beta (α-β) Pruning Example
What are α, β values here?
[Figure 5.6 p. 131 R&N: alpha-beta example tree – MAX root ≥ 3 over MIN children valued 3, ≤ 2 (pruned after its first leaf), and 2 (bounds ≤ 14, then ≤ 5 along the way); leaf values left to right: 3, 12, 8, 2, 14, 5, 2]
Alpha-Beta (α-β) Pruning:
Modified Minimax Algorithm
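A generic version of the modified algorithm can be sketched as follows (this is a standard formulation, not necessarily the exact code on the slide); the nested-list tree stands in for a real move generator.

```python
def alphabeta(node, maximizing, alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning over an explicit game tree.
    alpha: best value MAX can already guarantee; beta: best for MIN."""
    if not isinstance(node, list):   # leaves are plain utilities
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:        # beta cutoff: MIN will never allow this branch
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta))
        beta = min(beta, value)
        if beta <= alpha:            # alpha cutoff: MAX already has something better
            break
    return value

# The example tree from the previous slide: pruning skips B's later leaves.
tree = [[3, 12, 8], [2, 14, 5], [14, 5, 2]]
print(alphabeta(tree, True))  # → 3, the same value plain minimax returns
```

Pruning never changes the value returned at the root; it only avoids expanding branches that provably cannot influence it.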
Digression:
Learning Evaluation Functions
• Learning = Improving with Experience at Some Task
  – Improve over task T,
  – with respect to performance measure P,
  – based on experience E.
• Example: Learning to Play Checkers
  – T: play games of checkers
  – P: percent of games won in world tournament
  – E: opportunity to play against self
• Refining the Problem Specification: Issues
  – What experience?
  – What exactly should be learned?
  – How shall it be represented?
  – What specific algorithm should learn it?
• Defining the Problem Milieu
  – Performance element: how shall the results of learning be applied?
  – How shall the performance element be evaluated? The learning system?
Example: Learning to Play Checkers
• Type of Training Experience
  – Direct or indirect?
  – Teacher or not?
  – Knowledge about the game (e.g., openings/endgames)?
• Problem: Is the Training Experience Representative (of the Performance Goal)?
• Software Design
  – Assumptions of the learning system: a legal move generator exists
  – Software requirements: generator, evaluator(s), parametric target function
• Choosing a Target Function
  – ChooseMove: Board → Move (action selection function, or policy)
  – V: Board → R (board evaluation function)
  – Ideal target V; approximated target V̂
  – Goal of learning process: operational description (approximation) V̂ of V
A Target Function for
Learning to Play Checkers
• Possible Definition
  – If b is a final board state that is won, then V(b) = 100
  – If b is a final board state that is lost, then V(b) = -100
  – If b is a final board state that is drawn, then V(b) = 0
  – If b is not a final board state, then V(b) = V(b'), where b' is the best final
    board state that can be achieved starting from b and playing optimally until
    the end of the game
  – Correct values, but not operational
• Choosing a Representation for the Target Function
  – Collection of rules?
  – Neural network?
  – Polynomial function (e.g., linear or quadratic combination) of board features?
  – Other?
• A Representation for the Learned Function
  – V̂(b) = w0 + w1·bp(b) + w2·rp(b) + w3·bk(b) + w4·rk(b) + w5·bt(b) + w6·rt(b)
  – bp/rp = number of black/red pieces; bk/rk = number of black/red kings;
    bt/rt = number of black/red pieces threatened (can be taken on next turn)
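The linear form above is cheap to evaluate, which is what makes it operational. A minimal sketch, with the weight values and the sample feature counts invented for illustration:

```python
def v_hat(b_features, weights):
    """Linear board evaluation: V̂(b) = w0 + w1*bp(b) + ... + w6*rt(b).
    b_features = [bp, rp, bk, rk, bt, rt] for a given board b."""
    w0, ws = weights[0], weights[1:]
    return w0 + sum(w * f for w, f in zip(ws, b_features))

# Hypothetical position: 5 black pieces, 4 red, 1 black king, no red kings,
# 1 black piece threatened, 2 red pieces threatened. Weights are made up.
features = [5, 4, 1, 0, 1, 2]
weights = [0.0, 1.0, -1.0, 3.0, -3.0, -0.5, 0.5]
print(v_hat(features, weights))  # → 4.5
```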
A Training Procedure for
Learning to Play Checkers
• Obtaining Training Examples
  – V̂(b): the learned function
  – Vtrain(b): the training value
  – V(b): the target function
• One Rule for Estimating Training Values
  – Vtrain(b) ← V̂(Successor(b))
• Choose Weight Tuning Rule
  – Least Mean Square (LMS) weight update rule:
    REPEAT
    • Select a training example b at random
    • Compute error(b) for this training example:
      error(b) = Vtrain(b) - V̂(b)
    • For each board feature fi, update weight wi as follows:
      wi ← wi + c · fi(b) · error(b)
      where c is a small constant factor that adjusts the learning rate
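One pass of this rule might look like the sketch below. As an assumption not stated on the slide, the feature vector is given a constant entry f0 = 1 so that the bias weight w0 follows the same update rule as the rest; the feature counts are invented.

```python
def lms_update(weights, features, v_train, c=0.01):
    """One LMS step: wi <- wi + c * fi(b) * error(b),
    with error(b) = Vtrain(b) - V̂(b).
    features[0] is a constant 1, so w0 is updated like any other weight."""
    v_hat = sum(w * f for w, f in zip(weights, features))
    error = v_train - v_hat
    return [w + c * f * error for w, f in zip(weights, features)]

# One update from zero weights toward a training value of 100.
w = [0.0] * 7              # w0..w6
f = [1, 5, 4, 1, 0, 1, 2]  # constant 1, then bp, rp, bk, rk, bt, rt
w = lms_update(w, f, v_train=100)
print(w)  # each weight moved by c * fi * error; V̂ is now closer to 100
```

The small constant c keeps each step short of the target, so repeated updates converge rather than oscillate.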
Design Choices for
Learning to Play Checkers
(Flowchart of design choices, each stage listing its alternatives:)
• Determine Type of Training Experience: games against experts | games against self | table of correct moves
• Determine Target Function: Board → move | Board → value
• Determine Representation of Learned Function: polynomial | linear function of six features | artificial neural network
• Determine Learning Algorithm: gradient descent | linear programming
• Completed Design
Knowledge Bases
Simple Knowledge-Based Agent
Figure 6.1 p. 152 R&N
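The agent in that figure follows a TELL/ASK loop: tell the KB what was perceived, ask it what to do, then tell it which action was taken. A toy sketch, with the knowledge base reduced to a lookup table standing in for real inference (the percept and action names here are invented):

```python
class ToyKB:
    """Trivial KB: remembers sentences; 'inference' is a policy-table lookup
    of the percept told at the queried time step."""
    def __init__(self, policy):
        self.sentences = []
        self.policy = policy  # percept -> action table standing in for inference
    def tell(self, sentence):
        self.sentences.append(sentence)
    def ask(self, query):
        kind, t = query
        for s in reversed(self.sentences):
            if s[0] == "percept" and s[2] == t:
                return self.policy.get(s[1], "no-op")

class KnowledgeBasedAgent:
    """Generic TELL/ASK loop of a simple knowledge-based agent."""
    def __init__(self, kb):
        self.kb = kb
        self.t = 0
    def __call__(self, percept):
        self.kb.tell(("percept", percept, self.t))  # TELL: record the percept
        action = self.kb.ask(("action", self.t))    # ASK: derive an action
        self.kb.tell(("action", action, self.t))    # TELL: record the choice
        self.t += 1
        return action

agent = KnowledgeBasedAgent(ToyKB({"breeze": "retreat", "glitter": "grab"}))
print(agent("glitter"))  # → grab
```

The key point of the architecture survives even in this toy: the agent program is fixed, and all behavior lives in the knowledge base.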
Summary Points
• Introduction to Games as Search Problems
  – Frameworks
    • Two-player versus multi-player
    • Zero-sum versus cooperative
    • Perfect information versus partially observable (hidden state)
  – Concepts
    • Utility and representations (e.g., static evaluation function)
    • Reinforcements: possible role for machine learning
    • Game tree
• Family of Algorithms for Game Trees: Minimax
  – Propagation of credit
  – Imperfect decisions
  – Issues
    • Quiescence
    • Horizon effect
  – Need for pruning
Terminology
• Game Graph Search
  – Frameworks
    • Two-player versus multi-player
    • Zero-sum versus cooperative
    • Perfect information versus partially observable (hidden state)
  – Concepts
    • Utility and representations (e.g., static evaluation function)
    • Reinforcements: possible role for machine learning
    • Game tree: node/move correspondence, search ply
• Family of Algorithms for Game Trees: Minimax
  – Propagation of credit
  – Imperfect decisions
  – Issues
    • Quiescence
    • Horizon effect
  – Need for (alpha-beta) pruning