The Implementation of Artificial Intelligence and Temporal
Download
Report
Transcript The Implementation of Artificial Intelligence and Temporal
The Implementation of Artificial
Intelligence and Temporal
Difference Learning Algorithms in
a Computerized Chess
Programme
By James Mannion
Computer Systems Lab 08-09
Period 3
Abstract
Searching through large sets of data
Complex, vast domains
Heuristic searches
Chess
Evaluation Function
Machine Learning
Introduction
Simple domains, simple heuristics
The domain of chess
Deep Blue – brute force
Looking at 30100 moves before making the first
Supercomputer
Too many calculations
Not efficient
Introduction (cont’d)
Minimax search
Alpha-beta pruning
Only look 2-3 moves into the future
Estimate strength of position
Evaluation function
Can improve heuristic by learning
Introduction (cont’d)
Seems simple, but can become quite complex.
Chess masters spend careers learning how to
“evaluate” moves
Purpose: can a computer learn a good
evaluation function?
Background
Claude Shannon, 1950
Brute force would take too long
Discusses evaluation function
2-ply algorithm, but looks further into the future
for moves that could lead to checkmate
Possibility of learning in distant future
Background (cont’d)
D.F. Beal, M.C. Smith, 1999
Temporal Difference learning
Program spends 20,000 games learning the
evaluation function
Beats program that did not learn a function
Shows that programs can make their evaluation
functions better
Background (cont’d)
David E. Moriarty, Riso Miikkulainen
Evolutionary Neural Networks
Othello
Complex Strategies developed
Background (cont’d)
Shiu-li Huang, Fu-ren Lin, 2007
Temporal Difference Learning
Multi-Agent Bargaining
Bargaining, while not necessarily
adversarial, is similar to chess and other
games.
Development
Python
Stage 1: Text based chess game
Two humans input their moves
Illegal moves not allowed
Development (cont’d)
Development (cont’d)
Development (cont’d)
Development (cont’d)
•
Stage 2: Introduce a computer player
•
2-3 ply
•
Evaluation function will start out such that
choices are random
Development (cont’d)
Stage 3: Learning
Temporal Difference Learning
Adjusts the weights of the evaluation
function slightly based on gameplay
Evaluation function updated each time a
game is played
Testing
Learning vs No Learning
Two equal, random computer players pitted
against each other.
One will have the ability to learn
Thousands of games
Win-loss differential tracked over the length
of the test
By the end, the learner should be winning
significantly more games.
References
Shannon, Claude. “Programming a Computer
for Playing Chess.” 1950
Beal, D.F., Smith, M.C. “Temporal Difference
Learning for Heuristic Search and Game
Playing.” 1999
Moriarty, David E., Miikkulainen, Risto.
“Discovering Complex Othello Strategies
Through Evolutionary Neural Networks.”
Huang, Shiu-li, Lin, Fu-ren. “Using TemporalDifference Learning for Multi-Agent
Bargaining.” 2007