Transcript Class 9

CS 415 – A.I.
Slide Set 12
Chapter 5 – Stochastic Learning
 Heuristic – apply to problems who either don’t
have an exact solution, or whose state spaces
are prohibitively large
 Stochastic Methodology – also good for these
situations

Based on counting the elements of an application
domain
Addition and Multiplication Rule





Set A
|A| - cardinality of A (number of elts)
 A could be: empty, finite, countably infinite, or
uncountably infinite
U – Universe (a.k.a. Domain)
 The set of ALL elements that could be in A
A’ - Compliment
Example
 U – people in a room
 A – males from U
 A’ – females in the room
Other Notations
 Subset, Union, Intersection
Permutations and Combinations
 Permutation – an arranged sequence of
elements of a set (each used only once)

Question: how many unique permutations are
there of a set of size n?


n * n-1 * n-2 * n-3 * … * 1
Question: how many ways can we arrange a set
of 10 books on a shelf where only 6 books can
fit?

nPr

Combination – Any subset of the elements that can be formed
 Question: How many combinations given a set of items?
 1 combination for n elements
 Order DOES NOT MATTER
 Question: How many combinations taken r at a time (How
many ways can I form a four person committee from 10
people?)
Elements of Probability Theory
Examples
 What is the probability of rolling a 7 or 11
from two fair die?

Sample Space Size?


36
Event Size?

8



For 6: 1,6; 2,5; 3,4; 4,3; 5,2; 6,1
For 11: 10,1; 1,10
Probability

8/36 = 2/9

Add them together because they are “independent”
 How many four-of-a-kind hands can be dealt
in all possible five card hands?

Sample Space?


52 cards taken 5 at a time
Event Space?

Multiply number of combinations of 13 cards 1 at a
time * Combination of 4 taken 4 at a time * 48


(number of different kinds of cards) * (number of ways to
pick all four cards of same kind) * (times the number of ways
the fifth card can be chosen)
See top of pg 172

Approx. 0.00024
The probability of any event E from the sample space S is:
The sum of the probabilities of all possible outcomes is 1
The probability of the compliment of an event is
The probability of the contradictory or false outcome of an event
Luger: Artificial Intelligence, 5th edition. © Pearson Education Limited, 2005
8
Luger: Artificial Intelligence, 5th edition. © Pearson Education Limited, 2005
9
Are Two Events Independent?
 Random bit strings of length four
 2 Events
1.
2.
String has even number of ones
Bit string ends with a zero
 A total of 24=16 bit strings
–
–
8 strings end with zero
8 strings have even number of ones
 Are independent events
All of Probability Theory
 In a Nut Shell
Probabilistic Inference: Example
 3 boolean random variables

All either true or false




Given state traffic data in Table 5.1



S – traffic is slowing down
A – probability of an accident
C – probability of road construction
Next slide
Note: all possibilities sum to 1
Can use these numbers to calculate


Probability traffic slowdown
Probability of construction without slowdown, etc.
Table 5.1 The joint probability distribution for the traffic slowdown, S, accident, A, and
construction, C, variable of the example of Section 5.3.2
Fig 5.1 A Venn diagram representation of the probability distributions of Table 5.1; S
is traffic slowdown, A is accident, C is construction.
Luger: Artificial Intelligence, 5th edition. © Pearson Education Limited, 2005
11
Random Variables
 Individual
Probability Computation
Combinatorical Methods (Analytical)
1.

Ex: Probability of rolling a 5 on a 6-sided die
Sampling Events (Empirical)
2.


For times when it isn’t that simple to analyze
Assumptions




Not all events are equally likely

(Easier if they are)
Probability of event lies between 0 and 1
Probabilities of union of sets still holds
Use “Random Variables”
Luger: Artificial Intelligence, 5th edition. © Pearson Education Limited, 2005
12
Random Variable Example
 Random variable – Season

Domain – {spring, summer, fall, winter}
 Discrete Random Variable

p(Season=spring) = .75
 Boolean Random Variable

p(Season=spring) = true
Expectation
 Expectation – the notion of expected payoff or
cost of an outcome
Example
 A fair roulette wheel



Integers 0 – 36 equally spaced
Each player places $1 on any number
Wheel stops on that number, wins $35

Else – loses the $1
 Reward of win $35

Probability 1/37
 Cost of loss $1

Probability 36/37
 Ex(E) = 35(1/37) + (-1)(36/37) = -0.027
Conditional Probability
 2 kinds of Probablities
Prior Probabilities
1.

What’s the probability of getting a 2 or a 3 on a fair
die?
Conditional (Posterior) Probabilities
2.

If a patient has system X, Y and Z then what is the
probability that he has the flu?
Luger: Artificial Intelligence, 5th edition. © Pearson Education Limited, 2005
14
Fig 5.2 A Venn diagram illustrating the calculations of P(d|s) as a function of
p(s|d).
Luger: Artificial Intelligence, 5th edition. © Pearson Education Limited, 2005
15
The chain rule for two sets:
The generalization of the chain rule to multiple sets
We make an inductive argument to prove the chain rule, consider the
nth case:
We apply the intersection of two sets of rules to get:
And then reduce again, considering that:
Until
is reached, the base case, which we have already
demonstrated.
Luger: Artificial Intelligence, 5th edition. © Pearson Education Limited, 2005
16
Luger: Artificial Intelligence, 5th edition. © Pearson Education Limited, 2005
17
Luger: Artificial Intelligence, 5th edition. © Pearson Education Limited, 2005
18
You say [to ow m ey tow] and I say [t ow m aa t ow]…
- Ira Gershwin, Lets Call The Whole Thing Off
Fig 5.3 A probabilistic finite state acceptor for the pronunciation of “tomato”,
adapted from Jurafsky and Martin (2000).
Luger: Artificial Intelligence, 5th edition. © Pearson Education Limited, 2005
19
Phoneme Recognition Problem


Use the Tomato style stochastic finite state acceptor
 Interpret ambiguous collections of phonemes
 See how well the phonemes match the path through the
state machine for this and other words
phoneme is the smallest structural unit that distinguishes
meaning. Phonemes are not the physical segments
themselves, but, in theoretical terms, cognitive abstractions
or categorizations of them.

the /t/ sound in the words tip, stand, water, and cat
Phoneme Recognition Problem



Suppose an algorithm has identified the phoneme /ni/
 Occurs just after other recognized speech, /l/
Need to associate phoneme with either a word or the first
part of a word
How?
 Brown corpus
 1 Million word collection of sentences from 500 texts
 Switchboard corpus
 1.4 Million word collection of phone conversations

Together: ~2.5 Million words that let us sample
written and spoken words
How to proceed?
 Can determine which word with the phoneme
is most likely used


See Table 5.2
Most likely, “the”
Table 5.2 The ni words with their frequencies and probabilities from the Brown
and Switchboard corpora of 2.5M words, adapted from Jurafsky and Martin
(2000).
Luger: Artificial Intelligence, 5th edition. © Pearson Education Limited, 2005
20
 Use Bayes’ theorem



p(word | [ni]) = p([ni]|word) x p(word)
See Table 5.3
Most likely, “new”


“I new” doesn’t make sense
“I need”, does
Table 5.3 The ni phone/word probabilities from the Brown and Switchboard
corpora (Jurafsky and Martin, 2000).
Luger: Artificial Intelligence, 5th edition. © Pearson Education Limited, 2005
21
Bayes’ Theorem
 Review: one disease and one symptom

Individual hypotheses, hi

Each is disjoint

Set of hypotheses, H
 Set of evidence, E
P(hi|E) = (p(E|hi) x p(hi))/p(E)
 Can use this to determine which hypothesis is
strongest given E


Drop the denominator
Arg max (maximum likelihood) hypothesis
Finding p(E)
 Given: entire space is partitioned by the set of
hypotheses hi

Partition of a set = split of set into disjoint
subsets
p(E) = Σip(E|hi)p(hi)
Bayes’ Theorem : General Form
The general form of Bayes’ theorem where we assume the set of
hypotheses H partition the evidence set E:
Example
 Suppose you want to buy a car


Prob go to dealer 1, d1
Prob purchasing a car at d1, a1
 Necessary for using Bayes’


All probabilities with various hi must be known
All relationships between evidence and
hypothesis {p(E|hi)} must be known
The application of Bayes’ rule to the car purchase problem:
Luger: Artificial Intelligence, 5th edition. © Pearson Education Limited, 2005
23
Naïve Bayes, or the Bayes classifier, that uses the partition
assumption, even when it is not justified:
Luger: Artificial Intelligence, 5th edition. © Pearson Education Limited, 2005
24