Transcript ppt

Probabilistic Inference
Reading: Chapter 13
Next time: How should we define artificial intelligence?
Reading for next time (see Links, Reading for Retrospective Class):
Turing paper
Mind, Brain and Behavior, John Searle
Prepare discussion points by midnight Wednesday night
(see end of slides)
Transition to empirical AI

Add in:
Ability to infer new facts from old
Ability to generalize
Ability to learn based on past observation

Key:
Observation of the world
Best decision given what is known

Overview of Probabilistic Inference

Some terminology

Inference by enumeration

Bayesian Networks
Probability Basics

Sample space Ω: the set of all possible worlds (sample points ω)

Atomic event: a single sample point ω, a complete specification of the state of the world

Probability model: an assignment P(ω) to every sample point such that 0 ≤ P(ω) ≤ 1 and Σ P(ω) = 1

An event A: any set of sample points, with P(A) = Σ_{ω∈A} P(ω)
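
A minimal sketch of these definitions in Python (the two-dice sample space and the "doubles" event are illustrative choices, not from the slides):

    from fractions import Fraction

    # Sample space: all ordered outcomes of rolling two fair dice.
    omega = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

    # Probability model: assign each sample point a probability
    # between 0 and 1; here uniform, so they sum to 1.
    P = {w: Fraction(1, 36) for w in omega}

    # An event A is a set of sample points; P(A) sums its points.
    def prob(event):
        return sum(P[w] for w in event)

    doubles = [w for w in omega if w[0] == w[1]]  # six atomic events
    print(prob(doubles))  # 1/6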
Random Variables

Random variable: a function from sample points to some range (e.g., the reals or Booleans)

Probability for a random variable: P induces a distribution for any random variable X: P(X=x) = Σ_{ω: X(ω)=x} P(ω)
Logical Propositions and Probability

Proposition = event (set of sample points)
Given Boolean random variables A and B:

Event a = set of sample points where A(ω)=true
Event ¬a = set of sample points where A(ω)=false
Event a∧b = set of sample points where A(ω)=true and B(ω)=true

Often the sample space is the Cartesian product of the ranges of the variables
Proposition = disjunction of the atomic events in which it is true:

(a∨b) ≡ (¬a∧b) ∨ (a∧¬b) ∨ (a∧b)
P(a∨b) = P(¬a∧b) + P(a∧¬b) + P(a∧b)
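
A short sketch of this identity in Python, assuming made-up probabilities for the four atomic events over A and B:

    # Atomic events over Boolean A and B, with assumed probabilities.
    P = {
        (True, True): 0.2,    # a∧b
        (True, False): 0.3,   # a∧¬b
        (False, True): 0.1,   # ¬a∧b
        (False, False): 0.4,  # ¬a∧¬b
    }

    # P(a∨b): sum the atomic events in which the proposition is true.
    p_a_or_b = P[(True, True)] + P[(True, False)] + P[(False, True)]
    print(p_a_or_b)  # 0.6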
Axioms of Probability

All probabilities are between 0 and 1
Necessarily true propositions have probability 1; necessarily false propositions have probability 0
The probability of a disjunction is given by:

P(a∨b) = P(a) + P(b) − P(a∧b)
P(¬a) = 1 − P(a)

The definitions imply that certain logically related events must have related probabilities:

P(a∨b) = P(a) + P(b) − P(a∧b)
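
For instance, with illustrative values P(a) = 0.4, P(b) = 0.5, and P(a∧b) = 0.2 (not from the slides):

P(a∨b) = 0.4 + 0.5 − 0.2 = 0.7
P(¬a) = 1 − 0.4 = 0.6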
Prior Probability

Prior or unconditional probabilities of propositions:

P(female=true) = .5 corresponds to belief prior to arrival of any new evidence

Probability distribution gives values for all possible assignments:

P(Color) ranges over ⟨Color=green, Color=blue, Color=purple⟩
P(Color) = ⟨.6, .3, .1⟩ (normalized: sums to 1)

Joint probability distribution for a set of random variables gives the probability of every atomic event on those variables (i.e., every sample point):

P(Color, Gender) = a 3×2 matrix
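
For illustration, one hypothetical joint consistent with P(Color) = ⟨.6, .3, .1⟩ (the even split by gender is an assumption, not from the slides):

            female   ¬female
  green       .30      .30
  blue        .15      .15
  purple      .05      .05

Each row sums to the corresponding entry of P(Color), and all six entries sum to 1.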
Inference by enumeration

Start with the joint distribution
Inference by enumeration

P(HasTeeth)=.06+.12+.02=.2
Inference by enumeration

P(HasTeeth ∨ Color=green) = .06 + .12 + .02 + .24 = .44
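
A sketch of enumeration in Python. The joint table itself did not survive in this transcript, so the Color=green entries below match the numbers quoted on the slides, while the blue/purple entries are made-up placeholders chosen so that P(HasTeeth) sums to .2 and the whole table sums to 1:

    # Hypothetical full joint over (HasTeeth, Color, Female).
    joint = {
        (True,  'green',  True): 0.03, (True,  'green',  False): 0.03,
        (False, 'green',  True): 0.12, (False, 'green',  False): 0.12,
        (True,  'blue',   True): 0.06, (True,  'blue',   False): 0.06,
        (False, 'blue',   True): 0.20, (False, 'blue',   False): 0.20,
        (True,  'purple', True): 0.01, (True,  'purple', False): 0.01,
        (False, 'purple', True): 0.08, (False, 'purple', False): 0.08,
    }

    def prob(pred):
        # Enumeration: sum the sample points where the proposition holds.
        return sum(p for event, p in joint.items() if pred(*event))

    print(prob(lambda teeth, color, female: teeth))                      # ≈ 0.2
    print(prob(lambda teeth, color, female: teeth or color == 'green'))  # ≈ 0.44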
Conditional Probability

Conditional or posterior probabilities:

E.g., P(PlayerWins | HostOpensDoor=1 and PlayerPicksDoor=2 and Door1=goat) = .5
If we know more (e.g., HostOpensDoor=3 and Door3=goat):
P(PlayerWins) = 1

Note: the less specific belief remains valid after more evidence arrives, but is not always useful
New evidence may be irrelevant, allowing simplification:

P(PlayerWins | CaliforniaEarthquake) = P(PlayerWins) = .3
Conditional Probability

A general version holds for joint distributions:

P(PlayerWins, HostOpensDoor1) = P(PlayerWins | HostOpensDoor1) × P(HostOpensDoor1)
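
Applying the same product rule to the teeth/color example (using P(Color=green) = 0.06 + 0.24 = 0.30 and the value P(¬HasTeeth | Color=green) = 0.8 computed on the next slide):

P(¬HasTeeth, Color=green) = P(¬HasTeeth | Color=green) × P(Color=green) = 0.8 × 0.30 = 0.24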
Inference by enumeration

Compute conditional probabilities:

P(¬HasTeeth | Color=green) = P(¬HasTeeth ∧ Color=green) / P(Color=green)
                           = 0.24 / (0.06 + 0.24)
                           = 0.8
Normalization

The denominator can be viewed as a normalization constant α:

P(HasTeeth | Color=green) = α P(HasTeeth, Color=green)
  = α [P(HasTeeth, Color=green, female) + P(HasTeeth, Color=green, ¬female)]
  = α [⟨0.03, 0.12⟩ + ⟨0.03, 0.12⟩] = α ⟨0.06, 0.24⟩
  = ⟨0.2, 0.8⟩

Compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables
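
A sketch of this normalization in Python, keeping only the joint entries consistent with the evidence Color=green (the 0.03/0.12 figures from the slide), with Female as the hidden variable:

    # Joint entries matching the evidence, keyed by (HasTeeth, Female).
    joint_green = {
        (True,  True): 0.03, (True,  False): 0.03,
        (False, True): 0.12, (False, False): 0.12,
    }

    # Sum out the hidden variable for each value of the query variable.
    unnormalized = {
        teeth: joint_green[(teeth, True)] + joint_green[(teeth, False)]
        for teeth in (True, False)
    }

    # Alpha makes the entries sum to 1; the denominator never needs
    # to be computed as an explicit marginal.
    alpha = 1 / sum(unnormalized.values())
    posterior = {teeth: alpha * p for teeth, p in unnormalized.items()}
    print(posterior)  # ≈ {True: 0.2, False: 0.8}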
Independence

A and B are independent iff
P(A|B) = P(A) or P(B|A) = P(B) or P(A,B) = P(A) P(B)

32 entries reduced to 12; for n independent biased coins, 2^n → n

Absolute independence is powerful but rare
Real domains are large, with hundreds of variables, none of which are independent
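
A small sketch of the P(A,B) = P(A)P(B) test, with an assumed joint over two Boolean variables deliberately constructed as a product of marginals so that the test passes:

    from math import isclose

    # Assumed marginals; the joint is their product by construction.
    p_a, p_b = 0.3, 0.6
    joint = {
        (a, b): (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        for a in (True, False) for b in (True, False)
    }

    # Recover the marginals by summing out the other variable.
    P_A = sum(p for (a, b), p in joint.items() if a)
    P_B = sum(p for (a, b), p in joint.items() if b)

    # Independence check: P(A=true, B=true) = P(A=true) * P(B=true).
    print(isclose(joint[(True, True)], P_A * P_B))  # True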
Conditional Independence

If I have length ≤ .2, the probability that I am female doesn't depend on whether or not I have teeth:

P(female | length ≤ .2, hasteeth) = P(female | length ≤ .2)

The same independence holds if length > .2:

P(male | length > .2, hasteeth) = P(male | length > .2)

Gender is conditionally independent of hasteeth given length


In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in n to linear in n

Conditional independence is our most basic and robust form of knowledge about uncertain environments
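
To see the reduction, consider a Boolean cause with n Boolean effects that are conditionally independent given the cause (a standard factoring; the counts below follow from it and are not from the slides):

P(Cause, E1, ..., En) = P(Cause) × P(E1 | Cause) × ... × P(En | Cause)

The full joint needs 2^(n+1) − 1 independent numbers, while the factored form needs only 1 + 2n. For n = 10, that is 2047 versus 21.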
Next Class: Turing Paper

A discussion class

Graduate students and non-degree students (anyone beyond a bachelor's): Prepare a short statement on the paper. It can be your reaction, your position, a place where you disagree, or an explication of a point.

Undergraduates: Be prepared with questions for the graduate students

All: Submit your statement or your question by midnight Wednesday night. All statements and questions will be printed and distributed in class on Wednesday.