Artificial Intelligence
Probabilistic Reasoning
Course 8
Logic
Propositional Logic
The study of statements and the ways they
are connected
Predicate Logic
The study of individuals and their properties
Uncertainty
Agents almost never have access to the
whole truth about their environment
Agents have to act under uncertainty
Rational decisions depend on:
The relative importance of several goals
The likelihood and degree to which they will be achieved
Uncertainty
Why a complete description of the environment is impractical:
Laziness – too much work to describe all facts
Theoretical ignorance – no complete theory of the
domain (e.g., medicine)
Practical ignorance – not all situations have been analyzed
Approaches to degrees of belief in statements:
Probability theory
Dempster-Shafer theory
Fuzzy logic
Truth maintenance systems
Nonmonotonic reasoning
Probabilities
Provides a way of summarizing the
uncertainty that comes from laziness and
ignorance
Uncertainty
Abduction is a reasoning process that tries to
form plausible explanations for abnormal
observations
Abduction is distinctly different from deduction and
induction
Abduction is inherently uncertain
Uncertainty is an important issue in abductive
reasoning
Definition (Encyclopedia Britannica): reasoning
that derives an explanatory hypothesis from a
given set of facts
The inference result is a hypothesis that, if
true, could explain the occurrence of the given
facts
Comparing abduction, deduction,
and induction
Deduction:
major premise: All balls in the box are black
minor premise: These balls are from the box
conclusion: These balls are black
Abduction:
rule: All balls in the box are black
observation: These balls are black
explanation: These balls are from the box
Induction:
case: These balls are from the box
observation: These balls are black
hypothesized rule: All balls in the box are black
Deduction reasons from causes to effects
Abduction reasons from effects to causes
Induction reasons from specific cases to general rules
Deduction:
A => B
A
--------
B
Abduction:
A => B
B
--------
Possibly A
Induction:
Whenever A then B
--------
Possibly A => B
Uncertainty
Uncertain inputs
Missing data
Noisy data
Multiple causes lead to multiple effects
Incomplete enumeration of conditions or effects
Incomplete knowledge of causality in the domain
Probabilistic/stochastic effects
Abduction and induction are inherently uncertain
Default reasoning, even in deductive fashion, is
uncertain
Incomplete deductive inference may be uncertain
Uncertain knowledge
Uncertain outputs
Probabilistic reasoning only gives probabilistic
results (summarizes uncertainty from various
sources)
Probabilities
Kolmogorov showed that three simple axioms
lead to the rules of probability theory
1. All probabilities are between 0 and 1:
0 ≤ P(a) ≤ 1
2. Valid propositions (tautologies) have probability
1, and unsatisfiable propositions have probability
0:
P(true) = 1 ; P(false) = 0
3. The probability of a disjunction is given by:
P(a ∨ b) = P(a) + P(b) – P(a ∧ b)
(Venn diagram: regions a, a ∧ b, b)
De Finetti, Cox, and Carnap have also provided
compelling arguments for these axioms
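A quick numeric check of axiom 3 (my own illustration, not from the course): a short Python simulation estimating P(a ∨ b) for two events on a die roll and comparing it against P(a) + P(b) – P(a ∧ b):

    import random

    # Monte Carlo check of P(a or b) = P(a) + P(b) - P(a and b)
    # for one die roll: event a = "roll is even", event b = "roll > 3".
    N = 100_000
    n_a = n_b = n_ab = n_a_or_b = 0
    for _ in range(N):
        roll = random.randint(1, 6)
        a, b = roll % 2 == 0, roll > 3
        n_a += a
        n_b += b
        n_ab += a and b
        n_a_or_b += a or b

    print(n_a_or_b / N)               # ~ 4/6
    print((n_a + n_b - n_ab) / N)     # the same value, by axiom 3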
Paradox
“Monty Hall” problem
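The slides state the paradox without working it out; as a hedged illustration (my own code, not part of the course material), a short Python simulation compares the "stay" and "switch" strategies and shows that switching wins about 2/3 of the time:

    import random

    def monty_hall(switch, trials=100_000):
        """Simulate the Monty Hall game; return the strategy's win rate."""
        wins = 0
        for _ in range(trials):
            car = random.randrange(3)    # door hiding the car
            pick = random.randrange(3)   # contestant's first choice
            # The host opens a door that is neither the pick nor the car
            opened = next(d for d in range(3) if d != pick and d != car)
            if switch:
                pick = next(d for d in range(3) if d != pick and d != opened)
            wins += pick == car
        return wins / trials

    print("stay:  ", monty_hall(switch=False))   # ~ 1/3
    print("switch:", monty_hall(switch=True))    # ~ 2/3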
Conditional Probabilities
P(A|B) – the fraction of the part of the environment
in which B is true where A is also true
Probability of A, conditioned on B
•D = headache, P(D) = 1/10
•G = flu, P(G) = 1/40
•P(D|G) = 1/2
•If someone has the flu, the
probability of also having a headache
is 50%
•P(D|G) = P(D ∧ G) / P(G)
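A minimal sketch (my own, using the numbers above) recovering the joint probability from the definition of conditional probability:

    # Headache/flu example from the slide
    P_D = 1 / 10        # P(headache)
    P_G = 1 / 40        # P(flu)
    P_D_given_G = 0.5   # P(headache | flu)

    # Definition: P(D|G) = P(D ∧ G) / P(G)  =>  P(D ∧ G) = P(D|G) * P(G)
    P_D_and_G = P_D_given_G * P_G
    print(P_D_and_G)    # 0.0125, i.e. 1/80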
Bayes' Theorem
P(A|B) = P(A ∧ B) / P(B)
P(A ∧ B) = P(A|B) * P(B)
P(A ∧ B) = P(B|A) * P(A)
=> P(B|A) = P(A|B) * P(B) / P(A)
Example
Diagnosis
Known probabilities:
Meningitis: P(M) = 0.002%
Stiff neck: P(N) = 5%
Meningitis causes a stiff neck in half of the cases:
P(N|M) = 50%
If a patient has a stiff neck, what is the
probability of having meningitis?
P(M|N) = P(N|M) * P(M) / P(N) = 0.02%
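A quick check of the arithmetic (my own sketch, with the slide's numbers):

    # Meningitis diagnosis via Bayes' theorem
    P_M = 0.00002        # P(meningitis) = 0.002%
    P_N = 0.05           # P(stiff neck) = 5%
    P_N_given_M = 0.5    # P(stiff neck | meningitis) = 50%

    # P(M|N) = P(N|M) * P(M) / P(N)
    P_M_given_N = P_N_given_M * P_M / P_N
    print(f"{P_M_given_N:.2%}")   # 0.02%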
Independence
Variables A and B are independent if any
of the following hold:
P(A,B) = P(A) P(B)
P(A | B) = P(A)
P(B | A) = P(B)
This says that knowing the outcome of
A does not tell me anything new about
the outcome of B.
Independence
How is independence useful?
Suppose you have n coin flips and you want to
calculate the joint distribution P(C1, …, Cn)
If the coin flips are not independent, you need
2^n values in the table
If the coin flips are independent, then
P(C1, ..., Cn) = ∏_{i=1}^{n} P(Ci)
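A small sketch (my own, with an assumed bias of 0.6) showing how the factorization replaces a full joint table for three independent coin flips:

    from itertools import product

    # Three independent coin flips, each with P(heads) = 0.6 (assumed)
    P = {True: 0.6, False: 0.4}

    # Under independence the joint is the product of the marginals,
    # so 3 numbers stand in for the 2**3 entries of a full table.
    joint = {flips: P[flips[0]] * P[flips[1]] * P[flips[2]]
             for flips in product([True, False], repeat=3)}

    print(joint[(True, True, False)])   # 0.6 * 0.6 * 0.4 = 0.144
    print(sum(joint.values()))          # 1.0, a valid distribution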
Conditional Independence
Variables A and B are conditionally
independent given C if any of the following
hold:
P(A, B | C) = P(A | C) P(B | C)
P(A | B, C) = P(A | C)
P(B | A, C) = P(B | C)
Knowing C tells me everything about B. I don’t gain
anything by knowing A (either because A doesn’t
influence B or because knowing C provides all the
information knowing A would give)
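To make the distinction concrete, a toy check (my own numbers): A and B each depend on a common cause C, so they factorize given C even though they are not marginally independent:

    # Toy model: P(A,B,C) = P(C) P(A|C) P(B|C); all numbers are assumed
    P_C1 = 0.5                        # P(C = 1)
    P_A1_given_C = {1: 0.8, 0: 0.3}   # P(A = 1 | C)
    P_B1_given_C = {1: 0.9, 0: 0.2}   # P(B = 1 | C)

    def joint(a, b, c):
        pc = P_C1 if c else 1 - P_C1
        pa = P_A1_given_C[c] if a else 1 - P_A1_given_C[c]
        pb = P_B1_given_C[c] if b else 1 - P_B1_given_C[c]
        return pc * pa * pb

    # Conditional independence: P(A=1,B=1|C=1) = P(A=1|C=1) P(B=1|C=1)
    print(joint(1, 1, 1) / P_C1, P_A1_given_C[1] * P_B1_given_C[1])  # 0.72, 0.72

    # ...yet A and B are NOT marginally independent:
    P_A1 = sum(joint(1, b, c) for b in (0, 1) for c in (0, 1))  # 0.55
    P_B1 = sum(joint(a, 1, c) for a in (0, 1) for c in (0, 1))  # 0.55
    P_A1B1 = sum(joint(1, 1, c) for c in (0, 1))                # 0.39
    print(P_A1B1, P_A1 * P_B1)   # 0.39 vs 0.3025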
Bayesian Network
Directed acyclic graph (DAG) where the nodes represent
random variables and directed edges capture their
dependence
Each node in the graph is a random
variable
A node X is a parent of another node Y if
there is an arrow from node X to node Y, e.g.
A is a parent of B
Informally, an arrow from node X to node Y
means X has a direct influence on Y
(Figure: a small DAG over nodes A, B, C, D, where A is a parent of B)
Bayesian Networks
Two important properties:
1. Encodes the conditional independence
relationships between the variables in the
graph structure
2. Is a compact representation of the joint
probability distribution over the variables
Conditional Independence
The Markov condition: given its parents (P1,
P2), a node (X) is conditionally independent of its
non-descendants (ND1, ND2)
(Figure: node X with parents P1, P2, children C1, C2, and non-descendants ND1, ND2)
The Joint Probability Distribution
Due to the Markov condition, we can
compute the joint probability distribution
over all the variables X1, …, Xn in the
Bayesian net using the formula:
P(X1 = x1, ..., Xn = xn) = ∏_{i=1}^{n} P(Xi = xi | Parents(Xi))
Where Parents(Xi) means the values of the Parents of the node Xi
with respect to the graph
Example
Variables:
•weather can have three states: sunny,
cloudy, or rainy
•grass can be wet or dry
•sprinkler can be on or off
Causal links in this world:
•If it is rainy, then it will make the grass
wet directly.
•But if it is sunny for a long time, that
too can make the grass wet, indirectly,
by causing us to turn on the sprinkler.
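A hedged sketch of this network in Python (structure from the slide: Weather → Sprinkler, Weather → Grass, Sprinkler → Grass; all CPT numbers are my own illustrative choices), computing the joint via the Markov factorization:

    # P(W), P(S|W), P(G|W,S) for the weather/sprinkler/grass network
    P_weather = {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1}
    P_sprinkler_on = {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.0}
    P_grass_wet = {
        ("sunny", True): 0.9,  ("sunny", False): 0.05,
        ("cloudy", True): 0.9, ("cloudy", False): 0.1,
        ("rainy", True): 1.0,  ("rainy", False): 0.95,
    }

    def joint(weather, sprinkler_on, grass_wet):
        """P(W, S, G) = P(W) * P(S | W) * P(G | W, S)."""
        p = P_weather[weather]
        ps = P_sprinkler_on[weather]
        p *= ps if sprinkler_on else 1 - ps
        pg = P_grass_wet[(weather, sprinkler_on)]
        p *= pg if grass_wet else 1 - pg
        return p

    print(joint("sunny", True, True))   # 0.6 * 0.7 * 0.9 = 0.378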
Links in Bayesian networks
The links may form loops, but they may
not form cycles
Dempster-Shafer Theory
It is based on the work of Dempster who attempted to model
uncertainty by a range of probabilities rather than a single
probabilistic number.
Belief interval [bel, pl]
If no information about A and ¬A is present, the belief
interval is [0, 1] (not 0.5)
bel = belief: bel(A)
pl = plausibility: pl(A) = 1 – bel(¬A)
As knowledge is acquired, the interval becomes
smaller
bel(A) ≤ P(A) ≤ pl(A)
Example
Sue tells the truth 90% of the time
Bill tells the truth 80% of the time
P(S) = 0.9, P(¬S) = 0.1
P(B) = 0.8, P(¬B) = 0.2
Case 1: Sue and Bill tell George that his
car has been stolen
The probability that neither of them is trustworthy is
0.1 × 0.2 = 0.02
The probability that at least one of them is trustworthy
is 1 – 0.02 = 0.98
The belief interval for "the car was stolen" is [0.98, 1]
Example 2
Case 2: Sue says that the car was stolen and Bill says
that it was not
The two statements cannot both be trusted (they contradict)
All non-null combinations:
Only Sue is trustworthy (the car was stolen):
0.9 × (1 – 0.8) = 0.18
Only Bill is trustworthy (the car was not stolen):
0.8 × (1 – 0.9) = 0.08
Neither is trustworthy (no concrete information):
(1 – 0.9) × (1 – 0.8) = 0.02
Normalizing over 0.18 + 0.08 + 0.02 = 0.28:
Belief that the car was stolen:
0.18 / 0.28 = 0.64
Belief that the car was not stolen:
0.08 / 0.28 = 0.29
The belief interval that the car was stolen:
[0.64, 1 – 0.29] = [0.64, 0.71]
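A minimal sketch mechanizing both cases (my own code; it follows Dempster's rule, renormalizing after the contradictory combination is discarded):

    # Sue is reliable with probability 0.9, Bill with 0.8
    p_sue, p_bill = 0.9, 0.8

    # Case 1: both say "stolen"; it fails only if neither is reliable
    bel_stolen = 1 - (1 - p_sue) * (1 - p_bill)
    print("case 1:", [round(bel_stolen, 2), 1])        # [0.98, 1]

    # Case 2: Sue says "stolen", Bill says "not stolen"
    m_stolen = p_sue * (1 - p_bill)          # only Sue reliable: 0.18
    m_not = p_bill * (1 - p_sue)             # only Bill reliable: 0.08
    m_unknown = (1 - p_sue) * (1 - p_bill)   # neither reliable: 0.02
    norm = m_stolen + m_not + m_unknown      # 0.28; the conflicting
                                             # 0.72 mass is discarded

    bel = m_stolen / norm                    # 0.64
    pl = 1 - m_not / norm                    # 0.71
    print("case 2:", [round(bel, 2), round(pl, 2)])    # [0.64, 0.71]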
Next
Course
Planning and Reasoning in the Real World
Laboratory
Start CLISP