
Artificial Intelligence
Probabilistic Reasoning
Course 8
Logic

Propositional Logic

The study of statements and their connectivity structure

Predicate Logic

The study of individuals and their properties
Uncertainty

Agents almost never have access to the whole truth about their environment

Agents have to act under uncertainty

Rational decisions depend on:
- the relative importance of several goals
- the likelihood, and the degree to which, they will be achieved
Uncertainty

A complete description of the environment is problematic because of:
- Laziness – too much work to describe all facts
- Theoretical ignorance – no complete theory of the domain (e.g., medicine)
- Practical ignorance – not all situations are analyzed

We therefore work with degrees of belief in statements

- Probability theory
- Dempster-Shafer theory
- Fuzzy logic
- Truth maintenance systems
- Nonmonotonic reasoning

Probabilities

Provide a way of summarizing the uncertainty that comes from laziness and ignorance
Uncertainty

Abduction is a reasoning process that tries to form plausible explanations for abnormal observations
- Abduction is distinctly different from deduction and induction
- Abduction is inherently uncertain

Uncertainty is an important issue in abductive reasoning

Definition (Encyclopedia Britannica): reasoning that derives an explanatory hypothesis from a given set of facts

The inference result is a hypothesis that, if true, could explain the occurrence of the given facts
Comparing abduction, deduction, and induction

Deduction:
  major premise: All balls in the box are black
  minor premise: These balls are from the box
  conclusion:    These balls are black

Abduction:
  rule:        All balls in the box are black
  observation: These balls are black
  explanation: These balls are from the box

Induction:
  case:              These balls are from the box
  observation:       These balls are black
  hypothesized rule: All balls in the box are black

Deduction reasons from causes to effects
Abduction reasons from effects to causes
Induction reasons from specific cases to general rules
Schematically:

Deduction:  from A => B and A, conclude B
Abduction:  from A => B and B, conclude possibly A
Induction:  from "whenever A then B", conclude possibly A => B
Uncertainty

Uncertain inputs
- Missing data
- Noisy data

Uncertain knowledge
- Multiple causes lead to multiple effects
- Incomplete enumeration of conditions or effects
- Incomplete knowledge of causality in the domain
- Probabilistic/stochastic effects

Uncertain outputs
- Abduction and induction are inherently uncertain
- Default reasoning, even in deductive fashion, is uncertain
- Incomplete deductive inference may be uncertain

Probabilistic reasoning only gives probabilistic results (summarizes uncertainty from various sources)
Probabilities

Kolmogorov showed that three simple axioms lead to the rules of probability theory (De Finetti, Cox, and Carnap have also provided compelling arguments for these axioms):

1. All probabilities are between 0 and 1:
   0 ≤ P(a) ≤ 1
2. Valid propositions (tautologies) have probability 1, and unsatisfiable propositions have probability 0:
   P(true) = 1 ; P(false) = 0
3. The probability of a disjunction is given by:
   P(a v b) = P(a) + P(b) - P(a ^ b)

[Venn diagram: circles a and b, overlapping in a ^ b]
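As a quick sanity check (not from the slides), the disjunction axiom can be verified on a toy sample space; this Python sketch assumes two fair coin flips:

```python
# Toy verification of P(a v b) = P(a) + P(b) - P(a ^ b)
# on the sample space of two fair coin flips (assumed setup).
from itertools import product

outcomes = list(product("HT", repeat=2))        # sample space
p = {o: 1 / len(outcomes) for o in outcomes}    # uniform distribution

def prob(event):
    """P(event) = sum of probabilities of outcomes satisfying it."""
    return sum(p[o] for o in outcomes if event(o))

a = lambda o: o[0] == "H"     # proposition a: first flip is heads
b = lambda o: o[1] == "H"     # proposition b: second flip is heads

lhs = prob(lambda o: a(o) or b(o))                       # P(a v b)
rhs = prob(a) + prob(b) - prob(lambda o: a(o) and b(o))
print(lhs, rhs)               # both 0.75, as axiom 3 requires
```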
Paradox

The "Monty Hall" problem: a contestant picks one of three doors (one hides a car, two hide goats); the host, who knows where the car is, opens one of the other doors to reveal a goat and offers the contestant the chance to switch. Counterintuitively, switching wins the car with probability 2/3, staying only 1/3.
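Since the slides only name the problem, here is a short illustrative simulation (assumed rules: the host always opens a goat door different from the contestant's pick):

```python
# Monty Hall simulation: compare the "stay" and "switch" strategies.
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)       # door hiding the car
        pick = random.randrange(3)      # contestant's first pick
        # Host opens a door that is neither the pick nor the car
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:                      # move to the remaining closed door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print("stay:  ", play(switch=False))   # ~1/3
print("switch:", play(switch=True))    # ~2/3
```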
Conditional Probabilities

P(A|B) – the part of the environment in which B is true where A is also true

Probability of A, conditioned on B

Example:
- D = headache, P(D) = 1/10
- G = flu, P(G) = 1/40
- P(D|G) = 1/2
- If someone has flu, the probability of also having a headache is 50%
- P(D|G) = P(D^G) / P(G)
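A minimal numeric sketch of the definition: from the slide's P(D|G) = 1/2 and P(G) = 1/40, the joint probability must be P(D^G) = 1/80, which we can feed back through the formula:

```python
# Conditional probability as the ratio of joint to marginal probability.
p_g = 1 / 40          # P(G): flu
p_d_and_g = 1 / 80    # P(D ^ G): headache and flu (derived above)
p_d_given_g = p_d_and_g / p_g
print(p_d_given_g)    # 0.5, i.e. P(D|G) = 1/2 as stated
```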
Bayes' Theorem

P(A|B) = P(A^B) / P(B)
=> P(A^B) = P(A|B) * P(B)
Likewise, P(A^B) = P(B|A) * P(A)

=> P(B|A) = P(A|B) * P(B) / P(A)
Example

Diagnosis

Known probabilities:
- Meningitis: P(M) = 0.002%
- Stiff neck: P(N) = 5%
- Meningitis causes a stiff neck in half of the cases: P(N|M) = 50%

If a patient has a stiff neck, what is the probability of having meningitis?

P(M|N) = P(N|M) * P(M) / P(N) = 0.5 * 0.00002 / 0.05 = 0.02%
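A minimal check of this computation (numbers taken from the slide):

```python
# Bayes' theorem applied to the meningitis diagnosis example.
p_m = 0.00002         # P(M) = 0.002%
p_n = 0.05            # P(N) = 5%
p_n_given_m = 0.5     # P(N|M) = 50%

p_m_given_n = p_n_given_m * p_m / p_n
print(f"{p_m_given_n:.6f}")   # 0.000200, i.e. 0.02%
```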
Independence

Variables A and B are independent if any of the following hold:
- P(A, B) = P(A) P(B)
- P(A | B) = P(A)
- P(B | A) = P(B)

This says that knowing the outcome of A does not tell me anything new about the outcome of B.
Independence

How is independence useful?
- Suppose you have n coin flips and you want to calculate the joint distribution P(C1, ..., Cn)
- If the coin flips are not independent, you need 2^n values in the table
- If the coin flips are independent, then
  P(C1, ..., Cn) = ∏ i=1..n P(Ci)
Conditional Independence

Variables A and B are conditionally independent given C if any of the following hold:
- P(A, B | C) = P(A | C) P(B | C)
- P(A | B, C) = P(A | C)
- P(B | A, C) = P(B | C)

Knowing C tells me everything about B. I don't gain anything by knowing A (either because A doesn't influence B or because knowing C provides all the information knowing A would give)
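A toy numeric check of the second condition, with made-up conditional probabilities chosen so that A and B are independent given C:

```python
# Build P(A, B, C) from P(C), P(A|C), P(B|C), then verify
# P(A=1 | B=1, C=1) equals P(A=1 | C=1).
p_c = {0: 0.5, 1: 0.5}
p_a_given_c = {0: 0.2, 1: 0.9}    # P(A=1 | C=c), assumed
p_b_given_c = {0: 0.7, 1: 0.4}    # P(B=1 | C=c), assumed

def joint(a, b, c):
    pa = p_a_given_c[c] if a else 1 - p_a_given_c[c]
    pb = p_b_given_c[c] if b else 1 - p_b_given_c[c]
    return p_c[c] * pa * pb

num = joint(1, 1, 1)
den = joint(1, 1, 1) + joint(0, 1, 1)
print(num / den, p_a_given_c[1])   # both ~0.9
```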
Bayesian Network

Directed acyclic graphs (DAGs) where the nodes represent random variables and directed edges capture their dependence

Each node in the graph is a random variable

A node X is a parent of another node Y if there is an arrow from node X to node Y, e.g., A is a parent of B

Informally, an arrow from node X to node Y means X has a direct influence on Y

[Figure: example DAG with nodes A, B, C, D]
Bayesian Networks

Two important properties:
1. Encodes the conditional independence relationships between the variables in the graph structure
2. Is a compact representation of the joint probability distribution over the variables
Conditional Independence

The Markov condition: given its parents (P1, P2), a node (X) is conditionally independent of its non-descendants (ND1, ND2)

[Figure: node X with parents P1 and P2, non-descendants ND1 and ND2, children C1 and C2]
The Joint Probability Distribution

Due to the Markov condition, we can compute the joint probability distribution over all the variables X1, ..., Xn in the Bayesian net using the formula:

P(X1 = x1, ..., Xn = xn) = ∏ i=1..n P(Xi = xi | Parents(Xi))

where Parents(Xi) means the values of the parents of the node Xi with respect to the graph
Example

Variables:
- weather can have three states: sunny, cloudy, or rainy
- grass can be wet or dry
- sprinkler can be on or off

Causal links in this world:
- If it is rainy, then it will make the grass wet directly.
- But if it is sunny for a long time, that too can make the grass wet, indirectly, by causing us to turn on the sprinkler.
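A sketch of the Markov-condition factorization for this network, assuming the edges weather -> sprinkler, weather -> grass, and sprinkler -> grass; all CPT numbers below are invented for illustration:

```python
# Joint probability of the weather/sprinkler/grass network as
# P(W, S, G) = P(W) * P(S|W) * P(G|W,S).   (CPT values assumed.)
p_w = {"sunny": 0.5, "cloudy": 0.3, "rainy": 0.2}
p_s_on_given_w = {"sunny": 0.6, "cloudy": 0.2, "rainy": 0.01}
p_g_wet_given = {   # P(grass = wet | weather, sprinkler)
    ("sunny", "on"): 0.9,   ("sunny", "off"): 0.05,
    ("cloudy", "on"): 0.9,  ("cloudy", "off"): 0.1,
    ("rainy", "on"): 0.99,  ("rainy", "off"): 0.9,
}

def joint(w, s, g):
    """P(W=w, S=s, G=g) via the chain rule over parents."""
    ps = p_s_on_given_w[w] if s == "on" else 1 - p_s_on_given_w[w]
    pg = p_g_wet_given[(w, s)] if g == "wet" else 1 - p_g_wet_given[(w, s)]
    return p_w[w] * ps * pg

print(joint("rainy", "off", "wet"))   # 0.2 * 0.99 * 0.9 = 0.1782
```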
Links in Bayesian networks

The links may form (undirected) loops, but they may not form (directed) cycles
Dempster-Shafer Theory

It is based on the work of Dempster, who attempted to model uncertainty by a range of probabilities rather than a single probabilistic number.

Belief interval [bel, pl]:
- bel = belief bel(A)
- pl = plausibility pl(A) = 1 - bel(¬A)

If no information about A and ¬A is present, the belief interval is [0,1] (not 0.5)

In the knowledge acquisition process the interval becomes smaller:
bel(A) <= P(A) <= pl(A)
Example

Sue tells the truth 90% of the time:
- P(S) = 0.9, P(¬S) = 0.1

Bill tells the truth 80% of the time:
- P(B) = 0.8, P(¬B) = 0.2

Case 1: Sue and Bill tell George that his car has been stolen
- Probability that neither of them is trustworthy: 0.1 x 0.2 = 0.02
- Probability that at least one of them is trustworthy: 1 - 0.02 = 0.98
- Belief interval is [0.98, 1]
Example 2

Case 2: Sue says that the car was stolen and Bill says that it was not

All non-null probabilities:
- Both affirmations cannot be trustworthy (contradiction); this case is excluded
- Sue trustworthy, Bill not (the car was stolen): 0.9 x (1 - 0.8) = 0.18
- Bill trustworthy, Sue not (the car was not stolen): 0.8 x (1 - 0.9) = 0.08
- Neither of them trustworthy (no concrete information): (1 - 0.9) x (1 - 0.8) = 0.02
- Total of the remaining cases: 0.18 + 0.08 + 0.02 = 0.28

Belief that the car was stolen: 0.18 / 0.28 = 0.64
Belief that the car was not stolen: 0.08 / 0.28 = 0.29

The belief interval that the car was stolen:
[0.64, 1 - 0.29] = [0.64, 0.71]
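The same computation expressed as Dempster's rule of combination for the two contradictory reports (trust values from the slide):

```python
# Dempster's rule for two witnesses: discard the conflicting mass
# (both trustworthy but contradicting each other) and renormalize.
p_sue, p_bill = 0.9, 0.8   # probabilities that Sue / Bill tell the truth

m_stolen     = p_sue * (1 - p_bill)         # only Sue trustworthy: 0.18
m_not_stolen = (1 - p_sue) * p_bill         # only Bill trustworthy: 0.08
m_unknown    = (1 - p_sue) * (1 - p_bill)   # neither trustworthy: 0.02
conflict     = p_sue * p_bill               # contradictory case: 0.72

norm = 1 - conflict                         # 0.28
bel_stolen = m_stolen / norm                # 0.18 / 0.28 ~ 0.64
bel_not    = m_not_stolen / norm            # 0.08 / 0.28 ~ 0.29
print(f"[{bel_stolen:.2f}, {1 - bel_not:.2f}]")   # [0.64, 0.71]
```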
Next

Course:
- Planning and Reasoning in the real world

Laboratory:
- Start CLISP