Probability in AI - Temple University
Download
Report
Transcript Probability in AI - Temple University
Made by: Maor Levy, Temple University 2012
1
Probability expresses uncertainty.
Pervasive in all of Artificial Intelligence
Machine learning
Information Retrieval (e.g., Web)
Computer Vision
Robotics
Based on mathematical calculus.
Probability of a fair coin:
1
P(COIN tail )
2
1
P( tail )
2
2
Example: Probability of cancer
P(has cancer) = 0.02
P(¬has cancer) = 0.98
Multiple events: cancer, test result
P(has cancer, test positive)
The problem with joint distributions: it takes
2𝐷 − 1 numbers to specify them!
3
Conditional Probability describes the cancer
test:
P(test positive | has cancer) = 0.9
P(test positive | ¬has cancer) = 0.2
Put this together with: Prior probability
P(has cancer) = 0.02
P(test negative | has cancer) = 0.1
Total probability is a fundamental rule relating
marginal probabilities to conditional
probabilities.
𝑃 𝑌 =
𝑖𝑃
𝑌 𝑋 = 𝑖 𝑃(𝑋 = 𝑖)
4
In summary:
P(has cancer) = 0.02
P(¬has cancer) = 0.98
P(test positive | has cancer) = 0.9
P(test positive | ¬has cancer) = 0.2
P(test negative | has cancer) = 0.1
P(test negative | ¬has cancer) = 0.8
P(cancer) and P(Test positive | cancer) is called
the model.
Calculating P(Test positive) is called
prediction.
Calculating P(Cancer | test positive) is called
diagnostic reasoning.
5
A belief network consists of:
◦ A
directed acyclic graph with nodes labeled with random variables
◦ a domain for each random variable
◦ a set of conditional probability tables for each variable
◦ given its parents (including prior probabilities for nodes with
no parents).
A belief network is a graph: the nodes are random
variables; there is an arc from the parents of each
node into that node.
◦ A belief network is automatically acyclic by construction.
◦ A belief network is a directed acyclic graph (DAG) where nodes
are random variables.
◦ The parents of a node n are those variables on which n directly
depends.
◦ A belief network is a graphical representation of dependence
and independence:
A variable is independent of its non-descendants given its parents.
6
Whether l1 is lit (L1_lit) depends only on the
status of the light (L1_st) and whether there is
power in wire w0. Thus, L1_lit is independent
of the other variables given L1_st and W0.
In a belief network, W0 and L1_st are parents
of L1_lit.
Similarly, W0 depends
only on whether there
is power in w1, whether
there is power in w2,
the position of switch
s2 (S2_pos), and the
status of switch s2 (S2_st).
7
To represent a domain in a belief network, you
need to consider:
What are the relevant variables?
What will you observe?
What would you like to find out (query)?
What other features make the model simpler?
What values should these variables take?
What is the relationship between them? This should
be expressed in terms of local influence.
How does the value of each variable depend on its
parents? This is expressed in terms of the
conditional probabilities.
8
The power network can be used in a number
of ways:
◦ Conditioning on the status of the switches and
circuit
◦ breakers, whether there is outside power and the
position of the switches, you can simulate the
lighting.
◦ Given values for the switches, the outside power,
and whether the lights are lit, you can determine the
posterior probability that each switch or circuit
breaker is ok or not.
◦ Given some switch positions and some outputs and
some intermediate values, you can determine the
probability of any other variable in the network.
9
A Bayes network is a form of probabilistic
graphical model. Specifically, a Bayes network
is a directed acyclic graph of nodes
representing variables and arcs representing
dependence relations among the variables.
A representation of the joint distribution over all the
variables represented by nodes in the graph. Let the
variables be X(1), ..., X(n).
Let parents(A) be the parents of the node A.
Then the joint distribution for X(1) through X(n) is
represented as the product of the probability
distributions P(Xi | Parents(Xi)) for i = 1 to n:
n
P( X 1 x1 ,..., X n xn ) P(X i xi | Parents( X i))
i 1
If X has no parents, its probability distribution is
said to be unconditional, otherwise it is conditional. 10
Examples of Bayes network:
11
True Bayesians actually consider conditional
probabilities as more basic than joint
probabilities.
It is easy to define P(A|B) without reference to
the joint probability P(A,B).
Bayes’ Rule:
◦ Posterior =
𝐿𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑 ∗ 𝑃𝑟𝑖𝑜𝑟
𝑀𝑎𝑟𝑔𝑖𝑛𝑎𝑙 𝐿𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑
→𝑃 𝐴𝐵 =
Back to the cancer example:
◦ P has cancer test positive) =
◦ =
0.9∗0.02
0.02∗0.9+0.98∗0.2
=
0.018
0.214
𝑃
𝐵𝐴
𝑃 𝐴
𝑃 𝐵
𝑃 𝑡𝑒𝑠𝑡 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒 ℎ𝑎𝑠 𝑐𝑎𝑛𝑐𝑒𝑟)∗𝑃(ℎ𝑎𝑠 𝑐𝑎𝑛𝑐𝑒𝑟)
𝑃(𝑡𝑒𝑠𝑡 𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒)
≈ 0.0841
12
Two variables are independent if:
∀𝑥, 𝑦 ∶ 𝑃 𝑥 𝑃 𝑦 = 𝑃(𝑥, 𝑦)
It means that the occurrence of one event makes it
neither more nor less probable that the other occurs.
This says that their joint distribution factors into a product two
simpler distributions
This implies: ∀𝑥, 𝑦 ∶ 𝑃 𝑥|𝑦 = 𝑃(𝑥)
We write
Independence is a simplifying modeling assumption
Empirical joint distributions: at best “close” to independent
For example:
The event of getting a 6 the first time a die is rolled and the
event of getting a 6 the second time are independent.
By contrast, the event of getting a 6 the first time a die is rolled and
the event that the sum of the numbers seen on the first and second
trials is 8 are not independent.
13
Two events are dependent if the outcome or
occurrence of the first affects the outcome or
occurrence of the second so that the probability is
changed.
Example: A card is chosen at random from a standard
deck of 52 playing cards. Without replacing it, a
second card is chosen. What is the probability that the
first card chosen is a queen and the second card
chosen is a jack?
Probabilities:
P(queen on first pick) =
4
52
P(jack on 2nd pick given queen on 1st pick) =
P(queen and jack) =
4
52
4
4
4
51
∗ 51 = 663
14
X and Y are conditionally independent given a third
event Z precisely if the occurrence or non-occurrence
of X and the occurrence or non-occurrence of Y are
independent events in their conditional probability
distribution given Z. We write:
15
Why Bayes Networks?
A
P(A)
P(B)
P(C|A,B)
P(D|E)
P(E|C)
B
C
D
E
Joint Distribution of any five variables is: 25 − 1 = 31
In Bayes network:
◦ P(A,B,C,D,E)=P(A)*P(B)*P(C|A,B)*P(D|E)*P(E|C)
◦ Parameters:
1
1
4
2
2
Total of 10
16
The Naïve network has 216 − 1 = 65535 𝑝𝑎𝑟𝑎𝑚𝑒𝑡𝑒𝑟𝑠
Bayes Network needs only 47 numerical
probabilities to specify the joint.
17
Are X and Y conditionally
independent given evidence
vars {Z}?
Active Triples
Inactive Triples
Yes, if X and Y “separated” by Z
Look for active paths from X to Y
No active paths = independence!
A path is active if each triple
is active:
Causal chain A B C where B is
unobserved (either direction)
Common cause A B C where B is
unobserved
Common effect (aka v-structure)
A B C where B or one of its
descendants is observed
All it takes to block a path is
a single inactive segment
18
Examples:
L
R
B
R
B
T
D
T
T’
Yes
Yes
T’
Yes
Yes
19
Overview:
Bayes network:
Graphical representation of joint distributions
Efficiently encode conditional independencies
Reduce number of parameters from exponential to linear (in
many cases)
20