Transcript Document

AI – CS289
Uncertainty Management
Introduction to Uncertainty Management
25th September 2006
Dr Bogdan L. Vrusias
[email protected]
AI – CS289
Uncertainty Management
Contents
•
•
•
•
•
Defining Uncertainty
Basic probability theory
Bayesian reasoning
Bias of the Bayesian method
Certainty factors theory and evidential reasoning
25th September 2006
Bogdan L. Vrusias © 2006
2
AI – CS289
Uncertainty Management
Defining Uncertainty
• Information can be incomplete, inconsistent, uncertain, or all three. In
other words, information is often unsuitable for solving a problem.
• Uncertainty is defined as the lack of the exact knowledge that would
enable us to reach a perfectly reliable conclusion. Classical logic
permits only exact reasoning. It assumes that perfect knowledge
always exists and the law of the excluded middle can always be
applied:
IF
THEN
25th September 2006
A is true
A is not false
IF
A is false
THEN A is not true
Bogdan L. Vrusias © 2006
3
AI – CS289
Uncertainty Management
Sources of Uncertain Knowledge
• Weak implications. Domain experts and knowledge engineers have
the painful task of establishing concrete correlations between IF
(condition) and THEN (action) parts of the rules.
• Therefore, expert systems need to have the ability to handle vague
associations, for example by accepting the degree of correlations as
numerical certainty factors.
25th September 2006
Bogdan L. Vrusias © 2006
4
AI – CS289
Uncertainty Management
Sources of Uncertain Knowledge
• Imprecise language. Our natural language is ambiguous and
imprecise. We describe facts with such terms as often and sometimes,
frequently and hardly ever.
• As a result, it can be difficult to express knowledge in the precise IFTHEN form of production rules. However, if the meaning of the facts
is quantified, it can be used in expert systems.
• In 1944, Ray Simpson asked 355 high school and college students to
place 20 terms like often on a scale between 1 and 100. In 1968,
Milton Hakel repeated this experiment.
25th September 2006
Bogdan L. Vrusias © 2006
5
AI – CS289
Uncertainty Management
Sources of Uncertain Knowledge
Imprecise
language
25th September 2006
Ray Simpson (1944)
Term
Mean value
Always
99
Very often
88
Usually
85
Often
78
Generally
78
Frequently
73
Rather often
65
About as often as not
50
Now and then
20
Sometimes
20
Occasionally
20
Once in a while
15
Not often
13
Usually not
10
Seldom
10
Hardly ever
7
Very seldom
6
Rarely
5
Almost never
3
Never
0
Bogdan L. Vrusias © 2006
Milton Hakel (1968)
Term
Mean value
Always
100
Very often
87
Usually
79
Often
74
Rather often
74
Frequently
72
Generally
72
About as often as not
50
Now and then
34
Sometimes
29
Occasionally
28
Once in a while
22
Not often
16
Usually not
16
Seldom
9
Hardly ever
8
Very seldom
7
Rarely
5
Almost never
2
Never
0
6
AI – CS289
Uncertainty Management
Sources of Uncertain Knowledge
• Unknown data. When the data is incomplete or missing, the only
solution is to accept the value “unknown” and proceed to an
approximate reasoning with this value.
• Combining the views of different experts. Large expert systems
usually combine the knowledge and expertise of a number of experts.
Unfortunately, experts often have contradictory opinions and produce
conflicting rules. To resolve the conflict, the knowledge engineer has
to attach a weight to each expert and then calculate the composite
conclusion. But no systematic method exists to obtain these weights.
25th September 2006
Bogdan L. Vrusias © 2006
7
AI – CS289
Uncertainty Management
Basic Probability Theory
• The concept of probability has a long history that goes back thousands
of years when words like “probably”, “likely”, “maybe”, “perhaps”
and “possibly” were introduced into spoken languages. However, the
mathematical theory of probability was formulated only in the 17th
century.
• The probability of an event is the proportion of cases in which the
event occurs. Probability can also be defined as a scientific measure
of chance.
25th September 2006
Bogdan L. Vrusias © 2006
8
AI – CS289
Uncertainty Management
Basic Probability Theory
• Probability can be expressed mathematically as a numerical index with
a range between zero (an absolute impossibility) to unity (an absolute
certainty).
• Most events have a probability index strictly between 0 and 1, which
means that each event has at least two possible outcomes: favourable
outcome or success, and unfavourable outcome or failure.
25th September 2006
Psuccess  
the number of successes
the number of possible outcomes
P failure  
the number of failures
the number of possible outcomes
Bogdan L. Vrusias © 2006
9
AI – CS289
Uncertainty Management
Basic Probability Theory
• If s is the number of times success can occur, and f is the number of
times failure can occur, then
• and
Psuccess   p 
s
s f
P failure   q 
f
s f
p+q=1
• If we throw a coin, the probability of getting a head will be equal to the
probability of getting a tail. In a single throw, s = f = 1, and therefore
the probability of getting a head (or a tail) is 0.5.
25th September 2006
Bogdan L. Vrusias © 2006
10
AI – CS289
Uncertainty Management
Conditional Probability
• Let A be an event in the world and B be another event. Suppose that
events A and B are not mutually exclusive, but occur conditionally on
the occurrence of the other. The probability that event A will occur if
event B occurs is called the conditional probability. Conditional
probability is denoted mathematically as p(A|B) in which the vertical
bar represents "given" and the complete probability expression is
interpreted as
– “Conditional probability of event A occurring given that event B has
occurred”.
pA B  
25th September 2006
the number of times A and B can occur
the number of times B can occur
Bogdan L. Vrusias © 2006
11
AI – CS289
Uncertainty Management
Conditional Probability
• The number of times A and B can occur, or the probability that both A
and B will occur, is called the joint probability of A and B. It is
represented mathematically as p(AB). The number of ways B can
occur is the probability of B, p(B), and thus
pA B  
p A  B 
p B 
• Similarly, the conditional probability of event B occurring given that
event A has occurred equals
pB A 
25th September 2006
p  B  A
p  A
Bogdan L. Vrusias © 2006
12
AI – CS289
Uncertainty Management
Conditional Probability
Hence
pB  A  pB A p A
and
p A  B  pB A p A
Substituting the last equation into the equation
pA B  
p A  B 
p B 
yields the Bayesian rule:
25th September 2006
Bogdan L. Vrusias © 2006
13
AI – CS289
Uncertainty Management
Bayesian Rule
p A B  
pB A p A
p B 
where:
p(A|B) is the conditional probability that event A occurs given that event B
has occurred;
p(B|A) is the conditional probability of event B occurring given that event
A has occurred;
p(A) is the probability of event A occurring;
p(B) is the probability of event B occurring.
25th September 2006
Bogdan L. Vrusias © 2006
14
AI – CS289
Uncertainty Management
The Joint Probability
n
n
i 1
i 1
 p A  Bi    pA Bi  pBi 
B4
B3
25th September 2006
B1
A
B2
Bogdan L. Vrusias © 2006
15
AI – CS289
Uncertainty Management
The Joint Probability
• If the occurrence of event A depends on only two mutually exclusive
events, B and NOT B, we obtain:
p(A) = p(AB)  p(B) + p(AB)  p(B)
where  is the logical function NOT.
• Similarly,
p(B) = p(BA)  p(A) + p(BA)  p(A)
• Substituting this equation into the Bayesian rule yields:
pA B  
25th September 2006
p B A p A
p B A p A  p B A pA
Bogdan L. Vrusias © 2006
16
AI – CS289
Uncertainty Management
Bayesian Reasoning
• Suppose all rules in the knowledge base are represented in the
following form:
IF
THEN
E
H
is true
is true {with probability p}
• This rule implies that if event E occurs, then the probability that event
H will occur is p.
• In expert systems, H usually represents a hypothesis and E denotes
evidence to support this hypothesis.
25th September 2006
Bogdan L. Vrusias © 2006
17
AI – CS289
Uncertainty Management
Bayesian Reasoning
The Bayesian rule expressed in terms of hypotheses and evidence looks
like this:
p H E  
p E H  pH 
p E H  pH   p E H  pH 
where:
p(H) is the prior probability of hypothesis H being true;
p(E|H) is the probability that hypothesis H being true will result in
evidence E;
p(H) is the prior probability of hypothesis H being false;
p(E|H) is the probability of finding evidence E even when hypothesis H
is false.
25th September 2006
Bogdan L. Vrusias © 2006
18
AI – CS289
Uncertainty Management
Bayesian Reasoning
• In expert systems, the probabilities required to solve a problem are
provided by experts.
• An expert determines the prior probabilities for possible hypotheses
p(H) and p(H), and also the conditional probabilities for observing
evidence E if hypothesis H is true, p(E|H), and if hypothesis H is false,
p(E|H).
• Users provide information about the evidence observed and the expert
system computes p(H|E) for hypothesis H in light of the user-supplied
evidence E. Probability p(H|E) is called the posterior probability of
hypothesis H upon observing evidence E.
25th September 2006
Bogdan L. Vrusias © 2006
19
AI – CS289
Uncertainty Management
Bayesian Reasoning
• We can take into account both multiple hypotheses H1, H2,..., Hm and
multiple evidences E1, E2,..., En. The hypotheses as well as the
evidences must be mutually exclusive and exhaustive.
• Single evidence E and multiple hypotheses follow:
pH i E  
pE H i  pH i 
m
 pE H k  pH k 
k 1
• Multiple evidences and multiple hypotheses follow:
pH i E1 E2 . . . En  
pE1 E2 . . . En H i  pH i 
m
 pE1 E2 . . . En H k  pH k 
k 1
25th September 2006
Bogdan L. Vrusias © 2006
20
AI – CS289
Uncertainty Management
Bayesian Reasoning
• This requires to obtain the conditional probabilities of all possible
combinations of evidences for all hypotheses, and thus places an
enormous burden on the expert.
• Therefore, in expert systems, conditional independence among
different evidences assumed. Thus, instead of the unworkable
equation, we attain:
pH i E1 E2 . . . En  
pE1 H i  pE2 H i  . . .  pEn H i  pH i 
m
 pE1 H k  pE2 H k  . . .  pEn H k  pH k 
k 1
25th September 2006
Bogdan L. Vrusias © 2006
21
AI – CS289
Uncertainty Management
Ranking Potentially True Hypotheses
• Let us consider a simple example:
– Suppose an expert, given three conditionally independent
evidences E1, E2,..., En, creates three mutually exclusive and
exhaustive hypotheses H1, H2,..., Hm, and provides prior
probabilities for these hypotheses – p(H1), p(H2) and p(H3),
respectively. The expert also determines the conditional
probabilities of observing each evidence for all possible
hypotheses.
25th September 2006
Bogdan L. Vrusias © 2006
22
AI – CS289
Uncertainty Management
The Prior and Conditional Probabilities
Probability
Hypothesis
i =1
i =2
i =3
pH i 
0.40
0.35
0.25
p E1 H i 
0.3
0.8
0.5
p E 2 H i 
0.9
0.0
0.7
p E3 H i 
0.6
0.7
0.9
Assume that we first observe evidence E3. The expert system computes
the posterior probabilities for all hypotheses as:
25th September 2006
Bogdan L. Vrusias © 2006
23
AI – CS289
Uncertainty Management
The Prior and Conditional Probabilities
pH i E3  
pE3 H i  pH i 
3
 pE3 H k  pH k 
,
i = 1, 2, 3
k 1
thus
pH1 E3  
0.6  0.40
 0.34
0.6  0.40 + 0.7  0.35 + 0.9  0.25
pH 2 E3  
0.7  0.35
 0.34
0.6  0.40 + 0.7  0.35 + 0.9  0.25
p H 3 E3  
0.9  0.25
 0.32
0.6  0.40 + 0.7  0.35 + 0.9  0.25
After evidence E3 is observed, belief in hypothesis H2 increases and
becomes equal to belief in hypothesis H1. Belief in hypothesis H3 also
increases and even nearly reaches beliefs in hypotheses H1 and H2.
25th September 2006
Bogdan L. Vrusias © 2006
24
AI – CS289
Uncertainty Management
The Prior and Conditional Probabilities
Suppose now that we observe evidence E1. The posterior probabilities are
calculated as
pH i E1E3  
3
pE1 H i  pE3 H i  pH i 
 pE1 H k  pE3 H k  pH k 
,
i = 1, 2, 3
k 1
hence
p H1 E1E3  
0.3  0.6  0.40
 0.19
0.3  0.6  0.40 + 0.8  0.7  0.35 + 0.5  0.9  0.25
pH 2 E1E3  
0.8  0.7  0.35
 0.52
0.3  0.6  0.40 + 0.8  0.7  0.35 + 0.5  0.9  0.25
pH 3 E1E3  
0.5  0.9  0.25
 0.29
0.3  0.6  0.40 + 0.8  0.7  0.35 + 0.5  0.9  0.25
Hypothesis H2 has now become the most likely one.
25th September 2006
Bogdan L. Vrusias © 2006
25
AI – CS289
Uncertainty Management
The Prior and Conditional Probabilities
After observing evidence E2, the final posterior probabilities for all hypotheses are
calculated:
pH i E1E2 E3  
3
hence
pE1 H i  pE2 H i  pE3 H i  pH i 
 pE1 H k  pE2 H k  pE3 H k  pH k 
,
i = 1, 2, 3
k 1
p H1 E1E2 E3  
0.3  0.9  0.6  0.40
 0.45
0.3  0.9  0.6  0.40 + 0.8  0.0  0.7  0.35 + 0.5  0.7  0.9  0.25
pH 2 E1E2 E3  
0.8  0.0  0.7  0.35
0
0.3  0.9  0.6  0.40 + 0.8  0.0  0.7  0.35 + 0.5  0.7  0.9  0.25
pH 3 E1E2 E3  
0.5  0.7  0.9  0.25
 0.55
0.3  0.9  0.6  0.40 + 0.8  0.0  0.7  0.35 + 0.5  0.7  0.9  0.25
Although the initial ranking was H1, H2 and H3, only hypotheses H1 and H3 remain
under consideration after all evidences (E1, E2 and E3) were observed.
25th September 2006
Bogdan L. Vrusias © 2006
26
AI – CS289
Uncertainty Management
Exercise
• From which bowl is the cookie?
To illustrate, suppose there are two bowls full of cookies. Bowl #1 has
10 chocolate chip and 30 plain cookies, while bowl #2 has 20 of each.
Our friend Fred picks a bowl at random, and then picks a cookie at
random. We may assume there is no reason to believe Fred treats one
bowl differently from another, likewise for the cookies. The cookie
turns out to be a plain one. How probable is it that Fred picked it out of
bowl #1?
(taken from wikipedia.org)
25th September 2006
Bogdan L. Vrusias © 2006
27
AI – CS289
Uncertainty Management
Solution
•
Let H1 correspond to bowl #1, and H2 to bowl #2. It is given that the bowls
are identical from Fred's point of view, thus P(H1) = P(H2), and the two must
add up to 1, so both are equal to 0.5. The datum D is the observation of a plain
cookie. From the contents of the bowls, we know that P(D | H1) = 30/40 =
0.75 and P(D | H2) = 20/40 = 0.5. Bayes' formula then yields
•
Before observing the cookie, the probability that Fred chose bowl #1 is the
prior probability, P(H1), which is 0.5. After observing the cookie, we revise
the probability to P(H1|D), which is 0.6.
25th September 2006
Bogdan L. Vrusias © 2006
28
AI – CS289
Uncertainty Management
Closing
•
•
•
•
Questions???
Remarks???
Comments!!!
Evaluation!
25th September 2006
Bogdan L. Vrusias © 2006
29