
CS621 : Artificial Intelligence
Pushpak Bhattacharyya
CSE Dept.,
IIT Bombay
Lecture 30
Uncertainty, Fuzziness
To Model Uncertainty
• A law of the universe
• Heisenberg’s uncertainty principle: Position
and momentum cannot be measured
simultaneously with certainty. If one is
measured with precision, the other is
uncertain
– Δp·Δx ≥ h/4π (h is Planck's constant)
Uncertainty modeling (contd.)
• Also a law of life
• Real-life inference is modeled with Default Logic,
Non-Monotonic Reasoning, Modal Logic, Belief Worlds,
Fuzzy Logic, Abduction
• Other models: Bayesian Inferencing, Entropy
and Information Theory
Bayesian Decision Theory
• Bayes Theorem: Given the random variables A and B,

      P(A|B) = P(A) · P(B|A) / P(B)

  where P(A|B) is the posterior probability, P(A) the prior
  probability, and P(B|A) the likelihood.
Bayes Theorem Derivation
P(A ∩ B) = P(B ∩ A)          (commutativity of "intersection")
P(A) · P(B|A) = P(B) · P(A|B)
⇒ P(A|B) = P(A) · P(B|A) / P(B)
To understand when and why to apply
Bayes Theorem
An example: it is known that in a population, 1 in 50000
has meningitis and 1 in 20 has stiff neck. It is also
observed that 50% of the meningitis patients have stiff
neck.
A doctor observes that a patient has stiff neck. What
is the probability that the patient has meningitis?
(Mitchell, Machine Learning, 1997)
Ans: We need to find
P(m|s): the probability of meningitis given stiff neck

Apply Bayes Rule (why?):
P(m|s) = [P(m) · P(s|m)] / P(s)

P(m) = prior probability of meningitis
P(s|m) = likelihood of stiff neck given meningitis
P(s) = probability of stiff neck
Probabilities
P(m) = 1/50000        (prior)
P(s) = 1/20
P(s|m) = 0.5          (likelihood)

P(m|s) = P(m) · P(s|m) / P(s)
       = (1/50000 × 0.5) / (1/20)
       = 1/5000        (posterior)

P(m|s) << P(~m|s)

Hence meningitis is not likely
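As a quick check of the arithmetic above, here is a minimal Python sketch of the same Bayes-rule computation; the function name bayes_posterior is just illustrative, and the numbers are the ones given in the example.

    def bayes_posterior(prior, likelihood, evidence):
        """Bayes rule: P(A|B) = P(A) * P(B|A) / P(B)."""
        return prior * likelihood / evidence

    # Values from the meningitis / stiff-neck example
    p_m = 1 / 50000       # P(m): prior probability of meningitis
    p_s = 1 / 20          # P(s): probability of stiff neck
    p_s_given_m = 0.5     # P(s|m): likelihood of stiff neck given meningitis

    p_m_given_s = bayes_posterior(p_m, p_s_given_m, p_s)
    print(p_m_given_s)    # ≈ 0.0002 = 1/5000, so meningitis is not likely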
Some Issues
• p(m|s) could have been found as
      #(m ∩ s) / #s
Questions:
– Which is more reliable to compute, p(s|m) or p(m|s)?
– Which evidence is more sparse, p(s|m) or p(m|s)?
– Test of significance: the counts are always on a sample of the
  population. Which probability count has sufficient statistics?
Fuzziness
Fuzzy Logic
• Models Human Reasoning
• Works with imprecise statements such as:
In a process control situation, “If the
temperature is moderate and the pressure is
high, then turn the knob slightly right”
• The rules have “Linguistic Variables”, typically
adjectives qualified by adverbs (adverbs are
hedges).
Underlying Theory: Theory of Fuzzy
Sets
• Intimate connection between logic and set theory.
• Given any set ‘S’ and an element ‘e’, there is a very
  natural predicate μs(e), called the belongingness predicate.
• The predicate is such that
      μs(e) = 1, iff e ∈ S
            = 0, otherwise
• For example, S = {1, 2, 3, 4}, μs(1) = 1 and μs(5) = 0
• A predicate P(x) also defines a set naturally.
S = {x | P(x) is true}
For example, even(x) defines S = {x | x is even}
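To make both directions concrete, here is a small Python sketch of the crisp belongingness predicate and of a set defined by a predicate; the names belongs and even are illustrative, not from the lecture.

    def belongs(S, e):
        """Crisp belongingness predicate: mu_S(e) = 1 iff e is in S, else 0."""
        return 1 if e in S else 0

    S = {1, 2, 3, 4}
    print(belongs(S, 1))   # 1
    print(belongs(S, 5))   # 0

    # Conversely, a predicate P(x) defines a set: S = {x | P(x) is true}
    def even(x):
        return x % 2 == 0

    S_even = {x for x in range(20) if even(x)}   # a finite slice of {x | x is even}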
Fuzzy Set Theory (contd.)
• Fuzzy set theory starts by questioning the fundamental
assumptions of set theory viz., the belongingness predicate,
μ, value is 0 or 1.
• Instead, in fuzzy set theory it is assumed that
      μs(e) ∈ [0, 1]
• Fuzzy set theory is a generalization of classical set theory
also called Crisp Set Theory.
• In real life belongingness is a fuzzy concept.
Example: Let T = set of “tall” people
μT (Ram) = 1.0
μT (Shyam) = 0.2
Shyam belongs to T with degree 0.2.
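A minimal sketch of this fuzzy generalization, using the Ram/Shyam example; the dictionary encoding of element-to-degree is just one convenient assumption.

    # A fuzzy set over a universe can be stored as element -> membership degree
    T_tall = {"Ram": 1.0, "Shyam": 0.2}   # mu_T(Ram) = 1.0, mu_T(Shyam) = 0.2

    def mu(fuzzy_set, e):
        """Membership degree in [0, 1]; elements not listed have degree 0."""
        return fuzzy_set.get(e, 0.0)

    print(mu(T_tall, "Shyam"))   # 0.2 -> Shyam belongs to T with degree 0.2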
Linguistic Variables
• Fuzzy sets are named by Linguistic Variables
  (typically adjectives).
• Underlying the LV is a numerical quantity.
  E.g. for ‘tall’ (LV), ‘height’ is the numerical quantity.
• The profile of a LV is the plot shown in the figure alongside.

[Figure: profile μtall(h) plotted against height h (ticks 1 to 6),
 membership axis from 0 to 1, with the point (4.5, 0.4) marked]
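As an illustration of such a profile, here is a sketch of one possible membership function for ‘tall’; the piecewise-linear shape and the breakpoints are assumptions for illustration, not the exact curve in the figure.

    def mu_tall(h):
        """Assumed piecewise-linear profile for the LV 'tall' over height h."""
        if h <= 4.0:
            return 0.0
        if h >= 6.0:
            return 1.0
        return (h - 4.0) / 2.0   # rises linearly from 0 at h=4 to 1 at h=6

    print(mu_tall(4.5))   # 0.25 under this assumed profile
    print(mu_tall(5.5))   # 0.75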
Example Profiles
[Figure: profiles of μpoor(w) and μrich(w), each plotted against wealth w]
Example Profiles
[Figure: two profiles of μA(x) against x: one representing ‘moderate’
 (e.g. moderately rich) and one representing ‘extreme’]
Concept of Hedge
• A hedge is an intensifier
• Example:
LV = tall, LV1 = very tall,
LV2 = somewhat tall
• ‘very’ operation:
      μvery tall(x) = (μtall(x))²
• ‘somewhat’ operation:
      μsomewhat tall(x) = √(μtall(x))

[Figure: profiles of μsomewhat tall(h), μtall(h) and μvery tall(h)
 against height h, on a membership axis from 0 to 1]
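A short Python sketch of these two hedge operators, applied to an assumed ‘tall’ profile like the one sketched earlier; any membership function with values in [0, 1] would do.

    import math

    def mu_tall(h):
        """Assumed piecewise-linear profile for 'tall' (same sketch as above)."""
        return max(0.0, min(1.0, (h - 4.0) / 2.0))

    def very(mu):
        """'very' hedge: square the membership value (concentration)."""
        return lambda x: mu(x) ** 2

    def somewhat(mu):
        """'somewhat' hedge: square root of the membership value (dilation)."""
        return lambda x: math.sqrt(mu(x))

    mu_very_tall = very(mu_tall)           # mu_very_tall(x) = (mu_tall(x))^2
    mu_somewhat_tall = somewhat(mu_tall)   # mu_somewhat_tall(x) = sqrt(mu_tall(x))

    print(mu_tall(5.0), mu_very_tall(5.0), mu_somewhat_tall(5.0))
    # 0.5 0.25 0.7071... : 'very tall' is stricter, 'somewhat tall' more permissive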
Representation of Fuzzy sets
Let U = {x1, x2, …, xn}, |U| = n

The various sets composed of elements from U are represented as
points on and inside the n-dimensional hypercube. The crisp sets
are the corners of the hypercube.

[Figure: for U = {x1, x2}, the unit square with corners
 Φ = (0,0), {x1} = (1,0), {x2} = (0,1), {x1, x2} = (1,1);
 the fuzzy set A with μA(x1) = 0.3, μA(x2) = 0.4 is the
 interior point A(0.3, 0.4)]

A fuzzy set A is represented by a point in the n-dimensional space,
namely the point (μA(x1), μA(x2), …, μA(xn)).
Degree of fuzziness
The centre of the hypercube is the “most
fuzzy” set. Fuzziness decreases as one nears
the corners
Measure of fuzziness
Called the entropy of a fuzzy set:

      E(S) = d(S, nearest) / d(S, farthest)

where S is the fuzzy set, ‘nearest’ is the nearest corner of the
hypercube and ‘farthest’ is the farthest corner.

[Figure: unit square with corners (0,0), (1,0), (0,1), (1,1) and
 centre (0.5, 0.5); a fuzzy set A shown with the distances
 d(A, nearest) and d(A, farthest) to its nearest and farthest corners]
Definition
Distance between two fuzzy sets
      d(S1, S2) = Σ(i=1 to n) |μS1(xi) − μS2(xi)|        (L1 norm)

Let C = the fuzzy set represented by the centre point; then
      d(C, nearest) = |0.5 − 1.0| + |0.5 − 0.0| = 1 = d(C, farthest)
      ⇒ E(C) = 1
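A compact Python sketch of the L1 distance and the resulting entropy, representing a fuzzy set as its vector of membership values; the helper names are illustrative, and the nearest/farthest corners are computed by rounding each membership to 0 or 1.

    def l1_distance(s1, s2):
        """L1 distance between two fuzzy sets given as membership vectors."""
        return sum(abs(a - b) for a, b in zip(s1, s2))

    def nearest_corner(s):
        """Nearest crisp set: round each membership to 0 or 1."""
        return [1.0 if m >= 0.5 else 0.0 for m in s]

    def farthest_corner(s):
        """Farthest crisp set: the corner opposite the nearest one."""
        return [0.0 if m >= 0.5 else 1.0 for m in s]

    def entropy(s):
        """E(S) = d(S, nearest) / d(S, farthest)."""
        return l1_distance(s, nearest_corner(s)) / l1_distance(s, farthest_corner(s))

    C = [0.5, 0.5]              # centre of the 2-dimensional hypercube
    print(entropy(C))           # 1.0 -> the "most fuzzy" set
    print(entropy([0.3, 0.4]))  # ≈ 0.54 for the fuzzy set A from the earlier figure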
Definition
Cardinality of a fuzzy set
      m(S) = Σ(i=1 to n) μS(xi)        [generalization of the cardinality of classical sets]
Union, Intersection, Complementation, Subsethood
      μS1∪S2(x) = max[μS1(x), μS2(x)]   ∀x ∈ U
      μS1∩S2(x) = min[μS1(x), μS2(x)]   ∀x ∈ U
      μSc(x) = 1 − μS(x)
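A minimal sketch of the cardinality and of these element-wise operations on membership vectors over a common universe; function names are illustrative, and A reuses the (0.3, 0.4) set from the hypercube figure.

    def cardinality(s):
        """m(S) = sum of membership values (generalizes |S| for crisp sets)."""
        return sum(s)

    def fuzzy_union(s1, s2):
        """mu_{S1 ∪ S2}(x) = max(mu_S1(x), mu_S2(x)) for every x in U."""
        return [max(a, b) for a, b in zip(s1, s2)]

    def fuzzy_intersection(s1, s2):
        """mu_{S1 ∩ S2}(x) = min(mu_S1(x), mu_S2(x)) for every x in U."""
        return [min(a, b) for a, b in zip(s1, s2)]

    def fuzzy_complement(s):
        """mu_{S^c}(x) = 1 - mu_S(x)."""
        return [1.0 - m for m in s]

    A = [0.3, 0.4]
    B = [0.6, 0.1]
    print(cardinality(A))            # ≈ 0.7 (up to float rounding)
    print(fuzzy_union(A, B))         # [0.6, 0.4]
    print(fuzzy_intersection(A, B))  # [0.3, 0.1]
    print(fuzzy_complement(A))       # ≈ [0.7, 0.6]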
Note on definition by extension and intension
S1 = {xi | xi mod 2 = 0}  – intension
S2 = {0, 2, 4, 6, 8, 10, …}  – extension
How to define subsethood?