For Friday
• Read chapter 14, sections 1-3
• No homework
• Program 2 due
Program 2
• Any questions?
• Special rules for handling belief:
– If I believe something, I believe that I believe
it.
– Still need to provide a way to indicate that
two names refer to the same thing.
Knowledge and Belief
• How are they related?
• Knowing whether something is true
• Knowing what
And Besides Logic?
• Semantic networks
• Frames
Semantic Networks
• Use graphs to represent concepts and the
relations between them.
• Simplest networks are ISA hierarchies
• Must be careful to make a type/token
distinction:
Garfield isa Cat      Cat(Garfield)
Cat isa Feline        ∀x (Cat(x) ⇒ Feline(x))
• Restricted shorthand for a logical
representation.
Semantic Nets/Frames
• Labeled links can represent arbitrary
relations between objects and/or concepts.
• Nodes with links can also be viewed as
frames with slots that point to other objects
and/or concepts.
First Order Representation
Rel(Alive,Animals,T)
Rel(Flies,Animals,F)
Birds ⊂ Animals
Mammals ⊂ Animals
Rel(Flies,Birds,T)
Rel(Legs,Birds,2)
Rel(Legs,Mammals,4)
Penguins ⊂ Birds
Cats ⊂ Mammals
Bats ⊂ Mammals
Rel(Flies,Penguins,F)
Rel(Legs,Bats,2)
Rel(Flies,Bats,T)
Opus ∈ Penguins
Bill ∈ Cats
Pat ∈ Bats
Name(Opus,"Opus")
Name(Bill,"Bill")
Friend(Opus,Bill)
Friend(Bill,Opus)
Name(Pat,"Pat")
Inheritance
• Inheritance is a specific type of inference that allows
properties of objects to be inferred from properties of
categories to which the object belongs.
– Is Bill alive?
– Yes, since Bill is a cat, cats are mammals, mammals are animals,
and animals are alive.
• Such inference can be performed by a simple graph
traversal algorithm and implemented very efficiently.
• However, it is basically a form of logical inference
"x (Cat(x)  Mammal(x))
"x (Mammal(x)  Animal(x))
"x (Animal(x)  Alive(x))
Cat(Bill)
|- Alive(Bill)
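For illustration, here is a minimal Python sketch of that graph-traversal
inference over the example KB above; the dictionary encoding and the
lookup function are illustrative choices, not part of the slides.

isa = {"Opus": "Penguins", "Bill": "Cats", "Pat": "Bats",
       "Penguins": "Birds", "Cats": "Mammals", "Bats": "Mammals",
       "Birds": "Animals", "Mammals": "Animals"}

props = {"Animals":  {"Alive": True, "Flies": False},
         "Birds":    {"Flies": True, "Legs": 2},
         "Mammals":  {"Legs": 4},
         "Penguins": {"Flies": False},
         "Bats":     {"Flies": True, "Legs": 2}}

def lookup(node, relation):
    # Walk up the isa links; the first node that asserts the
    # relation supplies the value.
    while node is not None:
        if relation in props.get(node, {}):
            return props[node][relation]
        node = isa.get(node)
    return None

print(lookup("Bill", "Alive"))   # True: Bill -> Cats -> Mammals -> Animals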
Backward or Forward
• Can work either way
• Either can be inefficient
• Usually depends on branching factors
Semantics of Links
• Must be careful to distinguish different
types of links.
• Links between two tokens, links between two
types, and links between a token and a type
all have different semantics.
Link Types
Link Type                  Semantics                        Example
A --subset--> B            A ⊂ B                            Cats ⊂ Mammals
A --member--> B            A ∈ B                            Bill ∈ Cats
A --R--> B                 R(A,B)                           Bill --Age--> 12
A --R--> B (A a class)     ∀x, x ∈ A ⇒ R(x,B)               Birds --Legs--> 2
A --R--> B (A, B classes)  ∀x ∃y, x ∈ A ⇒ y ∈ B ∧ R(x,y)    Birds --Parent--> Birds
Inheritance with Exceptions
• Information specified for a type gives the
default value for a relation, but this may be
overridden by a more specific type.
– Tweety is a bird. Does Tweety fly?
Birds fly. Yes.
– Opus is a penguin. Does Opus fly?
Penguins don't fly. No.
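The traversal sketch given after the inheritance slide already behaves
this way: because the walk starts at the most specific node, a value
asserted on Penguins is found before, and therefore overrides, the
default asserted on Birds.

print(lookup("Opus", "Flies"))   # False: Penguins overrides Birds
print(lookup("Pat", "Flies"))    # True: Bats overrides Animals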
Multiple Inheritance
• If hierarchy is not a tree but a directed
acyclic graph (DAG) then different
inheritance paths may result in different
defaults being inherited.
• Nixon Diamond: Nixon is both a Quaker
(pacifist by default) and a Republican (not a
pacifist by default), so the two inheritance
paths yield conflicting defaults.
Nonmonotonicity
• In normal monotonic logic, adding more
sentences to a KB only entails more
conclusions.
if KB |- P then KB ∪ {S} |- P
• Inheritance with exceptions is not
monotonic (it is nonmonotonic)
– Bird(Opus)
– Fly(Opus)? yes
– Penguin(Opus)
– Fly(Opus)? no
• Nonmonotonic logics attempt to formalize
default reasoning by allowing default rules of
the form:
– If P, and concluding Q is consistent, then
conclude Q.
– If Bird(x), then if Fly(x) is consistent, conclude Fly(x)
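In Reiter's default logic, for example, such a rule is written with the
consistency condition between the prerequisite and the conclusion:

Bird(x) : Fly(x)
----------------
     Fly(x)

read as: if Bird(x) holds, and Fly(x) is consistent with everything else
believed, conclude Fly(x).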
Defaults with Negation as Failure
• Prolog negation as failure can be used to
implement default inference.
fly(X) :- bird(X), not(ab(X)).
ab(X) :- penguin(X).
ab(X) :- ostrich(X).
bird(opus).
?- fly(opus).
Yes
penguin(opus).
?- fly(opus).
No
Uncertainty
• Everyday reasoning and decision making is
based on uncertain evidence and inferences.
• Classical logic only allows conclusions to
be strictly true or strictly false
• We need to account for this uncertainty and
the need to weigh and combine conflicting
evidence.
Coping with Uncertainty
• Straightforward application of probability theory
is impractical since the large number of
conditional probabilities required is rarely, if
ever, available.
• Therefore, early expert systems employed fairly
ad hoc methods for reasoning under uncertainty
and for combining evidence.
• Recently, methods more rigorously founded in
probability theory that attempt to decrease the
number of conditional probabilities required have
flourished.
Probability
• Probabilities are real numbers between 0 and 1
representing the a priori likelihood that a
proposition is true.
P(Cold) = 0.1
P(¬Cold) = 0.9
• Probabilities can also be assigned to all values
of a random variable (continuous or discrete)
with a specific range of values (domain), e.g.
low, normal, high.
P(temperature=normal)=0.99
P(temperature=98.6) = 0.99
Probability Vectors
• The vector form gives probabilities for all
values of a discrete variable, or its
probability distribution.
P(temperature) = <0.002, 0.99, 0.008>
• This gives the prior distribution, assumed
before any other information is known.
Conditional Probability
• Conditional probability specifies the
probability given that the values of some
other random variables are known.
P(Sneeze | Cold) = 0.8
P(Cold | Sneeze) = 0.6
• The probability of a sneeze given a cold is
80%.
• The probability of a cold given a sneeze is
60%.
Cond. Probability cont.
• Assumes that the given information is all that is
known, so all known information must be
given.
P(Sneeze | Cold ∧ Allergy) = 0.95
• Also allows for conditional distributions
P(X | Y) gives a 2-D array of values for all P(X=xi | Y=yj)
• Defined as
P(A | B) = P(A ∧ B) / P(B)
Axioms of Probability Theory
• All probabilities are between 0 and 1.
0 ≤ P(A) ≤ 1
• Necessarily true propositions have probability 1,
necessarily false have probability 0.
P(true) = 1
P(false) = 0
• The probability of a disjunction is given by
P(A ∨ B) = P(A) + P(B) - P(A ∧ B)
Joint Probability Distribution
• The joint probability distribution for a set of random
variables X1…Xn gives the probability of every
combination of values (an n-dimensional array with v^n
values if each variable has v values)
P(X1,...,Xn)
          Sneeze   ¬Sneeze
Cold      0.08     0.01
¬Cold     0.01     0.9
• The probability of all possible cases (assignments of values
to some subset of variables) can be calculated by summing
the appropriate subset of values from the joint distribution.
• All conditional probabilities can therefore also be
calculated
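A small Python sketch of these calculations over the joint table above
(the tuple encoding and variable names are illustrative choices):

joint = {(True, True): 0.08,    # (cold, sneeze)
         (True, False): 0.01,
         (False, True): 0.01,
         (False, False): 0.9}

# Marginals: sum out the other variable.
p_sneeze = sum(p for (c, s), p in joint.items() if s)   # 0.09
p_cold   = sum(p for (c, s), p in joint.items() if c)   # 0.09

# Conditional: P(Cold | Sneeze) = P(Cold ∧ Sneeze) / P(Sneeze)
p_cold_given_sneeze = joint[(True, True)] / p_sneeze    # ≈ 0.89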
Bayes Theorem
P(H | e) = P(e | H) P(H) / P(e)
• Follows from definition of conditional
probability:
P(A | B) = P(A ∧ B) / P(B)
Other Basic Theorems
• If events A and B are independent then:
P(A ∧ B) = P(A)P(B)
• If events A and B are incompatible then:
P(A ∨ B) = P(A) + P(B)
Simple Bayesian Reasoning
• If we assume there are n possible disjoint
diagnoses, d1 … dn
P(di | e) = P(e | di) P(di) / P(e)
• P(e) may not be known, but the posterior
probabilities of all the diagnoses must sum
to 1
• Thus, we can determine the most probable
diagnosis without knowing P(e).
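Concretely, since the diagnoses are disjoint and exhaustive, P(e) is
just a normalizing constant:

P(e) = P(e | d1) P(d1) + … + P(e | dn) P(dn)

so the products P(e | di) P(di) can be computed first and then scaled
to sum to 1.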
Efficiency
• This method requires knowing, for each disease,
the probability that it will cause every possible
combination of symptoms, and the number of
possible symptom sets, e, is exponential in the
number of basic symptoms.
• This huge amount of data is usually not
available.
Bayesian Reasoning with Independence
(“Naïve” Bayes)
• If we assume that each piece of evidence (symptom) is
independent given the diagnosis (conditional independence),
then given evidence e as a sequence {e1,e2,…,ed} of
observations, P(e | di) is the product of the probabilities of
the individual observations given di, as written below.
• The conditional probability of each individual symptom for
each possible diagnosis can then be computed from a set of data
or estimated by the expert.
• However, symptoms are usually not independent and frequently
correlate, in which case the assumptions of this simple model
are violated and it is not guaranteed to give reasonable results.
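In symbols, with e = {e1, e2, …, ed}:

P(e | di) = P(e1 | di) P(e2 | di) … P(ed | di)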
Bayes Independence Example
• Imagine there are diagnoses ALLERGY, COLD,
and WELL and symptoms SNEEZE, COUGH,
and FEVER
Prob      P(d)    P(sneeze|d)   P(cough|d)   P(fever|d)
Well      0.9     0.1           0.1          0.01
Cold      0.05    0.9           0.8          0.7
Allergy   0.05    0.9           0.7          0.4
• If symptoms sneeze & cough & no fever:
P(well | e) = (0.9)(0.1)(0.1)(0.99)/P(e) = 0.0089/P(e)
P(cold | e) = (.05)(0.9)(0.8)(0.3)/P(e) = 0.01/P(e)
P(allergy | e) = (.05)(0.9)(0.7)(0.6)/P(e) = 0.019/P(e)
• Diagnosis: allergy
P(e) = .0089 + .01 + .019 = .0379
P(well | e) = .23
P(cold | e) = .26
P(allergy | e) = .50
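The arithmetic above can be reproduced with a short Python sketch
(table values are from the slide; note the slide rounds the products
before normalizing, so the exact posteriors differ slightly in the
second decimal place):

priors   = {"well": 0.9,  "cold": 0.05, "allergy": 0.05}
p_sneeze = {"well": 0.1,  "cold": 0.9,  "allergy": 0.9}
p_cough  = {"well": 0.1,  "cold": 0.8,  "allergy": 0.7}
p_fever  = {"well": 0.01, "cold": 0.7,  "allergy": 0.4}

# Evidence: sneeze ∧ cough ∧ ¬fever
unnorm = {d: priors[d] * p_sneeze[d] * p_cough[d] * (1 - p_fever[d])
          for d in priors}
p_e = sum(unnorm.values())                     # ≈ 0.0386
posterior = {d: p / p_e for d, p in unnorm.items()}
print(max(posterior, key=posterior.get))       # allergy, with P ≈ 0.49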
Problems with Probabilistic Reasoning
• If no assumptions of independence are made,
then an exponential number of parameters is
needed for sound probabilistic reasoning.
• There is almost never enough data or
patience to reliably estimate so many very
specific parameters.
• If a blanket assumption of conditional
independence is made, efficient probabilistic
reasoning is possible, but such a strong
assumption is rarely warranted.
Practical Naïve Bayes
• We’re going to assume independence, so
what numbers do we need?
• Where do the numbers come from?
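One common answer, sketched below under the assumption that a set of
labeled cases is available: estimate each number from counts, with
Laplace (add-one) smoothing so that a symptom never seen with a
diagnosis does not get probability zero. The data format and the
estimate function are hypothetical, not from the slides.

from collections import Counter

def estimate(cases):
    # cases: list of (diagnosis, symptoms) pairs, e.g.
    #   ("cold", {"sneeze": True, "cough": True, "fever": False})
    n = len(cases)
    diag_counts = Counter(d for d, _ in cases)
    priors = {d: c / n for d, c in diag_counts.items()}
    symptoms = cases[0][1].keys()
    cond = {}
    for d, nd in diag_counts.items():
        for s in symptoms:
            ns = sum(1 for dd, sym in cases if dd == d and sym[s])
            # Laplace smoothing: add 1 to the count, 2 to the total.
            cond[(s, d)] = (ns + 1) / (nd + 2)
    return priors, cond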