Bayesian models of inductive learning


Part II: Graphical models
Challenges of probabilistic models
• Specifying well-defined probabilistic models with many variables is hard (for modelers)
• Representing probability distributions over those variables is hard (for computers/learners)
• Computing quantities using those distributions is hard (for computers/learners)
Representing structured distributions
Four random variables:
• X1: coin toss produces heads (domain {0,1})
• X2: pencil levitates (domain {0,1})
• X3: friend has psychic powers (domain {0,1})
• X4: friend has two-headed coin (domain {0,1})
Joint distribution
• Requires 15 numbers to specify the probability of all values x1, x2, x3, x4
– N binary variables require 2^N − 1 numbers
• Similar cost when computing conditional probabilities:
$P(x_2, x_3, x_4 \mid x_1 = 1) = \frac{P(x_1 = 1, x_2, x_3, x_4)}{P(x_1 = 1)}$
[Table: all 16 joint value assignments, 0000 through 1111]
How can we use fewer numbers?
Four random variables:
• X1: coin toss produces heads (domain {0,1})
• X2: coin toss produces heads (domain {0,1})
• X3: coin toss produces heads (domain {0,1})
• X4: coin toss produces heads (domain {0,1})
Statistical independence
• Two random variables X1 and X2 are independent if P(x1|x2) = P(x1)
– e.g., coin flips: P(x1=H|x2=H) = P(x1=H) = 0.5
• Independence makes it easier to represent and work with probability distributions
• We can exploit the product rule:
$P(x_1, x_2, x_3, x_4) = P(x_1 \mid x_2, x_3, x_4)\,P(x_2 \mid x_3, x_4)\,P(x_3 \mid x_4)\,P(x_4)$
If x1, x2, x3, and x4 are all independent…
$P(x_1, x_2, x_3, x_4) = P(x_1)\,P(x_2)\,P(x_3)\,P(x_4)$
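To make the savings concrete, here is a minimal Python sketch (not from the slides) counting the parameters needed in each case:

```python
# Parameters needed to specify a distribution over N binary variables.
N = 4

# Full joint distribution: one probability per assignment, minus one
# because the probabilities must sum to 1.
full_joint = 2**N - 1       # 15 for N = 4

# If the variables are all independent, one number per variable suffices.
independent = N             # 4 for N = 4

print(full_joint, independent)
```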
Expressing independence
• Statistical independence is the key to efficient
probabilistic representation and computation
• This has led to the development of languages for
indicating dependencies among variables
• Some of the most popular languages are based on
“graphical models”
Part II: Graphical models
• Introduction to graphical models
– representation and inference
• Causal graphical models
– causality
– learning about causal relationships
• Graphical models and cognitive science
– uses of graphical models
– an example: causal induction
Graphical models
• Express the probabilistic dependency
structure among a set of variables (Pearl, 1988)
• Consist of
– a set of nodes, corresponding to variables
– a set of edges, indicating dependency
– a set of functions defined on the graph that
specify a probability distribution
Undirected graphical models
• Consist of
– a set of nodes (e.g., X1, X2, X3, X4, X5)
– a set of edges
– a potential for each clique, multiplied together to yield the distribution over variables
• Examples
– statistical physics: Ising model, spin glasses
– early neural networks (e.g. Boltzmann machines)
Directed graphical models
• Consist of
– a set of nodes (e.g., X1, X2, X3, X4, X5)
– a set of edges
– a conditional probability distribution for each node, conditioned on its parents, multiplied together to yield the distribution over variables
• Constrained to directed acyclic graphs (DAGs)
• Called Bayesian networks or Bayes nets
Bayesian networks and Bayes
• Two different problems
– Bayesian statistics is a method of inference
– Bayesian networks are a form of representation
• There is no necessary connection
– many users of Bayesian networks rely upon
frequentist statistical methods
– many Bayesian inferences cannot be easily
represented using Bayesian networks
Properties of Bayesian networks
• Efficient representation and inference
– exploiting dependency structure makes it easier
to represent and compute with probabilities
• Explaining away
– pattern of probabilistic reasoning characteristic of
Bayesian networks, especially early use in AI
Efficient representation and inference
Four random variables:
• X1: coin toss produces heads
• X2: pencil levitates
• X3: friend has psychic powers
• X4: friend has two-headed coin
[Graph: X3 → X1 ← X4, X3 → X2, with CPDs P(x3), P(x4), P(x1|x3,x4), P(x2|x3)]
The Markov assumption
Every node is conditionally independent of its nondescendants, given its parents
P(xi | xi1,...,xk )  P(xi | Pa(X i ))
where Pa(Xi) is the set of parents of Xi
k
P(x1,...,x k )   P(x i | Pa(X i ))
i1
(via the product rule)
Efficient representation and inference
Four random variables:
• X1: coin toss produces heads
• X2: pencil levitates
• X3: friend has psychic powers
• X4: friend has two-headed coin
[Graph: X3 → X1 ← X4, X3 → X2. Numbers required: 1 for P(x3), 1 for P(x4), 2 for P(x2|x3), 4 for P(x1|x3,x4); total = 8 (vs 15)]
P(x1, x2, x3, x4) = P(x1|x3, x4)P(x2|x3)P(x3)P(x4)
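As a concrete illustration, here is a small Python sketch of this network. The CPT values are made up; only the structure and factorization come from the slide:

```python
# Illustrative CPTs for the psychic-friend network (values are made up).
p_x3 = 0.01                        # P(x3 = 1): friend has psychic powers
p_x4 = 0.01                        # P(x4 = 1): friend has two-headed coin
p_x2_given_x3 = {0: 0.0, 1: 0.9}   # P(x2 = 1 | x3): pencil levitates
p_x1_given_x3_x4 = {               # P(x1 = 1 | x3, x4): coin comes up heads
    (0, 0): 0.5, (0, 1): 1.0,      # fair coin vs. two-headed coin
    (1, 0): 0.9, (1, 1): 1.0,      # psychic influence (illustrative)
}

def bernoulli(p, x):
    """P(X = x) for a binary variable with P(X = 1) = p."""
    return p if x == 1 else 1 - p

def joint(x1, x2, x3, x4):
    """P(x1, x2, x3, x4) = P(x1|x3,x4) P(x2|x3) P(x3) P(x4)."""
    return (bernoulli(p_x1_given_x3_x4[(x3, x4)], x1)
            * bernoulli(p_x2_given_x3[x3], x2)
            * bernoulli(p_x3, x3)
            * bernoulli(p_x4, x4))
```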
Reading a Bayesian network
• The structure of a Bayes net can be read as the
generative process behind a distribution
• Gives the joint probability distribution over
variables obtained by sampling each variable
conditioned on its parents
Reading a Bayesian network
Four random variables:
• X1: coin toss produces heads
• X2: pencil levitates
• X3: friend has psychic powers
• X4: friend has two-headed coin
[Graph: X3 → X1 ← X4, X3 → X2, with CPDs P(x3), P(x4), P(x1|x3,x4), P(x2|x3)]
P(x1, x2, x3, x4) = P(x1|x3, x4)P(x2|x3)P(x3)P(x4)
Reading a Bayesian network
• The structure of a Bayes net can be read as the
generative process behind a distribution
• Gives the joint probability distribution over
variables obtained by sampling each variable
conditioned on its parents
• Simple rules for determining whether two variables
are dependent or independent
• Independence makes inference more efficient
Computing with Bayes nets
P(x1  1, x 2, x 3 , x 4 )
P(x 2, x 3 , x 4 | x1  1) 
P(x1  1)
P(x4) X4
P(x1|x3, x4)
X3 P(x3)
X1
X2 P(x2|x3)
P(x1, x2, x3, x4) = P(x1|x3, x4)P(x2|x3)P(x3)P(x4)

Computing with Bayes nets
P(x1 1) 
P(x
1
x2 ,x3 ,x4
P(x4) X4
P(x1|x3, x4)
1, x2, x3, x4 )
sum over 8 values
X3 P(x3)
X1
X2 P(x2|x3)
P(x1, x2, x3, x4) = P(x1|x3, x4)P(x2|x3)P(x3)P(x4)
Computing with Bayes nets
P(x1 1) 
P(x
1
| x3, x4 )P(x2 | x3 )P(x3 )P(x4 )
x2 ,x3 ,x4
P(x4) X4
P(x1|x3, x4)
X3 P(x3)
X1
X2 P(x2|x3)
P(x1, x2, x3, x4) = P(x1|x3, x4)P(x2|x3)P(x3)P(x4)
Computing with Bayes nets
P(x1 1) 
P(x
1
x3 ,x4
sum over 4 values
P(x4) X4
P(x1|x3, x4)
| x3, x4 )P(x3 )P(x4 )
X3 P(x3)
X1
X2 P(x2|x3)
P(x1, x2, x3, x4) = P(x1|x3, x4)P(x2|x3)P(x3)P(x4)
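Continuing the sketch above (it reuses joint, bernoulli, and the CPTs defined there), the two summations look like this; the structured version gives the same answer with half the terms:

```python
from itertools import product

# Naive: sum the full joint over x2, x3, x4 (8 terms).
p_naive = sum(joint(1, x2, x3, x4)
              for x2, x3, x4 in product([0, 1], repeat=3))

# Exploiting structure: sum_{x2} P(x2|x3) = 1, so x2 drops out (4 terms).
p_structured = sum(bernoulli(p_x1_given_x3_x4[(x3, x4)], 1)
                   * bernoulli(p_x3, x3)
                   * bernoulli(p_x4, x4)
                   for x3, x4 in product([0, 1], repeat=2))

assert abs(p_naive - p_structured) < 1e-12
print(p_naive)   # P(x1 = 1)
```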
Computing with Bayes nets
• Inference algorithms for Bayesian networks exploit
dependency structure
• Message-passing algorithms
– “belief propagation” passes simple messages between
nodes, exact for tree-structured networks
• More general inference algorithms
– exact: “junction-tree”
– approximate: Monte Carlo schemes (see Part IV)
Properties of Bayesian networks
• Efficient representation and inference
– exploiting dependency structure makes it easier
to represent and compute with probabilities
• Explaining away
– pattern of probabilistic reasoning characteristic of
Bayesian networks, especially early use in AI
Explaining away
[Graph: Rain → Grass Wet ← Sprinkler]
$P(R, S, W) = P(R)\,P(S)\,P(W \mid S, R)$
Assume grass will be wet if and only if it rained last night, or if the sprinklers were left on:
$P(w \mid S, R) = 1$ if $S = s$ or $R = r$, and $= 0$ if $R = \bar{r}$ and $S = \bar{s}$.
Compute the probability it rained last night, given that the grass is wet:
$$P(r \mid w) = \frac{P(w \mid r)\,P(r)}{P(w)} = \frac{P(w \mid r)\,P(r)}{\sum_{r', s'} P(w \mid r', s')\,P(r', s')}$$
$$= \frac{P(r)}{P(r, s) + P(r, \bar{s}) + P(\bar{r}, s)} = \frac{P(r)}{P(r) + P(\bar{r}, s)} = \frac{P(r)}{P(r) + P(\bar{r})\,P(s)} \geq P(r)$$
(the denominator lies between P(s) and 1, so observing wet grass raises the probability of rain)
Now compute the probability it rained last night, given that the grass is wet and the sprinklers were left on:
$$P(r \mid w, s) = \frac{P(w \mid r, s)\,P(r \mid s)}{P(w \mid s)} = P(r \mid s) = P(r)$$
(both $P(w \mid r, s)$ and $P(w \mid s)$ equal 1, and R and S are independent a priori)
Observing the sprinkler "discounts" rain back to its prior probability.
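A quick numerical check of the derivation, with made-up priors for rain and sprinkler:

```python
# Explaining away in the rain/sprinkler network (priors are made up).
p_r, p_s = 0.3, 0.5    # P(rain), P(sprinkler)

# Grass is wet iff it rained or the sprinkler was on, so:
# P(r | w) = P(r) / (P(r) + P(not-r) P(s))
p_r_given_w = p_r / (p_r + (1 - p_r) * p_s)

# Once the sprinkler is also observed, it fully explains the wet grass:
# P(r | w, s) = P(r)
p_r_given_ws = p_r

print(p_r, p_r_given_w, p_r_given_ws)   # 0.3, ~0.46, 0.3
```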
Contrast w/ production system
[Graph: Rain → Grass Wet ← Sprinkler]
• Formulate IF-THEN rules:
– IF Rain THEN Wet
– IF Wet THEN Rain
(or: IF Wet AND NOT Sprinkler THEN Rain)
• Rules do not distinguish directions of inference
• Requires combinatorial explosion of rules
Contrast w/ spreading activation
[Graph: Rain → Grass Wet ← Sprinkler]
• Excitatory links: Rain → Wet, Sprinkler → Wet
• Observing rain, Wet becomes more active.
• Observing grass wet, Rain and Sprinkler become
more active
• Observing grass wet and sprinkler, Rain cannot
become less active. No explaining away!
Contrast w/ spreading activation
[Graph: Rain → Grass Wet ← Sprinkler]
• Excitatory links: Rain → Wet, Sprinkler → Wet
• Inhibitory link between Rain and Sprinkler
• Observing grass wet, Rain and Sprinkler become
more active
• Observing grass wet and sprinkler, Rain becomes
less active: explaining away
Contrast w/ spreading activation
[Graph: Rain → Grass Wet, Burst pipe → Grass Wet, Sprinkler → Grass Wet]
• Each new variable requires more inhibitory connections
• Not modular
– whether a connection exists depends on what others exist
– big holism problem
– combinatorial explosion
Contrast w/ spreading activation
[Figure: interactive activation model of letter perception (McClelland & Rumelhart, 1981)]
Graphical models
• Capture dependency structure in distributions
• Provide an efficient means of representing
and reasoning with probabilities
• Allow kinds of inference that are problematic
for other representations: explaining away
– hard to capture in a production system
– more natural than with spreading activation
Part II: Graphical models
• Introduction to graphical models
– representation and inference
• Causal graphical models
– causality
– learning about causal relationships
• Graphical models and cognitive science
– uses of graphical models
– an example: causal induction
Causal graphical models
• Graphical models represent statistical dependencies among variables (i.e., correlations)
– can answer questions about observations
• Causal graphical models represent causal
dependencies among variables
(Pearl, 2000)
– express underlying causal structure
– can answer questions about both observations and
interventions (actions upon a variable)
Bayesian networks
Nodes: variables. Links: statistical dependency. Each node has a conditional probability distribution.
Four random variables:
• X1: coin toss produces heads
• X2: pencil levitates
• X3: friend has psychic powers
• X4: friend has two-headed coin
Data: observations of x1, ..., x4
[Graph: X3 → X1 ← X4, X3 → X2, with CPDs P(x3), P(x4), P(x1|x3,x4), P(x2|x3)]
Causal Bayesian networks
Nodes: variables. Links: causality. Each node has a conditional probability distribution.
Four random variables:
• X1: coin toss produces heads
• X2: pencil levitates
• X3: friend has psychic powers
• X4: friend has two-headed coin
Data: observations of and interventions on x1, ..., x4
[Graph: X3 → X1 ← X4, X3 → X2, with CPDs P(x3), P(x4), P(x1|x3,x4), P(x2|x3)]
Interventions
Four random variables: X1 (coin toss produces heads), X2 (pencil levitates), X3 (friend has psychic powers), X4 (friend has two-headed coin)
• Cut all incoming links for the node that we intervene on
• Compute probabilities with the "mutilated" Bayes net
Intervention: hold down pencil (intervene on X2)
[Graph: X3 → X1 ← X4; the link X3 → X2 is cut]
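Continuing the earlier Python sketch (names and CPT values remain illustrative), graph surgery for an intervention on X2 just drops the P(x2|x3) factor:

```python
from itertools import product

def joint_do_x2(x1, x3, x4):
    """P(x1, x3, x4 | do(x2)) in the mutilated network: the X3 -> X2
    link is cut, so the factor P(x2 | x3) disappears."""
    return (bernoulli(p_x1_given_x3_x4[(x3, x4)], x1)
            * bernoulli(p_x3, x3)
            * bernoulli(p_x4, x4))

# Under intervention, X2 carries no evidence about X3:
p_x3_after_do = sum(joint_do_x2(x1, 1, x4)
                    for x1, x4 in product([0, 1], repeat=2))
assert abs(p_x3_after_do - p_x3) < 1e-12   # equals the prior P(x3 = 1)
```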
Learning causal graphical models
[Graphs: h1: B → E ← C; h0: B → E only]
• Strength: how strong is a relationship?
• Structure: does a relationship exist?
Causal structure vs. causal strength
[Graphs: h1: B → E ← C; h0: B → E only]
• Strength: how strong is a relationship?
Causal structure vs. causal strength
[Graphs: h1: B → E ← C, with strengths w0 (B) and w1 (C); h0: B → E, with strength w0]
• Strength: how strong is a relationship?
– requires defining nature of relationship
Parameterization
• Structures: h1 = [B → E ← C]; h0 = [B → E only]
• Parameterization (generic):

C  B   h1: P(E = 1 | C, B)   h0: P(E = 1 | C, B)
0  0          p00                  p0
1  0          p10                  p0
0  1          p01                  p1
1  1          p11                  p1
Parameterization
• Structures: h1 = [B → E ← C, with strengths w0 and w1]; h0 = [B → E, with strength w0]
w0, w1: strength parameters for B, C
• Parameterization (linear):

C  B   h1: P(E = 1 | C, B)   h0: P(E = 1 | C, B)
0  0          0                    0
1  0          w1                   0
0  1          w0                   w0
1  1          w1 + w0              w0
Parameterization
• Structures: h1 = [B → E ← C, with strengths w0 and w1]; h0 = [B → E, with strength w0]
w0, w1: strength parameters for B, C
• Parameterization ("noisy-OR"):

C  B   h1: P(E = 1 | C, B)   h0: P(E = 1 | C, B)
0  0          0                    0
1  0          w1                   0
0  1          w0                   w0
1  1          w1 + w0 − w1w0       w0
Parameter estimation
• Maximum likelihood estimation: maximize $\prod_i P(b_i, c_i, e_i; w_0, w_1)$
• Bayesian methods: as in Part I
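A sketch of maximum likelihood estimation under the noisy-OR parameterization, using a crude grid search over strengths. The data and grid are illustrative; since P(b, c) does not depend on w0 and w1, maximizing the product of P(e | b, c) terms is equivalent:

```python
import numpy as np

def noisy_or(b, c, w0, w1):
    """P(e = 1 | b, c) under noisy-OR: 1 - (1 - w0)^b (1 - w1)^c."""
    return 1 - (1 - w0)**b * (1 - w1)**c

def log_likelihood(data, w0, w1):
    """Sum of log P(e_i | b_i, c_i; w0, w1) over (b, c, e) triples."""
    total = 0.0
    for b, c, e in data:
        p = noisy_or(b, c, w0, w1)
        total += np.log(p if e == 1 else 1 - p)
    return total

# Illustrative data: the background cause B is always present.
data = [(1, 1, 1), (1, 1, 1), (1, 0, 1), (1, 0, 0), (1, 1, 0)]

grid = np.linspace(0.01, 0.99, 99)
w0_hat, w1_hat = max(((w0, w1) for w0 in grid for w1 in grid),
                     key=lambda w: log_likelihood(data, *w))
print(w0_hat, w1_hat)
```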
Causal structure vs. causal strength
[Graphs: h1: B → E ← C; h0: B → E only]
• Structure: does a relationship exist?
Approaches to structure learning
• Constraint-based:
– dependency from statistical tests (e.g., χ²)
– deduce structure from dependencies
– attempts to reduce an inductive problem to a deductive problem
(Pearl, 2000; Spirtes et al., 1993)
• Bayesian:
– compute the posterior probability of structures, given observed data:
P(h|data) ∝ P(data|h) P(h)
[Graphs: h1: B → E ← C, with posterior P(h1|data); h0: B → E, with posterior P(h0|data)]
(Heckerman, 1998; Friedman, 1999)
Bayesian Occam’s Razor
[Figure: P(d | h) plotted over all possible data sets d, for h0 (no relationship) and h1 (relationship)]
For any model h, $\sum_d P(d \mid h) = 1$
Causal graphical models
• Extend graphical models to deal with
interventions as well as observations
• Respecting the direction of causality results
in efficient representation and inference
• Two steps in learning causal models
– strength: parameter estimation
– structure: structure learning
Part II: Graphical models
• Introduction to graphical models
– representation and inference
• Causal graphical models
– causality
– learning about causal relationships
• Graphical models and cognitive science
– uses of graphical models
– an example: causal induction
Uses of graphical models
• Understanding existing cognitive models
– e.g., neural network models
• Representation and reasoning
– a way to address holism in induction (cf. Fodor)
• Defining generative models
– mixture models, language models (see Part IV)
• Modeling human causal reasoning
Human causal reasoning
• How do people reason about interventions?
(Gopnik, Glymour, Sobel, Schulz, Kushnir & Danks, 2004;
Lagnado & Sloman, 2004; Sloman & Lagnado, 2005;
Steyvers, Tenenbaum, Wagenmakers & Blum, 2003)
• How do people learn about causal relationships?
– parameter estimation
(Shanks, 1995; Cheng, 1997)
– constraint-based models
(Glymour, 2001)
– Bayesian structure learning
(Steyvers et al., 2003; Griffiths & Tenenbaum, 2005)
Causation from contingencies
                C present (c+)   C absent (c-)
E present (e+)        a                c
E absent (e-)         b                d
“Does C cause E?”
(rate on a scale from 0 to 100)
Two models of causal judgment
• ΔP (Jenkins & Ward, 1965):
$\Delta P = P(e^+ \mid c^+) - P(e^+ \mid c^-) = \frac{a}{a+b} - \frac{c}{c+d}$
• Power PC (Cheng, 1997):
$\mathrm{power} = \frac{\Delta P}{1 - P(e^+ \mid c^-)} = \frac{\Delta P}{d/(c+d)}$
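Both statistics are easy to compute from the contingency table; a small sketch with illustrative counts:

```python
def delta_p(a, b, c, d):
    """Delta-P: P(e+|c+) - P(e+|c-) from contingency counts."""
    return a / (a + b) - c / (c + d)

def causal_power(a, b, c, d):
    """Power PC (Cheng, 1997): Delta-P / (1 - P(e+|c-))."""
    return delta_p(a, b, c, d) / (1 - c / (c + d))

# Illustrative counts: e+ on 6/8 trials with C present, 2/8 with C absent.
print(delta_p(6, 2, 2, 6))        # 0.5
print(causal_power(6, 2, 2, 6))   # ~0.667
```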
Buehner and Cheng (1997)
[Figure: human judgments alongside ΔP and causal power predictions, for contingencies with ΔP of 0.00, 0.25, 0.50, 0.75, and 1.00. Key patterns: constant ΔP, changing judgments; constant causal power, changing judgments; ΔP = 0, changing judgments.]
Causal structure vs. causal strength
[Graphs: h1: B → E ← C, with strengths w0 (B) and w1 (C); h0: B → E, with strength w0]
• Strength: how strong is a relationship?
• Structure: does a relationship exist?
Causal strength
• Assume the structure h1 = [B → E ← C, with strengths w0 and w1]
• ΔP and causal power are maximum likelihood estimates of the strength parameter w1, under different parameterizations for P(E|B,C):
– linear gives ΔP; noisy-OR gives causal power
Causal structure
• Hypotheses: h1 = [B → E ← C]; h0 = [B → E only]
• Bayesian causal inference:
support = $\frac{P(d \mid h_1)}{P(d \mid h_0)}$
the likelihood ratio (Bayes factor) gives the evidence in favor of h1
$P(d \mid h_1) = \int_0^1 \int_0^1 P(d \mid w_0, w_1)\,p(w_0, w_1 \mid h_1)\,dw_0\,dw_1$
$P(d \mid h_0) = \int_0^1 P(d \mid w_0)\,p(w_0 \mid h_0)\,dw_0$
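A sketch of computing support numerically, assuming uniform priors on the strengths and the noisy-OR parameterization, with the integrals approximated on a grid (the counts are illustrative):

```python
import numpy as np

def noisy_or(b, c, w0, w1):
    return 1 - (1 - w0)**b * (1 - w1)**c

def likelihood(a, b_, c_, d_, w0, w1):
    """P(d | w0, w1) for counts a, b (C present) and c, d (C absent),
    with the background cause B always present."""
    p1 = noisy_or(1, 1, w0, w1)   # P(e+ | c+)
    p0 = noisy_or(1, 0, w0, w1)   # P(e+ | c-)
    return p1**a * (1 - p1)**b_ * p0**c_ * (1 - p0)**d_

w = np.linspace(0.005, 0.995, 100)
W0, W1 = np.meshgrid(w, w)

# Uniform priors: the integrals reduce to averages over the grid.
p_d_h1 = likelihood(6, 2, 2, 6, W0, W1).mean()   # integrate over w0 and w1
p_d_h0 = likelihood(6, 2, 2, 6, w, 0.0).mean()   # h0 fixes w1 = 0
print(p_d_h1 / p_d_h0)                           # support for h1
```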
Buehner and Cheng (1997)
[Figure: model fits to the human judgments: ΔP (r = 0.89), causal power (r = 0.88), support (r = 0.97)]
The importance of parameterization
• Noisy-OR incorporates mechanism assumptions:
– generativity: causes increase probability of effects
– each cause is sufficient to produce the effect
– causes act via independent mechanisms
(Cheng, 1997)
• Consider other models:
– statistical dependence: χ² test
– generic parameterization (cf. Anderson, 1990)
[Figure: human judgments vs. support (noisy-OR), χ², and support (generic)]
Generativity is essential
[Figure: support model predictions (scale 0-100) for contingencies with P(e+|c+) = P(e+|c-) = 8/8, 6/8, 4/8, 2/8, 0/8]
• Predictions result from “ceiling effect”
– ceiling effects only matter if you believe a cause
increases the probability of an effect
Blicket detector (Dave Sobel, Alison Gopnik, and colleagues)
[Figure: both objects activate the detector; object A does not activate the detector by itself. Children are asked if each object is a blicket, then asked to make the machine go.]
Backward blocking condition (procedure used in Sobel et al., 2002, Experiment 2)
"See this? It's a blicket machine. Blickets make it go."
"Let's put this one on the machine."
"Oooh, it's a blicket!"
[Figure: both objects activate the detector; object A activates the detector by itself. Children are asked if each object is a blicket, then asked to make the machine go.]
"Backwards blocking" (Sobel, Tenenbaum & Gopnik, 2004)
Two objects: A and B
Trial 1: A and B on detector; detector active
Trial 2: A on detector; detector active
4-year-olds judge whether each object is a blicket:
• A: a blicket (100% say yes)
• B: probably not a blicket (34% say yes)
Possible hypotheses
[Figure: the space of possible causal graphs relating A, B, and E]
Bayesian inference
• Evaluating causal models in light of data:
$P(h_i \mid d) = \frac{P(d \mid h_i)\,P(h_i)}{\sum_j P(d \mid h_j)\,P(h_j)}$
• Inferring a particular causal relation:
$P(A \to E \mid d) = \sum_{h_j \in H} P(A \to E \mid h_j)\,P(h_j \mid d)$
Bayesian inference
With a uniform prior on hypotheses, and the generic parameterization:
[Figure: probability that A and B are each a blicket; values shown: 0.32, 0.32, 0.34, 0.34]
Modeling backwards blocking
Assume…
• Links can only exist from blocks to detectors
• Blocks are blickets with prior probability q
• Blickets always activate detectors, but detectors
never activate on their own
– deterministic Noisy-OR, with wi = 1 and w0 = 0
Modeling backwards blocking
P(h00) = (1 − q)²: no links; P(h01) = (1 − q)q: B → E; P(h10) = q(1 − q): A → E; P(h11) = q²: A → E and B → E

                       h00   h01   h10   h11
P(E=1 | A=0, B=0):      0     0     0     0
P(E=1 | A=1, B=0):      0     0     1     1
P(E=1 | A=0, B=1):      0     1     0     1
P(E=1 | A=1, B=1):      0     1     1     1

$P(B \to E) = P(h_{01}) + P(h_{11}) = q$
Modeling backwards blocking
After the AB trial (E = 1 with A = 1, B = 1), h00 is ruled out:

                       h00   h01   h10   h11
P(E=1 | A=1, B=1):      0     1     1     1

$P(B \to E \mid d) = \frac{P(h_{01}) + P(h_{11})}{P(h_{01}) + P(h_{10}) + P(h_{11})} = \frac{q}{q + q(1-q)}$
Modeling backwards blocking
After the A trial (E = 1 with A = 1, B = 0), h01 is ruled out as well:

                       h01   h10   h11
P(E=1 | A=1, B=0):      0     1     1
P(E=1 | A=1, B=1):      1     1     1

$P(B \to E \mid d) = \frac{P(h_{11})}{P(h_{10}) + P(h_{11})} = q$
Manipulating prior probability
(Tenenbaum, Sobel, Griffiths, & Gopnik, submitted)
[Figure: procedure used in Sobel et al. (2002), Experiment 2, for the one-cause and backward blocking conditions: initial trials, an AB trial (both objects activate the detector), and an A trial. Children are asked if each object is a blicket, then asked to make the machine go.]
Summary
• Graphical models provide solutions to many
of the challenges of probabilistic models
– defining structured distributions
– representing distributions on many variables
– efficiently computing probabilities
• Causal graphical models provide tools for
defining rational models of human causal
reasoning and learning