PPT slides for 15 September - Psychological Sciences
Let’s do a Bayesian analysis
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016
Purdue University
Visual Search
A classic experiment in perception/attention involves visual search
Respond as quickly as possible whether an image contains a target (a
green circle) or not
https://coglab.cengage.com/?labname=visual_search
Log in with ID=GregTest-145, password=12345678
Visual Search
A classic experiment in perception/attention involves visual search
Respond as quickly as possible whether an image contains a target (a green
circle) or not
Vary number of distractors: 4, 16, 32, 64
Vary type of distractors: feature (different color), conjunctive (different color or
shape)
Measure reaction time: time between onset of image and participant’s response
5 trials for each of 16 conditions: 4 number of distractors x 2 target (present or
absent) x 2 distractor types (feature or conjunctive) = 80 trials
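The 4 x 2 x 2 layout above can be generated in base R with expand.grid(); this is just an illustrative sketch (the column names are mine, not necessarily those in the CogLab file):

```r
# Cross the three factors: 4 set sizes x 2 target conditions x 2 distractor types
design <- expand.grid(
  NumberDistractors = c(4, 16, 32, 64),
  Target            = c("Present", "Absent"),
  DistractorType    = c("Feature", "Conjunction")
)
nrow(design)   # 16 conditions

# Repeat each condition 5 times to build the full trial list
trials <- design[rep(seq_len(nrow(design)), each = 5), ]
nrow(trials)   # 80 trials
```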
Visual Search
Typical results: For conjunctive distractors, response time increases with the
number of distractors
Linear model
Suppose you want to model the search time on the
Conjunctive search trials when the target is Absent as a
linear equation
Let’s do it for a single participant
We are basically going through Section 4.4 of the text, but
using a new data set
Download files from the class web site and follow along in
class
Read in data
rm(list=ls(all=TRUE)) # clear all variables
Clear old variables out from R’s memory
graphics.off()
Remove old graphics from the display
VSdata<-read.csv(file="VisualSearch.csv",header=TRUE,stringsAsFactors=FALSE)
Reads in the contents of the file VisualSearch.csv
Uses the contents of the first row as a header, and creates variables by the names of those headers
(no spaces in the header names!)
VSdata2<-subset(VSdata, VSdata$Participant=="Francis200S16-2" &
VSdata$Target=="Absent" & VSdata$DistractorType=="Conjunction")
Creates a new variable that contains just the data for a particular Participant
Only includes data from the Target absent trials
Only includes data for the Conjunction distractors
Check the contents of VSdata2 in the R console
Define the model
library(rethinking)
This loads up the R library created by the textbook author
It has some nice functions for plotting and displaying tables
VSmodel <- map(
alist( RT_ms ~ dnorm(mu, sigma),
mu <- a + b*NumberDistractors,
a ~ dnorm(500, 10),
b ~ dnorm(0, 100),
sigma ~ dunif(0, 500)
), data=VSdata2 )
Defines the linear model
Assigns prior distributions to each parameter (a, b, sigma)
Prior distributions
a ~ dnorm(500, 10),
We are saying that we expect the intercept (RT_ms with 0 distractors)
to be around 500
This is a claim about the population parameter, not about the data,
per se
b ~ dnorm(0, 100),
We are saying that we expect the slope to be around 0, but with a lot
of possible other choices
sigma ~ dunif(0, 500)
We are saying that sigma could be anything between 0 (a standard
deviation cannot be negative!) and 500
It cannot be outside this range!
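One way to check what these three priors jointly imply is a quick prior predictive simulation in base R (a sketch independent of the rethinking package; the choice of 32 distractors is arbitrary):

```r
set.seed(1)
n <- 1000
a     <- rnorm(n, mean = 500, sd = 10)   # intercept prior: dnorm(500, 10)
b     <- rnorm(n, mean = 0,   sd = 100)  # slope prior: dnorm(0, 100)
sigma <- runif(n, min = 0, max = 500)    # sd prior: dunif(0, 500)

# Simulate one RT per prior draw at 32 distractors
rt <- rnorm(n, mean = a + b * 32, sd = sigma)
summary(rt)  # priors this wide allow huge (even negative) simulated RTs
```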
Maximum a Posteriori
Here, defining the model basically does all the work in the
“map” function
This function takes the priors and the data, and computes
posterior distributions
The main output of the map( ) calculation is the set of a, b,
and sigma that have the highest probabilities
Often this is actually probability density rather than probability
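Conceptually, finding the MAP estimate is just numerical optimization of the log posterior. A minimal base-R sketch with optim() on simulated data (not the actual map() implementation; the true intercept, slope, and noise values used to fake the data are arbitrary):

```r
set.seed(2)
x <- rep(c(4, 16, 32, 64), each = 5)            # distractor counts
y <- 800 + 40 * x + rnorm(length(x), sd = 300)  # fake RTs

# Negative log posterior: log likelihood plus log priors
neg_log_post <- function(p) {
  a <- p[1]; b <- p[2]; s <- p[3]
  if (s <= 0 || s > 500) return(Inf)            # enforce dunif(0, 500) on sigma
  -(sum(dnorm(y, a + b * x, s, log = TRUE)) +   # likelihood
      dnorm(a, 500, 1000, log = TRUE) +         # broad intercept prior
      dnorm(b, 0, 100, log = TRUE))             # slope prior
}

fit <- optim(c(500, 0, 100), neg_log_post, control = list(maxit = 5000))
fit$par  # MAP estimates of a, b, sigma
```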
Posterior Distribution
Remember, these are distributions of the population
parameters!
Posterior Distributions
It is more complicated because we have three parameters
with a joint posterior probability density function
Finding the peak of a multidimensional surface is not a trivial
problem
The map() function allows you to specify a couple of different
methods
It sometimes produces errors that correspond to trouble finding the
peak
Caution, model may not have converged.
Code 1: Maximum iterations reached.
MAP estimates
summaryTable <- summary(VSmodel)
print(VSmodel)
Prints out a table of Maximum A Posteriori estimates of a, b, and
sigma
MAP estimates
Maximum a posteriori (MAP) model fit
Formula:
RT_ms ~ dnorm(mu, sigma)
mu <- a + b * NumberDistractors
a ~ dnorm(500, 10)
b ~ dnorm(0, 100)
sigma ~ dunif(0, 500)
MAP values:
        a         b     sigma
500.88486  48.29027 355.39830
Log-likelihood: -147.8
MAP estimates (2nd run)
Maximum a posteriori (MAP) model fit
Formula:
RT_ms ~ dnorm(mu, sigma)
mu <- a + b * NumberDistractors
a ~ dnorm(500, 10)
b ~ dnorm(0, 100)
sigma ~ dunif(0, 500)
MAP values:
        a         b     sigma
501.66287  48.48289 388.33640
Log-likelihood: -147.61
MAP estimate
Red line is the MAP “best fitting” straight line
plot(RT_ms ~ NumberDistractors, data=VSdata2)
abline(a=coef(VSmodel)["a"], b=coef(VSmodel)["b"], col=col.alpha("red",1.0))
[Figure: scatter plot of RT_ms (1000-3500 ms) against NumberDistractors (10-60) with the red MAP line]
MAP estimate
Grey lines are samples of the population parameters
from the posterior distribution
numVariableLines=2000
numVariableLinesToPlot=20
post<-extract.samples(VSmodel, n=numVariableLines)
for(i in 1:numVariableLinesToPlot){
  abline(a=post$a[i], b=post$b[i], col=col.alpha("black",0.3))
}
[Figure: scatter plot of RT_ms against NumberDistractors with 20 grey sampled regression lines]
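Under the quadratic approximation, extract.samples() draws from a multivariate normal centered on the MAP values; the same step can be sketched with MASS::mvrnorm (the covariance matrix below is hypothetical, just to show the mechanics):

```r
library(MASS)  # for mvrnorm
set.seed(3)

map_est  <- c(a = 832.7, b = 41.4, sigma = 327.5)  # MAP values from the slides
vcov_est <- diag(c(150^2, 5^2, 50^2))              # hypothetical covariance matrix

# 2000 joint draws of (a, b, sigma) from the approximate posterior
post <- as.data.frame(mvrnorm(n = 2000, mu = map_est, Sigma = vcov_est))
head(post)
```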
MAP estimate
Now is a good time to see if the model makes any
sense
Model checking:
Do the priors have the influence we expected?
Does the model behave reasonably?
[Figure: scatter plot of RT_ms against NumberDistractors with the sampled regression lines]
MAP estimate
I see several issues.
1) The lines for sampled population parameters all converge:
they all have a common intercept value
2) The “traditional” best fit line (a=832.94, b=41.38) is rather
different
[Figure: side-by-side scatter plots of RT_ms against NumberDistractors comparing the sampled posterior lines with the traditional fit]
Prior
We used
a ~ dnorm(500, 10)
Which means the intercept has to be pretty close to the
value 500 (we got 501.7) regardless of the data
Suppose we use
a ~ dnorm(500, 100)
Now we get:
MAP values:
        a         b     sigma
624.42893  45.84785 357.47658
Prior
What about
a ~ dnorm(500, 500)
Now we get:
MAP values:
        a         b     sigma
818.79695  41.66933 327.17288
With
a ~ dnorm(500, 1000)
MAP values:
        a         b     sigma
832.69534  41.37111 327.51426
Sometimes map() instead fails with an error like:
Error in map(alist(RT_ms ~ dnorm(mu, sigma), mu <- a + b * NumberDistractors, :
non-finite finite-difference value [3]
Start values for parameters may be too far from MAP.
Try better priors or use explicit start values.
If you sampled random start values, just trying again may work.
Start values used in this attempt:
a = 1619.54309994221
b = -58.0206822367026
sigma = 318.546253605746
MAP Estimate
Now it looks pretty good (and closely matches the traditional fit)
[Figure: scatter plot of RT_ms against NumberDistractors with the sampled regression lines fanning around the data]
Check the other priors
What happens when you use:
sigma ~ dunif(0, 100)
sigma ~ dunif(0, 1000)
Can you change the prior for b to “break” the model?
b ~ dnorm(0, 100)
Why bother?
Once the model makes sense, it gives us an answer pretty
close to standard regression:
MAP: a=832.7, b= 41.4, sigma= 327.5
Standard: a=832.9, b=41.4
The difference is in the question being asked
Standard linear regression
What parameter values a, b, minimize the prediction error?
This is necessarily just a pair of numbers a, b
MAP estimates
What parameter values a, b maximize the posterior distribution?
The maximal values are just a pair of numbers a, b
But the posterior contains a lot more information:
Uncertainty about the estimates
Uncertainty about the prediction
Compare to confidence intervals
[Figure: scatter plot of RT_ms against NumberDistractors with the sampled regression lines]
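For comparison, the “traditional” least-squares fit is just base R’s lm(); a sketch on simulated data, since the real numbers would come from VSdata2:

```r
set.seed(4)
x <- rep(c(4, 16, 32, 64), each = 5)            # distractor counts
y <- 800 + 40 * x + rnorm(length(x), sd = 300)  # fake RTs

fit_ols <- lm(y ~ x)  # minimizes squared prediction error
coef(fit_ols)         # intercept and slope, analogous to a and b
```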
Posterior
You can ask all kinds of questions about predictions and so
forth by just using probability
For example, what is the posterior distribution of the predicted
mean value for 35 distractors?
mu_at_35 <- post$a + post$b * 35
dev.new() # make a new plot window
dens(mu_at_35, col=rangi2, lwd=2, xlab="mu|NumDistract=35")
[Figure: posterior density of mu|NumDistract=35, spanning roughly 2000-2500 ms]
Posterior
You can ask all kinds of questions about predictions and so
forth by just using probability
What is the probability that the mean RT_ms at 35
distractors is < 2200?
length(mu_at_35[mu_at_35 <= 2200])/length(mu_at_35)
[1] 0.1425
Note, this estimate is computed by considering “all” possible
values of a, b, sigma
Most of those values are close to the “traditional” values, but
some are rather different
This variability can matter!
[Figure: posterior density of mu|NumDistract=35]
Posterior
What is the probability that predicted RT_ms is greater than
3000 for 42 distractors?
What is the probability that a is more than 900?
What is the probability that b is less than 35?
Explore: What happens when you set
numVariableLines=100
numVariableLines=10000
What happens to these probability estimates when
you change the priors?
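Each of these questions reduces to counting posterior samples. A sketch with fabricated draws standing in for the extract.samples() output (the means and spreads below are hypothetical):

```r
set.seed(5)
# Stand-in posterior draws; a real analysis would use extract.samples(VSmodel)
post <- data.frame(a = rnorm(2000, mean = 832.7, sd = 150),
                   b = rnorm(2000, mean = 41.4,  sd = 5))

mu_at_42 <- post$a + post$b * 42  # predicted mean RT at 42 distractors
mean(mu_at_42 > 3000)             # P(mean RT > 3000 | 42 distractors)
mean(post$a > 900)                # P(a > 900)
mean(post$b < 35)                 # P(b < 35)
```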
Conclusions
That wasn’t so bad
It does take more care than a standard linear regression
analysis
You get a lot more out of it!