PPT slides for 15 September - Psychological Sciences

Let’s do a Bayesian analysis
Greg Francis
PSY 626: Bayesian Statistics for Psychological Science
Fall 2016
Purdue University
Visual Search

A classic experiment in perception/attention involves visual search
Respond as quickly as possible whether an image contains a target (a green circle) or not
https://coglab.cengage.com/?labname=visual_search
 Log in with ID=GregTest-145, password=12345678

Visual Search

A classic experiment in perception/attention involves visual search
Respond as quickly as possible whether an image contains a target (a green circle) or not
Vary number of distractors: 4, 16, 32, 64
Vary type of distractors: feature (different color), conjunctive (different color or shape)
Measure reaction time: time between onset of image and participant’s response
5 trials for each of 16 conditions: 4 distractor counts x 2 target conditions (present or absent) x 2 distractor types (feature or conjunctive) = 80 trials
Visual Search

Typical results: For conjunctive distractors, response time increases with the number of distractors
Linear model

Suppose you want to model the search time on the Conjunctive search trials when the target is Absent as a linear equation
Let’s do it for a single participant
We are basically going through Section 4.4 of the text, but using a new data set
Download files from the class web site and follow along in class
Read in data

rm(list=ls(all=TRUE)) # clear all variables
 Clears old variables out of R’s memory

graphics.off()
 Removes old graphics from the display

VSdata<-read.csv(file="VisualSearch.csv", header=TRUE, stringsAsFactors=FALSE)
 Reads in the contents of the file VisualSearch.csv
 Uses the contents of the first row as a header, and creates variables with the names of those headers (no spaces in the header names!)

VSdata2<-subset(VSdata, VSdata$Participant=="Francis200S16-2" & VSdata$Target=="Absent" & VSdata$DistractorType=="Conjunction")
 Creates a new variable that contains just the data for a particular participant
 Only includes data from the Target absent trials
 Only includes data for the Conjunction distractors

Check the contents of VSdata2 in the R console
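For example, the usual base R inspection functions work here (nothing from the rethinking package is needed):

str(VSdata2)  # structure: variable names and types
head(VSdata2) # the first few rows
nrow(VSdata2) # should be 20 if the file matches the design above: 5 trials x 4 distractor counts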
Define the model

library(rethinking)
 This loads up the R library created by the textbook author
 It has some nice functions for plotting and displaying tables

VSmodel <- map(
    alist( RT_ms ~ dnorm(mu, sigma),
        mu <- a + b*NumberDistractors,
        a ~ dnorm(500, 10),
        b ~ dnorm(0, 100),
        sigma ~ dunif(0, 500)
    ), data=VSdata2 )
 Defines the linear model
 Assigns prior distributions to each parameter (a, b, sigma)
Prior distributions

a ~ dnorm(500, 10),
 We are saying that we expect the intercept (RT_ms with 0 distractors) to be around 500
 This is a claim about the population parameter, not about the data, per se

b ~ dnorm(0, 100),
 We are saying that we expect the slope to be around 0, but with a lot of other possible choices

sigma ~ dunif(0, 500)
 We are saying that sigma could be anything between 0 (a standard deviation cannot be negative!) and 500
 It cannot be outside this range!
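To see what these priors claim before looking at any data, you can sample intercepts and slopes from them and plot the implied regression lines (a sketch that is not in the original slides; it assumes the rethinking library is loaded so that col.alpha() is available):

set.seed(1) # make the sketch reproducible
a_prior <- rnorm(50, 500, 10) # 50 intercepts drawn from the prior on a
b_prior <- rnorm(50, 0, 100)  # 50 slopes drawn from the prior on b
plot(NULL, xlim=c(0, 64), ylim=c(-4000, 5000), xlab="NumberDistractors", ylab="RT_ms")
for(i in 1:50) abline(a=a_prior[i], b=b_prior[i], col=col.alpha("black", 0.2))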
Maximum a Posteriori

Here, defining the model basically does all the work in the map() function
This function takes the priors and the data, and computes posterior distributions
The main output of the map() calculation is the set of a, b, and sigma that have the highest probabilities
 Often this is actually probability density rather than probability
Posterior Distribution

Remember, these are distributions of the population parameters!
Posterior Distributions

It is more complicated because we have three parameters with a joint posterior probability density function
Finding the peak of a multidimensional surface is not a trivial problem
 The map() function allows you to specify a couple of different methods
 It sometimes produces errors that correspond to trouble finding the peak

Caution, model may not have converged.
Code 1: Maximum iterations reached.
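When that happens, one remedy is to give the optimization explicit start values (a sketch based on the rethinking package’s start argument; check ?map for the details):

VSmodel <- map(
    alist( RT_ms ~ dnorm(mu, sigma),
        mu <- a + b*NumberDistractors,
        a ~ dnorm(500, 10),
        b ~ dnorm(0, 100),
        sigma ~ dunif(0, 500)
    ), data=VSdata2,
    start=list(a=500, b=0, sigma=100) ) # begin the search at plausible values instead of random ones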
MAP estimates

summaryTable <- summary(VSmodel)

print(VSmodel)
 Prints out a table of maximum a posteriori estimates of a, b, and sigma
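The rethinking package also provides precis(), which adds a standard deviation and an interval for each parameter (a usage sketch; the numbers depend on your fit):

precis(VSmodel) # one row per parameter: estimate, std. dev., and interval bounds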
MAP estimates
Maximum a posteriori (MAP) model fit
Formula:
RT_ms ~ dnorm(mu, sigma)
mu <- a + b * NumberDistractors
a ~ dnorm(500, 10)
b ~ dnorm(0, 100)
sigma ~ dunif(0, 500)
MAP values:
        a         b     sigma
500.88486  48.29027 355.39830
Log-likelihood: -147.8
MAP estimates (2nd run)
Maximum a posteriori (MAP) model fit
Formula:
RT_ms ~ dnorm(mu, sigma)
mu <- a + b * NumberDistractors
a ~ dnorm(500, 10)
b ~ dnorm(0, 100)
sigma ~ dunif(0, 500)
MAP values:
        a         b     sigma
501.66287  48.48289 388.33640
Log-likelihood: -147.61
 The two runs give slightly different values because map() starts its search from random values and the numerical optimization stops close to, but not exactly at, the peak
MAP estimate
Red line is the MAP “best fitting” straight line

plot(RT_ms ~ NumberDistractors, data=VSdata2)

abline(a=coef(VSmodel)["a"], b=coef(VSmodel)["b"], col=col.alpha("red",1.0))
[Figure: scatterplot of RT_ms (about 1000 to 3500 ms) against NumberDistractors (10 to 60) with the red MAP line]
MAP estimate

Grey lines are samples of the population parameters from the posterior distribution

numVariableLines=2000

numVariableLinesToPlot=20

post<-extract.samples(VSmodel, n=numVariableLines)

for(i in 1:numVariableLinesToPlot){
    abline(a=post$a[i], b=post$b[i], col=col.alpha("black",0.3))
}

[Figure: the same scatterplot with 20 grey posterior sample lines overlaid]
MAP estimate

Now is a good time to see if the model makes any sense
 Model checking
Do the priors have the influence we expected?
Does the model behave reasonably?

[Figure: scatterplot of RT_ms against NumberDistractors with the sampled lines]
MAP estimate

I see several issues.

1) The lines for sampled population parameters all converge
  They all have a common intercept value
  Intercept!

2) The “traditional” best fit line is rather different
  a=832.94
  b=41.38

[Figure: two scatterplots of RT_ms against NumberDistractors, one with the converging posterior sample lines and one with the traditional fit]
Prior

We used

a ~ dnorm(500, 10)

which means the intercept has to be pretty close to the value 500 (we got 501.7) regardless of the data

Suppose we use

a ~ dnorm(500, 100)

Now we get:

MAP values:
        a        b     sigma
624.42893 45.84785 357.47658
Prior

What about

a ~ dnorm(500, 500)

Now we get:

MAP values:
        a        b     sigma
818.79695 41.66933 327.17288

With

a ~ dnorm(500, 1000)

MAP values:
        a        b     sigma
832.69534 41.37111 327.51426

Sometimes, though, this prior makes the fit fail; because map() samples random start values, just trying again may work:

Error in map(alist(RT_ms ~ dnorm(mu, sigma), mu <- a + b * NumberDistractors, :
non-finite finite-difference value [3]
Start values for parameters may be too far from MAP.
Try better priors or use explicit start values.
If you sampled random start values, just trying again may work.
Start values used in this attempt:
a = 1619.54309994221
b = -58.0206822367026
sigma = 318.546253605746
MAP Estimate
Check the other priors

What happens when you use:
 sigma ~ dunif(0, 100)
 sigma ~ dunif(0, 1000)

Can you change the prior for b to “break” the model?
 b ~ dnorm(0, 100)

Now it looks pretty good (and closely matches the traditional fit)

[Figure: scatterplot of RT_ms against NumberDistractors with the fitted line]
Why bother?

Once the model makes sense, it gives us an answer pretty close to standard regression:

MAP: a=832.7, b=41.4, sigma=327.5

Standard: a=832.9, b=41.4

The difference is in the question being asked
Standard linear regression

What parameter values a and b minimize the prediction error?

This is necessarily just a pair of numbers a, b
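The slides do not show how the “standard” values were computed, but an ordinary least-squares fit with base R’s lm() produces this kind of answer; a minimal sketch:

traditionalFit <- lm(RT_ms ~ NumberDistractors, data=VSdata2)
coef(traditionalFit) # the single pair of numbers: intercept a and slope b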
MAP estimates


What parameter values a and b maximize the posterior distribution?

The maximal values are just a pair of numbers a, b
But the posterior contains a lot more information
 Uncertainty about the estimates
 Uncertainty about the prediction
  Compare to confidence intervals

[Figure: scatterplot of RT_ms against NumberDistractors]
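For example, the posterior samples extracted earlier can be summarized with the rethinking interval helpers (a sketch; HPDI() gives a highest posterior density interval and PI() a percentile interval):

HPDI(post$b, prob=0.89) # range of plausible slopes
PI(post$a, prob=0.89)   # range of plausible intercepts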
Posterior
For example, what is the posterior distribution of the predicted mean value for 35 distractors?

mu_at_35 <- post$a + post$b * 35

dev.new() # make a new plot window

dens(mu_at_35, col=rangi2, lwd=2, xlab="mu|NumDistract=35")

You can ask all kinds of questions about predictions and so forth by just using probability

[Figure: posterior density of mu|NumDistract=35, spanning roughly 2000 to 2500]
Posterior


You can ask all kinds of questions about predictions and so forth by just using probability

What is the probability that the mean RT_ms at 35 distractors is < 2200?

length(mu_at_35[mu_at_35 <= 2200])/length(mu_at_35)

[1] 0.1425

Note, this estimate is computed by considering “all” possible values of a, b, sigma
Most of those values are close to the “traditional” values, but some are rather different
This variability can matter!

[Figure: the same posterior density of mu|NumDistract=35]
Posterior

What is the probability that the predicted RT_ms is greater than 3000 for 42 distractors?

What is the probability that a is more than 900?

What is the probability that b is less than 35?
 (one way to compute these is sketched after this list)

Explore: What happens when you set
 numVariableLines=100
 numVariableLines=10000

What happens to these probability estimates when you change the priors?
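A sketch of one way to answer the first three questions from the posterior samples (in R, mean() of a logical vector is the proportion of TRUE values, the same idiom as the length()/length() above; whether “predicted RT_ms” means the mean or an individual trial determines which of the first two computations you want):

mu_at_42 <- post$a + post$b * 42                    # posterior of the mean RT at 42 distractors
mean(mu_at_42 > 3000)                               # probability the mean RT exceeds 3000
RT_at_42 <- rnorm(nrow(post), mu_at_42, post$sigma) # simulated single-trial RTs (adds sigma)
mean(RT_at_42 > 3000)                               # probability a single predicted RT exceeds 3000
mean(post$a > 900)                                  # probability the intercept exceeds 900
mean(post$b < 35)                                   # probability the slope is below 35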
Conclusions

That wasn’t so bad
It does take more care than a standard linear regression analysis
You get a lot more out of it!