Bayesian Analysis Workshop Presentation


Applied Bayesian Analysis
for the Social Sciences
Philip Pendergast
Computing and Research Services
Department of Sociology
[email protected]
Sponsored by Computing and Research Services and the Institute of Behavioral Science
Suspending Disbelief: Faith in Classical Statistics
What are some issues that we have with classical statistics? Think back to your introductory class…
Suspending Disbelief: Faith in Classical Statistics
• Conducting an infinite number of experiments / repeated sampling
• Assuming that some parameter θ is unknown but has a fixed value
• P-value worship
• Null hypothesis testing
• Multiple comparisons
• Strict data assumptions, often unmet
• Confidence interval interpretation
• Small samples are an issue
The Coin Flip
• Frequentist
– We can determine the bias of a coin (b) by repeatedly flipping it and counting heads. As long as we repeat the process enough times, we should be able to estimate the “true” bias of the coin.
– If p < .05 under the null that b = 0.5, we reject the null hypothesis that the coin is unbiased.
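To make the frequentist recipe concrete, here is a minimal sketch in R; the counts (60 heads in 100 flips) are hypothetical.

# Hypothetical data: 60 heads in 100 flips.
# Test the null hypothesis that the coin is unbiased (b = 0.5).
binom.test(x = 60, n = 100, p = 0.5)
# If the reported p-value is below .05, we reject the null of no bias.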
The Nail Flip
• Frequentist
– We determine the bias of a nail (b) by repeatedly flipping it and counting “heads” (landing on its flat base).
– If p < .05 under the null that b = 0.5, we reject the null hypothesis that the nail is unbiased.
Does this seem reasonable? Don’t we know that the nail is biased?
Classical Statistics is Atheoretical
• Science is an iterative process; we should learn from past research.
• Theory should guide us in how we analyze data.
– Typically, beyond the lit. review, theory informs:
• Variable selection
• Model building
• Choice of model (e.g. SEM, HLM)
• NOT the actual way parameters are estimated in the analysis
Bayesian Statistics and Theory
• Bayesian statistics considers θ to be unknown, possessing a probability distribution that reflects our degree of uncertainty about it.
• We take theory and uncertainty into consideration when estimating θ.
• The Posterior: A probability distribution for θ given our data on hand.
• The Data: Need only meet the assumption of exchangeability.
• The Prior: A distribution based on our knowledge about θ, and our certainty in that knowledge.
p(θ|y) ∝ p(y|θ) p(θ)
The Nail Flip
• Bayesian
– Prior Beliefs: We consult several nail experts, who are relatively certain that nails will land on their heads only 1 in 50 times, or 2% of the time.
– Data on Hand: We flip the nail 100 times.
– Posterior: We sample from the posterior distribution, the probability of the nail’s bias given our prior beliefs and the data on hand, to see whether the experts’ opinions are reasonable and/or whether our nail shares a similar bias to other nails.
Well, if we examine the anatomy of the nail…
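Because a Beta prior is conjugate to binomial data (see the table later in the talk), this update can be sketched directly in R. The Beta(2, 98) prior matches the experts’ 2% figure; the outcome of 3 “heads” in 100 flips is hypothetical.

# Prior: Beta(2, 98), mean 2/100 = 0.02, matching the experts' opinion.
a <- 2; b <- 98
# Hypothetical data on hand: 3 "heads" in 100 flips.
heads <- 3; n <- 100
# Conjugacy: Beta prior + binomial data yields a Beta posterior.
post_a <- a + heads
post_b <- b + n - heads
post_a / (post_a + post_b)              # posterior mean of the bias
qbeta(c(0.025, 0.975), post_a, post_b)  # 95% credible interval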
Priors and Subjectivity
• “B-B-B-Bbbbut wait, aren’t these priors
subjective? We are objective
scientists!”
– Variable selection, model choice, and research questions are all subjective decisions.
• By making these subjective decisions explicit, we
open ourselves to critique and are forced to
thoughtfully choose and defend our choice of priors. If
we have no good theory, we must choose a prior that
lets the data speak for itself.
Choosing Sensible Priors
• How much do we know? How accurate do
we take this information to be?
– Informative priors: Historical data, expert opinion,
past research findings, theoretical implications.
– Non-informative prior: Uniform distribution over a
sensible range of values.
• If the prior has high precision (1/σ²) or N is small, it will heavily influence the posterior distribution. If it has low precision or N is large, the data influence the posterior more.
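This trade-off can be seen in a small normal-normal sketch in R; all of the numbers here are hypothetical.

# Prior with high precision: sd = 0.5, so precision 1/0.5^2 = 4.
prior_mean <- 0; prior_prec <- 1 / 0.5^2
# Hypothetical data: mean 2, known sd 1, only 10 observations.
ybar <- 2; sigma <- 1; n <- 10
data_prec <- n / sigma^2
post_prec <- prior_prec + data_prec
# Posterior mean is a precision-weighted average of prior and data.
(prior_prec * prior_mean + data_prec * ybar) / post_prec
# With small n (or a high-precision prior), the result sits close to the
# prior mean; increase n and the data pull the posterior toward ybar.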
Conjugate Prior Distributions
• A conjugate prior, when combined with the data, yields a posterior distribution in the same family as the prior.
Data Distribution    Conjugate Prior
Normal               Normal or Uniform
Poisson              Gamma
Binomial             Beta
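As one more illustration from the table, a Gamma prior combined with Poisson counts stays in the Gamma family; the numbers below are hypothetical.

# Gamma(shape = 2, rate = 1) prior on a Poisson rate lambda.
a <- 2; b <- 1
y <- c(3, 5, 4, 6, 2)   # hypothetical observed counts
# Conjugacy: the posterior is Gamma(a + sum(y), b + n).
post_shape <- a + sum(y)
post_rate  <- b + length(y)
qgamma(c(0.025, 0.975), post_shape, post_rate)  # 95% credible interval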
The Posterior Distribution and Monte Carlo Integration
• Recall that p(θ|y) is a probability distribution.
• It is computationally demanding to derive summary measures of p(θ|y) directly.
• Instead, we repeatedly sample from p(θ|y) and summarize the distribution formed by these samples.
– This is called Monte Carlo Integration.
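A minimal sketch of Monte Carlo Integration in R, reusing the hypothetical Beta(5, 195) posterior from the nail-flip sketch above:

# Repeatedly sample from p(theta|y), then summarize the samples.
draws <- rbeta(10000, 5, 195)
mean(draws)                        # posterior mean
quantile(draws, c(0.025, 0.975))   # 95% interval from the draws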
Markov Chain Monte Carlo (MCMC), Explained
Markov Chains, Continued
• We specify the number of chains as well as the number of iterations made.
• They “dance” around the posterior from their starting values, moving to areas of higher density.
• Chains stabilize around the posterior mean.
• Once stabilized, discard the early iterations (burn-in samples).
• Estimates of the posterior come from the post-burn-in period, as in the sketch below.
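To make burn-in concrete, here is a minimal random-walk Metropolis sketch in R, targeting the hypothetical Beta(5, 195) posterior from earlier; the starting value is deliberately poor.

set.seed(1)
target <- function(p) dbeta(p, 5, 195)   # the posterior we sample from
n_iter <- 20000
chain <- numeric(n_iter)
chain[1] <- 0.5                          # a deliberately bad starting value
for (i in 2:n_iter) {
  prop <- chain[i - 1] + rnorm(1, 0, 0.02)   # random-walk proposal
  if (prop > 0 && prop < 1 &&
      runif(1) < target(prop) / target(chain[i - 1])) {
    chain[i] <- prop                     # accept: "dance" toward higher density
  } else {
    chain[i] <- chain[i - 1]             # reject: stay put
  }
}
keep <- chain[-(1:2000)]                 # discard burn-in iterations
mean(keep)                               # estimate from the post-burn-in period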
Bayesian Analysis (Finally!)
• Decide on a model.
• Specify the # of Markov chains, # of iterations, a burn-in period, and your prior beliefs.
• Run model diagnostics to check for convergence (see the coda sketch after this list).
• Compare results of models with different specifications of priors, parameters, etc. to see which best “returns” the data in hand or achieves the best model fit (e.g. BIC, Bayes Factor, Deviance).
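For the diagnostics step, a minimal sketch using the coda package (loaded along with MCMCpack); keep is the post-burn-in chain from the Metropolis sketch above.

library(coda)
m <- mcmc(keep)       # wrap the draws as an mcmc object
geweke.diag(m)        # z-test comparing early vs. late parts of the chain
effectiveSize(m)      # effective number of independent draws
traceplot(m)          # visual check that the chain has stabilized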
Overcoming Classical Shortcomings
• Conducting an infinite number of experiments / repeated sampling → Only use the data on hand; no extrapolating to other potential(ly conflicting) data
• Assuming that some parameter θ is unknown but has a fixed value → Directly estimate our uncertainty about θ
• P-value worship → Report HDIs and thoughtfully draw conclusions
• Null hypothesis testing → More meaningful hypothesis testing (e.g. different priors)
• Multiple comparisons → Not an issue
• Strict data assumptions, often unmet → Minimal assumptions (exchangeability)
• Confidence interval interpretation → The HDI shows the believability (probability) of values
• Small samples are an issue → Still useful, given strong priors
R “MCMCpack” Tutorial
• Run simple models predicting job satisfaction as a function of income.
• One model uses an uninformative prior (specifically, the uniform distribution).
• The other uses an informed prior from earlier data.
• Compare the Bayes Factors to see which “retrieves” the data better (i.e. is a better fit).
R “MCMCpack” Tutorial
• Open R.
• Click “Packages” --> Set CRAN mirror --> Pick anything in the US.
• Open “Packages” again --> Install Packages --> Scroll down to MCMCpack.
• Say “yes” to a new library.
• Type “library(MCMCpack)” to load it in; also type “library(foreign)” to enable reading of the Stata file.
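With the packages loaded, a minimal sketch of the two tutorial models. The file name (“jobsat.dta”) and variable names (satisfaction, income) are hypothetical stand-ins for the workshop data, and a nearly flat normal prior stands in for the uniform so that a Bayes Factor can be computed.

library(MCMCpack)
library(foreign)

dat <- read.dta("jobsat.dta")   # hypothetical Stata file

# Model 1: nearly flat prior on the coefficients (tiny precision),
# letting the data speak for itself.
m1 <- MCMCregress(satisfaction ~ income, data = dat,
                  burnin = 1000, mcmc = 10000,
                  b0 = c(0, 0), B0 = diag(c(1e-4, 1e-4)),
                  marginal.likelihood = "Laplace")

# Model 2: informed prior from earlier data (hypothetical values):
# slope centered at 0.5 with precision 100 (prior sd = 0.1).
m2 <- MCMCregress(satisfaction ~ income, data = dat,
                  burnin = 1000, mcmc = 10000,
                  b0 = c(0, 0.5), B0 = diag(c(1e-4, 100)),
                  marginal.likelihood = "Laplace")

summary(m2)           # posterior summaries from the post-burn-in draws
BayesFactor(m1, m2)   # which prior specification better "retrieves" the data?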