MRC Cognition and Brain Sciences Unit
Graduate Statistics Course
http://imaging.mrc-cbu.cam.ac.uk/statswiki/StatsCourse2009
MRC CBU Graduate Statistics Lectures
8 October 2009
1 The Anatomy of Statistics
1
1: The Anatomy of Statistics
Models, Hypotheses, Significance and
Power
Ian Nimmo-Smith
8 October 2009
The Naming of Parts
Experiments and Data
Models and Parameters
Probability vs. Statistics
Likelihood
Hypotheses and Inference
Tests, Significance and Power
Experiments and Data
An Experiment E is prescribed by a
Method.
When the Experiment E is performed
Data X are observed.
Repeated performance of E may
produce Data which vary.
A Simple Experiment
Method
16 volunteers were randomly selected from the
large population of those suffering from Fear of
Statistics Syndrome (FOSS).
They were given a brief experimental therapy
(‘CBU desensitization’ or CD).
Results
At the end of the course 13 volunteers were found
to be cured. (Data: N=16; X=13).
Models and Parameters (1)
A Model describes how Data arise, by
identifying Systematic and Unexplained
components.
Data = Systematic + Unexplained.
Models and Parameters (2)
The Systematic and Unexplained
components are linked together through
one or more Parameters via a
Probability formulation.
Parameters can relate to either the
Systematic components (e.g. Mean) or the
Unexplained components (e.g. Standard
deviation; Variance; Degrees of Freedom)
A Model for our Data (1)
Data
N observations
X without FOSS following CD
Parameter
Parameter p: 0<p<1
p = ‘Rate of recovery from FOSS’
‘each person independently has the same chance p
of recovery.’
Model
Probability(X|N,p) = \binom{N}{X} p^{X} (1-p)^{N-X}
A Model for our Data (2)
Probability(X|N,p) = \binom{N}{X} p^{X} (1-p)^{N-X}
N =16, X=13, p is unknown
The formula expresses the combination of
13 events with probability p, 3 with
probability 1-p and the number of different
ways that the 13 Recoverers could occur in
the 16 volunteers
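The three ingredients of the formula can be checked numerically. The following is a minimal sketch, not part of the lecture; the function name `binomial_probability` and the trial value p = 0.75 are illustrative assumptions.

```python
from math import comb


def binomial_probability(x: int, n: int, p: float) -> float:
    """Probability of exactly x recoveries among n volunteers,
    each recovering independently with the same chance p."""
    ways = comb(n, x)                 # number of ways the x Recoverers could occur
    recovered = p ** x                # x events with probability p
    not_recovered = (1 - p) ** (n - x)  # n - x events with probability 1 - p
    return ways * recovered * not_recovered


# Number of ways 13 Recoverers could occur among 16 volunteers.
print(comb(16, 13))  # 560

# Probability of the observed Data for one trial value of p (p = 0.75 is an assumption).
print(binomial_probability(13, 16, 0.75))
```

Summing `binomial_probability(x, 16, p)` over x = 0, ..., 16 gives 1 for any p, which is what makes this a probability distribution over the possible Data.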
A Model for our Data (3)
Probability(X|N,p) = \binom{N}{X} p^{X} (1-p)^{N-X}
N is fixed by the experimental design
p represents the Systematic component we
want to say something about
the Data X has potentially unexplained
variability, described by a Binomial
Distribution
Probability
Probability is a fundamental concept
which it is difficult to define.
There are divergent theories on what it
means.
There is however common agreement
on the calculus it obeys.
Intuitions can easily lead one astray.
The Birthdays paradox
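Assuming this slide refers to the classic birthday problem (365 equally likely birthdays, no leap years, independent people), the probability that at least two people in a group share a birthday can be sketched as follows. The function name and the default of 365 days are illustrative assumptions.

```python
def shared_birthday_probability(group_size: int, days: int = 365) -> float:
    """Probability that at least two people in the group share a birthday,
    assuming every day is equally likely and people are independent."""
    p_all_distinct = 1.0
    for k in range(group_size):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for size in (10, 23, 50):
    print(size, round(shared_birthday_probability(size), 4))
```

Under these assumptions a group of 23 already has a better than even chance of a shared birthday, which is the result most people's intuition gets wrong.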
Probability vs. Statistics
Probability: PARAMETERS (‘how the world is’) -> MODEL -> DATA
Statistics: DATA -> MODEL -> Inferences? -> PARAMETERS (‘how the world is’)
Hypotheses and Inference
What kinds of things can we say about
the ‘true’ value of p?
Point estimates and Confidence Intervals?
Is the Data compatible with the ‘true’ value
of p being, say, 0.75?
Is the weight of the evidence sufficient to
say that we would prefer to say that p=0.75
rather than that p=0.5?
The weight of the evidence
We will step through a sequence of
possible values of p looking to see how
our data X=13 look.
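The stepping-through can be reproduced numerically. This is a minimal sketch using the binomial model from the earlier slides; the function name `likelihood` is an illustrative assumption.

```python
from math import comb

N, X = 16, 13  # the Data from the FOSS experiment


def likelihood(p: float, n: int = N, x: int = X) -> float:
    """Probability of observing x recoveries out of n when the recovery rate is p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Step through p = 0.1, 0.2, ..., 0.9 and see how probable X = 13 is each time.
for tenth in range(1, 10):
    p = tenth / 10
    print(f"p = {p:.1f}   Probability(X = 13) = {likelihood(p):.6f}")
```

Of the nine values tried, X = 13 is most probable at p = 0.8 and is vanishingly improbable for small p.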
p = 0.1
p = 0.2
p = 0.3
p = 0.4
… at last ...
X = 13 just begins to show up on the
radar at p=0.4
p = 0.5
p = 0.6
p = 0.7
p = 0.8
p = 0.9
The Rise and Fall of Probability (1)
The Probability of the Data (13/16
Recovered) rises and falls as p moves
from near 0 to near 1.
This behaviour is described as the
Likelihood Function for p relative to the
Data X.
Here is a graph of the Likelihood Function,
first for the values of p we have looked at so
far ...
Likelihood values
[Figure: Likelihood of the Data X=13 plotted at p = 0.1, 0.2, ..., 0.9]
The Rise and Fall of Probability (2)
… and here is a complete graph covering
all possible values of p
The Likelihood Function
Estimation and Inferences (1)
The Likelihood Function is pivotal in
understanding how the Data throw light
on the Parameters
Estimation and Inferences (2)
The value of p where the Likelihood
takes its largest value can be a sensible
starting point for estimating p.
This is called the Maximum Likelihood
Estimate (MLE).
Often the MLE is the ‘natural one’:
MLE(p) = 13/16 = 0.8125
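As a rough numerical check, the peak of the Likelihood can be located by brute force. This sketch assumes a simple grid search over p in steps of 0.0001; the grid resolution and the names used are illustrative, and a grid search is a stand-in here for the calculus argument that gives X/N directly.

```python
from math import comb

N, X = 16, 13


def likelihood(p: float) -> float:
    """Binomial probability of the observed Data at recovery rate p."""
    return comb(N, X) * p ** X * (1 - p) ** (N - X)


# Candidate values of p strictly between 0 and 1 (step size is an assumption).
grid = [i / 10000 for i in range(1, 10000)]

# The Maximum Likelihood Estimate is the grid value where the Likelihood is largest.
mle = max(grid, key=likelihood)

print(mle)    # 0.8125
print(X / N)  # 0.8125, the 'natural' estimate
```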
Estimation and Inferences (3)
The sharpness of the peak of the curve
tells us the possible scale of the error in
this estimate
Confidence Intervals can be based on this.
The relative heights (Likelihood Ratios)
are a principal tool for comparing
different Parameters.
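Both ideas on this slide can be illustrated with the FOSS Data. The sketch below makes two assumptions that are not in the lecture: it uses the standard large-sample approximation in which the curvature of the log-likelihood at its peak gives a standard error of sqrt(p(1-p)/N), and it uses the conventional multiplier 1.96 for an approximate 95% interval. With N = 16 this approximation is crude, so the numbers are indicative only.

```python
from math import comb, sqrt

N, X = 16, 13
p_hat = X / N  # Maximum Likelihood Estimate


def likelihood(p: float) -> float:
    """Binomial probability of the observed Data at recovery rate p."""
    return comb(N, X) * p ** X * (1 - p) ** (N - X)


# A sharper peak (larger N) gives a smaller standard error.
standard_error = sqrt(p_hat * (1 - p_hat) / N)

# Approximate 95% interval, clipped to the legal range for a probability.
lower = max(0.0, p_hat - 1.96 * standard_error)
upper = min(1.0, p_hat + 1.96 * standard_error)
print(f"approximate interval for p: ({lower:.3f}, {upper:.3f})")

# Likelihood Ratio comparing two candidate Parameter values.
likelihood_ratio = likelihood(0.75) / likelihood(0.5)
print(f"LR for p=0.75 versus p=0.5: {likelihood_ratio:.2f}")
```

The Likelihood Ratio says the Data are roughly 24 times more probable under p = 0.75 than under p = 0.5.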
Key Questions
Does the experimental manipulation have an
effect?
To what extent does it have an effect?
Does the treatment work?
How well does it work?
Does behaviour B predict pathology P?
How well does it predict it?
Schools of Statistical
Inference
Ronald Aylmer FISHER
Jerzy NEYMAN and Egon PEARSON
Rev. Thomas BAYES
Fisherian Inference
R A Fisher
Likelihood
P values
Tests of Significance
Null Hypothesis Testing
Neyman & Pearson Inference
J Neyman and E Pearson
Testing between Alternative Hypotheses
Size
Power
Bayesian Inference
T Bayes
Prior and Posterior Probabilities
Revision of beliefs in the light of the data
R A Fisher: P values and Significance Tests (1)
Null Hypothesis H0
e.g. H0: p = 0.5
Data may give evidence against H0.
Order possible outcomes in terms of
degree of deviation from H0.
This may involve a judicious choice of Test
Statistic
R A Fisher: P values and Significance Tests (2)
P value is the sum of the probabilities,
calculated assuming H0 is true, of possible
outcomes of the Experiment at least as
extreme (improbable) as the Data.
P value is also known as the
Significance Level or Significance of
the Data.
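For the FOSS Data and H0: p = 0.5, the definition can be applied directly. This is a minimal sketch; ordering outcomes by their probability under H0 is one possible choice of 'degree of deviation', and the function names are illustrative assumptions.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x recoveries out of n at recovery rate p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def p_value(observed: int, n: int, p_null: float) -> float:
    """Sum of the H0 probabilities of all outcomes no more probable than the Data."""
    p_observed = binomial_pmf(observed, n, p_null)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        binomial_pmf(x, n, p_null)
        for x in range(n + 1)
        if binomial_pmf(x, n, p_null) <= p_observed * tolerance
    )


def upper_tail_p_value(observed: int, n: int, p_null: float) -> float:
    """One-sided alternative: only outcomes with at least as many recoveries count."""
    return sum(binomial_pmf(x, n, p_null) for x in range(observed, n + 1))


print(p_value(13, 16, 0.5))             # both tails: X <= 3 or X >= 13
print(upper_tail_p_value(13, 16, 0.5))  # upper tail only: X >= 13
```

Under H0 the outcomes X = 13, 14, 15, 16 have total probability 697/65536, about 0.011; counting the equally improbable outcomes X = 0, 1, 2, 3 as well doubles this to about 0.021.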
R A Fisher: P values and Significance Tests (3)
Sometimes we quote the actual P value.
P = 0.112
Sometimes we quote the P value
relative to conventional values, e.g.
P>0.1, P<0.01 etc.
R A Fisher: P values and Significance Tests (4)
Sometimes, especially in Tables, a
Baedeker starring system operates:
* means 0.01 <= P < 0.05;
** means 0.001 <= P < 0.01;
*** means P < 0.001
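The starring convention is easy to mechanise. In this sketch the function name is hypothetical, and the boundary case P = 0.001 is assigned to two stars, which is an assumption made so that the three bands do not overlap.

```python
def significance_stars(p_value: float) -> str:
    """Map a P value to the conventional star annotation used in Tables."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:   # 0.001 <= P < 0.01 (boundary choice is an assumption)
        return "**"
    if p_value < 0.05:   # 0.01 <= P < 0.05
        return "*"
    return ""            # not starred


for p in (0.112, 0.03, 0.004, 0.0002):
    print(f"P = {p}: '{significance_stars(p)}'")
```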
R A Fisher and the Design of Experiments
Fisher’s influence on mainstream
scientific methodology is enormous
In particular he created a new science of
the Design of Experiments
Factors
Covariates
Interaction
Confounding
Randomization
Neyman and Pearson: Hypothesis Testing
Deciding between a Null Hypothesis
and an Alternative Hypothesis
E.g. Hnull: p=0.5 vs. Halt: p=0.75
Two permitted decisions
Accept Hnull
Reject Hnull
Two types of Error
A Tale of Two Errors (1)
Type I
When we incorrectly Reject Hnull although
Hnull is correct
Alpha (Type I error rate)
‘False Alarms’
‘Size’ = Alpha
A Tale of Two Errors (2)
Type II
When we incorrectly decide to Accept Hnull
although Halt is correct
Beta (Type II error rate)
‘Missed Signals’
‘Power’ = 1-Beta
N-P Hypothesis Testing
Fix Alpha (in advance!)
Find the Rejection Region with the given
Size Alpha and smallest possible Beta.
This is intimately linked to the Likelihood
Function (strictly Likelihood Ratios).
Look to see if Data fall in the Rejection
Region
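The three steps can be sketched for the FOSS example with Hnull: p = 0.5. The sketch assumes Alpha = 0.05 and a Rejection Region of the form 'X greater than or equal to a critical value', which is the form appropriate when the alternative has a larger p; the function names are illustrative.

```python
from math import comb


def upper_tail(critical_value: int, n: int, p: float) -> float:
    """Probability that X >= critical_value for a Binomial(n, p) count."""
    return sum(
        comb(n, x) * p ** x * (1 - p) ** (n - x)
        for x in range(critical_value, n + 1)
    )


def smallest_critical_value(n: int, p_null: float, alpha: float) -> int:
    """Smallest c for which the test 'Reject Hnull if X >= c' has Size <= alpha."""
    for c in range(n + 2):  # c = n + 1 never rejects, so the loop always returns
        if upper_tail(c, n, p_null) <= alpha:
            return c
    raise ValueError("no critical value found")


ALPHA = 0.05   # fixed in advance (assumed conventional value)
N, X = 16, 13

critical_value = smallest_critical_value(N, 0.5, ALPHA)
print(critical_value)       # 12
print(X >= critical_value)  # True: the Data fall in the Rejection Region
```

Choosing the smallest admissible critical value makes the Rejection Region as large as the Size constraint allows, which is what keeps Beta as small as possible.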
N-P Hypothesis Testing (2)
If Data fall in Rejection Region then
‘We Reject the Null Hypothesis’.
We don’t accept the alternative hypothesis.
If Data fall outside RR then
‘We Do Not Reject the Null Hypothesis’.
We don’t accept the null hypothesis.
Alpha in Advance gets entangled with
Observed P value.
Conventional Hybrid Inference
Dress things up as Hypothesis Testing
Use observed P values as differential
indicators of significance
Be aware, and Beware!
Read: Gerd Gigerenzer (1993). The superego, the ego, and
the id in statistical reasoning. In A handbook for data
analysis (pp. 311-339). Hillsdale, NJ: Erlbaum.
How Statistics took over the scientific world
Neyman-Pearson Tests
N-P tests are those that have maximum
power for a given maximum size.
For the comparison of two simple
hypotheses these have rejection regions
determined by the likelihood ratio.
LR(x) = P(x | p_1) / P(x | p_0)
Likelihood Ratio Test
In the case of testing between 2
binomial distributions (with the alternative
having the larger p) the LR is an
increasing function of the data X
So N-P tests are of the form
Reject H0 if X is greater than or equal to
some critical value
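The monotonicity claim can be verified for the running example. This sketch takes the null value 0.5 and the alternative value 0.75 from the earlier hypothesis-testing slide; the function name is an illustrative assumption. The binomial coefficient is the same in numerator and denominator, so it cancels.

```python
def likelihood_ratio(x: int, n: int = 16, p0: float = 0.5, p1: float = 0.75) -> float:
    """LR(x) = P(x | p1) / P(x | p0) for two binomial models with the same n."""
    return (p1 / p0) ** x * ((1 - p1) / (1 - p0)) ** (n - x)


ratios = [likelihood_ratio(x) for x in range(17)]
for x, ratio in enumerate(ratios):
    print(f"X = {x:2d}   LR = {ratio:.6g}")

# Each extra recovery multiplies the LR by (p1/p0) / ((1-p1)/(1-p0)) = 3 here,
# so large LR is the same thing as large X.
```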
Size and Power when critical
value = 12
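Size and Power for this test can be computed directly from the two binomial distributions. The sketch assumes the running example: N = 16, Hnull: p = 0.5, Halt: p = 0.75, and the rule 'Reject Hnull if X >= 12'; the function name is illustrative.

```python
from math import comb


def upper_tail(critical_value: int, n: int, p: float) -> float:
    """Probability that X >= critical_value for a Binomial(n, p) count."""
    return sum(
        comb(n, x) * p ** x * (1 - p) ** (n - x)
        for x in range(critical_value, n + 1)
    )


N, CRITICAL_VALUE = 16, 12

size = upper_tail(CRITICAL_VALUE, N, 0.5)    # Alpha: chance of a False Alarm under Hnull
power = upper_tail(CRITICAL_VALUE, N, 0.75)  # 1 - Beta: chance of detection under Halt

print(f"Size  = {size:.4f}")   # about 0.038
print(f"Power = {power:.4f}")  # about 0.630
```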
The Size of various Tests as a
function of critical value
The Power of various Tests as a
function of critical value
The Size/Power Trade-Off
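The trade-off can be tabulated by sweeping the critical value. This is a sketch for the running example (N = 16, Hnull: p = 0.5, Halt: p = 0.75), with illustrative names.

```python
from math import comb


def upper_tail(critical_value: int, n: int, p: float) -> float:
    """Probability that X >= critical_value for a Binomial(n, p) count."""
    return sum(
        comb(n, x) * p ** x * (1 - p) ** (n - x)
        for x in range(critical_value, n + 1)
    )


N = 16

# One row per test 'Reject Hnull if X >= c': (c, Size, Power).
rows = [(c, upper_tail(c, N, 0.5), upper_tail(c, N, 0.75)) for c in range(N + 2)]

for c, size, power in rows:
    print(f"reject if X >= {c:2d}:  Size = {size:.4f}   Power = {power:.4f}")
```

Raising the critical value lowers the Size (fewer False Alarms) but also lowers the Power (more Missed Signals); no choice of critical value improves both at once for a fixed N.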
What you buy with larger
samples
Bayesian Inference
Prior probability distribution over space
of parameters, expressing prior beliefs;
Multiply by likelihood for observed data,
yielding ...
Posterior probability distribution,
expressing revised beliefs having
observed new data
Summary based on posterior
distribution
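The prior-times-likelihood recipe can be sketched on a grid of candidate values of p. The flat prior, the grid of 1000 points, and all names here are illustrative assumptions, not choices made in the lecture.

```python
from math import comb

N, X = 16, 13


def likelihood(p: float) -> float:
    """Binomial probability of the observed Data at recovery rate p."""
    return comb(N, X) * p ** X * (1 - p) ** (N - X)


# Space of parameters: midpoints of 1000 equal slices of (0, 1).
grid = [(i + 0.5) / 1000 for i in range(1000)]

# Prior beliefs: every value of p equally plausible (an assumption).
prior = [1.0 / len(grid) for _ in grid]

# Multiply by the likelihood for the observed data, then renormalise.
unnormalised = [weight * likelihood(p) for p, weight in zip(grid, prior)]
total = sum(unnormalised)
posterior = [u / total for u in unnormalised]

# One possible summary based on the posterior distribution: its mean.
posterior_mean = sum(p * weight for p, weight in zip(grid, posterior))
print(f"posterior mean of p = {posterior_mean:.4f}")
```

With a flat prior the posterior has the same shape as the Likelihood Function, and its mean (about 0.78) sits slightly below the MLE because the posterior is skewed towards smaller p.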
A Tale of Two Bayesians
One Likelihood
Two Prior Distributions: Vague <=> Opinionated
Two Posterior Distributions:
Vague prior: more influenced by data
Opinionated prior: less influenced by data
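The contrast can be made concrete with Beta prior distributions, for which the binomial update has a closed form: a Beta(a, b) prior combined with X recoveries and N - X non-recoveries gives a Beta(a + X, b + N - X) posterior. The two priors below, Beta(1, 1) for the vague Bayesian and Beta(20, 20) for the opinionated one, are hypothetical choices for illustration and are not the priors used in the lecture.

```python
N, X = 16, 13


def beta_posterior(prior_a: float, prior_b: float,
                   successes: int, failures: int) -> tuple:
    """Conjugate update of a Beta(prior_a, prior_b) prior with binomial data."""
    return prior_a + successes, prior_b + failures


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


priors = {
    "Vague": (1, 1),          # flat over (0, 1); hypothetical choice
    "Opinionated": (20, 20),  # concentrated around p = 0.5; hypothetical choice
}

for name, (a, b) in priors.items():
    post_a, post_b = beta_posterior(a, b, X, N - X)
    print(f"{name:12s} prior mean = {beta_mean(a, b):.3f}   "
          f"posterior mean = {beta_mean(post_a, post_b):.3f}")
```

Both Bayesians see the same Likelihood, but the vague prior's posterior mean moves most of the way to 13/16 while the opinionated prior's stays much closer to 0.5.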
Conclusion
The concepts we have outlined are the basis of all
the statistical procedures that we use, though we
usually have to take the mathematical details on
trust.
The concepts are not easy, and effort spent
establishing a clear understanding will pay
dividends.
Used effectively, Statistics are a good support; they
can however be a soft underbelly for examiners,
referees, and journal editors.
Finding out more ...
Next Week ...
Peter WATSON will speak on ...
Exploratory Data Analysis