Understanding Variability and Statistical Decision
Understanding the Variability of Your Data:
Dependent Variable
• Two "Sources" of Variability in DV (Response Variable)
– Independent (Predictor/Explanatory) Variable(s)
– Extraneous Variables
Understanding the Variability of Your Data:
Dependent Variable
• Two Types of Variability in DV
– Unsystematic: changes in DV that do not
covary with changes in the levels of the IV
– Systematic: changes in DV that do covary
with changes in the levels of the IV
Understanding the Variability of Your Data:
Dependent Variable
• Three "labels" for the variability in DV
– Error Variability – unsystematic (type) due to
extraneous variables (source)
• Within conditions (level of IV) variability
• Individuals in same condition affected differently
• Affects standard deviation, not mean, in long term
Understanding the Variability of Your Data:
Dependent Variable
• Three "labels" for the variability in DV
– Error Variability - unsystematic due to
extraneous variables
Common sources
individual differences
uncontrolled procedural variations
measurement error
Understanding the Variability of Your Data:
Dependent Variable
• Three "labels" for the variability in DV
– Primary Variability –
• systematic variability (type) of DV due to
independent variable (source)
DV does covary with IV,
and variability is due to IV
Understanding the Variability of Your Data:
Dependent Variable
• Three "labels" for the variability in DV
– Primary Variability – systematic due to
independent variable
• Between conditions (levels) variability
• Individuals in same condition affected similarly
• Individuals in different conditions affected differently
• Affects mean, not standard deviation, in long term
Understanding the Variability of Your Data:
Dependent Variable
• Three "labels" for the variability in DV
– Secondary Variability –
systematic variability (type) of DV due to
extraneous variable (source) (which happens to covary with IV)
DV does covary with IV,
but variability is due to EV
Understanding the Variability of Your Data:
Dependent Variable
• Three "labels" for the variability in DV
– Secondary Variability – systematic due to
extraneous variable
• Between conditions (levels) variability
• Individuals in same condition affected similarly
• Individuals in different conditions affected differently
• Affects mean, not standard deviation, in long term
Understanding the Variability of Your Data:
Dependent Variable
• Roles played in the Research Situation
– Error Variability - unsystematic
• A nuisance – the ‘noise’ in the research situation
– Primary Variability - systematic
• The focus – the potentially meaningful source (signal)
– Secondary Variability - systematic
• The ‘evil’ – confounds the results (alternative signal)
Example
• Two sections of the same course
– Impact of each type of variability on the
summary statistics
• Error variability – affects the variability within a group, so has impact
on standard deviation – more Error Variability = higher SD
• Primary variability – affects those in same condition in similar way,
so all scores change, and mean is changed –
more Primary Variability = greater change in the mean
• Secondary variability – affects those in same condition in similar
way, so all scores change the same amount, and mean is changed –
more Secondary Variability = greater change in the mean
Changes in Original Distribution (black) with an INCREASE in Error Variance (red) and with a DECREASE in Error Variance (blue)
Note that the position of the distributions remains the same, no change in mean, but the shapes change to reflect more or less variability around the mean.
Changes in Original Distribution (black) with a Positive change in Systematic Variance (red) and with a Negative change in Systematic Variance (blue)
Note that the shape of the distributions remains the same, no change in error variance, but the means change.
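The two patterns described above can be checked with a few numbers. This is a minimal sketch using hypothetical scores (the values and the names `original`, `more_error`, and `shifted` are illustrative assumptions, not data from the slides): stretching scores around the mean stands in for more error variability, and adding a constant stands in for a systematic effect.

```python
from statistics import mean, pstdev

# Hypothetical scores for one section (illustrative values, not from the slides).
original = [70, 73, 75, 77, 80]

# More error variability: each score is pushed further from the mean,
# in both directions, so every deviation from the mean is doubled.
m = mean(original)
more_error = [m + 2 * (x - m) for x in original]

# Systematic variability: every score in the condition shifts by the same amount.
shifted = [x + 5 for x in original]

def summarize(scores):
    """Return (mean, standard deviation) rounded for display."""
    return round(mean(scores), 2), round(pstdev(scores), 2)

print("original  ", summarize(original))
print("more error", summarize(more_error))   # same mean, larger SD
print("shifted   ", summarize(shifted))      # larger mean, same SD
```

Error variability shows up only in the SD, and a systematic shift shows up only in the mean.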
Example
Individual’s score as combination of
‘sources’
• Impact on each individual
• Select 3 students at random from each class
• What would you predict as their test scores?
                            Class 1                Class 2
                            Jane    Joe    Chris   Julie   Jim    Sandy
In study                    75      75     75      75      75     75
Unsystematic (extraneous variables that vary across those in same condition)
  Need to Achieve           +2      -1     -1      -3      +2     +1
  Front/Back or Light/Dark  -1      -1     +1      +1      -1     0
  Measurement               0       +1     +1      +2      -2     0
After Unsystematic          76      74     76      75      74     76
  Mean                      75.3                   75
Method of Instruction (IV)  -5      -5     -5      +5      +5     +5
  (Systematic due to IV: same effect on all in same condition)
Time of day (EV)            -5      -5     -5      +5      +5     +5
  (Systematic due to EV: same effect on all in same condition)
Final Results               66      64     66      85      84     86
  Mean                      65.3                   85
What if all High Need to Achieve ended up in one group?
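The example treats each final score as the score after unsystematic variability plus the systematic effects. A minimal sketch of that arithmetic, using the scores and the ±5 effects from the example (the variable names are illustrative assumptions), shows why the EV matters: the observed difference between classes mixes the IV effect with the EV effect.

```python
from statistics import mean

# Scores after unsystematic (error) variability, per class, as in the example.
class_1 = [76, 74, 76]
class_2 = [75, 74, 76]

IV_EFFECT = {"class_1": -5, "class_2": +5}   # Method of Instruction (IV)
EV_EFFECT = {"class_1": -5, "class_2": +5}   # Time of day (EV), covaries with the IV

# Final score = score after error + systematic IV effect + systematic EV effect.
final_1 = [s + IV_EFFECT["class_1"] + EV_EFFECT["class_1"] for s in class_1]
final_2 = [s + IV_EFFECT["class_2"] + EV_EFFECT["class_2"] for s in class_2]
observed_diff = mean(final_2) - mean(final_1)

# What the difference would be if only the IV acted (EV designed out).
iv_only_1 = [s + IV_EFFECT["class_1"] for s in class_1]
iv_only_2 = [s + IV_EFFECT["class_2"] for s in class_2]
iv_only_diff = mean(iv_only_2) - mean(iv_only_1)

print("final class means:", round(mean(final_1), 1), round(mean(final_2), 1))
print("observed difference:", round(observed_diff, 1))
print("difference with IV only:", round(iv_only_diff, 1))
```

From the final scores alone there is no way to tell how much of the observed difference belongs to the IV, which is why secondary variability has to be designed out.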
Statistical decision-making
• The logic behind inferential statistics
– Deciding if there is ‘systematic variability’
• Does DV covary with IV?
– No distinction: primary vs. secondary
• (must ‘design’ secondary out of data)
• What do the data tell us?
• What decisions should we make?
Statistical decision-making
• A Research Example –
– compare ‘sample’ statistic to ‘known population’ statistic
– Research Hypothesis
– IF students chant the “Statistician’s Mantra”
before taking their Methods exam THEN they
will earn higher scores on the exam.
Statistical decision-making
• A Research Example –
– based on standardized exam
Your Class (M = 80, SD = 15, n = 25)
(a sample)
compared to a known
population Mean (M = 70) for a
standardized exam – is Class mean
consistent with this mean?
Statistical decision-making
• A Research Example – to the board/handout
– Can estimate the Sampling Distribution based on your sample
– See if Population mean ‘fits’
– Cause-effect relationship not clear (is it the Chant?)
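The board work for this example can be sketched in a few lines. This is a rough illustration, assuming the sampling distribution is estimated as the sample mean plus or minus about 2.06 standard errors (the 2.06 multiplier is the value mentioned later in these slides; the variable names are illustrative).

```python
from math import sqrt

# Class results (a sample) and the known population mean for the standardized exam.
sample_mean, sample_sd, n = 80, 15, 25
population_mean = 70

# Standard error of the mean: SD divided by the square root of n.
se = sample_sd / sqrt(n)

# Range of population means that 'fit' the sample, using about 2.06 SEs
# (assumed multiplier for a sample of 25).
multiplier = 2.06
lower = sample_mean - multiplier * se
upper = sample_mean + multiplier * se

fits = lower <= population_mean <= upper
print(f"SE = {se:.2f}, interval = {lower:.2f} to {upper:.2f}")
print("Population mean of 70 fits the sample:", fits)
```

A population mean that does not fit only says the class differs from the population; it does not show that the chant caused the difference.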
Statistical decision-making
• A Research Example using experimental
approach
– Comparing 2 samples from ‘same’ population
– Research Hypothesis
– IF students chant the “Statistician’s Mantra”
(vs. not chanting) before taking their Methods
exam THEN they will earn higher scores on
the exam.
Statistical decision-making
• Procedure
– Randomly divide class into two groups
• Chanters – are taught the “Statistician’s Chant”
and chant together for 5 minutes before the exam
• Non-chanters – sing Kumbaya together for 5
minutes before the exam (placebo chant)
Statistical decision-making
• Results
– Compute exam scores for all students and
organize by ‘condition’ (levels of IV).
No Chant: M = 70, SD = 10, n = 25, SE = 2
Chant:    M = 80, SD = 10, n = 25, SE = 2
Statistical decision-making
• Results
– Compute exam scores for all students and
organize by ‘condition’ (levels of IV).
– Compare Mean Exam Scores for two conditions
No Chant: M = 70
Chant:    M = 80
– What will you find? Difference = 10
– What will you need to find to confirm
hypothesis? (How much difference is enough?)
Statistical decision-making
• Research Hypotheses generally imprecise
– Predictions are not specific: what size difference?
– So “testing” the Research Hypothesis, using
the available data, is not reasonable
– Do results ‘fit’ the prediction?
You have nothing to compare your outcome to
Statistical decision-making
• Null Hypothesis – a precise alternative
– Identifies outcome expected when
NO systematic variability is present
• In this case, when the expected difference
between means is zero
M no chant = M chant, so difference expected = 0
Statistical decision-making
• Null Hypothesis – a precise alternative
– Identifies outcome expected when NO
systematic variability is present
– But still must decide how close to the
expected outcome you must be to ‘believe’ in
the ‘truth’ of the Null Hypothesis
Statistical decision-making
• The Null Hypothesis Sampling Distribution
– Why is it more appropriate than finding the Research Hypothesis
Sampling Distribution?
Statistical decision-making
• The Null Hypothesis Sampling Distribution
– All possible outcomes (differences between
means) when the Null Hypothesis is true
• (when there is no ‘systematic’ variability present in
the data)
• What is the Mean of the Null Hypothesis Sampling
Distribution in this case?
Statistical decision-making
• The Null Hypothesis Sampling Distribution
– All possible outcomes when the Null
Hypothesis is true
• (when there is no ‘systematic’ variability present in
the data)
– Finding all the possible outcomes?
– Estimate from what we know
– Mean, Std Error, Shape?
Statistical decision-making
• The Null Hypothesis Sampling Distribution
– All possible outcomes when the Null
Hypothesis is true
• (when there is no ‘systematic’ variability present in
the data)
– Finding all the possible outcomes?
– Seeing where your results fit into the Null
Hypothesis Sampling Distribution
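One way to "find all the possible outcomes" is to simulate them. The sketch below is an illustration under stated assumptions: exam scores are taken to be normally distributed with mean 70 and SD 10 in both groups (so the Null Hypothesis is true by construction), and the names and the number of replications are arbitrary choices.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the sketch is repeatable

POP_MEAN, POP_SD, N_PER_GROUP = 70, 10, 25   # assumed population when Ho is true
REPLICATIONS = 5000

def group_mean():
    """Mean exam score of one randomly drawn group of N_PER_GROUP students."""
    return sum(random.gauss(POP_MEAN, POP_SD) for _ in range(N_PER_GROUP)) / N_PER_GROUP

# Each replication is one 'experiment' in which the chant has no effect.
null_differences = [group_mean() - group_mean() for _ in range(REPLICATIONS)]

center = mean(null_differences)    # should be close to 0
spread = stdev(null_differences)   # estimates the standard error of the difference
share_as_extreme = sum(abs(d) >= 10 for d in null_differences) / REPLICATIONS

print(f"mean of differences: {center:.2f}")
print(f"SD of differences:   {spread:.2f}")
print(f"share of outcomes at least as extreme as 10: {share_as_extreme:.4f}")
```

The simulated differences pile up around 0, and a difference as large as the observed 10 points is rare when only unsystematic variability is at work.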
Statistical decision-making
• Deciding what to conclude based on the ‘fit’
– In the Null Hypothesis Sampling Distribution
Reject Null (lower tail): unlikely, but possible outcomes when Ho is true
Do not reject Null hypothesis (centered on 0, the typical difference expected): most likely outcomes when Ho true
Reject Null (upper tail): unlikely, but possible outcomes when Ho is true
Statistical decision-making
Reject Null: beyond – approx. 2 SEs
Do not reject Null hypothesis: most likely outcomes when Ho true (within approx. 2 SEs of 0 diff)
Reject Null: beyond + approx. 2 SEs
Using 2 SE’s (or 2.06 SE’s) provides what ‘confidence’? Now need the SEdiff
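The SEdiff can be worked out from the two group SEs. A minimal sketch, assuming two independent groups so that the standard error of the difference is the square root of the sum of the squared SEs (variable names are illustrative):

```python
from math import sqrt

# Group summaries from the Chant / No Chant results.
sd, n = 10, 25
se_no_chant = sd / sqrt(n)   # 2.0
se_chant = sd / sqrt(n)      # 2.0

# Standard error of the difference between two independent means.
se_diff = sqrt(se_no_chant ** 2 + se_chant ** 2)

# Approximate cutoff: differences beyond about 2 SEs fall in the 'Reject Null' tails.
cutoff = 2 * se_diff

observed_diff = 80 - 70
reject_null = abs(observed_diff) > cutoff

print(f"SEdiff = {se_diff:.2f}, cutoff = +/- {cutoff:.2f}")
print("Observed difference of 10 falls in a rejection region:", reject_null)
```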
Statistical decision-making
• Deciding what to conclude based on the ‘fit’
                    “True” State of the World
Decision            Ho True                 Ho False
Reject Ho           Error                   Correct Rejection
Not Reject Ho       Correct Nonrejection    Error
                    ______________________________________
                    100%                    100%
Statistical decision-making
• Deciding what to conclude based on the ‘fit’
                    “True” State of the World
Decision            Ho True                 Ho False
Reject Ho           Type 1 (p)              Correct Rejection (Power = 1 – Type 2)
Not Reject Ho       Correct Nonrejection    Type 2
                    ______________________________________
                    100%                    100%
Deciding what confidence you want to have that you have not made any errors
The Null Hypothesis (Ho) Sampling Distribution.
All possible outcomes when the Ho is TRUE.
The location of this distribution is known, because it would be the mean when the Ho is true. In this case, a 2 group design, the mean would be 0, since the Ho predicts a 0 difference between levels of the IV. The ‘spread’ of the distribution is a function of unsystematic variability, and can be estimated using the SDs for the sample.
If you get an outcome that exists in this set of outcomes, you have evidence consistent with the Ho.
Assume Type 1 error probability of .05 is desired: 2.5% in each tail, on or outside red line.
Mean of distribution: 0

The Research Hypothesis (Hr) Sampling Distribution.
All possible outcomes when the Hr is TRUE.
The location of this distribution is unknown, since the true systematic difference associated with the IV is unknown. If the Hr is truly an alternative to the Ho, all we know is the mean difference should not be 0.
The ‘spread’ of the Hr should be the same as the Ho, since the unsystematic variability would be the same no matter which one is true.
If you get an outcome that exists in this set of outcomes, you have evidence consistent with the Hr.
Mean of distribution: Not 0
So – where, on these two distributions would you find each of 4 outcomes?
Type 1 error - your choice based on desired confidence – but not the only error possible!
Correct Non-rejection
Type 2 error
Correct Rejection
(Figure: Ho sampling distribution centered on 0; Hr sampling distribution centered on a value that is not 0)
In the bottom example, you have more ‘error’ variability in your data – what changes?
Statistical decision-making
• Trade-offs between Types of Errors
I believe I can fly?
• Factors affecting Type 2 Errors (Power)
– “Real” systematic variability (size of effect)
– Choice of Type 1 probability
– Precision of estimates (sample size)
Effect of Change in REAL size of effect –
Effect of Change in Type 1 probability –
Effect of Change in Sample Size –
Statistical decision-making
• So, how does this apply to our case?
• Factors affecting Type 2 Errors (Power)
– “Real” systematic variability (size of effect)
• You can decide what size would be worth detecting
– Choice of Type 1 probability
• You can choose – based on desired confidence in avoiding this error
– Precision of estimates (sample size)
• You can choose, or at least know
Statistical decision-making
• So, how does this apply to our case?
• Factors affecting Type 2 Errors (Power)
– “Real” systematic variability (size of effect)
• In the case of the Chanting example, assume .5 * SD (a moderate size effect) is good
– Choice of Type 1 probability
• Use traditional .05
– Precision of estimates (sample size)
• Sample of 50 (2 groups of 25)
Statistical decision-making
• Factors affecting Type 2 Errors (Power)
– Type 2 error probability = .59
– Power = .41 for the Chant/No Chant experiment
– So, to be able to detect at least a ‘moderate’ effect,
– and have a 5% chance of a Type 1 error,
– with your sample size of 25 per group
– your probability of making a Type 2 error is 59%
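The power figure can be approximated with a short calculation. This is a rough sketch that swaps in a normal approximation for the t distribution (an assumption made here to stay within the standard library), so it lands near .42 rather than exactly on the .41 reported above. The function name `approx_power` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-tailed, two-group comparison of means.

    effect_size_d is the true difference in SD units (.5 = 'moderate').
    Uses a normal approximation instead of the t distribution.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected difference expressed in units of the SE of the difference.
    shift = effect_size_d * sqrt(n_per_group / 2)
    # Probability that the outcome lands in either rejection region when Hr is true.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

chant_power = approx_power(0.5, 25)
print(f"Power: {chant_power:.2f}, Type 2 error probability: {1 - chant_power:.2f}")

# The three factors affecting Type 2 errors:
print(f"larger real effect (.8 * SD): {approx_power(0.8, 25):.2f}")
print(f"stricter Type 1 choice (.01): {approx_power(0.5, 25, alpha=0.01):.2f}")
print(f"larger sample (64 per group): {approx_power(0.5, 64):.2f}")
```

Power rises with a larger real effect or a larger sample, and falls when a smaller Type 1 probability is chosen.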
Statistical decision-making
• Each ‘Decision’ has an associated ‘error’
• Can only make Type 1 if “Reject”
• Can only make Type 2 if “Not Reject”
                    “True” State of the World
Decision            Ho True                 Ho False
Reject Ho           Type 1 Error            Correct Rejection (Power)
Not Reject Ho       Correct Nonrejection    Type 2 Error
Statistical decision-making
• But, these decisions are based ONLY on
the probability of getting the outcome you
found if the Null Hypothesis is actually true
• Also might want to know how much of an
effect was there, or how strong is the
relationship between the variables
Statistical decision-making
Interpreting “Significant” Statistical Results
Statistical Significance vs. Practical Significance
How unlikely is the event in these circumstances
(Statistical significance)
(when Ho true)
versus
How much of an effect was there
(Practical significance)
minimal difference likely (at some probability)
or
‘explained’ variability in DV (0% - 100% scale)
Statistical decision-making
Interpreting “Significant” Statistical Results
Having decided to “reject” the Null Hypothesis
you can:
– State probability of Type 1 error
– State confidence interval for population value
– State percent of variability in DV ‘accounted for’
or likely ‘size’ of the difference
Statistical decision-making
Interpreting “Significant” Statistical Results
• For Chant vs. No Chant example
– State probability of Type 1 error
• .05
– State confidence interval for population value
• 95% CI is approximately ±2 * SE (SE was found to be 2.8)
• Point estimate of 10 ± 5.6, but Interval estimate clearer
– (“Real” difference somewhere between 4.4 and 15.6, the 95% CI)
– State percent of variability in DV ‘accounted for’
• eta2 = .20, or 20%
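These figures can be reproduced approximately from the group summaries. The sketch below assumes eta squared is obtained from the t statistic as t² / (t² + df); the slides do not say how the .20 was computed, so that formula is an assumption. With unrounded values the results differ slightly from the rounded ones above.

```python
from math import sqrt

# Chant vs. No Chant summaries.
mean_chant, mean_no_chant = 80, 70
se_chant = se_no_chant = 2
n_per_group = 25

diff = mean_chant - mean_no_chant
se_diff = sqrt(se_chant ** 2 + se_no_chant ** 2)   # about 2.8

# Approximate 95% confidence interval: difference plus or minus about 2 SEs.
ci_low = diff - 2 * se_diff
ci_high = diff + 2 * se_diff

# Share of DV variability 'accounted for' (assumed formula: t^2 / (t^2 + df)).
t = diff / se_diff
df = 2 * n_per_group - 2
eta_squared = t ** 2 / (t ** 2 + df)

print(f"95% CI: {ci_low:.1f} to {ci_high:.1f}")
print(f"eta squared: {eta_squared:.2f}")
```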
Statistical decision-making
Interpreting “Non-significant” Statistical Results
Having decided you cannot reject the Ho
State the estimated ‘power’ of your research
with respect to some ‘effect size’
What is the problem when you have too little (low) power?
Can you have too much power?
Group Statistics
Measure                  Group               N      Mean     Std. Deviation   Diff btwn Means
Ease of Return to Work   Back Injury         1065   6.1531   2.02012          1.53482
                         Nervous Breakdown   1053   4.6182   1.98869
Colleagues' Acceptance   Back Injury         1065   6.9192   1.84479          1.59066
                         Nervous Breakdown   1053   5.3286   2.05862
Customers' Acceptance    Back Injury         1065   6.8986   1.78734          1.14266
                         Nervous Breakdown   1053   5.7559   2.09841
Future Productivity      Back Injury         1065   7.6761   1.48084          1.21832
                         Nervous Breakdown   1053   6.4577   1.84148
Likelihood of Relapse    Back Injury         1064   4.9088   1.95436          -.28870
                         Nervous Breakdown   1053   5.1975   1.94333
Ratings on a 9-point scale: “Definitely No” (1) to “Definitely Yes” (9)
Difference between means needed to be ‘statistically significant’ at .05 = .17
95% CI for .17 would be .01 to .33, which means what?
Are we ‘detecting’ the meaningless low probability event?
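The .17 figure can be checked from the table. A minimal sketch, using the Ease of Return to Work row and assuming a cutoff of 1.96 standard errors (a large-sample value chosen here; the slides do not state the multiplier used):

```python
from math import sqrt

# Ease of Return to Work: group sizes and standard deviations from the table.
n_back, sd_back = 1065, 2.02012
n_nervous, sd_nervous = 1053, 1.98869

# Standard error of the difference between two independent means.
se_diff = sqrt(sd_back ** 2 / n_back + sd_nervous ** 2 / n_nervous)

# Smallest difference that reaches 'statistical significance' at .05.
needed_diff = 1.96 * se_diff

# The same difference in SD units, to judge its practical size.
pooled_sd = sqrt((sd_back ** 2 + sd_nervous ** 2) / 2)
needed_in_sd_units = needed_diff / pooled_sd

print(f"SEdiff = {se_diff:.3f}")
print(f"difference needed at .05 = {needed_diff:.2f}")
print(f"as a share of one SD = {needed_in_sd_units:.2f}")
```

With more than 1,000 people per group, a difference of less than a tenth of a standard deviation on a 9-point scale is enough to reject the Null Hypothesis, which is the "too much power" concern raised above.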