Transcript Document

Hypothesis Testing
Hypothesis
• An educated opinion
• What you think will happen, based on
• previous research
• anecdotal evidence
• reading the literature
Body fat level of 8th graders
• National Norm:
• Mean = 23%, SD = 7%
• postulated parameters (μ and σ)
• Your 8th grade PE program (N=200)
How does my program compare??
Your gut feeling
• You expect to find, you want to find,
your instincts tell you that your
students are better.
But are they??
Question
• Is any observed difference between
your sample mean (representative
of your 8th grade population mean)
and the National Norm (population
of all 8th graders) attributable to
random sampling errors, or is there
a real difference?
• Is the mean of your class REALLY the
same as the National Norm?
How to determine this
• Research Question
• is my POPULATION mean really 23%?
• Statistical Question
• μ = 23%
• set the Null Hypothesis that the mean of YOUR
group is 23% (equal to the National Norm)
• assume that your group is NOT REALLY different
Null Hypothesis
• Ho: μ = 23%
• The true difference between your sample
and the population mean is 0.
• There is NO real difference between your
sample mean and the population mean.
• The performance of your students is not
really different from the national norm.
Null Hypothesis
• In inferential statistics, we usually want
to reject the Null hypothesis
• to say that the differences are more than
what would be expected by random
sampling error
• this was our initial gut feeling
• our program is better
3 Possible Outcomes
• No difference between groups
• do not reject the null hypothesis
• One specific group is higher than the other
• directional hypothesis
• What you EXPECT to happen when planning the
experiment/measurement
• Either group mean is higher
• non-directional hypothesis
• The possible outcome of the
experiment/measurement
Alternative Hypothesis
• Our research hypothesis (what we
expect to see)
• HA: μ ≠ 23%
• non-directional hypothesis
• interested to see whether my grade's body composition
is better or worse than the national norm
Alternative Hypothesis
• Our research hypothesis (what we
expect to see)
• HA: μ < 23% (or HA: μ > 23%)
• directional hypothesis
• expect to see my grade mean less than (better
than) the national norm
• expect to see my grade mean greater than
(worse than) that of the national norm
Comparing My Class to the
National Norm
• My 8th grade PE program (N = 200)
• National Norm = 23%
• postulated parameter
• At the end of the semester, calculate the
mean % body fat
• Using a random sample (n = 25)
• mean % body fat of 20%
Is my sample mean different from the National Norm?
Need to Test Ho
• Determine whether the observed
difference in means is attributable to
random sampling error rather than a true
difference between the groups (my class
and the national norm)
• treatment effect
Hypothesis Testing
Null Hypothesis
• No true difference between two means
(sample mean and national norm)
• Implies: my sample is drawn from the
identified population
• Nothing more than random sampling errors
accounts for any observed difference
between the means.
An element of uncertainty is inherent
in any act of observation (Menard’s Philosophy)
Alternative Hypothesis
• A true difference does exist between two
means
• Implies: my sample is not drawn from the
identified population
• Observed difference between the means is
larger than what we are willing to attribute
to random sampling error
Testing Ho
• Test the probability that the observed
difference between means is attributable
to random sampling error alone
• Evaluate that probability to decide whether Ho
should be rejected
• reject or do not reject Ho
What amount of risk
are you willing to take?
Weatherman Example
• 85% chance of rain
• put up the sunroof
• 5% chance of rain
• it may happen, but the chance is slight
• not very likely to rain
• willing to risk being wrong to avoid the
inconvenience of having to put up the sunroof.
If we do not put up the sunroof:
We reject the hypothesis that it will rain
If we do not put up the sunroof:
We could be right, or we could be wrong
Waiting for certainty means waiting forever
What risk are YOU willing to take?
1%? 5%? 10%?
Applied Research
α = 0.10
α = 0.05
α = 0.01
α = 0.05
• With these observed conditions
• 5 times in 100 it will rain when we have
kept the sunroof down
• 95 times in 100 it will not rain when we
have kept the sunroof down
α = 0.05
• Reject Ho if a mean difference as large as the
one observed would be expected to occur by
chance (random sampling error) fewer than
5 times in 100 instances
• reported in research as a statistically
significant difference
Testing Ho at α = 0.05
• If p > 0.05: do not reject Ho
• the difference can be attributed to random
sampling error (the expected variability in means
drawn from a population)
• If p ≤ 0.05: reject Ho
• the difference is attributable to something other
than random sampling error
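The "5 times in 100" idea can be checked by simulation. The sketch below is a minimal illustration, not part of the original example: it assumes a normally distributed population in which Ho is true (μ = 23%, σ = 7%), repeatedly draws random samples of n = 25, runs the z-test on each, and counts how often p ≤ 0.05 occurs purely through random sampling error. The seed and the number of trials are arbitrary choices.

```python
import math
import random
from statistics import NormalDist, fmean

MU, SIGMA, N, ALPHA = 23.0, 7.0, 25, 0.05  # assumed population where Ho is true

def two_tailed_p(z):
    # Probability of a z at least this extreme in either tail of the normal curve
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_type_i_rate(trials=20_000, seed=1):
    rng = random.Random(seed)
    se = SIGMA / math.sqrt(N)
    rejections = 0
    for _ in range(trials):
        # Every sample really comes from the national population (Ho true)
        sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
        z = (fmean(sample) - MU) / se
        if two_tailed_p(z) <= ALPHA:
            rejections += 1  # rejecting a true Ho
    return rejections / trials

rate = simulate_type_i_rate()
print(f"Rejected a true Ho in {rate:.1%} of samples")
```

The rejection rate should land close to 5%: each of those rejections is a case where we kept the sunroof down and it rained anyway.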
Decision Table
                          DECISION
                          Ho TRUE                          Ho FALSE
                          (do not reject Ho)               (reject Ho)
REALITY   Ho TRUE         Correct                          Incorrect: Type I error (RT1)
          Ho FALSE        Incorrect: Type II error (AFII)  Correct
• RT1: Rejecting a True Ho is a Type 1 error
• AFII: Accepting (not rejecting) a False Ho is a Type II error
Belief in God as Decision Table
Ho: God does not exist
                          DECISION
                          Ho TRUE                    Ho FALSE
REALITY   Ho TRUE         Life, no hope              Lived life of hope
          Ho FALSE        Lost out on eternal life   Eternal life
To this juncture
• Sampling involves error
• Expect differences between samples
• We may expect a difference between
treatments/conditions, BUT we also expect a
difference because of random sampling error
• HOW do we determine whether a difference is
statistically significant (greater than random sampling error)?
Testing Ho requires
• Mean value
• measure of typical performance level
• Standard deviation
• measure of the variability
• n of cases
• known to affect the variability expected in the
estimate of the population mean
z test for one sample
• Our beginning point
• National Norm BF = 23% (SD = 7%)
• Our sample performance
• n = 25
• Mean = 20%
• SD = 6%
Do my students differ
from the National Norm??
Our hypotheses
• Research Hypothesis
• Do my students differ from the national norm
• want to know if better OR worse
• Ho
• There is no real difference in the BF% of my
students and the national norm
• α = 0.05
Recall
• A z-score > 1.96 or < -1.96 occurs only 5% of the
time (2.5% in each tail)
• see table of the Normal Curve
• That is, the probability of obtaining a z-score
this extreme purely by chance is 5% (only 5 times in 100)
(explain).
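As a quick check on the table value, the sketch below uses Python's standard-library `statistics.NormalDist` (Python 3.8+) to add up the area in both tails beyond ±1.96, then works backwards from α = 0.05 to the critical z.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area below -1.96 plus area above +1.96 (both tails)
p_two_tailed = std_normal.cdf(-1.96) + (1 - std_normal.cdf(1.96))
print(round(p_two_tailed, 4))  # 0.05

# Working backwards: the z that leaves alpha/2 = 0.025 in the upper tail
z_crit = std_normal.inv_cdf(1 - 0.05 / 2)
print(round(z_crit, 2))  # 1.96
```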
Relevance to Hypothesis Testing
• Use the same general idea to evaluate
the probability of obtaining a sample
mean score of 20% with n = 25 if the
true population mean is 23%
• Recall the concept of the sampling distribution
of the mean
Recall: Z score equation
$$Z = \frac{X - \bar{X}}{SD}$$
Introduce: Z test equation
$$Z = \frac{\bar{X} - \mu}{SE_m}$$
Standard Error of the Mean
$$SE_m = \frac{SD}{\sqrt{n}}$$
Z test equation
$$Z = \frac{\bar{X} - \mu}{SE_m}$$
• Numerator (X̄ - μ): the mean difference
• Denominator (SEm): the expected variability in sample means
Use the population standard deviation (SDp = σ = 7%),
not the sample SD, to compute SEm
Our given & required data
• X̄ = 20%
• SD = 6%
• n = 25
• μ = 23%
• σ = 7%
• SEm = σ / √n = 7/5 = 1.4
• X̄ - μ = 20% - 23% = -3%
• Z = (X̄ - μ) / SEm = -3 / 1.4 = -2.14
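The same arithmetic can be sketched in Python. The function name `one_sample_z_test` is hypothetical, written only for illustration; it assumes a non-directional test (HA: μ ≠ 23%), so the p-value counts both tails of the normal curve.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, pop_mean, pop_sd, n):
    """Return (SEm, z, two-tailed p) for a one-sample z-test."""
    se_m = pop_sd / math.sqrt(n)            # SEm uses the population SD (sigma)
    z = (sample_mean - pop_mean) / se_m     # mean difference / expected variability
    p = 2 * (1 - NormalDist().cdf(abs(z)))  # non-directional (two-tailed) p
    return se_m, z, p

se_m, z, p = one_sample_z_test(sample_mean=20, pop_mean=23, pop_sd=7, n=25)
print(f"SEm = {se_m:.2f}, z = {z:.2f}, p = {p:.3f}")
# SEm = 1.40, z = -2.14, p = 0.032
```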
Decision Making
• What is the probability of obtaining a Z = -2.14
IF the difference is attributable only to random
sampling error?
• Is the observed probability (p) LESS THAN or
EQUAL TO the α level set?
• Is p ≤ α?
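The decision rule itself is a single comparison. In the sketch below (a hypothetical `decide` helper), the observed p for Z = -2.14 is taken as roughly 0.032, the two-tailed area beyond ±2.14 on the normal curve; the same p leads to different conclusions depending on the α level set in advance.

```python
def decide(p, alpha):
    """Statistical conclusion: reject Ho when p <= alpha."""
    return "reject Ho" if p <= alpha else "do not reject Ho"

p_observed = 0.032  # approximate two-tailed p for Z = -2.14
for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: {decide(p_observed, alpha)}")
```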
From the tables
• Z > 1.96 or Z < -1.96 has a 5% chance of
occurring purely by chance (explain).
• Since Zobserved = -2.14 falls beyond -1.96, our statistical
conclusion is to reject Ho
• a mean difference this large (-3%, Z = -2.14) is not
likely to have occurred by chance
• The data indicate/suggest (not prove) that
our class HAS less body fat than the norm.
Graphically, [figure: normal curve, α = 0.05 (two-tailed), Zcritical = ±1.96; Regions of Rejection below -1.96 and above 1.96, Region of Non-Rejection between them; Z observed = -2.14 falls in the lower Region of Rejection]
Reporting the Results
α = 0.05
The observed mean of our treatment group of 25
students was 20% (± 6%) body fat. The z-test for one
sample indicates that the difference between the
observed mean of 20% and the National Norm of 23%
was statistically significant (Zobs = -2.14, p ≤ 0.05).
These data suggest that our students' percent body
fat was less than the national norm.
Reporting the Results
α = 0.01
The observed mean of our treatment group was
20% (± 6%) body fat. The z-test for one sample
indicates that the difference between the observed
mean of 20% and the National Norm of 23% was not
statistically significant (Zobs = -2.14, p > 0.01). Our
measured percent body fat was not significantly
different from the national norm.
Reporting the Results, if you
set α = 0.01
The observed mean of our treatment group was
20% (± 6%) body fat. With α = 0.01, the z-test for
one sample indicates that the difference between
the observed mean of 20% and the National Norm
of 23% was not statistically significant (Zobs = -2.14,
p = 0.032). Our measured percent body fat was
not significantly different from the national norm.
Consider all possible reasons
for your outcome
Statistics humour
What does a statistician call it
when the heads of 10 rats are
cut off and 1 survives?
Non-significant.
Do not reject H0 vs Accept H0
"Accept" implies that we are sure Ho is valid
"Do not reject" reflects that, this time, we are
unable to say with a high enough degree of
confidence that the observed difference is
attributable to something other than sampling error.
Examples
• Zobs = -3.45
• α = 0.05
• Decision (statistical conclusion) = ???
Examples
• Zobs = 1.45
• α = 0.01
• Decision (statistical conclusion) = ???
Examples
• Zobs = 1.96
• α = 0.05
• Decision (statistical conclusion) = ???
Examples
• Zobs = -1.96
• α = 0.01
• Decision (statistical conclusion) = ???
Examples
• Zobs = 1.96
• α = 0.01
• Decision (statistical conclusion) = ???
Examples
• Zobs = -1.95
• α = 0.05
• Decision (statistical conclusion) = ???
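One way to check these examples is the critical-value approach: compare |Zobs| with the two-tailed critical z for the chosen α (about 1.96 for α = 0.05 and 2.58 for α = 0.01). The sketch below assumes non-directional tests and treats a value that reaches the critical boundary as falling in the region of rejection, in line with the p ≤ α rule; `z_decision` is a hypothetical helper name.

```python
from statistics import NormalDist

def z_decision(z_obs, alpha):
    """Two-tailed z-test: reject Ho if z_obs falls in a region of rejection."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 (.05) or 2.58 (.01)
    return "reject Ho" if abs(z_obs) >= z_crit else "do not reject Ho"

# (Zobs, alpha) pairs from the examples above
examples = [(-3.45, 0.05), (1.45, 0.01), (1.96, 0.05),
            (-1.96, 0.01), (1.96, 0.01), (-1.95, 0.05)]
for z_obs, alpha in examples:
    print(f"Zobs = {z_obs:5.2f}, alpha = {alpha:.2f}: {z_decision(z_obs, alpha)}")
```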
Z-test vs t-test
• SPSS does not provide the z-test
• Can only use z-test if you know population SD
• Typically, all population parameter values are
estimated from sample statistics
• Mean
• Standard deviation
• Standard error
• SPSS uses t-test
• Same concept, different assumptions
• t-test more robust against departures from normality
(doesn’t affect the accuracy of the p-estimate as much)
When population SD is not
known…changing distributions
• The Z-test uses one sample statistic to
estimate population parameters
• sample mean → population mean
• Population standard deviation is known
• The t-test uses two sample statistics to
estimate population parameters
• sample mean → population mean
• sample SD → population SD (used for the standard error)
t-test equation
• So the test statistic now becomes
$$t = \frac{\bar{X} - \mu_0}{s_{\bar{X}}}$$
Estimated population SD
• To estimate pop SD from sample
SD, the sample SD is inflated a
little (divide by n - 1 rather than n)…
$$s = \sigma_{est} = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
You may have noticed this modification earlier
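The effect of dividing by n - 1 can be seen with the heart-rate scores from the SPSS example in this section. The sketch computes both versions of the SD by hand and checks them against Python's `statistics.pstdev` (divides by n) and `statistics.stdev` (divides by n - 1); the helper names are illustrative.

```python
import math
from statistics import pstdev, stdev

heart_rates = [147, 155, 132, 165, 133]  # heart-rate scores from the SPSS example

def sd_divide_by_n(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def sd_divide_by_n_minus_1(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

print(round(sd_divide_by_n(heart_rates), 2))          # 12.71
print(round(sd_divide_by_n_minus_1(heart_rates), 2))  # 14.21 (slightly inflated)

# The standard library follows the same two conventions
assert math.isclose(pstdev(heart_rates), sd_divide_by_n(heart_rates))
assert math.isclose(stdev(heart_rates), sd_divide_by_n_minus_1(heart_rates))
```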
SEm from estimated population SD
• To estimate the standard error from the
sample SD, use the estimated SD
again, thus…
$$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$
Recall factors affecting s_X̄
• The size of the estimated SE depends on both
the SD of the sample and the sample size
$$s_{\bar{X}} = \frac{s}{\sqrt{n}}$$
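A small sketch of how the estimated SE responds to each factor, using s = 6% from the body-fat sample; the list of sample sizes is arbitrary. Larger samples shrink the SE (by the square root of n), and a larger SD inflates it proportionally.

```python
import math

def standard_error(s, n):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return s / math.sqrt(n)

s = 6.0  # sample SD from the body-fat example (6%)
for n in (5, 25, 100):
    print(f"n = {n:3d}: SE = {standard_error(s, n):.2f}")
```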
When population SD is not
known…changing distributions
• The distribution used to evaluate the calculated
ratio switches from the normal distribution to
the t-distribution
• Sampling variation in the Z-distribution reflects
variability in the sample mean only
• BUT sampling variation in the t-distribution reflects
variability in both the sample mean and the
estimated standard error of the mean
• So…as the sample gets smaller (and the estimated
standard error of the mean becomes less stable), the
sampling distribution of t differs more from that of Z
• The good old 1.96 for 95% is toast
Concept of
Degrees of Freedom (df)
• The number of independent pieces of information a
sample of observations can provide for purposes of
statistical inference
• E.g. 3 numbers in a sample: 2, 2, 5
• Sample mean = 3; deviations are -1, -1, 2
• Are these independent?
• No: when you know two, you'll know the other because
$$\sum (X - \bar{X}) = 0$$
• For any sample of size n you have n - 1 values
that are free to vary; the last value is fixed
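The 2, 2, 5 example can be written out directly: the deviations about the mean must sum to zero, so once any n - 1 of them are known, the last one is forced.

```python
sample = [2, 2, 5]
mean = sum(sample) / len(sample)         # 3.0
deviations = [x - mean for x in sample]  # [-1.0, -1.0, 2.0]
assert sum(deviations) == 0              # deviations about the mean always sum to 0

# Knowing any n - 1 deviations fixes the last one
known = deviations[:-1]
last = -sum(known)
print(last)  # 2.0

df = len(sample) - 1
print(df)    # 2
```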
Sampling distribution of t
Large n → t-dist pretty much like the z-dist
(because sample SD is a good estimate of pop SD,
& sample SE is a good estimate of pop SE)
Sampling distribution of t
• Because the distribution gets flatter (heavier tails)
as n gets smaller, the critical t needed for
significance gets bigger as n gets
smaller
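Python's standard library has no t-distribution, so the sketch below is a self-contained illustration under that constraint: it integrates the t density numerically (Simpson's rule) and uses bisection to find the two-tailed critical t at α = 0.05. In practice you would read these values from a t table or a statistics package; the printed values should agree with standard tables (e.g. 2.776 for df = 4, 2.064 for df = 24) and creep toward 1.96 as n grows.

```python
import math

def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x), integrating the density from 0 to |x| with Simpson's rule."""
    h = abs(x) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area  # t is symmetric about 0

def t_critical(df, alpha=0.05):
    """Two-tailed critical t: the value leaving alpha/2 in the upper tail."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 50.0
    for _ in range(50):  # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for n in (5, 10, 25, 1001):
    df = n - 1
    print(f"n = {n:4d}, df = {df:4d}: t critical = {t_critical(df):.3f}")
print("z critical = 1.960")
```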
http://duke.usask.ca/~rbaker/Tables.html
Work an example with SPSS
• Heart Rate (bpm) following aerobic activity
• Scores: 147, 155, 132, 165, 133
• National standard: 158
• Group Mean: 146.4 (± 14.21)
Atble351.sav
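Before running this in SPSS, the same one-sample t-test can be sketched by hand. This assumes a non-directional test (Ho: μ = 158 bpm) at α = 0.05; the critical value 2.776 for df = 4 is taken from a standard t table rather than computed, whereas SPSS reports an exact p-value instead.

```python
import math
from statistics import fmean, stdev

heart_rates = [147, 155, 132, 165, 133]  # bpm following aerobic activity
national_standard = 158

n = len(heart_rates)
mean = fmean(heart_rates)  # 146.4
s = stdev(heart_rates)     # n - 1 in the denominator: about 14.21
se = s / math.sqrt(n)      # estimated standard error: about 6.35
t = (mean - national_standard) / se
df = n - 1

T_CRIT_DF4 = 2.776  # two-tailed, alpha = 0.05, df = 4 (standard t table)
decision = "reject Ho" if abs(t) >= T_CRIT_DF4 else "do not reject Ho"
print(f"t({df}) = {t:.2f}; {decision}")
# t(4) = -1.83; do not reject Ho
```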
SPSS Output
[Screenshot of the SPSS output; not recoverable from the transcript]
Statistics and beer
Time Out