Stats 95
t-Tests
• Single Sample
• Paired Samples
• Independent Samples
t Distributions
• t dist. are used when we
know the mean of the
population but not the SD of
the population from which
our sample is drawn
• t dist. are useful when we
have small samples.
• t dist is flatter and has fatter
tails
• As sample size approaches
30, t looks like z (normal)
dist.
• Same Three Assumptions
• Dependent Variable is scale
• Random selection
• Normal Distribution
Fat Tails Lose Weight With Larger
Sample Size
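The slide title above can be checked numerically. The following is a minimal sketch, not part of the original slides, that evaluates the textbook density formula for the t distribution far out in the tail (at 3) for a few degrees of freedom and compares it with the normal (z) curve. The function names `t_pdf` and `normal_pdf` are illustrative choices.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def normal_pdf(x):
    """Density of the standard normal (z) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Height of each curve far out in the tail (x = 3).
tail_heights = {df: t_pdf(3.0, df) for df in (2, 5, 10, 30)}
for df, height in tail_heights.items():
    print(f"df = {df:>2}: t density at 3 = {height:.4f}")
print(f"normal : z density at 3 = {normal_pdf(3.0):.4f}")
```

The tail height shrinks toward the normal value as df grows, and at the center the t curve sits below the normal curve, which is what "flatter with fatter tails" means.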
The Robust Nature of the t
Statistics
• Unfortunately, we very seldom know if the population is
normal, because usually all the information we have about a
population is in our study, a sample of 10-20.
• Fortunately,
• 1) distributions in social sciences often approximate a normal
curve, and
• 2) according to Central Limit Theorem the sample mean you
have gathered is part of a normal distribution of sample means,
and
• 3) in practice, statisticians have found that the t test is accurate
even with populations far from normal
The Robust Nature of the t
Statistics
• The only situation in which using a t test is
likely to give a seriously distorted result is
when you are using a one-tailed test and the
population is highly skewed.
z Statistic Versus t Statistic
z Statistic
• When you know the Mean and Standard Deviation of a population.
– E.g., a farmer picks 200,000 apples, the mean weight is 112 grams, the SD is 12 grams.
• Calculate the Standard Error of the sample mean
t Statistic
• When you do not know the Mean and Standard Deviation of the population
– E.g., a farmer picks 30 out of his 200,000 apples, and finds the sample has a Mean of 112 grams.
• Calculate the Estimate of the Standard Error of the sample mean
Scenarios When you would use a
Single Sample t test
• A newspaper article reported that the typical American family spent an
average of $81 for Halloween candy and costumes last year. A sample of N
= 16 families this year reported spending a mean of M = $85, with s = $20.
What statistical test would we use to determine whether these data indicate
a significant change in holiday spending?
• Many companies that manufacture lightbulbs advertise their 60-watt bulbs
as having an average life of 1000 hours. A cynical consumer bought 30
bulbs and burned them until they failed. He found that they burned for an
average of M = 1233, with a standard deviation of s = 232.06. What
statistical test would this consumer use to determine whether the average
burn time of lightbulbs differs significantly from that advertised?
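Both scenarios give a sample mean, a sample standard deviation, and a published population mean, so both call for a single-sample t test. As a rough sketch (the helper name `single_sample_t` is an assumption for illustration, not something from the slides), the t statistic can be computed directly from those summary numbers:

```python
import math

def single_sample_t(sample_mean, population_mean, s, n):
    """t statistic from summary statistics: (M - mu) / (s / sqrt(N))."""
    standard_error = s / math.sqrt(n)
    return (sample_mean - population_mean) / standard_error

# Halloween spending: M = 85, mu = 81, s = 20, N = 16
t_halloween = single_sample_t(85, 81, 20, 16)

# Lightbulb life: M = 1233, mu = 1000, s = 232.06, N = 30
t_lightbulb = single_sample_t(1233, 1000, 232.06, 30)

print(round(t_halloween, 2), round(t_lightbulb, 2))
```

Each value would then be compared with the cutoff from the t table for df = N - 1 (15 and 29 respectively).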
Difference Between Calculating
z Statistic and t Statistic
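z Statistic
• Standard Deviation of the Population: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
• Standard Error of the Mean: $\sigma_M = \frac{\sigma}{\sqrt{N}}$
• z Statistic: $z = \frac{M - \mu_M}{\sigma_M}$
t Statistic
• Standard Deviation of a Sample (estimates the Population Standard Deviation): $s = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$
• Standard Error of a Sample (estimates the Standard Error of the Population, $\sigma_M$): $s_M = \frac{s}{\sqrt{N}}$
• t Statistic for Single-Sample t Test: $t = \frac{M - \mu_M}{s_M}$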
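The two calculations differ only in where the standard deviation comes from. A minimal sketch, assuming a small made-up sample of apple weights (the five values are hypothetical; only the population mean of 112 grams and SD of 12 grams come from the farmer example):

```python
import math

def z_statistic(sample, mu, sigma):
    """z: the population SD (sigma) is known."""
    n = len(sample)
    m = sum(sample) / n
    sigma_m = sigma / math.sqrt(n)          # standard error
    return (m - mu) / sigma_m

def t_statistic(sample, mu):
    """t: the population SD is estimated from the sample using N - 1."""
    n = len(sample)
    m = sum(sample) / n
    s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
    s_m = s / math.sqrt(n)                  # estimated standard error
    return (m - mu) / s_m

# Hypothetical apple weights in grams (illustrative values only).
weights = [110, 118, 105, 121, 116]
z = z_statistic(weights, mu=112, sigma=12)
t = t_statistic(weights, mu=112)
print(round(z, 3), round(t, 3))
```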
Estimating Population from a
Sample
• Main difference between t Tests and z score:
– use the standard deviation of the sample to estimate the
standard deviation of the population.
• How? Subtract 1 from sample size! (called degrees of
freedom)
– Population: $SD = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
– Sample: $s = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$
Standard Deviation of a Sample: Estimates the Population Standard Deviation
• Use degrees of freedom (df) in the t distribution chart
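The effect of subtracting 1 can be seen with Python's standard `statistics` module, where `pstdev` divides by N and `stdev` divides by N - 1. This is only an illustration; the scores below are hypothetical.

```python
import statistics

# Hypothetical scores (illustrative values only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

sd_n = statistics.pstdev(scores)          # divides by N
sd_n_minus_1 = statistics.stdev(scores)   # divides by N - 1
df = len(scores) - 1                      # degrees of freedom

print(sd_n, round(sd_n_minus_1, 3), df)
```

Dividing by N - 1 always gives the slightly larger value, and the gap narrows as the sample grows.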
t Distribution
Table
Example of Single Sample t Test
• The mean emission of all engines of a new design needs to be below
20ppm if the design is to meet new emission requirements. Ten engines are
manufactured for testing purposes, and the emission level of each is
determined. Data:
• 15.6, 16.2, 22.5, 20.5, 16.4, 19.4, 16.6, 17.9, 12.7, 13.9
• Does the data supply sufficient evidence to conclude that this type of engine
meets the new standard, assuming we are willing to risk a Type I error
(false alarm: rejecting the Null when it is true) with a probability = 0.01?
• Step 1: Assumptions: dependent variable is scale, Randomization, Normal
Distribution
• Step 2: State H0 and H1:
– H0: Emissions are equal to (or greater than) 20ppm;
– H1: Emissions are less than 20ppm (One-Tailed Test)
Example of Single Sample t Test
• Step 3: Determine Characteristics of Sample
– Mean: $M = \frac{\sum X}{N}$
– Standard Deviation of Sample: $s = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$
– Standard Error of Sample: $s_M = \frac{s}{\sqrt{N}}$
• Step 4: Determine Cutoff
– df = N-1 = 10-1 =9
– t statistic cut-off = -2.822
Example of Single Sample t Test
• Step 3: Determine Characteristics of Sample
– Mean: M = 17.17
– Standard Deviation of Sample: $s = \sqrt{\frac{\sum (X - M)^2}{N - 1}} = 2.98$
– Standard Error of Sample: $s_M = \frac{s}{\sqrt{N}} = 0.942$
• Step 4: Determine Cutoff
– df = N-1 = 10-1 = 9
– t statistic cut-off = -2.822
• Step 5: Calculate t Statistic
– $t = \frac{M - \mu_M}{s_M}$
Example of Single Sample t Test
• Mean: M = 17.17
• Standard Deviation of Sample: s = 2.98
• Standard Error of Sample: $s_M$ = 0.942
• Step 5: Calculate t Statistic
– $t = \frac{M - \mu_M}{s_M} = \frac{17.17 - 20}{0.942} = -3.00$
• Step 6: Decide (Draw It)
– t statistic cut-off = -2.822
– t statistic = -3.00
– Decide to reject the Null Hypothesis
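The engine example can be reproduced with a few lines of standard-library Python. This is a sketch of the calculation only; the cutoff of -2.822 is copied from the slides rather than computed, and the variable names are illustrative.

```python
import math
import statistics

emissions = [15.6, 16.2, 22.5, 20.5, 16.4, 19.4, 16.6, 17.9, 12.7, 13.9]
mu = 20.0          # emission limit in ppm (the null-hypothesis mean)
cutoff = -2.822    # one-tailed cutoff for df = 9, p = .01, taken from the slides

n = len(emissions)
m = statistics.mean(emissions)     # sample mean
s = statistics.stdev(emissions)    # sample SD (divides by N - 1)
s_m = s / math.sqrt(n)             # estimated standard error
t = (m - mu) / s_m

# One-tailed test: reject the Null only if t falls below the cutoff.
reject_null = t < cutoff
print(round(m, 2), round(s, 2), round(s_m, 3), round(t, 2), reject_null)
```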
Paired Sample t Test
• The paired-samples test is used with a kind of research design called
repeated measures (aka within-subjects design), commonly used in
before-after designs.
• Comparing a mean of difference scores to a distribution of
means of difference scores
– Population of measures at Time 1 and Time 2
– Population of difference between measures at Time 1 and Time 2
– Population of mean difference between measures at Time 1 and Time 2
– (Whew!)
Paired Sample t Test
Single-Sample
• Single observation from
each participant
• The observation is
independent from that of
the other participants
• Comparing a mean score to
a distribution of mean
scores.
Paired-Sample
• Two observations from each
participant
• The second observation is
dependent upon the first
since they come from the
same person.
• Comparing a mean of
difference scores to a
distribution of means of
difference scores
• (I don’t make this stuff up)
Paired Sample t Test
• A distribution of scores.
• A distribution of differences
between scores.
• Central Limit Theorem Revisited. If you plot the means of
repeated random samples, the plot will approach a
normal distribution. This is true for scores and for
differences between scores.
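The Central Limit Theorem claim can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is a deliberately skewed exponential distribution with mean 1 and SD 1, and the sample size, number of samples, and seed are arbitrary choices, none of which come from the slides.

```python
import random
import statistics

random.seed(95)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 30
NUM_SAMPLES = 2000

# Draw many random samples from a skewed population and keep each sample mean.
sample_means = []
for _ in range(NUM_SAMPLES):
    sample = [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    sample_means.append(statistics.mean(sample))

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = 1.0 / SAMPLE_SIZE ** 0.5   # population SD / sqrt(N)

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal distribution of means this would be about 95%.
within = sum(abs(m - 1.0) <= 1.96 * expected_se for m in sample_means) / NUM_SAMPLES
print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_se, 3), within)
```

Even though individual scores are skewed, the sample means cluster around the population mean with a spread close to the standard error.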
Difference Between Calculating
Single-Sample t and Paired-Sample t Statistic
Single Sample t Statistic
• Standard Deviation of a Sample: $s = \sqrt{\frac{\sum (X - M)^2}{N - 1}} = \sqrt{\frac{SS}{N - 1}}$
• Standard Error of a Sample: $s_M = \frac{s}{\sqrt{N}}$
• t Statistic for Single-Sample t Test: $t = \frac{M - \mu_M}{s_M}$
Paired Sample t Statistic
• Standard Deviation of Sample Differences: $s = \sqrt{\frac{\sum (D_{x-y} - M_{x-y})^2}{N - 1}} = \sqrt{\frac{SS}{N - 1}}$
• Standard Error of Sample Differences: $s_M = \frac{s}{\sqrt{N}}$
• t Statistic for Paired-Sample t Test: $t = \frac{M_{x-y} - \mu_{x-y}}{s_M}$
Paired Sample t Test Example
• We need to know if there is a difference in the salary for the same job in
Boise, ID, and LA, CA. The salary of 6 employees in the 25th percentile in
the two cities is given.

Profession            Boise     Los Angeles
Executive Chef        53,047    62,490
Genetics Counselor    49,958    58,850
Grants Writer         41,974    49,445
Librarian             44,366    52,263
School teacher        40,470    47,674
Social Worker         36,963    43,542

• Six Steps of Hypothesis testing for Paired Sample Test
Paired Sample t Test Example
• We need to know if there is a difference in the salary for the
same job in Boise, ID, and LA, CA.
• Step 1: Define Populations, Comparison Distribution, and Assumptions
– Pop. 1: Jobs in Boise
– Pop. 2: Jobs in LA
– The comparison distribution will be a distribution of mean differences. It
will be a paired-samples test because every job sampled contributes
two scores, one in each condition.
– Assumptions: the dependent variable is scale; we do not know if the
distribution is normal, so we must proceed with caution; the jobs are not
randomly selected, so we must proceed with caution.
Paired Sample t Test Example
• We need to know if there is a difference in the salary for the same job in Boise, ID,
and LA, CA.
• Step 3: Determine the Characteristics of Comparison Distribution (mean,
standard deviation, standard error)
– M = 7914.333 (size of the mean difference; the X-Y differences in the table are negative because X is Boise and Y is Los Angeles, so the deviations D are taken from -7914.33)
– Sum of Squares (SS) = 5,777,187.333
– $s = \sqrt{\frac{SS}{N - 1}} = \sqrt{\frac{5,777,187.333}{5}} = 1074.91$
– $s_M = \frac{s}{\sqrt{N}} = \frac{1074.91}{\sqrt{6}} = 438.83$

Profession            Boise (X)   Los Angeles (Y)   X-Y       D = (X-Y)-M    D^2
Executive Chef        53,047      62,490            -9,443    -1,528.67      2,336,821.78
Genetics Counselor    49,958      58,850            -8,892      -977.67        955,832.11
Grants Writer         41,974      49,445            -7,471       443.33        196,544.44
Librarian             44,366      52,263            -7,897        17.33            300.44
School teacher        40,470      47,674            -7,204       710.33        504,573.44
Social Worker         36,963      43,542            -6,579     1,335.33      1,783,115.11

Paired Sample t Test Example
• We need to know if there is a difference in the salary for the
same job in Boise, ID, and LA, CA.
• Step 4: Determine Critical Cutoff
• df = N-1 = 6-1 = 5
• Critical t values for 5 df, p < .05, two-tailed, are -2.571 and 2.571
• Step 5: Calculate t Statistic
• $t = \frac{M_{x-y} - \mu_{x-y}}{s_M} = \frac{7914.333 - 0}{438.83} = 18.04$
• Step 6: Decide
• 18.04 is beyond the cutoff of 2.571, so reject the Null Hypothesis
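The paired-samples calculation can be checked with a short standard-library sketch. The differences are taken as Los Angeles minus Boise so that the mean difference is positive, matching the size of M used on the slides; that ordering and the variable names are choices made for this illustration. The critical value 2.571 is copied from the slides.

```python
import math

boise = [53047, 49958, 41974, 44366, 40470, 36963]
los_angeles = [62490, 58850, 49445, 52263, 47674, 43542]

# One difference score per job (LA minus Boise).
differences = [la - b for b, la in zip(boise, los_angeles)]

n = len(differences)
m_diff = sum(differences) / n                       # mean difference
ss = sum((d - m_diff) ** 2 for d in differences)    # sum of squares
s = math.sqrt(ss / (n - 1))                         # SD of the differences
s_m = s / math.sqrt(n)                              # standard error
t = (m_diff - 0) / s_m                              # Null: mean difference is 0

critical_t = 2.571   # two-tailed, df = 5, p < .05, from the slides
reject_null = abs(t) > critical_t
print(round(m_diff, 3), round(ss, 3), round(s_m, 2), round(t, 2), reject_null)
```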
Independent t Test
• Compares the difference between two means of
two independent groups.
• The comparison is of a difference
between means to a distribution of differences
between means.
– Population of measures for Group 1 and Group 2
– Sample means from Group 1 and Group 2
– Population of differences between sample means of
Group 1 and Group 2
Independent t Test
Paired-Sample
• Two observations from each
participant
• The second observation is
dependent upon the first
since they come from the
same person.
• Comparing a mean
difference to a distribution
of mean difference scores
Independent t Test
• Single observation from each
participant from two
independent groups
• The observation from the
second group is independent
from the first since they come
from different subjects.
• Comparing the difference
between two means to a
distribution of differences
between mean scores.
Independent t Test: Steps
Step 1: $s_x^2 = \frac{\sum (X - M_X)^2}{N_x - 1}$ and $s_y^2 = \frac{\sum (Y - M_Y)^2}{N_y - 1}$
Step 2: $df_{total} = df_x + df_y$
Step 3: $s^2_{pooled} = \left(\frac{df_x}{df_{total}}\right) s_x^2 + \left(\frac{df_y}{df_{total}}\right) s_y^2$
Step 4: $s^2_{M_x} = \frac{s^2_{pooled}}{N_x}$ and $s^2_{M_y} = \frac{s^2_{pooled}}{N_y}$
Step 5: $s^2_{Difference} = s^2_{M_x} + s^2_{M_y}$
Step 6: $s_{Difference} = \sqrt{s^2_{Difference}}$
Step 7: $t = \frac{M_X - M_Y}{s_{Difference}}$ (t Statistic for an Independent-Samples t Test)
Confidence interval:
$(M_X - M_Y)_{Lower} = -t(s_{Difference}) + (M_X - M_Y)_{Sample}$
$(M_X - M_Y)_{Upper} = t(s_{Difference}) + (M_X - M_Y)_{Sample}$
Effect size: $d = \frac{M_X - M_Y}{s_{pooled}}$
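The steps on this slide can be followed one by one in code. The sketch below is an illustration only: the function name `independent_t`, the returned dictionary keys, and the two small groups of scores are all hypothetical, chosen so the arithmetic is easy to check by hand.

```python
import math

def independent_t(x, y):
    """Independent-samples t statistic using the pooled-variance steps."""
    n_x, n_y = len(x), len(y)
    m_x, m_y = sum(x) / n_x, sum(y) / n_y

    # Step 1: variance of each sample (divide by N - 1)
    var_x = sum((v - m_x) ** 2 for v in x) / (n_x - 1)
    var_y = sum((v - m_y) ** 2 for v in y) / (n_y - 1)

    # Step 2: degrees of freedom
    df_x, df_y = n_x - 1, n_y - 1
    df_total = df_x + df_y

    # Step 3: pooled variance, weighted by each sample's share of df
    var_pooled = (df_x / df_total) * var_x + (df_y / df_total) * var_y

    # Step 4: variance of each distribution of means
    var_mx = var_pooled / n_x
    var_my = var_pooled / n_y

    # Steps 5 and 6: variance and SD of the distribution of differences
    var_diff = var_mx + var_my
    s_diff = math.sqrt(var_diff)

    # Step 7: t statistic, plus the effect size d from the same slide
    t = (m_x - m_y) / s_diff
    d = (m_x - m_y) / math.sqrt(var_pooled)
    return {"df_total": df_total, "var_pooled": var_pooled,
            "s_diff": s_diff, "t": t, "d": d}

# Hypothetical scores for two independent groups (illustrative values only).
result = independent_t([4, 6, 8], [1, 2, 3, 4, 5])
print(result)
```

With these made-up groups the pooled variance works out to 3.0, which sits between the two sample variances (4.0 and 2.5) and closer to the one from the larger group.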
Independent t Test
• Similar to previous steps
except it takes more time
to calculate the estimate of
the standard error, called
the pooled estimate of the
standard error.
• Must calculate Pooled Variance ($s^2_{pooled}$), a weighted
average of the estimates of
the variance from both
samples.
– $s^2_{M_x} = \frac{s^2_{pooled}}{N_x}$
– $s^2_{M_y} = \frac{s^2_{pooled}}{N_y}$
Single-Sample and Paired-Sample t Tests
• Standard Deviation of a Sample (estimates the Population Standard Deviation): $s = \sqrt{\frac{\sum (X - M)^2}{N - 1}}$
• Standard Error of a Sample (estimates the Sample Error of the Population): $s_M = \frac{s}{\sqrt{N}}$
• Degrees of Freedom for Single-Sample t Test and Paired-Sample t Test: $df = N - 1$
• t Statistic for Single-Sample t Test: $t = \frac{M - \mu_M}{s_M}$
• t Statistic for Paired-Sample t Test: $t = \frac{M_{x-y} - \mu_{x-y}}{s_M}$
Independent-Samples t Test
• Variance for a sample: $s^2 = \frac{\sum (X - M)^2}{N - 1}$
• Degrees of Freedom for Independent-Samples t Test: $df_{total} = df_x + df_y$
• Pooled Variance (like adding together the weighted average of the variance from Variable X and Variable Y): $s^2_{pooled} = \left(\frac{df_x}{df_{total}}\right) s_x^2 + \left(\frac{df_y}{df_{total}}\right) s_y^2$
• Variance for a Distribution of Means for Independent-Samples t Test: $s^2_{M_x} = \frac{s^2_{pooled}}{N_x}$ and $s^2_{M_y} = \frac{s^2_{pooled}}{N_y}$
• Variance for a Distribution of Differences between Means: $s^2_{Difference} = s^2_{M_x} + s^2_{M_y}$
• SD of the Distribution of Differences Between Means: $s_{Difference} = \sqrt{s^2_{Difference}}$
• t Statistic for Independent-Samples t Test: $t = \frac{M_x - M_y}{s_{Difference}}$
[Figure: X1 and X2]