Transcript Document
Statistical Techniques I
EXST7005
The t distribution
The t-test of Hypotheses
The t distribution is used in the same way as the Z distribution, except that it is used where sigma ($\sigma$) is unknown (or where $\bar{Y}$ is used instead of $\mu$ to calculate deviations).
$t_i = (Y_i - \bar{Y})/S$
$t = (\bar{Y} - \mu_0)/S_{\bar{Y}} = (\bar{Y} - \mu_0)/(S/\sqrt{n})$
where
– $S$ = the sample standard deviation (calculated using $\bar{Y}$)
– $S_{\bar{Y}}$ = the sample standard error
The t-test of Hypotheses
(continued)
E(t) = 0
The variance of the t distribution is greater than that of the Z distribution (except where $n \to \infty$), since $S^2$ estimates $\sigma^2$ but is never as good (its reliability is less).
Distribution      Mean   Variance
Z distribution    0      1
t distribution    0      $\geq 1$
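As a supplementary note, the variance of the t distribution has a standard closed form that is not stated on the slides. With $\gamma$ degrees of freedom it can be written as below, which shows why the variance exceeds 1 and approaches 1 as the d.f. grow:

```latex
\mathrm{Var}(t_\gamma) = \frac{\gamma}{\gamma - 2} \quad (\gamma > 2),
\qquad
\lim_{\gamma \to \infty} \frac{\gamma}{\gamma - 2} = 1 = \mathrm{Var}(Z)
```

For example, with 10 d.f. the variance is 10/8 = 1.25.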
CHARACTERISTICS OF THE t
DISTRIBUTION
It is symmetrically distributed about a mean of 0.
t ranges to $\pm\infty$ (i.e. $-\infty \leq t \leq +\infty$).
There is a different t distribution for each degree of freedom (df), since the distribution changes as the degrees of freedom change.
It has a broader spread for smaller df, and narrows (approaching the Z distribution) as df increases.
CHARACTERISTICS OF THE t
DISTRIBUTION (continued)
4) As the df ($\gamma$, gamma) approaches infinity ($\infty$), the t distribution approaches the Z distribution.
For example:
Z (no df associated); middle 95% is between ± 1.96
t with 1 df; middle 95% is between ± 12.706
t with 10 df; middle 95% is between ± 2.228
t with 30 df; middle 95% is between ± 2.042
t with $\infty$ df; middle 95% is between ± 1.96
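The convergence toward 1.96 can be checked numerically. The following is a minimal sketch using only the Python standard library; it is not how the lecture's tables were built (those came from EXCEL). The helper names `t_pdf`, `upper_tail`, and `two_tailed_critical` are hypothetical, and the integration step and bisection bounds are assumptions chosen for illustration.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, step=0.01):
    """P(T >= t) for t >= 0: 0.5 minus Simpson's-rule area over [0, t]."""
    n = max(2, 2 * math.ceil(t / (2 * step)))  # even number of intervals
    h = t / n
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

def two_tailed_critical(alpha, df):
    """Bisection search for t0 such that P(|T| >= t0) = alpha."""
    lo, hi = 0.0, 100.0  # assumed search bounds
    for _ in range(50):
        mid = (lo + hi) / 2
        if 2 * upper_tail(mid, df) > alpha:
            lo = mid  # tails still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2

# 10000 d.f. stands in for "infinite" d.f. here
for df in (1, 10, 30, 10000):
    print(df, round(two_tailed_critical(0.05, df), 3))
```

The printed limits shrink toward the Z value as the d.f. increase, which is the pattern listed above.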
Tables in general
The tables we will use will ALL give the area in the tail ($\alpha$). However, if you examine a number of tables you will find that this is not always true. Even when it is true, some tables will give the value of $\alpha$ as if it were in two tails, and some as if it were in one tail.
For example, we want to conduct a two-tailed Z test at the $\alpha$=0.05 level. We happen to know that Z=1.96.
Tables in general (continued)
If we look up this value in the Z tables we expect to see a value of 0.025, or $\alpha/2$. But many tables would show the probability for 1.96 as 0.975, and some as 0.05.
Why the difference? It just depends on how the tables are presented.
Some of the alternatives are shown below.
Tables in general (continued)
Table gives the cumulative distribution starting at $-\infty$. You want to find the probability corresponding to $1 - \alpha/2$.
[Figure: Z curve on an axis from -4 to 4; the table value, 0.975, is the cumulative area below 1.96, leaving 0.025 in the upper tail.]
Tables in general (continued)
Some tables may also start at zero (0.0) and
give the cumulative area from this point. This
would be less common. The value that leaves
.025 in the upper tail would be 0.475.
Among the tables like ours, that give the area in
the tail, some are called two tailed tables and
some are one tailed tables.
Tables in general (continued)
One tailed table.
[Figure: Z curve on an axis from -4 to 4; the table value, 0.025, is the area in the upper tail beyond 1.96, leaving 0.975 below it.]
Tables in general (continued)
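Two tailed table.
[Figure: Z curve on an axis from -4 to 4; the table value, 0.050, is the total area in both tails ($\alpha/2$ in each), leaving $1 - \alpha$ in the middle.]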
Tables in general (continued)
Why the extra confusion at this point?
The Z tables we used gave the area in one tail. For a two tailed test you doubled it.
All our tables will give the area in the tail.
For the F tables and Chi square tables covered
later, this area will be a single tail as with the Z
tables. This is because these distributions are not
symmetric.
Tables in general (continued)
Traditionally, many t-tables have given the area in TWO TAILS instead of in one tail.
– Most older textbooks have this type of table.
– SAS will also usually give two-tailed values for t-tests.
Our tables will have both two-tailed probabilities (top row) and one-tailed probabilities (bottom row), so you may use either.
Tables in general (continued)
The same patterns are true for many of the computer programs that you may use to get probabilities. For example, in EXCEL:
– If you use the NORMSDIST(1.96) function it returns a value of 0.975, cumulative from $-\infty$.
– If you enter NORMSINV(0.025) it returns -1.96.
– If you enter TINV(0.05,9999) it returns 1.96, so it is two-tailed.
– The TDIST(1.96,9999,1) function allows you to specify 1 or 2 tails in the function call.
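The same cumulative-versus-tail distinction appears in other software. As an illustration outside the lecture's EXCEL examples, here is a minimal sketch using Python's standard `statistics.NormalDist`, whose `cdf` is cumulative from negative infinity. The variable names are chosen here for illustration only.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

cumulative = z.cdf(1.96)            # area from -infinity up to 1.96
lower_quantile = z.inv_cdf(0.025)   # Z value with 0.025 of the area below it
one_tail = 1 - cumulative           # area in the upper tail beyond 1.96
two_tails = 2 * one_tail            # area in both tails beyond +/-1.96

print(round(cumulative, 3), round(lower_quantile, 2),
      round(one_tail, 3), round(two_tails, 3))
```

Because the function is cumulative, the tail areas have to be derived by subtraction, just as with a cumulative printed table.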
The T TABLES
My t-tables are created in EXCEL, but patterned after Steel & Torrie, 1980, pg. 577.
df ($\gamma$) is given on the left side.
The probability of randomly selecting a larger value of t is given at the top (and bottom) of the page.
$P(t \geq t_0)$ given at the bottom
$P(|t| \geq t_0)$ given at the top
The T TABLES (continued)
Each row represents a different t distribution
(with different df).
The whole Z table was a single distribution
The t table has many different distributions.
Only the POSITIVE side of the table is given, but as with the Z distribution, the t distribution is symmetric.
The T TABLES (continued)
The Z table had many probabilities,
corresponding to Z values of 0.00, 0.01, 0.02,
0.03, etc. About 400 in the tables we used.
They all fit on one page.
If we are going to give many different t distributions on one page, we lose something.
We will only give a few selected probabilities, the ones we are most likely to use.
e.g., 0.10, 0.05, 0.025, 0.01, 0.005.
Partial t-table - 1 or 2 tails?
df         0.100   0.050   0.025   0.010   0.005
1          3.078   6.314   12.71   31.82   63.656
2          1.886   2.920   4.303   6.965   9.925
3          1.638   2.353   3.182   4.541   5.841
4          1.533   2.132   2.776   3.747   4.604
5          1.476   2.015   2.571   3.365   4.032
6          1.440   1.943   2.447   3.143   3.707
7          1.415   1.895   2.365   2.998   3.499
8          1.397   1.860   2.306   2.896   3.355
9          1.383   1.833   2.262   2.821   3.250
10         1.372   1.812   2.228   2.764   3.169
$\infty$   1.282   1.645   1.960   2.326   2.576
Partial t-table - 1 or 2 tails?
df         0.200   0.100   0.050   0.020   0.010
1          3.078   6.314   12.71   31.82   63.656
2          1.886   2.920   4.303   6.965   9.925
3          1.638   2.353   3.182   4.541   5.841
4          1.533   2.132   2.776   3.747   4.604
5          1.476   2.015   2.571   3.365   4.032
6          1.440   1.943   2.447   3.143   3.707
7          1.415   1.895   2.365   2.998   3.499
8          1.397   1.860   2.306   2.896   3.355
9          1.383   1.833   2.262   2.821   3.250
10         1.372   1.812   2.228   2.764   3.169
$\infty$   1.282   1.645   1.960   2.326   2.576
Our t-tables
Selected d.f. on the left side.
The table stabilizes fairly quickly. Many tables do not go over about d.f. = 30. The Z tables give a good approximation for larger d.f.
Our tables will give d.f. as follows down the leftmost column of the table,
– 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 32, 34, 36, 38, 40, 45, 50, 75, 100,
Our t-tables
Selected probabilities.
In the topmost row of the table selected probabilities will be given as $\alpha$ for a TWO TAILED TEST.
In the bottommost row of the table selected probabilities will be given as $\alpha$ for a ONE TAILED TEST.
– Probabilities in our tables are,
Two tailed (top):     0.50 0.40 0.30 0.20 0.10 0.050 0.02 0.010 0.002 0.0010
One tailed (bottom):  0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005
The T TABLES (continued)
HELPFUL HINT: Don't try to memorize "two tails top, one tail bottom", just recall the characteristics of the distribution when df = $\infty$ and t = 1.96. This leaves 5% in both tails and 2.5% in one tail.
So look at any t-table and look to see what probability corresponds to df = $\infty$ and t = 1.96. If the value is 0.025, it is the area in one tail. If it is 0.050 it is a two tailed table. If the area is 0.975
...
The T TABLES (continued)
This trick of recalling 1.96 also works for different Z tables.
The tables we use give the area in the tail of the distribution; Z=1.96 corresponds to a probability of 0.025.
Some Z tables give the cumulative area under the curve starting at $-\infty$; the probability at Z=1.96 would be 0.975.
Other Z tables give the cumulative area starting at 0; the probability at Z=1.96 would be 0.475.
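The 1.96 trick can be written out as a small lookup. This is only an illustrative sketch: `identify_table` is a hypothetical helper, and the matching tolerance is an assumption. It uses the standard library's `NormalDist` to compute what each kind of table would show at 1.96.

```python
from statistics import NormalDist

def identify_table(value_at_1_96, tol=0.001):
    """Guess a table's convention from the probability it lists at 1.96."""
    z = NormalDist()
    upper = 1 - z.cdf(1.96)  # area beyond 1.96 in one tail
    conventions = {
        "area in one tail": upper,
        "area in two tails": 2 * upper,
        "cumulative from -infinity": z.cdf(1.96),
        "cumulative from 0": z.cdf(1.96) - 0.5,
    }
    for name, expected in conventions.items():
        if abs(value_at_1_96 - expected) < tol:
            return name
    return "unknown convention"

for value in (0.025, 0.050, 0.975, 0.475):
    print(value, "->", identify_table(value))
```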
Working with our t-tables
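Example 1. Let d.f. = $\gamma$ = 10
$H_0: \mu = \mu_0$ versus $H_1: \mu \neq \mu_0$
$\alpha$ = 0.05
$P(|t| \geq t_0) = 0.05$; $2P(t \geq t_0) = 0.05$; $P(t \geq t_0) = 0.025$
(Probabilities at the top of the table)
$t_0$ = 2.228
[Figure: t distribution curve on an axis from -4 to 4.]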
Working with our t-tables
(continued)
Example 2. Let d.f. = $\gamma$ = 10
$H_0: \mu = \mu_0$ versus $H_1: \mu > \mu_0$
$\alpha$ = 0.05
$P(t \geq t_0) = 0.05$ (probabilities at the bottom of the table)
$t_0$ = 1.812
[Figure: t distribution curve on an axis from -4 to 4.]
Working with our t-tables
(continued)
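Look up the following values.
1. Find the t value for $H_1: \mu \neq \mu_0$, $\alpha$=0.050, d.f.=$\infty$. Answer: 1.960
2. Find the t value for $H_1: \mu > \mu_0$, $\alpha$=0.025, d.f.=$\infty$. Answer: 1.960
3. Find the t value for $H_1: \mu \neq \mu_0$, $\alpha$=0.010, d.f.=12. Answer: 3.055
4. Find the t value for $H_1: \mu > \mu_0$, $\alpha$=0.025, d.f.=22. Answer: 2.074
5. Find the t value for $H_1: \mu \neq \mu_0$, $\alpha$=0.200, d.f.=35. Answer: 1.306+
6. Find the t value for $H_1: \mu \neq \mu_0$, $\alpha$=0.002, d.f.=5. Answer: 5.894
7. Find the t value for $H_1: \mu < \mu_0$, $\alpha$=0.100, d.f.=8. Answer: -1.397
8. Find the t value for $H_1: \mu < \mu_0$, $\alpha$=0.010, d.f.=75. Answer: -2.377
9. Find the P value for t=-1.740, $H_1: \mu < \mu_0$, d.f.=17. Answer: 0.050
10. Find the P value for t=4.587, $H_1: \mu \neq \mu_0$, d.f.=10. Answer: 0.001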
t-test of Hypothesis
We want to determine if a new drug has an effect on blood pressure of rhesus monkeys before and after treatment. We are looking for a net change in pressure, either up or down (two tailed test).
We obtain a random sample of 10 individuals.
Note: n = 10, but d.f. = $\gamma$ = 9
1) $H_0: \mu = \mu_0$
2) $H_1: \mu \neq \mu_0$
t-test of Hypothesis (continued)
3) Assume:
Independence (randomly selected sample)
CHANGE in blood pressure is normally distributed
4) $\alpha$ = 0.01
t-test of Hypothesis (continued)
5) Obtain values from the sample of 10 individuals (n = 10). The values for change in blood pressure were:
0, 4, -3, 2, 0, 1, -4, 5, -1, 4
$\sum Y_i = 8$; $\sum Y_i^2 = 88$; $\bar{Y} = \sum Y_i / n = 8/10 = 0.8$
$S^2 = \left(\sum Y_i^2 - (\sum Y_i)^2/n\right)/(n-1) = (88 - 6.4)/9 = 9.067$
$S = \sqrt{9.067} = 3.011$
$S_{\bar{Y}} = S/\sqrt{n} = 3.011/\sqrt{10} = 0.952$
$t = (\bar{Y} - \mu_0)/S_{\bar{Y}} = (0.8 - 0)/0.952 = 0.840$ with 9 d.f.
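The hand calculation in step 5 can be reproduced with a short script. This is a minimal sketch using only the Python standard library, with variable names chosen here for illustration; `statistics.variance` uses the n - 1 divisor, matching the formula for $S^2$.

```python
import math
import statistics

changes = [0, 4, -3, 2, 0, 1, -4, 5, -1, 4]  # blood pressure changes
mu0 = 0                                      # hypothesized mean change

n = len(changes)
mean = statistics.mean(changes)          # Y-bar
variance = statistics.variance(changes)  # S^2, with n - 1 in the divisor
std_dev = math.sqrt(variance)            # S
std_error = std_dev / math.sqrt(n)       # S / sqrt(n)
t_stat = (mean - mu0) / std_error
df = n - 1

print(f"mean={mean:.3f} S2={variance:.3f} S={std_dev:.3f} "
      f"SE={std_error:.3f} t={t_stat:.3f} df={df}")
```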
t-test of Hypothesis (continued)
6) Get the critical limit and compare to the test statistic.
Critical value of t for
– a 2 tailed test
– with 9 d.f. (n = 10, but d.f. = $\gamma$ = 9)
– $\alpha$ = 0.01
– $P(|t| \geq t_0) = 0.01$; $2P(t \geq t_0) = 0.01$; $P(t \geq t_0) = 0.005$
– $t_0$ = 3.250
test statistic $t = (\bar{Y} - \mu_0)/S_{\bar{Y}}$ = 0.840 (9 d.f.)
The area leaving 0.005 in each tail is almost too
small to show on our usual graphs
The test statistic is clearly in the region of "acceptance", so we fail to reject the H0.
7) Conclude that the new drug does not affect the blood pressure of rhesus monkeys.
Is there an error? Maybe a Type II error.
t-test of H0 : Example 2
A company manufacturing environmental
monitoring equipment claims that their
thermograph (a machine that records
temperature) requires (on the average) no more
than 0.8 amps to operate under normal
conditions. We wish to test this claim before
buying their equipment. We want to reject the
equipment if the electricity demand exceeds 0.8
amps.
t-test of H0 : Example 2 (continued)
1) $H_0: \mu = \mu_0$
2) $H_1: \mu > \mu_0$, where $\mu_0$ = 0.8
3) Assume (1) independence and (2) a normal
distribution of amp values, or at least of the
mean that we will test.
4) $\alpha$ = 0.05
t-test of H0 : Example 2 (continued)
5) Draw a sample. We have 16 machines for testing. The values for amp readings were not recorded. Summary statistics are given below:
$\bar{Y}$ = 0.96
$S$ = 0.32
$S_{\bar{Y}} = S/\sqrt{n} = 0.32/\sqrt{16} = 0.08$
$t = (\bar{Y} - \mu_0)/S_{\bar{Y}} = (0.96 - 0.8)/0.08 = 2$ with 15 d.f.
t-test of H0 : Example 2 (continued)
6) Get the critical limit and compare to the test statistic.
Critical value of t for
a 1 tailed test (see H1:)
with 15 d.f. (n = 16, but d.f. = $\gamma$ = 15)
$\alpha$ = 0.05
$P(t \geq t_0) = 0.05$; $t_0$ = 1.753
t-test of H0 : Example 2 (continued)
6 continued) Get the critical limit and compare to the test statistic.
$t_0$ = 1.753
test statistic $t = (\bar{Y} - \mu_0)/S_{\bar{Y}}$ = 2 (15 d.f.)
[Figure: t distribution curve on an axis from -4 to 4.]
The test statistic falls in the area of rejection.
7) Conclusion: We would conclude that the
machines require more electricity than the
claimed 0.8 amperes.
Of course, there is a possibility of a Type I error.
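When only summary statistics are available, as in this example, the test needs just a few lines. The sketch below uses variable names chosen here for illustration, and takes the critical value 1.753 from the lecture's table rather than computing it.

```python
import math

n = 16              # machines tested
y_bar = 0.96        # sample mean (amps)
s = 0.32            # sample standard deviation
mu0 = 0.8           # claimed maximum mean demand (amps)
t_critical = 1.753  # one tailed, alpha = 0.05, 15 d.f. (from the t-table)

std_error = s / math.sqrt(n)
t_stat = (y_bar - mu0) / std_error

# Upper-tailed test: reject H0 only when t is large and positive
reject_h0 = t_stat > t_critical
print(f"SE={std_error:.2f} t={t_stat:.2f} reject H0: {reject_h0}")
```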
t test with SAS
Recall our test of blood pressure change of
Rhesus monkeys. We can take the values of
blood pressure change, and enter them in SAS
PROC UNIVARIATE.
Values: 0, 4, -3, 2, 0, 1, -4, 5, -1, 4
SAS PROGRAM DATA step
OPTIONS NOCENTER NODATE NONUMBER LS=78 PS=61;
TITLE1 't-tests with SAS PROC UNIVARIATE';
DATA monkeys; INFILE CARDS MISSOVER;
TITLE2 'Analysis of Blood Pressure change in Rhesus Monkeys';
INPUT BPChange;
CARDS; RUN;
t test with SAS (continued)
SAS PROGRAM procedures
PROC PRINT DATA=monkeys; RUN;
PROC UNIVARIATE DATA=monkeys PLOT; VAR BPChange;
TITLE2 'PROC Univariate on Blood Pressure Change'; RUN;
t test with SAS (continued)
SAS PROGRAM output
Moments
N          10          Sum Wgts   10
Mean       0.8         Sum        8
Std Dev    3.011091    Variance   9.066667
Skewness   -0.15751    Kurtosis   -0.95777
USS        88          CSS        81.6
CV         376.3863    Std Mean   0.95219
T:Mean=0   0.840168    Pr>|T|     0.4226
Num ^= 0   8           Num > 0    5
M(Sign)    1           Pr>=|M|    0.7266
Sgn Rank   6.5         Pr>=|S|    0.3984
Notes on SAS PROC Univariate
Note that all values we calculated match the values given by SAS.
Note that the standard error is called the "Std Mean". This is unusual; it is called the "Std Error" in most other SAS procedures.
The test statistic value matches our calculated
value (0.840).
SAS also provides a "Pr>|T| 0.4226"
Notes on SAS PROC Univariate
(continued)
The value provided by SAS is a P value (Pr>|T| = 0.4226).
It indicates that the calculated value of t=0.840 would leave 0.4226 (or 42.26 percent) of the distribution in the tails. Note that it is for the absolute value of t, so it leaves 0.2113 (or 21.13%) in each tail.
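The two-tailed P value can be approximated directly from the t density. The following is a rough sketch, not SAS's algorithm: it integrates the density numerically with Simpson's rule, and the helper names `t_pdf` and `upper_tail` and the step size are assumptions made for illustration.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def upper_tail(t, df, step=0.001):
    """P(T >= t) for t >= 0: 0.5 minus Simpson's-rule area over [0, t]."""
    n = max(2, 2 * math.ceil(t / (2 * step)))  # even number of intervals
    h = t / n
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

t_stat = 0.840168  # T:Mean=0 from the SAS output
df = 9

each_tail = upper_tail(abs(t_stat), df)  # area beyond |t| in one tail
two_tailed_p = 2 * each_tail             # comparable to SAS's Pr>|T|
print(round(each_tail, 4), round(two_tailed_p, 4))
```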
Notes on SAS PROC Univariate
(continued)
The P-value indicates our calculated value would leave 21.13% in each tail; our critical region has only 0.5% in each tail. Clearly we are in the region of "acceptance".
[Figure: t curve marking the lower critical region, the calculated value, and the upper critical region.]
Example 2 with SAS
Testing the thermographs using SAS PROC UNIVARIATE. We didn't have data, so we cannot test with SAS.
A NOTE. SAS automatically tests the mean of the
values in PROC UNIVARIATE against 0. In the
thermograph example our hypothesized value was
0.8, not 0.0.
But from what we know of transformations, we can
subtract 0.8 from each value without changing the
characteristics of the distribution.
Example 2 with SAS (continued)
Another example.
Freund & Wilson (1993) Example 4.2
We receive a shipment of apples that are supposed to be "premium apples", with a diameter of at least 2.5 inches. We will take a sample of 12 apples, and test the hypothesis that the mean size is equal to 2.5 inches, so that the apples qualify as premium apples. If the mean is LESS THAN 2.5 inches, we reject.
Example 2 with SAS (continued)
1) $H_0: \mu = \mu_0$
2) $H_1: \mu < \mu_0$
3) Assume:
Independence (randomly selected sample)
Apple size is normally distributed.
4) $\alpha$ = 0.05
5) Draw a sample. We will take 12 apples, and
let SAS do the calculations.
Example 2 with SAS (continued)
The sample values for the 12 apples are;
2.9, 2.1, 2.4, 2.8, 3.1, 2.8, 2.7, 3.0, 2.4, 3.2, 2.3, 3
As mentioned, SAS automatically tests against zero, and we want to test against 2.5. So, we subtract 2.5 from each value and test against zero. The test should give the same results.
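That subtracting 2.5 leaves the test unchanged can be checked directly. This is a minimal sketch with a hypothetical helper, `t_statistic`. One loud assumption: the last diameter is cut off in the slide text, and the value 3.4 is used here only because it makes the differences sum to the 3.1 reported in the SAS output.

```python
import math
import statistics

def t_statistic(values, mu0):
    """One-sample t statistic: (mean - mu0) / (S / sqrt(n))."""
    n = len(values)
    std_error = statistics.stdev(values) / math.sqrt(n)
    return (statistics.mean(values) - mu0) / std_error

# Last value (3.4) is an assumption; it is truncated in the source
diameters = [2.9, 2.1, 2.4, 2.8, 3.1, 2.8, 2.7, 3.0, 2.4, 3.2, 2.3, 3.4]

t_direct = t_statistic(diameters, 2.5)                       # test against 2.5
t_shifted = t_statistic([d - 2.5 for d in diameters], 0.0)   # test diff against 0

print(round(t_direct, 3), round(t_shifted, 3))
```

Subtracting a constant moves the mean but leaves the standard deviation alone, so both calls give the same t.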
Example 2 with SAS (continued)
SAS Program data step
options ps=61 ls=78 nocenter nodate nonumber;
data apples; infile cards missover;
TITLE1 'Test the diameter of apples against 2.5 inches';
LABEL diam = 'Diameter of the apple';
input diam; diff = diam - 2.5;
cards; run;
SAS Program procedures
proc print data=apples; var diam diff; run;
proc univariate data=apples plot; var diff; run;
Example 2 with SAS (continued)
SAS PROC UNIVARIATE Output
Moments
N          12          Sum Wgts   12
Mean       0.258333    Sum        3.1
Std Dev    0.394181    Variance   0.155379
Skewness   -0.11842    Kurtosis   -0.8353
USS        2.51        CSS        1.709167
CV         152.5863    Std Mean   0.11379
T:Mean=0   2.270258    Pr>|T|     0.0443
Num ^= 0   12          Num > 0    8
M(Sign)    2           Pr>=|M|    0.3877
Sgn Rank   25          Pr>=|S|    0.0493
Example 2 with SAS (continued)
6) Now we want to compare the observed value to the critical value. This case is a little tricky.
We have a one tailed test ($H_1: \mu < \mu_0$)
And we chose $\alpha$ = 0.05
The usual critical limit would be a t value with 11 d.f. This value is -1.796.
SAS gives us a t value of 2.27.
Example 2 with SAS (continued)
Reject? No, it is a positive 2.27, not negative. So we would not reject the hypothesis.
7) Conclude the size of the apples is not significantly below the 2.5 inch diameter we required.
Example 2 with SAS (continued)
I used the t values here, not the SAS provided P values. Why? Because we were doing a one tailed test and the SAS P values are 2 tailed. However, we can use them if we understand them.
[Figure: t curve on an axis from -4 to 4, with the critical limit marked at t=-1.796.]
Example 2 with SAS (continued)
The two tailed P value provided by SAS showed that the area in two tails was 0.0443, so the area in each tail was 0.02215.
[Figure: t curve on an axis from -4 to 4, with the calculated value marked at t=2.27 and the critical limit marked at t=-1.796.]
Example 2 with SAS (continued)
If the values had not been in different tails, we can see that the tail area for the calculated value (0.02215 on each side) is smaller than the 0.05 in our critical region, so it would be well within the region of rejection. So if they had been on the same side, the result would have been significant. Because they were in different tails, the calculated value does not cause rejection of the null hypothesis.
The Null hypothesis
We could not reject the apples as too small. Had we noticed that the mean was greater than 2.5, we would not even have had to conduct the test and do the calculations.
But other hypotheses could have been tested. Maybe "Prime apples" are supposed to have a mean size greater than 2.5. We could reject the apples if we could not prove that the size was greater than 2.5.
The Null hypothesis (continued)
Previously we tested,
1) $H_0: \mu = \mu_0$
2) $H_1: \mu < \mu_0$
But we could have tested
1) $H_0: \mu = \mu_0$
2) $H_1: \mu > \mu_0$
In this case if we set $\alpha$ to a one sided 0.05 we would have rejected the H0 since the tail was 0.0443/2 = 0.02215. We would still have taken the apple shipment.
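The rule for turning a two tailed P value into a one tailed one, under either alternative, can be sketched as a small function. `one_tailed_p` and its `alternative` argument are hypothetical names introduced here for illustration; they are not SAS options.

```python
def one_tailed_p(two_tailed_p, t_stat, alternative):
    """Convert a two tailed P value to a one tailed P value.

    alternative is "less" for H1: mu < mu0 or "greater" for H1: mu > mu0.
    """
    half = two_tailed_p / 2
    if alternative == "less":
        # Only a negative t supports H1: mu < mu0
        return half if t_stat < 0 else 1 - half
    if alternative == "greater":
        # Only a positive t supports H1: mu > mu0
        return half if t_stat > 0 else 1 - half
    raise ValueError("alternative must be 'less' or 'greater'")

# Apple example: SAS reports t = 2.27 with Pr>|T| = 0.0443
p_less = one_tailed_p(0.0443, 2.27, "less")        # tested originally
p_greater = one_tailed_p(0.0443, 2.27, "greater")  # the other alternative
print(round(p_less, 5), round(p_greater, 5))
```

With the same data, the lower tailed test is nowhere near significant while the upper tailed test falls below 0.05, which is the contrast described above.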
The Null hypothesis (continued)
But which is the right test?
It depends on what is important to you. Do you lose your job for sending back apples that were not really too small, or do you lose your job for accepting apples that did not meet the criteria?
The correct alternative depends on what you have to prove (with an $\alpha \times 100$% chance of error) and what is important to you.
One more example in SAS
Test for differences in seed production at two levels on a plant (top and bottom). We have ten vigorous plants bearing Lucerne flowers, each of which has flowers at the top & bottom. We want to test for differences in the number of seeds for the average of two pods in each position.
For each plant take two pods from the top and get an average, and two from the bottom for an average.
One more example in SAS
(continued)
Calculate the difference between the mean for the top and mean for the bottom and test to see if the difference is zero (i.e. no difference).
1) $H_0: \mu = \mu_0$
2) $H_1: \mu \neq \mu_0$
3) Assume:
Independence (randomly selected sample)
Number per pod is normally distributed.
4) $\alpha$ = 0.05
One more example in SAS
(continued)
5) Take a sample. We have 10 plants, so n = 10 and d.f. = 9.
TOP    BOTTOM
4.0    4.4
5.2    3.7
5.7    4.7
4.2    2.8
4.8    4.2
3.9    4.3
4.1    3.5
3.0    3.7
4.6    3.1
6.8    1.9
One more example in SAS
(continued)
SAS PROGRAM DATA step
options ps=61 ls=78 nocenter nodate nonumber;
data flowers; infile cards missover;
TITLE1 'Seed production for top and bottom flowers';
LABEL top = 'Flowers from the top of the plant';
LABEL bottom = 'Flowers from the bottom of the plant';
LABEL diff = 'Difference between top and bottom';
input top bottom;
diff = top - bottom;
cards; run;
SAS PROGRAM procedures
proc print data=flowers; var top bottom diff; run;
proc univariate data=flowers plot; var diff; run;
One more example in SAS
(continued)
SAS Output
OBS    TOP    BOTTOM    DIFF
1      4.0    4.4       -0.4
2      5.2    3.7        1.5
3      5.7    4.7        1.0
4      4.2    2.8        1.4
5      4.8    4.2        0.6
6      3.9    4.3       -0.4
7      4.1    3.5        0.6
8      3.0    3.7       -0.7
9      4.6    3.1        1.5
10     6.8    1.9        4.9
One more example in SAS
(continued)
SAS Output
Moments
N          10          Sum Wgts   10
Mean       1           Sum        10
Std Dev    1.598611    Variance   2.555556
Skewness   1.669385    Kurtosis   3.934593
USS        33          CSS        23
CV         159.8611    Std Mean   0.505525
T:Mean=0   1.978141    Pr>|T|     0.0793
Num ^= 0   10          Num > 0    7
M(Sign)    2           Pr>=|M|    0.3438
Sgn Rank   19.5        Pr>=|S|    0.0469
One more example in SAS
(continued)
6) Compare the test statistic to the critical limits.
With 9 d.f. we know that our critical limit for a two tailed test at $\alpha$ = 0.05 would be t=2.262.
One more example in SAS
(continued)
SAS reports;
Mean = 1
t = 1.978
P(>|t|) = 0.0793. This area leaves almost 4% in
each tail (0.0793/2=0.03965) and our critical region
includes only 2.5% in each tail. Therefore, the
observed value falls in the area of "acceptance".
We fail to reject the null hypothesis.
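The same analysis, taking the difference within each plant and then running a one-sample t-test of the differences against zero, can be sketched in a few lines. The variable names are chosen here for illustration, and the critical limit 2.262 is taken from the lecture rather than computed.

```python
import math
import statistics

top = [4.0, 5.2, 5.7, 4.2, 4.8, 3.9, 4.1, 3.0, 4.6, 6.8]
bottom = [4.4, 3.7, 4.7, 2.8, 4.2, 4.3, 3.5, 3.7, 3.1, 1.9]

# One difference per plant, as in the SAS statement diff = top - bottom
diffs = [t - b for t, b in zip(top, bottom)]

n = len(diffs)
mean_diff = statistics.mean(diffs)
std_error = statistics.stdev(diffs) / math.sqrt(n)
t_stat = mean_diff / std_error  # hypothesized mean difference is 0

t_critical = 2.262              # two tailed, alpha = 0.05, 9 d.f.
reject_h0 = abs(t_stat) > t_critical
print(f"mean={mean_diff:.3f} SE={std_error:.4f} t={t_stat:.3f} "
      f"reject H0: {reject_h0}")
```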
One more example in SAS
(continued)
7) Conclude the number of seeds does not differ between the top and bottom of the plant.
Of course, we may have made a Type II error.
The SAS program, log and output for all examples are available online as both raw text (ASCII code) and as HTML.
Summary
The t distribution is similar to the Z distribution, but it is used where the value of $\sigma^2$ is not known, and is estimated from the sample. This is a much more common case.
The t distribution is very similar to the Z distribution in that it is a bell shaped curve, centered on zero, and ranges to $\pm\infty$.
Summary (continued)
It can also be used with observations or samples; the formulas are
$t_i = (Y_i - \bar{Y})/S$
$t = (\bar{Y} - \mu_0)/S_{\bar{Y}} = (\bar{Y} - \mu_0)/(S/\sqrt{n})$
Not all tables or computer algorithms give the area in the tail. Some give the cumulative frequency starting at $-\infty$.
Summary (continued)
In the t-tables
Each row represents a different t distribution (with different df).
The t table has many different distributions. Only the POSITIVE side of the table is given, since the t distribution is symmetric.
Only selected probabilities were provided in the t-table.
The t-test of hypothesis was very similar to the test we had done before.
Summary (continued)
SAS PROC UNIVARIATE will do a t-test, but it only tests against a hypothesized value of zero.