Ttests - Faculty Web Pages

Programming in R
Data Analysis Module: Bivariate Testing
Data Analysis Module
 Basic Descriptive Statistics and Confidence Intervals
 Basic Visualizations
 Histograms
 Pie Charts
 Bar Charts
 Scatterplots
 Ttests/Bivariate testing
 One Sample
 Paired
 Independent Two Sample
 ANOVA
 Chi Square and Odds
 Regression Basics
Data Analysis Module: Bivariate Testing
The first part of these notes will
address Ttest basics.
The second part of these notes will
address z test (or proportion testing)
basics.
Data Analysis Module: Bivariate Testing
The term “Ttest” comes from the application of the t-distribution to evaluate a hypothesis. The t-distribution is
used when the population std is unknown and the sample size is too small (less than 30) to use
s/SQRT(n) as a substitute for the population standard error.
In practice, even hypothesis tests with sample sizes greater
than 30, which utilize the normal distribution, are commonly
referred to as “ttests”.
Note: a “t-statistic” and a “z-score” are conceptually similar
– both convert measurements into standardized scores
which follow a roughly normal distribution.
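The similarity between the two statistics can be checked numerically. The sketch below is only an illustration: it assumes a two-sided test at alpha = 0.05 (a value picked for the example, not taken from these notes) and compares the t critical value with the z critical value as the sample size grows.

```r
# Two-sided critical values at alpha = 0.05 (alpha is an assumption for this sketch)
alpha <- 0.05
n <- c(5, 9, 30, 100, 1000)                 # sample sizes to compare

t_crit <- qt(1 - alpha / 2, df = n - 1)     # t critical value for each n
z_crit <- qnorm(1 - alpha / 2)              # normal critical value (same for all n)

# The t critical value shrinks toward the z critical value as n grows
comparison <- data.frame(n = n,
                         t_crit = round(t_crit, 3),
                         z_crit = round(z_crit, 3))
print(comparison)
```

For small samples the t critical value is noticeably larger, which is why the t-distribution gives wider intervals when n is small.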
Data Analysis Module: Bivariate Testing
A side note of interest from Wikipedia:
The t-statistic was introduced in 1908 by William Sealy Gosset,
a chemist working for the Guinness Brewery in Dublin, Ireland.
Gosset had been hired due to Claude Guinness's innovative
policy of recruiting the best graduates from Oxford and
Cambridge to apply biochemistry and statistics to Guinness'
industrial processes. Gosset devised the t-test as a way to
cheaply monitor the quality of beer. He published the test in
Biometrika in 1908, but was forced to use a pen name by his
employer, who regarded the fact that they were using
statistics as a trade secret.
Data Analysis Module: Bivariate Testing
Ttests take three forms:
1. One Sample Ttest - compares the mean of the sample to
a given number.
• e.g. Is the average monthly revenue per customer who
switches greater than $50?
Formal Hypothesis Statement examples:
H0: μ ≤ $50
H1: μ > $50
H0: μ = $50
H1: μ ≠ $50
Data Analysis Module: Bivariate Testing
Example:
After a massive outbreak of salmonella, the CDC
determined that the source was a particular
manufacturer of ice cream. The CDC sampled 9
production runs of the manufacturer, with the following
results (all in MPN/g):
.593 .142 .329 .691 .231 .793 .519 .392 .418
Use these data to determine if the average level of salmonella is
greater than .3 MPN/g, the level considered to be
dangerous.
Data Analysis Module: Bivariate Testing
First, Identify the Hypothesis Statements,
including the Type I and Type II
errors…and your assignment of alpha.
Then, do the computation by hand…
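The hand computation for the salmonella example can be mirrored step by step in R. This is a minimal sketch that assumes the alternative hypothesis is "mean greater than 0.3", as the example asks; the variable names are chosen here for illustration.

```r
# Salmonella levels (MPN/g) from the 9 production runs in the example
x   <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)
mu0 <- 0.3                           # null value: the danger threshold

n      <- length(x)                  # sample size
xbar   <- mean(x)                    # sample mean
s      <- sd(x)                      # sample standard deviation
se     <- s / sqrt(n)                # standard error, s/SQRT(n)
t_stat <- (xbar - mu0) / se          # t-statistic

# One-sided p-value, assuming H1: mean > 0.3
p_val <- pt(t_stat, df = n - 1, lower.tail = FALSE)

cat("mean =", round(xbar, 4),
    " t =", round(t_stat, 4),
    " p-value =", round(p_val, 4), "\n")
```

The p-value is then compared with whichever alpha was assigned in the first step.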
Data Analysis Module: Bivariate Testing
#here, the syntax is:
t.test(vector to be analyzed, vector to be
analyzed*, mu = null value, alternative = alternative hypothesis)
* the second vector is optional; add paired = TRUE for a paired ttest
With a single vector, a one sample t test is the default
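Applied to the salmonella example, the call might look like the sketch below. It assumes the one-sided alternative "greater" and a null value of 0.3, both taken from the wording of the example; the object names are illustrative.

```r
# Salmonella levels (MPN/g) from the 9 production runs in the example
x <- c(0.593, 0.142, 0.329, 0.691, 0.231, 0.793, 0.519, 0.392, 0.418)

# One sample t test of H0: mean <= 0.3 against H1: mean > 0.3
result <- t.test(x, mu = 0.3, alternative = "greater")
print(result)

# Individual pieces can be pulled out of the returned object
t_stat <- unname(result$statistic)   # t-statistic
df     <- unname(result$parameter)   # degrees of freedom
p_val  <- result$p.value             # one-sided p-value
```

If mu is left out, t.test tests against 0, and if alternative is left out, the test is two-sided.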
Data Analysis Module: Bivariate Testing
2. Two Sample Ttest - compares the mean of the first
sample minus the mean of the second sample to a
given number.
• e.g. Is there a difference in the production output
of two facilities?
Formal Hypothesis Statement examples:
H0: μa - μb = 0
H1: μa - μb ≠ 0
Data Analysis Module: Bivariate Testing
When dealing with two sample or paired ttests, it is
important to check the following assumptions:
1. The samples are independent
2. The samples have approximately equal variance
3. The distribution of each sample is approximately normal
Note: if the assumptions are violated and/or if the sample
sizes are very small, we first try a transformation (e.g.,
take the log or the square root). If this does not work,
then we engage in non-parametric analysis: the Wilcoxon
Rank Sum test (independent samples) or the Wilcoxon
Signed Rank test (paired samples).
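One possible way to check assumptions 2 and 3 in R is sketched below. The notes do not prescribe specific checks, so the F test (var.test) and the Shapiro-Wilk test (shapiro.test) are assumptions of this sketch, and the built-in mtcars data set stands in for real project data. Independence (assumption 1) comes from the study design and cannot be checked this way.

```r
# mtcars ships with R; am (0 = automatic, 1 = manual) acts as the two level factor
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Assumption 2: F test comparing the two sample variances
var_check <- var.test(auto, manual)

# Assumption 3: Shapiro-Wilk normality test on each sample
norm_auto   <- shapiro.test(auto)
norm_manual <- shapiro.test(manual)

# Non-parametric fallback for independent samples: Wilcoxon Rank Sum test.
# exact = FALSE requests the normal approximation because the data contain ties.
rank_sum <- wilcox.test(auto, manual, exact = FALSE)

print(c(equal_var_p   = var_check$p.value,
        normal_auto_p = norm_auto$p.value,
        normal_man_p  = norm_manual$p.value,
        rank_sum_p    = rank_sum$p.value))
```

Small p-values in the first three checks suggest the corresponding assumption is questionable; for paired data, wilcox.test with paired = TRUE gives the Signed Rank test instead.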
Data Analysis Module: Bivariate Testing
# here the syntax is:
t.test(vector to be tested ~ two level factor, data =
data, var.equal = FALSE*)
plot(vector to be tested ~ two level factor,
data = data)
*If the variances are similar, this would be set to
TRUE
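As a runnable illustration of this syntax, the sketch below uses the built-in mtcars data set as a stand-in (an assumption of this example, not data from these notes), with mpg as the vector to be tested and am as the two level factor.

```r
# var.equal = FALSE (the default) gives the Welch test, which does not pool variances
welch <- t.test(mpg ~ am, data = mtcars, var.equal = FALSE)

# var.equal = TRUE pools the variances; appropriate only if they are similar
pooled <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

print(welch)

# The two versions differ in their degrees of freedom and, slightly, in the p-value
print(c(welch_df  = unname(welch$parameter),
        pooled_df = unname(pooled$parameter),
        welch_p   = welch$p.value,
        pooled_p  = pooled$p.value))
```

The pooled test uses n1 + n2 - 2 degrees of freedom, while the Welch test adjusts the degrees of freedom downward when the variances differ.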
Data Analysis Module: Bivariate Testing
3. Paired Sample Ttest - compares the mean of the
differences in the observations to a given number.
e.g. Is there a difference in the production output
of a facility after the implementation of new
procedures?
Formal Hypothesis Statement example:
H0: diff=0
H1: diff  0
Data Analysis Module: Bivariate Testing
#here, the syntax is:
t.test(vector to be analyzed, vector to be
analyzed, paired = TRUE, alternative = "greater"*)
*the alternative hypothesis could also be "less".
The default is "two.sided" (not equal).
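A paired test can be tried on the built-in sleep data set, used here only as a stand-in (an assumption of this sketch): the same 10 patients were measured under two drugs, so the observations pair up by patient. The sketch also shows that a paired ttest is the same as a one sample ttest on the differences.

```r
# sleep ships with R: extra hours of sleep for 10 patients under each of two drugs
drug1 <- sleep$extra[sleep$group == 1]
drug2 <- sleep$extra[sleep$group == 2]   # same patients, in the same order

# Paired ttest of H1: mean(drug2 - drug1) > 0
paired_result <- t.test(drug2, drug1, paired = TRUE, alternative = "greater")
print(paired_result)

# Equivalent one sample ttest on the vector of differences
diffs      <- drug2 - drug1
one_sample <- t.test(diffs, mu = 0, alternative = "greater")

print(c(paired_t     = unname(paired_result$statistic),
        one_sample_t = unname(one_sample$statistic)))
```

The order of the two vectors matters for a one-sided alternative: "greater" refers to the first vector minus the second.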
Data Analysis Module: Bivariate Testing
Z testing…or proportion based
testing…
Data Analysis Module: Bivariate Testing
The testing formula for a one sample proportion is a simple z
calculation:
Z = (sample estimate – Null value)/Null Standard Error
For a proportion, this would be:
Z = (p - p0)/SQRT(p0(1 - p0)/n)
where p is the sample proportion, p0 is the null value and n is the sample size.
Data Analysis Module: Bivariate Testing
Example of a one sample proportion test:
If 30% of cars on a street are found to be speeding, the city
will install “traffic calming” devices.
John used his radar gun to measure the speeds of 400 cars
on his street. He found that 32% were speeding. Will John
get “traffic calming” devices on his street?
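The z calculation for John's street can be sketched directly from the formula. The one-sided alternative (more than 30% speeding) is an assumption about how the city's rule translates into hypotheses, and the variable names are illustrative.

```r
p_hat <- 0.32    # sample proportion of speeding cars
p0    <- 0.30    # null value from the city's rule
n     <- 400     # cars measured

se0 <- sqrt(p0 * (1 - p0) / n)    # null standard error
z   <- (p_hat - p0) / se0         # z-statistic

# One-sided p-value, assuming H1: p > 0.30
p_val <- pnorm(z, lower.tail = FALSE)

cat("z =", round(z, 4), " p-value =", round(p_val, 4), "\n")
```

Note that the standard error uses the null value p0, not the sample proportion.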
Data Analysis Module: Bivariate Testing
object1 <- table(factor)
sum(object1)
prop.test(object1[factor level], totaln, correct=FALSE, p =
null hypothesis)
Example:
loveatfirst.count <- table(PSU$atfirst)
prop.test(loveatfirst.count[3],227, correct=FALSE, p=0.45)
Note that the “3” indicates the third level of the factor –
which is “Yes”.
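The same function can be tried without the PSU data by using the counts from the speeding example (32% of 400 cars is 128 cars). This is a sketch under the assumption of a one-sided alternative; it also checks that the X-squared value reported by prop.test is the square of the z-statistic from the formula.

```r
speeding <- 128   # 32% of the 400 cars measured
total    <- 400

# One sample proportion test of H0: p <= 0.30 against H1: p > 0.30
result <- prop.test(speeding, total, p = 0.30,
                    alternative = "greater", correct = FALSE)
print(result)

# z-statistic from the formula, for comparison with prop.test
z <- (speeding / total - 0.30) / sqrt(0.30 * 0.70 / total)

print(c(z_squared = z^2,
        x_squared = unname(result$statistic),
        p_value   = result$p.value))
```

Setting correct = FALSE turns off the continuity correction, which is what makes the result match the simple z calculation.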
Data Analysis Module: Bivariate Testing
Answer the following:
1. Identify the Null and Alternative Hypotheses
2. Identify the Type I and Type II errors, including the
implications
3. What is an appropriate alpha value?
4. What is the associated p-value?
5. What is your conclusion?
Data Analysis Module: Bivariate Testing
2. Two Sample Test - compares the proportion of the first
sample minus the proportion of the second sample to a
given number. It is of common interest to test if two
population proportions are equal.
• e.g. Is there a difference in the percentage of
students who pass a standardized test between
those who took a prep course and those who did
not?
Formal Hypothesis Statement examples:
H0: pa - pb = 0
H1: pa - pb ≠ 0
H0: pa - pb ≤ 0
H1: pa - pb > 0
Data Analysis Module: Bivariate Testing
Before you undertake a two sample test, there are a few
things to be determined:
1. The two samples must be independent
2. The number of individuals with each trait of interest and
the number without the trait of interest must be at least
10 in each sample.
Data Analysis Module: Bivariate Testing
#here, the code is pretty easy…just make the 2x2 table and then
apply the prop.test function:
FactorVar1.by.FactorVar2<-table(FactorVar1,FactorVar2)
prop.test(FactorVar1.by.FactorVar2, correct=FALSE)
Example:
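# Collapse WtFeel into "Right" vs "Wrong"; any other value becomes ""
PSU$Wt <- ifelse(PSU$WtFeel=="RightWt","Right",
ifelse(PSU$WtFeel=="OverWt"|PSU$WtFeel=="UnderWt",
"Wrong",""))
# Drop the rows left unclassified (safe even when there are none)
PSU <- PSU[!(PSU$Wt %in% ""),]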
sex.by.wt <- table(PSU$Sex, PSU$Wt)
prop.test(sex.by.wt, correct=FALSE)
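For readers without the PSU data, the sketch below works through a two sample proportion test by formula and with prop.test. The counts are hypothetical, invented purely for illustration (loosely following the prep course example), and the pooled standard error shown is the usual one for testing H0: pa - pb = 0.

```r
# Hypothetical counts, invented for illustration only
pass  <- c(prep = 60, no_prep = 45)      # students who passed
total <- c(prep = 100, no_prep = 100)    # students in each sample

# Condition 2: at least 10 with and 10 without the trait in each sample
stopifnot(all(pass >= 10), all(total - pass >= 10))

p_a    <- pass[["prep"]] / total[["prep"]]
p_b    <- pass[["no_prep"]] / total[["no_prep"]]
p_pool <- sum(pass) / sum(total)         # pooled proportion under H0: pa - pb = 0

se0   <- sqrt(p_pool * (1 - p_pool) * sum(1 / total))   # null standard error
z     <- (p_a - p_b) / se0                               # test statistic
p_val <- 2 * pnorm(abs(z), lower.tail = FALSE)           # two-sided p-value

# Same test with prop.test; X-squared equals z squared
result <- prop.test(pass, total, correct = FALSE)

print(c(z = z, z_squared = z^2,
        x_squared = unname(result$statistic),
        p_formula = p_val, p_prop_test = result$p.value))
```

prop.test accepts either a vector of successes with a vector of totals, as here, or a 2x2 table as in the PSU example.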
Data Analysis Module: Bivariate Testing
Answer the following:
1. Identify the Null and Alternative Hypotheses
2. Identify the Type I and Type II errors, including the implications
3. What is an appropriate alpha value?
4. Using the formula, determine the test statistic. What is the
associated p-value?
5. What is your conclusion?