Hypothesis Testing - NYU Stern School of Business


Statistics & Data Analysis
Course Number: B01.1305
Course Section: 31
Meeting Time: Wednesday 6-8:50 pm
Hypothesis Testing
Class Outline
 Review of midterm exam
 Hypothesis Testing
• One-sample tests
• Two-sample tests
 P-values
 Relationship with Confidence Intervals
Professor S. D. Balkin -- July 1, 2002
Review of Last Class
 Statistical Inference
 Point Estimation
 Confidence Intervals
Reminder: Statistical Inference
 Problem of Inferential Statistics:
• Make inferences about one or more population parameters based on
observable sample data
 Forms of Inference:
• Point estimation: single best guess regarding a population parameter
• Interval estimation: Specifies a reasonable range for the value of the
parameter
• Hypothesis testing: Isolating a particular possible value for the
parameter and testing if this value is plausible given the available data
Point Estimators
 Computing a single statistic from the sample data to estimate
a population parameter
 Choosing a point estimator:
• What is the shape of the distribution?
• Do you suspect outliers exist?
• Plausible choices: mean, median, mode, trimmed mean
Confidence Intervals
 Specification of a “probable range” for a parameter
 Used to understand how statistics may vary from sample
to sample
 States explicit allowance for random sampling error (not
selection biases)
 We have 95% confidence that the population parameter
falls within the bounds of the interval
 Or…the interval is the result of a process that in the long
run has a 95% probability of being correct
Hypothesis Testing
Chapter 8
Overview
 A research hypothesis typically states that there is a real change, a real
difference, or a real effect in the underlying population or process. The
opposite, the null hypothesis, states that there is no real change,
difference, or effect
 The basic strategy of hypothesis testing is to try to support a research
hypothesis by showing that the sample results are highly unlikely
assuming the null hypothesis, and more likely assuming the research
hypothesis
 The strategy can be implemented in three equivalent ways: by creating a
formal rejection region, by obtaining a p-value, or by checking
whether the null hypothesis value falls within a confidence interval
 There are risks of both false positive and false negative errors
 Tests of a mean are usually based on the t-distribution
 Tests of a proportion may be done using a normal approximation
Overview
 Very often sample data will suggest that something relevant is happening
in the underlying population
• A sample of potential customers may show that a higher proportion prefer a
new brand to the existing one
• A sampling of telephone response time by reservation clerks may show an
increase in average customer waiting time
• A sample of the service times may indicate customers are receiving poorer
service than the company thinks it is providing
 The question is whether the apparent effect in the sample is an
indication of something happening in the underlying population, or whether
the apparent effect is merely a fluke
What is Hypothesis Testing
 Method for checking whether an apparent result from a
sample could possibly be due to randomness
 Checks on how strong the evidence is
 Are sample data reflecting a real effect or random fluke?
 Results of a hypothesis test indicate how good the
evidence is, not how important the result is
Motivating Case Study #1
 FCC has been receiving complaints from customers ordering
new telephone service
 Big telecommunications company tells the FCC that the
average time a new customer has to wait for new service
installation is 72 hours (excluding weekends) with a standard
deviation of 24 hours
 The FCC randomly samples 100 new customers from the
telecom company and asks how long each had to wait for
new service installation
Testing Hypotheses
 The Research Hypothesis, or Alternative Hypothesis, is what the researcher is
trying to prove
• Denoted: Ha
 The Null Hypothesis is the denial of the research hypothesis. It is
what the researcher is trying to disprove
• Denoted: H0
Hypothesis Testing Components
 Define research hypothesis direction:
• One-sided (< or >)
• Two-sided (≠)
 Strategy is to attempt to support the research hypothesis by
contradicting the null hypothesis
• The null hypothesis is contradicted if, assuming it is true, the
sample data are highly unlikely, and they are more likely under the research
hypothesis
 Test Statistic: Summary of the sample data
Basic Logic
1. Assume that $H_0: \mu = 72$ is true;
2. Calculate the value of the test statistic
 Sample mean, proportion, etc.
3. If this value is highly unlikely, reject H0 and support Ha

We can use the sampling distribution to determine what
values of the test statistic are sufficiently unlikely given the
null hypothesis
Rejection Region
 Specification of the rejection region must recognize the possibility of
error
• Type I Error: Rejecting the null hypothesis when in fact it is true
• In establishing a rejection region, we must specify the maximum tolerable
probability of this type of error (denoted $\alpha$)
• Type II Error: Failing to reject the null hypothesis when in fact it is false
(beyond scope)
 Rejection region can be based on sampling distribution of the
sample statistic
• Remember, we want to reject the null hypothesis if the value of the test
statistic is highly unlikely assuming H0 is true
• Can use the tails of a normal distribution
Rejection Region
[Figure: sampling distribution centered at $\mu = 72$]
Rejection Region (cont)
 To determine whether or not to reject the null hypothesis, we
can compute the number of standard errors the sample
statistic lies above the assumed population mean
 This is done by computing a z-statistic for the sample mean:
$$ z = \frac{\bar{Y} - \mu_0}{\sigma / \sqrt{n}} $$
Rejection Region (cont)
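For $\alpha = 0.05$, reject $H_0: \mu = 72$ if the observed value of
$\bar{Y}$ is more than $1.645\,\sigma_{\bar{Y}}$ above $\mu = 72$.
For $\alpha = 0.05$, reject $H_0: \mu = 72$ if the computed
z statistic is greater than 1.645
[Figure: sampling distribution centered at $\mu = 72$, with the rejection region ($\alpha = 0.05$) in the right tail beyond $\mu + 3.948$]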
Example
 The FCC sample of 100 randomly selected new
service customers resulted in a mean of 80 hours.
• Set up the hypothesis test
• Calculate the test statistic
• Interpret the hypothesis test
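The steps of this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the research hypothesis is taken to be $H_a: \mu > 72$ (the upper-tail rejection region from the previous slides), the claimed standard deviation of 24 hours is treated as the known $\sigma$, and $\alpha = 0.05$ is carried over from the rejection-region slide. All variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Values stated in the case study
mu0 = 72      # claimed mean waiting time (hours) under H0
sigma = 24    # claimed standard deviation, treated as known
n = 100       # sample size
ybar = 80     # observed sample mean
alpha = 0.05  # assumed significance level (from the rejection-region slide)

std_error = sigma / sqrt(n)                    # standard error of the mean: 2.4
z_crit = NormalDist().inv_cdf(1 - alpha)       # upper-tail critical value, about 1.645
cutoff_mean = mu0 + z_crit * std_error         # reject if ybar exceeds about 75.95 hours

z = (ybar - mu0) / std_error                   # observed z statistic
reject_h0 = z > z_crit                         # one-sided test, Ha: mu > 72 (assumption)

print(f"z = {z:.3f}, critical z = {z_crit:.3f}, reject H0: {reject_h0}")
```

The observed mean of 80 hours lies about 3.33 standard errors above 72, well inside the rejection region, so under these assumptions the claim of a 72-hour average would be rejected.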
Example
 A researcher claims that the amount of time urban preschool
children age 3-5 watch television has a mean of 22.6 hours
and a standard deviation of 6.1 hours.
 A market research firm believes this is too low
 The television habits of a random sample of 60 urban
preschool children are measured and resulted in the following
• Sample mean: 25.2
 Should the researcher’s claim be rejected at an $\alpha$ value of
0.01?
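Summary for Z Test with $\sigma$ Known
$H_0: \mu = \mu_0$
$H_a$:
1. $\mu > \mu_0$
2. $\mu < \mu_0$
3. $\mu \ne \mu_0$
Test Statistic:
$$ z = \frac{\bar{Y} - \mu_0}{\sigma / \sqrt{n}} $$
Rejection Region:
1. $z > z_{\alpha}$
2. $z < -z_{\alpha}$
3. $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$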
Example
 A researcher claims that the amount of time urban preschool children age
3-5 watch television has a mean of 22.6 hours and a standard deviation of
6.1 hours.
 A market research firm believes this is incorrect, but does not know in
which direction
 The television habits of a random sample of 60 urban preschool children
are measured and resulted in the following
• Sample mean: 25.2
 Should the researcher’s claim be rejected at an $\alpha$ value of 0.01?
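The two television examples differ only in the direction of the research hypothesis, so they can share one helper. The sketch below is a minimal illustration of the z-test summary, assuming $\sigma = 6.1$ is known; the function name `z_test` and the labels `"greater"`, `"less"`, and `"two-sided"` are hypothetical choices, not part of the course material.

```python
from math import sqrt
from statistics import NormalDist

def z_test(ybar, mu0, sigma, n, alpha, alternative):
    """Return (z, critical value, reject?) for a z test with sigma known.

    alternative is "greater" (Ha: mu > mu0), "less" (Ha: mu < mu0),
    or "two-sided" (Ha: mu != mu0). The critical value is returned as a
    positive number in all three cases.
    """
    z = (ybar - mu0) / (sigma / sqrt(n))
    if alternative == "greater":
        crit = NormalDist().inv_cdf(1 - alpha)
        reject = z > crit
    elif alternative == "less":
        crit = NormalDist().inv_cdf(1 - alpha)
        reject = z < -crit
    elif alternative == "two-sided":
        crit = NormalDist().inv_cdf(1 - alpha / 2)
        reject = abs(z) > crit
    else:
        raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")
    return z, crit, reject

# Market research firm believes 22.6 hours is too low (one-sided)
one_sided = z_test(25.2, 22.6, 6.1, 60, 0.01, "greater")
# Firm believes 22.6 hours is incorrect, direction unknown (two-sided)
two_sided = z_test(25.2, 22.6, 6.1, 60, 0.01, "two-sided")

print("one-sided:", one_sided)
print("two-sided:", two_sided)
```

The z statistic is the same in both cases (about 3.30); only the critical value changes, from about 2.326 for the one-sided test to about 2.576 for the two-sided test. The claim is rejected either way.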
Z-values Worth Remembering
$z_{0.05} = 1.645$
$z_{0.025} = 1.96$
$z_{0.01} = 2.326$
$z_{0.005} = 2.576$
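These critical values do not have to be memorized or read from a table; they are upper-tail quantiles of the standard normal distribution. A minimal sketch using Python's standard library (the dictionary name `z_table` is hypothetical):

```python
from statistics import NormalDist

# z_alpha cuts off a right-tail area of alpha under the standard normal curve,
# so it is the (1 - alpha) quantile.
z_table = {
    alpha: round(NormalDist().inv_cdf(1 - alpha), 3)
    for alpha in (0.05, 0.025, 0.01, 0.005)
}

for alpha, z_alpha in z_table.items():
    print(f"z_{alpha} = {z_alpha}")
```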
P-Value
 Probability of a test statistic value equal to or more extreme
than the actual observed value
 Recall basic strategy
• Hope to support the research hypothesis and reject the null hypothesis
by showing that the data are highly unlikely assuming that the null
hypothesis is true
• As the test statistic gets farther into the rejection region, the data
become more unlikely, hence the weight of evidence against the null
hypothesis becomes more conclusive and the p-value becomes smaller
P-Value (cont)
 Small p-values indicate strong, conclusive evidence for
rejecting the null hypothesis
 Computation is straightforward in our z-test example:
One-tailed p-value $= P(Z > z)$
 Compute the p-value for our telecom example
P-Value (cont)
 P-value is also referred to as attained level of significance
• Results of a test are said to be statistically significant at the specified
p-value
 Statistically significant says the difference between what is
observed and what is assumed correct is most likely not due
to random variation
 It DOES NOT MEAN the difference is important!
 It DOES NOT tell you that the difference is meaningful from
a business perspective (practical significance)
 With a large enough sample size, any difference can become
statistically significant
P-Value for a z Test
The p-value is the probability, assuming that
the null hypothesis is true, of obtaining a test
statistic at least as extreme as the observed value.
$H_a: \mu > \mu_0$, p-value $= P(z > z_{actual})$
$H_a: \mu < \mu_0$, p-value $= P(z < z_{actual})$
$H_a: \mu \ne \mu_0$, p-value $= 2 \cdot P(z > |z_{actual}|)$
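The three p-value formulas can be sketched as a small function. This is an illustration only: the name `z_p_value` and its string labels are hypothetical, and the telecom example is again assumed to use the one-sided alternative $H_a: \mu > 72$.

```python
from math import sqrt
from statistics import NormalDist

def z_p_value(z_actual, alternative):
    """p-value of a z test for Ha: mu > mu0, mu < mu0, or mu != mu0."""
    phi = NormalDist().cdf  # standard normal cumulative distribution function
    if alternative == "greater":
        return 1 - phi(z_actual)             # P(z > z_actual)
    if alternative == "less":
        return phi(z_actual)                 # P(z < z_actual)
    if alternative == "two-sided":
        return 2 * (1 - phi(abs(z_actual)))  # 2 * P(z > |z_actual|)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")

# Telecom example: sample mean 80, claimed mean 72, sigma 24, n = 100
z_telecom = (80 - 72) / (24 / sqrt(100))
p_telecom = z_p_value(z_telecom, "greater")

print(f"z = {z_telecom:.3f}, one-tailed p-value = {p_telecom:.5f}")
```

The resulting p-value is well below 0.001, so the data would be very unlikely if the true mean waiting time were 72 hours.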
Hypothesis Testing with the t Distribution
 Population standard deviation is rarely known
 Basic ideas of hypothesis testing are not changed, we simply
switch sampling distributions
$$ t = \frac{\bar{Y} - \mu_0}{s / \sqrt{n}} $$
which is compared against $t_{\alpha}$ from a t distribution with $n - 1$ degrees of freedom
T Test for Hypotheses about $\mu$
$H_0: \mu = \mu_0$
$H_a$:
1. $\mu > \mu_0$
2. $\mu < \mu_0$
3. $\mu \ne \mu_0$
Test Statistic:
$$ t = \frac{\bar{Y} - \mu_0}{s / \sqrt{n}} $$
Rejection Region:
1. $t > t_{\alpha}$
2. $t < -t_{\alpha}$
3. $|t| > t_{\alpha/2}$
where $t_{\alpha}$ cuts off a right-tail area of $\alpha$ in a t distribution
with $n - 1$ degrees of freedom.
Example
 Airline institutes a ‘snake system’ waiting line at its counters to try to
reduce the average waiting time
 Mean waiting time under specific conditions with the previous system was
6.1.
 A sample of 14 waiting times is taken
• Sample mean: 5.043
• Standard deviation: 2.266
 Test the null hypothesis of no change against an appropriate research
hypothesis using $\alpha = 0.10$.
• Calculate the rejection region
• Calculate the t-statistic
• Perform and interpret the hypothesis test
• Calculate the associated p-value
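One way to work through these four steps is sketched below. Several things here are assumptions rather than part of the slides: the research hypothesis is taken to be $H_a: \mu < 6.1$ (the new line is meant to reduce waiting time), the critical value $t_{0.10} = 1.350$ for 13 degrees of freedom is read from a standard t table, and the p-value is obtained by numerically integrating the t density with Simpson's rule because the Python standard library has no t distribution. The helper names `t_pdf` and `t_cdf` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + area if x >= 0 else 0.5 - area

# Values stated in the example
n, ybar, s, mu0 = 14, 5.043, 2.266, 6.1
df = n - 1

t_crit = 1.350                      # t_{0.10} with 13 df, from a t table (assumption)
t_stat = (ybar - mu0) / (s / math.sqrt(n))
reject_h0 = t_stat < -t_crit        # rejection region for Ha: mu < 6.1
p_value = t_cdf(t_stat, df)         # lower-tail p-value

print(f"t = {t_stat:.3f}, reject H0 at alpha = 0.10: {reject_h0}, p-value = {p_value:.3f}")
```

Under these assumptions the t statistic is about -1.75, which falls below -1.350, so the null hypothesis of no change is rejected at $\alpha = 0.10$. The p-value is a little above 0.05, so the same data would not lead to rejection at $\alpha = 0.05$.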
Example
 Performance based benefits are a way of giving employees more of
a stake in their work
 A study was conducted to find out how managers of 343 firms view
the effectiveness of various kinds of employee relations programs
 Each rated the effect of employee stock ownership on product quality
using a scale from –2 (large negative effect) to 2 (large positive
effect).
• Sample Mean: 0.35
• Standard Error: 0.14
 Do managers view employee stock ownership as a worthwhile
technique?
• Create a 95% confidence interval for the population parameter
• Perform a hypothesis test that the population mean isn’t equal to zero
Example
 To help your restaurant marketing campaign target the
right age levels, you want to find out if there is a
statistically significant difference, on the average,
between the age of your customers and the age of the
general population in town, which is 43.1 years.
 A random sample of 50 customers shows an average of
33.6 years with a standard deviation of 16.2 years
 Perform a two-sided test at the 1% significance level
 What is the p-value?
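As a sketch of the computation for this example, using the t statistic from the summary slide (the critical value is read from a standard t table and is approximate):

$$ t = \frac{\bar{Y} - \mu_0}{s / \sqrt{n}} = \frac{33.6 - 43.1}{16.2 / \sqrt{50}} = \frac{-9.5}{2.291} \approx -4.15 $$

With $n - 1 = 49$ degrees of freedom and a two-sided test at $\alpha = 0.01$, $t_{0.005} \approx 2.68$. Since $|t| \approx 4.15 > 2.68$, the null hypothesis $\mu = 43.1$ is rejected, and the two-sided p-value is therefore smaller than 0.01.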
t-Test Assumptions
 Hypothesis tests allow for random variation, but not for bias
 Measurements are statistically independent
 Underlying population distribution should be symmetric
• Skewness affects p-value
Hypothesis Testing a Proportion
 We can also perform hypothesis tests for proportions /
percentages by using a normal approximation to the binomial
distribution
$$ z = \frac{y - n\pi_0}{\sqrt{n\pi_0(1 - \pi_0)}} $$
where $y$ is the number of successes, or equivalently
$$ z = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1 - \pi_0) / n}} $$
where $\hat{\pi}$ is the sample proportion of successes
Testing a Population Proportion
$H_0: \pi = \pi_0$
$H_a$:
1. $\pi > \pi_0$
2. $\pi < \pi_0$
3. $\pi \ne \pi_0$
Test Statistic:
$$ z = \frac{y - n\pi_0}{\sqrt{n\pi_0(1 - \pi_0)}} $$
Rejection Region:
1. $z > z_{\alpha}$
2. $z < -z_{\alpha}$
3. $z > z_{\alpha/2}$ or $z < -z_{\alpha/2}$
Note: $\pi_0$ is the null-hypothesis value of the population proportion $\pi$.
Example
 A company figures out that the launch of their new product will
only be successful if more than 23% of consumers try the
product
 Based on a pilot study of 205 consumers, you expect
44.1% of consumers to try it
 How sure are you that the percentage of people who will try
the new product is above the break-even point of 23%?
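A minimal sketch of this test, using the proportion form of the z statistic. The one-sided research hypothesis $H_a: \pi > 0.23$ is an assumption based on the wording of the question, and the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

pi0 = 0.23      # break-even proportion under H0
n = 205         # pilot study sample size
p_hat = 0.441   # observed sample proportion

# Standard error is computed under the null hypothesis value pi0
std_error = sqrt(pi0 * (1 - pi0) / n)
z = (p_hat - pi0) / std_error

# One-tailed p-value for Ha: pi > 0.23 (assumption)
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, one-tailed p-value = {p_value:.2e}")
```

The sample proportion lies roughly seven standard errors above 23%, so the p-value is extremely small and the evidence that the true proportion exceeds the break-even point is very strong.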
Using A Confidence Interval
 Construct a confidence interval (say at 95% confidence) in the usual way
 If $\mu_0$ is outside the interval, it is not a reasonable value for the population
parameter and you reject the null hypothesis
 Why does this work?
• Confidence interval says that the probability that the population parameter is in
the random confidence interval is 0.95.
• If the null hypothesis is true, then the probability that $\mu_0$ is in the interval is
also 95%
• When the null is true, you will make the correct decision in 95% of all cases
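The equivalence can be checked numerically with the employee stock ownership example (sample mean 0.35, standard error 0.14, null value 0). The sketch below assumes the normal critical value 1.96 is an adequate stand-in for the t value, which seems reasonable with 343 firms; the variable names are hypothetical.

```python
from statistics import NormalDist

sample_mean = 0.35   # mean rating from the 343 managers
std_error = 0.14     # reported standard error
mu0 = 0.0            # null hypothesis: no effect on product quality
confidence = 0.95

# Half-width multiplier for a 95% interval, about 1.96 (normal approximation)
z_half = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

lower = sample_mean - z_half * std_error
upper = sample_mean + z_half * std_error
reject_by_interval = not (lower <= mu0 <= upper)

# The same decision from the two-sided test at alpha = 0.05
z = (sample_mean - mu0) / std_error
reject_by_test = abs(z) > z_half

print(f"95% CI: ({lower:.3f}, {upper:.3f}); z = {z:.2f}")
print("reject H0:", reject_by_interval, reject_by_test)
```

The interval runs from about 0.076 to 0.624 and excludes zero, and the z statistic of 2.5 exceeds 1.96, so both routes reject the null hypothesis, as the slide's argument says they must.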
R Tutorial on Hypothesis Testing
Testing Two Samples
 Can test whether two samples are significantly different or
not, on the average
• Unpaired test: test whether two independent columns of numbers are
different
• Paired test: test whether two columns of numbers are different when
there is a natural pairing between them
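The difference between the two tests can be illustrated with a rough sketch. The data below are hypothetical and exist only for illustration. The paired statistic is the one-sample t statistic applied to the differences; the unpaired statistic uses the unequal-variance (Welch) standard error, which is an assumption, since the slides do not say which formula the course uses.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical measurements on the same five subjects (illustration only)
before = [10, 12, 9, 11, 13]
after = [11, 14, 10, 12, 15]

def paired_t(x, y):
    """t statistic for the mean of the pairwise differences y - x."""
    diffs = [b - a for a, b in zip(x, y)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

def unpaired_t(x, y):
    """t statistic for mean(y) - mean(x), treating the columns as independent."""
    std_error = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(y) - mean(x)) / std_error

t_paired = paired_t(before, after)
t_unpaired = unpaired_t(before, after)

print(f"paired t = {t_paired:.2f}, unpaired t = {t_unpaired:.2f}")
```

With these made-up numbers the mean difference is the same in both cases, but the paired statistic is much larger because pairing removes the subject-to-subject variation from the standard error.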
R Tutorial on Two Sample Hypothesis Testing
Next Time…
 Regression Analysis