Transcript Document

Midterm Review Session
Things to Review
• Concepts
• Basic formulae
• Statistical tests
Populations <-> Parameters;
Samples <-> Estimates
Nomenclature
                     Population (Parameter)   Sample (Statistics)
Mean                 μ                        x̄
Variance             σ²                       s²
Standard Deviation   σ                        s
In a random sample, each
member of a population has
an equal and independent
chance of being selected.
Review - types of variables
• Categorical variables
– Nominal
– Ordinal
• Numerical variables
– Discrete
– Continuous
                    Reality
Result              Ho true        Ho false
Reject Ho           Type I error   correct
Do not reject Ho    correct        Type II error
Sampling distribution of the mean, n=10
Sampling distribution of the mean, n=100
Sampling distribution of the mean, n = 1000
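The three panels above can be imitated with a small simulation. This is a sketch for illustration only: the population (uniform between 0 and 1), the number of repeated samples, and the helper name `sampling_sd` are assumptions, not taken from the slides. It estimates the spread of the sample mean for each sample size.

```python
import random
import statistics

def sampling_sd(n, reps=500, seed=1):
    """Standard deviation of `reps` sample means, each from a sample of size n.

    The population is Uniform(0, 1), chosen only for illustration.
    """
    rng = random.Random(seed)
    means = [statistics.fmean(rng.random() for _ in range(n)) for _ in range(reps)]
    return statistics.stdev(means)

for n in (10, 100, 1000):
    # Theory: the standard error of the mean is sigma / sqrt(n),
    # so the spread should shrink as n grows.
    print(n, round(sampling_sd(n), 4))
```

The printed spread should fall by roughly a factor of sqrt(10) each time n increases tenfold.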
Sample -> Test statistic
Null hypothesis -> Null distribution
Compare the test statistic with the null distribution: how unusual is this test statistic?
P < 0.05 -> Reject Ho
P > 0.05 -> Fail to reject Ho
Statistical tests
• Binomial test
• Chi-squared goodness-of-fit
– Proportional, binomial, poisson
• Chi-squared contingency test
• t-tests
– One-sample t-test
– Paired t-test
– Two-sample t-test
Quick reference summary:
Binomial test
• What is it for? Compares the proportion of successes
in a sample to a hypothesized value, po
• What does it assume? Individual trials are randomly
sampled and independent
• Test statistic: X, the number of successes
• Distribution under Ho: binomial with parameters n and
po.
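• Formula:
$P(x) = \binom{n}{x} p^{x} (1 - p)^{n - x}$
P(x) = probability of a total of x successes
p = probability of success in each trial
n = total number of trials
P = 2 * Pr[x ≥ X]
(X = observed number of successes; use Pr[x ≤ X] instead when X is below the expected n * po)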
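The formula above can be sketched in a few lines of Python. The function names `binom_pmf` and `binomial_test` and the example counts are hypothetical, chosen only to illustrate the calculation. The two-sided P-value is taken as twice the tail probability on the side of the observed count, capped at 1.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_test(x_obs, n, p0):
    """Two-sided P-value: twice the tail at or beyond x_obs, capped at 1."""
    if x_obs >= n * p0:
        # Observed count is at or above expectation: use the upper tail.
        tail = sum(binom_pmf(x, n, p0) for x in range(x_obs, n + 1))
    else:
        # Observed count is below expectation: use the lower tail.
        tail = sum(binom_pmf(x, n, p0) for x in range(0, x_obs + 1))
    return min(1.0, 2 * tail)

# Hypothetical data: 14 successes in 18 trials, null proportion 0.5.
p_value = binomial_test(14, 18, 0.5)
print(round(p_value, 4))
```

With these made-up counts the P-value falls below 0.05, so Ho would be rejected.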
Binomial test
Null hypothesis: Pr[success] = po
Sample -> Test statistic: x = number of successes
Null hypothesis -> Null distribution: binomial with parameters n and po
Compare the test statistic with the null distribution: how unusual is this test statistic?
P < 0.05 -> Reject Ho
P > 0.05 -> Fail to reject Ho
Binomial test
H0: The relative frequency of successes in the population is p0
HA: The relative frequency of successes in the population is not p0
Quick reference summary:
χ² Goodness-of-Fit test
• What is it for? Compares observed frequencies in
categories of a single variable to the expected
frequencies under a random model
• What does it assume? Random samples; no expected
values < 1; no more than 20% of expected values < 5
• Test statistic: χ²
• Distribution under Ho: χ² with
df = # categories - # parameters estimated from the data - 1
• Formula:
$\chi^2 = \sum_{\text{all classes}} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$
χ² goodness-of-fit test
Null hypothesis: Data fit a particular discrete distribution
Sample -> Calculate expected values -> Test statistic: $\chi^2 = \sum_{\text{all classes}} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$
Null hypothesis -> Null distribution: χ² with N - 1 - param. d.f.
Compare the test statistic with the null distribution: how unusual is this test statistic?
P < 0.05 -> Reject Ho
P > 0.05 -> Fail to reject Ho
χ² Goodness-of-Fit test
H0: The data come from a certain distribution
HA: The data do not come from that distribution
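The goodness-of-fit statistic and its degrees of freedom can be sketched as below. The day-of-week counts are hypothetical, and the helper name `chi_squared` is an assumption made for this example. The null model is the proportional one, with equal opportunity on every day.

```python
def chi_squared(observed, expected):
    """Sum over all classes of (Observed - Expected)^2 / Expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts of events on each day of the week (Mon..Sun).
observed = [33, 41, 63, 63, 47, 56, 47]
n = sum(observed)
# Proportional model: every day has the same number of opportunities.
expected = [n / 7] * 7

stat = chi_squared(observed, expected)
# df = # categories - # parameters estimated from the data - 1;
# nothing was estimated from the data here.
df = len(observed) - 0 - 1
print(round(stat, 2), df)
```

For these made-up counts the statistic exceeds 12.59, the 5% critical value of χ² with 6 d.f., so Ho would be rejected.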
Possible distributions
Proportional: Expected[x] = n * frequency of occurrence
Binomial: $\Pr[x] = \binom{n}{x} p^{x} (1 - p)^{n - x}$
Poisson: $\Pr[X] = \frac{e^{-\mu} \mu^{X}}{X!}$
Proportional
Given a number of categories
Probability proportional to number of opportunities
Days of the week, months of the year
Binomial
Number of successes in n trials
Have to know n, p under the null hypothesis
Punnett square, many p=0.5 examples
Poisson
Number of events in interval of space or time
n not fixed, not given p
Car wrecks, flowers in a field
Quick reference summary:
χ² Contingency Test
• What is it for? Tests the null hypothesis of no association
between two categorical variables
• What does it assume? Random samples; no expected
values < 1; no more than 20% of expected values < 5
• Test statistic: χ²
• Distribution under Ho: χ² with
df=(r-1)(c-1) where r = # rows, c = # columns
• Formulae:
$\text{Expected} = \frac{\text{RowTotal} \times \text{ColTotal}}{\text{GrandTotal}}$
$\chi^2 = \sum_{\text{all classes}} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$
χ² Contingency Test
Null hypothesis: No association between variables
Sample -> Calculate expected values -> Test statistic: $\chi^2 = \sum_{\text{all classes}} \frac{(\text{Observed}_i - \text{Expected}_i)^2}{\text{Expected}_i}$
Null hypothesis -> Null distribution: χ² with (r-1)(c-1) d.f.
Compare the test statistic with the null distribution: how unusual is this test statistic?
P < 0.05 -> Reject Ho
P > 0.05 -> Fail to reject Ho
χ² Contingency test
H0: There is no association between these two variables
HA: There is an association between these two variables
Quick reference summary:
One sample t-test
• What is it for? Compares the mean of a numerical
variable to a hypothesized value, μo
• What does it assume? Individuals are randomly
sampled from a population that is normally distributed.
• Test statistic: t
• Distribution under Ho: t-distribution with n-1 degrees of
freedom.
• Formula:
$t = \frac{\bar{Y} - \mu_0}{SE_{\bar{Y}}}$
One-sample t-test
Null hypothesis: The population mean is equal to μo
Sample -> Test statistic: $t = \frac{\bar{Y} - \mu_0}{s / \sqrt{n}}$
Null hypothesis -> Null distribution: t with n-1 df
Compare the test statistic with the null distribution: how unusual is this test statistic?
P < 0.05 -> Reject Ho
P > 0.05 -> Fail to reject Ho
One-sample t-test
Ho: The population mean is equal to μo
Ha: The population mean is not equal to μo
Paired vs. two-sample comparisons
Quick reference summary:
Paired t-test
• What is it for? To test whether the mean difference in a
population equals a null hypothesized value, μdo
• What does it assume? Pairs are randomly sampled
from a population. The differences are normally
distributed
• Test statistic: t
• Distribution under Ho: t-distribution with n-1 degrees of
freedom, where n is the number of pairs
• Formula:
$t = \frac{\bar{d} - \mu_{d0}}{SE_{\bar{d}}}$
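The paired calculation reduces to a one-sample test on the differences. The sketch below assumes a hypothetical helper `paired_t` and made-up before/after measurements. It returns only the test statistic and degrees of freedom, which would then be compared with a t table.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after, mu_d0=0.0):
    """Return (t, df) for the mean of the pairwise differences."""
    d = [a - b for a, b in zip(after, before)]
    n = len(d)                  # n is the number of pairs
    se = stdev(d) / sqrt(n)     # standard error of the mean difference
    return (mean(d) - mu_d0) / se, n - 1

# Hypothetical measurements on the same five individuals, before and after.
before = [10.0, 12.0, 9.0, 11.0, 13.0]
after = [12.0, 13.0, 9.0, 14.0, 14.0]
t, df = paired_t(before, after)
print(round(t, 3), df)
```

Note that the degrees of freedom come from the number of pairs, not the total number of measurements.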

Paired t-test
Null hypothesis: The mean difference is equal to μdo
Sample -> Test statistic: $t = \frac{\bar{d} - \mu_{d0}}{SE_{\bar{d}}}$
Null hypothesis -> Null distribution: t with n-1 df (n is the number of pairs)
Compare the test statistic with the null distribution: how unusual is this test statistic?
P < 0.05 -> Reject Ho
P > 0.05 -> Fail to reject Ho
Paired t-test
Ho: The mean difference is equal to 0
Ha: The mean difference is not equal to 0
Quick reference summary:
Two-sample t-test
• What is it for? Tests whether two groups have the
same mean
• What does it assume? Both samples are random
samples. The numerical variable is normally
distributed within both populations. The variance of
the distribution is the same in the two populations
• Test statistic: t
• Distribution under Ho: t-distribution with n1+n2-2
degrees of freedom.
• Formulae:
$t = \frac{\bar{Y}_1 - \bar{Y}_2}{SE_{\bar{Y}_1 - \bar{Y}_2}}$
$SE_{\bar{Y}_1 - \bar{Y}_2} = \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$
$s_p^2 = \frac{df_1 s_1^2 + df_2 s_2^2}{df_1 + df_2}$
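The pooled-variance calculation can be sketched as follows. The helper name `two_sample_t` and both data sets are hypothetical. The function returns t and the degrees of freedom for comparison with a t table.

```python
from math import sqrt
from statistics import mean, variance

def two_sample_t(y1, y2):
    """Return (t, df) using the pooled sample variance."""
    n1, n2 = len(y1), len(y2)
    df1, df2 = n1 - 1, n2 - 1
    # Pooled variance: average of the two sample variances, weighted by df.
    sp2 = (df1 * variance(y1) + df2 * variance(y2)) / (df1 + df2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(y1) - mean(y2)) / se, df1 + df2

# Hypothetical measurements from two independent groups.
group1 = [4.0, 5.0, 6.0, 5.0]
group2 = [7.0, 8.0, 6.0, 7.0]
t, df = two_sample_t(group1, group2)
print(round(t, 3), df)
```

The sign of t only reflects which group is listed first; a two-sided test uses its absolute value.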
Two-sample t-test
Null hypothesis: The two populations have the same mean
Sample -> Test statistic: $t = \frac{\bar{Y}_1 - \bar{Y}_2}{SE_{\bar{Y}_1 - \bar{Y}_2}}$
Null hypothesis -> Null distribution: t with n1+n2-2 df
Compare the test statistic with the null distribution: how unusual is this test statistic?
P < 0.05 -> Reject Ho
P > 0.05 -> Fail to reject Ho
Two-sample t-test
Ho: The means of the two populations are equal
Ha: The means of the two populations are not equal
Which test do I use?
How many variables am I comparing?
• 1: Methods for a single variable
• 2: Methods for comparing two variables
Methods for one variable
Is the variable categorical or numerical?
• Categorical: Comparing to a single proportion po or to a distribution?
– po: Binomial test
– distribution: χ² Goodness-of-fit test
• Numerical: One-sample t-test
Methods for two variables
(X = explanatory variable, Y = response variable)
• Categorical X, categorical Y: Contingency table, grouped bar graph, mosaic plot
• Categorical X, numerical Y: Multiple histograms, cumulative frequency distributions
• Numerical X, numerical Y: Scatter plot
Methods for two variables
(X = explanatory variable, Y = response variable)
• Categorical X, categorical Y: Contingency analysis (contingency table, grouped bar graph, mosaic plot)
• Numerical X, categorical Y: Logistic regression
• Categorical X, numerical Y: t-test (multiple histograms, cumulative frequency distributions)
• Numerical X, numerical Y: Regression (scatter plot)
Methods for two variables
Is the response variable categorical or numerical?
• Categorical: Contingency analysis
• Numerical: t-test
How many variables am I comparing?
• 1: Is the variable categorical or numerical?
– Categorical, comparing to a single proportion po: Binomial test
– Categorical, comparing to a distribution: χ² Goodness-of-fit test
– Numerical: One-sample t-test
• 2: Is the response variable categorical or numerical?
– Categorical: Contingency analysis
– Numerical: t-test
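The decision tree can also be written as a small lookup function. This is only a sketch: the function name `choose_test` and its argument names are hypothetical, and it covers just the branches shown on the slide.

```python
def choose_test(n_variables, variable_type, compare_to=None):
    """Walk the decision tree and return the name of a test.

    n_variables   -- 1 or 2
    variable_type -- "categorical" or "numerical"; with two variables this
                     is the type of the response variable
    compare_to    -- for one categorical variable only:
                     "proportion" or "distribution"
    """
    if variable_type not in ("categorical", "numerical"):
        raise ValueError("variable_type must be 'categorical' or 'numerical'")
    if n_variables == 1:
        if variable_type == "numerical":
            return "One-sample t-test"
        if compare_to == "proportion":
            return "Binomial test"
        if compare_to == "distribution":
            return "Chi-squared goodness-of-fit test"
        raise ValueError("compare_to must be 'proportion' or 'distribution'")
    if n_variables == 2:
        if variable_type == "categorical":
            return "Contingency analysis"
        return "t-test"
    raise ValueError("this tree covers only 1 or 2 variables")

print(choose_test(1, "categorical", compare_to="proportion"))
print(choose_test(2, "numerical"))
```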
Sample Problems
An experiment compared the testes sizes of four
experimental populations of monogamous flies to four
populations of polygamous flies:
a. What is the difference in mean testes size for males from monogamous populations
compared to males from polyandrous populations? What is the 95% confidence interval for
this estimate?
b. Carry out a hypothesis test to compare the means of these two groups. What conclusions
can you draw?
Sample Problems
In Vancouver, the probability of rain during a winter day
is 0.58, for a spring day 0.38, for a summer day 0.25,
and for a fall day 0.53. Each of these seasons lasts one
quarter of the year.
What is the probability of rain on a randomly-chosen
day in Vancouver?
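One way to set this up is the law of total probability, weighting each season's rain probability by the chance that a random day falls in that season. The sketch below uses only the numbers in the problem; the variable names are arbitrary.

```python
# Probability of rain on a day in each season (from the problem statement).
p_rain = {"winter": 0.58, "spring": 0.38, "summer": 0.25, "fall": 0.53}
# Each season lasts one quarter of the year.
p_season = {season: 0.25 for season in p_rain}

# Law of total probability: Pr[rain] = sum of Pr[season] * Pr[rain | season].
p_total = sum(p_season[s] * p_rain[s] for s in p_rain)
print(round(p_total, 3))
```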
Sample problems
A study by Doll et al. (1994) examined the relationship
between moderate intake of alcohol and the risk of heart
disease. 410 men (209 "abstainers" and 201 "moderate
drinkers") were observed over a period of 10 years, and the
number experiencing cardiac arrest over this period was
recorded and compared with drinking habits. All men were
40 years of age at the start of the experiment. By the end of
the experiment, 12 abstainers had experienced cardiac
arrest whereas 9 moderate drinkers had experienced
cardiac arrest.
Test whether or not relative frequency of cardiac arrest was
different in the two groups of men.
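A possible approach is a χ² contingency test on the 2 x 2 table of drinking habit by cardiac arrest. The sketch below builds the table from the counts in the problem and uses no continuity correction, which is an assumption. The P-value line relies on the identity that, for 1 d.f. only, the upper tail of χ² equals erfc(sqrt(χ²/2)).

```python
from math import erfc, sqrt

# Rows: abstainers, moderate drinkers.
# Columns: cardiac arrest, no cardiac arrest.
observed = [[12, 209 - 12],
            [9, 201 - 9]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi2 = 0.0
min_expected = float("inf")
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected = RowTotal * ColTotal / GrandTotal
        expected = row_totals[i] * col_totals[j] / grand_total
        min_expected = min(min_expected, expected)
        chi2 += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
# Valid for df = 1 only: Pr[chi-squared > x] = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))
print(round(chi2, 3), df, round(p_value, 3), round(min_expected, 2))
```

All expected counts exceed 5, so the assumptions of the test are met, and the resulting P-value is well above 0.05.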
Sample Problems
An RSPCA survey of 200 randomly-chosen Australian
pet owners found that 10 said that they
had met their partner through owning the pet.
A. Find the 95% confidence interval for the proportion
of Australian pet owners who find love through their
pets.
B. What test would you use to test if the true proportion
is significantly different from 0.01? Write the formula
that you would use to calculate a P-value.
Sample Problems
One thousand coins were each flipped 8 times, and
the number of heads was recorded for each coin.
Here are the results:
Does the distribution of coin flips match the
distribution expected with fair coins? ("Fair coin"
means that the probability of heads per flip is 0.5.)
Carry out a hypothesis test.
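The observed counts are not reproduced here, but the expected frequencies under the null hypothesis can be sketched from the problem statement alone. The variable names are arbitrary. Each expected count is 1000 times the binomial probability of x heads in 8 flips with p = 0.5.

```python
from math import comb

n_coins, n_flips, p_heads = 1000, 8, 0.5

# Expected number of coins showing x heads, for x = 0..8.
expected = [
    n_coins * comb(n_flips, x) * p_heads**x * (1 - p_heads)**(n_flips - x)
    for x in range(n_flips + 1)
]

for x, e in enumerate(expected):
    print(x, round(e, 2))

# How many classes have an expected count below 5?
n_small = sum(e < 5 for e in expected)
# p is fixed by the null hypothesis, so no parameter is estimated from the data.
df = len(expected) - 0 - 1
print(n_small, df)
```

Two of the nine expected counts (0 heads and 8 heads) fall below 5, which is more than 20% of the classes, so neighbouring classes may need to be combined before the test, with the degrees of freedom reduced to match.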
Sample problems
Vertebrates are thought to be unidirectional in growth, with size either increasing or holding
steady throughout life. Marine iguanas from the Galápagos are unusual in a number of ways, and a
team of researchers has suggested that these iguanas might actually shrink during the low food
periods caused by El Niño events (Wikelski and Thom 2000). During these events, up to 90% of the
iguana population can die from starvation. Here is a plot of the changes in body length of 64
surviving iguanas during the 1992-1993 El Niño event.
The average change in length was −5.81mm, with standard deviation 19.50mm.
Test the hypothesis that length did not change on average during the El Niño event.
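A one-sample t-test with μo = 0 fits this problem. The sketch below computes the test statistic from the summary numbers given, and assumes the changes in length are approximately normally distributed.

```python
from math import sqrt

n = 64                 # surviving iguanas
mean_change = -5.81    # mm
sd_change = 19.50      # mm
mu_0 = 0.0             # null hypothesis: no change on average

se = sd_change / sqrt(n)          # standard error of the mean
t = (mean_change - mu_0) / se     # one-sample t statistic
df = n - 1
print(round(se, 4), round(t, 3), df)
```

The two-sided 5% critical value of t with 63 d.f. is roughly 2.0, so a statistic of this size would lead to rejecting Ho.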