Review for Midterm
Sociology 601: Midterm review, October 15, 2009
• Basic information for the midterm
– Date: Tuesday October 20, 2009
– Start time: 2 pm.
– Place: usual classroom, Art/Sociology 3221
– Bring a sheet of notes, a calculator, two pens or pencils
– Notify me if you anticipate any timing problems
• Review for midterm
– terms
– symbols
– steps in a significance test
– testing differences in groups
– contingency tables and measures of association
– equations
Important terms from chapter 1
Terms for statistical inference:
• population
• sample
• parameter
• statistic
Key idea: You use a sample to make inferences about a population.
Important terms from chapter 2
2.1) Measurement:
• variable
• interval scale
• ordinal scale
• nominal scale
• discrete variable
• continuous variable
2.2-2.4) Sampling:
• simple random sample
• probability sampling
• stratified sampling
• cluster sampling
• multistage sampling
• sampling error
Key idea: Statistical inferences depend on measurement and sampling.
Important terms from chapter 3
3.1) Tabular and graphic description
• frequency distribution
• relative frequency distribution
• histogram
• bar graph
3.2-3.4) Measures of central tendency and variation
• mean
• median
• mode
• proportion
• standard deviation
• variance
• interquartile range
• quartile, quintile, percentile
Important terms from chapter 3
Key ideas:
1.) Statistical inferences are often made about a measure of
central tendency.
2.) Measures of variation help us estimate certainty about an
inference.
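As a quick illustration of the chapter 3 terms, the sketch below computes each measure with Python's standard library. The scores are made up for illustration and do not come from the course materials.

```python
import statistics

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 5, 7, 9, 11]

mean = statistics.mean(scores)          # arithmetic average
median = statistics.median(scores)      # middle value of the sorted scores
mode = statistics.mode(scores)          # most frequent value
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation

# Quartiles: three cut points that split the sorted scores into four parts.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1                           # interquartile range

# A proportion is the share of cases in a category (here: scores above 5).
prop_above_5 = sum(1 for y in scores if y > 5) / len(scores)

print(mean, median, mode, variance, round(std_dev, 3), iqr, round(prop_above_5, 3))
```

Note that different programs use slightly different quartile rules, so the interquartile range from Stata or a textbook may differ a little from the one computed here.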
Important terms from Chapter 4
• probability distribution
• sampling distribution
• sample distribution
• normal distribution
• standard error
• central limit theorem
• z-score
Key ideas:
1.) If we know what the population is like, we can predict what a sample
might be like.
2.) A sample statistic gives us a best guess of the population parameter.
3.) If we work carefully, a sample can tell us how confident to be about our
sample statistic.
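One way to see key idea 1 is to simulate it. The sketch below assumes a hypothetical normal population (mean 500, standard deviation 100, both chosen only for illustration), draws many samples, and compares the spread of the sample means with the standard error formula sigma / sqrt(n).

```python
import random
import statistics

random.seed(601)  # fixed seed so the run is repeatable

# Hypothetical population values, chosen only for illustration.
MU, SIGMA = 500.0, 100.0
N = 25        # size of each sample
REPS = 2000   # number of repeated samples

# Draw many samples and record each sample mean.
sample_means = []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    sample_means.append(statistics.mean(sample))

# The standard deviation of the sample means estimates the standard error.
simulated_se = statistics.stdev(sample_means)
theoretical_se = SIGMA / N ** 0.5      # sigma / sqrt(n)

# z-score of one possible sample mean (540) within the sampling distribution.
z = (540.0 - MU) / theoretical_se

print(round(simulated_se, 2), theoretical_se, z)
```

The simulated and theoretical standard errors should be close but not identical, since the simulation uses a finite number of samples.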
Important terms from chapter 5
• point estimator
• estimate
• unbiased
• efficient
• confidence interval
Key ideas:
1.) We have a standard set of equations we use to make estimates.
2.) These equations are used because they have specific desirable
properties.
3.) A confidence interval provides your best guess of a parameter.
4.) A confidence interval provides your best guess of how close your
best guess (in point 3) will typically be to the parameter.
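A minimal sketch of a large-sample confidence interval for a mean, using the A&F problem 6.8 summary numbers that appear in the Stata example in these notes (n = 100, Ybar = 508, s = 100). The function name is an assumption made for illustration, and the sketch uses the normal (z) multiplier rather than the t multiplier.

```python
from statistics import NormalDist

def mean_confidence_interval(ybar, s, n, level=0.95):
    """Large-sample interval for a mean: ybar +/- z * s / sqrt(n)."""
    se = s / n ** 0.5                              # estimated standard error
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    return ybar - z * se, ybar + z * se

low, high = mean_confidence_interval(ybar=508, s=100, n=100)
print(round(low, 2), round(high, 2))
```

Stata's `ttesti` reports a slightly wider interval (488.1578 to 527.8422) because it uses the t distribution with 99 degrees of freedom instead of the normal distribution.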
Important terms from chapter 6
6.1 – 6.3) Statistical inference: Significance tests
• assumptions
• hypothesis
• test statistic
• p-value
• conclusion
• null hypothesis
• one-sided test
• two-sided test
• z-statistic
Key Idea from chapter 6
A significance test is a ritualized way to ask about a
population parameter.
1.) Clearly state assumptions
2.) Hypothesize a value for a population parameter
3.) Calculate a sample statistic.
4.) Estimate how unlikely it is for the hypothesized
population to produce such a sample statistic.
5.) Decide whether the hypothesis can be thrown out.
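The five steps can be sketched in code. This example reuses the A&F problem 6.8 numbers from the Stata example in these notes and, as a simplifying assumption, uses a large-sample z test with a two-sided alternative and an alpha level of 0.05.

```python
from statistics import NormalDist

# Step 1: assumptions. Random sample, interval-scale variable, and n large
#         enough for an approximately normal sampling distribution.
n, ybar, s = 100, 508.0, 100.0

# Step 2: hypothesis. H0: mu = 500 against Ha: mu != 500 (two-sided).
mu0 = 500.0

# Step 3: test statistic. z = (ybar - mu0) / (s / sqrt(n)).
se = s / n ** 0.5
z = (ybar - mu0) / se

# Step 4: p-value. Two-sided, so double the tail area beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: conclusion, judged against an alpha level declared in advance.
alpha = 0.05
reject_h0 = p_value < alpha

print(round(z, 3), round(p_value, 4), reject_h0)
```

The p-value here comes from the normal table, so it differs slightly from the two-sided value Stata reports with the t distribution (0.4256).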
More important terms from chapter 6
6.4, 6.7) Decisions and types of errors in hypothesis tests
• type I error
• type II error
• power
6.5-6.6) Small sample tests
• t-statistic
• binomial distribution
• binomial test
Key ideas:
1.) Modeling decisions and population characteristics can affect the
probability of a mistaken inference.
2.) Small sample tests have the same principles as large sample
tests, but require different assumptions and techniques.
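Symbols
$Y_i$, $\bar{Y}$, $\mu$, $\mu_0$, $s$, $s^2$, $\sigma^2$, $\hat{\sigma}_{\bar{Y}}$, $\hat{\pi}$, $\pi_0$, $\sigma_{\hat{\pi}}$, $n$, $df$, $z$, $t$, $P$, $H_0$, $H_a$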
Significance tests, Step 1: assumptions
• An assumption that the sample was drawn at random.
– this is pretty much a universal assumption for all significance
tests.
• An assumption whether the variable has two outcome
categories (proportion) or many intervals (mean).
• An assumption that enables us to assume a normal
sampling distribution. This assumption varies from
test to test.
– Some tests assume a normal population distribution.
– Other tests assume different minimum sample sizes.
– Some tests do not make this assumption.
• Declare α level at the start, if you use one.
Significance Tests, Step 2: Hypothesis
• State the hypothesis as a null hypothesis.
– Remember that the null hypothesis is about the
population from which you draw your sample.
• Write the equation for the null hypothesis.
• The null hypothesis can imply a one- or two-sided
test.
– Be sure the statement and equation are consistent.
Significance Tests, Step 3: Test statistic
For the test statistic, write:
• the equation,
• your work, and
• the answer.
– Full disclosure maximizes partial credit.
– I recommend four significant digits at each computational
step, but present three as the answer.
Significance tests, Step 4: p-value
Calculate an appropriate p-value for the test statistic.
– Use the correct table for the type of test;
– Use the correct degrees of freedom if applicable;
– Use a correct p-value for a one- or two-sided test, as you
declared in the hypothesis step.
Significance Tests, Step 5: Conclusion
Write a conclusion:
– state the p-value and your decision to reject H0 or not;
– state what your decision means;
– discuss the substantive importance of your sample
statistic.
Useful STATA outputs
• immediate test for sample mean using TTESTI:
. * for example, in A&F problem 6.8, n=100 Ybar=508 sd=100 and mu0=500
. ttesti 100 508 100 500, level(95)

One-sample t test
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |     100         508          10         100    488.1578    527.8422
------------------------------------------------------------------------------
Degrees of freedom: 99

                         Ho: mean(x) = 500

    Ha: mean < 500          Ha: mean != 500           Ha: mean > 500
      t =   0.8000             t =   0.8000             t =   0.8000
  P < t =   0.7872       P > |t| =   0.4256         P > t =   0.2128
Useful STATA outputs
• immediate test for sample proportion using PRTESTI:
. * for proportion: in A&F problem 6.12, n=832 p=.53 and p0=.5
. prtesti 832 .53 .50, level(95)

One-sample test of proportion                    x: Number of obs =      832
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.                     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        .53   .0173032                      .4960864    .5639136
------------------------------------------------------------------------------

                       Ho: proportion(x) = .5

      Ha: x < .5              Ha: x != .5               Ha: x > .5
       z =  1.731              z =  1.731               z =  1.731
   P < z = 0.9582        P > |z| = 0.0835           P > z = 0.0418
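As a check on the `prtesti` output above, the same z statistic and p-values can be reproduced by hand. This is a sketch using only Python's standard library; the function name is an assumption made for illustration.

```python
from statistics import NormalDist

def one_sample_proportion_z(p_hat, p0, n):
    """Large-sample z test of H0: pi = p0, with the standard error computed under H0."""
    se0 = (p0 * (1 - p0) / n) ** 0.5
    z = (p_hat - p0) / se0
    lower = NormalDist().cdf(z)        # P < z
    upper = 1 - lower                  # P > z
    two_sided = 2 * min(lower, upper)  # P > |z|
    return z, lower, two_sided, upper

# A&F problem 6.12, as quoted in the Stata example: n=832, p=.53, p0=.5
z, p_lower, p_two, p_upper = one_sample_proportion_z(0.53, 0.50, 832)
print(round(z, 3), round(p_lower, 4), round(p_two, 4), round(p_upper, 4))
```

The test statistic uses the standard error under H0 (based on p0 = .5), while the Std. Err. column in the Stata table (.0173032) is based on the sample proportion and is used for the confidence interval.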
Useful STATA outputs
• Comparison of two means using ttesti
. ttesti 4252 18.1 12.9 6764 32.6 18.2, unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |    4252        18.1    .1978304        12.9    17.71215    18.48785
       y |    6764        32.6     .221294        18.2    32.16619    33.03381
---------+--------------------------------------------------------------------
combined |   11016    27.00323    .1697512     17.8166    26.67049    27.33597
---------+--------------------------------------------------------------------
    diff |               -14.5    .2968297               -15.08184   -13.91816
------------------------------------------------------------------------------
Satterthwaite's degrees of freedom: 10858.6

                   Ho: mean(x) - mean(y) = diff = 0

     Ha: diff < 0            Ha: diff != 0             Ha: diff > 0
      t = -48.8496            t = -48.8496             t = -48.8496
  P < t =   0.0000      P > |t| =   0.0000         P > t =   1.0000
Chapter 6: Significance Tests for Single Sample
μ or π        sample size    best test
mean          large          z-test for Ybar - μ0
proportion    large          z-test for πhat - π0
mean          small          t-test for Ybar - μ0
proportion    small          Fisher’s exact test
Equations for tests of statistical significance
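$$z = \frac{\bar{Y} - \mu_0}{\hat{\sigma}_{\bar{Y}}}$$
$$z = \frac{\hat{\pi} - \pi_0}{\sigma_{\hat{\pi}}}$$
$$t = \frac{\bar{Y} - \mu_0}{\hat{\sigma}_{\bar{Y}}}$$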
Chapter 7: Comparing scores for two groups
μ or π        sample size    sample scheme    best test
mean          large          independent      z-test for μ2 - μ1
proportion    large          independent      z-test for π2 - π1
mean          small          independent      t-test for μ2 - μ1
proportion    small          independent      Fisher’s exact test
mean          large          dependent        z-test for D
proportion    large          dependent        McNemar test
mean          small          dependent        t-test for D
proportion    small          dependent        Binomial test
Two Independent Groups:
Large Samples, Means
• It is important to be able to recognize the parts of the equation,
what they mean, and why they are used.
• Equal variance assumption? NO
7.1 Difference of two large sample means:
$$z = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
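A sketch of equation 7.1 applied to the summary numbers from the two-sample `ttesti` example in these notes. The function name is an assumption for illustration; the degrees-of-freedom line uses the Satterthwaite approximation that Stata reports for the `unequal` option.

```python
def two_mean_test(n1, ybar1, s1, n2, ybar2, s2):
    """Statistic for H0: mu2 - mu1 = 0 without assuming equal variances."""
    v1 = s1 ** 2 / n1                  # s1^2 / n1
    v2 = s2 ** 2 / n2                  # s2^2 / n2
    se = (v1 + v2) ** 0.5              # standard error of the difference
    z = (ybar2 - ybar1) / se
    # Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return z, se, df

# Group 1 is x and group 2 is y from the Stata example.
z, se, df = two_mean_test(4252, 18.1, 12.9, 6764, 32.6, 18.2)
print(round(z, 4), round(se, 7), round(df, 1))
```

The sign is positive here because the equation takes group 2 minus group 1, while Stata reports mean(x) - mean(y); the magnitude matches Stata's t of -48.8496.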
Two Independent Groups:
Large Samples, Proportions
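• Equal variance assumption? YES (if proportions are equal
then so are variances).
• df = N1 + N2 - 2
7.2 Difference of two large sample proportions:
$$z = \frac{(\hat{\pi}_2 - \hat{\pi}_1) - 0}{\sqrt{\dfrac{\hat{\pi}(1 - \hat{\pi})}{n_1} + \dfrac{\hat{\pi}(1 - \hat{\pi})}{n_2}}}$$
where $\hat{\pi}$ (no subscript) is the pooled sample proportion from both groups combined.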
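A sketch of equation 7.2 with the pooled proportion in the standard error. The counts are hypothetical and the function name is an assumption made for illustration.

```python
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """z test of H0: pi2 - pi1 = 0 using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)     # pi-hat under H0
    se0 = (pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2) ** 0.5
    z = (p2 - p1) / se0
    p_two = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
    return z, p_two

# Hypothetical data: 40 of 100 in group 1 and 60 of 100 in group 2.
z, p_two = two_proportion_z(x1=40, n1=100, x2=60, n2=100)
print(round(z, 3), round(p_two, 4))
```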
Two Independent Groups:
Small Samples, Means
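7.3 Difference of two small sample means:
$$t \text{ (or } z) = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{\hat{\sigma}_{\bar{Y}_2 - \bar{Y}_1}} = \frac{\bar{Y}_2 - \bar{Y}_1}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$
Equal variance assumption: SOMETIMES (for ease); NO (in computer programs)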
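A sketch of equation 7.3 under the equal variance assumption. The summary numbers are hypothetical and the function name is an assumption made for illustration. The p-value would come from a t table with n1 + n2 - 2 degrees of freedom.

```python
def pooled_t(n1, ybar1, s1, n2, ybar2, s2):
    """t statistic for H0: mu2 - mu1 = 0, pooling the two sample variances."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    se = pooled_var ** 0.5 * (1 / n1 + 1 / n2) ** 0.5
    t = (ybar2 - ybar1) / se
    return t, se, df

# Hypothetical data: group 1 has n=10, mean 20, s=4; group 2 has n=12, mean 25, s=5.
t, se, df = pooled_t(10, 20.0, 4.0, 12, 25.0, 5.0)
print(round(t, 4), round(se, 4), df)
```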
Two Independent Groups:
Small Samples, Proportions
Fisher’s exact test
• via Stata, SAS, or SPSS
• calculates exact probability of all possible
occurrences
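To show what "exact probability of all possible occurrences" means, here is a sketch of Fisher's exact test for a 2x2 table with the row and column totals held fixed. The table is hypothetical, the function name is an assumption, and the two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is one common convention; statistical packages may define it differently.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Exact p-values for the table [[a, b], [c, d]] with fixed margins."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    k_min = max(0, col1 - row2)        # smallest possible top-left count
    k_max = min(row1, col1)            # largest possible top-left count
    probs = {k: prob(k) for k in range(k_min, k_max + 1)}
    p_obs = probs[a]

    # One-sided: top-left count as large as observed, or larger.
    p_upper = sum(p for k, p in probs.items() if k >= a)
    # Two-sided: all tables no more likely than the observed table
    # (the small factor guards against floating-point ties).
    p_two = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return p_upper, p_two

# Hypothetical 2x2 table with very small cell counts.
p_upper, p_two = fisher_exact_2x2(3, 1, 1, 3)
print(round(p_upper, 4), round(p_two, 4))
```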
Dependent Samples:
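• Means:
$$t \text{ (or } z) = \frac{\bar{D}}{\hat{\sigma}_{\bar{D}}} = \frac{\bar{D}}{s_D / \sqrt{n}}$$
• Proportions:
$$z = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}}$$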
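A sketch of both dependent-sample statistics. All data values are hypothetical. For proportions, n12 and n21 are taken to be the counts of pairs that changed category in each direction (the off-diagonal cells of the paired 2x2 table), which is the usual McNemar setup.

```python
import statistics

# Means: hypothetical before/after scores for the same five people.
before = [10, 12, 9, 14, 11]
after = [12, 15, 9, 17, 13]
diffs = [a - b for a, b in zip(after, before)]   # D = after - before

d_bar = statistics.mean(diffs)      # mean difference
s_d = statistics.stdev(diffs)       # standard deviation of the differences
n = len(diffs)
t = d_bar / (s_d / n ** 0.5)        # compare with a t table, df = n - 1

# Proportions: hypothetical counts of pairs that switched in each direction.
n12, n21 = 25, 10
z = (n12 - n21) / (n12 + n21) ** 0.5

print(round(t, 4), round(z, 4))
```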
Chapter 8: Analyzing associations
• Contingency tables and their terminology:
– marginal distributions and joint distributions
– conditional distribution of R, given a value of E.
(as counts or percentages in A & F)
– marginal, joint, and conditional probabilities.
(as proportions in A & F)
• “Are two variables statistically independent?”
Descriptive statistics you need to know
• How to draw and interpret contingency tables (crosstabs)
• Frequency and probability/percentage terms
– marginal
– conditional
– joint
• Measures of relationships:
– odds, odds ratios
– gamma and tau-b
Observed and expected cell counts
• fo, the observed cell count, is the number of cases in a
given cell.
• fe, the expected cell count, is the number of cases we
would predict in a cell if the variables were independent
of each other.
• fe = (row total * column total) / N
– the equation for fe is a correction for rows or columns
with small totals.
Chi-squared test of independence
• Assumptions: 2 categorical variables, random sampling, fe
>= 5 in every cell
• Ho: variables are statistically independent (crudely, the
score for one variable is independent of the score for the
other.)
• Test statistic: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
• p-value from $\chi^2$ table, df = (r-1)(c-1)
• Conclusion: reject or do not reject based on p-value and
prior α-level, if necessary. Then, describe your conclusion.
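A sketch of the expected counts and the chi-squared statistic for a contingency table stored as a list of rows. The observed counts are hypothetical and the function name is an assumption made for illustration.

```python
def chi_squared(table):
    """Return (chi2, df, expected) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # fe = row total * column total / N for every cell.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum (fo - fe)^2 / fe over all cells.
    chi2 = sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(table, expected)
               for fo, fe in zip(obs_row, exp_row))

    df = (len(table) - 1) * (len(col_totals) - 1)   # (r-1)(c-1)
    return chi2, df, expected

# Hypothetical 2x2 table of observed counts.
observed = [[30, 20], [20, 30]]
chi2, df, expected = chi_squared(observed)
print(chi2, df, expected)
```

The statistic is then compared with the chi-squared table at the computed df; for df = 1 the critical value at an α-level of 0.05 is 3.84.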
Probabilities, odds, and odds ratios.
• Given a probability, you can calculate an odds and a log
odds.
– odds = p / (1-p)
  • 50/50 odds = 1.0
  • range: 0 to ∞
– log odds = log(p / (1-p)) = log(p) - log(1-p)
  • 50/50 log odds = 0.0
  • range: -∞ to +∞
– odds ratio = [ p1 / (1-p1) ] / [ p2 / (1-p2) ]
• Given an odds, you can calculate a probability.
– p = odds / (1 + odds)
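The conversions above can be sketched as a few small functions. The function names are assumptions made for illustration, and the log is the natural log.

```python
import math

def odds(p):
    """Odds for a probability p (0 <= p < 1)."""
    return p / (1 - p)

def log_odds(p):
    """Natural log of the odds (0 < p < 1)."""
    return math.log(p / (1 - p))

def odds_ratio(p1, p2):
    """Odds for group 1 divided by odds for group 2."""
    return odds(p1) / odds(p2)

def prob_from_odds(o):
    """Convert an odds back into a probability."""
    return o / (1 + o)

print(odds(0.5), log_odds(0.5), odds(0.75), odds_ratio(0.75, 0.5), prob_from_odds(3))
```

Converting a probability to an odds and back with `prob_from_odds(odds(p))` should return the original p, up to floating-point rounding.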
Measures of association with ordinal data
• concordant observations C:
– in a pair, one is higher on both x and y
• discordant observations D:
– in a pair, one is higher on x and lower on y
• ties
– in a pair, same on x or same on y
• gamma: $\gamma = \frac{C - D}{C + D}$ (ignores ties)
• tau-b is a gamma that adjusts for “ties”
– gamma often increases with more collapsed tables
– $\tau_b$ and $\gamma$ both have standard errors in computer output
– $\tau_b$ can be interpreted as a correlation coefficient
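A sketch of counting concordant and discordant pairs from a contingency table and computing gamma. It assumes the rows and columns are already listed in ordinal order from low to high; the counts are hypothetical and the function name is an assumption made for illustration.

```python
def gamma_from_table(table):
    """Return (C, D, gamma) for a table with ordinally ordered rows and columns."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):       # cells in a higher row
                for l in range(n_cols):
                    if l > j:                    # also higher on the column variable
                        concordant += table[i][j] * table[k][l]
                    elif l < j:                  # lower on the column variable
                        discordant += table[i][j] * table[k][l]
                    # l == j is a tie on the column variable and is ignored
    gamma = (concordant - discordant) / (concordant + discordant)
    return concordant, discordant, gamma

# Hypothetical 2x2 table of counts.
c, d, g = gamma_from_table([[20, 10], [10, 20]])
print(c, d, g)
```

Pairs tied on either variable never enter C or D, which is why gamma "ignores ties".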