Review for Midterm

Sociology 601: Midterm review, October 15, 2009
• Basic information for the midterm
– Date: Tuesday October 20, 2009
– Start time: 2 pm.
– Place: usual classroom, Art/Sociology 3221
– Bring a sheet of notes, a calculator, two pens or pencils
– Notify me if you anticipate any timing problems
• Review for midterm
– terms
– symbols
– steps in a significance test
– testing differences in groups
– contingency tables and measures of association
– equations
Important terms from chapter 1
Terms for statistical inference:
• population
• sample
• parameter
• statistic
Key idea: You use a sample to make inferences about a population.
Important terms from chapter 2
2.1) Measurement:
• variable
• interval scale
• ordinal scale
• nominal scale
• discrete variable
• continuous variable
2.2-2.4) Sampling:
• simple random sample
• probability sampling
• stratified sampling
• cluster sampling
• multistage sampling
• sampling error
Key idea: Statistical inferences depend on measurement and sampling.
Important terms from chapter 3
3.1) Tabular and graphic description
• frequency distribution
• relative frequency distribution
• histogram
• bar graph
3.2-3.4) Measures of central tendency and variation
• mean
• median
• mode
• proportion
• standard deviation
• variance
• interquartile range
• quartile, quintile, percentile
Important terms from chapter 3
Key ideas:
1.) Statistical inferences are often made about a measure of
central tendency.
2.) Measures of variation help us estimate certainty about an
inference.
Important terms from Chapter 4
• probability distribution
• sampling distribution
• sample distribution
• normal distribution
• standard error
• central limit theorem
• z-score
Key ideas:
1.) If we know what the population is like, we can predict what a sample
might be like.
2.) A sample statistic gives us a best guess of the population parameter.
3.) If we work carefully, a sample can tell us how confident to be about our
sample statistic.
Important terms from chapter 5
• point estimator
• estimate
• unbiased
• efficient
• confidence interval
Key ideas:
1.) We have a standard set of equations we use to make estimates.
2.) These equations are used because they have specific desirable
properties.
3.) A confidence interval provides your best guess of a parameter.
4.) A confidence interval provides your best guess of how close your
best guess (in part 3.)) will typically be to the parameter.
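As a minimal sketch of key ideas 3 and 4, the following computes a large-sample 95% confidence interval for a mean. The inputs (n = 100, sample mean 508, standard deviation 100) are the A&F problem 6.8 values used in the Stata example in this review; the function name and the use of a normal critical value rather than a t value are assumptions of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(ybar, s, n, level=0.95):
    """Large-sample confidence interval for a mean (normal approximation)."""
    se = s / sqrt(n)  # estimated standard error of the sample mean
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    margin = z_crit * se
    return ybar - margin, ybar + margin


lower, upper = mean_confidence_interval(ybar=508, s=100, n=100)
print(f"point estimate = 508, 95% CI = ({lower:.1f}, {upper:.1f})")
```

The t-based interval that Stata reports for the same inputs (488.1578 to 527.8422) is slightly wider, because it uses the t distribution with 99 degrees of freedom.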
Important terms from chapter 6
6.1 – 6.3) Statistical inference: Significance tests
• assumptions
• hypothesis
• test statistic
• p-value
• conclusion
• null hypothesis
• one-sided test
• two-sided test
• z-statistic
Key Idea from chapter 6
A significance test is a ritualized way to ask about a
population parameter.
1.) Clearly state assumptions
2.) Hypothesize a value for a population parameter
3.) Calculate a sample statistic.
4.) Estimate how unlikely it is for the hypothesized
population to produce such a sample statistic.
5.) Decide whether the hypothesis can be thrown out.
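The five steps can be sketched in code as follows. This is an illustration only: it reuses the A&F problem 6.8 numbers from the Stata example in this review, treats the sample as large enough for a z-test, and the variable names and the choice of alpha = 0.05 are assumptions.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: assumptions. Random sample, interval-scale variable, n large
# enough for a normal sampling distribution; alpha declared in advance.
alpha = 0.05
n, ybar, s = 100, 508, 100

# Step 2: hypothesis. H0: mu = 500, with a two-sided alternative.
mu0 = 500

# Step 3: test statistic.
se = s / sqrt(n)
z = (ybar - mu0) / se

# Step 4: p-value, two-sided to match the hypothesis in step 2.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: conclusion.
reject_h0 = p_value < alpha
print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```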
More important terms from chapter 6
6.4, 6.7) Decisions and types of errors in hypothesis tests
• type I error
• type II error
• power
6.5-6.6) Small sample tests
• t-statistic
• binomial distribution
• binomial test
Key ideas:
1.) Modeling decisions and population characteristics can affect the
probability of a mistaken inference.
2.) Small sample tests have the same principles as large sample
tests, but require different assumptions and techniques.
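symbols
$Y_i$, $\bar{Y}$, $s$, $s^2$, $\sigma$, $\sigma^2$, $\hat{\sigma}_{\bar{Y}}$, $\sigma_{\hat{\pi}}$
$\mu$, $\mu_0$, $\pi$, $\hat{\pi}$, $\pi_0$
$z$, $t$, $P$, $\alpha$, $n$, $df$, $H_0$, $H_a$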
Significance tests, Step 1: assumptions
• An assumption that the sample was drawn at random.
– this is pretty much a universal assumption for all significance
tests.
• An assumption whether the variable has two outcome
categories (proportion) or many intervals (mean).
• An assumption that enables us to assume a normal
sampling distribution. This assumption varies from
test to test.
– Some tests assume a normal population distribution.
– Other tests assume different minimum sample sizes.
– Some tests do not make this assumption.
• Declare α level at the start, if you use one.
Significance Tests, Step 2: Hypothesis
• State the hypothesis as a null hypothesis.
– Remember that the null hypothesis is about the
population from which you draw your sample.
• Write the equation for the null hypothesis.
• The null hypothesis can imply a one- or two-sided
test.
– Be sure the statement and equation are consistent.
Significance Tests, Step 3: Test statistic
For the test statistic, write:
• the equation,
• your work, and
• the answer.
– Full disclosure maximizes partial credit.
– I recommend four significant digits at each computational
step, but present three as the answer.
Significance tests, Step 4: p-value
Calculate an appropriate p-value for the test-statistic.
– Use the correct table for the type of test;
– Use the correct degrees of freedom if applicable;
– Use a correct p-value for a one- or two-sided test, as you
declared in the hypothesis step.
Significance Tests, Step 5: Conclusion
Write a conclusion
– write the p-value, your decision to reject H0 or not;
– a statement of what your decision means;
– discuss the substantive importance of your sample
statistic.
Useful STATA outputs
• immediate test for sample mean using TTESTI:
. * for example, in A&F problem 6.8, n=100 Ybar=508 sd=100 and mu0=500
. ttesti 100 508 100 500, level(95)

One-sample t test
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |     100         508          10         100    488.1578    527.8422
------------------------------------------------------------------------------
Degrees of freedom: 99

                         Ho: mean(x) = 500

    Ha: mean < 500          Ha: mean != 500           Ha: mean > 500
      t =   0.8000            t =   0.8000              t =   0.8000
  P < t =   0.7872      P > |t| =   0.4256          P > t =   0.2128
Useful STATA outputs
• immediate test for sample proportion using PRTESTI:
. * for proportion: in A&F problem 6.12, n=832 p=.53 and p0=.5
. prtesti 832 .53 .50, level(95)

One-sample test of proportion                      x: Number of obs =      832
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.                     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        .53   .0173032                      .4960864    .5639136
------------------------------------------------------------------------------

                         Ho: proportion(x) = .5

      Ha: x < .5              Ha: x != .5               Ha: x > .5
      z =  1.731              z =  1.731                z =  1.731
  P < z = 0.9582        P > |z| = 0.0835            P > z = 0.0418
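The PRTESTI numbers for A&F problem 6.12 can be checked by hand. The sketch below is not Stata's implementation; it assumes the usual large-sample formulas, with the null-hypothesis standard error in the z statistic and the sample-based standard error in the confidence interval. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, p_hat, p0 = 832, 0.53, 0.50

se_null = sqrt(p0 * (1 - p0) / n)          # standard error under H0 (for z)
se_sample = sqrt(p_hat * (1 - p_hat) / n)  # standard error for the interval

z = (p_hat - p0) / se_null

std_normal = NormalDist()
p_lower = std_normal.cdf(z)                     # Ha: proportion < .5
p_upper = 1 - std_normal.cdf(z)                 # Ha: proportion > .5
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))  # Ha: proportion != .5

z_crit = std_normal.inv_cdf(0.975)
ci = (p_hat - z_crit * se_sample, p_hat + z_crit * se_sample)

print(f"z = {z:.3f}")
print(f"P<z = {p_lower:.4f}, P>|z| = {p_two_sided:.4f}, P>z = {p_upper:.4f}")
print(f"95% CI = ({ci[0]:.7f}, {ci[1]:.7f})")
```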
Useful STATA outputs
• Comparison of two means using ttesti
. ttesti 4252 18.1 12.9 6764 32.6 18.2, unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |    4252        18.1    .1978304        12.9    17.71215    18.48785
       y |    6764        32.6     .221294        18.2    32.16619    33.03381
---------+--------------------------------------------------------------------
combined |   11016    27.00323    .1697512     17.8166    26.67049    27.33597
---------+--------------------------------------------------------------------
    diff |               -14.5    .2968297               -15.08184   -13.91816
------------------------------------------------------------------------------
Satterthwaite's degrees of freedom:  10858.6

                  Ho: mean(x) - mean(y) = diff = 0

     Ha: diff < 0            Ha: diff != 0             Ha: diff > 0
     t = -48.8496            t = -48.8496              t = -48.8496
   P < t = 0.0000        P > |t| = 0.0000          P > t = 1.0000
Chapter 6: Significance Tests for Single Sample
$\mu$ or $\pi$    sample size    best test
mean           large          z-test for $\bar{Y} - \mu_0$
proportion     large          z-test for $\hat{\pi} - \pi_0$
mean           small          t-test for $\bar{Y} - \mu_0$
proportion     small          Fisher’s exact test
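Equations for tests of statistical significance
$$z = \frac{\bar{Y} - \mu_0}{\hat{\sigma}_{\bar{Y}}}$$
$$z = \frac{\hat{\pi} - \pi_0}{\sigma_{\hat{\pi}}}$$
$$t = \frac{\bar{Y} - \mu_0}{\hat{\sigma}_{\bar{Y}}}$$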
Chapter 7: Comparing scores for two groups
$\mu$ or $\pi$    sample size    sample scheme    best test
mean           large          independent      z-test for $\mu_2 - \mu_1$
proportion     large          independent      z-test for $\pi_2 - \pi_1$
mean           small          independent      t-test for $\mu_2 - \mu_1$
proportion     small          independent      Fisher’s exact test
mean           large          dependent        z-test for $\bar{D}$
proportion     large          dependent        McNemar test
mean           small          dependent        t-test for $\bar{D}$
proportion     small          dependent        Binomial test
Two Independent Groups:
Large Samples, Means
• It is important to be able to recognize the parts of the equation,
what they mean, and why they are used.
• Equal variance assumption? NO
7.1 Difference of two large sample means:
$$z = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
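Equation 7.1 can be checked against the two-sample Stata output in this review (ttesti 4252 18.1 12.9 6764 32.6 18.2, unequal). The sketch below is illustrative; the variable names are assumptions, and group 1 is taken to be Stata's x and group 2 its y.

```python
from math import sqrt

n1, ybar1, s1 = 4252, 18.1, 12.9   # group 1 (x in the Stata output)
n2, ybar2, s2 = 6764, 32.6, 18.2   # group 2 (y in the Stata output)

# Standard error of the difference, with no equal-variance assumption.
se = sqrt(s1**2 / n1 + s2**2 / n2)

# Test statistic for H0: mu2 - mu1 = 0.
z = ((ybar2 - ybar1) - 0) / se

print(f"se = {se:.7f}, z = {z:.4f}")
```

Stata reports the difference as mean(x) - mean(y), so its statistic has the opposite sign (-48.8496); the magnitude and the standard error (.2968297) are the same.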
Two Independent Groups:
Large Samples, Proportions
• Equal variance assumption? YES (if proportions are equal
then so are variances).
• df = N1 + N2 - 2
7.2 Difference of two large sample proportions:
$$z = \frac{(\hat{\pi}_2 - \hat{\pi}_1) - 0}{\sqrt{\frac{\hat{\pi}(1 - \hat{\pi})}{n_1} + \frac{\hat{\pi}(1 - \hat{\pi})}{n_2}}}$$
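A minimal sketch of equation 7.2, assuming the unsubscripted proportion in the denominator is the pooled sample proportion (all successes divided by all cases). The counts are made up for illustration and the function name is hypothetical.

```python
from math import sqrt


def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: pi2 - pi1 = 0, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # estimate of the common proportion under H0
    se = sqrt(pooled * (1 - pooled) / n1 + pooled * (1 - pooled) / n2)
    return (p2 - p1) / se, pooled


# Hypothetical data: 120 of 400 in group 1, 150 of 400 in group 2.
z, pooled = two_proportion_z(120, 400, 150, 400)
print(f"pooled proportion = {pooled:.4f}, z = {z:.3f}")
```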
Two Independent Groups:
Small Samples, Means
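7.3 Difference of two small sample means:
$$t \text{ (or } z) = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{\hat{\sigma}_{\bar{Y}_2 - \bar{Y}_1}} = \frac{\bar{Y}_2 - \bar{Y}_1}{\sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
Equal variance assumption: SOMETIMES (for ease), NO (in computer programs)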
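The pooled-variance version of equation 7.3 can be sketched as below. The data are hypothetical and the function name is an assumption. The Python standard library has no t distribution, so the sketch stops at the statistic and degrees of freedom, which would then be looked up in a t table.

```python
from math import sqrt


def pooled_t(n1, ybar1, s1, n2, ybar2, s2):
    """t statistic for H0: mu2 - mu1 = 0 under the equal-variance assumption."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    se = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
    return (ybar2 - ybar1) / se, df, pooled_var


# Hypothetical small samples.
t, df, pooled_var = pooled_t(n1=10, ybar1=20, s1=4, n2=12, ybar2=24, s2=5)
print(f"pooled variance = {pooled_var:.2f}, t = {t:.3f}, df = {df}")
```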
Two Independent Groups:
Small Samples, Proportions
Fisher’s exact test
• via Stata, SAS, or SPSS
• calculates exact probability of all possible
occurrences
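To show what "exact probability of all possible occurrences" means, here is a small sketch for a 2x2 table with fixed margins. It is not how Stata, SAS, or SPSS implement the test; it assumes the common two-sided definition (sum the probabilities of all tables that are no more likely than the observed one). The table values and function name are hypothetical.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k cases in the top-left cell,
        # holding the row and column totals fixed.
        return Fraction(comb(row1, k) * comb(row2, col1 - k), comb(n, col1))

    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    p_observed = prob(a)
    # Sum over every possible table that is no more likely than the observed one.
    return sum(prob(k) for k in range(k_min, k_max + 1) if prob(k) <= p_observed)


p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"two-sided p = {float(p_value):.4f}")
```

Exact fractions are used so that the comparison with the observed table's probability is not affected by floating-point rounding.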
Dependent Samples:
• Means: $$t \text{ (or } z) = \frac{\bar{D}}{\hat{\sigma}_{\bar{D}}} = \frac{\bar{D}}{s_D / \sqrt{n}}$$
• Proportions: $$z = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}}$$
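Both dependent-sample statistics are short enough to compute directly. The sketch below uses made-up data: a list of paired differences for the means test, and two off-diagonal counts (n12 and n21, the pairs that changed category in each direction) for the McNemar test.

```python
from math import sqrt
from statistics import mean, stdev

# Means: hypothetical paired differences D = (score 2 - score 1) for 6 pairs.
differences = [2, -1, 3, 0, 4, 1]
n = len(differences)
d_bar = mean(differences)
s_d = stdev(differences)        # sample standard deviation of the differences
t = d_bar / (s_d / sqrt(n))     # compare to a t table with n - 1 df
df = n - 1

# Proportions (McNemar): hypothetical counts of discordant pairs.
n12, n21 = 25, 11
z = (n12 - n21) / sqrt(n12 + n21)

print(f"paired t = {t:.3f} (df = {df}), McNemar z = {z:.3f}")
```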
Chapter 8: Analyzing associations
• Contingency tables and their terminologies:
– marginal distributions and joint distributions
– conditional distribution of R, given a value of E.
(as counts or percentages in A & F)
– marginal, joint, and conditional probabilities.
(as proportions in A & F)
• “Are two variables statistically independent?”
Descriptive statistics you need to know
• How to draw and interpret contingency tables (crosstabs)
• Frequency and probability/ percentage terms
– marginal
– conditional
– joint
• Measures of relationships:
– odds, odds ratios
– gamma and tau-b
Observed and expected cell counts
• $f_o$, the observed cell count, is the number of cases in a
given cell.
• $f_e$, the expected cell count, is the number of cases we
would predict in a cell if the variables were independent
of each other.
• $f_e$ = row total * column total / N
– the equation for $f_e$ is a correction for rows or columns
with small totals.
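The expected-count formula can be applied to every cell of a table at once. The sketch below uses a hypothetical 2x3 table of observed counts; the function name is an assumption.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical observed counts (rows and columns are categories).
observed = [[30, 20, 10],
            [20, 30, 40]]
expected = expected_counts(observed)
for row in expected:
    print(row)
```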
Chi-squared test of independence
• Assumptions: 2 categorical variables, random sampling, $f_e \geq 5$
• Ho: variables are statistically independent (crudely, the
score for one variable is independent of the score for the
other.)
• Test statistic: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$
• p-value from $\chi^2$ table, df = (r-1)(c-1)
• Conclusion: reject or do not reject based on p-value and
prior $\alpha$-level, if necessary. Then, describe your conclusion.
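A minimal sketch of the test statistic and degrees of freedom, using a hypothetical 2x3 table. The function name is an assumption. The standard library has no chi-squared distribution, so the p-value would still come from a table (or from Stata).

```python
def chi_squared_statistic(observed):
    """Return (chi-squared, df, smallest expected count) for a two-way table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n  # expected count under H0
            min_expected = min(min_expected, f_e)
            chi2 += (f_o - f_e) ** 2 / f_e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, min_expected


# Hypothetical observed counts.
observed = [[30, 20, 10],
            [20, 30, 40]]
chi2, df, min_expected = chi_squared_statistic(observed)
print(f"chi-squared = {chi2:.3f}, df = {df}, smallest fe = {min_expected}")
```

Returning the smallest expected count makes it easy to check the assumption that every expected count is at least 5 before using the table.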
Probabilities, odds, and odds ratios.
• Given a probability, you can calculate an odds and a log
odds.
– odds = p / (1-p)
  • 50/50 = 1.0
  • 0 → ∞
– log odds = log (p / (1-p) ) = log (p) – log(1-p)
  • 50/50 = 0.0
  • -∞ → +∞
– odds ratio = [ p1 / (1-p1) ] / [ p2 / (1-p2) ]
• Given an odds, you can calculate a probability.
p = odds / ( 1 + odds)
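These conversions are one-liners. The sketch below uses natural logarithms for the log odds; the function names and the example probabilities are illustrative.

```python
from math import log


def odds(p):
    return p / (1 - p)


def log_odds(p):
    return log(odds(p))  # natural log


def odds_ratio(p1, p2):
    return odds(p1) / odds(p2)


def prob_from_odds(o):
    return o / (1 + o)


print(odds(0.5), log_odds(0.5))           # a 50/50 chance: odds 1.0, log odds 0.0
print(round(odds_ratio(0.8, 0.5), 3))     # odds of 4 to 1 versus 1 to 1
print(round(prob_from_odds(odds(0.25)), 3))  # converting back recovers p
```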
Measures of association with ordinal data
• concordant observations C:
– in a pair, one is higher on both x and y
• discordant observations D:
– in a pair, one is higher on x and lower on y
• ties
– in a pair, same on x or same on y
• gamma: $\gamma = \frac{C - D}{C + D}$ (ignores ties)
• tau-b is a gamma that adjusts for “ties”
– gamma often increases with more collapsed tables
– $\tau_b$ and $\gamma$ both have standard errors in computer output
– $\tau_b$ can be interpreted as a correlation coefficient
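Counting concordant and discordant pairs from a contingency table can be sketched as follows. The table is hypothetical, with rows ordered from low to high on x and columns from low to high on y; the function names are assumptions. Tied pairs (same row or same column) are skipped, as in the definition of gamma.

```python
def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in an ordinal two-way table."""
    n_rows, n_cols = len(table), len(table[0])
    c = d = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair each case in cell (i, j) with cases in rows higher on x.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:      # also higher on y: concordant
                        c += table[i][j] * table[k][m]
                    elif m < j:    # lower on y: discordant
                        d += table[i][j] * table[k][m]
    return c, d


def gamma(table):
    c, d = concordant_discordant(table)
    return (c - d) / (c + d)


# Hypothetical 3x3 table of counts.
table = [[20, 10, 5],
         [10, 15, 10],
         [5, 10, 20]]
c, d = concordant_discordant(table)
print(f"C = {c}, D = {d}, gamma = {gamma(table):.4f}")
```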
