Lecture 9

Hypothesis Testing
• It is frequently expected that you have
clear hypotheses when you have a study
using quantitative data.
• Examples: Older citizens are more likely to
vote. Men are more likely to like computers.
Rural schools perform better than urban schools.
• How do we test these hypotheses?
• It’s complicated. Regression, chi-square, t-test
• But we need to start at the beginning
The Normal Curve
• The mean and standard deviation, in
conjunction with the normal curve, allow for
more sophisticated description of the data
and (as we see later) statistical analysis
• For example, a school is not that
interested in the raw GRE score; it is
interested in how you score relative to
others.
• Even if the school knows the average
(mean) GRE score, your raw score still
doesn’t tell them much, since in a perfectly
normal distribution, 50% of people will
score higher than the mean.
• This is where the standard deviation is so
helpful. It helps interpret raw scores and
understand the likelihood of a score.
• Suppose I told you I scored 710 on the
quantitative section and the mean score is
591. Is that good?
• It’s above average, but who cares?
• What if I tell you the standard deviation is
148?
• What does that mean?
• What if I said the standard deviation is 5?
Calculating z scores
Converting raw scores to z scores
What is a z score? What does it represent?
Z = (X - µ) / σ
Converting z scores into raw scores
X = Zσ + µ
Z = (710 - 591) / 148 = 119 / 148 ≈ 0.80
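The two conversions can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the inputs are the GRE score of 710, the mean of 591, and the two standard deviations (148 and 5) quoted earlier in the lecture.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def raw_score(z, mean, sd):
    """Invert the z-score formula: X = z * sd + mean."""
    return z * sd + mean


# GRE example: a score of 710 against a mean of 591 (values quoted in the lecture).
z_wide = z_score(710, mean=591, sd=148)   # scores are widely spread out
z_narrow = z_score(710, mean=591, sd=5)   # scores are tightly clustered

print(f"SD = 148: z = {z_wide:.2f}")
print(f"SD = 5:   z = {z_narrow:.1f}")
print(f"Back to a raw score: {raw_score(z_wide, 591, 148):.0f}")
```

With a standard deviation of 148, a 710 is less than one standard deviation above the mean. With a standard deviation of 5, the same 710 would be almost 24 standard deviations above the mean, which is why the raw score alone says so little.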
Finding Probabilities under the Normal
Curve
So what % of GRE takers scored above and
below 710?
Why is this important? Inferential Statistics
(to be cont.)
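One way to answer the "above and below 710" question is to let Python's standard library look up the area under the normal curve. This is a sketch under two assumptions: GRE scores are treated as perfectly normal, and the mean of 591 and standard deviation of 148 quoted earlier are used as the population values.

```python
from statistics import NormalDist

# Assumed model: GRE quantitative scores ~ Normal(mean=591, sd=148).
gre = NormalDist(mu=591, sigma=148)

# cdf(x) is the area under the curve to the left of x,
# i.e. the share of test takers scoring below x.
share_below = gre.cdf(710)
share_above = 1 - share_below

print(f"Scored below 710: {share_below:.1%}")
print(f"Scored above 710: {share_above:.1%}")
```

Under these assumptions roughly 79% of test takers score below 710 and roughly 21% score above it.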
Why is the normal curve so important?
• If we define probability in terms of the
likelihood of occurrence, then the normal
curve can be regarded as a probability
distribution (the probability of occurrence
decreases as we move away from the
center – central tendency).
• With this notion, we can find the probability
of obtaining a raw score in a distribution,
given a certain mean and SD (or standard
error).
Example of number 1:
• The President of UNLV states that the
average salary of a new UNLV graduate is
$60,000. We are skeptical and test this by
taking a random sample of 100 UNLV
graduates. We find that the average is only
$55,000. Do we declare the President a
liar?
Not Yet!!!!
We need to make a probabilistic statement
regarding the likelihood of the President’s
statement. How do we do that?
With the aid of the standard error of the
mean we can calculate confidence
intervals - the range of mean values within
which our true population mean is likely to
fall.
How do we do that?
• First, we need the sample mean
• Second, we need the standard error, a.k.a.
the standard deviation of the sampling
distribution of means
• Third, select a threshold for rejecting the
null hypothesis: 5% or 10%, one-tailed or
two-tailed (1.96 is the usual magic number
for a two-tailed test at 5% with large samples)
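To see where the "magic number" comes from, the cutoffs for the thresholds listed above can be looked up from the standard normal curve. This is a minimal sketch that assumes a large sample, so the normal curve applies; the helper name `critical_z` is made up for this example.

```python
from statistics import NormalDist


def critical_z(alpha, two_tailed=True):
    """Standard-normal cutoff that leaves a total area of alpha in the tail(s)."""
    # A two-tailed test splits alpha evenly between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.05, 0.10):
    two = critical_z(alpha)
    one = critical_z(alpha, two_tailed=False)
    print(f"alpha = {alpha:.2f}: two-tail z = {two:.3f}, one-tail z = {one:.3f}")
```

The two-tailed 5% cutoff comes out at about 1.96, which is the value used in the confidence intervals in this lecture.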
So is the President wrong?
95% CI = Mean +/- 1.96 (SE)
= 55,000 +/- 1.96 (3,000)
= 55,000 +/- 5,880 = $49,120 to $60,880
We are 95% confident that the “true average salary” is within
this range. Because $60,000 falls inside it, we cannot reject
the President’s claim at the 5% level: the President may be
correct.
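The same calculation as a short Python sketch. The sample mean ($55,000), the standard error (3,000) and the claimed mean ($60,000) are taken from the example as given; the function name is made up for this illustration.

```python
def confidence_interval(sample_mean, standard_error, z=1.96):
    """Return (low, high) for the interval sample_mean +/- z * standard_error."""
    margin = z * standard_error
    return sample_mean - margin, sample_mean + margin


# Values from the UNLV salary example.
low, high = confidence_interval(sample_mean=55_000, standard_error=3_000)
claimed_mean = 60_000

# The claim survives if it falls inside the 95% confidence interval.
claim_is_plausible = low <= claimed_mean <= high

print(f"95% CI: ${low:,.0f} to ${high:,.0f}")
print("President's claim inside the interval:", claim_is_plausible)
```

A smaller standard error (for example, from a larger sample) would narrow the interval, and the same $55,000 sample mean could then be enough to reject the $60,000 claim.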
Confidence Intervals for
Proportions
Calculate the standard error of the
proportion:
Sp = √[ P(1 - P) / N ]
95% conf. Interval =
P +/- (1.96)Sp
Example
• National sample of 531 Democrats (or
Democratic-leaning) - Sept. 14-16, 2007
• Clinton 47%; Obama 25%; Edwards 11%
• P(1-P) = .47(1-.47) = .47(.53) = .2491
• Divide by N = .2491/531 = .000469
• Square root of .000469 = .0217
• 95% CI = .47 +/- 1.96 (.0217)
• .47 +/- .0425, or about .428 to .512
• We are 95% confident the true population
proportion lies between 42.8% and 51.2%
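The steps of this example can be sketched in Python. The code keeps full precision instead of rounding at each step, so its last digit can differ slightly from a hand calculation. The function name is made up for this illustration, and applying it to the Obama and Edwards figures goes one step beyond the worked example, using the same poll numbers.

```python
import math


def proportion_ci(p, n, z=1.96):
    """Return (standard error, low, high) for a sample proportion p with sample size n."""
    se = math.sqrt(p * (1 - p) / n)   # Sp = sqrt(P(1 - P) / N)
    margin = z * se
    return se, p - margin, p + margin


n = 531  # national sample of Democrats, Sept. 14-16, 2007
poll = {"Clinton": 0.47, "Obama": 0.25, "Edwards": 0.11}

for name, p in poll.items():
    se, low, high = proportion_ci(p, n)
    print(f"{name}: SE = {se:.4f}, 95% CI = {low:.1%} to {high:.1%}")
```

Note that the standard error is largest for proportions near 50% and shrinks as the proportion moves toward 0% or 100%, so each candidate gets a different margin of error from the same sample.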
Stone
• Political system (politicians, public, special
interests) not interested in testing
hypotheses to get at causal relationships
• Interested in causal stories
• Actors are more likely to define the cause
rather than “test hypotheses” or examine
alternative explanations.
• Causal stories often used to assign blame
or deflect blame, gain support.