Lecture 9


Transcript Lecture 9

Hypothesis Testing
• When a study uses quantitative data, you are usually expected to have clear hypotheses.
• For example: older citizens are more likely to vote; men are more likely to like computers; rural schools perform better than urban schools.
• How do we test these hypotheses?
• It’s complicated: regression, chi-square, t-tests, and more.
• But we need to start at the beginning.
The Normal Curve
• The mean and standard deviation, in conjunction with the normal curve, allow for a more sophisticated description of the data and (as we will see later) statistical analysis.
• For example, a school is not that interested in your raw GRE score; it is interested in how you score relative to other test takers.
• Even if the school knows the average (mean) GRE score, your raw score still doesn’t tell them much. In a perfectly normal distribution, 50% of people score higher than the mean, so knowing you are above average does not say how far above you are.
• This is where the standard deviation is so helpful: it helps us interpret raw scores and judge how likely a score is.
• So suppose I told you I scored 710 on the quantitative section and the mean score is 591. Is that good?
• It’s above average, but who cares.
• What if I tell you the standard deviation is
• What does that mean?
• What if I said the standard deviation is 5?
• Calculating z-scores
Converting raw scores to z scores
What is a z score? What does it represent?
Z = (x-µ) / σ
Converting z scores into raw scores: x = µ + Zσ
Example (GRE quantitative, raw score to z score): Z = (710-563)/140 = 147/140 = 1.05
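A minimal Python sketch of both conversions, using the GRE numbers from the worked example (mean 563, SD 140). The helper names z_score and raw_score are hypothetical, chosen for illustration only.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Convert a raw score x to a z score: how many SDs x lies from the mean."""
    return (x - mu) / sigma


def raw_score(z: float, mu: float, sigma: float) -> float:
    """Convert a z score back to a raw score: x = mu + z * sigma."""
    return mu + z * sigma


# GRE quantitative example: mean 563, SD 140.
z = z_score(710, 563, 140)
print(round(z, 2))                       # 1.05
print(round(raw_score(z, 563, 140), 6))  # 710.0
```

The two functions are inverses of each other, so converting a raw score to z and back returns the original score.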
Finding Probabilities under the Normal Curve
So what % of GRE takers scored above and
below 710?
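One way to answer this, sketched in Python under the assumption that GRE quantitative scores are approximately normal with mean 563 and SD 140: the area under the standard normal curve to the left of z = 1.05 is the share scoring below 710, and the rest is the share scoring above. This uses the standard library's statistics.NormalDist in place of a printed z table.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, SD 1).
std_normal = NormalDist(mu=0.0, sigma=1.0)

z = (710 - 563) / 140          # z score from the GRE example, 1.05
below = std_normal.cdf(z)      # area to the left of z
above = 1 - below              # area to the right of z

print(f"below 710: {below:.1%}")  # below 710: 85.3%
print(f"above 710: {above:.1%}")  # above 710: 14.7%
```

Calling NormalDist(563, 140).cdf(710) gives the same answer directly from the raw score, without the z conversion.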
Why is this important? Inferential statistics
(to be continued)
Why is the normal curve so important?
• If we define probability in terms of the
likelihood of occurrence, then the normal
curve can be regarded as a probability
distribution (the probability of occurrence
decreases as we move away from the
center – central tendency).
• With this idea, we can find the probability of obtaining a raw score in a distribution, given a certain mean and SD (or standard error).
Example of number 1:
• The President of UNLV states that the average salary of a new UNLV graduate is $60,000. We are skeptical and test this by taking a random sample of 100 UNLV students. We find that the average is only $55,000. Do we declare the President wrong?
Not yet!
We need to make a probabilistic statement
regarding the likelihood of the President’s
statement. How do we do that?
With the aid of the standard error of the mean, we can calculate confidence intervals: the range of mean values within which our true population mean is likely to fall.
How do we do that?
• First, we need the sample mean
• Second, we need the standard error, a.k.a. the standard deviation of the sampling distribution of means
• Third, select a threshold for rejecting the null hypothesis: 5% or 10%, one-tailed or two-tailed (1.96 is usually the magic number for large samples: it is the critical z value for a two-tailed test at the 5% level)
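Where the magic number comes from, as a small Python sketch: the critical value is the z that leaves the chosen threshold in the tail(s) of the standard normal curve. A two-tailed test splits the threshold between both tails; a one-tailed test puts it all in one. The helper name critical_z is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1


def critical_z(alpha: float, two_tailed: bool = True) -> float:
    """Return the z value that cuts off alpha of the area under the normal curve.

    Two-tailed: alpha is split between both tails (alpha / 2 in each).
    One-tailed: all of alpha sits in one tail.
    """
    tail = alpha / 2 if two_tailed else alpha
    return std_normal.inv_cdf(1 - tail)


for alpha in (0.05, 0.10):
    print(alpha, round(critical_z(alpha), 3), round(critical_z(alpha, two_tailed=False), 3))
# 0.05 1.96 1.645
# 0.1 1.645 1.282
```

Note that a two-tailed test at 10% and a one-tailed test at 5% share the same cutoff, 1.645.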
So is the President wrong?
CI = mean +/- 1.96(SE)
= 55,000 +/- 1.96(3,000)
= 55,000 +/- 5,880 = $49,120 to $60,880
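We estimate, with 95% confidence, that the “true average salary” lies within this range. Since the President’s $60,000 falls inside it, the President may be right: we cannot reject the claim at the 5% level.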
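The salary example as a Python sketch. The lecture gives SE = 3,000 but not the sample SD; the SD of $30,000 below is a hypothetical value, assumed only because with n = 100 it yields that same standard error (SE = SD / √n). The helper names standard_error and mean_ci are also hypothetical.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD of the sampling distribution of means."""
    return sd / math.sqrt(n)


def mean_ci(mean: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Confidence interval: mean +/- z * SE (z = 1.96 for 95%, large samples)."""
    margin = z * se
    return mean - margin, mean + margin


# ASSUMPTION: hypothetical sample SD of $30,000 (not given in the lecture),
# chosen so that with n = 100 the SE equals the lecture's 3,000.
se = standard_error(30_000, 100)
low, high = mean_ci(55_000, se)
print(se, round(low), round(high))  # 3000.0 49120 60880

claimed = 60_000
print("claim inside CI:", low <= claimed <= high)  # claim inside CI: True
```

This follows the lecture's large-sample z approach; with n = 100, a t critical value (about 1.98) would give a slightly wider interval.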
Confidence Intervals for Proportions
Calculate the standard error of the proportion
$S_p = \sqrt{\frac{P(1-P)}{N}}$
95% confidence interval = $P \pm 1.96\,S_p$
• Example: national sample of 531 Democrats (or Democratic-leaning voters), Sept. 14-16, 2007
• Clinton 47%; Obama 25%; Edwards 11%
• P(1-P) = .47(1-.47) = .47(.53) = .2491
• Divide by N = .2491/531 = .000469
• Square root of .000469 = .0217
• 95% CI = .47 +/- 1.96 (.0217)
• .47 +/- .0425, or .4275 to .5125
• We are 95% confident that the true population proportion supporting Clinton lies between 42.75% and 51.25%
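The same calculation for Clinton’s 47% among 531 respondents, as a Python sketch (the helper name proportion_ci is hypothetical). Because it carries the unrounded standard error (about 0.02166) through, its endpoints can differ in the last digit from a hand calculation that rounds the standard error to .0217 first.

```python
import math


def proportion_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Return (standard error, lower, upper) for a sample proportion p of size n."""
    se = math.sqrt(p * (1 - p) / n)  # S_p = sqrt(P(1 - P) / N)
    margin = z * se
    return se, p - margin, p + margin


se, low, high = proportion_ci(0.47, 531)
print(f"SE = {se:.4f}")                    # SE = 0.0217
print(f"95% CI: {low:.1%} to {high:.1%}")  # 95% CI: 42.8% to 51.2%
```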
• The political system (politicians, the public, special interests) is not interested in testing hypotheses to get at causal relationships.
• It is interested in causal stories.
• Actors are more likely to define the cause
rather than “test hypotheses” or examine
alternative explanations.
• Causal stories are often used to assign or deflect blame and to gain support.