Transcript Lecture 5

Review
• Measures of Central Tendency
– Mean, median, mode
• Measures of Variation
– Variance, standard deviation
Variance is defined as
s² = Σ(X - X̄)² / N
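As a quick check on the formula, here is a minimal Python sketch that computes the variance and standard deviation by dividing by N, as on the slide. The function names and the list of scores are hypothetical and chosen only for illustration.

```python
import math

def variance(scores):
    """Mean of the squared deviations from the mean (divides by N)."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores) / len(scores)

def standard_deviation(scores):
    """Square root of the variance."""
    return math.sqrt(variance(scores))

# Hypothetical scores, not taken from the lecture
scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(scores))            # 4.0
print(standard_deviation(scores))  # 2.0
```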
The Normal Curve
• The mean and standard deviation, in
conjunction with the normal curve, allow for
more sophisticated description of the data
and (as we will see later) statistical analysis
• For example, a school is not that
interested in the raw GRE score; it is
interested in how you score relative to
others.
• Even if the school knows the average
(mean) GRE score, your raw score still
doesn’t tell them much, since in a perfectly
normal distribution, 50% of people will
score higher than the mean.
• This is where the standard deviation is so
helpful. It helps interpret raw scores and
understand the likelihood of a score.
• So if I told you I scored 710 on the
quantitative section and the mean score is
591, is that good?
• It’s above average, but who cares.
• What if I tell you the standard deviation is
148?
• What does that mean?
• What if I said the standard deviation is 5?
• Calculating z-scores
Converting raw scores to z scores
What is a z score? What does it represent?
Z = (x-µ) / σ
Converting z scores into raw scores
X=zσ+µ
Z = (710-563)/140 = 147/140 = 1.05
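The two conversion formulas can be sketched in Python as follows. The function names are hypothetical; the numbers (score 710, mean 563, standard deviation 140) are the ones used in the worked calculation above.

```python
def z_score(x, mu, sigma):
    """Convert a raw score to a z score: z = (x - mu) / sigma."""
    return (x - mu) / sigma

def raw_score(z, mu, sigma):
    """Convert a z score back to a raw score: x = z * sigma + mu."""
    return z * sigma + mu

z = z_score(710, 563, 140)
print(round(z, 2))                     # 1.05
print(round(raw_score(z, 563, 140)))   # 710
```

A z score of 1.05 says the raw score sits about one standard deviation above the mean.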
Finding Probabilities under the Normal
Curve
So what % of GRE takers scored above and
below 710?
The importance of Table A
Why is this important? Inferential Statistics
(to be cont.)
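Table A gives the area under the normal curve for a given z score. As a rough sketch of the same lookup, Python's standard library class `statistics.NormalDist` can compute that area directly. This assumes the z score of 1.05 from the calculation above and a perfectly normal distribution of scores.

```python
from statistics import NormalDist

z = 1.05  # z score for a raw score of 710, from the calculation above

# Area under the standard normal curve to the left of z
below = NormalDist().cdf(z)
above = 1 - below

print(round(below * 100, 1))  # 85.3 (percent scoring below 710)
print(round(above * 100, 1))  # 14.7 (percent scoring above 710)
```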
Stuff you don’t need to know:
pi ≈ 3.14159265
e ≈ 2.71828
The Normal Curve and Sampling
A. A sample will (almost) always be different from
the true population
B. This is called “sampling error”
C. The difference between a sample and the true
population, regardless of how well the survey
was designed or implemented
D. Different from measurement error or sample bias
Sampling distribution of Means
• The existence of sampling error means
that if you take 1,000 random samples
from a population, calculate 1,000
means, and plot the distribution of those
means, you will get a consistent distribution
that has the following characteristics:
Characteristics of a Sampling
Distribution
• 1. the distribution approximates a normal curve
• 2. the mean of a sampling distribution of means
is equal to the true population mean
• 3. the standard deviation of a sampling
distribution is smaller than the standard
deviation of the population. Less variation in
the distribution because we are not dealing with
raw scores but rather central tendencies.
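These three characteristics can be checked with a small simulation. The sketch below is only illustrative: the population is hypothetical (10,000 values drawn uniformly between 0 and 100, deliberately not normal), and the sample size of 50 and the random seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical, deliberately non-normal population
population = [random.uniform(0, 100) for _ in range(10_000)]

# Draw 1,000 random samples of 50 and record each sample mean
sample_means = [
    statistics.mean(random.sample(population, 50))
    for _ in range(1000)
]

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)
mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

# 1. Roughly normal: about 68% of sample means fall within 1 SD of the center
within_one_sd = sum(
    abs(m - mean_of_means) <= sd_of_means for m in sample_means
) / len(sample_means)

print("share within 1 SD:", within_one_sd)
# 2. Mean of the sample means is close to the population mean
print("population mean:", pop_mean, "mean of means:", mean_of_means)
# 3. SD of the sample means is much smaller than the population SD
print("population SD:", pop_sd, "SD of means:", sd_of_means)
```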
Why is the normal curve so
important?
• If we define probability in terms of the
likelihood of occurrence, then the normal
curve can be regarded as a probability
distribution (the probability of occurrence
decreases as we move away from the
center).
• With this notion, we can find the probability
of obtaining a raw score in a distribution,
given a certain mean and SD.
Probability and the Normal
Curve
In chapter 6, we are not interested in the
distribution of raw scores but rather the
distribution of sample means and making
probability statements about those sample
means.
Probability and the Sampling
Distribution
Why is making probabilistic statements about
a central tendency important?
• 1. it will allow us to engage in inferential
statistics (later in ch. 7)
• 2. it allows us to produce confidence
intervals
Example of number 1:
• The President of UNLV states that the average
salary of a new UNLV graduate is
$60,000. We are skeptical and test this by
taking a random sample of 100 new UNLV
graduates. We find that the average is only
$55,000. Do we declare the President a
liar?
Not Yet!!!!
We need to make a probabilistic statement
regarding the likelihood of the President’s
statement. How do we do that?
With the aid of the standard error of the
mean we can calculate confidence
intervals: the range of mean values within
which our true population mean is likely to
fall.
How do we do that?
• First, we need the sample mean
• Second, we need the standard deviation of
the sampling distribution of means (what’s
another name for this?)
• a.k.a. the standard error of the mean
What’s the Problem?
• The problem is…
• We don’t have the standard deviation of
the sampling distribution of means.
• What do we do?
First – let’s pretend
• Let’s pretend that I know the Standard Deviation
of the Sampling Distribution of Means (a.k.a. the
standard error of the mean). It’s 3000
• For a 95% confidence interval we multiply the
standard error of the mean by 1.96 and add &
subtract that product to our sample mean
• Why 1.96?
• What’s the range?
So is President Ashley Lying?
CI = Mean +/- 1.96 (SE)
= 55,000 +/- 1.96 (3,000)
= 55,000 +/- 5,880 = $49,120 to $60,880
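The same calculation as a minimal Python sketch, using the pretend standard error of 3,000 and the sample mean of 55,000 from the example. The function name is hypothetical.

```python
def confidence_interval(mean, standard_error, multiplier=1.96):
    """Return (low, high): mean +/- multiplier * standard error."""
    margin = multiplier * standard_error
    return mean - margin, mean + margin

low, high = confidence_interval(55_000, 3_000)
print(round(low), round(high))   # 49120 60880

# Does the President's claimed $60,000 fall inside the interval?
claimed = 60_000
print(low <= claimed <= high)    # True
```

Since $60,000 falls inside the 95% interval, this sample alone does not give us grounds to reject the President's claim.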
Let’s stop pretending
• We Can Estimate the Standard Error of
the Mean.
– Divide the standard deviation of the sample
by √(N-1)
• Multiply this estimate by t rather than 1.96
and then add this product to, and subtract it
from, our sample mean.
• Why t?
The t Distribution
• Empirical testing and models show that a
standard deviation from a sample
underestimates the standard deviation of
the true population
• This is why we use N-1 not N when
calculating the standard deviation and the
standard error
• So in reality, we are calculating t-scores,
not z-scores since we are not using the
true sd.
• So when we are using a sample and
calculating a 95% confidence interval (CI)
we need to multiply the standard error by t,
not 1.96
• How do we know what t is?
• Table in back of book
• Df = N - 1
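Putting the pieces together, here is a rough Python sketch of a t-based 95% confidence interval for the salary example. Two inputs are assumptions that do not come from the lecture: the sample standard deviation of $30,000 is hypothetical, and the t value of 1.984 is the two-tailed .05 table value for df = 99 (a printed table may only list nearby df values).

```python
import math

def t_confidence_interval(mean, sample_sd, n, t_value):
    """CI using the estimated standard error: sample SD / sqrt(N - 1)."""
    standard_error = sample_sd / math.sqrt(n - 1)
    margin = t_value * standard_error
    return mean - margin, mean + margin

sample_mean = 55_000
sample_sd = 30_000   # hypothetical value, not given in the lecture
n = 100
t_value = 1.984      # assumed table value for df = N - 1 = 99

low, high = t_confidence_interval(sample_mean, sample_sd, n, t_value)
print(round(low), round(high))
```

Because t is slightly larger than 1.96, the interval is a little wider than the z-based one would be for the same standard error.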
Confidence Intervals for
Proportions
Calculate the standard error of the
proportion:
Sp = √( P(1 - P) / N )
95% conf. Interval =
P +/- (1.96)Sp
Example
• National sample of 531 Democrats and
Democratic-leaning independents, aged
18 and older, conducted Sept. 14-16, 2007
• Clinton 47%; Obama 25%; Edwards 11%
• P(1-P) = .47(1-.47) = .47(.53) = .2491
• Divide by N = .2491/531 = .000469
• Take square root = .0217
• 95% CI = .47 +/- 1.96 (.0217)
• .47 +/- .0425, or about .428 to .512
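The steps of this example can be sketched in Python as follows, using Clinton's 47% and N = 531 from the poll. The function name is hypothetical.

```python
import math

def proportion_ci(p, n, z=1.96):
    """Return (standard error, low, high) for a sample proportion."""
    standard_error = math.sqrt(p * (1 - p) / n)
    margin = z * standard_error
    return standard_error, p - margin, p + margin

se, low, high = proportion_ci(0.47, 531)
print(round(se, 4))                   # 0.0217
print(round(low, 3), round(high, 3))  # 0.428 0.512
```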
Midterm
• Key terms from Schutt chapters 1-5
• Statistical Calculations by hand
– Mean, Median, Mode
– Variance/Standard Deviation
– Z-scores
– Standard errors and confidence intervals
using z or t