Transcript Chapter 7a

Chapter 7
Probability and Samples: The
Distribution of Sample Means
Samples and Sampling Error


So far we have looked at z-scores and
probabilities for situations where the
sample consists of a single score.
This chapter extends the concepts of z-scores and probability to cover situations
with larger samples.

Ex: A z-score for an entire sample
Z-scores (review)


A z-score describes exactly where a score is
located in the distribution
Ex: a z-score of +2.00 is extreme
Copyright © 2002 Wadsworth Group. Wadsworth is an imprint of the
Wadsworth Group, a division of Thomson Learning
Figure 6.4
The normal distribution following a z-score transformation
(figure labels: central, representative sample; extreme sample)
Probability (review)


If the distribution of scores is normal, we should be able to
determine the probability value for each
score.
A score with a z-score greater than +2.00 has a
probability of only p = .0228
Figure 6.4
The normal distribution following a z-score transformation
(figure labels: central, representative sample; extreme sample)
Z-Scores

So far we have been limited to situations
where the sample consists of a single
score.


Most studies have larger samples
We will now extend the concepts of z-scores
and probability to cover situations with larger
samples.



A z-score near zero indicates a central,
representative sample
A z-score beyond ±2.00 indicates an
extreme sample
It will be possible to determine exact
probabilities for a sample
Figure 6.4
The normal distribution following a z-score transformation
(figure labels: central, representative sample; extreme sample)
Difficulties with using samples


Samples provide an incomplete picture of
the population
Any statistics computed for a sample will not be
identical to the corresponding parameters for the
entire population


Ex: the mean IQ for a sample of 25 students will
differ from the mean IQ of the entire population
The difference is called sampling error
Sampling Error

This difference, or error, between a
sample statistic and the corresponding
population parameter is called sampling
error

A sampling error is the discrepancy, or
amount of error between a sample statistic
and its corresponding population parameter.
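
As a rough illustration of sampling error, the sketch below simulates a population of IQ scores and draws one random sample of n = 25. The population itself (10,000 normally distributed scores with mean 100 and standard deviation 15) and the random seed are assumptions made for the demonstration; they are not figures from this chapter.

```python
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable (arbitrary choice)

# Hypothetical population of 10,000 IQ scores (assumed mean 100, sd 15).
population = [random.gauss(100, 15) for _ in range(10_000)]
mu = statistics.mean(population)          # population parameter

# One random sample of n = 25 students.
sample = random.sample(population, 25)
sample_mean = statistics.mean(sample)     # sample statistic

# Sampling error: discrepancy between the statistic and the parameter.
sampling_error = sample_mean - mu
print(f"population mean = {mu:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
print(f"sampling error  = {sampling_error:+.2f}")
```

Changing the seed gives a different sample and therefore a different sampling error, which is the point of the questions that follow.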
Questions




How can you tell which sample is giving
the best description of the population?
Can you predict how a sample will
describe its population?
What is the probability of selecting a
sample that has a certain sample mean?
We can answer these, but we need to set
rules that relate samples to populations.
Distribution of Sample Means


Many different samples come up with
different results.
A huge set of possible samples forms a
relatively simple, orderly, and predictable
pattern

This pattern makes it possible to predict the characteristics
of a sample with some accuracy.
Distribution of Sample Means (cont.)


The ability to predict sample
characteristics is based on the distribution
of sample means.
The distribution of sample means is the
collection of sample means for all the
possible random samples of a particular
size (n) that can be obtained from a
population
Distribution of Sample Means (cont.)

It is necessary to have all the possible
values in order to compute probabilities.

If a set has 100 samples, the probability of
obtaining any specific sample is 1 out of 100
or p = 1/100.


Before, we discussed only scores; now we
are discussing statistics (sample means).
Because statistics are obtained from
samples, a distribution of statistics is
referred to as a sampling distribution.
Sampling Distribution

A sampling distribution is a distribution of
statistics obtained by selecting all the
possible samples of a specific size from a
population.
To construct the distribution of sample means:

Take a sample
Compute its mean
Replace the scores in the population
Take another sample
Compute its mean
Replace the scores in the population
Repeat until you have obtained all possible sample
combinations.
Look at Ex. 7.1: a population of 4 scores with n = 2 gives 16 sample means
(see the histogram on p. 147).
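
The procedure can be sketched in a few lines of Python. This is a minimal illustration, assuming the four-score population 2, 4, 6, 8 shown in Figure 7.1 and sampling with replacement, so that every ordered pair of scores counts as one possible sample.

```python
from itertools import product
from collections import Counter

population = [2, 4, 6, 8]   # the four scores in the example
n = 2                       # sample size

# Sampling with replacement: every ordered pair of scores is a possible sample.
samples = list(product(population, repeat=n))
sample_means = [sum(s) / n for s in samples]

print(len(samples))  # 16 possible samples

# Frequency of each sample mean (a text version of the histogram).
for mean, freq in sorted(Counter(sample_means).items()):
    print(f"{mean:>4}: {'*' * freq}")
```

The printed rows of asterisks give a text version of the histogram of sample means.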
Sample Means



Note that the sample means tend to pile
up around the population mean, μ = 5
The sample means are clustered around a
value of 5
Sample Means (cont.)


Samples are supposed to be
representative of the population
Therefore, the sample means tend to
approximate the population mean.
Sample Means (cont.)



The distribution of sample means is
approximately normal in shape.
We can use the distribution of sample means to
answer probability questions about sample
means.
Ex: if you take a sample of n = 2 scores from the
original population, what is the probability of
obtaining a sample mean greater than 7?

P(X̄ > 7) = ?
Figure 7.1
Frequency distribution for a population of four scores: 2, 4, 6, 8
Table 7.1
The possible samples of n = 2 scores from the population in Figure 7.1

Ex: if you take a sample of n=2 scores
from the original population, what is the
probability of obtaining a sample mean
greater than 7?


P(X̄ > 7) = ?
Because probability is equivalent to
proportion, the probability question can be
restated as follows:



Of all the possible sample means, what
proportion has values greater than 7?
In Figure 7.2, all the possible sample
means are pictured, and only 1 out of the
16 means has a value greater than 7.
Answer: 1 out of 16, or p = 1/16
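
The same answer can be checked by counting. The sketch below assumes the population 2, 4, 6, 8 from Figure 7.1 and samples of n = 2 drawn with replacement; it uses exact fractions so the proportion is not rounded.

```python
from fractions import Fraction
from itertools import product

population = [2, 4, 6, 8]

# All 16 possible sample means for n = 2, kept as exact fractions.
sample_means = [Fraction(a + b, 2) for a, b in product(population, repeat=2)]

# Probability = proportion of sample means greater than 7.
favorable = sum(1 for m in sample_means if m > 7)
p = Fraction(favorable, len(sample_means))
print(p)  # 1/16
```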
Figure 7.2
The distribution of sample means for n = 2
The Central Limit Theorem





It might not be possible to list all the samples
and compute all the possible sample means.
As the size of n increases, the number of
possible samples increases too.
Therefore, it is necessary to develop the general
characteristics of the distribution of sample
means that can be applied in any situation.
These characteristics are specified in the Central Limit
Theorem
It is a cornerstone for much of inferential statistics
Central Limit Theorem

For any population with mean μ and
standard deviation σ, the distribution of
sample means for sample size n will have
a mean of μ and a standard deviation of
$\sigma/\sqrt{n}$
and will approach a normal
distribution as n approaches infinity.
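
A small simulation can make the theorem concrete. The sketch below is only an illustration under stated assumptions: the population is taken to be exponential with mean 1 (so μ = 1 and σ = 1, a deliberately non-normal shape), and 20,000 random samples of n = 30 stand in for the full set of possible samples.

```python
import math
import random
import statistics

random.seed(1)  # arbitrary seed for repeatability

# Assumed, deliberately non-normal population: exponential with mean 1.
# For this distribution mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
n = 30                # scores per sample
num_samples = 20_000  # simulated samples standing in for "all possible"

sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

print(f"mean of sample means: {statistics.fmean(sample_means):.3f} (mu = {mu})")
print(f"sd of sample means:   {statistics.pstdev(sample_means):.3f} "
      f"(sigma/sqrt(n) = {sigma / math.sqrt(n):.3f})")
```

With these settings the simulated mean and standard deviation should land close to μ and σ/√n, although a simulation only approximates the full distribution of sample means.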
Central Limit Theorem



Describes the distribution of sample means for
any population, no matter what shape, mean, or
standard deviation.
The distribution of sample means “approaches”
a normal distribution very rapidly.
Describes the distribution of sample means by
identifying the three basic characteristics that
describe any distribution: shape, central
tendency, and variability.
Shape of the Distribution of Means




The distribution of sample means tends to be a normal
distribution
It is almost perfectly normal if either of the following conditions holds:
The population from which the samples
are selected is a normal distribution
The number of scores (n) in each sample
is relatively large, around 30 or more.
Mean of the Distribution of Means


The expected value of X̄
The mean of the distribution of sample
means is equal to μ (the population mean)
and is called the expected value of X̄.
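
For the small example used earlier this can be verified directly. The sketch assumes the population 2, 4, 6, 8 and all 16 samples of n = 2 drawn with replacement, then compares the mean of the sample means with the population mean.

```python
from itertools import product
from statistics import mean

population = [2, 4, 6, 8]
mu = mean(population)   # population mean

# Mean of every possible sample of n = 2, then the mean of those means.
sample_means = [mean(s) for s in product(population, repeat=2)]
expected_value = mean(sample_means)

print(mu, expected_value)  # the two values are equal
```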
Standard Error of X̄


We have considered the shape and the
central tendency of the distribution of
sample means.
To completely describe this distribution,
we need one more characteristic

Variability
Standard Error of X̄



We will be working with the standard
deviation for the distribution of sample
means.
It is called the standard error of X̄
The standard error defines the standard,
or typical, distance from the mean.


Remember, a sample is not expected to
provide a perfectly accurate reflection of
its population.
There will be some error between the
sample and the population
Standard Error of X̄


The standard deviation of the distribution
of sample means is called the standard
error of X̄.
The standard error measures the standard
amount of difference between X̄ and μ
due to chance
Standard Error of X̄
Standard error = $\sigma_{\bar{X}}$ = standard distance between
X̄ and μ
The symbol σ indicates that we are measuring a standard
deviation or a standard distance from the mean
The subscript X̄ indicates that we are measuring
the standard deviation for a distribution of
sample means.
Standard Error



Valuable because it specifies precisely how
well a sample mean estimates its
population mean
How much error you should expect on the
average
Can use the sample mean as an estimate
of the population mean
Standard Error

Magnitude determined by two factors

Size of the sample


The larger the sample size (n), the more probable
it is that the sample mean will be close to the
population mean
The standard deviation of the population from
which the sample is selected
standard error = $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$
Standard error


When the sample size increases, the
standard error decreases
As n decreases, the standard error increases
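
The relationship between sample size and standard error can be tabulated with a short sketch. It uses the formula standard error = σ/√n; the value σ = 100 and the helper name `standard_error` are assumptions chosen for illustration.

```python
import math

def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 100  # assumed population standard deviation, for illustration
for n in (1, 4, 25, 100):
    print(f"n = {n:>3}: standard error = {standard_error(sigma, n):.1f}")
```

With n = 1 the standard error equals the population standard deviation, and it shrinks as n grows.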
Probability and the Distribution of Sample Means



Primary use of the distribution of sample means
is to find the probability associated with any
specific sample.
Remember probability is equivalent to
proportion.
Because the distribution of sample means
presents the entire set of all possible X̄ values, we can
use proportions of this distribution to determine
probabilities.
Example 7.2



Population of SAT scores
μ = 500, σ = 100
If you take a random sample of n = 25
students, what is the probability that the
sample mean would be greater than X̄ =
540?

Restate probability question as a
proportion question



Out of all the possible sample means, what
proportion has values greater than 540?
The set of all the possible sample means is the
distribution of sample means
The problem is to find a specific portion
of this distribution

What we know




The distribution is normal because the
population of SAT scores is normal
The distribution has a mean of 500 because
the population mean is μ = 500
The distribution has a standard error of $\sigma_{\bar{X}} = 20$:
$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{100}{\sqrt{25}} = \frac{100}{5} = 20$
Figure 7.3
A distribution of sample means





We are interested in sample means
greater than 540 (the shaded area)
Next, find the z-score value that defines
the exact location of X̄ = 540
The value of 540 is located above the
mean by 40 points.
This is 2 standard deviations (in this case, 2 standard
errors) above the mean
The z-score for X̄ = 540 is z = +2.00



Because this distribution of sample means
is normal, you can use the unit normal
table to find the probability associated
with z = +2.00
The table indicates that 0.0228 of the
distribution is located in the tail of the
distribution beyond z = +2.00
Conclusion: it is very unlikely, p = 0.0228
(2.28%), to obtain a random sample of n
= 25 students with an average SAT score
greater than 540
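
The steps of Example 7.2 can be reproduced in a short sketch. The numbers come from the example; using Python's `statistics.NormalDist` in place of the unit normal table is a substitution made here for convenience.

```python
import math
from statistics import NormalDist

mu, sigma, n = 500, 100, 25   # values from Example 7.2
sample_mean = 540

standard_error = sigma / math.sqrt(n)        # 100 / 5 = 20
z = (sample_mean - mu) / standard_error      # (540 - 500) / 20 = +2.00

# Proportion of the standard normal distribution in the tail beyond z.
p = 1 - NormalDist().cdf(z)

print(f"standard error = {standard_error:.0f}")
print(f"z = {z:+.2f}")
print(f"p = {p:.4f}")
```

The printed probability rounds to 0.0228, matching the value read from the table.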
Z-scores


It is possible to use a z-score to describe
the position of any specific sample within
the distribution of sample means
The z-score tells exactly where a specific
sample is located in relation to all the
other possible samples that could have
been obtained.
Figure 7.8
Showing standard error in a graph