Transcript sample mean

Review
Application of the
Normal Distribution
V. Katch Movement Science 250
Overview
• Continuous random variable
Infinitely many values, and those values
are often associated with
measurements on a continuous scale
with no gaps or interruptions
• Normal distribution - If a continuous
random variable has a distribution that
is symmetric and bell-shaped we call
it a normal distribution
Overview
• Continuous random variable
• Normal distribution
Curve is bell shaped
and symmetric
[Figure: normal curve centered at µ; horizontal axis: Score]
Normal Curve Formula
\( y = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}} \)
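The normal curve formula can be evaluated directly. The sketch below assumes the usual form of the formula with mean µ and standard deviation σ; the function name normal_pdf is only illustrative.

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# The curve is highest at the mean and symmetric around it.
peak = normal_pdf(0.0)
left, right = normal_pdf(-1.0), normal_pdf(1.0)
print(round(peak, 4), round(left, 4), round(right, 4))
```

Equal heights at -1 and +1 illustrate the symmetry of the curve.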
Definition
Standard Normal Distribution
a normal probability
distribution that has a mean
of 0 and a standard
deviation of 1
Characteristics of the
Normal Curve
• The curve is bell-shaped and symmetrical.
• The mean, median, and mode are all equal.
• The highest frequency is in the middle of
the curve.
• The frequency gradually tapers off as the
scores approach the ends of the curve.
• The curve approaches, but never meets, the
abscissa at both high and low ends.
Standard Normal Distribution
[Figure: standard normal curve with µ = 0 and σ = 1; horizontal axis: z, centered at 0]
To find:
z Score
the distance along horizontal scale of the
standard normal distribution; refer to the
leftmost column and top row of Table
Area
the region under the curve; refer to the
values in the body of Table
The Empirical Rule
Standard Normal Distribution: µ = 0 and σ = 1
• 68% of data are within 1 standard deviation of the mean
• 95% within 2 standard deviations
• 99.7% within 3 standard deviations
[Figure: normal curve with horizontal axis marked x̄ - 3s, x̄ - 2s, x̄ - s, x̄, x̄ + s, x̄ + 2s, x̄ + 3s; areas of the successive regions from left to right: 0.1%, 2.4%, 13.5%, 34%, 34%, 13.5%, 2.4%, 0.1%]
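The three percentages in the Empirical Rule can be checked numerically. This is a minimal sketch using Python's standard-library NormalDist; the helper name area_within is an assumption made for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_within(k: float) -> float:
    """Area under the standard normal curve between -k and +k."""
    return z.cdf(k) - z.cdf(-k)

within = {k: area_within(k) for k in (1, 2, 3)}
for k, area in within.items():
    print(f"within {k} standard deviation(s): {area:.4f}")
```

The computed areas are the unrounded versions of the 68%, 95%, and 99.7% figures.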
Notation
P(a < z < b)
denotes the probability that the z score is between a
and b
P(z > a)
denotes the probability that the z score is greater
than a
P (z < a)
denotes the probability that the z score is less
than a
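Each piece of notation corresponds to one calculation with the cumulative area to the left of a z score. The sketch below uses the standard-library NormalDist in place of a printed table; the function names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def p_less(a: float) -> float:
    """P(z < a): area to the left of a."""
    return std_normal.cdf(a)

def p_greater(a: float) -> float:
    """P(z > a): area to the right of a."""
    return 1.0 - std_normal.cdf(a)

def p_between(a: float, b: float) -> float:
    """P(a < z < b): area between a and b."""
    return std_normal.cdf(b) - std_normal.cdf(a)

print(round(p_less(1.0), 4), round(p_greater(1.0), 4), round(p_between(-1.96, 1.96), 4))
```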
Chapter 9
Means and
Proportions
as Random
Variables
What happens if we
regard a summary
statistic for an entire
random sample as a
random variable?
If we take a random sample
and find the proportion who
have a certain trait, that
proportion is the numerical
outcome of a random event:
the sample proportion is
itself a random variable
By understanding how a
sample mean behaves as a
random variable, we can
begin to understand the
population from which it
came.
Work backwards
(from sample to pop)
ask question about pop
collect sample
measure/test
answer question about sample
based on stats: infer about pop
Understanding Dissimilarity
Among Samples
Key: Need to understand what kind of
dissimilarity we should expect to see in
various samples from the same population.
• Suppose most samples were likely to provide an
answer that is within ±10% of the population value.
• Then we also know the population answer should
be within ±10% of whatever our specific sample
value is.
• => Have a good guess about the population value
based on sample value.
Statistics and Parameters
A statistic is a numerical value computed from
a sample. Its value may differ for different
samples.
e.g. sample mean x̄, sample standard deviation s,
and sample proportion p̂.
A parameter is a numerical value associated
with a population. Considered fixed and
unchanging. e.g. population mean µ, population
standard deviation σ, and population
proportion p.
Statistics and Parameters
For categorical variables, statistics
associated with a sample include the
number or proportion of the sample
who fall into various categories
Both the frequency and percentage
are statistics associated with samples
Sampling Distributions
Each new sample taken =>
sample statistic will change.
The distribution of possible values of a
statistic for repeated samples of the same
size from a population is called the
sampling distribution of the statistic.
Many statistics of interest have sampling distributions
that are approximately normal distributions
Example Sampling Dist - Mean Hours Sleep
Survey of n = 190 college students.
“How many hours of sleep did you get last night?”
Sample mean = 7.1 hours.
If we repeatedly took
samples of 190 and each
time computed the
sample mean, the
histogram of the
resulting sample mean
values would look like
this histogram.
Sampling Distributions
for Sample Proportions
• Suppose (unknown to us) 40% of a population
carry the gene for a disease, (p = 0.40).
• We will take a random sample of 25 people from
this population and count X = number with gene.
• Although we expect (on average) to find 10 people
(40%) with the gene, we know the number will
vary for different samples of n = 25.
• In this case, X is a binomial random variable
with n = 25 and p = 0.4.
Many Possible Samples
Four possible random samples of 25 people:
Sample 1: X =12, proportion with gene =12/25 = 0.48 or 48%.
Sample 2: X = 9, proportion with gene = 9/25 = 0.36 or 36%.
Sample 3: X = 10, proportion with gene = 10/25 = 0.40 or 40%.
Sample 4: X = 7, proportion with gene = 7/25 = 0.28 or 28%.
Note:
• Each sample gave a different answer, which did not
always match the population value of 40%.
• Although we cannot determine whether one sample will
accurately reflect the population, statisticians have
determined what to expect for most possible samples.
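The way the count X and the sample proportion change from sample to sample can be imitated by simulation. This is only a sketch: the seed is arbitrary, the helper name sample_count is hypothetical, and the simulated counts will not reproduce the four samples listed above.

```python
import random

def sample_count(n: int, p: float, rng: random.Random) -> int:
    """Number of people carrying the gene in one random sample of size n."""
    return sum(rng.random() < p for _ in range(n))

rng = random.Random(250)   # arbitrary seed so the run is repeatable
n, p = 25, 0.40            # sample size and population proportion from the slide

counts = [sample_count(n, p, rng) for _ in range(4)]
proportions = [x / n for x in counts]

for i, (x, p_hat) in enumerate(zip(counts, proportions), start=1):
    print(f"Sample {i}: X = {x}, proportion with gene = {p_hat:.2f}")
```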
The Normal Curve Approximation
Rule for Sample Proportions
Let p = population proportion of interest
or binomial probability of success.
Let p̂ = sample proportion or proportion of successes.
If numerous random samples or repetitions of the same size n
are taken, the distribution of possible values of p̂ is
approximately a normal curve distribution with
• Mean = p
• Standard deviation = s.d.(p̂) = \( \sqrt{\frac{p(1-p)}{n}} \)
This approximate distribution is the sampling distribution of p̂.
The Normal Curve Approximation
Rule for Sample Proportions
Normal Approximation Rule can be applied in two situations:
Situation 1: A random sample is taken from a population.
Situation 2: A binomial experiment is repeated numerous times.
In each situation, three conditions must be met:
Condition 1: The Physical Situation
There is an actual population or repeatable situation.
Condition 2: Data Collection
A random sample is obtained or situation repeated many times.
Condition 3: The Size of the Sample or Number of Trials
The size of the sample or number of repetitions is relatively large:
np and n(1-p) must be at least 5 and preferably at least 10.
Examples for which Rule Applies
• Election Polls: to estimate proportion who favor a
candidate; units = all voters.
• Television Ratings: to estimate proportion of
households watching TV program; units = all households
with TV.
• Consumer Preferences: to estimate proportion of
consumers who prefer new recipe compared with old;
units = all consumers.
• Testing ESP: to estimate probability a person can
successfully guess which of 5 symbols on a hidden card;
repeatable situation = a guess.
Example: Possible Sample Proportions
Favoring a Candidate
Suppose 40% of all voters favor Candidate X. Pollsters take a
sample of n = 2400 voters. The rule states that the sample proportion
who favor X will have approximately a normal distribution with
mean = p = 0.4 and s.d.(p̂) = \( \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{0.4(1-0.4)}{2400}} = 0.01 \)
Histogram at right
shows sample
proportions resulting
from simulating this
situation 400 times.
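The polling example can be imitated with a short simulation. The sketch below assumes each voter independently favors Candidate X with probability 0.4; the seed and the helper name sample_proportion are arbitrary choices for illustration.

```python
import math
import random
import statistics

p, n, reps = 0.40, 2400, 400
theoretical_sd = math.sqrt(p * (1 - p) / n)   # s.d. of the sample proportion

def sample_proportion(n: int, p: float, rng: random.Random) -> float:
    """Proportion favoring the candidate in one simulated poll of n voters."""
    return sum(rng.random() < p for _ in range(n)) / n

rng = random.Random(2400)  # arbitrary seed so the run is repeatable
simulated = [sample_proportion(n, p, rng) for _ in range(reps)]

print(f"theoretical s.d. = {theoretical_sd:.4f}")
print(f"simulated mean   = {statistics.fmean(simulated):.4f}")
print(f"simulated s.d.   = {statistics.stdev(simulated):.4f}")
```

The simulated mean and standard deviation should land close to 0.4 and 0.01, which is what the rule predicts.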
Estimating the Population Proportion
from a Single Sample Proportion
In practice, we don’t know the true population proportion p,
so we cannot compute the standard deviation of p̂,
s.d.(p̂) = \( \sqrt{\frac{p(1-p)}{n}} \).
In practice, we only take one random sample, so we only have
one sample proportion p̂. Replacing p with p̂ in the standard
deviation expression gives us an estimate that is called the
standard error of p̂.
s.e.(p̂) = \( \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \).
If p̂ = 0.39 and n = 2400, then the standard error is 0.01. So
the true proportion who support the candidate is almost surely
between 0.39 – 3(0.01) = 0.36 and 0.39 + 3(0.01) = 0.42.
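The standard error and the "almost surely" interval can be reproduced in a few lines. This is a minimal sketch; the function name standard_error_proportion is illustrative.

```python
import math

def standard_error_proportion(p_hat: float, n: int) -> float:
    """Estimated standard deviation of a sample proportion."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

p_hat, n = 0.39, 2400
se = standard_error_proportion(p_hat, n)

# Interval of plus or minus 3 standard errors around the sample proportion.
low, high = p_hat - 3 * se, p_hat + 3 * se
print(f"s.e. = {se:.4f}, interval = {low:.2f} to {high:.2f}")
```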
Estimating the Population Proportion
from a Single Sample Proportion
Review
Value of interest is proportion falling into
one category of a categorical variable or
the probability of success in a binomial
experiment.
If we know the size of the sample and the
magnitude of the true proportion, we can
determine an interval of values that was
likely to cover the sample proportion.
What to Expect of Sample Means
• Suppose we are interested in estimating
the mean of a quantitative variable in a
population of millions of people.
• If we sample 25 people and compute the
mean of the sample, how close will that
sample mean be to the population mean?
• Each time we take a sample we will get a
different sample mean.
• Can we say anything about what we
expect those means to be?
What to Expect of Sample Means
• Suppose we want to estimate the mean weight loss
for all who attend a clinic for 10 weeks. Suppose
(unknown to us) the distribution of weight loss is
approximately normal (mean = 8 pounds, SD = 5
pounds).
• We will take a random sample of 25 people from
this population and record for each X = weight
loss.
• We know the value of the sample mean will vary
for different samples of n = 25.
• What do we expect those means to be?
Many Possible Samples
Four possible random samples of 25 people:
Sample 1: Mean = 8.32 pounds, standard deviation = 4.74 lbs.
Sample 2: Mean = 6.76 pounds, standard deviation = 4.73 lbs.
Sample 3: Mean = 8.48 pounds, standard deviation = 5.27 lbs.
Sample 4: Mean = 7.16 pounds, standard deviation = 5.93 lbs.
Note:
• Each sample gave a different answer, which did not always
match the population mean of 8 pounds.
• Although we cannot determine whether one sample mean
will accurately reflect the population mean, statisticians
have determined what to expect for most possible sample
means.
The Normal Curve Approximation
Rule for Sample Means
Let µ = mean for population of interest.
Let σ = standard deviation for population of interest.
Let x̄ = sample mean.
If numerous random samples of the same size n are taken, the
distribution of possible values of x̄ is approximately a normal
curve distribution with
• Mean = µ
• Standard deviation = s.d.(x̄) = \( \frac{\sigma}{\sqrt{n}} \)
This approximate distribution is the sampling distribution of x̄.
The Normal Curve Approximation
Rule for Sample Means
Normal Approximation Rule can be applied in two situations:
Situation 1: The population of measurements of interest is
bell-shaped and a random sample of any size is measured.
Situation 2: The population of measurements of interest is not
bell-shaped but a large random sample is measured.
Note: Difficult to get a Random Sample?
Researchers usually willing to use Rule as long
as they have a representative sample with no
obvious sources of confounding or bias.
Examples for which Rule Applies
• Average Weight Loss: to estimate average weight
loss; weight assumed bell-shaped; population = all
current and potential clients.
• Average Age At Death: to estimate average age at
which left-handed adults (over 50) die; ages at death not
bell-shaped so need n ≥ 30; population = all left-handed
people who live to be at least 50.
• Average Student Income: to estimate mean monthly
income of students at university who work; incomes not
bell-shaped and outliers likely, so need large random
sample of students; population = all students at university
who work.
Example: Hypothetical Mean Weight Loss
Suppose the distribution of weight loss is approximately normal
(µ = 8 pounds, s.d. = 5 pounds) and we will take a random sample
of n = 25 clients. The rule states that the sample mean weight loss
will have a normal distribution with
mean = µ = 8 pounds and s.d.(x̄) = \( \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{25}} = 1 \) pound
Histogram at right shows
sample means resulting
from simulating this
situation 400 times.
Empirical Rule:
It is almost certain that
the sample mean will be
between 5 and 11 pounds
(within 3 s.d. of the mean on either side).
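The 400 simulated sample means can be imitated as follows. This is a sketch under the stated assumption that weight loss is normally distributed with mean 8 and standard deviation 5; the seed is arbitrary and the result is not the histogram from the slide.

```python
import random
import statistics

mu, sigma, n, reps = 8.0, 5.0, 25, 400
rng = random.Random(8)  # arbitrary seed so the run is repeatable

# Each entry is the mean weight loss of one simulated sample of 25 clients.
sample_means = [
    statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(reps)
]

share_in_range = sum(5.0 <= m <= 11.0 for m in sample_means) / reps
print(f"mean of sample means = {statistics.fmean(sample_means):.2f}")
print(f"s.d. of sample means = {statistics.stdev(sample_means):.2f}")
print(f"share between 5 and 11 pounds = {share_in_range:.3f}")
```

The standard deviation of the simulated means should be near 1 pound, and nearly all of them should fall between 5 and 11 pounds.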
Standard Error of the Mean
In practice, the population standard deviation σ is rarely
known, so we cannot compute the standard deviation of x̄,
s.d.(x̄) = \( \frac{\sigma}{\sqrt{n}} \),
and we use the sd of the sample instead.
In practice, we only take one random sample, so we only have
the sample mean x̄ and the sample standard deviation s.
Replacing σ with s in the standard deviation expression gives
us an estimate that is called the standard error of x̄.
s.e.(x̄) = \( \frac{s}{\sqrt{n}} \).
For a sample of n = 25 weight losses,
the standard deviation is s = 4.74 pounds.
So the standard error of the mean is 4.74/√25 = 0.948 pounds.
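The standard error in this example is a one-line calculation. A minimal sketch, with an illustrative function name:

```python
import math

def standard_error_mean(s: float, n: int) -> float:
    """Estimated standard deviation of a sample mean: s divided by sqrt(n)."""
    return s / math.sqrt(n)

se = standard_error_mean(4.74, 25)
print(round(se, 3))
```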
Increasing the Size of the Sample
Suppose we take n = 100 people instead of just 25.
The standard deviation of the mean would be
s.d.(x̄) = \( \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{100}} = 0.5 \) pounds.
• For samples of n = 25,
sample means are likely to
range between 8 ± 3 pounds
=> 5 to 11 pounds.
• For samples of n = 100,
sample means are likely to
range only between 8 ± 1.5
pounds => 6.5 to 9.5 pounds.
Larger samples result in more accurate estimates
of population values than smaller samples.
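The effect of sample size on the likely range of sample means can be tabulated. The sketch assumes the weight-loss population from the example (mean 8 pounds, standard deviation 5 pounds) and takes "likely" to mean within 3 standard deviations of the mean.

```python
import math

mu, sigma = 8.0, 5.0  # population mean and standard deviation (pounds)

def likely_range(n: int) -> tuple[float, float]:
    """Interval of mu plus or minus 3 standard deviations of the sample mean."""
    sd = sigma / math.sqrt(n)
    return (mu - 3 * sd, mu + 3 * sd)

ranges = {n: likely_range(n) for n in (25, 100)}
for n, (low, high) in ranges.items():
    print(f"n = {n}: {low} to {high} pounds")
```

Quadrupling the sample size halves the width of the range, because the standard deviation of the mean shrinks with the square root of n.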
Sampling for a Long, Long Time:
The Law of Large Numbers
LLN: the sample mean x̄ will eventually get
“close” to the population mean µ no matter how
small a difference you use to define “close.”
LLN = peace of mind to casinos, insurance companies.
• Eventually, after enough gamblers or customers,
the mean net profit will be close to the theoretical mean.
• Price to pay = must have enough $ on hand to pay the
occasional winner or claimant.
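The Law of Large Numbers can be watched in action by tracking a running mean. This sketch does not model a casino or insurer; it assumes a simple hypothetical population of equally likely digits 0 to 9, whose mean is 4.5, and an arbitrary seed.

```python
import random

rng = random.Random(2024)  # arbitrary seed so the run is repeatable
population_mean = 4.5      # mean of the digits 0..9 when equally likely
checkpoints = (10, 100, 10_000, 100_000)

running_total = 0
running_means = {}
for i in range(1, max(checkpoints) + 1):
    running_total += rng.randrange(10)        # draw one digit
    if i in checkpoints:
        running_means[i] = running_total / i  # sample mean so far

for size, mean in running_means.items():
    print(f"n = {size:>6}: sample mean = {mean:.4f}")
```

The sample mean wanders for small n and settles near 4.5 as n grows.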
Central Limit Theorem
The Central Limit Theorem states that if n is
sufficiently large, the sample means of random
samples from a population with mean µ and
finite standard deviation σ are approximately
normally distributed with mean µ and
standard deviation \( \sigma/\sqrt{n} \).
Technical Note:
The mean and standard deviation given in the CLT hold
for any sample size; it is only the “approximately normal”
shape that requires n to be sufficiently large.
Central Limit Theorem
Conclusions:
1. The distribution of sample means x̄ will, as the
sample size increases, approach a normal
distribution.
2. The mean of the sample means will be the
population mean µ.
3. The standard deviation of the sample means
will approach \( \sigma/\sqrt{n} \).
Practical Rules Commonly Used:
For samples of size n larger than 30, the distribution
of the sample means can be approximated
reasonably well by a normal distribution. The
approximation gets better as the sample size n
becomes larger.
If the original population is itself normally
distributed, then the sample means will be
normally distributed for any sample size n (not
just the values of n larger than 30).
Notation
the mean of the sample means
\( \mu_{\bar{x}} = \mu \)
the standard deviation of the sample means
\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \)
(often called standard error of the mean)
Distribution of 200 digits from
Social Security Numbers
(Last 4 digits from 50 students)
[Figure: histogram of the 200 digits; horizontal axis: digit (0 to 9); vertical axis: frequency]
SSN digits    x̄
1 8 6 4      4.75
5 3 3 6      4.25
9 8 8 8      8.25
5 1 2 5      3.25
9 3 3 5      5.00
4 2 6 2      3.50
7 7 1 6      5.25
9 1 5 4      4.75
5 3 3 9      5.00
7 3 4 2      4.00
2 7 6 6      5.25
6 7 7 1      5.25
2 3 3 9      4.25
2 4 7 5      4.50
5 4 3 7      4.75
0 4 3 8      3.75
2 5 8 6      5.25
7 1 3 4      3.75
8 3 7 0      4.50
5 6 6 7      6.00
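Each value of x̄ is the mean of one student's four digits. The sketch below recomputes the means for the twenty rows of digits listed above; the variable names are illustrative.

```python
import statistics

# Last four SSN digits for the twenty students listed in the table.
rows = [
    (1, 8, 6, 4), (5, 3, 3, 6), (9, 8, 8, 8), (5, 1, 2, 5), (9, 3, 3, 5),
    (4, 2, 6, 2), (7, 7, 1, 6), (9, 1, 5, 4), (5, 3, 3, 9), (7, 3, 4, 2),
    (2, 7, 6, 6), (6, 7, 7, 1), (2, 3, 3, 9), (2, 4, 7, 5), (5, 4, 3, 7),
    (0, 4, 3, 8), (2, 5, 8, 6), (7, 1, 3, 4), (8, 3, 7, 0), (5, 6, 6, 7),
]

# One sample mean per student (sample size n = 4).
means = [sum(row) / len(row) for row in rows]

all_digits = [d for row in rows for d in row]
print("s.d. of individual digits:", round(statistics.stdev(all_digits), 2))
print("s.d. of sample means:     ", round(statistics.stdev(means), 2))
```

The sample means vary less than the individual digits, which is the pattern the next histogram illustrates.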
Distribution of 50 Sample Means
for 50 Students
[Figure: histogram of the 50 sample means; horizontal axis: sample mean (0 to 9); vertical axis: frequency]
Sampling Distribution Rule
As the sample size
increases, the sampling
distribution of sample
means approaches a
normal distribution.
Example: Given the population of women has normally
distributed weights with a mean of 143 lb and a standard
deviation of 29 lb,
a.) if one woman is randomly selected, the probability that
her weight is greater than 150 lb is 0.4052.
z = (150 - 143)/29 = 0.24
0.5 - 0.0948 = 0.4052 (Katch table)
1 - 0.5948 = 0.4052 (Utts table)
[Figure: normal curve with µ = 143 and σ = 29; x = 150 corresponds to z = 0.24, and the area between the mean and z = 0.24 is 0.0948]
Example: Given the population of women has normally
distributed weights with a mean of 143 lb and a standard
deviation of 29 lb, if 36 different women are randomly
selected, the probability that their mean weight is greater
than 150 lb is 0.0735.
\( z = \frac{150 - 143}{29/\sqrt{36}} = 1.45 \)
0.5 - 0.4265 = 0.0735
[Figure: normal curve with µx̄ = 143 and σx̄ = 4.83333; x̄ = 150 corresponds to z = 1.45, and the area between the mean and z = 1.45 is 0.4265]
Example: Given the population of women has normally
distributed weights with a mean of 143 lb and a standard
deviation of 29 lb,
a.) if one woman is randomly selected, find the probability that her
weight is greater than 150 lb.
P(x > 150) = 0.4052
b.) if 36 different women are randomly selected, find the probability
that their mean weight is greater than 150 lb.
P(x̄ > 150) = 0.0735
It is much easier for an individual to deviate from the mean than it is
for a group of 36 to deviate from the mean.
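Both probabilities can be computed without a table. This sketch uses the standard-library NormalDist; because it does not round z to two decimals first, its answers differ slightly from the table-based values 0.4052 and 0.0735.

```python
import math
from statistics import NormalDist

mu, sigma, n = 143.0, 29.0, 36

one_woman = NormalDist(mu, sigma)                    # distribution of one weight
mean_of_36 = NormalDist(mu, sigma / math.sqrt(n))    # distribution of the sample mean

p_individual = 1 - one_woman.cdf(150)     # P(x > 150)
p_sample_mean = 1 - mean_of_36.cdf(150)   # P(x-bar > 150)

print(f"P(x > 150)     = {p_individual:.4f}")
print(f"P(x-bar > 150) = {p_sample_mean:.4f}")
```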
Example: California Decco Winnings
California Decco lottery game: mean amount lost per
ticket over millions of tickets sold is µ = $0.35; standard
deviation σ = $29.67 => large variability in possible
amounts won/lost, from net win of $4999 to net loss of $1.
Suppose store sells 100,000 tickets in a year. CLT =>
distribution of possible sample mean loss per ticket
is approximately normal with …
mean (loss) = µ = $0.35 and s.d.(x̄) = \( \frac{\sigma}{\sqrt{n}} = \frac{\$29.67}{\sqrt{100000}} \approx \$0.09 \)
Empirical Rule: The mean loss is almost surely between
$0.08 and $0.62 => total loss for the 100,000 tickets is
likely between $8,000 and $62,000!
There are better ways to invest $100,000.
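The lottery figures follow from the same formula. In this sketch the standard deviation of the mean is rounded to the nearest cent before applying the Empirical Rule, which is an assumption made to match the rounded figures on the slide.

```python
import math

mu, sigma, n = 0.35, 29.67, 100_000   # mean loss, s.d. per ticket, tickets sold

sd_mean = sigma / math.sqrt(n)        # s.d. of the mean loss per ticket
sd_rounded = round(sd_mean, 2)        # rounded to the nearest cent, as on the slide

low = mu - 3 * sd_rounded
high = mu + 3 * sd_rounded
total_low, total_high = low * n, high * n

print(f"s.d. of mean loss = ${sd_mean:.4f}")
print(f"mean loss between ${low:.2f} and ${high:.2f}")
print(f"total loss between ${total_low:,.0f} and ${total_high:,.0f}")
```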
Mini Test
Assume that the population
of human body temp has a
mean = 98.6 (SD = 0.62). If a
sample of size n = 106 is
randomly selected, find the
probability of getting a mean
of 98.2 or lower.
Solution:
µx̄ = µ = 98.6 (by assumption)
σx̄ = σ/√n = 0.62/√106 = 0.0602197
z = (x̄ - µx̄)/σx̄ = (98.20 - 98.6)/0.0602197
z = -6.64
Look up on table to find P(z ≤ -6.64) < 0.0001
We conclude:
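The steps of the solution can be checked numerically. This sketch uses the standard-library NormalDist rather than a printed table, so it reports the tiny tail area directly instead of a table's smallest listed value.

```python
import math
from statistics import NormalDist

mu, sigma, n, x_bar = 98.6, 0.62, 106, 98.2

sigma_x_bar = sigma / math.sqrt(n)   # standard deviation of the sample mean
z = (x_bar - mu) / sigma_x_bar       # z score of the observed sample mean
p = NormalDist().cdf(z)              # P(sample mean is 98.2 or lower)

print(f"sigma of x-bar = {sigma_x_bar:.7f}")
print(f"z = {z:.2f}")
print(f"probability = {p:.2e}")
```

A sample mean this far below 98.6 would be extremely unusual if the population mean really were 98.6.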
Sampling Distribution for Any
Statistic
Every statistic has a sampling distribution,
but the appropriate distribution may not always
be normal, or even approximately bell-shaped.
Construct an approximate sampling distribution
for a statistic by actually taking repeated samples
of the same size from a population and constructing
a relative frequency histogram for the values of the
statistic over the many samples. This is not
always possible.
Student’s t-Distribution
For small sample sizes the CLT approximation
does not hold - the standardized statistics do
not exactly conform to the standard normal
distribution… so we use a different standard
distribution to approximate the sampling
distribution.
We use the t-distribution in place of the
normal distribution. The t-distribution has a bell
shape with mean = 0; its sd is slightly larger
than 1.0, but close.
Student t Distributions for
n = 3 and n = 12
Student t Distribution
If the distribution of a population is
essentially normal, then the distribution of
\( t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \)
is essentially a Student t Distribution for all
samples of size n, and is used to find critical
values denoted by \( t_{\alpha/2} \).
Student’s t-Distribution: Replacing σ with s
Dilemma: we generally don’t know σ (pop SD). Using s we
have:
\( t = \frac{\bar{x} - \mu}{\text{s.e.}(\bar{x})} = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{\sqrt{n}\,(\bar{x} - \mu)}{s} \)
If the sample size n is small,
this standardized statistic will
not have a N(0,1) distribution
but rather a t-distribution with
n – 1 degrees of freedom (df).
Degrees of Freedom (df )
Corresponds to the number of
sample values that can vary after
certain restrictions have been
imposed on all data values
df = n – 1
Using the Normal and t Dist
σ Not Known
Assumptions
1. The sample is a simple random sample.
2. Either the sample is from a normally
distributed population, or n > 30.
Use Student t distribution
Important Properties of the Student t Dist
1. The Student t distribution is different for different
sample sizes
2. The Student t distribution has the same general
symmetric bell shape as the normal distribution but it
reflects the greater variability (with wider
distributions) that is expected with small samples.
3. The Student t distribution has a mean of t = 0 (just as
the standard normal distribution has a mean of z = 0).
4. The standard deviation of the Student t distribution
varies with the sample size and is greater than 1 (unlike
the standard normal distribution, which has σ = 1).
5. As the sample size n gets larger, the Student t
distribution gets closer to the normal distribution.
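Properties 4 and 5 can be illustrated with the known closed form for the standard deviation of a t distribution, sqrt(df / (df - 2)), which applies only when df is greater than 2. The sample sizes below are arbitrary choices for illustration.

```python
import math

def t_standard_deviation(df: int) -> float:
    """Standard deviation of a Student t distribution with df > 2 degrees of freedom."""
    if df <= 2:
        raise ValueError("the standard deviation is only finite for df > 2")
    return math.sqrt(df / (df - 2))

# df = n - 1 for each illustrative sample size n.
for n in (12, 30, 100):
    df = n - 1
    print(f"n = {n:>3}, df = {df:>2}: s.d. = {t_standard_deviation(df):.4f}")
```

The values stay above 1 and move toward 1 as n grows, matching the standard normal distribution in the limit.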
Example. Standardized Mean Weights
Claim: mean weight loss is µ = 8 pounds.
Sample of n = 25 people gave a sample mean
weight loss of x̄ = 8.32 pounds and a sample
standard deviation of s = 4.74 pounds.
Is the sample mean weight loss of 8.32 pounds
reasonable to expect if µ = 8 pounds?
\( t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{8.32 - 8}{4.74/\sqrt{25}} = 0.34 \)
The sample mean of 8.32 is only about one-third
of a standard error above 8, which is consistent
with a population mean weight loss of 8 pounds.
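The standardized value in this example can be reproduced directly. A minimal sketch, with an illustrative function name:

```python
import math

def t_statistic(x_bar: float, mu: float, s: float, n: int) -> float:
    """Number of standard errors by which the sample mean differs from mu."""
    return (x_bar - mu) / (s / math.sqrt(n))

t = t_statistic(x_bar=8.32, mu=8.0, s=4.74, n=25)
print(round(t, 2))
```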
Statistical Inference [make conclusions
about populations based on samples]
• Confidence Intervals: uses sample data
to provide an interval of values that the researcher
is confident covers the true value for the
population.
• Hypothesis Testing or Significance Testing:
uses sample data to attempt to reject the
hypothesis that nothing interesting is happening,
i.e. to reject the notion that
chance alone can explain the sample results.