Random samples and estimation
Chapter 9: Random samples & sampling distributions
Samples and populations
χ², t, and F distributions
Chapter 10: Parameter estimation
Point estimation
Standard error of a statistic
Method of maximum likelihood
Method of moments
One-sample and two-sample confidence interval estimation
Foundation for understanding the next few chapters
ETM 620 - 09U
Ch. 9: Populations and samples
Population: “a group of individual persons, objects, or items
from which samples are taken for statistical measurement”
Sample: “a finite part of a statistical population whose properties
are studied to gain information about the whole”
(Merriam-Webster Online Dictionary, http://www.m-w.com/, October 5, 2004)
Examples
Population
Students pursuing graduate
engineering degrees
Cars capable of speeds in
excess of 160 mph.
Potato chips produced at the
Frito-Lay plant in Kathleen
Freshwater lakes and rivers
Samples
In general, (x1, x2, x3, …, xn) is a random sample of size n if:
the x’s are independent random variables
every x has the same probability distribution
Sampling distributions
If we conduct the same experiment several times with the same
sample size, the probability distribution of the resulting statistic is
called a sampling distribution
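Sampling distribution of the mean: if n observations are taken from a
normal population with mean μ and variance σ², then:
$$\mu_{\bar{x}} = \frac{\mu + \mu + \cdots + \mu}{n} = \mu$$
$$\sigma_{\bar{x}}^2 = \frac{\sigma^2 + \sigma^2 + \cdots + \sigma^2}{n^2} = \frac{\sigma^2}{n}$$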
An important consideration …
x̄ will be different for every sample
For example, suppose the time to complete a typical
homework problem, in minutes, is known to be uniformly
distributed between 5 and 25. Four people are asked to record
the time it takes them to complete each of 31 different
problems.
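A minimal sketch of this experiment, assuming the times are Uniform(5, 25) and each sample mean averages the 4 people on one problem. The seed and the number of repetitions are arbitrary choices for illustration, not part of the slides. It compares the textbook uniform-distribution formulas with a simulation of many sample means.

```python
import random
import statistics

# Assumed setup: times ~ Uniform(a, b) with a = 5 and b = 25 minutes,
# and each sample mean averages n = 4 people.
a, b, n = 5.0, 25.0, 4

pop_mean = (a + b) / 2          # mean of a uniform distribution
pop_var = (b - a) ** 2 / 12     # variance of a uniform distribution
mean_var = pop_var / n          # variance of the sample mean, sigma^2 / n

# Simulate many samples of size n; the seed is arbitrary (for repeatability).
rng = random.Random(620)
sample_means = [
    statistics.mean([rng.uniform(a, b) for _ in range(n)])
    for _ in range(10_000)
]
sim_mean = statistics.mean(sample_means)
sim_var = statistics.variance(sample_means)

print(f"population:        mean={pop_mean:.2f}, variance={pop_var:.2f}")
print(f"theory for x-bar:  mean={pop_mean:.2f}, variance={mean_var:.2f}")
print(f"simulated x-bar:   mean={sim_mean:.2f}, variance={sim_var:.2f}")
```

The simulated mean and variance of x̄ should land close to μ and σ²/n, which is the point of the slide: individual sample means vary, but in a predictable way.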
Individual data points
Problem #   Person 1   Person 2   Person 3   Person 4
1           12.64      7.01       16.93      22.98
2           22.69      24.17      5.29       13.15
3           22.26      7.77       9.90       5.91
4           5.65       8.28       9.39       5.34
5           10.70      11.86      16.07      12.15
6           12.44      12.11      23.21      14.32
7           13.52      11.08      24.51      21.13
8           24.82      10.13      24.03      6.07
9           19.10      21.33      24.45      14.33
10          11.00      20.00      12.03      20.51
11          6.49       8.97       6.28       12.17
12          14.74      15.22      12.47      24.72
13          5.81       9.61       5.10       23.52
14          7.01       10.13      20.51      18.59
15          21.18      19.49      6.70       7.65
16          20.12      17.53      8.47       13.10
17          16.05      19.23      16.10      8.62
18          24.41      18.74      15.58      20.93
19          21.11      10.24      8.56       22.34
20          7.30       6.19       20.23      19.77
21          24.73      23.51      23.08      15.90
22          15.02      18.50      14.80      7.92
23          5.76       20.93      18.43      19.63
24          16.69      8.04       22.84      12.56
25          9.01       9.12       11.68      11.50
26          11.00      21.04      18.92      10.43
27          23.08      5.78       19.18      14.07
28          15.33      10.13      10.83      21.04
29          20.78      18.52      20.11      23.97
30          17.39      19.44      24.36      12.37
31          22.01      16.14      22.46      13.82
[Figure: Histogram - Uniform Distribution]
μ = __________________
σ² = _________________
σ = __________________
Sample means
Problem #   Person 1   Person 2   Person 3   Person 4   Average
1           12.64      7.01       16.93      22.98      14.89
2           22.69      24.17      5.29       13.15      16.32
3           22.26      7.77       9.90       5.91       11.46
4           5.65       8.28       9.39       5.34       7.17
5           10.70      11.86      16.07      12.15      12.70
6           12.44      12.11      23.21      14.32      15.52
7           13.52      11.08      24.51      21.13      17.56
8           24.82      10.13      24.03      6.07       16.26
9           19.10      21.33      24.45      14.33      19.80
10          11.00      20.00      12.03      20.51      15.89
11          6.49       8.97       6.28       12.17      8.48
12          14.74      15.22      12.47      24.72      16.79
13          5.81       9.61       5.10       23.52      11.01
14          7.01       10.13      20.51      18.59      14.06
15          21.18      19.49      6.70       7.65       13.75
16          20.12      17.53      8.47       13.10      14.81
17          16.05      19.23      16.10      8.62       15.00
18          24.41      18.74      15.58      20.93      19.91
19          21.11      10.24      8.56       22.34      15.56
20          7.30       6.19       20.23      19.77      13.37
21          24.73      23.51      23.08      15.90      21.80
22          15.02      18.50      14.80      7.92       14.06
23          5.76       20.93      18.43      19.63      16.19
24          16.69      8.04       22.84      12.56      15.03
25          9.01       9.12       11.68      11.50      10.33
26          11.00      21.04      18.92      10.43      15.35
27          23.08      5.78       19.18      14.07      15.53
28          15.33      10.13      10.83      21.04      14.33
29          20.78      18.52      20.11      23.97      20.84
30          17.39      19.44      24.36      12.37      18.39
31          22.01      16.14      22.46      13.82      18.61
[Figure: Histogram - Sample Means]
μ_x̄ = __________________
σ²_x̄ = _________________
σ_x̄ = __________________
Central Limit Theorem
Given:
X̄: the mean of a random sample of size n taken from a population
with mean μ and finite variance σ²,
Then,
the limiting form of the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}, \quad n \to \infty,$$
is _________________________
Central Limit Theorem
If the population is known to be normal, the sampling
distribution of X̄ will follow a normal distribution.
Even when the distribution of the population is not normal,
the sampling distribution of X̄ is approximately normal when n is large.
NOTE: when n is not large, we cannot assume the distribution of X̄
is normal.
Sampling distribution of S²: χ²
Given:
Z1, Z2, …, Zk independent, normally distributed random variables, with mean
μ = 0 and standard deviation σ = 1.
Then,
$$\chi^2 = Z_1^2 + Z_2^2 + \cdots + Z_k^2$$
follows a χ² distribution with k degrees of freedom and
density function,
$$f(u) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, u^{(k/2)-1} e^{-u/2}, \quad u > 0.$$
(eq. 9-15, pg. 208)
μ = k
σ² = 2k
χ2 Distribution
χ²_α represents the χ² value above which we find an area of α, that is,
for which P(χ² > χ²_α) = α.
In Excel, =CHIDIST(x,degrees_freedom)
χ² is additive, so if Y = Σ χ²_i, then k_Y = Σ k_i
Sample variance:
$$\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$
Student’s t Distribution
If Z ~ N(0,1) and V is a chi-square random variable with k degrees
of freedom, independent of Z, then
$$T = \frac{Z}{\sqrt{V/k}}$$
follows a t-distribution with k degrees of freedom. The
probability density function is,
$$f(t) = \frac{\Gamma[(k+1)/2]}{\sqrt{\pi k}\,\Gamma(k/2)} \cdot \frac{1}{[(t^2/k)+1]^{(k+1)/2}}, \quad -\infty < t < \infty$$
t- Distribution
Example 9-7 shows that
$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$
follows a t distribution. In other words, the standardized sample mean follows t(n−1) when σ is not
known but is estimated by s.
In Excel, =TDIST(x,degrees_freedom,tails) gives the probability
associated with getting a value above x (tails = 1) or outside ±x (tails
= 2). =TINV(probability,degrees_freedom) gives the value associated with
a desired probability, α.
F-Distribution
Given:
S1² and S2², the variances of independent random samples of size n1
and n2 taken from normal populations with variances σ1² and σ2²,
respectively,
Then,
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$$
follows an F-distribution with ν1 = n1 − 1 and ν2 = n2 − 1 degrees of
freedom.
Table V, pp 605-609 gives F-values associated with given α values.
In Excel, =FDIST(x,degrees_freedom1,degrees_freedom2) gives
probability associated with a given x-value, while
=FINV(probability,degrees_freedom1,degrees_freedom2) gives F-value
associated with a given α.
Ch. 10: Parameter estimation
Example: Say we have 5 numbers from a random sample, as
follows:
19, 58, 31, 44, 43
x̄ = ____________________ is an estimate of μ
s² = _____________________ is an estimate of σ²
We want to use “good” estimators (unbiased, minimum error)
Unbiased, i.e. E(θ̂) = θ (e.g., E(x̄) = ___, and E(S²) = __)
Minimum error: for an unbiased estimator,
MSE(θ̂) = E(θ̂ − θ)² = Var(θ̂)
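As a quick check of the point estimates for the five numbers above, here is a small sketch using Python's standard `statistics` module. The variable names are illustrative only. Note that `statistics.variance` uses the n − 1 divisor, which is the unbiased sample variance s².

```python
import statistics

# The five observations from the slide.
data = [19, 58, 31, 44, 43]

x_bar = statistics.mean(data)      # point estimate of mu
s2 = statistics.variance(data)     # point estimate of sigma^2 (n - 1 divisor)
s = statistics.stdev(data)         # square root of s2

print(f"x-bar = {x_bar}, s^2 = {s2}, s = {s:.3f}")
```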
Finding good estimators
Method of maximum likelihood
take n random samples (x1, x2, x3, .., xn) from a distribution with
function f(x,θ)
Likelihood function, L(θ) = f(x1,θ) ∙ f(x2,θ) ∙ f(x3,θ) ∙ ∙ ∙ f(xn,θ)
Take the derivative of L(θ) (or, more conveniently, of ln L(θ)) with respect to θ and set it to 0.
See example 10-4, pg. 222
not always unbiased, but can be modified to make it so.
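To illustrate the maximum likelihood steps above, here is a short worked derivation for an exponential distribution, f(x; λ) = λe^{−λx}. This distribution is chosen only for illustration and is not necessarily the one used in example 10-4.

```latex
\begin{align*}
L(\lambda) &= \prod_{i=1}^{n} \lambda e^{-\lambda x_i}
            = \lambda^{n} \exp\Bigl(-\lambda \sum_{i=1}^{n} x_i\Bigr) \\
\ln L(\lambda) &= n \ln \lambda - \lambda \sum_{i=1}^{n} x_i \\
\frac{d \ln L(\lambda)}{d\lambda} &= \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0
  \quad\Longrightarrow\quad
  \hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}
\end{align*}
```

Working with ln L rather than L turns the product into a sum, which is usually easier to differentiate and gives the same maximizer.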
Method of moments
The first k moments about the origin of any function are
$$\mu'_t = E(X^t) = \int x^t f(x; \theta_1, \theta_2, \ldots, \theta_k)\,dx, \quad t = 1, 2, \ldots, k$$
Can produce good estimators, but sometimes not as good as MLE
(for example).
Interval estimation
(1 – α)100% confidence interval for the unknown parameter
For some parameter, θ (e.g., μ), we are looking for L and U such that
P{L < θ < U} = 1 – α
or
_______________
or
________________
Single sample: Estimating the mean
Given:
σ is known and X̄ is the mean of a random sample of size n,
Then,
the (1 – α)100% confidence interval for μ is given by
$$\bar{X} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{X} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$
Example: mean with known variance
A random sample of size 25 is taken from a normal distribution
with unknown mean and known variance of 4 (i.e., N(μ,4)). X̄
of the sample is determined to be 13.2. What is the 90%
confidence interval around the mean?
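One way to work this example numerically is sketched below. It uses `statistics.NormalDist` from the Python standard library to get z_{α/2} rather than a printed z-table. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the example: n = 25, sigma^2 = 4, x-bar = 13.2, 90% confidence.
n, sigma2, x_bar, conf = 25, 4.0, 13.2, 0.90

alpha = 1 - conf
z = NormalDist().inv_cdf(1 - alpha / 2)      # z_{alpha/2}, about 1.645
half_width = z * sqrt(sigma2) / sqrt(n)      # z_{alpha/2} * sigma / sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width

print(f"z = {z:.3f}, interval: {lower:.3f} < mu < {upper:.3f}")
```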
What does this mean?
Measure of the precision of the estimate
Length of the interval is a function of
confidence level
variance
sample size
Can vary n to decrease the length of the interval for the same
confidence level.
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
For our example, suppose we want an error of 0.25 or less.
Then,
n = ___________________________________________
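A small sketch of the sample-size calculation, assuming "our example" is the known-variance case (σ² = 4) at the same 90% confidence level. The result is rounded up because n must be a whole number of observations.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Assumed inputs: sigma^2 = 4 and 90% confidence from the earlier example, E = 0.25.
sigma = sqrt(4.0)
conf = 0.90
E = 0.25

z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # z_{alpha/2}
n_exact = (z * sigma / E) ** 2                 # n = (z_{alpha/2} * sigma / E)^2
n_required = ceil(n_exact)                     # round up to a whole observation

print(f"n = {n_exact:.2f}, so take at least {n_required} observations")
```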
What if σ² is unknown?
If n is sufficiently large (> _______), then the large sample confidence
interval is:
$$\bar{X} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$
Otherwise, must use the t-statistic …
Single sample estimate of the mean
(σ unknown, n not large)
Given:
σ is unknown and X̄ is the mean of a random sample of size n
(where n is not large),
Then,
the (1 – α)100% confidence interval for μ is given by
$$\bar{X} - t_{\alpha/2,\,n-1}\left(\frac{s}{\sqrt{n}}\right) < \mu < \bar{X} + t_{\alpha/2,\,n-1}\left(\frac{s}{\sqrt{n}}\right)$$
Example
A traffic engineer is concerned about the delays at an intersection near
a local school. The intersection is equipped with a fully actuated
(“demand”) traffic light and there have been complaints that traffic on
the main street is subject to unacceptable delays.
To develop a benchmark, the traffic engineer randomly samples 25
stop times (in seconds) on a weekend day. The average of these times
is found to be 13.2 seconds, and the sample variance, s², is found
to be 4 seconds².
Based on this data, what is the 95% confidence interval (C.I.) around
the mean stop time during a weekend day?
Example (cont.)
X̄ = ______________
s = _______________
α = ________________
α/2 = _____________
t0.025,24 = _____________
__________________ < μ < ___________________
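The blanks above can be checked with a short calculation. The Python standard library has no t-distribution quantile function, so this sketch hard-codes t_{0.025,24} = 2.064 as read from a standard t-table; that constant is an input assumption, not something the code computes.

```python
from math import sqrt

# Values from the traffic example: n = 25, x-bar = 13.2 s, s^2 = 4 s^2, 95% confidence.
n, x_bar, s2 = 25, 13.2, 4.0
s = sqrt(s2)

# t_{0.025, 24} taken from a standard t-table (hard-coded assumption).
t_crit = 2.064

half_width = t_crit * s / sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width

print(f"{lower:.3f} < mu < {upper:.3f}")
```

Compared with the known-variance interval, the t-based interval is wider because s is only an estimate of σ.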
C.I. on the variance
Given that
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
is ~ χ² with n−1 degrees of freedom,
then,
$$\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}$$
gives the 100(1-α)% two-sided confidence interval on the
variance.
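As an illustration only, the sketch below applies this interval to the traffic data from the earlier example (n = 25, s² = 4) at 95% confidence. Reusing that data here is an assumption made for the example. The two χ² critical values are hard-coded from a standard χ² table for 24 degrees of freedom, since the standard library cannot compute them.

```python
# Assumed inputs: n = 25 and s^2 = 4 from the traffic example, 95% confidence.
n, s2 = 25, 4.0

# Table values for 24 degrees of freedom (hard-coded assumptions):
chi2_upper_tail = 39.364   # chi^2_{0.025, 24}
chi2_lower_tail = 12.401   # chi^2_{0.975, 24}

lower = (n - 1) * s2 / chi2_upper_tail   # divide by the larger critical value
upper = (n - 1) * s2 / chi2_lower_tail   # divide by the smaller critical value

print(f"{lower:.3f} < sigma^2 < {upper:.3f}")
```

The interval is not symmetric about s² because the χ² distribution is skewed.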
Confidence interval on a proportion
The proportion, P, in a binomial experiment may be estimated
by
$$\hat{P} = \frac{X}{n}$$
where X is the number of successes in n trials.
For a sample, the point estimate of the parameter is
$$\hat{p} = \frac{x}{n}$$
The mean for the sample proportion is
$$\mu_{\hat{p}} = p$$
and the sample variance is
$$\sigma_{\hat{p}}^2 = \frac{pq}{n}$$
C.I. for proportions
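An approximate (1-α)100% confidence interval for p is:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
Large-sample C.I. for p1 – p2 is:
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$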
Interpretation: _______________________________
Example 10.17 (pg. 240)
n = 75, x = 12
p̂ = ____________
z0.025 = ________
Picture:
C.I.:
Interpretation: ____________________________________
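A minimal sketch of the calculation for these numbers, assuming a 95% confidence level (which is what z0.025 implies) and the approximate large-sample interval for a proportion. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example: x = 12 successes in n = 75 trials.
n, x = 75, 12
p_hat = x / n
q_hat = 1 - p_hat

z = NormalDist().inv_cdf(0.975)              # z_{0.025}, about 1.96
half_width = z * sqrt(p_hat * q_hat / n)
lower, upper = p_hat - half_width, p_hat + half_width

print(f"p-hat = {p_hat:.2f}, z = {z:.2f}")
print(f"{lower:.4f} < p < {upper:.4f}")
```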
Setting the sample size …
If the estimate of p from the initial sample seems
reliable, then
$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1 - \hat{p})$$
e.g., for our example if we want to be 95% confident that the
error in our estimate is less than 0.05, then
n = __________________
If we’re not at all sure how to estimate p, then assume p = 0.5
and use
$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 (0.25)$$
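Both sample-size formulas can be compared side by side. This sketch assumes the initial estimate p̂ = 12/75 from Example 10.17, 95% confidence, and E = 0.05. Results are rounded up to whole observations.

```python
from math import ceil
from statistics import NormalDist

# Assumed inputs: p-hat = 12/75 from Example 10.17, 95% confidence, error E = 0.05.
p_hat = 12 / 75
E = 0.05
z = NormalDist().inv_cdf(0.975)   # z_{alpha/2} for 95% confidence

# Using the initial estimate of p.
n_with_estimate = ceil((z / E) ** 2 * p_hat * (1 - p_hat))

# Conservative choice when p is unknown: p = 0.5 maximizes p(1 - p) at 0.25.
n_conservative = ceil((z / E) ** 2 * 0.25)

print(f"with p-hat = {p_hat:.2f}: n = {n_with_estimate}")
print(f"with p = 0.5:       n = {n_conservative}")
```

The conservative value is always at least as large, since p(1 − p) can never exceed 0.25.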
Example: comparing 2 proportions
Look at example 10-23, pg. 250
1. C.I. = (-0.07, 0.15), therefore no reason to believe there is a
significant decrease in the proportion of defectives using the new
process.
2. What if the interval were (+0.07, 0.15)?
3. What if the interval were (-0.9, -0.7)?
Difference in 2 means, both σ² known
Given two independent random samples, a point estimate of the
difference between μ1 and μ2 is given by the statistic
$$\bar{x}_1 - \bar{x}_2$$
We can build a confidence interval for μ1 - μ2 (given σ1² and σ2²
known) as follows:
$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
An example
A farm equipment manufacturer wants to compare the average
daily downtime of two sheet-metal stamping machines located in
two different factories. Investigation of company records for 100
randomly selected days on each of the two machines gave the
following results:
x̄1 = 12 minutes
σ1² = 12
x̄2 = 10 minutes
σ2² = 8
n1 = n2 = 100
Construct a 95% C.I. for μ1 – μ2
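A sketch of the two-sample z interval for this data, treating the stated variances as known population variances as the slide does. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the example (variances treated as known).
x1_bar, var1, n1 = 12.0, 12.0, 100
x2_bar, var2, n2 = 10.0, 8.0, 100

z = NormalDist().inv_cdf(0.975)                  # z_{alpha/2} for 95% confidence
std_err = sqrt(var1 / n1 + var2 / n2)            # standard error of x1-bar - x2-bar
diff = x1_bar - x2_bar
lower, upper = diff - z * std_err, diff + z * std_err

print(f"{lower:.3f} < mu1 - mu2 < {upper:.3f}")
```

Whether the interval contains 0 is what drives the interpretation: an interval entirely above 0 suggests machine 1 has the larger mean downtime.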
Solution
α/2 = _____________
Picture
z_____ = ____________
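$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$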
__________________ < μ1 – μ2 < _________________
Interpretation:
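Differences in 2 means, σ² unknown
Case 1: σ1² and σ2² unknown but equal
$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,\,n_1+n_2-2}\, S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,\,n_1+n_2-2}\, S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Where,
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$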
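Differences in 2 means, σ² unknown
Case 2: σ1² and σ2² unknown and not equal
$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Where,
$$\nu = \frac{(S_1^2/n_1 + S_2^2/n_2)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}$$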
Example, σ2 unknown
Suppose the farm equipment manufacturer was unable to
gather data for 100 days. Using the data they were able to
gather, they would still like to compare the downtime for the
two machines. The data they gathered is as follows:
x̄1 = 12 minutes
s1² = 12
n1 = 18
x̄2 = 10 minutes
s2² = 8
n2 = 14
Construct a 95% C.I. for μ1 – μ2 assuming:
1. σ1² and σ2² unknown but equal
2. σ1² and σ2² unknown and not equal
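The sketch below works Case 1 (pooled variance) for this data and also computes the degrees of freedom that Case 2 would need, using the n − 1 form of the approximation. The critical value t_{0.025,30} = 2.042 is hard-coded from a standard t-table because the standard library has no t quantile function. The Case 2 interval itself is left unsolved.

```python
from math import sqrt

# Values from the example.
x1_bar, s1_sq, n1 = 12.0, 12.0, 18
x2_bar, s2_sq, n2 = 10.0, 8.0, 14
diff = x1_bar - x2_bar

# Case 1: pooled variance with n1 + n2 - 2 = 30 degrees of freedom.
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
t_crit = 2.042   # t_{0.025, 30} from a standard t-table (hard-coded assumption)
half_width = t_crit * sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2)
lower, upper = diff - half_width, diff + half_width

# Case 2: approximate degrees of freedom (n - 1 form); round down before
# looking up the t-value.
a, b = s1_sq / n1, s2_sq / n2
nu = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

print(f"Case 1: Sp^2 = {sp_sq:.3f}, {lower:.3f} < mu1 - mu2 < {upper:.3f}")
print(f"Case 2: nu = {nu:.2f}")
```

With these smaller samples the Case 1 interval includes 0, unlike the result with 100 days of data per machine.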
Solution: Case 1
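x̄1 − x̄2 = _____________
Picture
t____ , ________ = ____________
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
= _____________________
$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,\,n_1+n_2-2}\, S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,\,n_1+n_2-2}\, S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$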
__________________ < μ1 – μ2 < _________________
Interpretation:
Your turn …
Solve Case 2 (assuming variances are not equal)
Paired Observations
Suppose we are evaluating observations that are not
independent …
For example, suppose a teacher wants to compare results of a
pretest and posttest administered to the same group of
students.
Paired-observation or Paired-sample test …
Example: murder rates in two consecutive years for several US
cities (see attached.) Construct a 90% confidence interval
around the difference in consecutive years.
Solution
Picture
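d̄ = ____________
tα/2, n-1 = _____________
$$s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$$
= _________
a (1-α)100% CI for μD is:
$$\bar{d} - t_{\alpha/2,\,n-1}\left(\frac{s_d}{\sqrt{n}}\right) < \mu_D < \bar{d} + t_{\alpha/2,\,n-1}\left(\frac{s_d}{\sqrt{n}}\right)$$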
__________________ < μ1 – μ2 < _________________
Interpretation:
C. I. for the ratio of two variances
If X1 and X2 are independent normal random variables with
unknown and unequal means and variances, then the confidence
interval on the ratio σ12/ σ22 is given by:
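$$\frac{S_1^2}{S_2^2}\, F_{1-\alpha/2,\,n_2-1,\,n_1-1} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2}\, F_{\alpha/2,\,n_2-1,\,n_1-1}$$
Note: for F-values not given in table V, recall that
$$F_{1-\alpha/2,\,n_2-1,\,n_1-1} = \frac{1}{F_{\alpha/2,\,n_1-1,\,n_2-1}}$$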
or use = FINV(probability,degrees_freedom1,degrees_freedom2)
Example 10-22
n1 = 12, s1 = 0.85
n2 = 15, s2 = 0.98
F____ , ____ , ____= ____________
Picture
F____ , ____ , ____= ____________
$$\frac{S_1^2}{S_2^2}\, F_{1-\alpha/2,\,n_2-1,\,n_1-1} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2}\, F_{\alpha/2,\,n_2-1,\,n_1-1}$$
__________________ < σ12/ σ22 < _________________
Interpretation: