Transcript document

Random samples and estimation
 Chapter 9: Random samples & sampling distributions
 Samples and populations
 χ², t, and F distributions
 Chapter 10: Parameter estimation
 Point estimation
 Standard error of a statistic
 Method of maximum likelihood
 Method of moments
 One-sample and two-sample confidence interval estimation
 Foundation for understanding the next few chapters
ETM 620 - 09U
Ch. 9: Populations and samples
 Population: “a group of individual persons, objects, or items
from which samples are taken for statistical measurement”
 Sample: “a finite part of a statistical population whose properties
are studied to gain information about the whole”
(Merriam-Webster Online Dictionary, http://www.m-w.com/, October 5, 2004)
Examples
Population
 Students pursuing graduate
engineering degrees
 Cars capable of speeds in
excess of 160 mph.
 Potato chips produced at the
Frito-Lay plant in Kathleen
 Freshwater lakes and rivers
Samples




 In general, (x1, x2, x3, …, xn) is a random sample of size n if:
 the x’s are independent random variables
 every x has the same probability distribution
Sampling distributions
 If we conduct the same experiment several times with the same
sample size, the probability distribution of the resulting statistic is
called a sampling distribution
 Sampling distribution of the mean: if n observations are taken from a
normal population with mean μ and variance σ², then:
$$\mu_{\bar{x}} = \frac{\mu + \mu + \cdots + \mu}{n} = \mu$$
$$\sigma^2_{\bar{x}} = \frac{\sigma^2 + \sigma^2 + \cdots + \sigma^2}{n^2} = \frac{\sigma^2}{n}$$
An important consideration …

 x̄ will be different for every sample
 For example, suppose the time to complete a typical homework
problem, in minutes, is known to be uniformly distributed between
5 and 25. Four people are asked to record the time it takes them
to complete each of 31 different problems.
Individual data points
Problem #   Person 1   Person 2   Person 3   Person 4
1           12.64      7.01       16.93      22.98
2           22.69      24.17      5.29       13.15
3           22.26      7.77       9.90       5.91
4           5.65       8.28       9.39       5.34
5           10.70      11.86      16.07      12.15
6           12.44      12.11      23.21      14.32
7           13.52      11.08      24.51      21.13
8           24.82      10.13      24.03      6.07
9           19.10      21.33      24.45      14.33
10          11.00      20.00      12.03      20.51
11          6.49       8.97       6.28       12.17
12          14.74      15.22      12.47      24.72
13          5.81       9.61       5.10       23.52
14          7.01       10.13      20.51      18.59
15          21.18      19.49      6.70       7.65
16          20.12      17.53      8.47       13.10
17          16.05      19.23      16.10      8.62
18          24.41      18.74      15.58      20.93
19          21.11      10.24      8.56       22.34
20          7.30       6.19       20.23      19.77
21          24.73      23.51      23.08      15.90
22          15.02      18.50      14.80      7.92
23          5.76       20.93      18.43      19.63
24          16.69      8.04       22.84      12.56
25          9.01       9.12       11.68      11.50
26          11.00      21.04      18.92      10.43
27          23.08      5.78       19.18      14.07
28          15.33      10.13      10.83      21.04
29          20.78      18.52      20.11      23.97
30          17.39      19.44      24.36      12.37
31          22.01      16.14      22.46      13.82
[Figure: Histogram - Uniform Distribution]
 μ = __________________
 σ² = _________________
 σ = __________________
Sample means
Problem #   Person 1   Person 2   Person 3   Person 4   Average
1           12.64      7.01       16.93      22.98      14.89
2           22.69      24.17      5.29       13.15      16.32
3           22.26      7.77       9.90       5.91       11.46
4           5.65       8.28       9.39       5.34       7.17
5           10.70      11.86      16.07      12.15      12.70
6           12.44      12.11      23.21      14.32      15.52
7           13.52      11.08      24.51      21.13      17.56
8           24.82      10.13      24.03      6.07       16.26
9           19.10      21.33      24.45      14.33      19.80
10          11.00      20.00      12.03      20.51      15.89
11          6.49       8.97       6.28       12.17      8.48
12          14.74      15.22      12.47      24.72      16.79
13          5.81       9.61       5.10       23.52      11.01
14          7.01       10.13      20.51      18.59      14.06
15          21.18      19.49      6.70       7.65       13.75
16          20.12      17.53      8.47       13.10      14.81
17          16.05      19.23      16.10      8.62       15.00
18          24.41      18.74      15.58      20.93      19.91
19          21.11      10.24      8.56       22.34      15.56
20          7.30       6.19       20.23      19.77      13.37
21          24.73      23.51      23.08      15.90      21.80
22          15.02      18.50      14.80      7.92       14.06
23          5.76       20.93      18.43      19.63      16.19
24          16.69      8.04       22.84      12.56      15.03
25          9.01       9.12       11.68      11.50      10.33
26          11.00      21.04      18.92      10.43      15.35
27          23.08      5.78       19.18      14.07      15.53
28          15.33      10.13      10.83      21.04      14.33
29          20.78      18.52      20.11      23.97      20.84
30          17.39      19.44      24.36      12.37      18.39
31          22.01      16.14      22.46      13.82      18.61
[Figure: Histogram - Sample Means]
 μ_x̄ = __________________
 σ²_x̄ = _________________
 σ_x̄ = __________________
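The homework-time example can be checked numerically. The following is a minimal sketch, assuming the Uniform(5, 25) population stated on the slide and a sample size of n = 4 (one time per person for each problem). The function name `simulate_means`, the seed, and the number of replications are illustrative choices, not part of the course material.

```python
import random
import statistics

A, B = 5.0, 25.0   # population: Uniform(5, 25) minutes, as stated on the slide
N = 4              # four people record each problem, so each mean uses n = 4

# Closed-form moments of a Uniform(a, b) population.
pop_mean = (A + B) / 2
pop_var = (B - A) ** 2 / 12
pop_sd = pop_var ** 0.5

# Moments of the sample mean of n independent observations.
xbar_mean = pop_mean
xbar_var = pop_var / N
xbar_sd = xbar_var ** 0.5

def simulate_means(reps, n, seed=620):
    """Return `reps` simulated sample means, each from n Uniform(A, B) draws."""
    rng = random.Random(seed)
    return [sum(rng.uniform(A, B) for _ in range(n)) / n for _ in range(reps)]

means = simulate_means(reps=20_000, n=N)
print(f"population:  mean={pop_mean:.2f}  var={pop_var:.2f}  sd={pop_sd:.2f}")
print(f"theory xbar: mean={xbar_mean:.2f}  var={xbar_var:.2f}  sd={xbar_sd:.2f}")
print(f"simulated:   mean={statistics.fmean(means):.2f}  "
      f"var={statistics.variance(means):.2f}")
```

The simulated mean and variance of x̄ should land close to μ and σ²/n, which is the point of comparing the two histograms.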
Central Limit Theorem
 Given:
 X̄ : the mean of a random sample of size n taken from a population
with mean μ and finite variance σ²,
 Then,
 the limiting form of the distribution of
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}, \quad n \to \infty,$$
is _________________________
Central Limit Theorem
 If the population is known to be normal, the sampling
distribution of X̄ will follow a normal distribution.
 Even when the distribution of the population is not normal,
the sampling distribution of X̄ is approximately normal when n is large.
 NOTE: when n is not large, we cannot assume the distribution of X̄
is normal (unless the population itself is normal).
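A short simulation can illustrate the effect of sample size. This is only a sketch, assuming an Exponential(1) population as an arbitrary example of a non-normal distribution (mean 1, standard deviation 1). The function names, seed, and replication count are hypothetical.

```python
import math
import random

def standardized_means(reps, n, seed=1):
    """Draw `reps` samples of size n from Exponential(1) and standardize each mean."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0   # mean and standard deviation of Exponential(rate=1)
    z = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        z.append((xbar - mu) / (sigma / math.sqrt(n)))
    return z

def share_within(z, bound=1.96):
    """Fraction of standardized means with |Z| <= bound (about 0.95 if Z is N(0,1))."""
    return sum(abs(v) <= bound for v in z) / len(z)

for n in (2, 10, 50):
    print(n, round(share_within(standardized_means(5000, n)), 3))
```

For large n the share should settle near 0.95, the value expected for a standard normal variable.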
Sampling distribution of S²: χ²
 Given:
 Z1, Z2, … , Zk independent, normally distributed random variables, with
mean μ = 0 and standard deviation σ = 1.
 Then,
$$\chi^2 = Z_1^2 + Z_2^2 + \cdots + Z_k^2$$
follows a χ² distribution with k degrees of freedom and
density function,
$$f(u) = \frac{1}{2^{k/2}\,\Gamma(k/2)}\, u^{(k/2)-1} e^{-u/2}, \quad u > 0.$$
(eq. 9-15, pg. 208)
 μ = k
 σ² = 2k
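The stated mean k and variance 2k can be checked by integrating the χ² density numerically. The sketch below assumes the standard χ² density with k degrees of freedom and uses a simple midpoint rule. The helper names and the choice k = 5 are illustrative.

```python
import math

def chi2_pdf(u, k):
    """Chi-square density with k degrees of freedom (zero for u <= 0)."""
    if u <= 0:
        return 0.0
    return u ** (k / 2 - 1) * math.exp(-u / 2) / (2 ** (k / 2) * math.gamma(k / 2))

def integrate(f, a, b, steps=20_000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

k = 5
# The upper limit 100 stands in for infinity; the tail beyond it is negligible for k = 5.
area = integrate(lambda u: chi2_pdf(u, k), 0, 100)
mean = integrate(lambda u: u * chi2_pdf(u, k), 0, 100)
var = integrate(lambda u: (u - k) ** 2 * chi2_pdf(u, k), 0, 100)
print(f"area={area:.4f}  mean={mean:.3f}  variance={var:.3f}")
```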
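χ² Distribution
 χ²_α represents the χ² value above which we find an area of α, that is,
for which P(χ² > χ²_α) = α.
 In Excel, =CHIDIST(x,degrees_freedom) gives the area to the right of x.
 χ² is additive, so if Y = ∑ χ²_i, where the χ²_i are independent with k_i degrees of freedom each, then Y is χ² with k_Y = ∑ k_i degrees of freedom
 Sample variance:
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$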
Student’s t Distribution
 If Z ~ N(0,1) and V is a chi-square random variable with k degrees
of freedom, independent of Z, then
$$T = \frac{Z}{\sqrt{V/k}}$$
follows a t-distribution with k degrees of freedom. The
probability density function is,
$$f(t) = \frac{\Gamma\left(\frac{k+1}{2}\right)}{\sqrt{\pi k}\,\Gamma(k/2)} \cdot \frac{1}{\left(\frac{t^2}{k} + 1\right)^{(k+1)/2}}, \quad -\infty < t < \infty$$
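One way to get a feel for the t density is to integrate it numerically and compare tail areas with a t-table. This is a minimal sketch assuming the standard Student's t density. The helper names are illustrative, and the table values 2.064 (24 degrees of freedom) and 2.228 (10 degrees of freedom) are the usual upper 2.5% points.

```python
import math

def t_pdf(t, k):
    """Density of Student's t with k degrees of freedom."""
    const = math.gamma((k + 1) / 2) / (math.sqrt(math.pi * k) * math.gamma(k / 2))
    return const / (t * t / k + 1) ** ((k + 1) / 2)

def upper_tail(x, k, steps=20_000):
    """P(T > x) for x >= 0, using symmetry: 0.5 minus the area on [0, x]."""
    h = x / steps
    area = h * sum(t_pdf((i + 0.5) * h, k) for i in range(steps))  # midpoint rule
    return 0.5 - area

# Both tail areas should be close to 0.025 if the table values are used.
print(upper_tail(2.064, 24))
print(upper_tail(2.228, 10))
```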
t- Distribution
 Example 9-7 shows that
$$T = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
follows a t distribution with n − 1 degrees of freedom. In other words, the
standardized sample mean is ~ t(n−1) when σ is not known but is estimated by s.
 In Excel, =TDIST(x,degrees_freedom,tails) gives the probability
associated with getting a value above x (tails = 1) or outside ±x (tails
= 2). =TINV(probability,degrees_freedom) gives the t-value associated with
a desired two-tailed probability, α.
F-Distribution
 Given:
 S1² and S2², the variances of independent random samples of size n1
and n2 taken from normal populations with variances σ1² and σ2²,
respectively,
 Then,
$$F = \frac{S_1^2 / \sigma_1^2}{S_2^2 / \sigma_2^2} = \frac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$$
follows an F-distribution with ν1 = n1 − 1 and ν2 = n2 − 1 degrees of
freedom.
 Table V, pp 605-609 gives F-values associated with given α values.
 In Excel, =FDIST(x,degrees_freedom1,degrees_freedom2) gives the right-tail
probability associated with a given x-value, while
=FINV(probability,degrees_freedom1,degrees_freedom2) gives the F-value
associated with a given α.
Ch. 10: Parameter estimation
 Example: Say we have 5 numbers from a random sample, as
follows:
19, 58, 31, 44, 43
 x̄ = ____________________ is an estimate of μ
 s² = _____________________ is an estimate of σ²
 We want to use “good” estimators (unbiased, minimum error)
 Unbiased, i.e. E(θ̂) = θ (e.g., E(x̄) = ___, and E(S²) = __)
 Minimum error,
 MSE(θ̂) = E[(θ̂ − θ)²] = Var(θ̂) + [E(θ̂) − θ]², which reduces to Var(θ̂) when θ̂ is unbiased
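The two point estimates for the five-number sample can be computed directly. The sketch below uses Python's `statistics` module, whose `variance` function divides by n − 1 (the unbiased form of s²). The population-style variance, which divides by n, is shown only for contrast.

```python
import statistics

data = [19, 58, 31, 44, 43]           # the sample from the slide

x_bar = statistics.mean(data)         # point estimate of the population mean
s2 = statistics.variance(data)        # sample variance, divides by n - 1
s = statistics.stdev(data)            # sample standard deviation
s2_biased = statistics.pvariance(data)  # divides by n; biased as an estimator of sigma^2

print(f"x_bar = {x_bar}, s^2 = {s2}, s = {s:.3f}, divide-by-n variance = {s2_biased}")
```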
Finding good estimators
 Method of maximum likelihood
 take a random sample of n observations (x1, x2, x3, …, xn) from a distribution with
density function f(x; θ)
 Likelihood function, L(θ) = f(x1; θ) ∙ f(x2; θ) ∙ f(x3; θ) ∙ ∙ ∙ f(xn; θ)
 Take the derivative of L(θ), or more conveniently of ln L(θ), with respect to θ and set it to 0.
 See example 10-4, pg. 222
 not always unbiased, but can be modified to make it so.
 Method of moments
 The first k moments about the origin of a distribution f(x; θ1, θ2, ..., θk) are
$$\mu'_t = E(X^t) = \int_{-\infty}^{\infty} x^t f(x; \theta_1, \theta_2, \ldots, \theta_k)\,dx, \quad t = 1, 2, \ldots, k$$
 Equate these to the corresponding sample moments and solve for θ1, θ2, ..., θk
 Can produce good estimators, but sometimes not as good as MLE
(for example).
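Both methods can be illustrated on a simple case. The sketch below assumes an exponential density f(x; λ) = λe^(−λx), which is not the example from the textbook, and uses made-up observations. It compares the closed-form maximum likelihood estimate (from setting the derivative of ln L to zero) with a brute-force grid search and with the method-of-moments estimate.

```python
import math

def log_likelihood(lam, xs):
    """ln L(lam) for the assumed exponential density f(x; lam) = lam * exp(-lam * x)."""
    return len(xs) * math.log(lam) - lam * sum(xs)

xs = [2.0, 0.5, 1.5, 3.0, 1.0]   # hypothetical observations, not from the slides

# Setting d/d(lam) ln L = n/lam - sum(x) = 0 gives lam_hat = n / sum(x).
mle_closed_form = len(xs) / sum(xs)

# Brute-force check: evaluate ln L on a grid and keep the maximizer.
grid = [i / 1000 for i in range(1, 5001)]
mle_grid = max(grid, key=lambda lam: log_likelihood(lam, xs))

# Method of moments: set E(X) = 1/lam equal to the sample mean and solve for lam.
sample_mean = sum(xs) / len(xs)
mom_estimate = 1 / sample_mean

print(mle_closed_form, mle_grid, mom_estimate)
```

For this particular density the two methods agree; that is not true in general.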
Interval estimation
 (1 – α)100% confidence interval for the unknown parameter
 For some parameter θ (e.g., μ), we are looking for L and U such that
P{L < θ < U} = 1 – α
or
_______________
or
________________
Single sample: Estimating the mean
 Given:
 σ is known and X̄ is the mean of a random sample of size n,
 Then,
 the (1 – α)100% confidence interval for μ is given by
$$\bar{X} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{X} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$
Example: mean with known variance
A random sample of size 25 is taken from a normal distribution
with unknown mean and known variance of 4 (i.e., N(μ,4)). The sample
mean, X̄, is determined to be 13.2. What is the 90%
confidence interval around the mean?
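A quick way to check a hand calculation for this example is sketched below. It assumes the known-σ interval x̄ ± z(α/2)·σ/√n and uses `statistics.NormalDist` from the Python standard library in place of a z-table. The function name `z_interval` is illustrative.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Values from the example: n = 25, variance 4 (sigma = 2), x_bar = 13.2, 90% confidence.
low, high = z_interval(x_bar=13.2, sigma=math.sqrt(4), n=25, confidence=0.90)
print(f"{low:.3f} < mu < {high:.3f}")
```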
What does this mean?
 Measure of the precision of the estimate
 Length of the interval is a function of
 confidence level
 variance
 sample size
 Can vary n to decrease the length of the interval for the same
confidence level. For a maximum error E,
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$$
 For our example, suppose we want an error of 0.25 or less.
Then,
n = ___________________________________________
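The sample-size formula can be evaluated as follows. This sketch assumes "our example" refers to the earlier known-variance example (σ = 2, 90% confidence) and rounds up to the next whole observation. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, error, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / error) ** 2)

# Assumed inputs: sigma = 2 and 90% confidence from the earlier example, E = 0.25.
n_required = sample_size_for_mean(sigma=2.0, error=0.25, confidence=0.90)
print(n_required)
```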
What if σ² is unknown?
 If n is sufficiently large (> _______), then the large sample confidence
interval is:
$$\bar{X} \pm z_{\alpha/2}\left(\frac{s}{\sqrt{n}}\right)$$
 Otherwise, must use the t-statistic …
Single sample estimate of the mean
(σ unknown, n not large)
 Given:
 σ is unknown and X̄ is the mean of a random sample of size n
(where n is not large),
 Then,
 the (1 – α)100% confidence interval for μ is given by
$$\bar{X} - t_{\alpha/2,\,n-1}\left(\frac{s}{\sqrt{n}}\right) < \mu < \bar{X} + t_{\alpha/2,\,n-1}\left(\frac{s}{\sqrt{n}}\right)$$
Example
A traffic engineer is concerned about the delays at an intersection near
a local school. The intersection is equipped with a fully actuated
(“demand”) traffic light and there have been complaints that traffic on
the main street is subject to unacceptable delays.
To develop a benchmark, the traffic engineer randomly samples 25
stop times (in seconds) on a weekend day. The average of these times
is found to be 13.2 seconds, and the sample variance, s², is found
to be 4 seconds².
Based on this data, what is the 95% confidence interval (C.I.) around
the mean stop time during a weekend day?
Example (cont.)
x̄ = ______________
s = _______________
α = ________________
α/2 = _____________
t0.025,24 = _____________
__________________ < μ < ___________________
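The traffic example can be worked through in a few lines. This is a sketch that assumes the critical value 2.064 for t with 24 degrees of freedom and an upper-tail area of 0.025, as read from a standard t-table. The constant and function names are illustrative.

```python
import math

T_CRIT = 2.064   # t_{0.025, 24} from a standard t-table (assumed input)

def t_interval(x_bar, s, n, t_crit):
    """Two-sided confidence interval for the mean when sigma is estimated by s."""
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Values from the example: n = 25 stop times, x_bar = 13.2 s, s^2 = 4 s^2.
low, high = t_interval(x_bar=13.2, s=math.sqrt(4), n=25, t_crit=T_CRIT)
print(f"{low:.4f} < mu < {high:.4f}")
```

Comparing this interval with a z-based one for the same data shows the extra width that comes from estimating σ.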
C.I. on the variance
 Given that
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
is ~ χ² with n−1 degrees of freedom,
 then,
$$\frac{(n-1)S^2}{\chi^2_{\alpha/2,\,n-1}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{1-\alpha/2,\,n-1}}$$
gives the 100(1−α)% two-sided confidence interval on the
variance.
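As an illustration, the interval can be evaluated for the traffic data used earlier (n = 25, s² = 4) at 95% confidence. Applying the formula to that data set is an assumption made here for demonstration. The χ² critical values for 24 degrees of freedom are taken from a standard table.

```python
CHI2_UPPER = 39.364   # chi-square_{0.025, 24}: area 0.025 to the right (table value)
CHI2_LOWER = 12.401   # chi-square_{0.975, 24}: area 0.975 to the right (table value)

def variance_interval(s2, n, chi2_upper, chi2_lower):
    """Two-sided confidence interval for sigma^2 from a normal sample."""
    return (n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower

var_low, var_high = variance_interval(s2=4.0, n=25,
                                      chi2_upper=CHI2_UPPER, chi2_lower=CHI2_LOWER)
print(f"{var_low:.3f} < sigma^2 < {var_high:.3f}")
```

Note that the larger critical value goes in the denominator of the lower limit, so the interval is not symmetric about s².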
Confidence interval on a proportion
 The proportion, P, in a binomial experiment may be estimated
by
$$\hat{P} = \frac{X}{n}$$
where X is the number of successes in n trials.
 For a sample, the point estimate of the parameter is
$$\hat{p} = \frac{x}{n}$$
 The mean of the sample proportion is $\mu_{\hat{p}} = p$
and its variance is
$$\sigma^2_{\hat{p}} = \frac{pq}{n}, \quad q = 1 - p$$
C.I. for proportions
 An approximate (1-α)100% confidence interval for p is:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
 Large-sample C.I. for p1 – p2 is:
$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1\hat{q}_1}{n_1} + \frac{\hat{p}_2\hat{q}_2}{n_2}}$$
Interpretation: _______________________________
Example 10.17 (pg. 240)
n = 75, x = 12
p̂ = ____________
z0.025 = ________
Picture:
C.I.:
Interpretation: ____________________________________
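The numbers on this slide (n = 75, x = 12, z0.025) suggest a 95% interval. The sketch below assumes that confidence level and the large-sample interval p̂ ± z(α/2)·√(p̂q̂/n). The function name is illustrative.

```python
import math
from statistics import NormalDist

def proportion_interval(x, n, confidence):
    """Approximate large-sample confidence interval for a binomial proportion."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - half_width, p_hat + half_width

p_hat, p_low, p_high = proportion_interval(x=12, n=75, confidence=0.95)
print(f"p_hat = {p_hat:.2f}, {p_low:.4f} < p < {p_high:.4f}")
```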
Setting the sample size …
 If the initial estimate of p seems reasonably
reliable, then
$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 \hat{p}(1 - \hat{p})$$
e.g., for our example if we want to be 95% confident that the
error in our estimate is less than 0.05, then
n = __________________
 If we’re not at all sure how to estimate p, then assume p = 0.5
and use
$$n = \left(\frac{z_{\alpha/2}}{E}\right)^2 (0.25)$$
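Both sample-size formulas can be compared side by side. This sketch assumes "our example" is the one with n = 75 and x = 12 (so p̂ = 0.16) and rounds up to the next whole observation. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(error, confidence, p_hat=0.5):
    """Smallest n with z_{alpha/2} * sqrt(p(1-p)/n) <= error; p = 0.5 is the worst case."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z / error) ** 2 * p_hat * (1 - p_hat))

n_with_estimate = sample_size_for_proportion(0.05, 0.95, p_hat=12 / 75)
n_worst_case = sample_size_for_proportion(0.05, 0.95)
print(n_with_estimate, n_worst_case)
```

The worst-case figure is larger because p(1 − p) is maximized at p = 0.5.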
Example: comparing 2 proportions
Look at example 10-23, pg. 250
1. C.I. = (-0.07, 0.15), therefore no reason to believe there is a
significant decrease in the proportion of defectives using the new
process.
2. What if the interval were (+0.07, 0.15)?
3. What if the interval were (-0.9, -0.7)?
Difference in 2 means, both σ² known
 Given two independent random samples, a point estimate of the
difference between μ1 and μ2 is given by the statistic
$$\bar{x}_1 - \bar{x}_2$$
We can build a confidence interval for μ1 − μ2 (given σ1² and σ2²
known) as follows:
$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
An example
A farm equipment manufacturer wants to compare the average
daily downtime of two sheet-metal stamping machines located in
two different factories. Investigation of company records for 100
randomly selected days on each of the two machines gave the
following results:
x̄1 = 12 minutes, σ1² = 12
x̄2 = 10 minutes, σ2² = 8
n1 = n2 = 100
Construct a 95% C.I. for μ1 – μ2
Solution
α/2 = _____________
Picture
z_____ = ____________
$$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
__________________ < μ1 – μ2 < _________________
Interpretation:
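The stamping-machine example can be checked with a few lines of code. This is a sketch assuming the known-variance interval for μ1 − μ2 and the example's values (x̄1 = 12, x̄2 = 10, σ1² = 12, σ2² = 8, n1 = n2 = 100). The function name is illustrative.

```python
import math
from statistics import NormalDist

def two_sample_z_interval(x1, x2, var1, var2, n1, n2, confidence):
    """Confidence interval for mu1 - mu2 when both population variances are known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * math.sqrt(var1 / n1 + var2 / n2)
    diff = x1 - x2
    return diff - half_width, diff + half_width

diff_low, diff_high = two_sample_z_interval(12, 10, 12, 8, 100, 100, 0.95)
print(f"{diff_low:.4f} < mu1 - mu2 < {diff_high:.4f}")
```

Whether the interval contains 0 is the key to the interpretation asked for on the slide.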
Differences in 2 means, σ² unknown
 Case 1: σ1² and σ2² unknown but equal
$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,\,n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,\,n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
Where,
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
Differences in 2 means, σ² unknown
 Case 2: σ1² and σ2² unknown and not equal
$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,\,\nu}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
Where,
$$\nu = \frac{\left(S_1^2/n_1 + S_2^2/n_2\right)^2}{\dfrac{\left(S_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(S_2^2/n_2\right)^2}{n_2 - 1}}$$
Example, σ² unknown
Suppose the farm equipment manufacturer was unable to
gather data for 100 days. Using the data they were able to
gather, they would still like to compare the downtime for the
two machines. The data they gathered is as follows:
x̄1 = 12 minutes, s1² = 12, n1 = 18
x̄2 = 10 minutes, s2² = 8, n2 = 14
Construct a 95% C.I. for μ1 – μ2 assuming:
1. σ1² and σ2² unknown but equal
2. σ1² and σ2² unknown and not equal
Solution: Case 1
x̄1 − x̄2 = _____________
Picture
t____ , ________ = ____________
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$ = _____________________
$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,\,n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,\,n_1+n_2-2}\, S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
__________________ < μ1 – μ2 < _________________
Interpretation:
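The Case 1 (equal variances) calculation can be sketched as follows, using the small-sample data from the example (s1² = 12, n1 = 18, s2² = 8, n2 = 14). The critical value 2.042 for t with 30 degrees of freedom and an upper-tail area of 0.025 is taken from a standard t-table. The function and constant names are illustrative.

```python
import math

T_CRIT_30 = 2.042   # t_{0.025, 30} from a standard t-table (n1 + n2 - 2 = 30)

def pooled_t_interval(x1, x2, s1_sq, s2_sq, n1, n2, t_crit):
    """CI for mu1 - mu2 assuming equal but unknown population variances."""
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)   # pooled variance
    half_width = t_crit * math.sqrt(sp_sq) * math.sqrt(1 / n1 + 1 / n2)
    diff = x1 - x2
    return sp_sq, diff - half_width, diff + half_width

sp_sq, pooled_low, pooled_high = pooled_t_interval(12, 10, 12, 8, 18, 14, T_CRIT_30)
print(f"Sp^2 = {sp_sq:.4f}, {pooled_low:.4f} < mu1 - mu2 < {pooled_high:.4f}")
```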
Your turn …
 Solve Case 2 (assuming variances are not equal)
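A hand solution can be checked against the sketch below. It assumes the common Welch-Satterthwaite form of the degrees of freedom (with n − 1 in each denominator term); some textbooks use a slightly different variant, so this may not match the course text exactly. The degrees of freedom are rounded down, and the critical value 2.045 for 29 degrees of freedom is taken from a standard t-table. The names are illustrative.

```python
import math

def welch_df(s1_sq, s2_sq, n1, n2):
    """Approximate degrees of freedom (Welch-Satterthwaite form, assumed here)."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

def unequal_var_interval(x1, x2, s1_sq, s2_sq, n1, n2, t_crit):
    """CI for mu1 - mu2 when the population variances are unknown and not assumed equal."""
    half_width = t_crit * math.sqrt(s1_sq / n1 + s2_sq / n2)
    diff = x1 - x2
    return diff - half_width, diff + half_width

nu = welch_df(12, 8, 18, 14)
T_CRIT_29 = 2.045   # t_{0.025, 29} from a standard t-table, using nu rounded down to 29
welch_low, welch_high = unequal_var_interval(12, 10, 12, 8, 18, 14, T_CRIT_29)
print(f"nu = {nu:.2f}, {welch_low:.4f} < mu1 - mu2 < {welch_high:.4f}")
```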
Paired Observations
 Suppose we are evaluating observations that are not
independent …
For example, suppose a teacher wants to compare results of a
pretest and posttest administered to the same group of
students.
 Paired-observation or Paired-sample test …
Example: murder rates in two consecutive years for several US
cities (see attached.) Construct a 90% confidence interval
around the difference in consecutive years.
Solution
Picture
d̄ = ____________
tα/2, n-1 = _____________
$$s_d = \sqrt{\frac{\sum (d_i - \bar{d})^2}{n - 1}}$$ = _________
a (1−α)100% CI for μD is:
$$\bar{d} - t_{\alpha/2,\,n-1}\left(\frac{s_d}{\sqrt{n}}\right) < \mu_D < \bar{d} + t_{\alpha/2,\,n-1}\left(\frac{s_d}{\sqrt{n}}\right)$$
__________________ < μ1 – μ2 < _________________
Interpretation:
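The mechanics of a paired-sample interval are sketched below. The murder-rate data are in an attachment that is not reproduced here, so the two lists are made-up placeholder numbers used only to show the steps. The critical value 2.132 for t with 4 degrees of freedom and an upper-tail area of 0.05 is from a standard t-table, giving a 90% interval.

```python
import math
import statistics

# Hypothetical paired measurements (NOT the murder-rate data from the attachment).
year_1 = [10, 12, 9, 14, 11]
year_2 = [12, 15, 10, 18, 12]

T_CRIT_4 = 2.132   # t_{0.05, 4} from a standard t-table (n - 1 = 4, 90% two-sided)

# Work with the within-pair differences, which are treated as one sample.
d = [b - a for a, b in zip(year_1, year_2)]
d_bar = statistics.mean(d)
s_d = statistics.stdev(d)          # divides by n - 1
half_width = T_CRIT_4 * s_d / math.sqrt(len(d))
paired_low, paired_high = d_bar - half_width, d_bar + half_width

print(f"d_bar = {d_bar}, s_d = {s_d:.4f}, {paired_low:.4f} < mu_D < {paired_high:.4f}")
```

Reducing each pair to a single difference is what removes the dependence between the two columns.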
C. I. for the ratio of two variances
 If X1 and X2 are independent normal random variables with
unknown and unequal means and variances, then the confidence
interval on the ratio σ1²/σ2² is given by:
$$\frac{S_1^2}{S_2^2}\, F_{1-\alpha/2,\,n_2-1,\,n_1-1} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2}\, F_{\alpha/2,\,n_2-1,\,n_1-1}$$
Note: for F-values not given in table V, recall that
$$F_{1-\alpha/2,\,n_2-1,\,n_1-1} = \frac{1}{F_{\alpha/2,\,n_1-1,\,n_2-1}}$$
or use =FINV(probability,degrees_freedom1,degrees_freedom2)
Example 10-22
n1 = 12, s1 = 0.85
n2 = 15, s2 = 0.98
F____ , ____ , ____ = ____________
Picture
F____ , ____ , ____ = ____________
$$\frac{S_1^2}{S_2^2}\, F_{1-\alpha/2,\,n_2-1,\,n_1-1} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{S_1^2}{S_2^2}\, F_{\alpha/2,\,n_2-1,\,n_1-1}$$
__________________ < σ1²/σ2² < _________________
Interpretation: