Chapters 5 and 6 (short)



5  Joint Probability Distributions and Random Samples
Copyright © Cengage Learning. All rights reserved.
5.3  Statistics and Their Distributions
Statistics and Their Distributions
Consider selecting two different samples of size n from the
same population distribution.
The xi’s in the second sample will virtually always differ at
least a bit from those in the first sample. For example, a
first sample of n = 3 cars of a particular type might result in
fuel efficiencies x1 = 30.7, x2 = 29.4, x3 = 31.1, whereas a
second sample may give x1 = 28.8, x2 = 30.0, and
x3 = 32.5.
Before we obtain data, there is uncertainty about the value
of each xi.
Statistics and Their Distributions
Because of this uncertainty, before the data becomes
available we view each observation as a random variable
and denote the sample by X1, X2, . . . , Xn (uppercase
letters for random variables).
This variation in observed values in turn implies that the
value of any function of the sample observations—such as
the sample mean, sample standard deviation, or sample
fourth spread—also varies from sample to sample. That is,
prior to obtaining x1, . . . , xn, there is uncertainty as to the
value of x̄, the value of s, and so on.
Statistics and Their Distributions
Definition
A statistic is any quantity whose value can be calculated from sample data. Prior to obtaining data, there is uncertainty as to what value of any particular statistic will result. Therefore, a statistic is a random variable and will be denoted by an uppercase letter; a lowercase letter is used to represent the calculated or observed value of the statistic.
Statistics and Their Distributions
Thus the sample mean, regarded as a statistic (before a
sample has been selected or an experiment carried out), is
denoted by X̄; the calculated value of this statistic is x̄.
Similarly, S represents the sample standard deviation
thought of as a statistic, and its computed value is s.
If samples of two different types of bricks are selected and
the individual compressive strengths are denoted by
X1, . . . , Xm and Y1, . . . , Yn, respectively, then the statistic
X̄ − Ȳ, the difference between the two sample mean
compressive strengths, is often of great interest.
Statistics and Their Distributions
Any statistic, being a random variable, has a probability
distribution. In particular, the sample mean X̄ has a
probability distribution.
Suppose, for example, that n = 2 components are randomly
selected and the number of breakdowns while under
warranty is determined for each one.
Possible values for the sample mean number of
breakdowns X̄ are 0 (if X1 = X2 = 0), .5 (if either X1 = 0 and
X2 = 1 or X1 = 1 and X2 = 0), 1, 1.5, . . ..
Statistics and Their Distributions
The probability distribution of X̄ specifies P(X̄ = 0),
P(X̄ = .5), and so on, from which other probabilities such as
P(1 ≤ X̄ ≤ 3) and P(X̄ ≥ 2.5) can be calculated.
Similarly, if for a sample of size n = 2, the only possible
values of the sample variance are 0, 12.5, and 50 (which is
the case if X1 and X2 can each take on only the values
40, 45, or 50), then the probability distribution of S2 gives
P(S2 = 0), P(S2 = 12.5), and P(S2 = 50).
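As a quick check of this claim (not part of the text), a few lines of Python can enumerate every (x1, x2) pair and confirm that the sample variance takes only the values 0, 12.5, and 50 when each observation must be 40, 45, or 50:

```python
from itertools import product
from statistics import variance  # sample variance, divisor n - 1

values = [40, 45, 50]            # possible values of X1 and X2 (from the text)
s2_values = sorted({variance(pair) for pair in product(values, repeat=2)})
print(s2_values)                 # -> [0, 12.5, 50] (possibly shown as floats)
```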
Statistics and Their Distributions
The probability distribution of a statistic is sometimes
referred to as its sampling distribution to emphasize that
it describes how the statistic varies in value across all
samples that might be selected.
Random Samples
Random Samples
Definition
The rvs X1, X2, . . . , Xn are said to form a (simple) random sample of size n if
1. The Xi's are independent rvs.
2. Every Xi has the same probability distribution.
Random Samples
Conditions 1 and 2 can be paraphrased by saying that the
Xi’s are independent and identically distributed (iid).
If sampling is either with replacement or from an infinite
(conceptual) population, Conditions 1 and 2 are satisfied
exactly.
These conditions will be approximately satisfied if sampling
is without replacement, yet the sample size n is much
smaller than the population size N.
Random Samples
In practice, if n/N ≤ .05 (at most 5% of the population is
sampled), we can proceed as if the Xi’s form a random
sample.
The virtue of this sampling method is that the probability
distribution of any statistic can be more easily obtained
than for any other sampling method.
There are two general methods for obtaining information
about a statistic’s sampling distribution. One method
involves calculations based on probability rules, and the
other involves carrying out a simulation experiment.
Deriving a Sampling Distribution
Deriving a Sampling Distribution
Probability rules can be used to obtain the distribution of a
statistic provided that it is a “fairly simple” function of the
Xi’s and either there are relatively few different X values in
the population or else the population distribution has a
“nice” form.
Our next example illustrates such a situation.
Example 5.21
A certain brand of MP3 player comes in three
configurations: a model with 2 GB of memory, costing $80,
a 4 GB model priced at $100, and an 8 GB version with a
price tag of $120.
If 20% of all purchasers choose the 2 GB model, 30%
choose the 4 GB model, and 50% choose the 8 GB model,
then the probability distribution of the cost X of a single
randomly selected MP3 player purchase is given by

x        80    100    120
p(x)     .2    .3     .5                    (5.2)

with μ = 106, σ² = 244
Example 5.21
cont’d
Suppose on a particular day only two MP3 players are sold.
Let X1 = the revenue from the first sale and X2 the revenue
from the second.
Suppose that X1 and X2 are independent, each with the
probability distribution shown in (5.2) [so that X1 and X2
constitute a random sample from the distribution (5.2)].
Example 5.21
cont’d
Table 5.2 lists possible (x1, x2) pairs, the probability of each
[computed using (5.2) and the assumption of
independence], and the resulting x̄ and s² values. [Note
that when n = 2, s² = (x1 − x̄)² + (x2 − x̄)².]
(x1, x2)      p(x1, x2)     x̄       s²
(80, 80)        .04         80        0
(80, 100)       .06         90      200
(80, 120)       .10        100      800
(100, 80)       .06         90      200
(100, 100)      .09        100        0
(100, 120)      .15        110      200
(120, 80)       .10        100      800
(120, 100)      .15        110      200
(120, 120)      .25        120        0

Outcomes, Probabilities, and Values of x̄ and s² for Example 5.21
Table 5.2
Example 5.21
cont’d
Now to obtain the probability distribution of X̄, the sample
average revenue per sale, we must consider each possible
value and compute its probability. For example, x̄ = 100
occurs three times in the table with probabilities .10, .09,
and .10, so

p_X̄(100) = P(X̄ = 100) = .10 + .09 + .10 = .29

Similarly,

p_S²(800) = P(S² = 800) = P(X1 = 80, X2 = 120 or X1 = 120, X2 = 80)
          = .10 + .10 = .20
Example 5.21
cont’d
The complete sampling distributions of X̄ and S² appear in
(5.3) and (5.4).

x̄            80     90     100    110    120
p_X̄(x̄)       .04    .12    .29    .30    .25        (5.3)

s²            0      200    800
p_S²(s²)      .38    .42    .20                      (5.4)
Example 5.21
cont’d
Figure 5.8 pictures a probability histogram for both the
original distribution (5.2) and the X̄ distribution (5.3). The
figure suggests first that the mean (expected value) of the X̄
distribution is equal to the mean 106 of the original
distribution, since both histograms appear to be centered at
the same place.
Probability histograms for the underlying distribution and x̄ distribution in Example 5.21
Figure 5.8
Example 5.21
cont’d
From (5.3),

μ_X̄ = E(X̄) = (80)(.04) + . . . + (120)(.25) = 106 = μ

Second, it appears that the X̄ distribution has smaller
spread (variability) than the original distribution, since
probability mass has moved in toward the mean. Again
from (5.3),

σ²_X̄ = V(X̄) = (80²)(.04) + . . . + (120²)(.25) − (106)² = 122
Example 5.21
cont’d
The variance of X̄ is precisely half that of the original
variance (because n = 2). Using (5.4), the mean value of
S² is

μ_S² = E(S²) = Σ s² · p_S²(s²)
     = (0)(.38) + (200)(.42) + (800)(.20) = 244 = σ²

That is, the X̄ sampling distribution is centered at the
population mean μ, and the S² sampling distribution is
centered at the population variance σ².
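The whole derivation in this example can be reproduced by brute-force enumeration. The following sketch (mine, not the textbook's) loops over all (x1, x2) pairs from distribution (5.2), builds the pmfs of X̄ and S², and checks that E(X̄) = 106, V(X̄) = 122, and E(S²) = 244:

```python
from itertools import product
from collections import defaultdict

# Distribution (5.2): revenue from a single MP3 player sale
pmf = {80: 0.2, 100: 0.3, 120: 0.5}

pmf_xbar = defaultdict(float)   # sampling distribution of the sample mean
pmf_s2 = defaultdict(float)     # sampling distribution of the sample variance

for (x1, x2) in product(pmf, repeat=2):       # all 9 (x1, x2) pairs
    p = pmf[x1] * pmf[x2]                     # independence of X1 and X2
    xbar = (x1 + x2) / 2
    s2 = (x1 - xbar) ** 2 + (x2 - xbar) ** 2  # n = 2, so divisor n - 1 = 1
    pmf_xbar[xbar] += p
    pmf_s2[s2] += p

print({x: round(p, 2) for x, p in sorted(pmf_xbar.items())})   # matches (5.3)
print({s: round(p, 2) for s, p in sorted(pmf_s2.items())})     # matches (5.4)

mean_xbar = sum(x * p for x, p in pmf_xbar.items())
var_xbar = sum(x * x * p for x, p in pmf_xbar.items()) - mean_xbar ** 2
mean_s2 = sum(s * p for s, p in pmf_s2.items())
print(round(mean_xbar, 2), round(var_xbar, 2), round(mean_s2, 2))  # 106.0, 122.0, 244.0
```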
Example 5.21
cont’d
If there had been four purchases on the day of interest, the
sample average revenue X̄ would be based on a random
sample of four Xi's, each having the distribution (5.2).
More calculation eventually yields the pmf of X̄ for n = 4 as

x̄          80       85       90       95       100      105      110      115      120
p_X̄(x̄)     .0016    .0096    .0376    .0936    .1761    .2340    .2350    .1500    .0625
Example 5.21
cont’d
From this, x = 106 =  and
= 61 =  2/4. Figure 5.8 is a
probability histogram of this pmf.
Probability histogram for x̄ based on n = 4 in Example 5.21
Figure 5.9
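The n = 4 pmf quoted above can be verified the same way. This sketch (again not from the text) enumerates all 3⁴ outcomes and confirms μ_X̄ = 106 and σ²_X̄ = 61 = σ²/4:

```python
from itertools import product
from collections import defaultdict
from math import prod

pmf = {80: 0.2, 100: 0.3, 120: 0.5}   # distribution (5.2)
n = 4

pmf_xbar = defaultdict(float)
for outcome in product(pmf, repeat=n):              # 3**4 = 81 outcomes
    pmf_xbar[sum(outcome) / n] += prod(pmf[x] for x in outcome)

for xbar in sorted(pmf_xbar):
    print(xbar, round(pmf_xbar[xbar], 4))           # .0016, .0096, ..., .0625

mean = sum(x * p for x, p in pmf_xbar.items())
var = sum(x * x * p for x, p in pmf_xbar.items()) - mean ** 2
print(round(mean, 2), round(var, 2))                # 106.0 and 61.0
```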
5.4  The Distribution of the Sample Mean
The Distribution of the Sample Mean
The importance of the sample mean X̄ springs from its use
in drawing conclusions about the population mean μ. Some
of the most frequently used inferential procedures are
based on properties of the sampling distribution of X̄.
A preview of these properties appeared in the calculations
and simulation experiments of the previous section, where
we noted relationships between E(X̄) and μ and also
among V(X̄), σ², and n.
The Distribution of the Sample Mean
Proposition
Let X1, X2, . . . , Xn be a random sample from a distribution with mean value μ and standard deviation σ. Then
1. E(X̄) = μ_X̄ = μ
2. V(X̄) = σ²_X̄ = σ²/n and σ_X̄ = σ/√n
In addition, with T_o = X1 + . . . + Xn (the sample total), E(T_o) = nμ, V(T_o) = nσ², and σ_T_o = √n σ.
The Distribution of the Sample Mean
According to Result 1, the sampling (i.e., probability)
distribution of X̄ is centered precisely at the mean of the
population from which the sample has been selected.
Result 2 shows that the X̄ distribution becomes more
concentrated about μ as the sample size n increases.
In marked contrast, the distribution of T_o becomes more
spread out as n increases.
Averaging moves probability in toward the middle, whereas
totaling spreads probability out over a wider and wider
range of values.
The Distribution of the Sample Mean
The standard deviation σ_X̄ = σ/√n is often called the
standard error of the mean; it describes the magnitude of a
typical or representative deviation of the sample mean from
the population mean.
Example 5.25
In a notched tensile fatigue test on a titanium specimen, the
expected number of cycles to first acoustic emission (used
to indicate crack initiation) is  = 28,000, and the standard
deviation of the number of cycles is  = 5000.
Let X1, X2, . . . , X25 be a random sample of size 25, where
each Xi is the number of cycles on a different randomly
selected specimen.
Then the expected value of the sample mean number of
cycles until first emission is E(X̄) = μ = 28,000, and the
expected total number of cycles for the 25 specimens is
E(T_o) = nμ = 25(28,000) = 700,000.
Example 5.25
cont’d
The standard deviations of X̄ and of T_o are

σ_X̄ = σ/√n = 5000/√25 = 1000     (standard error of the mean)

σ_T_o = √n σ = √25 (5000) = 25,000

If the sample size increases to n = 100, E(X̄) is unchanged,
but σ_X̄ = 5000/√100 = 500, half of its previous value (the sample size
must be quadrupled to halve the standard deviation of X̄).
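A short numeric check of these standard-error calculations, using the values given in the example (σ = 5000, n = 25 and n = 100):

```python
import math

sigma = 5000                        # population standard deviation of cycles

for n in (25, 100):
    print(n,
          sigma / math.sqrt(n),     # standard error of X-bar: 1000.0, then 500.0
          math.sqrt(n) * sigma)     # standard deviation of T_o: 25000.0, then 50000.0
```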
The Case of a Normal Population Distribution
The Case of a Normal Population Distribution
Proposition
Let X1, X2, . . . , Xn be a random sample from a normal distribution with mean μ and standard deviation σ. Then for any n, X̄ is normally distributed (with mean μ and standard deviation σ/√n), as is T_o (with mean nμ and standard deviation √n σ).

We know everything there is to know about the X̄ and T_o
distributions when the population distribution is normal. In
particular, probabilities such as P(a ≤ X̄ ≤ b) and
P(c ≤ T_o ≤ d) can be obtained simply by standardizing.
The Case of a Normal Population Distribution
Figure 5.15 illustrates the proposition.
A normal population distribution and X̄ sampling distributions
Figure 5.15
Example 5.26
The distribution of egg weights (g) of a certain type is normal
with mean value 53 and standard deviation .3 (consistent
with data in the article “Evaluation of Egg Quality Traits of
Chickens Reared under Backyard System in Western Uttar
Pradesh” (Indian J. of Poultry Sci., 2009: 261–262)).
Let X1, X2, . . . , X12 denote the weights of a dozen randomly
selected eggs; these Xi's constitute a random sample of size
12 from the specified normal distribution.
Example 5.26
cont’d
The total weight of the 12 eggs is T_o = X1 + . . . + X12; it is
normally distributed with mean value E(T_o) = nμ = 12(53) =
636 and variance V(T_o) = nσ² = 12(.3)² = 1.08. The
probability that the total weight is between 635 and 640 is
now obtained by standardizing and referring to Appendix
Table A.3:

P(635 ≤ T_o ≤ 640) = P((635 − 636)/√1.08 ≤ Z ≤ (640 − 636)/√1.08)
                   = P(−.96 ≤ Z ≤ 3.85) = Φ(3.85) − Φ(−.96) ≈ .8315
Example 5.26
cont’d
If cartons containing a dozen eggs are repeatedly selected,
in the long run slightly more than 83% of the cartons will
have a total weight between 635 g and 640 g.
Notice that 635 < T_o < 640 is equivalent to 52.9167 < X̄ <
53.3333 (divide each term in the original system of
inequalities by 12).
Thus P(52.9167 < X̄ < 53.3333) ≈ .8315. This latter
probability can also be obtained by standardizing X̄ directly.
Example 5.26
Now consider randomly selecting just four of these eggs.
The sample mean weight X̄ is then normally distributed with
mean value μ_X̄ = μ = 53 and standard deviation σ_X̄ = σ/√n
= .3/√4 = .15. The probability that the sample mean
weight exceeds 53.5 g is then

P(X̄ > 53.5) = P(Z > (53.5 − 53)/.15) = P(Z > 3.33) = 1 − Φ(3.33) = .0004

Because 53.5 is 3.33 standard deviations (of X̄) larger than
the mean value 53, it is exceedingly unlikely that the
sample mean will exceed 53.5.
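Both probabilities in this example can be checked with Python's standard-library NormalDist rather than Appendix Table A.3; this is only a sketch, and the last digit can differ slightly because the text rounds the z values to two decimals:

```python
import math
from statistics import NormalDist

mu, sigma = 53.0, 0.3

# Total weight of a dozen eggs: T_o ~ N(12*53, sqrt(12)*.3)
total = NormalDist(mu=12 * mu, sigma=math.sqrt(12) * sigma)
print(total.cdf(640) - total.cdf(635))     # about .832; the text's .8315 uses rounded z values

# Sample mean weight of 4 eggs: X-bar ~ N(53, .3/sqrt(4) = .15)
xbar = NormalDist(mu=mu, sigma=sigma / math.sqrt(4))
print(1 - xbar.cdf(53.5))                  # about .0004
```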
The Central Limit Theorem
The Central Limit Theorem
When the Xi’s are normally distributed, so is X̄ for every
sample size n.
The derivations in Example 5.21 and the simulation experiment
of Example 5.24 suggest that even when the population
distribution is highly nonnormal, averaging produces a
distribution more bell-shaped than the one being sampled.
A reasonable conjecture is that if n is large, a suitable
normal curve will approximate the actual distribution of X̄.
The formal statement of this result is the most important
theorem of probability.
Sampling distributions of x̄ for different populations and different sample sizes
The Central Limit Theorem
Theorem
The Central Limit Theorem (CLT). Let X1, X2, . . . , Xn be a random sample from a distribution with mean μ and variance σ². Then if n is sufficiently large, X̄ has approximately a normal distribution with μ_X̄ = μ and σ²_X̄ = σ²/n, and T_o also has approximately a normal distribution with μ_T_o = nμ and σ²_T_o = nσ². The larger the value of n, the better the approximation.
The Central Limit Theorem
Figure 5.16 illustrates the Central Limit Theorem.
The Central Limit Theorem illustrated
Figure 5.16
Example 5.27
The amount of a particular impurity in a batch of a certain
chemical product is a random variable with mean value 4.0 g
and standard deviation 1.5 g.
If 50 batches are independently prepared, what is the
(approximate) probability that the sample average amount of
impurity is between 3.5 and 3.8 g?
According to the rule of thumb to be stated shortly, n = 50 is
large enough for the CLT to be applicable.
Example 5.27
cont’d
X̄ then has approximately a normal distribution with mean
value μ_X̄ = 4.0 and σ_X̄ = 1.5/√50 = .2121, so

P(3.5 ≤ X̄ ≤ 3.8) = P((3.5 − 4.0)/.2121 ≤ Z ≤ (3.8 − 4.0)/.2121)
                 = Φ(−.94) − Φ(−2.36) = .1736 − .0091 = .1645
Example 5.27
Now consider randomly selecting 100 batches, and let T_o
represent the total amount of impurity in these batches.
Then the mean value and standard deviation of T_o are
100(4) = 400 and √100 (1.5) = 15, respectively, and the
CLT implies that T_o has approximately a normal distribution.
The probability that this total is at most 425 g is

P(T_o ≤ 425) = P(Z ≤ (425 − 400)/15) = P(Z ≤ 1.67) = Φ(1.67) = .9525
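Both approximate probabilities can be reproduced with a normal cdf. A minimal sketch using the values from the example (unrounded z values, so the final digit may differ slightly from the table answers .1645 and .9525):

```python
import math
from statistics import NormalDist

mu, sigma = 4.0, 1.5     # impurity mean and standard deviation per batch

# Sample mean impurity of n = 50 batches
xbar = NormalDist(mu=mu, sigma=sigma / math.sqrt(50))
print(xbar.cdf(3.8) - xbar.cdf(3.5))        # about .164 (table answer .1645)

# Total impurity in n = 100 batches
total = NormalDist(mu=100 * mu, sigma=math.sqrt(100) * sigma)
print(total.cdf(425))                       # about .952 (table answer .9525)
```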
The Central Limit Theorem
The CLT provides insight into why many random variables
have probability distributions that are approximately
normal.
For example, the measurement error in a scientific
experiment can be thought of as the sum of a number of
underlying perturbations and errors of small magnitude.
A practical difficulty in applying the CLT is in knowing when
n is sufficiently large. The problem is that the accuracy of
the approximation for a particular n depends on the shape
of the original underlying distribution being sampled.
The Central Limit Theorem
If the underlying distribution is close to a normal density
curve, then the approximation will be good even for a small
n, whereas if it is far from being normal, then a large n will
be required.
There are population distributions for which even an n of 40
or 50 does not suffice, but such distributions are rarely
encountered in practice.
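One way to see how the sample size affects the quality of the CLT approximation is by simulation. The sketch below is my own illustration (not from the text): it samples from a strongly skewed exponential population and summarizes the simulated distribution of X̄ for several values of n.

```python
import random
import statistics

random.seed(1)
mu, sigma = 1.0, 1.0   # exponential(1) population: mean 1, sd 1, strongly right-skewed

for n in (5, 30, 50):
    means = [statistics.fmean(random.expovariate(1.0) for _ in range(n))
             for _ in range(20_000)]
    print(n,
          round(statistics.fmean(means), 3),                  # close to mu = 1.0
          round(statistics.stdev(means), 3),                   # close to sigma/sqrt(n): .447, .183, .141
          round(sum(m > mu for m in means) / len(means), 3))   # creeps toward .5 as skewness washes out
```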
6  Point Estimation
6.1  Some General Concepts of Point Estimation
Some General Concepts of Point Estimation
Statistical inference is almost always directed toward
drawing some type of conclusion about one or more
parameters (population characteristics).
To do so requires that an investigator obtain sample data
from each of the populations under study.
Conclusions can then be based on the computed values of
various sample quantities.
For example, let μ (a parameter) denote the true average
breaking strength of wire connections used in bonding
semiconductor wafers.
Some General Concepts of Point Estimation
A random sample of n = 10 connections might be made,
and the breaking strength of each one determined,
resulting in observed strengths x1, x2, . . . , x10.
The sample mean breaking strength x̄ could then be used
to draw a conclusion about the value of μ.
Similarly, if σ² is the variance of the breaking strength
distribution (population variance, another parameter), the
value of the sample variance s² can be used to infer
something about σ².
Some General Concepts of Point Estimation
When discussing general concepts and methods of
inference, it is convenient to have a generic symbol for the
parameter of interest.
We will use the Greek letter θ for this purpose. The
objective of point estimation is to select a single number,
based on sample data, that represents a sensible value for
.
As an example, the parameter of interest might be μ, the
true average lifetime of batteries of a certain type.
Some General Concepts of Point Estimation
A random sample of n = 3 batteries might yield observed
lifetimes (hours) x1 = 5.0, x2 = 6.4, x3 = 5.9.
The computed value of the sample mean lifetime is
x̄ = 5.77, and it is reasonable to regard 5.77 as a very
plausible value of μ — our “best guess” for the value of μ
based on the available sample information.
Suppose we want to estimate a parameter of a single
population (e.g., μ or σ) based on a random sample of size
n.
Some General Concepts of Point Estimation
The difference between the two sample mean strengths is
X̄ − Ȳ, the natural statistic for making inferences about
μ1 − μ2, the difference between the population mean
strengths.
Definition
A point estimate of a parameter θ is a single number that can be regarded as a sensible value for θ. A point estimate is obtained by selecting a suitable statistic and computing its value from the given sample data. The selected statistic is called the point estimator of θ.
Some General Concepts of Point Estimation
In the foregoing battery example, the estimator used to
obtain the point estimate of μ was X̄, and the point estimate
of μ was 5.77.
If the three observed lifetimes had instead been x1 = 5.6,
x2 = 4.5, and x3 = 6.1, use of the estimator X̄ would have
resulted in the estimate x̄ = (5.6 + 4.5 + 6.1)/3 = 5.40.
The symbol θ̂ (“theta hat”) is customarily used to denote
both the estimator of θ and the point estimate resulting from
a given sample.
Some General Concepts of Point Estimation
Thus μ̂ = X̄ is read as “the point estimator of μ is the
sample mean X̄.” The statement “the point estimate of μ is
5.77” can be written concisely as μ̂ = 5.77.
Notice that in writing θ̂ = 72.5, there is no indication of how
this point estimate was obtained (what statistic was used).
It is recommended that both the estimator and the resulting
estimate be reported.
Example 6.2
Reconsider the accompanying 20 observations on
dielectric breakdown voltage for pieces of epoxy resin.
24.46  25.61  26.25  26.42  26.66  27.15  27.31  27.54  27.74  27.94
27.98  28.04  28.28  28.49  28.50  28.87  29.11  29.13  29.50  30.88
The pattern in the normal probability plot given there is
quite straight, so we now assume that the distribution of
breakdown voltage is normal with mean value μ.
Because normal distributions are symmetric, μ is also the
median of the breakdown voltage distribution.
Example 6.2
cont’d
The given observations are then assumed to be the result
of a random sample X1, X2, . . . , X20 from this normal
distribution.
Consider the following estimators and resulting estimates
for μ:
a. Estimator = X̄, estimate = x̄ = Σxi/n = 555.86/20 = 27.793
b. Estimator = X̃ (the sample median), estimate = x̃
   = (27.94 + 27.98)/2 = 27.960
c. Estimator = [min(Xi) + max(Xi)]/2 = the average of the
   two extreme observations,
   estimate = [min(xi) + max(xi)]/2 = (24.46 + 30.88)/2
   = 27.670
Example 6.2
cont’d
d. Estimator = X̄tr(10), the 10% trimmed mean (discard the
   smallest and largest 10% of the sample and then
   average),
   estimate = x̄tr(10)
   = (555.86 − 24.46 − 25.61 − 29.50 − 30.88)/16
   = 27.838
Each one of the estimators (a)–(d) uses a different
measure of the center of the sample to estimate . Which
of the estimates is closest to the true value?
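The four estimates in (a)–(d) can be computed directly from the 20 observations listed above; a short sketch:

```python
from statistics import mean, median

x = [24.46, 25.61, 26.25, 26.42, 26.66, 27.15, 27.31, 27.54, 27.74, 27.94,
     27.98, 28.04, 28.28, 28.49, 28.50, 28.87, 29.11, 29.13, 29.50, 30.88]

xbar = mean(x)                              # (a) sample mean      -> 27.793
xtilde = median(x)                          # (b) sample median    -> 27.960
midrange = (min(x) + max(x)) / 2            # (c) midrange         -> 27.670
xs = sorted(x)
trimmed = mean(xs[2:-2])                    # (d) 10% trimmed mean -> 27.838

print(round(xbar, 3), xtilde, midrange, round(trimmed, 3))
```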
Example 6.2
cont’d
We cannot answer this without knowing the true value.
A question that can be answered is, “Which estimator,
when used on other samples of Xi’s, will tend to produce
estimates closest to the true value?”
We will shortly consider this type of question.
Some General Concepts of Point Estimation
In the best of all possible worlds, we could find an estimator
θ̂ for which θ̂ = θ always. However, θ̂ is a function of the
sample Xi’s, so it is a random variable.
For some samples, θ̂ will yield a value larger than θ,
whereas for other samples θ̂ will underestimate θ. If we
write

θ̂ = θ + error of estimation
then an accurate estimator would be one resulting in small
estimation errors, so that estimated values will be near the
true value.
Unbiased Estimators
Unbiased Estimators
Suppose we have two measuring instruments: one has been
accurately calibrated, but the other systematically gives
readings that differ from the true value being measured. The
second instrument yields observations that have a
systematic error component or bias.
Definition
A point estimator θ̂ is said to be an unbiased estimator of θ if E(θ̂) = θ for every possible value of θ. If θ̂ is not unbiased, the difference E(θ̂) − θ is called the bias of θ̂.

That is, θ̂ is unbiased if its probability (i.e., sampling)
distribution is always “centered” at the true value of the
parameter.
Unbiased Estimators
When X is a binomial rv with parameters n and p, the sample
proportion p̂ = X/n can be used as an estimator of p. Thus

E(p̂) = E(X/n) = (1/n) E(X) = (1/n)(np) = p
Proposition
When X is a binomial rv with parameters n and p, the sample proportion p̂ = X/n is an unbiased estimator of p.

No matter what the true value of p is, the distribution of the
estimator p̂ will be centered at the true value.
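A quick simulation consistent with this proposition (the choices n = 25 and p = .3 are mine, purely for illustration): the long-run average of p̂ = X/n settles near p.

```python
import random
from statistics import fmean

random.seed(2)
n, p = 25, 0.3                       # illustrative values only

def sample_proportion() -> float:
    x = sum(random.random() < p for _ in range(n))   # X ~ Bin(n, p)
    return x / n                                     # p-hat

print(round(fmean(sample_proportion() for _ in range(100_000)), 4))   # close to p = 0.3
```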
Unbiased Estimators
Proposition
Let X1, X2, . . . , Xn be a random sample from a distribution with mean μ and variance σ². Then the estimator

σ̂² = S² = Σ(Xi − X̄)² / (n − 1)

is an unbiased estimator of σ²; that is, E(S²) = σ².
The estimator that uses divisor n can be expressed as
(n − 1)S²/n, so

E[(n − 1)S²/n] = ((n − 1)/n) E(S²) = ((n − 1)/n) σ²

which is smaller than σ²; the divisor-n estimator is therefore biased.
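A simulation sketch of both facts, using an assumed normal population with σ² = 4 and n = 5 (these values are mine, not the text's): the divisor-(n − 1) estimator averages to about σ², while the divisor-n version averages to about ((n − 1)/n)σ².

```python
import random
from statistics import fmean, variance, pvariance

random.seed(3)
mu, sigma, n = 10.0, 2.0, 5          # population variance sigma**2 = 4

s2 = []        # divisor n - 1 (unbiased)
v_n = []       # divisor n (biased)
for _ in range(50_000):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    s2.append(variance(sample))       # statistics.variance uses divisor n - 1
    v_n.append(pvariance(sample))     # statistics.pvariance uses divisor n

print(round(fmean(s2), 3))    # close to 4.0
print(round(fmean(v_n), 3))   # close to (n - 1)/n * 4 = 3.2
```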
Unbiased Estimators
In Example 6.2, we proposed several different estimators
for the mean μ of a normal distribution.
If there were a unique unbiased estimator for μ, the
estimation problem would be resolved by using that
estimator. Unfortunately, this is not the case.
Proposition
If X1, X2, . . . , Xn is a random sample from a distribution with mean μ, then X̄ is an unbiased estimator of μ. If in addition the distribution is continuous and symmetric, then the sample median X̃ and any trimmed mean are also unbiased estimators of μ.
Reporting a Point Estimate: The Standard Error
Reporting a Point Estimate: The Standard Error
Besides reporting the value of a point estimate, some
indication of its precision should be given. The usual
measure of precision is the standard error of the estimator
used.
Definition
The standard error of an estimator θ̂ is its standard deviation σ_θ̂ = √V(θ̂). If the standard error itself involves unknown parameters whose values can be estimated, substituting these estimates into σ_θ̂ yields the estimated standard error (estimated standard deviation) of the estimator, denoted by σ̂_θ̂ or s_θ̂.
Example 6.9
Example 6.2… continued
Assuming that breakdown voltage is normally distributed,
μ̂ = X̄ is the best estimator of μ. If the value of σ is known to
be 1.5, the standard error of X̄ is

σ_X̄ = σ/√n = 1.5/√20 = .335

If, as is usually the case, the value of σ is unknown, the
estimate σ̂ = s = 1.462 is substituted into σ_X̄ to obtain the
estimated standard error

s_X̄ = s/√n = 1.462/√20 = .327
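Both standard-error values can be reproduced in a couple of lines (n = 20, with σ = 1.5 treated as known in the first case and s = 1.462 in the second):

```python
import math

n = 20
sigma_known = 1.5     # sigma assumed known
s = 1.462             # sample standard deviation computed from the data

print(round(sigma_known / math.sqrt(n), 3))   # standard error of X-bar: 0.335
print(round(s / math.sqrt(n), 3))             # estimated standard error: 0.327
```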
Reporting a Point Estimate: The Standard Error
When the point estimator θ̂ has approximately a normal
distribution, which will often be the case when n is large,
then we can be reasonably confident that the true value of
θ lies within approximately 2 standard errors (standard
deviations) of θ̂.
Thus if a sample of n = 36 component lifetimes gives
μ̂ = x̄ = 28.50 and s = 3.60, then s_X̄ = 3.60/√36 = .60, so
“within 2 estimated standard errors of μ̂” translates to the
interval 28.50 ± (2)(.60) = (27.30, 29.70).
Reporting a Point Estimate: The Standard Error
If θ̂ is not necessarily approximately normal but is
unbiased, then it can be shown that the estimate will
deviate from θ by as much as 4 standard errors at most
6.25% of the time.
We would then expect the true value to lie within 4
standard errors of θ̂ (and this is a very conservative
statement, since it applies to any unbiased θ̂).
Summarizing, the standard error tells us roughly within
what distance of θ̂ we can expect the true value of θ to lie.