Transcript Chapter 6

Chapter 6
Random Sampling and
Data Description
Learning Objectives
• Compute and interpret the sample mean,
sample variance, sample standard deviation,
sample median, and sample range
• Explain the concepts of sample mean, sample
variance, population mean, and population
variance
• Construct and interpret visual data displays
• Explain the concept of random sampling
• Construct and interpret normal probability
plots
Data Summary and Display
• Essential to good statistical thinking
• Focus on important features of the data
• Provide insight about the type of model that
should be used
• Computer has become an important tool in
the presentation and analysis of data
• User enters the data and then selects the
types of analysis
• Packages are available for both mainframe
computers and personal computers
Sample Mean
• Useful to describe data features numerically
• Can characterize the location or central tendency
• Refer to this arithmetic mean as the sample mean
• $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
• Where the n observations in the sample are
denoted by x1, x2,…, xn
• Sample mean is a reasonable estimate of the
population mean, μ
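As a minimal sketch of the formula above, the sample mean can be computed directly from its definition. The function name `sample_mean` and the five data values are hypothetical, chosen only for illustration.

```python
def sample_mean(xs):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    n = len(xs)
    if n == 0:
        raise ValueError("sample must contain at least one observation")
    return sum(xs) / n


# Hypothetical sample, not taken from the slides
data = [12.6, 12.9, 13.4, 12.3, 13.6]
x_bar = sample_mean(data)
print(round(x_bar, 2))
```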
Sample Standard Deviation
• Sample mean does not provide all of the information
• Variability in the data may be described by the
sample variance or the sample standard deviation
• $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
• Sample standard deviation, s, is the positive square
root of the sample variance
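The following sketch computes the sample variance with the n - 1 divisor and takes its positive square root for the standard deviation. The function names and the data are hypothetical and serve only to illustrate the definitions.

```python
import math


def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def sample_std(xs):
    """Positive square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


# Hypothetical sample: mean is 5, squared deviations sum to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]
s2 = sample_variance(data)   # 32 / 7
s = sample_std(data)
print(round(s2, 3), round(s, 3))
```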
Sample Range
• Difference between the largest and smallest
observations, or the sample range, is a useful
measure of variability
• Sample range
$r = \max(x_i) - \min(x_i)$
• As the variability in sample data increases, the
sample range increases
Sample Median and Sample Mode
• Two more measures of central tendency
• Median divides the data into two equal parts,
half below the median and half above
– If the number of data points is even, the median is
halfway between the two central values
– If the number of data points is odd, the median is the
central value
• Mode is the most frequently occurring data
point(s)
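The even and odd cases for the median, and the possibility of more than one mode, can be sketched as follows. The function names and the small data sets are hypothetical.

```python
from collections import Counter


def sample_median(xs):
    """Central value (odd n) or midpoint of the two central values (even n)."""
    s = sorted(xs)
    n = len(s)
    if n == 0:
        raise ValueError("sample must contain at least one observation")
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def sample_modes(xs):
    """All values that occur with the highest frequency, in sorted order."""
    counts = Counter(xs)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


odd_sample = [3, 1, 4, 1, 5]     # ordered: 1 1 3 4 5
even_sample = [3, 1, 4, 1]       # ordered: 1 1 3 4
print(sample_median(odd_sample), sample_median(even_sample))
print(sample_modes([1, 2, 2, 3, 3]))
```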
Example
• The data below are the joint temperatures of the
O-rings (degrees F) for each test firing or actual
launch of the space shuttle rocket motor (from
Presidential Commission on the Space Shuttle
Challenger Accident):
84 49 61 40 83 67 45 66 70 69 80 58 68 60 67
72 73 70 57 63 70 78 52 67 53 67 75 61 70 81
76 79 75 76 58 31
• Compute the sample mean, sample median,
sample range, and sample standard deviation
Solution
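• Sample mean is 65.86, or
$\bar{x} = \dfrac{84 + 49 + \cdots + 31}{36} = 65.86$
• Sample median is 67.5, the average of the two central
values (67 and 68) of the 36 ordered observations
31 40 45 49 … [67.5] …
• Sample range is 53, or
r = 84 - 31 = 53
• Sample variance is
$s^2 = \dfrac{(84 - 65.86)^2 + (49 - 65.86)^2 + \cdots + (31 - 65.86)^2}{36 - 1} = 147.84$
• Sample standard deviation is $s = \sqrt{147.84} = 12.16$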
Random Sampling
• Interest lies in working with a
sample of observations
selected from a population
• Relationship between the
population and the sample
• Impossible or impractical to
observe the entire population
• Use a probability
distribution as a model for
a population
• Sample from the population
to make decisions about the
population
Understand Random Sampling
• Wish to reach a conclusion about the proportion of people who earn at
least $35,000 in a specific year
• Let p represent the unknown value of this proportion
• Impractical to question every individual
• Make inference regarding the true proportion p
• Select a random sample
• Use the observed proportion p̂ of people in the sample who earn at
least $35,000
• p̂ is computed by dividing the number of individuals in the sample who
earn at least $35,000 by the sample size n
• Many random samples are possible
• Value of p̂ will vary from sample to sample. That is, p̂ is a random variable
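The variation of p̂ from sample to sample can be illustrated with a small simulation. Everything here is an assumption made for illustration: the population of 10,000 people, the true proportion p = 0.30, the sample size n = 100, and the seed.

```python
import random


def sample_proportion(sample):
    """Fraction of sampled individuals coded 1 (earn at least $35,000)."""
    return sum(sample) / len(sample)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 1 = earns at least $35,000, 0 = does not
population = [1] * 3000 + [0] * 7000   # assumed true proportion p = 0.30
n = 100

# Draw five independent random samples and record p-hat for each
p_hats = [sample_proportion(rng.sample(population, n)) for _ in range(5)]
print(p_hats)
```

Each run of `rng.sample` yields a different sample, so the five values of p̂ generally differ from one another and from p, which is what makes p̂ a random variable.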
Statistic
• A numerical summary computed from a random sample is called a statistic
• Statistic is any function of the observations in a
random sample
• Sample mean $\bar{X}$, the sample variance $S^2$, and
the sample standard deviation $S$ are statistics
Data Display
• Graphical displays of sample data are very
powerful
• Many techniques
Stem-and-Leaf Diagrams
• Stem-and-leaf diagram is a good way to
represent the data
• Steps
1. Divide each data point into two parts: a stem
and a leaf
2. List the stem values in a vertical column
3. Record the leaf for each observation beside its
stem
4. Write the units for stems and leaves on the
display
Example
• Consider 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
• Select as stem values the numbers 2, 3, and 4
• Record the leaf for each observation beside its stem
• Last column in the diagram is a frequency count of
the number of leaves associated with each stem
Stem  Leaf    Frequency
2     144677  6
3     028     3
4     1       1
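The construction steps can be sketched in code for this example. The function `stem_and_leaf` is a hypothetical helper that assumes non-negative integers, with the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digit) to its ordered list of leaves (units digits)."""
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        groups[stem].append(leaf)
    return dict(groups)


data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
diagram = stem_and_leaf(data)

# Print stem, leaves, and the frequency count for each stem
for stem in sorted(diagram):
    leaves = "".join(str(leaf) for leaf in diagram[stem])
    print(f"{stem} | {leaves:<8} {len(diagram[stem])}")
```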
Frequency Distributions
• More compact summary of data than a stem-and-leaf
diagram
• Must divide the range of the data into intervals
• Called class intervals, cells, or bins
• Number of bins depends on the number of
observations
• A common rule of thumb sets it approximately equal to the square
root of the number of observations
Histograms
• Visual display of the frequency distribution
• Gives insight about possible choices of
probability distribution
• Stages for constructing
– 1) Label the bin boundaries on a horizontal scale
– 2) Mark and label the vertical scale with the
frequencies
– 3) Draw a rectangle above each bin whose height is equal to the
frequency corresponding to that class
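A minimal sketch of building a frequency distribution and a text histogram is given below. It assumes equal-width bins spanning the range of the data and uses the square-root rule of thumb when no bin count is supplied. The function name is hypothetical, and the ten values are reused from the stem-and-leaf example purely for illustration.

```python
import math


def frequency_distribution(xs, num_bins=None):
    """Return (bin_edges, counts) for equal-width bins over the data range."""
    n = len(xs)
    if num_bins is None:
        num_bins = max(1, round(math.sqrt(n)))   # square-root rule of thumb
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in xs:
        k = int((x - lo) / width) if width > 0 else 0
        k = min(k, num_bins - 1)   # the maximum value goes in the last bin
        counts[k] += 1
    edges = [lo + i * width for i in range(num_bins + 1)]
    return edges, counts


data = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
edges, counts = frequency_distribution(data)

# One row per bin: boundaries on the left, bar height shown as asterisks
for left, right, c in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f} | {'*' * c}")
```

With ten observations the rule gives three bins. Other bin counts can be passed through `num_bins` when a different level of detail is wanted.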
Cumulative Frequency Plot
• Variation of the
histogram
• Useful in data
interpretation
• Height of each bar is
the total number of
observations that are
less than or equal to the
upper limit of the class
• Illustrated in the right
graph
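Cumulative frequencies are running totals of the class frequencies. The sketch below uses the three class counts from the earlier stem-and-leaf example (stems 2, 3, and 4); the upper class limits of 29, 39, and 49 are an assumption made for illustration.

```python
from itertools import accumulate

# Class frequencies for stems 2, 3, and 4 in the stem-and-leaf example
counts = [6, 3, 1]
upper_limits = [29, 39, 49]   # assumed upper limit of each class

# Running total: observations less than or equal to each upper limit
cumulative = list(accumulate(counts))

for limit, total in zip(upper_limits, cumulative):
    print(f"<= {limit}: {total}")
```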
Example
• Consider the following data on the motor fuel
octane ratings of several blends of gasoline.
• Construct a frequency distribution and histogram
– Use 8 classes
Solution
• Illustrated in the
right
[Figure: Frequency Histogram. Horizontal axis: Octane Data, 82 to 102; vertical axis: frequency, 0 to 0.4]
Probability Plots
• Graphical method for determining whether sample
data points conform to a hypothesized distribution
• Very simple and can be constructed quickly
• Uses special graph paper, known as probability
paper
• Focus primarily on normal probability plot
Constructing a Probability Plot
• Sample data points are first ranked from smallest
to largest
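• $x_1, x_2, \ldots, x_n$ is arranged as
$x_{(1)}, x_{(2)}, \ldots, x_{(n)}$, where $x_{(1)}$ is the smallest observation and $x_{(n)}$ the largest
• Ordered observations $x_{(j)}$ are plotted against their observed cumulative
frequency $(j - 0.5)/n$ on the probability paper
• If the hypothesized distribution describes the data adequately, the plotted points fall approximately along a straight line
• Can also be constructed on ordinary graph paper by plotting
the standardized normal scores $z_j$ against $x_{(j)}$
• Standardized normal scores satisfy
$\dfrac{j - 0.5}{n} = P(Z \le z_j) = \Phi(z_j)$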
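The standardized normal scores can be obtained by inverting the standard normal cumulative distribution function at (j - 0.5)/n. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `normal_scores` and the three-point sample are hypothetical.

```python
from statistics import NormalDist


def normal_scores(xs):
    """Pair each ordered observation x_(j) with z_j, where Phi(z_j) = (j - 0.5)/n."""
    ordered = sorted(xs)
    n = len(ordered)
    std_normal = NormalDist()   # mean 0, standard deviation 1
    return [
        (x, std_normal.inv_cdf((j - 0.5) / n))
        for j, x in enumerate(ordered, start=1)
    ]


# Hypothetical sample; the middle point has cumulative frequency 0.5, so z = 0
pairs = normal_scores([10.2, 9.1, 11.5])
for x, z in pairs:
    print(f"x = {x:5.1f}   z = {z:6.3f}")
```

Plotting the z values against the ordered observations on ordinary axes gives the same picture as plotting on probability paper.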
Example
• A soft-drink bottler is studying the internal
pressure of 1-liter glass bottles. A random sample
of 16 bottles is tested, and the pressure strengths
(psi) are obtained.
• The data are shown below.
226.16 202.20 219.54 193.73 208.15
195.45 193.71 200.81 211.14 203.62 188.12
224.39 221.31 204.55 202.21 201.63
• Does it seem reasonable to conclude that
pressure strength is normally distributed?
Solution
• Use the steps to
construct a
probability plot
• Assumption of
normality appears
reasonable
• Data fall approximately along a
straight line
[Figure: Normal Probability Plot. Horizontal axis: Pressure Strength, 180 to 230; vertical axis: cumulative percent, 0.1 to 99.9]
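One rough numerical companion to the visual check is the correlation between the ordered pressure strengths and their standardized normal scores: values near 1 indicate points lying close to a straight line. This is an illustrative sketch rather than the method used on the slide. It assumes the sample consists of 16 distinct readings, treating the repeated 219.54 in the data listing as a duplicate, and the helper `pearson_r` is hypothetical.

```python
import math
from statistics import NormalDist, mean

# Pressure strengths (psi) for the 16 bottles, repeated 219.54 counted once
pressure = [
    226.16, 202.20, 219.54, 193.73, 208.15, 195.45, 193.71, 200.81,
    211.14, 203.62, 188.12, 224.39, 221.31, 204.55, 202.21, 201.63,
]

ordered = sorted(pressure)
n = len(ordered)
# Standardized normal scores z_j with Phi(z_j) = (j - 0.5)/n
z = [NormalDist().inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]


def pearson_r(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    a_bar, b_bar = mean(a), mean(b)
    num = sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - a_bar) ** 2 for x in a) * sum((y - b_bar) ** 2 for y in b)
    )
    return num / den


r = pearson_r(ordered, z)
print(f"n = {n}, correlation of x_(j) with z_j = {r:.3f}")
```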
Next Agenda
• Discusses point estimation of parameters
• Introduces some of the important properties
of estimators, the method of maximum
likelihood, sampling distributions, and the
central limit theorem