Chapter 1 - People Server at UNCW
Chapter 1
Looking at Data
Types of variables
Looking at Data
Be sure that each variable really does
measure what you want it to. A poor
choice of variables can lead to misleading
conclusions. For example, in most
situations, a rate is more meaningful than
a simple count.
Distributions can be of the form
Table
Graph
Formula
Categorical variables
Count = frequency (# of times that category was observed)
Percent = relative frequency = proportion
How to display categorical variables:
Table
Pie chart
Uses the relative frequency to construct angles
Relative frequency defines how big the “slice” of the pie is
Bar graph
Can have both relative frequency and frequency bar graphs
Height of the bar indicates either the relative frequency or
the frequency of that category
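As a small illustration of counts versus relative frequencies, the sketch below tallies a categorical variable and converts each count to a percent and to a pie-chart angle. The observations are made up for illustration and are not from the textbook.

```python
from collections import Counter

# Hypothetical observations of a categorical variable (made up for illustration).
observations = ["A", "B", "A", "C", "A", "B", "A", "C"]

counts = Counter(observations)                 # frequency of each category
total = sum(counts.values())
relative = {cat: n / total for cat, n in counts.items()}  # proportions

for cat in sorted(counts):
    pct = 100 * relative[cat]                  # percent = relative frequency * 100
    angle = 360 * relative[cat]                # size of the pie "slice" in degrees
    print(f"{cat}: count={counts[cat]}, percent={pct:.1f}%, angle={angle:.0f} degrees")
```

The relative frequencies always add up to 1, which is why the pie-chart angles add up to 360 degrees.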
Categorical variables continued
Figure 1.3 2002 Statistical Abstract of the United States
Introduction to the Practice of Statistics, Sixth Edition
© 2009 W.H. Freeman and Company
Quantitative Variables
Stemplot
Determine stems and leaves
Write down ALL stems from smallest to largest
Write leaves alongside corresponding stems
Order leaves
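The stemplot steps above can be sketched in a few lines of code. This is a minimal version that assumes non-negative whole numbers, with the tens as stems and the ones digit as leaves; the function name `stemplot` and the data are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows for non-negative integers (stem = tens, leaf = ones)."""
    leaves = defaultdict(list)
    for v in values:
        leaves[v // 10].append(v % 10)          # split each value into stem and leaf
    rows = []
    # Write down ALL stems from smallest to largest, even those with no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        ordered = sorted(leaves.get(stem, []))  # order the leaves on each stem
        rows.append(f"{stem:>2} | " + "".join(str(leaf) for leaf in ordered))
    return rows

data = [12, 15, 21, 27, 23, 45, 41, 12]         # made-up data
for row in stemplot(data):
    print(row)
```

Note that stem 3 is printed even though no observation falls on it, so gaps in the distribution stay visible.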
Histograms
Breaks the range of a variable into intervals (called
classes)
Classes should be of equal width
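A histogram is built from the count in each class. The sketch below counts observations in equal-width classes; the helper name `class_counts`, the choice of class boundaries, and the data are assumptions made for illustration.

```python
def class_counts(values, start, width, n_classes):
    """Count values in equal-width classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)   # index of the class that contains v
        if 0 <= k < n_classes:          # ignore values outside the covered range
            counts[k] += 1
    return counts

data = [3, 7, 12, 18, 21, 25, 29, 34, 38, 5]    # made-up data
counts = class_counts(data, start=0, width=10, n_classes=4)
for k, c in enumerate(counts):
    print(f"{10 * k:>2} to <{10 * (k + 1)}: {'*' * c}")
```

Each bar height is the class count; dividing each count by the number of observations would give a relative frequency histogram with the same shape.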
Stemplot
Table 1.2
Stemplot for Female
Figure 1.5 Female
Example of histogram
Table 1.3
Histogram
Figure 1.7
Quantitative Variables continued
Examining distributions of Quantitative Variables
is best done by looking at graphs
Overall pattern (shape, spread, center)
Outliers (values outside pattern of data)
Modes – the peaks in a distribution (unimodal,
bimodal, no modes)
Shape of distribution
Symmetric
Right Skewed
Left Skewed
Example of Outliers
Two low outliers (at 0) occurred because the bonds between the wire and the
wafer were not made. The high outlier at 3150 was a measurement error.
Figure 1.9
Time Plot
Shows how variable changes over time
(time is always on the horizontal axis)
Seasonal variation – systematic pattern
that keeps reappearing
Trend - persistent long-term rise or fall
Example of Time plot
Table 1.4
Volume of water discharged by
Mississippi River into the Gulf of
Mexico
Figure 1.10
1.2 Describing Distributions with
Numbers
Measuring center
Mean
Median (see data next page)
Mode
In a symmetric distribution, the mean and
median are close to each other
Right skewed – mean is higher than
median
Left skewed – mean is lower than median
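To see how skewness pulls the mean away from the median, the sketch below compares the two measures on three small data sets. The data sets are made up for illustration.

```python
from statistics import mean, median

symmetric = [2, 4, 6, 8, 10]        # made-up, symmetric about 6
right_skewed = [2, 4, 6, 8, 100]    # one large value stretches the right tail
left_skewed = [1, 92, 94, 96, 98]   # one small value stretches the left tail

for name, data in [("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)]:
    print(f"{name}: mean={mean(data)}, median={median(data)}")
```

The median barely reacts to the extreme value, while the mean is dragged toward the long tail.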
Comparing Mean and Median
Figure 1.27
Table 1.8
1.2 Continued
If outliers are present in the data, it is better
to use the median (the median is also the better
choice if the distribution is skewed)
Why is spread so important?
Measuring spread
Range
Standard deviation
Quartiles
Measuring Spread
Range
Maximum – Minimum
Standard deviation
Roughly the typical distance of the observations from the mean
(the square root of the average squared deviation)
Properties of standard deviation
Measures spread about the mean (should only be used
when the mean is used as the measure of central
tendency)
s = 0 only when there is no spread
Outliers affect s
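The properties above can be checked numerically. This sketch assumes the usual sample formula with divisor n - 1 (the slides do not spell out the formula); the function name `sample_sd` and the data are hypothetical.

```python
from math import sqrt
from statistics import stdev

def sample_sd(xs):
    """Sample standard deviation: sqrt of the sum of squared deviations over n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]          # made-up data with mean 5
print(round(sample_sd(data), 3))          # agrees with statistics.stdev
print(sample_sd([3, 3, 3]))               # no spread, so s = 0
print(round(sample_sd(data + [50]), 3))   # a single outlier inflates s
```

The hand-written function matches `statistics.stdev` from the Python standard library, which uses the same n - 1 divisor.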
Quartiles
Quartiles
pth percentile: the value such that p% of the observations fall at or below it
and (100-p)% fall above it
25th percentile = 1st Quartile (Q1)
50th percentile = 2nd Quartile (Q2)
75th percentile = 3rd Quartile (Q3)
Quartiles
To find Quartiles
Order data
Find median
First Quartile is the median of the first half of
data
Third Quartile is the median of the second
half of data
Use Guinea pig example
Data are already ordered, n=72
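The quartile steps above can be written as a short function. This sketch assumes that when n is odd the overall median is left out of both halves, which is one common convention (software packages may use slightly different rules); the name `quartiles` and the data are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    xs = sorted(data)                 # step 1: order the data
    n = len(xs)
    half = n // 2
    lower = xs[:half]                 # observations below the median position
    upper = xs[half + n % 2:]         # observations above the median position
    return median(lower), median(xs), median(upper)

print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))   # even n: halves of 4 values each
print(quartiles([7, 1, 5, 3, 2, 6, 4]))      # odd n: the median 4 is excluded
```

With n = 72, as in the guinea pig example, each half would contain 36 observations.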
IQR
Inter-quartile range (IQR)=Q3-Q1
Five-number summary
Minimum, Q1, Median, Q3, Maximum
Boxplot – displays the five-number
summary
Box from Q1 to Q3
Line at the median
“Whiskers” to the maximum and minimum
Two-seater cars versus
Minicompact cars
Figure 1.19
Boxplot
Example from textbook
Five-number summary: 43, 82.5, 102.5, 151.5, 598
Modified boxplot (helps detect outliers)
Calculate 1.5*IQR
Q1 – 1.5*IQR
Q3+1.5*IQR
Draw box and line (similar to before). Draw
whiskers to minimum and maximum observation
within (Q1 – 1.5*IQR, Q3+1.5*IQR).
Observations outside this range should be
plotted separately.
Example of Modified Boxplot
From textbook
IQR = 151.5-82.5 = 69
1.5*IQR = 103.5
82.5-103.5 = -21 (truncated at 0)
151.5+103.5 = 255
Possible outliers?
Draw boxplot
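The 1.5*IQR calculation can be reproduced directly from the five-number summary quoted in the example (43, 82.5, 102.5, 151.5, 598). The function name `fences` is hypothetical; only the five summary values come from the slides.

```python
def fences(q1, q3):
    """Return the (lower, upper) cutoffs Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

minimum, q1, med, q3, maximum = 43, 82.5, 102.5, 151.5, 598

lower, upper = fences(q1, q3)
print(f"IQR = {q3 - q1}, 1.5*IQR = {1.5 * (q3 - q1)}")
print(f"lower cutoff = {lower}, upper cutoff = {upper}")
print("minimum is a possible outlier:", minimum < lower)
print("maximum is a possible outlier:", maximum > upper)
```

The maximum lies above the upper cutoff, so at least that observation would be plotted separately; the minimum is inside the cutoffs, so the lower whisker extends to it.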
Choosing a Summary for Data Set
If distribution is skewed or has outliers, it
is best to use the five-number summary.
If distribution is “reasonably” symmetric,
use the mean and standard deviation.
ALWAYS PLOT DATA BEFORE DECIDING
ON A NUMERICAL SUMMARY
1.3 The Normal Distribution
Density curve
Always on or above horizontal axis
Area under curve equal to 1
Symmetric density curves have equal
mean and median
Normal distribution
Mean=Median=Mode
Symmetric, unimodal
Area under curve = 1
Mean and spread of the normal
distribution
Figure 1.28
Empirical Rule (68-95-99.7% Rule)
Approximately 68% of the data will fall
within one standard deviation of the mean
Approximately 95% of the data will fall
within two standard deviations of the
mean
Approximately 99.7% of the data will fall
within three standard deviations of the
mean
Figure 1.29
Example
Weights of apples are normally distributed
with a mean of 10 oz and a standard
deviation of 2 oz.
The middle 68% of apples weigh between
_____ and _____.
Middle 95%
Middle 99.7%
Approximately what percent of apples weigh
below 6 oz?
Approximately what percent of apples weigh
above 4 oz?
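One way to work through the apple example is to apply the 68-95-99.7 rule in code. The sketch below uses the mean of 10 oz and standard deviation of 2 oz from the example; the variable and function names are hypothetical, and the tail percents rely on the symmetry of the normal curve.

```python
mean_oz, sd_oz = 10, 2
rule = {1: 68.0, 2: 95.0, 3: 99.7}   # percent within k standard deviations

def within(k):
    """Interval that lies within k standard deviations of the mean."""
    return mean_oz - k * sd_oz, mean_oz + k * sd_oz

for k, pct in rule.items():
    low, high = within(k)
    print(f"middle {pct}% of apples: between {low} oz and {high} oz")

# 6 oz is 2 SDs below the mean: half of the 5% outside the middle 95%.
below_6 = (100 - rule[2]) / 2
# 4 oz is 3 SDs below the mean: everything except the lower 0.15% tail.
above_4 = 100 - (100 - rule[3]) / 2
print(f"below 6 oz: about {below_6:.1f}%")
print(f"above 4 oz: about {above_4:.2f}%")
```

These are approximations from the rule, not exact normal probabilities.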
Z-scores
Gives the number of standard deviations an
observation is from the mean.
Negative z-scores (observation is below
the mean)
Positive z-scores (observation is above the
mean)
Z-score = 0 (observation is equal to the
mean)
Z-scores
z = (x - μ)/σ
Find z-score for an apple that weighs 11
oz.
15 oz?
5 oz?
If we assume the distribution of the
variable is normal, then the z-scores have
a standard normal distribution.
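The z-score questions above can be checked with a one-line function, using the apple mean of 10 oz and standard deviation of 2 oz. The function name `z_score` is hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

for weight in (11, 15, 5):
    z = z_score(weight, mean=10, sd=2)
    side = "above" if z > 0 else "below" if z < 0 else "equal to"
    print(f"{weight} oz: z = {z} ({side} the mean)")
```

The 15 oz and 5 oz apples are equally far from the mean (2.5 standard deviations) but on opposite sides, which the sign of z records.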
Standard Normal Distribution
The standard normal distribution has a
mean of 0 and a standard deviation of 1.
Can use Table A to get area under the
curve for a standard normal.
Area under curve = proportion (percent)
Look at table
What percent of apples weigh below 7 oz?
What percent of apples weigh more than
5 oz?
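The slides use Table A for these areas. As a stand-in for the table lookup, the sketch below computes the same standard normal areas with the error function from Python's `math` module; the function names are hypothetical, and the apple mean (10 oz) and standard deviation (2 oz) come from the example.

```python
from math import erf, sqrt

def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def proportion_below(x, mean, sd):
    """Proportion of a normal distribution below x, via the z-score."""
    return standard_normal_cdf((x - mean) / sd)

below_7 = proportion_below(7, mean=10, sd=2)        # z = -1.5
above_5 = 1 - proportion_below(5, mean=10, sd=2)    # z = -2.5, area to the right

print(f"below 7 oz: {100 * below_7:.2f}%")
print(f"above 5 oz: {100 * above_5:.2f}%")
```

Because the total area under the curve is 1, the area above a value is 1 minus the area below it, which is how the "more than 5 oz" question is handled.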