Chapter 1 - People Server at UNCW


Chapter 1
Looking at Data
Types of variables
Looking at Data


Be sure that each variable really does
measure what you want it to. A poor
choice of variables can lead to misleading
conclusions. For example, in most
situations, a rate is more meaningful than
a simple count.
Distributions can be of the form



Table
Graph
Formula
Categorical variables



Count = frequency (# of times that category was observed)
Percent = relative frequency = proportion
How to display categorical variables:
 Table
 Pie chart
 Uses the relative frequency to construct angles
 Relative frequency defines how big the “slice” of the pie is
 Bar graph
 Can have both relative frequency and frequency bar graphs
 Height of the bar indicates either the relative frequency or
the frequency of that category
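The relationship between count, relative frequency, and pie-chart angle can be sketched in a few lines. The observations below are made up for illustration and are not from the text.

```python
from collections import Counter

# Hypothetical observations of one categorical variable (made-up data).
observations = ["red", "blue", "red", "green", "blue", "red", "red", "blue"]

counts = Counter(observations)                          # frequency of each category
n = sum(counts.values())                                # total number of observations
rel_freq = {c: k / n for c, k in counts.items()}        # relative frequency (proportion)
pie_angle = {c: 360 * p for c, p in rel_freq.items()}   # slice angle in degrees

for category in counts:
    print(category, counts[category],
          f"{rel_freq[category]:.3f}", f"{pie_angle[category]:.1f}")
```

The relative frequencies sum to 1, so the pie angles sum to 360 degrees.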
Categorical variables continued
Figure 1.3 2002 Statistical Abstract of the United States
Introduction to the Practice of Statistics, Sixth Edition
© 2009 W.H. Freeman and Company
Quantitative Variables

Stemplot





Determine stems and leaves
Write down ALL stems from smallest to largest
Write leaves alongside corresponding stems
Order leaves
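The four steps above can be sketched as follows. This is a minimal version that assumes non-negative whole numbers, with the tens as stems and the units as leaves; the function name `stemplot` and the data are illustrative only.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines for non-negative integers (tens = stem, units = leaf)."""
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)      # step 1: determine stem and leaf
        leaves[stem].append(leaf)
    lines = []
    # Step 2: write down ALL stems from smallest to largest, even empty ones.
    for stem in range(min(leaves), max(leaves) + 1):
        # Steps 3 and 4: write the leaves beside their stem, in order.
        ordered = "".join(str(leaf) for leaf in sorted(leaves.get(stem, [])))
        lines.append(f"{stem:>2} | {ordered}")
    return lines

data = [23, 35, 31, 58, 27, 35, 50, 29]   # made-up data
lines = stemplot(data)
print("\n".join(lines))
```

Note that stem 4 is printed even though it has no leaves; skipping empty stems would hide a gap in the distribution.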
Histograms


Breaks the range of a variable into intervals (called
classes)
Classes should be of equal length
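A rough sketch of how a histogram counts observations in equal-width classes. The helper `class_counts`, its parameters, and the data are assumptions made for illustration; each class here includes its left endpoint and excludes its right endpoint.

```python
def class_counts(values, start, width, n_classes):
    """Count values in equal-width classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        i = int((v - start) // width)   # index of the class that contains v
        if 0 <= i < n_classes:
            counts[i] += 1
    return counts

data = [12, 15, 21, 22, 28, 33, 35, 36, 39, 47]   # made-up data
counts = class_counts(data, start=10, width=10, n_classes=4)
print(counts)   # classes 10-20, 20-30, 30-40, 40-50
```

The bar heights of the histogram are these counts (or the counts divided by the number of observations for a relative frequency histogram).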
Stemplot
Table 1.2
Stemplot for Female
Figure 1.5 Female
Example of histogram
Table 1.3
Histogram
Figure 1.7
Quantitative Variables continued

Examining distributions of Quantitative Variables
is best done by looking at graphs




Overall pattern (shape, spread, center)
Outliers (values outside pattern of data)
Modes – the peaks in a distribution (unimodal,
bimodal, no modes)
Shape of distribution



Symmetric
Right Skewed
Left Skewed
Example of Outliers
Two low outliers (at 0) occurred because the bonds between the wire and the
wafer were not made. The high outlier at 3150 was a measurement error.
Figure 1.9
Time Plot



Shows how variable changes over time
(time is always on the horizontal axis)
Seasonal variation – systematic pattern
that keeps reappearing
Trend - persistent long-term rise or fall
Example of Time plot
Table 1.4
Volume of water discharged by
Mississippi River into the Gulf of
Mexico
Figure 1.10
1.2 Describing Distributions with
Numbers

Measuring center






Mean
Median (see data next page)
Mode
In a symmetric distribution, the mean and
median are close to each other
Right skewed – mean is higher than
median
Left skewed – mean is lower than median
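A small illustration of how skewness pulls the mean away from the median. The three data sets are made up so that the median is the same in each.

```python
from statistics import mean, median

symmetric = [2, 4, 6, 8, 10]
right_skewed = [2, 4, 6, 8, 40]    # one large value stretches the right tail
left_skewed = [-30, 4, 6, 8, 10]   # one small value stretches the left tail

for name, data in [("symmetric", symmetric),
                   ("right skewed", right_skewed),
                   ("left skewed", left_skewed)]:
    print(f"{name}: mean = {mean(data)}, median = {median(data)}")
```

The median stays at 6 in all three cases, while the mean moves toward the long tail.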
Comparing Mean and Median
Figure 1.27
Table 1.8
1.2 Continued



If outliers are present in data, it is better
to use median (also better to use median
if the distribution is skewed)
Why is spread so important?
Measuring spread



Range
Standard deviation
Quartiles
Measuring Spread

Range


Maximum – Minimum
Standard deviation


Roughly the typical distance of the observations from the mean
(square root of the average squared deviation)
Properties of standard deviation
Measures spread about the mean (should only be used
when the mean is used as the measure of central
tendency)
 s = 0 only when there is no spread
 Outliers affect s
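A sketch of the standard deviation computed step by step, assuming the usual sample formula with divisor n - 1 (the same convention as Python's `statistics.stdev`). The function name `sample_sd` and the data are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def sample_sd(values):
    """Sample standard deviation s, using the n - 1 divisor (assumed convention)."""
    m = mean(values)
    squared_devs = [(x - m) ** 2 for x in values]   # squared deviations from the mean
    return sqrt(sum(squared_devs) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]           # made-up data
print(sample_sd(data), stdev(data))       # the two values agree
print(sample_sd([3, 3, 3]))               # no spread, so s = 0
print(sample_sd(data + [50]))             # one outlier inflates s
```

The last two lines illustrate the listed properties: s is 0 only when every observation is the same, and a single outlier can change s considerably.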

Quartiles

Quartiles





pth percentile: p% of the observations fall at or below that value
and (100-p)% fall above it
25th percentile = 1st Quartile (Q1)
50th percentile = 2nd Quartile (Q2)
75th percentile = 3rd Quartile (Q3)
Quantiles

To find Quartiles





Order data
Find median
First Quartile is the median of the first half of
data
Third Quartile is the median of the second
half of data
Use Guinea pig example

Data are already ordered, n=72
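The quartile steps can be sketched as below. One detail is an assumption: when n is odd, this version leaves the overall median out of both halves. Other conventions exist, and software packages may give slightly different quartiles. The data are made up and are not the guinea pig data.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Assumption: for odd n the overall median is excluded from both halves.
    """
    data = sorted(values)                 # order the data
    n = len(data)
    half = n // 2
    lower = data[:half]                   # first half of the data
    upper = data[half + 1:] if n % 2 else data[half:]   # second half
    return median(lower), median(data), median(upper)

odd = quartiles([1, 3, 5, 7, 9, 11, 13])
even = quartiles([1, 2, 3, 4, 5, 6, 7, 8])
print(odd, even)
```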
IQR


Inter-quartile range (IQR)=Q3-Q1
Five-number summary


Minimum, Q1, Median, Q3, Maximum
Boxplot – displays the five-number
summary



Box from Q1 to Q3
Line at the median
“Whiskers” to the maximum and minimum
Two-seater cars versus
Minicompact cars
Figure 1.19
Boxplot


Example from textbook
Five-number summary: 43, 82.5, 102.5, 151.5, 598
Modified boxplot (helps detect outliers)
Calculate 1.5*IQR
Q1 – 1.5*IQR
Q3+1.5*IQR
Draw box and line (similar to before). Draw
whiskers to minimum and maximum observation
within (Q1 – 1.5*IQR, Q3+1.5*IQR).
Observations outside this range should be
plotted separately.
Example of Modified Boxplot

From textbook






IQR = 151.5-82.5 = 69
1.5*IQR = 103.5
82.5-103.5 = -21 (truncated at 0)
151.5+103.5 = 255
Possible outliers?
Draw boxplot
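The 1.5*IQR calculation for this example can be checked with a short sketch. The function name `outlier_fences` is made up; the numbers are the five-number summary quoted from the textbook example.

```python
def outlier_fences(q1, q3):
    """Return (Q1 - 1.5*IQR, Q3 + 1.5*IQR)."""
    iqr = q3 - q1
    step = 1.5 * iqr
    return q1 - step, q3 + step

minimum, q1, med, q3, maximum = 43, 82.5, 102.5, 151.5, 598
low, high = outlier_fences(q1, q3)
print(low, high)                 # -21.0 255.0
print(minimum < low)             # False: nothing is flagged on the low side
print(maximum > high)            # True: the maximum is a possible outlier
```

Any observation above 255 would be plotted separately in the modified boxplot; the five-number summary alone only shows that the maximum, 598, is one of them.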
Choosing a Summary for Data Set



If distribution is skewed or has outliers, it
is best to use the five-number summary.
If distribution is “reasonably” symmetric,
use the mean and standard deviation.
ALWAYS PLOT DATA BEFORE DECIDING
ON A NUMERICAL SUMMARY
1.3 The Normal Distribution

Density curve




Always on or above horizontal axis
Area under curve equal to 1
Symmetric density curves have equal
mean and median
Normal distribution



Mean=Median=Mode
Symmetric, unimodal
Area under curve = 1
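As a numerical check of these properties, the sketch below uses `statistics.NormalDist` (Python 3.8 or later) and a simple trapezoidal sum. The mean of 10 and standard deviation of 2 are arbitrary illustrative values, and `area_under` is a made-up helper.

```python
from statistics import NormalDist

def area_under(pdf, a, b, steps=10_000):
    """Trapezoidal approximation of the area under pdf between a and b."""
    h = (b - a) / steps
    total = 0.5 * (pdf(a) + pdf(b))
    for i in range(1, steps):
        total += pdf(a + i * h)
    return total * h

curve = NormalDist(mu=10, sigma=2)
# Integrate over mean +/- 8 standard deviations; the tails beyond are negligible.
area = area_under(curve.pdf, 10 - 8 * 2, 10 + 8 * 2)
print(round(area, 6))
print(curve.mean, curve.median, curve.mode)   # all equal for a normal curve
```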
Mean and spread of the normal
distribution
Figure 1.28
Empirical Rule(68-95-99.7% Rule)



Approximately 68% of the data will fall
within one standard deviation of the mean
Approximately 95% of the data will fall
within two standard deviations of the
mean
Approximately 99.7% of the data will fall
within three standard deviations of the
mean
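The three percentages in the rule are rounded values of exact normal areas. A quick check, assuming Python 3.8 or later for `statistics.NormalDist`:

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, standard deviation 1

within = {}
for k in (1, 2, 3):
    # Area between k standard deviations below and above the mean.
    within[k] = z.cdf(k) - z.cdf(-k)
    print(f"within {k} SD: {within[k]:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```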
Figure 1.29
Example

Weights of apples are normally distributed
with a mean of 10 oz and a standard
deviation of 2 oz.





The middle 68% of apples weigh between
_____ and _____.
Middle 95%
Middle 99.7%
Approximately what percent of apples weigh
below 6 oz?
Approximately what percent of apples weigh
above 4 oz?
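One way to work through the apple questions with the 68-95-99.7 rule is sketched below. The variable names are illustrative, and the answers are approximate because the rule itself is approximate.

```python
mean_oz, sd_oz = 10, 2
RULE = {1: 68.0, 2: 95.0, 3: 99.7}   # percent within k standard deviations

# Middle 68%, 95%, and 99.7%: mean +/- 1, 2, and 3 standard deviations.
intervals = {k: (mean_oz - k * sd_oz, mean_oz + k * sd_oz) for k in RULE}

# 6 oz is two SDs below the mean: half of the 5% outside the middle 95%.
pct_below_6 = (100 - RULE[2]) / 2

# 4 oz is three SDs below the mean: everything except the lower 0.15% tail.
pct_above_4 = 100 - (100 - RULE[3]) / 2

print(intervals)
print(pct_below_6, round(pct_above_4, 2))
```

The tail calculations rely on symmetry: the percent outside the middle interval is split equally between the two tails.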
Z-scores




Tells # of standard deviations an
observation is from the mean.
Negative z-scores (observation is below
the mean)
Positive z-scores (observation is above the
mean)
Z-score = 0 (observation is equal to the
mean)
Z-scores





Z = (X - μ)/σ
Find z-score for an apple that weighs 11
oz.
15 oz?
5 oz?
If we assume the distribution of the
variable is normal, then the z-scores have
a standard normal distribution.
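The z-scores for the three apples can be computed directly from the formula, using the mean of 10 oz and standard deviation of 2 oz from the earlier example. The function name `z_score` is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies from the mean."""
    return (x - mean) / sd

mean_oz, sd_oz = 10, 2
z = {w: z_score(w, mean_oz, sd_oz) for w in (11, 15, 5)}
print(z)   # {11: 0.5, 15: 2.5, 5: -2.5}
```

The 5 oz apple has a negative z-score because it is below the mean; the other two are above it.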
Standard Normal Distribution






The standard normal distribution has a
mean of 0 and a standard deviation of 1.
Can use Table A to get area under the
curve for a standard normal.
Area under curve = proportion (percent)
Look at table
What percent of apples weigh below 7 oz?
What percent of apples weigh more than
5 oz?
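In place of a printed table, the standard normal areas can be computed with `statistics.NormalDist` (Python 3.8 or later). This is a sketch of the same two steps used with Table A: standardize, then look up the area to the left of z.

```python
from statistics import NormalDist

standard_normal = NormalDist(0, 1)
mean_oz, sd_oz = 10, 2

z_7 = (7 - mean_oz) / sd_oz    # -1.5
z_5 = (5 - mean_oz) / sd_oz    # -2.5

below_7 = standard_normal.cdf(z_7)        # area to the left of z = -1.5
above_5 = 1 - standard_normal.cdf(z_5)    # area to the right of z = -2.5

print(f"below 7 oz: {below_7:.4f}")   # about 0.0668
print(f"above 5 oz: {above_5:.4f}")   # about 0.9938
```

A table gives areas to the left of z, so an "above" question is answered by subtracting the table value from 1.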