chap4dispqundataqt1


Displaying & Summarizing
Quantitative Data
CHAPTER 4
Objectives
• Look at histograms, stem-and-leaf displays, and dotplots
• Summarize the center and spread of a distribution
• Look at the mean, median, range, and interquartile range
• Examine the mean and standard deviation
Words of Advice
• Don't round or truncate intermediate results. Keep the full precision that your technology can carry.
• Report statistics to one decimal place more than the precision of the data.
• Focus on the meaning in the Tell section, not on minor differences in numeric results. Don't sweat the small differences.
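As a small illustration of the first two points, the sketch below keeps the mean at full floating-point precision and rounds only when reporting. It uses the whole-point Super Bowl margins from the example later in this chapter, so "one decimal place more" means reporting to tenths. The variable names are illustrative only.

```python
# Super Bowl winning margins (whole points) from the example later in this chapter.
margins = [25, 19, 9, 16, 3, 21, 7, 17, 10, 4, 18, 17, 4, 12, 17, 5, 10, 29, 22, 36, 19,
           32, 4, 45, 1, 13, 35, 17, 23, 10, 14, 7, 15, 7, 27, 3, 27, 3, 3, 11, 12, 3]

# Keep full precision for every intermediate result...
mean_full = sum(margins) / len(margins)

# ...and round only in the final report: the data are whole numbers,
# so report one decimal place more (tenths).
reported = f"{mean_full:.1f}"
print(reported)  # 15.0
```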
Histograms/Distribution
• The distribution of a quantitative variable slices up all the possible values of the variable into equal-width bins and gives the number of values (or counts) falling into each bin.
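A minimal sketch of that binning step, assuming the Super Bowl margins from the example later in this chapter. The starting point (0) and bin width (10 points) are arbitrary choices for illustration, and `bin_counts` is a hypothetical helper. The sketch also assumes a value that lands exactly on a bin boundary goes into the bin on its right.

```python
# Super Bowl winning margins from the example later in this chapter.
margins = [25, 19, 9, 16, 3, 21, 7, 17, 10, 4, 18, 17, 4, 12, 17, 5, 10, 29, 22, 36, 19,
           32, 4, 45, 1, 13, 35, 17, 23, 10, 14, 7, 15, 7, 27, 3, 27, 3, 3, 11, 12, 3]

def bin_counts(values, start, width):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width).

    Assumption: a value exactly on a boundary goes into the bin on its right
    (bins include their left edge, not their right edge).
    """
    counts = {}
    for v in values:
        k = int((v - start) // width)          # index of the bin that holds v
        counts[k] = counts.get(k, 0) + 1
    # List every bin from the lowest to the highest, including empty ones.
    return [(start + k * width, start + (k + 1) * width, counts.get(k, 0))
            for k in range(min(counts), max(counts) + 1)]

for lo, hi, count in bin_counts(margins, start=0, width=10):
    print(f"[{lo}, {hi}): {count}")
```

Drawing one adjacent bar per row, with height equal to the count, gives the histogram.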
Histograms
• A histogram uses adjacent bars to show the distribution of a quantitative variable. Each bar represents the frequency (or relative frequency) of values falling in each bin.
• Unlike bar charts, histograms have no spaces between the bars.
• Any spaces in a histogram are actual gaps in the data, indicating a region where there are no values.
Histograms
• A relative frequency histogram can be created by replacing the counts on the vertical axis with the percentage of total cases falling in each bin.
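The conversion is just each count divided by the total number of cases. This sketch assumes the counts of the 42 Super Bowl margins in 10-point bins (0-9, 10-19, and so on); `relative_frequencies` is a hypothetical helper name.

```python
def relative_frequencies(counts):
    """Turn bin counts into percentages of the total number of cases."""
    total = sum(counts)
    return [100 * c / total for c in counts]

# Counts of the 42 Super Bowl margins in 10-point bins: 0-9, 10-19, 20-29, 30-39, 40-49.
counts = [14, 17, 7, 3, 1]
for label, pct in zip(["0-9", "10-19", "20-29", "30-39", "40-49"],
                      relative_frequencies(counts)):
    print(f"{label:>5}: {pct:.1f}%")
```

The bar shapes do not change. Only the vertical scale does, so the percentages always add to 100.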
Stem-and-Leaf Displays
• Histograms are great, but they do not show the actual data values.
• A stem-and-leaf display is like a histogram, but it shows the individual values.
• What do you see if you turn a stem-and-leaf display on its side?
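A minimal sketch of building a stem-and-leaf display, assuming non-negative whole numbers with the tens digit as the stem and the units digit as the leaf (a textbook display may split stems differently). The data are the Super Bowl margins from the example later in this chapter, and `stem_and_leaf` is a hypothetical helper.

```python
from collections import defaultdict

margins = [25, 19, 9, 16, 3, 21, 7, 17, 10, 4, 18, 17, 4, 12, 17, 5, 10, 29, 22, 36, 19,
           32, 4, 45, 1, 13, 35, 17, 23, 10, 14, 7, 15, 7, 27, 3, 27, 3, 3, 11, 12, 3]

def stem_and_leaf(values):
    """Stem-and-leaf lines for non-negative integers: tens digit = stem, units digit = leaf."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(str(leaf))
    # Show every stem in the range, even empty ones, so gaps stay visible.
    return [f"{stem:>2} | {''.join(leaves.get(stem, []))}"
            for stem in range(min(leaves), max(leaves) + 1)]

for line in stem_and_leaf(margins):
    print(line)
```

Turned on its side, the lengths of the leaf rows trace out the same shape as a histogram with 10-point bins.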
Dotplot
• A dotplot is a simple display that places a dot along an axis for each case in the data.
• It is like a stem-and-leaf display, except dots are used in place of digits.
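A text-only dotplot can be sketched the same way, turned on its side: each row is one whole-number value on the axis, and each `o` is one case with that value. This assumes integer data (the Super Bowl margins again). `dotplot` is a hypothetical helper, and `o` stands in for a dot.

```python
from collections import Counter

margins = [25, 19, 9, 16, 3, 21, 7, 17, 10, 4, 18, 17, 4, 12, 17, 5, 10, 29, 22, 36, 19,
           32, 4, 45, 1, 13, 35, 17, 23, 10, 14, 7, 15, 7, 27, 3, 27, 3, 3, 11, 12, 3]

def dotplot(values):
    """Text dotplot for integer data, turned on its side.

    Each row is one value on the axis; each 'o' is one case with that value.
    """
    counts = Counter(values)  # a missing value counts as 0, so gaps show as empty rows
    return [f"{v:>3} | {'o' * counts[v]}" for v in range(min(values), max(values) + 1)]

for row in dotplot(margins):
    print(row)
```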
Think before you graph
• You need to think before you choose which type of graph to use to display your data.
• Although a bar chart and a histogram may look similar, they are not the same.
• You cannot display categorical data in a histogram or quantitative data in a bar chart.
• Look at the distribution: describe its shape, center, and spread.
Shape of Distribution
• Does the histogram have a single, central hump or several separate humps?
• The humps are called modes.
• A histogram with one peak is called unimodal, with two peaks bimodal, and with three or more multimodal.
Shape of Distribution
• A histogram that doesn't appear to have any mode and in which all the bars are approximately the same height is called uniform.
Shape of Distribution
• Is the histogram symmetric?
• The thinner ends of a distribution are called tails. If one tail stretches out farther than the other, the histogram is said to be skewed to the side of the longer tail.
Example
• Would you expect the distributions of these variables to be uniform, unimodal, or bimodal? Symmetric or skewed? Explain why.
  • Ages of people at a Little League game.
  • Number of siblings of people in your class.
  • Pulse rate of college-age males.
  • Number of times each face of a die shows in 100 tosses.
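For the die case you can check your reasoning with a quick simulation. Each face has the same chance, so the counts should be roughly (not exactly) equal, which suggests a roughly uniform shape. With only 100 tosses the bar heights will wobble from run to run. The seed below is an arbitrary choice made only so that a run is repeatable.

```python
import random
from collections import Counter

random.seed(4)  # arbitrary fixed seed so the run is repeatable; any seed will do
tosses = [random.randint(1, 6) for _ in range(100)]
face_counts = Counter(tosses)

# Each face has the same chance, so the bars should be roughly (not exactly) equal.
for face in range(1, 7):
    print(f"{face}: {'#' * face_counts[face]} ({face_counts[face]})")
```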
Shape of Distribution
• Do any unusual features stick out?
• You should always mention any stragglers, or outliers, that stand off away from the body of the distribution.
Example
A credit card company wants to see how much customers in a particular segment of their market use their credit card. They have provided you with data on the amount spent by 500 selected customers during a 3-month period and have asked you to summarize the expenditures.
Describe the shape of this distribution.
Center of Distribution: Median
• When we think of a typical value, we usually look for the center of the distribution.
• Histograms follow the area principle: the middle value that divides the histogram into two equal areas is called the median.
• How do you find the median?
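One common recipe: sort the data, then take the middle value if n is odd, or the average of the two middle values if n is even. A minimal sketch, using a few made-up values for illustration:

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ys = sorted(values)
    n = len(ys)
    mid = n // 2
    if n % 2 == 1:
        return ys[mid]                      # odd n: the single middle value
    return (ys[mid - 1] + ys[mid]) / 2      # even n: average of the two middle values

print(median([7, 1, 4]))      # 4
print(median([7, 1, 4, 10]))  # 5.5
```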
Spread: Range
• When we describe a distribution numerically, we always report a measure of its spread along with its center.
• How do we measure spread? One simple way is the range.
• How do you find the range?
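The range is the maximum minus the minimum. A minimal sketch on the Super Bowl margins from the example later in this chapter. The helper name `data_range` is hypothetical and chosen so it does not shadow Python's built-in `range`.

```python
margins = [25, 19, 9, 16, 3, 21, 7, 17, 10, 4, 18, 17, 4, 12, 17, 5, 10, 29, 22, 36, 19,
           32, 4, 45, 1, 13, 35, 17, 23, 10, 14, 7, 15, 7, 27, 3, 27, 3, 3, 11, 12, 3]

def data_range(values):
    """Range = maximum - minimum. It depends on only the two most extreme values."""
    return max(values) - min(values)

print(data_range(margins))  # 45 - 1 = 44
```

Because only the two extremes enter the calculation, a single unusual value can change the range a lot.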
Spread: Interquartile Range
• A better way to describe the spread of a variable might be to ignore the extremes and focus on the middle of the data.
• Divide the data in half, then divide both halves in half, cutting the data into four quarters. These new dividing points are called quartiles.
• One quarter of the data lies below the lower quartile, and one quarter of the data lies above the upper quartile.
Spread: Interquartile Range
• The difference between the quartiles is called the interquartile range (IQR).
• The IQR is almost always a reasonable summary of the spread of a distribution. Even if the distribution is skewed or has outliers, the IQR will provide useful information.
• The lower and upper quartiles represent the 25th and 75th percentiles of the data.
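The halving recipe above can be sketched as follows, assuming the quartiles are the medians of the lower and upper halves of the sorted data. When n is odd, this sketch leaves the overall median out of both halves. That is one of several conventions, and interpolating methods such as the default of Python's `statistics.quantiles` can give slightly different quartiles on the same data (don't sweat the small differences). The helper names are hypothetical.

```python
margins = [25, 19, 9, 16, 3, 21, 7, 17, 10, 4, 18, 17, 4, 12, 17, 5, 10, 29, 22, 36, 19,
           32, 4, 45, 1, 13, 35, 17, 23, 10, 14, 7, 15, 7, 27, 3, 27, 3, 3, 11, 12, 3]

def median(ys):
    """Median of an already sorted, non-empty list."""
    n = len(ys)
    mid = n // 2
    return ys[mid] if n % 2 == 1 else (ys[mid - 1] + ys[mid]) / 2

def quartiles(values):
    """(Q1, Q3) as the medians of the lower and upper halves of the sorted data.

    Assumption: when n is odd the overall median is left out of both halves.
    Needs at least two values.
    """
    ys = sorted(values)
    half = len(ys) // 2
    lower = ys[:half]
    upper = ys[half + len(ys) % 2:]   # skip the middle value when n is odd
    return median(lower), median(upper)

def iqr(values):
    """Interquartile range: upper quartile minus lower quartile."""
    q1, q3 = quartiles(values)
    return q3 - q1

print(quartiles(margins), iqr(margins))  # (7, 21) 14
```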
5-Number Summary
• The 5-number summary of a distribution reports its median, quartiles, and extremes (minimum and maximum).
Example
In the Super Bowl, by how many points does the
winning team outscore the losers? Here are the
winning margins for the first 42 Super Bowl games:
25,19,9,16,3,21,7,17,10,4,18,17,4,12,17,5,10,29,22,36,19,
32,4,45,1,13,35,17,23,10,14,7,15,7,27,3,27,3,3,11,12,3
a) Find the median
b) Find the quartiles
c) Write a description based on the 5-number
summary.
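To check your hand work on parts a) and b), here is a minimal sketch of a 5-number summary. It assumes the quartiles are found by splitting the sorted data at the median and taking the median of each half, leaving the median out of both halves when n is odd. With n = 42 here, the halves are simply the lowest 21 and highest 21 values. `five_number_summary` is a hypothetical helper.

```python
margins = [25, 19, 9, 16, 3, 21, 7, 17, 10, 4, 18, 17, 4, 12, 17, 5, 10, 29, 22, 36, 19,
           32, 4, 45, 1, 13, 35, 17, 23, 10, 14, 7, 15, 7, 27, 3, 27, 3, 3, 11, 12, 3]

def median(ys):
    """Median of an already sorted, non-empty list."""
    n = len(ys)
    mid = n // 2
    return ys[mid] if n % 2 == 1 else (ys[mid - 1] + ys[mid]) / 2

def five_number_summary(values):
    """Minimum, lower quartile, median, upper quartile, maximum.

    Assumption: quartiles are medians of the lower and upper halves, and the
    overall median is left out of both halves when n is odd.
    """
    ys = sorted(values)
    half = len(ys) // 2
    return {
        "min": ys[0],
        "Q1": median(ys[:half]),
        "median": median(ys),
        "Q3": median(ys[half + len(ys) % 2:]),
        "max": ys[-1],
    }

print(five_number_summary(margins))
# {'min': 1, 'Q1': 7, 'median': 13.5, 'Q3': 21, 'max': 45}
```

The written description for part c) is still up to you: the numbers only tell you where to look.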
The Mean
• The mean is the center because it is the point where the histogram balances.
$$\bar{y} = \frac{\text{Total}}{n} = \frac{\sum y}{n}$$
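The formula in code: add up all the values to get the Total, then divide by n. A minimal sketch on the Super Bowl margins from the example above:

```python
margins = [25, 19, 9, 16, 3, 21, 7, 17, 10, 4, 18, 17, 4, 12, 17, 5, 10, 29, 22, 36, 19,
           32, 4, 45, 1, 13, 35, 17, 23, 10, 14, 7, 15, 7, 27, 3, 27, 3, 3, 11, 12, 3]

total = sum(margins)   # Total = sum of y
n = len(margins)       # number of cases
y_bar = total / n      # y-bar = Total / n
print(f"Total = {total}, n = {n}, mean = {y_bar:.2f}")  # Total = 632, n = 42, mean = 15.05
```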

Mean or Median
• Using the center of balance makes sense if the data are symmetric.
• Since the median considers only the order of the values, it is resistant to values that are extraordinarily large or small.
• If the histogram is symmetric and there are no outliers, we prefer the mean.
• If the histogram is skewed or has outliers, we prefer the median.
• If you are not sure, report both and explain.
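A quick way to see the median's resistance is a what-if. The sketch below uses Python's standard `statistics.mean` and `statistics.median` on the Super Bowl margins. It then pretends (purely hypothetically, not real data) that the 45-point blowout had been 450 points. The mean jumps, while the median stays put because the order of the values barely changes. On the real data the mean (about 15.05) is already above the median (13.5), consistent with a long right tail of blowouts.

```python
from statistics import mean, median

margins = [25, 19, 9, 16, 3, 21, 7, 17, 10, 4, 18, 17, 4, 12, 17, 5, 10, 29, 22, 36, 19,
           32, 4, 45, 1, 13, 35, 17, 23, 10, 14, 7, 15, 7, 27, 3, 27, 3, 3, 11, 12, 3]

print(mean(margins), median(margins))   # mean sits above the median: right tail

# Hypothetical what-if (not real data): the 45-point margin had been 450 points.
what_if = [450 if m == 45 else m for m in margins]
print(mean(what_if), median(what_if))   # the mean jumps; the median does not move
```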
Spread: Standard Deviation
• The standard deviation takes into account how far each value is from the mean. Standard deviation is appropriate only for symmetric data.
• One way to think about spread is to examine how far each data value is from the mean. This difference is called a deviation.
• We could average the deviations, but that is not helpful: positive and negative deviations cancel, so they always add up to zero.
• To make the deviations helpful, we square them.
• We add up the squared deviations and find their average (almost: we divide by n - 1 instead of n). The result is called the variance.
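The steps above, one line each, on the Super Bowl margins. Since these margins are right-skewed, the slides' advice would favor the median and IQR here; the numbers serve only to show the arithmetic.

```python
margins = [25, 19, 9, 16, 3, 21, 7, 17, 10, 4, 18, 17, 4, 12, 17, 5, 10, 29, 22, 36, 19,
           32, 4, 45, 1, 13, 35, 17, 23, 10, 14, 7, 15, 7, 27, 3, 27, 3, 3, 11, 12, 3]

n = len(margins)
y_bar = sum(margins) / n

deviations = [y - y_bar for y in margins]   # how far each value is from the mean
print(sum(deviations))   # zero up to floating-point rounding, so their average is useless

squared = [d ** 2 for d in deviations]      # squaring makes every term non-negative
variance = sum(squared) / (n - 1)           # "almost" an average: divide by n - 1
print(round(variance, 2))
```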
Spread: Standard Deviation
• Variance:
$$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$$
Spread: Standard Deviation
• Standard Deviation:
$$s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$$
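A minimal sketch of the formula: take the square root of the variance. The result is cross-checked against Python's standard `statistics.stdev`, which also divides by n - 1. `sample_sd` is a hypothetical helper name. As before, the skewed Super Bowl margins are used only to show the arithmetic.

```python
import math
import statistics

margins = [25, 19, 9, 16, 3, 21, 7, 17, 10, 4, 18, 17, 4, 12, 17, 5, 10, 29, 22, 36, 19,
           32, 4, 45, 1, 13, 35, 17, 23, 10, 14, 7, 15, 7, 27, 3, 27, 3, 3, 11, 12, 3]

def sample_sd(values):
    """s = square root of (sum of squared deviations / (n - 1))."""
    n = len(values)
    y_bar = sum(values) / n
    ss = sum((y - y_bar) ** 2 for y in values)   # sum of squared deviations
    return math.sqrt(ss / (n - 1))               # square root of the variance

s = sample_sd(margins)
print(f"{s:.2f}")
# Cross-check with the standard library, which also divides by n - 1.
print(f"{statistics.stdev(margins):.2f}")
```

Taking the square root puts the spread back in the original units (points), which the variance, in squared points, does not.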
Spread: Standard Deviation
• Statistics is about variation, so spread is important.
• Measures of spread help us understand what we do not know.
Homework: pg 72-78 #5, 8, 10, 13, 16, 22, 24, 29, 32, 37, 38, 44, 47