Notes Chapters 4 & 5 Describing Distributions with Numbers


More Univariate Data
Quantitative Graphs & Describing
Distributions with Numbers
Quantitative Data
• Quantitative variables take numerical
values for which it makes sense to do
arithmetic operations like adding or
averaging.
• Possible graphs: dotplots, stemplots,
histograms, cumulative frequency plots,
boxplots
Graphs
• Be sure to always:
  • Title your graphs
  • Label your axes, including units of
    measure
  • Number your axes in a consistent
    and reasonable manner
Quantitative Graphs
Histograms
A histogram’s vertical axis is counts while a
relative frequency histogram’s vertical axis
is percents.
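
The difference between the two vertical axes can be sketched in a few lines of Python. This is only an illustration: the scores are made up, and the function names and the bin width of 10 are assumptions, not something from these notes.

```python
from collections import Counter

def bin_counts(values, bin_width):
    """Count how many values fall in each bin [start, start + bin_width)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

def relative_frequencies(counts):
    """Convert bin counts to percents of the total number of values."""
    total = sum(counts.values())
    return {start: 100 * c / total for start, c in counts.items()}

# Made-up test scores, used only for illustration.
scores = [62, 67, 71, 74, 75, 78, 81, 83, 88, 94]

counts = bin_counts(scores, 10)           # heights for a histogram
percents = relative_frequencies(counts)   # heights for a relative frequency histogram

for start in counts:
    print(f"{start}-{start + 9}: count = {counts[start]}, percent = {percents[start]}")
```

Both graphs have the same shape; only the scale on the vertical axis changes.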
Quantitative Graphs
Stem & Leaf
This type of graph uses the leading place
values as the stems & the units digits as the
leaves. (It is hard to describe in words, so we
are going to make one as an example.) We
can also create what’s called a back-to-back
stemplot with two data sets. It is helpful for
comparing two sets of univariate data.
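
Since a stemplot is easier to see than to describe, here is a minimal Python sketch that builds one. It assumes non-negative whole numbers, with the tens (and higher) as stems and the units digit as leaves; the data and the function name `stemplot` are made up for illustration.

```python
from collections import defaultdict

def stemplot(values):
    """Return the rows of a stem-and-leaf plot for non-negative integers."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # stem = tens and up, leaf = units digit
        leaves[stem].append(leaf)
    rows = []
    # Include every stem between the smallest and largest, even if it has no leaves.
    for stem in range(min(leaves), max(leaves) + 1):
        leaf_text = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {leaf_text}".rstrip())
    return rows

# Made-up test scores, used only for illustration.
scores = [62, 67, 71, 74, 75, 78, 81, 83, 88, 94]
for row in stemplot(scores):
    print(row)
```

Every data value can be read back from the plot, which is what a histogram cannot offer.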
Quantitative Graphs
A histogram is sometimes preferred for larger
data sets. Its strongest asset is that it shows
shape well. Its weakness is that the individual
data values are lost.
A stem & leaf is sometimes preferred because
it retains all data values, but it’s very difficult to
create for large data sets.
Quantitative Data
The distribution of a variable tells
us what values the variable
typically takes and how often it
takes them. It is a generalization
about the variable values.
• When describing any quantitative
distribution:
  C – Center
  U – Unusual Features
  S – Shape
  S – Spread
  &
  B – Be
  S – Specific
• Common shapes of distributions/graphs:
  • Symmetric
  • Skewed to the right
  • Skewed to the left
  • Bimodal
  • Uniform
• Once you have chosen a
shape, you choose a
measure of center and
spread based on that
shape.
Center when the distribution is symmetric
Mean: the average
formula:
$$\bar{x} = \frac{\sum x_i}{n}$$
Measure of Spread or Variability when
the distribution is Symmetric
Standard deviation:
$$s_x = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2}$$
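
The standard deviation formula can be followed step by step in Python. This is a sketch on made-up data; the function name `sample_std_dev` is an assumption, and the result is checked against `statistics.stdev` from the standard library, which also divides by n – 1.

```python
import math
import statistics

def sample_std_dev(values):
    """Sample standard deviation: square root of the sum of squared deviations over n - 1."""
    n = len(values)
    x_bar = sum(values) / n                                  # the mean
    squared_deviations = [(x - x_bar) ** 2 for x in values]  # (x_i - mean)^2
    return math.sqrt(sum(squared_deviations) / (n - 1))

# Made-up data, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
s = sample_std_dev(data)

print("mean:", sum(data) / len(data))
print("standard deviation:", round(s, 3))
print("matches statistics.stdev:", math.isclose(s, statistics.stdev(data)))
```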
Measure of Center when the
distribution is not symmetric:
Median – the middle value in an ordered
list. If there are two values in the middle,
then average them.
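
As a small sketch of that rule (the function name `median` and the data are made up for illustration):

```python
def median(values):
    """Middle value of the ordered list; average the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # one value in the middle
    return (ordered[mid - 1] + ordered[mid]) / 2   # two values in the middle

print(median([5, 1, 3]))     # odd number of values
print(median([4, 1, 3, 2]))  # even number of values
```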
Measure of Spread or Variability when
the distribution is not Symmetric
• We can also examine spread by looking at
the range of the middle 50% of the data. This
is called the:
Interquartile Range (IQR).
IQR = Q3 – Q1
We also need to talk about the 5-number
summary.
The 5-number summary is made up of the
minimum, the first quartile, Q1 (where 25%
of the data lies below this value), the median,
the third quartile, Q3 (where 75% of the data
lies below this value), and the maximum.
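
A sketch of the five-number summary and IQR in Python follows. Quartile conventions differ between textbooks and software; this version assumes Q1 and Q3 are the medians of the lower and upper halves of the ordered data (leaving out the median itself when n is odd), so calculator or software output may differ slightly. The function names and data are made up for illustration.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention (an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]             # values below the median position
    upper = ordered[half + (n % 2):]   # values above it (skips the median when n is odd)
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Made-up data, used only for illustration.
data = [1, 3, 4, 6, 7, 8, 10, 12, 15]
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1

print("five-number summary:", (minimum, q1, med, q3, maximum))
print("IQR:", iqr)
```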
Another Measure of Spread or
Variability
• Range – the difference between the
maximum and the minimum observations.
This is the simplest measure of spread.
We typically use this as preliminary
information or if it is the only measure of
spread we can calculate.
Another measure of spread or
variability
• Variance is the average of the squares of
the deviations of the observations from
their mean (for sample data, the sum of
squared deviations is divided by n – 1).
It is the standard deviation squared.
• An outlier is an individual observation in
data that falls outside the overall pattern of
the data.
Using the IQR, we can perform a test for
outliers.
Outlier Test:
Any value below
Q1 – 1.5(IQR)
or above
Q3 + 1.5 (IQR)
is considered an outlier.
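
The outlier test can be sketched in Python as below. The quartiles here are assumed to be the medians of the lower and upper halves of the ordered data, which is one common convention but not the only one; the function names and data are made up for illustration.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q3) as medians of the lower and upper halves (an assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    return median(ordered[:half]), median(ordered[half + (n % 2):])

def find_outliers(values):
    """Values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in values if x < low_fence or x > high_fence]

# Made-up data with one suspiciously large value.
data = [1, 3, 4, 6, 7, 8, 10, 12, 40]
print("outliers:", find_outliers(data))
```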
Another Graph…
When we graph the five-number summary
and plot any outliers as individual points,
the result is a modified boxplot.
Measures that are not strongly affected by
extreme values are said to be resistant.
The median and IQR are more resistant than
the mean and standard deviation.
The standard deviation is even less resistant
than the mean.
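
Resistance can be seen by adding one extreme value to a small data set and comparing the summaries before and after. This is a sketch on made-up data using Python's `statistics` module; note that `statistics.quantiles` uses its own default quartile method, so the IQR may differ a little from a hand calculation.

```python
import statistics

def summarize(values):
    """Mean, median, standard deviation and IQR for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default method; an assumption here
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
    }

# Made-up data, then the same data with one extreme value added.
data = [10, 12, 13, 15, 16, 18, 20]
with_extreme = data + [100]

before = summarize(data)
after = summarize(with_extreme)

for name in before:
    print(f"{name:>6}: {before[name]:.2f} -> {after[name]:.2f}")
```

The median and IQR barely move, while the mean shifts noticeably and the standard deviation changes the most.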
Measures of Spread or Variability – Why?
We measure spread because it’s an important
description of what is happening with the data.
We need to know about the amount of variation
we can expect in a data set.