Exploring Data
Graphing and Summarizing Univariate Data
Graphing the Data
• Graphical displays of quantitative data include:
▫ Dotplot
▫ Stemplot
▫ Histogram
▫ Cumulative Frequency Plots (ogives)
▫ Boxplots
Dotplot
• As you might guess, a dotplot is made up of dots
plotted on a graph.
• Each dot can represent a single observation
from a set of data, or a specified number of
observations from a set of data.
• The dots are stacked in a column over a
category or value, so that the height of the
column represents the frequency of
observations in the category.
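A dotplot can be sketched in text by stacking one symbol per observation over each value. The sketch below is a minimal illustration. It assumes small whole-number data (single digits, so the columns line up), and each `*` stands for one observation. The helper name `text_dotplot` and the dog counts are hypothetical.

```python
from collections import Counter

def text_dotplot(values, symbol="*"):
    """Text dotplot: one symbol per observation, stacked over each value.

    Assumes small non-negative whole numbers (single digits) so columns line up.
    """
    counts = Counter(values)
    # Include every value between min and max so that empty columns (gaps) still show.
    axis = range(min(values), max(values) + 1)
    height = max(counts.values())
    rows = []
    for level in range(height, 0, -1):  # build the top row first
        rows.append(" ".join(symbol if counts[v] >= level else " " for v in axis))
    rows.append(" ".join(str(v) for v in axis))  # the value axis
    return "\n".join(rows)

# Hypothetical data: number of dogs in each of 10 homes.
dogs = [0, 0, 0, 1, 1, 1, 1, 2, 2, 3]
print(text_dotplot(dogs))
```

In the output, the height of each column is the frequency of that value.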
Dotplot Example
[Figure: dotplot titled "Number of Dogs in Each Home in My Block"; horizontal axis: # of Dogs (0 to 3)]
Stemplot
[Figure: stemplot with stems 15 down to 8 on the left and leaves on the right; Key: 15 | 1 = 151]
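A stemplot of whole numbers can be built by splitting each value into a stem (all digits except the last) and a leaf (the last digit). This matches the key 15 | 1 = 151. The sketch below is a minimal illustration that assumes non-negative integers. It lists stems from largest to smallest, keeps empty stems, and uses a hypothetical helper `text_stemplot` with made-up data.

```python
from collections import defaultdict

def text_stemplot(values):
    """Stemplot for whole numbers: stem = all digits but the last, leaf = last digit.

    Stems run from largest (top) to smallest, and empty stems are kept,
    as in a hand-drawn stemplot. Assumes non-negative integers.
    """
    leaves = defaultdict(list)
    for v in sorted(values):          # sorting first puts each stem's leaves in order
        stem, leaf = divmod(v, 10)    # e.g. 151 -> stem 15, leaf 1
        leaves[stem].append(str(leaf))
    top, bottom = max(leaves), min(leaves)
    return "\n".join(
        f"{stem:>3} | {''.join(leaves[stem])}" for stem in range(top, bottom - 1, -1)
    )

# Hypothetical data, drawn with the key 15 | 1 = 151
print(text_stemplot([151, 132, 136, 124, 125, 127, 129, 88]))
```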
Histogram
Note that the bars touch and the variable is quantitative.
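The bar heights of a histogram are class frequencies. The sketch below is a minimal illustration, assuming equal-width classes of the form [start, start + width). Each class begins where the previous one ends, which is why the bars touch. The helper `class_counts` and the scores are hypothetical.

```python
def class_counts(values, start, width, n_classes):
    """Frequencies for equal-width classes [start, start+width), [start+width, start+2*width), ...

    Adjacent classes share an endpoint, which is why histogram bars touch.
    """
    counts = [0] * n_classes
    for v in values:
        k = int((v - start) // width)  # index of the class containing v
        if not 0 <= k < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[k] += 1
    return counts

# Hypothetical quantitative data, classes of width 10 starting at 50.
scores = [52, 58, 61, 64, 67, 69, 73, 75, 78, 84, 91]
counts = class_counts(scores, start=50, width=10, n_classes=5)
for i, c in enumerate(counts):
    low = 50 + 10 * i
    print(f"[{low}, {low + 10})  {'#' * c}")  # one '#' per observation
```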
Cumulative Frequency Plot
[Figure: "Typical Wait Times"; horizontal axis: Wait Times (in Hrs.); vertical axis: Cum. Freq. (%)]
• Often used for estimating medians, quartiles, & percentiles
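Reading a median or quartile off an ogive means finding where the cumulative percent crosses 50% (or 25%, 75%, and so on) and reading down to the horizontal axis. The sketch below is a minimal illustration. It assumes the ogive is drawn by connecting straight-line segments between class boundaries, so the estimate is a linear interpolation. The helpers and the wait-time frequencies are hypothetical.

```python
from itertools import accumulate

def cumulative_percents(freqs):
    """Cumulative relative frequency (%) at the upper boundary of each class."""
    total = sum(freqs)
    return [100 * c / total for c in accumulate(freqs)]

def read_ogive(boundaries, cum_pcts, p):
    """Estimate the p-th percentile by linear interpolation along the ogive.

    boundaries starts with the lower edge of the first class, so
    len(boundaries) == len(cum_pcts) + 1; the ogive starts at 0% there.
    """
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ys = [0.0] + list(cum_pcts)
    for i in range(1, len(ys)):
        if ys[i] >= p:  # the ogive reaches p inside class i
            frac = (p - ys[i - 1]) / (ys[i] - ys[i - 1])
            return boundaries[i - 1] + frac * (boundaries[i] - boundaries[i - 1])
    raise ValueError("cumulative percents never reach p")

# Hypothetical wait-time frequencies for classes 0-1, 1-2, 2-3, 3-4 hours.
freqs = [10, 20, 15, 5]
cum = cumulative_percents(freqs)                 # [20.0, 60.0, 90.0, 100.0]
median_est = read_ogive([0, 1, 2, 3, 4], cum, 50)  # 1.75 hours
q1_est = read_ogive([0, 1, 2, 3, 4], cum, 25)      # 1.125 hours
```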
Boxplot
[Figure: boxplot with Min, Q1, Med, Q3, and Max labeled]
• Based on the 5-number summary
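The 5-number summary behind a boxplot can be computed as in the sketch below. The sketch assumes a common hand-calculation convention, which is also the one usually associated with the TI-83/84: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, and the overall median is left out of both halves when n is odd. Other software may use a different quartile rule and give slightly different values. The helper name `five_number_summary` is hypothetical. `statistics.median` is from Python's standard library.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted data;
    when n is odd, the overall median is left out of both halves. Needs n >= 2.
    """
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]          # values below the middle position
    upper = xs[(n + 1) // 2 :]    # values above the middle position
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 30]))  # (1, 2.5, 5, 7.5, 30)
```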
Shapes of Boxplots
• The previous boxplot was symmetric.
• [Figure: boxplot skewed left]
• [Figure: boxplot skewed right]
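Checking for Outliers
An outlier is any value that is either
• greater than Q3 + 1.5*IQR
OR
• less than Q1 – 1.5*IQR,
where IQR = Q3 – Q1.
Note that whiskers always end at a data value. On a modified boxplot, each whisker extends to the most extreme observation that is not an outlier.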
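The 1.5*IQR rule can be checked directly once Q1 and Q3 are known. The sketch below is a minimal illustration with hypothetical data and a hypothetical helper `outlier_check`. Here Q1 = 2.5 and Q3 = 7.5 were worked out by hand as the medians of the lower and upper halves. The sketch also reports where the whiskers of a modified boxplot would end, which is always at an actual data value.

```python
def outlier_check(data, q1, q3):
    """Apply the 1.5*IQR rule and report where modified-boxplot whiskers end."""
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    kept = [x for x in data if low_fence <= x <= high_fence]
    # Whiskers stop at the most extreme values that are NOT outliers,
    # so each whisker ends at an actual observation, not at a fence.
    return {"fences": (low_fence, high_fence), "outliers": outliers,
            "whiskers": (min(kept), max(kept))}

# Hypothetical data; Q1 = 2.5 and Q3 = 7.5 are the medians of its lower and upper halves.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
result = outlier_check(data, q1=2.5, q3=7.5)
print(result)  # fences (-5.0, 15.0); 30 is an outlier; whiskers end at 1 and 8
```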
What Is Required on ALL Plots?
• Title
• Labels on the horizontal and vertical axes
- If you use 3 to represent 3,000, be sure that information is in the label.
• Scales on both axes (sometimes this is not needed, for example on boxplots)
• Labels for each plot if the graph includes multiple data sets (e.g., parallel boxplots)
How to Describe the Graphs
Use your SOCS:
o Shape
o Outliers and/or other unusual features
o Center
o Spread
Discuss all characteristics IN CONTEXT.
Shape
• Four Basic Shapes:
• Symmetric
• Uniform
• Skewed left or skewed toward small values
• Skewed right or skewed toward large values
Should I Say Normal?
Be careful when you describe the shape of
a mound-shaped, approximately
symmetric distribution. The distribution
may or may not be normal. Graders will
accept the description as approximately
normal, but they will not accept that the
distribution is normal based only on a
mound-shaped, symmetric graph.
Outliers and other Unusual Features
The Usual Unusuals:
• Gaps
• Clusters
• Outliers
• Peaks (e.g., bimodal)
Center
• Mean and median are both measures of
center
• Median – put the values in order and the
median is the middle value (or the mean of
the two middle values) – the median
divides a histogram into two equal areas
• Mean – add the values and divide by the
number of values you have – the mean is
the balance point for a histogram
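The two recipes above translate directly into code. The sketch below uses hypothetical helper names `mean_of` and `median_of` and made-up data. Python's standard `statistics.mean` and `statistics.median` compute the same things. The last line checks the balance-point idea: the deviations from the mean always add to zero.

```python
def mean_of(values):
    """Add the values and divide by how many there are."""
    return sum(values) / len(values)

def median_of(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

data = [9, 2, 4]                         # hypothetical
m = mean_of(data)                        # 15 / 3 = 5.0
med = median_of(data)                    # sorted [2, 4, 9] -> 4
balance = sum(x - m for x in data)       # -3 + -1 + 4 = 0: the mean is the balance point
```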
Spread
Several ways to describe:
• Range – calculate max - min; the range
gives you the total spread in the data.
• IQR – calculate Q3 – Q1; IQR gives you
the spread of the middle 50% of the data
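• Standard deviation – roughly the typical distance of the data values from the mean; it is the square root of the sum of squared deviations from the mean divided by n – 1 (for a sample)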
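All three measures of spread can be computed in a few lines. The sketch below is a minimal illustration. It assumes the medians-of-halves quartile convention (middle value left out when n is odd) and the sample standard deviation Sx, which divides by n – 1. The helper `spread_measures` and the data are hypothetical.

```python
from math import sqrt
from statistics import median

def spread_measures(data):
    """Range, IQR, and sample standard deviation Sx."""
    xs = sorted(data)
    n = len(xs)
    q1 = median(xs[: n // 2])          # median of the lower half
    q3 = median(xs[(n + 1) // 2 :])    # median of the upper half (middle value left out if n is odd)
    xbar = sum(xs) / n
    sx = sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))  # divide by n - 1 for a sample
    return {"range": xs[-1] - xs[0], "IQR": q3 - q1, "Sx": sx}

s = spread_measures([2, 4, 4, 4, 5, 5, 7, 9])
print(s)  # range 7, IQR 2.0, Sx about 2.14
```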
How Does the Shape Affect the Mean and Median?
• If the shape is approximately symmetric, the mean and median are approximately equal.
• If the shape is skewed, the mean is pulled farther toward the tail than the median.
Ex. Salaries – the mean is typically larger than the median because salary distributions are usually skewed right.
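The salary example can be checked with a few made-up numbers. In the sketch below, one very large salary creates a long right tail. That tail pulls the mean well above the median, while the median stays with the bulk of the data. The salaries are hypothetical; `statistics.mean` and `statistics.median` are from Python's standard library.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars; one very large salary
# makes the distribution skewed right.
salaries = [40, 42, 45, 48, 50, 55, 60, 250]
print(mean(salaries), median(salaries))  # 73.75 49.0
```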
The Converse May Not Be True
Be careful –
If the mean is not equal to the median, you
cannot conclude automatically that the
shape is skewed.
Comparing Graphs Means to Compare –
not just list characteristics
• Okay to say
o The mean of x (8) is less than the mean of y (9).
o The medians of x and y are about the same.
o The median of x is slightly larger.
o The shapes are both skewed left.
• Not Okay
o The mean of x is 8 and the mean of y is 9.
o Median x = 4, median y = 4.
o The shapes are similar.
When Do You Use X-Bar/Sx and When Do You Use the 5-Number Summary?
• If the distribution is symmetric, use mean and
standard deviation.
• If the distribution is skewed, use the 5-number
summary.
• Note that the mean and standard deviation are
not resistant to outliers; the median and IQR are
resistant.
Other Key Locations on Distributions
• Percentile – the nth percentile is the smallest value x for which at least n percent of the data values are ≤ x
ex. If the 80th percentile is 28, then 80% of the data values are 28 or less.
• Quartiles – the 25th, 50th, 75th percentiles.
The 25th percentile is the lower or first quartile
Q1, the 50th percentile is the median, the 75th
percentile is the upper or third quartile Q3.
• Z-score – shows how many standard deviations a value is above or below the mean: z = (x – mean) / standard deviation
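Both definitions above can be written as short functions. The sketch below follows the slide's percentile definition: it returns the smallest data value x such that at least p% of the values are ≤ x. It assumes p is a whole number from 1 to 100. The z-score simply follows the formula z = (x – mean) / standard deviation. The function names and the data are hypothetical.

```python
def percentile(data, p):
    """Smallest data value x such that at least p% of the values are <= x (p in 1..100)."""
    if not 1 <= p <= 100:
        raise ValueError("p must be a whole number from 1 to 100")
    xs = sorted(data)
    k = (p * len(xs) + 99) // 100   # ceil(p * n / 100): how many values must be <= x
    return xs[k - 1]

def z_score(x, mean, sd):
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical data with n = 10.
data = [12, 15, 18, 20, 22, 25, 26, 28, 30, 34]
p80 = percentile(data, 80)     # 28: 8 of the 10 values are 28 or less
z = z_score(9, mean=5, sd=2)   # 2.0: two standard deviations above the mean
```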
How do I get the summary values?
• You can calculate most of the summary values
using 1-Var Stats.
• On the calculator, enter:
1-Var Stats L1 or 1-Var Stats L1, L2
In the second form, the data values are in L1 and their frequencies are in L2.
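For practice away from the calculator, a rough Python analogue of 1-Var Stats with an optional frequency list can be sketched as below. This is not TI's implementation. The helper `one_var_stats` and its key names are hypothetical and only mirror some of the calculator's labels. It assumes whole-number frequencies and the medians-of-halves quartile convention. The L1/L2 data are made up.

```python
from math import sqrt
from statistics import median

def one_var_stats(values, freqs=None):
    """Rough analogue of 1-Var Stats L1 (freqs=None) or 1-Var Stats L1, L2.

    freqs must be whole-number counts; each value is repeated freqs[i] times.
    Quartiles are medians of the lower/upper halves (middle value left out if n is odd).
    """
    if freqs is None:
        freqs = [1] * len(values)
    xs = sorted(v for v, f in zip(values, freqs) for _ in range(f))  # expand by frequency
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)  # sum of squared deviations
    return {
        "xbar": xbar,
        "Sx": sqrt(ss / (n - 1)),     # sample standard deviation
        "sigma_x": sqrt(ss / n),      # population standard deviation
        "n": n,
        "minX": xs[0],
        "Q1": median(xs[: n // 2]),
        "Med": median(xs),
        "Q3": median(xs[(n + 1) // 2 :]),
        "maxX": xs[-1],
    }

# Hypothetical data: values in "L1", frequencies in "L2".
L1 = [0, 1, 2, 3]
L2 = [3, 4, 2, 1]
stats = one_var_stats(L1, L2)
```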
Categorical Data Displays
Frequency Tables
Grades Earned on Test 1
Grade   Frequency
A       10
B       15
C       5
D       2
F       1
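The counts in the frequency table can be turned into relative frequencies (percent of all 33 grades), and a row of `#` marks gives a quick text version of a bar chart. The sketch below uses the counts from the table. The helper name `relative_frequencies` is hypothetical.

```python
grades = {"A": 10, "B": 15, "C": 5, "D": 2, "F": 1}  # counts from the frequency table

def relative_frequencies(table):
    """Convert category counts to percentages of the total."""
    total = sum(table.values())
    return {k: 100 * v / total for k, v in table.items()}

rel = relative_frequencies(grades)
for grade, count in grades.items():
    # One '#' per student gives a quick text bar chart.
    print(f"{grade}  {count:>2}  {rel[grade]:5.1f}%  {'#' * count}")
```

The rounded percentages may not add to exactly 100% because of rounding.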
Bar Chart
Segmented Bar Chart
[Figure: Hobbies By Gender]
Two Way Tables
Favorite Leisure Activities
         Dance   Sports   TV   Total
Men        2       10      8     20
Women     16        6      8     30
Total     18       16     16     50
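Comparing men and women in the two-way table works best with conditional distributions: the percent of each row falling in each column. Each bar of a segmented bar chart typically shows one such row distribution. The sketch below uses the counts from the table. The helpers `row_percents` and `column_totals` are hypothetical names.

```python
# Counts from the two-way table (rows: gender, columns: favorite activity).
table = {
    "Men":   {"Dance": 2,  "Sports": 10, "TV": 8},
    "Women": {"Dance": 16, "Sports": 6,  "TV": 8},
}

def row_percents(table):
    """Conditional distribution of the column variable within each row (in %)."""
    return {
        row: {col: 100 * c / sum(counts.values()) for col, c in counts.items()}
        for row, counts in table.items()
    }

def column_totals(table):
    """Marginal totals for each column."""
    totals = {}
    for counts in table.values():
        for col, c in counts.items():
            totals[col] = totals.get(col, 0) + c
    return totals

cond = row_percents(table)
# Men:   Dance 10%, Sports 50%, TV 40%
# Women: Dance about 53.3%, Sports 20%, TV about 26.7%
```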
One Other Graph – The Pie Chart
Sorry – couldn’t resist
GOOD LUCK ON THE EXAM!!!