Lecture Ch 7
Chapter 7
Summarizing and Displaying Measurement Data
Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc.
Turning Data Into
Meaningful Information
Data are the statisticians’ raw material: the numbers we use to interpret reality.
All statistical problems involve the collection, description, or analysis of data.
How can we represent data in a meaningful
way… how can we see underlying patterns
in a heap of numbers?
Picturing Data: Stemplots, Frequency Tables & Histograms
Stemplot: quick and easy way to order numbers and get a picture of shape.
Stemplot for Exam Scores
3|2
4|
5|5
6|012448
7|35568899
8|0023458
9|02358
Example: 3|2 = 32
Histogram: better for larger data sets; also provides a picture of shape.
[Figure: histogram of exam scores. Horizontal axis: Exam Scores, 30 to 100. Vertical axis: Frequency, 0 to 10.]
Creating a Stemplot
Step 1: Create the Stems
Divide the range of the data into equal units to be used on the stem.
Aim for 6 to 15 stem values, representing equally spaced intervals.
Step 1: Creating the stem
3|
4|
5|
6|
7|
8|
9|
Ordered Listing of 28 Exam Scores
32, 55, 60, 61, 62, 64, 64, 68, 73, 75, 75, 76, 78, 78, 79, 79, 80, 80, 82, 83,
84, 85, 88, 90, 92, 93, 95, 98
Creating a Stemplot
Step 2: Attach the Leaves
Attach a leaf to represent each data point. The next digit in the number is used as the leaf; drop any remaining digits.
Step 3: Order the leaves on each branch.
Step 2: Attaching leaves
3|
4|
5|
6|0
7|5
8|
9|35
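The three steps above can be sketched in Python. This is a minimal illustration rather than the textbook's procedure: the function name `stemplot` is made up here, and it assumes two-digit whole numbers so that the tens digit is the stem and the ones digit is the leaf.

```python
from collections import defaultdict


def stemplot(values):
    """Return stemplot lines for two-digit integers (hypothetical helper).

    Assumption: tens digit = stem, ones digit = leaf.
    """
    leaves = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)      # Step 2: next digit becomes the leaf
        leaves[stem].append(leaf)
    lines = []
    # Step 1: one row per stem, including stems with no leaves (e.g. 4|)
    for stem in range(min(leaves), max(leaves) + 1):
        # Step 3: order the leaves on each branch
        row = "".join(str(leaf) for leaf in sorted(leaves.get(stem, [])))
        lines.append(f"{stem}|{row}")
    return lines


scores = [32, 55, 60, 61, 62, 64, 64, 68, 73, 75, 75, 76, 78, 78, 79, 79,
          80, 80, 82, 83, 84, 85, 88, 90, 92, 93, 95, 98]

for line in stemplot(scores):
    print(line)
```

Run on the 28 exam scores, this prints the same rows as the Stemplot for Exam Scores, including the empty 4| stem.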
Further Details for Creating Stemplots
Splitting Stems: reuse each stem digit two or five times.
Two times:
1st stem = leaves 0 to 4
2nd stem = leaves 5 to 9
Five times:
1st stem = leaves 0 and 1
2nd stem = leaves 2 and 3, etc.
Stemplot A (each stem used two times):
5|4
5|789
6|023344
6|55567789
7|00124
7|58
Stemplot B (each stem used five times):
5|4
5|7
5|89
6|0
6|233
6|44555
6|677
6|89
7|001
7|2
7|45
7|
7|8
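Splitting stems can be sketched as follows. The function name `split_stemplot` and its `splits` parameter are hypothetical, the data list is read back from the leaves of Stemplot A, and two-digit whole numbers are assumed.

```python
def split_stemplot(values, splits=2):
    """Stemplot lines with each stem reused `splits` times (2 or 5).

    Hypothetical helper; assumes two-digit integers.
    """
    if splits not in (2, 5):
        raise ValueError("splits must be 2 or 5")
    width = 10 // splits                 # leaf digits per row: 5 or 2
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows.setdefault((stem, leaf // width), []).append(leaf)
    first, last = min(rows), max(rows)
    lines = []
    stem, part = first
    while (stem, part) <= last:          # walk every row, even empty ones
        leaves = "".join(str(leaf) for leaf in rows.get((stem, part), []))
        lines.append(f"{stem}|{leaves}")
        part += 1
        if part == splits:               # move on to the next stem digit
            stem, part = stem + 1, 0
    return lines


# Values recovered from the leaves of Stemplot A
pulse = [54, 57, 58, 59, 60, 62, 63, 63, 64, 64, 65, 65, 65, 66, 67, 67,
         68, 69, 70, 70, 71, 72, 74, 75, 78]

print("\n".join(split_stemplot(pulse, splits=2)))
print()
print("\n".join(split_stemplot(pulse, splits=5)))
```

With `splits=2` the rows match Stemplot A; with `splits=5` they match Stemplot B, including the empty 7| row.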
Obtaining Info from the Stemplot
Determine shape, identify outliers, locate center.
Pulse Rates:
5|4
5|789
6|023344
6|55567789
7|00124
7|58
Bell-shaped, centered in the mid 60s, no outliers.
Exam Scores:
3|2
4|
5|5
6|024418
7|56598398
8|5430820
9|53208
Outlier of 32. Apart from 55, the rest are roughly uniform from the 60s to the 90s.
Median Incomes:
4|66789
5|11344
5|56666688899999
6|011112334
6|556666789
7|01223
7|
8|0022
Wide range with 4 unusually high values. The rest are bell-shaped around the high $50,000s.
Creating Frequency Table
• Divide the range of the data into intervals.
• Count how many values fall into each interval; this count is called the frequency.
• Also find the relative frequency by dividing each interval's frequency by the total number of observations.
Interval   Frequency   Relative Frequency
30-39      1           .0357
40-49      0           0
50-59      1           .0357
60-69      6           .2143
70-79      8           .2857
80-89      7           .25
90-99      5           .1786
Total:     28          1
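The frequency and relative frequency columns can be reproduced with a short Python sketch. The function name `frequency_table` is an illustrative choice, and the interval labels assume whole-number data.

```python
def frequency_table(values, start, stop, width):
    """Return (interval label, frequency, relative frequency) rows.

    Hypothetical helper; intervals are start..start+width-1, and so on,
    which assumes integer data.
    """
    n = len(values)
    rows = []
    for low in range(start, stop, width):
        high = low + width - 1
        count = sum(low <= v <= high for v in values)   # frequency
        rows.append((f"{low}-{high}", count, count / n))  # relative frequency
    return rows


scores = [32, 55, 60, 61, 62, 64, 64, 68, 73, 75, 75, 76, 78, 78, 79, 79,
          80, 80, 82, 83, 84, 85, 88, 90, 92, 93, 95, 98]

table = frequency_table(scores, 30, 100, 10)
for label, freq, rel in table:
    print(f"{label:<8}{freq:>3}   {rel:.4f}")
```

The frequencies sum to 28 and the relative frequencies sum to 1, as in the Total row.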
Creating a Histogram
Create a bar that covers each interval and is centered at the midpoint of that interval. The bar's height is the frequency or relative frequency of the interval.
[Figure: two histograms of exam scores, one with Frequency and one with Relative Freq. on the vertical axis. Horizontal axis: Exam Scores, 30 to 100.]
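As a rough stand-in for the plotted histogram, the bars can be drawn as text, one row per interval, with the bar running sideways. This is only a sketch: `text_histogram` is a made-up name, and the bar length here is the frequency.

```python
def text_histogram(values, start, stop, width):
    """Return one text bar per interval; bar length = frequency.

    Hypothetical helper; each interval covers low <= value < low + width,
    and the labels assume integer data.
    """
    lines = []
    for low in range(start, stop, width):
        count = sum(low <= v < low + width for v in values)
        lines.append(f"{low}-{low + width - 1} | " + "#" * count)
    return lines


scores = [32, 55, 60, 61, 62, 64, 64, 68, 73, 75, 75, 76, 78, 78, 79, 79,
          80, 80, 82, 83, 84, 85, 88, 90, 92, 93, 95, 98]

bars = text_histogram(scores, 30, 100, 10)
print("\n".join(bars))
```

Dividing each count by the number of observations before drawing would give the relative frequency version.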
Forming Intervals
• Use intervals of equal length, with midpoints and endpoints at convenient round numbers.
• For a smaller data set, use a small number of intervals.
• For a larger data set, use more intervals.
• By increasing the number of intervals, we can “stretch out” the shape of the histogram.
Example 4: How Much Do Students Exercise?
How many hours do you exercise per week (nearest ½ hr)?
172 responses from students in an intro statistics class.
Most range from 0 to 10 hours, with a mode of 2 hours.
Responses trail out to 30 hours a week.
Defining a Common Language
about Shape
• Symmetric: if you draw a line through the center, the picture on one side would be the mirror image of the picture on the other side.
Example: bell-shaped data set.
• Skewed to the Right: higher values are more spread out than lower values.
• Skewed to the Left: lower values are more spread out and higher ones tend to be clumped.
[Figure: three histograms labeled Symmetric, Skewed Right, and Skewed Left. Vertical axis: Frequency.]
Summary Statistics
What are statistics?
These are simple numerical measurements that
summarize and hopefully characterize the entire
data set.
Any set of measurements has two important properties:
• The central or typical value
• The spread (or variability) of the data about the central value
[Figure: two histograms, each with its Center marked; one shows a Narrow Spread, the other a Wide Spread.]
Centrality Statistics
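Estimate the mean to be where the “balance point” of the histogram would be.
Mean = 75 (estimated from the histogram)
[Figure: histogram of exam scores. Horizontal axis: Exam Scores, 30 to 100. Vertical axis: Frequency, 0 to 10.]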
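The balance-point estimate can be compared with the arithmetic mean computed directly from the 28 scores. A minimal Python check, with variable names chosen only for illustration:

```python
scores = [32, 55, 60, 61, 62, 64, 64, 68, 73, 75, 75, 76, 78, 78, 79, 79,
          80, 80, 82, 83, 84, 85, 88, 90, 92, 93, 95, 98]

total = sum(scores)            # add up all the values
mean = total / len(scores)     # divide by the number of values

print(total, len(scores), round(mean, 2))  # 2129 28 76.04
```

The computed mean is about 76, close to the value of 75 read off the histogram.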
Centrality Statistics
The median is the midpoint of the data. It is obtained by ordering the data from smallest to largest and finding the middle value. If the number of data points is even, so that there is no single middle value, we average the two values around the middle.
Ex. For our example n = 28, so we average the 14th and 15th values.
Ordered Listing of 28 Exam Scores (14th value = 78, 15th value = 79)
32, 55, 60, 61, 62, 64, 64, 68, 73, 75, 75, 76, 78, 78, 79, 79, 80, 80, 82, 83,
84, 85, 88, 90, 92, 93, 95, 98
Median = (78 + 79) / 2 = 78.5
Estimate the median from the histogram by dividing the graph into equal boxes, one per observation.
Total number of boxes = 28, so the median occurs at the 14th / 15th box.
Estimated from the histogram: median = 77
[Figure: histogram of exam scores divided into boxes. Horizontal axis: Exam Scores, 30 to 100. Vertical axis: Frequency, 0 to 10.]
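The median rule (middle value, or the average of the two middle values when n is even) can be sketched in Python. The function below is an illustrative implementation; the standard library's `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle
    values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


scores = [32, 55, 60, 61, 62, 64, 64, 68, 73, 75, 75, 76, 78, 78, 79, 79,
          80, 80, 82, 83, 84, 85, 88, 90, 92, 93, 95, 98]

print(median(scores))  # 78.5, the average of the 14th and 15th values
```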
Centrality Statistics
The mode is the most common value in the data set.
Ex. In our example, since the scores 64, 75, 78, 79, and 80 all occur twice, these are all considered modes.
Estimate from Histogram
Estimate the mode to be the midpoint of the interval with the highest bar.
Mode = 75
[Figure: histogram of exam scores. Horizontal axis: Exam Scores, 30 to 100. Vertical axis: Frequency, 0 to 10.]
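Finding every value that ties for most common can be sketched with `collections.Counter`. The helper name `modes` is an illustrative choice.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest count, in sorted order."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


scores = [32, 55, 60, 61, 62, 64, 64, 68, 73, 75, 75, 76, 78, 78, 79, 79,
          80, 80, 82, 83, 84, 85, 88, 90, 92, 93, 95, 98]

print(modes(scores))  # [64, 75, 78, 79, 80]
```

Because five scores tie at two occurrences each, the exact mode is not very informative here, which is one reason to estimate it from the histogram instead.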
Centrality Statistics
Comparing Mean and Median
The Mean is sensitive to outliers, which
are extreme values that are not typical of
the rest of the data.
How do the Mean and Median change as the shape of the histogram changes?
[Figure: three histograms labeled Skewed Left, Skewed Right, and Symmetric. Vertical axis: Frequency.]
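One way to see the mean's sensitivity is to make the outlier more extreme and recompute both statistics. In the sketch below, replacing the score of 32 with 2 is a hypothetical change made only for illustration; it is not part of the exam data.

```python
from statistics import median

scores = [32, 55, 60, 61, 62, 64, 64, 68, 73, 75, 75, 76, 78, 78, 79, 79,
          80, 80, 82, 83, 84, 85, 88, 90, 92, 93, 95, 98]

# Hypothetical data set: the outlier 32 is pushed down to 2
extreme = [2 if s == 32 else s for s in scores]

mean_original = sum(scores) / len(scores)
mean_extreme = sum(extreme) / len(extreme)

print(round(mean_original, 2), median(scores))   # 76.04 78.5
print(round(mean_extreme, 2), median(extreme))   # 74.96 78.5
```

The mean drops by about one point while the median does not move, because the median depends only on the middle of the ordered list.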
Variability Statistics
The Range of the data is the distance between the maximum value
and the minimum value.
Range = Max Value – Min Value
= 98 - 32
= 66
The Lower and Upper Quartiles of the data are the medians of the lower half and the upper half of the data, respectively, when the data are divided at the median.
The Inter-Quartile Range (IQR) of the data is the distance
between the lower quartile and the upper quartile.
IQR = Upper Quartile – Lower Quartile
= 84.5 – 66
= 18.5
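The range, quartiles, and IQR above can be checked with a short Python sketch. The helper `quartiles` is hypothetical and follows the median-of-each-half definition; for an odd number of values it leaves the middle value out of both halves, which is one convention among several. Library routines may use other conventions and give slightly different quartiles.

```python
from statistics import median


def quartiles(values):
    """Lower and upper quartiles as the medians of the two halves.

    Hypothetical helper; needs at least two values. With an odd count the
    middle value is excluded from both halves (an assumed convention).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])


scores = [32, 55, 60, 61, 62, 64, 64, 68, 73, 75, 75, 76, 78, 78, 79, 79,
          80, 80, 82, 83, 84, 85, 88, 90, 92, 93, 95, 98]

data_range = max(scores) - min(scores)
lower_q, upper_q = quartiles(scores)
iqr = upper_q - lower_q

print(data_range, lower_q, upper_q, iqr)  # 66 66.0 84.5 18.5
```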
Ordered Listing of 28 Exam Scores
32, 55, 60, 61, 62, 64, 64, 68, 73, 75, 75, 76, 78, 78, 79, 79, 80, 80, 82, 83, 84, 85, 88, 90, 92, 93, 95, 98
Lower Quartile = 66, Median = 78.5, Upper Quartile = 84.5
Variability Statistics
The five-number summary display
Median                             78.5
Lower Quartile / Upper Quartile    66      84.5
Lowest / Highest                   32      98
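The five numbers can be gathered in one pass, as in this sketch. The function name `five_number_summary` is illustrative, and the quartiles are taken as the medians of the lower and upper halves of the ordered data.

```python
from statistics import median


def five_number_summary(values):
    """Return (lowest, lower quartile, median, upper quartile, highest).

    Hypothetical helper; quartiles are medians of the two halves and the
    input needs at least two values.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return (ordered[0],
            median(ordered[:half]),
            median(ordered),
            median(ordered[-half:]),
            ordered[-1])


scores = [32, 55, 60, 61, 62, 64, 64, 68, 73, 75, 75, 76, 78, 78, 79, 79,
          80, 80, 82, 83, 84, 85, 88, 90, 92, 93, 95, 98]

print(five_number_summary(scores))  # (32, 66.0, 78.5, 84.5, 98)
```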
Creating a Boxplot for Exam Scores
1. Draw a box from the lower quartile (66) to the upper quartile (84.5).
2. Draw a line in the box at the median of 78.5.
3. Compute IQR = 84.5 - 66 = 18.5.
4. Compute 1.5(IQR) = 1.5(18.5) = 27.75. An outlier is any value below 66 - 27.75 = 38.25 or above 84.5 + 27.75 = 112.25.
5. Draw a line from each end of the box, extending down to 55 and up to 98.
6. Draw an asterisk at the outlier of 32.
[Figure: Box Plot Exam Scores. Axis: 30 to 100.]
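The outlier cutoffs and whisker ends from the steps above can be computed as in this sketch. The function name `boxplot_parts` and the shape of its return value are illustrative choices; the quartiles are passed in as given on the slide.

```python
def boxplot_parts(values, lower_q, upper_q):
    """Return (low cutoff, high cutoff, outliers, whisker ends).

    Hypothetical helper applying the 1.5(IQR) rule.
    """
    iqr = upper_q - lower_q
    low_cut = lower_q - 1.5 * iqr
    high_cut = upper_q + 1.5 * iqr
    outliers = [v for v in values if v < low_cut or v > high_cut]
    inside = [v for v in values if low_cut <= v <= high_cut]
    # Whiskers stop at the most extreme values that are not outliers
    return low_cut, high_cut, outliers, (min(inside), max(inside))


scores = [32, 55, 60, 61, 62, 64, 64, 68, 73, 75, 75, 76, 78, 78, 79, 79,
          80, 80, 82, 83, 84, 85, 88, 90, 92, 93, 95, 98]

low_cut, high_cut, outliers, whiskers = boxplot_parts(scores, 66, 84.5)
print(low_cut, high_cut, outliers, whiskers)  # 38.25 112.25 [32] (55, 98)
```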
Interpreting Boxplots
• Divide the data into fourths.
• Easily identify outliers.
• Useful for comparing two or more groups.
Outlier: any value more than 1.5(IQR) beyond the closest quartile.
Box Plot Exam Scores
¼ of students scored between 32 and 66
¼ scored between 66 and 78.5
¼ scored between 78.5 and 84.5
¼ scored between 84.5 and 98
[Figure: boxplot of exam scores. Axis: 30 to 100.]
Example 6: Who Are Those Crazy Drivers?
What’s the fastest you have ever driven a car? ____ mph.
                  Males (87 Students)   Females (102 Students)
Median            110                   89
Lower Quartile    95                    80
Upper Quartile    120                   95
Lowest            55                    30
Highest           150                   130
• About 75% of men have driven 95 mph or faster,
but only about 25% of women have done so.
• Except for a few outliers (120 and 130), all women’s max speeds are close to or below the median speed for men.
The Standard Deviation and Variance
Consider two sets of numbers, both with mean of 100.
Numbers                    Mean   Standard Deviation
100, 100, 100, 100, 100    100    0
90, 90, 100, 110, 110      100    10
• First set of numbers has no spread or variability at all.
• Second set has some spread to it; on average, the
numbers are about 10 points away from the mean.
The standard deviation is roughly the average
distance of the observed values from their mean.
Computing the Standard Deviation
1. Find the mean.
2. Find the deviation of each value from the mean.
Deviation = value – mean.
3. Square the deviations.
4. Sum the squared deviations.
5. Divide the sum by (the number of values) – 1, resulting
in the variance.
6. Take the square root of the variance.
The result is the standard deviation.
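The six steps translate almost line for line into Python. This is a sketch with an illustrative function name, `sample_std`; it divides by (number of values) - 1, as in step 5, so it needs at least two values.

```python
import math


def sample_std(values):
    """Standard deviation following the six steps (hypothetical helper)."""
    n = len(values)
    mean = sum(values) / n                       # 1. find the mean
    deviations = [v - mean for v in values]      # 2. deviation = value - mean
    squared = [d ** 2 for d in deviations]       # 3. square the deviations
    total = sum(squared)                         # 4. sum the squared deviations
    variance = total / (n - 1)                   # 5. divide by n - 1 (variance)
    return math.sqrt(variance)                   # 6. square root of the variance


print(sample_std([100, 100, 100, 100, 100]))  # 0.0
print(sample_std([90, 90, 100, 110, 110]))    # 10.0
```

Both results agree with the two sets of numbers that share a mean of 100.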
Computing the Standard Deviation
Try it for the set of values: 90, 90, 100, 110, 110.
Value   Dev. From Mean     Dev. Squared
90      90 - 100 = -10     (-10)^2 = 100
90      90 - 100 = -10     (-10)^2 = 100
100     100 - 100 = 0      0^2 = 0
110     110 - 100 = 10     10^2 = 100
110     110 - 100 = 10     10^2 = 100
Total:                     400
Mean = 100
Variance = 400 / (5 - 1) = 100
Standard Dev. = square root of 100 = 10