Stem and Leaf
Chapter 1: Exploring Data
AP Stats, 2009-2010
Questionnaire
“Please take a few minutes to answer the
following questions. I am collecting data for
my doctoral dissertation, which is on
characteristics of American private school
students. After you complete the
questionnaire, please return it in the enclosed
SASE. Thank you for your participation.”
Definitions, pp. 4-6
Individuals and variables
Categorical and quantitative
On questionnaire, which are individuals and
variables?
Which variables are categorical and which are
quantitative?
Distribution
Practice
From Questionnaire:
#12 (dot plot)
#13 (pie chart)
#3 (bar chart)
Practice
Exercises 1.1, 1.2, 1.4, p. 7.
A really nice bar graph (I made this myself, so
just nod approvingly):
[Bar graph: Education Level in U.S. (adults age 25+)]
x-axis: Years of Schooling (No high school degree; High school only; 1-3 years of college; 4+ years of college)
y-axis: Percent of Total (0 to 50)
Values printed on the bars: 33.1, 25.4, 25.6, 15.9
Bar Graph, Figure 1.1 (p. 9)
Pie Chart, Figure 1.1 (p. 9)
Dot Plot, Figure 1.3 (p. 11)
Interpreting the Dotplot
Shape, center, spread
Look for overall patterns and striking deviations
from that pattern.
Outliers
Individual observations that fall outside the
overall pattern on a graph of a distribution.
In the next section, we will learn a mathematical
rule of thumb for deeming an observation an
outlier. For now, we’ll just talk in general terms.
Stemplots
Sometimes called “stem and leaf plots.”
Useful when there are a lot of data points, or
the range of values is large.
Dotplot?
What would a dotplot look like for these
data?
A stemplot (Figure 1.4, p. 14)
A stemplot (Figure 1.4, p. 14)—Split stems
How to create a stemplot
Example 1.5, p. 13
Rules of thumb:
Choosing the number of stems:
No magic number, but a minimum of 5 is good. Too few stems will
result in a skyscraper effect, too many make a pancake graph.
10 is a good starting point.
For data points with decimals, round the data so that the final digit
after rounding is suitable as a leaf.
Let’s try one:
Exercise 1.8, p. 17
Stemplot for 1.8 (StatCrunch)
Stem and Leaf
Variable: MPG
2 : 113444444
2 : 5556678888888888999
3 : 0002
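The split-stem display above can be rebuilt with a short script. This is only a sketch, not how StatCrunch does it: the function name `stemplot` and its `split` flag are made up for illustration, and the MPG values are read back from the leaves printed above, assuming the tens digit is the stem.

```python
def stemplot(values, split=False):
    """Return stemplot rows for non-negative whole numbers (tens digit = stem)."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        # With split stems, leaves 0-4 go on the first row and 5-9 on the second.
        half = 1 if (split and leaf >= 5) else 0
        rows.setdefault((stem, half), []).append(str(leaf))
    first, last = min(rows), max(rows)
    halves = (0, 1) if split else (0,)
    lines = []
    for stem in range(first[0], last[0] + 1):
        for half in halves:
            key = (stem, half)
            if first <= key <= last:  # keep empty rows between the extremes
                lines.append(f"{stem} : {''.join(rows.get(key, []))}")
    return lines


# MPG values rebuilt from the leaves in the StatCrunch output (an assumption:
# the original data list is not shown on the slide).
low_20s = [20 + int(d) for d in "113444444"]
high_20s = [20 + int(d) for d in "5556678888888888999"]
thirties = [30 + int(d) for d in "0002"]
mpg = low_20s + high_20s + thirties

for line in stemplot(mpg, split=True):
    print(line)
```

Calling `stemplot(mpg)` without `split=True` puts all the 20s on one row, which shows the skyscraper effect mentioned in the rules of thumb.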
Another Exercise
1.9, p. 17
Homework
Reading: Section 1.1 through p. 30.
Exercises: 1.10 and 1.11 (pp. 17-18)
Histograms
The most common way to display the
distribution of a quantitative variable.
How to make a histogram:
Example 1.6, p. 19
Read the interpretation of this graph, p.
20.
Choose between 6 and 15 classes (bars on
your graph)
Example 1.6, p. 19
Figure 1.7, p. 20
Notice y-axis: number
of values in a particular
class.
Notice the x-axis:
It is the variable of interest.
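The counting behind a histogram can be sketched in a few lines. The data and the helper name `class_counts` below are hypothetical, and the classes are assumed to have equal width and to include their left endpoint but not their right.

```python
def class_counts(data, start, width, n_classes):
    """Count how many values fall in each class [left, left + width)."""
    counts = [0] * n_classes
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < n_classes:  # values outside all classes are ignored
            counts[index] += 1
    return counts


# Hypothetical data, chosen only to illustrate the idea.
scores = [45, 52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 82, 85, 91]
start, width, n_classes = 40, 10, 6
counts = class_counts(scores, start, width, n_classes)

# Text version of the histogram: one row per class, one star per value.
for i, count in enumerate(counts):
    left = start + width * i
    print(f"{left} to <{left + width}: {'*' * count}")
```

The star counts play the role of the y-axis (number of values in each class), and the class labels play the role of the x-axis.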
Practice Problem
1.14, p. 23
How to make a histogram with your calculator
Technology Toolbox, p. 21
Just enter raw data in L1,
then construct a histogram.
Read this carefully tonight.
Another way:
Summarize the data by class. Put the midpoint of each class in L1 and the
number of values in that class in, say, L2.
Exercise 1.14, p. 23
Now, make a stemplot for these data.
Which do you prefer?
Homework
Reading: Through p. 34
Pay careful attention to Example 1.8, pp. 28-30:
how to create an ogive.
Exercise:
1.12 (p. 22)
Right Skew?
Practice Problems
Exercises:
1.13 (p. 23)
1.17 (p. 27)
Percentile
Would you rather score at the 70th or
95th percentile on the SATs?
If you scored at the 95th percentile,
what does that mean?
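One common reading of a percentile is the percent of observations at or below a given value. The sketch below uses that reading; the scores and the function name `percent_at_or_below` are hypothetical.

```python
def percent_at_or_below(scores, value):
    """Percent of observations that are less than or equal to value."""
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)


# Hypothetical scores for 20 test takers: 400, 420, ..., 780.
scores = list(range(400, 800, 20))

print(percent_at_or_below(scores, 760))  # 19 of the 20 scores are at or below 760
print(percent_at_or_below(scores, 660))  # 14 of the 20 scores are at or below 660
```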
Ogive
Probably my favorite word to say in statistics.
Let’s practice saying it …
Used when we would like to see the relative
standing of an individual observation.
Does a histogram give us this?
Example 1.8, pp. 28-30
Look at the table on p. 29.
The two columns on the far left could be used to create a
histogram.
The fourth and fifth columns are of particular importance when we
want to construct an ogive.
Look over these briefly to see that you know where these data come
from.
Steps 2 and 3:
3: Plot a point corresponding to the relative cumulative frequency
in each class interval at the left endpoint of the next class interval.
Example 1.8, pp. 28-30
Practice problem
Exercise 1.19, p. 31
Create a frequency table.
Then, create cumulative frequency and
relative cumulative frequency columns.
Exercise 1.19, p. 31 (Data on p. 23)
Class      Freq.  Cum. Freq.  Rel. Cum. Freq. (%)
0-2.99
3-5.99
6-8.99
9-11.99
12-14.99
15-17.99
18-20.99
Exercise 1.19, p. 31
Class      Freq.  Cum. Freq.  Rel. Cum. Freq. (%)
3-5.99     1
6-8.99     1
9-11.99    12
12-14.99   31
15-17.99   4
18-20.99   1
Exercise 1.19, p. 31
Class      Freq.  Cum. Freq.  Rel. Cum. Freq. (%)
0-2.99     0      0           0
3-5.99     1      1           2
6-8.99     1      2           4
9-11.99    12     14          28
12-14.99   31     45          90
15-17.99   4      49          98
18-20.99   1      50          100
Ogive for Exercise 1.19
[Figure: ogive. x-axis: % of people 65 or older (0 to 21, in steps of 3); y-axis: Relative Cum. Frequency (0 to 100)]
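The cumulative columns and the ogive points for Exercise 1.19 can be checked with a short script. This is a sketch: the frequencies are copied from the table above, the variable names are made up, and the points follow the rule of plotting each class's relative cumulative frequency at the left endpoint of the next class.

```python
from itertools import accumulate

# Class left endpoints and frequencies copied from the Exercise 1.19 table.
left_endpoints = [0, 3, 6, 9, 12, 15, 18]
width = 3
freq = [0, 1, 1, 12, 31, 4, 1]

# Running total of the frequencies, then each total as a percent of all values.
cum_freq = list(accumulate(freq))
total = cum_freq[-1]
rel_cum_pct = [100 * c / total for c in cum_freq]

# Each point sits at the left endpoint of the NEXT class (left + width).
ogive_points = [(left + width, pct) for left, pct in zip(left_endpoints, rel_cum_pct)]

for left, f, c, pct in zip(left_endpoints, freq, cum_freq, rel_cum_pct):
    print(f"{left}-{left + width - 0.01:.2f}  {f:>3}  {c:>3}  {pct:>5.0f}")
print(ogive_points)
```

Connecting the points in `ogive_points` with line segments gives the ogive.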
Time plots
Used to plot the value of a variable vs.
the time it was measured.
Can detect seasonal variation, for instance
(See Figure 1.15, p. 32)
Used effectively in designed experiments.
Practice problem: Exercise 1.21, p. 33
Use your calculator—the line graph
function.
Figure 1.15 p. 32 (Example 1.9)
Time plot for Exercise 1.21
Homework
Reading: pp. 37-47
Exercise:
1.29, p. 36
1.1 Quiz on Friday
Probably 20-30 minutes
Section 1.1 Review Problems:
Displaying Distributions with Graphs
Exercises, pp. 34-36:
1.23, 1.24, 1.27, 1.28, 1.30
1.2 Describing Distributions with Numbers
Measuring center
Measuring spread
Mean, median
With quartiles: Inter-quartile range
Standard deviation
Range
Statistical summaries
Boxplots
Measuring the Center of a Distribution
Mean
Numerical average
Median
Middle value in a data set, if an odd number of
values, or the average of the middle two values, if
an even number of values
Splits the distribution exactly in half
Practice
Exercise 1.14, p. 23
Create a histogram using your calculator
Discuss shape, center, and spread of the
distribution.
Calculate 1-variable statistics using your
calculator.
Discuss difference between the mean and
median.
Resistant Measure
The mean cannot resist the influence of
extreme observations. The median can,
however.
Mean: not resistant
Median: Resistant
We generally prefer to use the median when
dealing with skewed distributions.
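A small made-up example shows what "resistant" means in practice. The two data sets below are hypothetical and differ only in their largest value; the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median

# Hypothetical data sets: identical except for one extreme value.
typical = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]

# The mean is pulled toward the extreme value; the median does not move.
print("mean:  ", mean(typical), "->", mean(with_outlier))
print("median:", median(typical), "->", median(with_outlier))

# With an even number of values, the median is the average of the middle two.
print(median([1, 2, 3, 4]))
```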
Powerful Numerical Summary
The 5-number summary …
min
Q1
Median (Q2)
Q3
Max
… Plus mean and standard deviation
Review HW problem
Exercise 1.36, p. 47
The Boxplot
Let’s use the data from Exercise 1.36 to create a
boxplot:
Now, use your calculator to create a modified
boxplot (box on page 46).
Outliers
Comparing Distributions
Draw side-by-side boxplots to compare
distributions.
Exercise 1.36, p. 47
Exercise 1.36, p. 47 (Statcrunch)
Column  n   Mean       Variance  Std. Dev.  Median  Range  Min  Max  Q1   Q3
Women   18  141.05556  698.8791  26.436321  138.5   99     101  200  126  154
Men     20  121.25     1079.25   32.85194   114.5   117    70   187  98   143
Measuring Spread
Calculating the central tendency of a
distribution is only half the story. We also
need to consider the spread.
Quartiles and IQR
The first quartile, Q1, is the median of
the lower half of the ordered data.
The third quartile, Q3, is the median of
the upper half of the ordered data.
Inter-Quartile Range:
IQR=Q3 - Q1
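The quartile definitions can be written out as a short function. This is a sketch under one assumption: when the number of values is odd, the overall median is left out of both halves. Calculators and software sometimes use other conventions and may give slightly different quartiles. The data and the name `quartiles` are hypothetical.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q3) as medians of the lower and upper halves of the sorted data.

    Assumes at least two values. With an odd count, the middle value
    is excluded from both halves.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(upper)


# Hypothetical data, for illustration only.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]
q1, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q3, iqr)
```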
Outliers
Before, we said something is an outlier if it
looked to be one.
Now, we can use the IQR to create a
mathematical rule:
$x_i < Q_1 - 1.5 \times IQR$ ?
$x_i > Q_3 + 1.5 \times IQR$ ?
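The 1.5 × IQR rule can be tried on the StatCrunch summary for Exercise 1.36. In this sketch the min, Q1, Q3 and max values are copied from that summary; the function name `outlier_fences` is made up. Only the minimum and maximum are checked, because the full data list is not on the slide.

```python
def outlier_fences(q1, q3):
    """Return the lower and upper cutoffs of the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Summary values copied from the StatCrunch output for Exercise 1.36.
summaries = {
    "Women": {"min": 101, "q1": 126, "q3": 154, "max": 200},
    "Men": {"min": 70, "q1": 98, "q3": 143, "max": 187},
}

results = {}
for group, s in summaries.items():
    low, high = outlier_fences(s["q1"], s["q3"])
    # Is the smallest value below the lower fence? Is the largest above the upper fence?
    results[group] = (low, high, s["min"] < low, s["max"] > high)
    print(group, results[group])
```

By this rule the women's maximum of 200 is flagged as an outlier (it is above 196), while neither extreme is flagged for the men.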
Another Measure of Spread:
Standard Deviation
Probably the most important and mathematically useful
measure of spread.
Used along with mean.
Like the mean, it is not a resistant measure.
Calculating it: In your notes, write this out in words.
$$s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$
Variance: $s^2$
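The standard deviation formula can be followed step by step in code. This is a sketch with hypothetical data; the function name `sample_sd` is made up, and the result is compared with `statistics.stdev` from Python's standard library, which also divides by n - 1.

```python
import math
from statistics import stdev


def sample_sd(data):
    """Sample standard deviation, computed directly from the formula."""
    n = len(data)
    x_bar = sum(data) / n                                  # the mean
    squared_devs = sum((x - x_bar) ** 2 for x in data)     # sum of squared deviations
    variance = squared_devs / (n - 1)                      # s squared
    return math.sqrt(variance)                             # s


# Hypothetical data, for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]
s = sample_sd(values)
print(s, stdev(values))
```

Here the mean is 5, the squared deviations add up to 32, and the variance is 32/7.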
Properties of Standard Deviation (p. 51)
Measures spread about the mean
Is greater than or equal to zero.
Strong skewness or a few outliers can make s
very large.
More on Comparing Distributions
Example 1.17, p. 57
Also include numerical
summaries
Back-to-back stemplot:
HW
Reading through end of chapter
Exercises:
1.40, p. 52
1.47, p. 59
Effects of a linear transformation
Includes adding and/or multiplying by a constant.
Example 1.15, p. 53
$x_{\text{new}} = a + b\,x_{\text{old}}$
1. Enter salary data in L1. Compute summary stats.
2. Add a constant amount of 0.5 ($500,000) to each salary,
and put these new data in L2. Compute summary stats.
3. Now suppose each player receives a 10% raise (multiply
by 1.10). Put these data in L3, then compute summary stats.
Compare summary stats. What happened?
Effects of a linear transformation
Summary, p. 55:
Multiply by b:
Center (mean and median) and quartiles and
spread (s, IQR) multiply by b.
Add a:
Center (mean and median) and quartiles: add a
Spread (s, IQR): no change
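The same comparison can be run off the calculator. This sketch uses made-up salaries (in millions of dollars), not the data from Example 1.15, and applies the two changes from the activity: add 0.5 to every salary, then give a 10% raise instead.

```python
import math
from statistics import mean, median, stdev

# Hypothetical salaries in millions of dollars (not the textbook data).
salaries = [0.4, 0.8, 1.5, 2.0, 3.3, 6.0]
plus_half = [x + 0.5 for x in salaries]   # a = 0.5, b = 1
raised = [x * 1.10 for x in salaries]     # a = 0,   b = 1.10


def summary(data):
    """Return (mean, median, standard deviation)."""
    return mean(data), median(data), stdev(data)


for label, data in [("original", salaries), ("add 0.5", plus_half), ("times 1.10", raised)]:
    m, med, s = summary(data)
    print(f"{label:>10}: mean={m:.3f} median={med:.3f} s={s:.3f}")
```

Adding 0.5 shifts the mean and median by 0.5 and leaves s alone; multiplying by 1.10 multiplies all three by 1.10.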
Preparing for your chapter 1 test
Section 1.2 summary, p. 61
Chapter review, pp. 64-66
Exercise 1.55, p. 63
Exercises, pp. 66-72:
1.63
1.64
1.65
1.67
1.68