Chapter 4:
Describing Distributions
4.1 Graphs: good and bad
4.2 Displaying distributions with graphs
4.3 Describing distributions with numbers
1
Dow Jones Industrial Average
2
Pie Graph
3
Definitions
Types of variables:
Categorical (e.g., gender, type of degree)
Quantitative (e.g., time, mass, force, dollars)
The distribution of a variable tells us what values it takes and how often it takes these values.
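A minimal Python sketch of a distribution written as a frequency table: each value of a categorical variable with its count and its percent of the total. The responses are hypothetical, made up only to show the idea, and `distribution` is our own helper name.

```python
from collections import Counter

def distribution(values):
    """List each distinct value with its count and percent of the total,
    most frequent first: what values the variable takes and how often."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, 100 * count / n) for value, count in counts.most_common()]

# Hypothetical responses for a categorical variable ("type of degree")
degrees = ["BA", "BS", "BS", "MS", "BA", "BS", "PhD", "BS"]
for value, count, pct in distribution(degrees):
    print(f"{value:<4}{count:>3}{pct:>7.1f}%")
```

The percent column is exactly what a bar graph or pie chart of this variable would display.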
4
Bar graph showing a distribution
[Figure: Education Level in U.S. (adults age 25+). Vertical axis: Percent of Total (0 to 50). Horizontal axis: Years of Schooling, with bars for No high school degree; High school only; 1-3 years of college; 4+ years of college.]
5
Exercises, pp. 207-208
4.1
4.5
6
Bar graph for 4.1
[Figure: Lottery Game Sales Distribution. Vertical axis: Sales (million $), 0 to 18000. Horizontal axis: Type of Game (Instant, 3-digit, 4-digit, Lotto, Other). Sales shown on the bars: 16420, 8865, 5245, 5134, and 2776 (listed largest to smallest).]
7
Pie Chart for 4.1
[Figure: Lottery Game Sales Distribution (percent of total), with slices for Instant, 3-digit, 4-digit, Lotto, and Other. Percentages shown: 42.7, 23.1, 13.6, 13.4, and 7.2 (listed largest to smallest).]
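The pie chart percentages can be checked against the bar graph: each game's percent of total is its sales divided by total sales, times 100. This short Python sketch uses only the five sales figures shown on the bars, listed largest to smallest rather than by game type.

```python
# Sales (million $) shown on the bars of the Exercise 4.1 bar graph,
# listed largest to smallest (not matched to game types here).
sales = [16420, 8865, 5245, 5134, 2776]

total = sum(sales)
percents = [round(100 * s / total, 1) for s in sales]
print(total, percents)  # prints: 38440 [42.7, 23.1, 13.6, 13.4, 7.2]
```

The computed percentages match the pie chart labels, so both graphs display the same distribution.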
8
Misleading Pictogram (p. 209)
Worker salary: $2000/mo
Manager salary: $4000/mo
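One common way a pictogram like this misleads (an assumption here, since the slide shows only the two salaries): to show a salary twice as large, the manager's icon is drawn twice as tall and twice as wide as the worker's. If the worker's icon has height $h$ and width $w$, the eye compares areas rather than heights:

$$\frac{\$4000}{\$2000} = 2, \qquad \frac{(2h)(2w)}{h\,w} = 4$$

The salaries differ by a factor of 2, but the pictures differ in area by a factor of 4.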
9
Dow Jones Industrial Average:
This is a line graph (p. 210)
10
Misleading Graphs?
[Figure: Two graphs of the same monthly salary ($) data for 1994 and 2004. "Salaries are Going Up!" uses a vertical axis from 2000 to 2350; "Salaries Barely Increased" uses a vertical axis from 0 to 3000.]
11
Making good graphs (p. 213)
Graphs must have labels, legends, and titles.
Make the data stand out.
Pay attention to what the eye sees.
3-D is really not necessary!
12
Exercises, pp. 214-216
4.6 through 4.8
13
Homework
Problems, pp. 219-221, to be done in Excel:
4.11, 4.15
Email Excel file by class time on Monday
Section 4.2 Reading, pp. 221-242
14
4.2 Displaying Distributions with Graphs
15
Displaying distributions graphically
The distribution of a variable tells us what
values it takes and how often it takes these
values.
Ways to display distributions for quantitative
variables:
dotplots
histograms
stemplots
See example on pp. 221-222.
16
Figure 4.15: A histogram
17
Figure 4.16: A stemplot
18
Histograms
Most common graph of the distribution of a
quantitative variable.
How to make a histogram: Example 4.9, p. 224
Range: 5.7 to 17.6
Shoot for 6-15 classes (bars)
Class width: $(17.6 - 5.7)/10 = 1.19$, so use intervals of size 1
Read paragraph on p. 226
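The class-width arithmetic can be sketched in Python. The helper names are hypothetical, and starting the first class at the whole number just below the minimum (5 for a minimum of 5.7) is our assumption about a sensible starting point, not necessarily the book's rule.

```python
import math

def class_width(low, high, n_classes):
    """Width that splits the range [low, high] into n_classes equal classes."""
    return (high - low) / n_classes

def class_edges(low, high, width):
    """Left edges of equal-width classes, starting at the multiple of
    `width` at or below `low` and continuing until `high` is covered."""
    left = math.floor(low / width) * width
    edges = []
    while left <= high:
        edges.append(left)
        left += width
    return edges

def class_counts(data, edges, width):
    """Count observations in each class [edge, edge + width).
    Assumes every value lies between edges[0] and edges[-1] + width."""
    counts = [0] * len(edges)
    for x in data:
        counts[int((x - edges[0]) // width)] += 1
    return counts

raw = class_width(5.7, 17.6, 10)    # about 1.19, so round to a width of 1
edges = class_edges(5.7, 17.6, 1)   # left edges 5, 6, ..., 17
print(round(raw, 2), len(edges), edges[0], edges[-1])
```

With a width of 1 the classes run from 5 to 18, giving 13 bars, inside the 6 to 15 target. The histogram's bar heights are the counts returned by `class_counts`.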
19
Example 4.9, pp. 224-226
20
Practice Problem: 4.18, p. 226
21
Exercise 4.18
Histogram
By hand
Using calculator
Stemplot
By hand
22
Interpreting the graphical displays
Concentrate on the main features.
Overall pattern (p. 230)
Shape, center, spread
Outliers
Individual observations outside the overall pattern
of the graph
23
Example 4.10, p. 230
24
Shape
Symmetric or skewed (p. 231)?
Is it unimodal (one hump) or bimodal (two
humps)?
25
Homework
Reading: pp. 221-242
26
Stemplots
Usually reserved for smaller data sets.
Advantage:
Actual (or rounded) data are provided.
Possible drawback:
Many people are not used to this type of plot, so
the presenter/writer has to describe it.
27
How to make a stemplot, p. 236
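A stemplot for whole numbers splits each value into a stem (all but the last digit) and a leaf (the last digit). This Python sketch assumes nonnegative whole numbers and, for illustration, uses Mike's 15 values from Exercise 4.52 (p. 263). `stemplot` is our own helper name.

```python
def stemplot(values):
    """Stemplot rows for nonnegative whole numbers: stem = all but the last
    digit, leaf = last digit. Leaves are listed in increasing order."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows.setdefault(stem, []).append(leaf)
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # keep empty stems visible
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Mike's 15 values from Exercise 4.52, used here only as sample input
mike = [59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69]
print("\n".join(stemplot(mike)))
```

Every original value can be read back from the plot (stem 6 with leaf 2 is 62), which is the advantage of stemplots noted above.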
28
More problems
Exercises:
4.24 and 4.25, p. 233
4.26, p. 233
29
Practice
Exercises 4.30, p. 239 and
4.32, p. 240
4.28, p. 238
30
Wrapping up Section 4.2 …
4.28, p. 238
4.33, p. 242
4.36
4.37
31
4.3 Describing Distributions
with Numbers
Until now, we’ve been satisfied with using
words to describe the center and spread of
distributions.
Now, we will use numbers to describe these
characteristics of a distribution.
The 5-number summary:
Center: Median (p. 248)
Spread: Find the quartiles, Q1 and Q3 (p. 250)
Spread: Min and Max
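A Python sketch of the 5-number summary. It assumes the common textbook rule for quartiles: Q1 is the median of the observations below the overall median and Q3 the median of those above it, with the median itself left out of both halves when n is odd. Calculators and statistics programs may use other quartile rules, so their Q1 and Q3 can differ slightly. Mike's data from Exercise 4.52 is the sample input; the helper names are our own.

```python
def median(sorted_vals):
    """Median of an already-sorted list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2

def five_number_summary(values):
    """(min, Q1, median, Q3, max), with quartiles as medians of the halves."""
    x = sorted(values)
    n = len(x)
    lower = x[: n // 2]          # observations below the median position
    upper = x[(n + 1) // 2 :]    # observations above the median position
    return x[0], median(lower), median(x), median(upper), x[-1]

mike = [59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69]
print(five_number_summary(mike))  # prints: (50, 55, 67, 69, 75)
```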
32
Boxplots
We can use this information to
construct a boxplot:
33
Practice
4.46, p. 254
Enter data in the Stat Edit menu in your
calculator, and order them.
34
Boxplot vs. Modified Boxplot
The modified boxplot shows outliers; they are marked with a *. The lines extending from the quartiles go to the most extreme observation that is not an outlier.
If there are no outliers, the modified boxplot and the regular boxplot are identical.
Below are a boxplot (on the left) and a modified boxplot (on the right) for Problem 4.39, p. 245.
35
Side-by-side boxplots (p. 252)
36
Practice
Exercises:
4.50, p. 256
4.49, p. 256
37
Testing for Outliers
Find the interquartile range: IQR = Q3 - Q1
Multiply: 1.5*IQR
Outliers on the low side: values below Q1 - 1.5*IQR
Outliers on the high side: values above Q3 + 1.5*IQR
Are there any observations outside these limits?
If so, they are outliers, and are marked on boxplots with an asterisk.
The tail (whisker) is drawn to the highest (or lowest) value that is not an outlier.
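The 1.5*IQR test can be sketched in Python. The function takes Q1 and Q3 as inputs (computed separately); Q1 = 55 and Q3 = 69 are the quartiles of Mike's data from Exercise 4.52 under the median-of-halves rule. `outlier_check` is a hypothetical helper name.

```python
def outlier_check(values, q1, q3):
    """Apply the 1.5*IQR rule given the quartiles. Returns the two fences,
    the outliers, and where the whiskers of a modified boxplot end."""
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    inside = [v for v in values if low_fence <= v <= high_fence]
    # Whiskers go to the most extreme values that are not outliers
    return low_fence, high_fence, outliers, (min(inside), max(inside))

mike = [59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69]
print(outlier_check(mike, q1=55, q3=69))
```

For Mike's data the fences are 34 and 90, so there are no outliers and the whiskers run to 50 and 75; the regular and modified boxplots would be identical.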
38
Measures of Center and Spread
Median and IQR
Mean and Standard Deviation
Mean is the arithmetic average
Standard deviation measures, roughly, the average distance of the observations from their mean.
Variance is the square of the standard deviation (s²).
All of these statistics can be calculated by hand, but
we use technology to do these today …
We use 1-sample stats on our calculators, or a stats
program.
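The mean, standard deviation, and variance can also be computed directly, as a check on calculator output. This Python sketch uses the usual sample standard deviation s, which divides the sum of squared deviations by n - 1. `mean` and `std_dev` are our own helpers; the standard library's `statistics.stdev` appears only as a cross-check. Mike's data from Exercise 4.52 is the sample input.

```python
import math
import statistics

def mean(values):
    """Arithmetic average."""
    return sum(values) / len(values)

def std_dev(values):
    """Sample standard deviation s: square root of the sum of squared
    deviations from the mean, divided by n - 1."""
    m = mean(values)
    squared_devs = sum((x - m) ** 2 for x in values)
    return math.sqrt(squared_devs / (len(values) - 1))

mike = [59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69]
s = std_dev(mike)
variance = s ** 2
print(f"mean = {mean(mike):.2f}, s = {s:.2f}, variance = {variance:.2f}")

# Cross-check against the standard library's sample standard deviation
print(f"statistics.stdev = {statistics.stdev(mike):.2f}")
```

For Mike's data the mean is 63.6 and s is about 8.24 (variance about 67.97). Also, `std_dev([5, 5, 5])` returns 0.0: identical observations have no spread.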
39
Properties of standard deviation (p. 259)
Use s as a measure of spread when you use the mean as the measure of center.
s = 0 only when there is no spread, that is, when all observations have the same value.
The larger the value for s, the larger the
spread of the distribution.
40
Practice Problem
4.52, p. 263
Mike: 59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69
41
Practice Problem
4.55, p. 263
42
Example 4.21, p. 265
43
Choosing a summary
The book has a section on which summary to use (mean and
std. dev., or median with the quartiles).
I like to report all of them.
However, when writing about a distribution, or comparing
distributions, we should think about which summary works best.
See p. 266.
Skewed, or outliers present: median and quartiles
Symmetric, with no (or few) outliers: mean and standard deviation
Mean and standard deviation are the most common. One reason is that they allow more sophisticated calculations in more advanced statistics.
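The choice above rests on the fact that the median resists extreme values while the mean does not. A quick Python check, using Mike's data from Exercise 4.52 plus one hypothetical extreme value of 150 (made up for illustration):

```python
import statistics

mike = [59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69]
with_outlier = mike + [150]  # hypothetical extreme value, for illustration only

for label, data in [("original", mike), ("with 150 added", with_outlier)]:
    print(f"{label:>15}: mean = {statistics.mean(data):.1f}, "
          f"median = {statistics.median(data):.1f}")
```

Adding the single value 150 moves the mean from 63.6 to 69.0 but the median only from 67 to 67.5.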
44
More Practice …
p. 271:
4.57, 4.58, 4.60
45