distribution


Chapter 4:
Describing Distributions



4.1 Graphs: good and bad
4.2 Displaying distributions with graphs
4.3 Describing distributions with numbers
1
Dow Jones Industrial Average
2
Pie Graph
3
Definitions

Types of variables

Categorical: e.g., gender, type of degree
Quantitative: e.g., time, mass, force, dollars
The distribution of a variable tells us what values it
takes and how often it takes these values.
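
As a rough illustration of this definition, the sketch below tallies a categorical variable and reports each value with its count and percent of the total. The list `degrees` is made-up data, not taken from the textbook.

```python
from collections import Counter

# Hypothetical responses for a categorical variable (type of degree).
degrees = ["BA", "BS", "BA", "MA", "BS", "BA", "PhD", "BA"]

counts = Counter(degrees)          # how often each value occurs
total = sum(counts.values())

# The distribution: each value with its count and percent of the total.
distribution = {value: (n, 100 * n / total) for value, n in counts.items()}

for value, (n, pct) in sorted(distribution.items()):
    print(f"{value:>4}: count={n}, percent={pct:.1f}")
```

The counts or percents are the bar heights in a bar graph, and the percents are the slice sizes in a pie graph.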
4
Bar graph showing a distribution
[Bar graph: Education Level in U.S. (adults age 25+). Horizontal axis: Years of Schooling (No high school degree; High school only; 1-3 years of college; 4+ years of college). Vertical axis: Percent of Total, 0 to 50. Bar labels, in extraction order: 33.1, 25.4, 25.6, 15.9.]
5
Exercises, pp. 207-208


4.1
4.5
6
Bar graph for 4.1
[Bar graph: Lottery Game Sales Distribution. Horizontal axis: Type of Game (Instant, 3-digit, 4-digit, Lotto, Other). Vertical axis: Sales (million $), 0 to 18000. Bar labels, in extraction order: 16420, 8865, 5245, 5134, 2776.]
7
Pie Chart for 4.1
[Pie chart: Lottery Game Sales Distribution (percent of total). Slices: Instant, 3-digit, 4-digit, Lotto, Other. Slice labels, in extraction order: 13.4, 42.7, 23.1, 7.2, 13.6.]
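
The pie chart percentages can be reproduced from the five sales figures labeled on the bar graph. This is a minimal sketch; the figures are kept as an unlabeled list because the slides do not make clear which game type goes with which value.

```python
# Sales figures (million $) as labeled on the bar graph for Exercise 4.1.
# Game types are deliberately not attached; the mapping is not recoverable
# from the slide text.
sales = [16420, 8865, 5245, 5134, 2776]

total = sum(sales)

# Percent of total for each figure, rounded to one decimal place.
percents = [round(100 * s / total, 1) for s in sales]

for s, p in zip(sales, percents):
    print(f"{s:>6} -> {p}% of total")
```

The results match the five slice labels on the pie chart, which confirms that both graphs display the same distribution.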
8
Misleading Pictogram (p. 209)
Worker Salary
$2000/mo
Manager Salary
$4000/mo
9
Dow Jones Industrial Average:
This is a line graph (p. 210)
10
Misleading Graphs?
[Two graphs of Monthly Salary ($) against Year (1994 and 2004), titled "Salaries are Going Up!" and "Salaries Barely Increased". One vertical axis runs from 0 to 3000; the other runs from 2000 to 2350.]
11
Making good graphs (p. 213)



Graphs must have labels, legends, and titles.
Make the data stand out.
Pay attention to what the eye sees.

3-D is really not necessary!
12
Exercises, pp. 214-216

4.6 through 4.8
13
Homework

Problems, pp. 219-221, to be done in Excel:



4.11, 4.15
Email Excel file by class time on Monday
Section 4.2 Reading, pp. 221-242
14
4.2 Displaying Distributions with Graphs
15
Displaying distributions graphically


The distribution of a variable tells us what
values it takes and how often it takes these
values.
Ways to display distributions for quantitative
variables:




dotplots
histograms
stemplots
See example on pp. 221-222.
16
Figure 4.15: A histogram
17
Figure 4.16: A stemplot
18
Histograms


Most common graph of the distribution of a
quantitative variable.
How to make a histogram: Example 4.9, p. 224


Range: 5.7 to 17.6
Shoot for 6-15 classes (bars)
(17.6 - 5.7) / 10 = 1.19, so use intervals of size 1

Read paragraph on p. 226
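
The class-width arithmetic above can be sketched in code. The function name `class_edges`, the default of 10 target classes, and the choice to round the width to a whole number are assumptions made for this illustration, not the textbook's procedure.

```python
import math

def class_edges(lo, hi, target_classes=10):
    """Pick a convenient whole-number class width and list the class edges."""
    raw_width = (hi - lo) / target_classes
    # Assumption: round to the nearest whole number, but never below 1.
    width = max(1, round(raw_width))
    # Start at a multiple of the width at or below the smallest value.
    start = math.floor(lo / width) * width
    edges = []
    edge = start
    while edge <= hi:
        edges.append(edge)
        edge += width
    edges.append(edge)  # closing edge, just above the largest value
    return raw_width, width, edges

raw_width, width, edges = class_edges(5.7, 17.6)
print(f"raw width = {raw_width:.2f}, chosen width = {width}")
print(f"{len(edges) - 1} classes with edges {edges}")
```

With the range 5.7 to 17.6, a width of 1 gives 13 classes running from 5 to 18, which falls inside the suggested 6-15 classes.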
19
Example 4.9, pp. 224-226
20
Practice Problem: 4.18, p. 226
21
Exercise 4.18

Histogram: by hand, and using a calculator
Stemplot: by hand
22
Interpreting the graphical displays


Concentrate on the main features.
Overall pattern (p. 230): shape, center, spread
Outliers: individual observations that fall outside the overall pattern of the graph
23
Example 4.10, p. 230
24
Shape


Symmetric or skewed (p. 231)?
Is it unimodal (one hump) or bimodal (two
humps)?
25
Homework

Reading: pp. 221-242
26
Stemplots


Usually reserved for smaller data sets.
Advantage:


Actual (or rounded) data are provided.
Possible drawback:

Many people are not used to this type of plot, so
the presenter/writer has to describe it.
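
To show how a stemplot keeps the actual data values, here is a minimal sketch that builds one from the 15 scores listed for Mike under Exercise 4.52 in these slides. The function `stemplot` is written for this illustration and assumes non-negative whole numbers, with the ones digit as the leaf.

```python
from collections import defaultdict

# Mike's 15 scores, as listed under Exercise 4.52 in these slides.
scores = [59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69]

def stemplot(values):
    """Return stemplot lines: leading digit(s) as stem, ones digit as leaf."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # Include every stem between the smallest and largest, even if empty.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines

lines = stemplot(scores)
print("\n".join(lines))
```

Each row reads as a set of values: stem 5 with leaves 0 1 2 5 9 stands for 50, 51, 52, 55, and 59.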
27
How to make a stemplot, p. 236
28
More problems

Exercises:


4.24 and 4.25, p. 233
4.26, p. 233
29
Practice


Exercises 4.30, p. 239 and
4.32, p. 240
4.28, p. 238
30
Wrapping up Section 4.2 …




4.28, p. 238
4.33, p. 242
4.36
4.37
31
4.3 Describing Distributions
with Numbers

Until now, we’ve been satisfied with using
words to describe the center and spread of
distributions.


Now, we will use numbers to describe these
characteristics of a distribution.
The 5-number summary:



Center: Median (p. 248)
Spread: Quartiles, Q1 and Q3 (p. 250)
Spread: Min and Max
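
A minimal sketch of the 5-number summary follows. It assumes that Q1 and Q3 are the medians of the observations below and above the position of the overall median; other software may use a slightly different quartile rule and give slightly different values. The function name and the example data are made up for illustration.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for at least two observations.

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    leaving out the overall median when the number of observations is odd.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]      # observations below the median position
    upper = data[-half:]     # observations above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

# Made-up data, for illustration only.
example = [7, 1, 5, 3, 9, 4, 8, 6]
print(five_number_summary(example))
```

These five numbers are exactly what a boxplot draws: the box spans Q1 to Q3, a line marks the median, and the tails reach to the minimum and maximum.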
32
Boxplots

We can use this information to
construct a boxplot:
33
Practice

4.46, p. 254

Enter data in the Stat Edit menu in your
calculator, and order them.
34
Boxplot vs. Modified Boxplot

The modified boxplot shows outliers … they are
marked with a *. The lines extending from the
quartiles go to the most extreme values that are
not outliers.


If there are no outliers, the modified boxplot and the regular
boxplot are identical.
Below are a boxplot (on the left) and modified
boxplot (on the right) for Problem 4.39, p. 245.
35
Side-by-side boxplots (p. 252)
36
Practice

Exercises:


4.50, p. 256
4.49, p. 256
37
Testing for Outliers

Find the Inter-Quartile Range: IQR = Q3 - Q1
Multiply: 1.5*IQR
Outliers on low side: below Q1 - 1.5*IQR
Outliers on high side: above Q3 + 1.5*IQR
Are there any numbers outside of these values?


If so, they are outliers, and are marked on boxplots with an
asterisk.
The tail is drawn to the highest (or lowest) value which is
not an outlier.
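
The outlier test above can be sketched as follows. The function name `find_outliers` and the example data are made up for illustration, and the quartiles are assumed to be the medians of the lower and upper halves of the sorted data.

```python
from statistics import median

def find_outliers(values):
    """Apply the 1.5*IQR rule; return IQR, cutoffs, outliers, and tail ends."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])     # assumption: median of the lower half
    q3 = median(data[-half:])    # assumption: median of the upper half
    iqr = q3 - q1
    low_cutoff = q1 - 1.5 * iqr
    high_cutoff = q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_cutoff or x > high_cutoff]
    # The tails of a modified boxplot end at the most extreme non-outliers.
    inside = [x for x in data if low_cutoff <= x <= high_cutoff]
    tails = (inside[0], inside[-1])
    return iqr, (low_cutoff, high_cutoff), outliers, tails

# Made-up data with one low and one high outlier.
example = [10, 50, 52, 55, 59, 62, 65, 67, 68, 69, 95]
iqr, cutoffs, outliers, tails = find_outliers(example)
print(f"IQR = {iqr}, cutoffs = {cutoffs}")
print(f"outliers = {outliers}, tails drawn to {tails}")
```

Here Q1 = 52 and Q3 = 68, so the cutoffs are 28 and 92; the values 10 and 95 would be marked with asterisks and the tails would stop at 50 and 69.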
38
Measures of Center and Spread


Median and IQR
Mean and Standard Deviation




Mean is the arithmetic average
Standard deviation measures roughly the typical distance of the
observations from their mean.
Variance is the square of the standard deviation.
All of these statistics can be calculated by hand, but
today we use technology to find them …

We use 1-sample stats on our calculators, or a stats
program.
39
Properties of standard deviation (p. 259)



Use s as a measure of spread when you
use the mean.
If s=0, there is no spread.
The larger the value for s, the larger the
spread of the distribution.
40
Practice Problem

4.52, p. 263

Mike:
59,69,71,52,65,55,72,50,75,67,51,69,68,62,69
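
As one way to check calculator output, the sketch below finds the mean, standard deviation, and variance of Mike's scores with Python's standard `statistics` module. It only computes the summary numbers; the exercise's actual question is not reproduced in these slides.

```python
from statistics import mean, stdev, variance

# Mike's scores, as listed above.
mike = [59, 69, 71, 52, 65, 55, 72, 50, 75, 67, 51, 69, 68, 62, 69]

x_bar = mean(mike)            # arithmetic average
s = stdev(mike)               # sample standard deviation (divides by n - 1)
s_squared = variance(mike)    # sample variance, the square of s

print(f"mean = {x_bar:.1f}, s = {s:.2f}, variance = {s_squared:.2f}")
```

The mean is 63.6 and s is about 8.24. The `stdev` function uses the n - 1 divisor, so it should be compared with the sample standard deviation on a calculator, not the population one.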
41
Practice Problem

4.55, p. 263
42
Example 4.21, p. 265
43
Choosing a summary

The book has a section on which summary to use (mean and
std. dev., or median with the quartiles).



I like to report all of them.
However, when writing about a distribution, or comparing
distributions, we should think about which summary works best.
See p. 266.

Skewed, outliers … median and quartiles

Symmetrical, no (or few) outliers … mean and std. dev.
Mean and standard deviation are the most common. One reason is
that they support the more sophisticated calculations used in
more advanced statistics.
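
A small sketch of why skewed data or outliers point toward the median: a single extreme value pulls the mean but leaves the median alone. Both data sets are made up for illustration.

```python
from statistics import mean, median

# Made-up data: the second list replaces the largest value with an outlier.
symmetric = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]

for name, data in [("symmetric", symmetric), ("with outlier", with_outlier)]:
    print(f"{name:>12}: mean = {mean(data)}, median = {median(data)}")
```

The mean jumps from 3 to 12 while the median stays at 3, which is why the median and quartiles are the safer summary when outliers are present.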
44
More Practice …

p. 271:

4.57, 4.58, 4.60
45