Density Curves


HOW TALL ARE YOU?
2.1A
DESCRIBING LOCATION IN A DISTRIBUTION
Using percentiles
Making an Ogive
Calculating and Interpreting a z-score
What is a percentile?
Wins in Major League Baseball
The stemplot below shows the number of wins
for each of the 30 Major League Baseball teams
in 2009.
 5 | 9
 6 | 2455
 7 | 00455589
 8 | 0345667778
 9 | 123557
10 | 3
Key: 5|9 represents a team with 59 wins.
Calculate and interpret the percentiles for the
Colorado Rockies (92 wins), the New York Yankees
(103 wins) and the Cleveland Indians (65 wins).
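The percentile calculation for these three teams can be sketched in Python. This is a minimal sketch, assuming a percentile is defined as the percent of observations strictly less than the given value (some texts count values less than or equal to it instead). The function name `percentile_of` is illustrative, and the win totals are read off the stemplot.

```python
# Win totals for the 30 teams, read from the stemplot (key: 5|9 = 59 wins).
wins = [
    59,
    62, 64, 65, 65,
    70, 70, 74, 75, 75, 75, 78, 79,
    80, 83, 84, 85, 86, 86, 87, 87, 87, 88,
    91, 92, 93, 95, 95, 97,
    103,
]

def percentile_of(value, data):
    """Percent of observations strictly less than `value` (assumed definition)."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

teams = [
    ("Colorado Rockies", 92),
    ("New York Yankees", 103),
    ("Cleveland Indians", 65),
]
for team, w in teams:
    print(f"{team}: {w} wins, percentile = {percentile_of(w, wins):.0f}")
```

Under this definition the Rockies are at the 80th percentile (24 of 30 teams had fewer wins), the Yankees at about the 97th (29 of 30), and the Indians at the 10th (3 of 30).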
Ogive?
Age of Senators in the 103rd Congress (n=100)
Age        Freq   Rel Freq   Cumul. Rel. Freq
30 −< 40   1      0.01       0.01
40 −< 50   16     0.16       0.17
50 −< 60   49     0.49       0.66
60 −< 70   22     0.22       0.88
70 −< 80   11     0.11       0.99
80 −< 90   1      0.01       1.00
Age of Representatives in the 103rd Congress (n=435)
Age        Freq   Rel Freq   Cumul. Rel. Freq
30 −< 40   47
40 −< 50   153
50 −< 60   131
60 −< 70   89
70 −< 80   12
80 −< 90   3
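The empty columns for the Representatives can be filled in the same way as the Senators table: divide each class count by n, then accumulate. A minimal sketch; the function name `cumulative_relative_frequencies` is an illustrative choice, not part of the slides.

```python
from itertools import accumulate

def cumulative_relative_frequencies(freqs):
    """Return (relative, cumulative relative) frequencies for a list of class counts."""
    total = sum(freqs)
    rel = [f / total for f in freqs]
    # Accumulate the counts first, then divide, to limit rounding drift.
    cum = [c / total for c in accumulate(freqs)]
    return rel, cum

classes = ["30 -< 40", "40 -< 50", "50 -< 60", "60 -< 70", "70 -< 80", "80 -< 90"]
reps = [47, 153, 131, 89, 12, 3]  # Representatives, n = 435

rel, cum = cumulative_relative_frequencies(reps)
for label, f, r, c in zip(classes, reps, rel, cum):
    print(f"{label:10} {f:4d} {r:8.3f} {c:8.3f}")
```

Plotting each cumulative relative frequency against the upper boundary of its class gives the points of the ogive. For example, 200 of the 435 representatives (about 46%) were younger than 50.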
State Median Household Incomes
Below is a cumulative relative frequency graph
showing the distribution of median household
incomes for the 50 states and the District of
Columbia.
a) California, with a median household income of
$57,445, is at what percentile? Interpret this
value.
b) What is the 25th percentile for this distribution?
What is another name for this value?
c) Where is the graph the steepest?
What does this indicate about the distribution?
z-score?
Macy, a 3-year-old female, is 100 cm tall.
Brody, her 12-year-old brother, is 158 cm
tall. Obviously, Brody is taller than
Macy, but who is taller, relatively
speaking? That is, relative to other kids
of the same age and sex, who is taller?
(According to the CDCP, the heights of
three-year-old females have a mean of
94.5 cm and a standard deviation of 4
cm. The mean height for 12-year-old
males is 149 cm with a standard
deviation of 8 cm.)
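The comparison comes down to two standardized scores, z = (value - mean) / standard deviation. A minimal sketch using the figures quoted above; the helper name `z_score` is illustrative.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd

macy = z_score(100, mean=94.5, sd=4)   # 3-year-old females
brody = z_score(158, mean=149, sd=8)   # 12-year-old males

print(f"Macy:  z = {macy:.3f}")
print(f"Brody: z = {brody:.3f}")
print("Relatively taller:", "Macy" if macy > brody else "Brody")
```

Macy is 1.375 standard deviations above the mean for her group and Brody is 1.125 above the mean for his, so Macy is the taller of the two relative to peers.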
Bellwork: 9/17/15
What information can be
obtained from an Ogive?
*Percentiles- Ms. McDonald screwed up!
Year   Player         HR   Mean   SD
1927   Babe Ruth      60   7.2    9.7
1961   Roger Maris    61   18.8   13.4
1998   Mark McGwire   70   20.7   12.7
2001   Barry Bonds    73   21.4   13.2
To make a fair comparison, we should see
how these performances rate relative to
other hitters during the same year.
Calculate the standardized score for each
player and compare.
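The four standardized scores can be computed from the home run totals, means, and standard deviations listed for each season. A minimal sketch; the tuple layout and the name `standardized` are illustrative.

```python
# (year, player, home runs, mean that year, SD that year), as given in the table.
seasons = [
    (1927, "Babe Ruth", 60, 7.2, 9.7),
    (1961, "Roger Maris", 61, 18.8, 13.4),
    (1998, "Mark McGwire", 70, 20.7, 12.7),
    (2001, "Barry Bonds", 73, 21.4, 13.2),
]

def standardized(value, mean, sd):
    """Standardized score z = (value - mean) / sd."""
    return (value - mean) / sd

scores = {player: standardized(hr, mean, sd) for _, player, hr, mean, sd in seasons}

# List players from the most to the least extreme season relative to their peers.
for player, z in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{player:13} z = {z:.2f}")
```

Although Bonds hit the most home runs, Ruth's 1927 season stands out most relative to other hitters of the same year (z of about 5.44, against roughly 3.91 for Bonds, 3.88 for McGwire, and 3.15 for Maris).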
In 2001, Arizona Diamondback
Mark Grace’s home run total had
a standardized score of z = –0.48.
Interpret this value and calculate
the number of home runs he hit.
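Going from a z-score back to a raw value means solving the standardizing formula for x. The worked step below assumes the 2001 mean (21.4) and standard deviation (13.2) from the home run table apply to Grace's season.

```latex
z = \frac{x - \text{mean}}{\text{SD}}
\quad\Longrightarrow\quad
x = \text{mean} + z \cdot \text{SD}
  = 21.4 + (-0.48)(13.2)
  = 21.4 - 6.336
  = 15.064 \approx 15 \text{ home runs}
```

The negative sign says Grace's total was 0.48 standard deviations below the 2001 mean.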
2.1B
DESCRIBING LOCATION IN A DISTRIBUTION
TRANSFORM data
DEFINE and DESCRIBE density curves
[Dot plot: points per game (PTS), horizontal axis from 0 to 70]
Here is a dotplot of Kobe Bryant’s point totals for
each of the 82 games in the 2008-2009 regular
season. The mean of this distribution is 26.8 with
a standard deviation of 8.6 points.
In what percentage of games did he score
within one standard deviation of his mean?
Within two standard deviations?
[Dot plot: strikeouts per game (SO), horizontal axis from 0 to 16]
Here is a dotplot of Tim Lincecum’s strikeout totals
for each of the 32 games he pitched in during the
2009 regular season. The mean of this distribution
is 8.2 with a standard deviation of 2.8.
In what percentage of games were his strikeouts
within one standard deviation of his mean?
Within two standard deviations?
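The game-by-game values are only shown in the dotplots, so they are not reproduced here. What can be computed from the stated means and standard deviations are the interval boundaries to count dots between. The sketch below does that; `percent_within` is a hypothetical helper for use once the individual values have been read off a plot.

```python
def sd_interval(mean, sd, k):
    """Lower and upper bounds of the interval within k standard deviations of the mean."""
    return mean - k * sd, mean + k * sd

def percent_within(data, mean, sd, k):
    """Percent of observations in `data` that fall inside the k-SD interval (inclusive)."""
    low, high = sd_interval(mean, sd, k)
    return 100 * sum(low <= x <= high for x in data) / len(data)

players = [
    ("Kobe Bryant (points)", 26.8, 8.6),
    ("Tim Lincecum (strikeouts)", 8.2, 2.8),
]
for name, mean, sd in players:
    for k in (1, 2):
        low, high = sd_interval(mean, sd, k)
        print(f"{name}: within {k} SD is {low:.1f} to {high:.1f}")
```

For Bryant the one- and two-standard-deviation intervals run from 18.2 to 35.4 and from 9.6 to 44.0 points; for Lincecum they run from 5.4 to 11.0 and from 2.6 to 13.8 strikeouts.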
Effect of Adding (or Subtracting) a Constant
Adding the same number a (either positive,
zero, or negative) to each observation:
•adds a to measures of center and location
(mean, median, quartiles, percentiles), but
•does not change the shape of the
distribution or measures of spread (range,
IQR, standard deviation).
Effect of Multiplying (or Dividing) by a Constant
Multiplying (or dividing) each observation by the
same number b (positive, negative, or zero):
•multiplies (divides) measures of center and
location by b
•multiplies (divides) measures of spread by |b | ,
but
•does not change the shape of the distribution
when b is positive (a negative b reverses the
direction of any skew)
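Both rules can be checked numerically. The sketch below uses a small made-up data set and made-up constants a and b (none of these values come from the slides) and compares summaries before and after each transformation.

```python
import statistics as st

def summary(data):
    """Center (mean, median) and spread (range, sample SD) of a data set."""
    return {
        "mean": st.mean(data),
        "median": st.median(data),
        "range": max(data) - min(data),
        "sd": st.stdev(data),
    }

data = [2, 4, 4, 6, 9]   # hypothetical values, for illustration only
a, b = 10, -3            # hypothetical constants

shifted = [x + a for x in data]   # add a to every observation
scaled = [b * x for x in data]    # multiply every observation by b

print("original:", summary(data))
print("x + a:   ", summary(shifted))
print("b * x:   ", summary(scaled))
```

Adding a moves the mean and median by a and leaves the range and standard deviation alone. Multiplying by b multiplies the mean and median by b and the range and standard deviation by |b|, which is why a negative b was chosen for the example.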
In 2010, taxi cabs in New York City charged an
initial fee of $2.50 plus $2 per mile.
In equation form: fare = 2.50 + 2(miles).
At the end of a month, a businessman collects all
of his taxi cab receipts and analyzes the
distribution of fares. The distribution is
skewed to the right with a mean of $15.45 and a
standard deviation of $10.20.
a) What are the mean and standard
deviation of the lengths of his cab rides in
miles?
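Solving the fare equation for distance gives miles = (fare - 2.50) / 2, so the transformation rules apply directly: subtracting 2.50 shifts the mean only, and dividing by 2 rescales both the mean and the standard deviation. A minimal sketch with illustrative variable names:

```python
initial_fee = 2.50   # dollars
rate = 2.00          # dollars per mile
fare_mean, fare_sd = 15.45, 10.20

# miles = (fare - initial_fee) / rate
miles_mean = (fare_mean - initial_fee) / rate   # the shift and the scaling both affect the mean
miles_sd = fare_sd / abs(rate)                  # only the scaling affects the SD

print(f"mean = {miles_mean:.3f} miles, SD = {miles_sd:.2f} miles")
```

The mean ride is 6.475 miles with a standard deviation of 5.10 miles. Because the divisor is positive, the distribution of distances is still skewed to the right.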
Density Curves
In Chapter 1, we developed a kit of
graphical and numerical tools for
describing distributions. Now, we’ll add
one more step to the strategy.
Exploring Quantitative Data
1. Always plot your data: make a graph.
2. Look for the overall pattern (shape, center,
and spread) and for striking departures
such as outliers.
3. Calculate a numerical summary to briefly
describe center and spread.
4. Sometimes the overall pattern of a large
number of observations is so regular that
we can describe it by a smooth curve.
Definition:
A density curve is a curve that
•is always on or above the horizontal axis, and
•has area exactly 1 underneath it.
A density curve describes the overall pattern
of a distribution. The area under the curve
and above any interval of values on the
horizontal axis is the proportion of all
observations that fall in that interval.
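Both parts of the definition can be checked numerically for a specific curve. The sketch below uses a hypothetical density, f(x) = 2x on the interval from 0 to 1 (chosen only for illustration, not taken from the slides), and approximates areas with the trapezoidal rule.

```python
def density(x):
    """Hypothetical density curve: f(x) = 2x on [0, 1], and 0 elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area_under(f, lo, hi, steps=10_000):
    """Approximate the area under f between lo and hi with the trapezoidal rule."""
    width = (hi - lo) / steps
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, steps):
        total += f(lo + i * width)
    return total * width

total_area = area_under(density, 0, 1)
upper_half = area_under(density, 0.5, 1)

print(f"total area under the curve: {total_area:.4f}")
print(f"proportion of observations between 0.5 and 1: {upper_half:.4f}")
```

The curve never dips below the axis and its total area is 1, so it qualifies as a density curve. The area above the interval from 0.5 to 1 is 0.75, meaning 75% of observations described by this curve fall in that interval.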
In this section, we learned that…

There are two ways of describing an individual’s location
within a distribution – the percentile and z-score.

A cumulative relative frequency graph allows us to
examine location within a distribution.

It is common to transform data, especially when
changing units of measurement. Transforming data can
affect the shape, center, and spread of a distribution.

We can sometimes describe the overall pattern of a
distribution by a density curve (an idealized description
of a distribution that smooths out the irregularities in the
actual data).