1.2 Describing Distributions with Numbers
Download
Report
Transcript 1.2 Describing Distributions with Numbers
1.2 Describing Distributions with Numbers
Is the mean a good measure of center?
Ex. Roger Maris’s yearly homerun production:
8 13 14 16 23 26 28 33 39 61
Mean/Mean…(Centers)
Both measure center in different ways, but
both are useful.
Use median if you want a “typical” number.
Mean = “Arithmetic Average Value”
Mean/Median of a symmetric distribution are
close together. If a distribution is exactly
symmetric, mean = median.
In a skewed distribution, the mean is farther
out in the long tail than the median.
Measures of Spread
Range
Quartiles
5 # Summary
Variance
Standard Deviation
Range = Largest –
Smallest Observations
in a list. What’s the
problem with this?
Better measure of
spread: Quartiles.
Male/Female Surgeons (# of
hysterectomies performed)
Put in ascending order (male dr.s): odd #
20 25 25 27 28 31 33 34 36 37 44 50 59 85 86
Min
Q1
M
Q3
Max
Put in ascending order (female dr.s): even #
5
7
10
14 18 19 25 29
31
Min
Q1
M = 18.5
Q3
33
Max
Boxplots
You can instantly see that female dr.’s perform less
hysterectomies than male doctors.
Also, there is less variation among female doctors.
Notes on boxplots
Best used for side-by-side comparisons
of more than 1 distribution.
Less detail than histograms or stem
plots.
Always include the numerical scale.
..\Simulations\Hotdog Data.xls
Travel Times to Work #1
How long does it take you to get from home to
school? Here are the travel times from home to work
in minutes for 15 workers in North Carolina, chosen
at random by the Census Bureau:
30 20 10 40 25 20 10 60 15 40
5
30 12 10 10
The distribution…
Describe
Is the longest travel time (60 minutes) an outlier?
How many of the travel times are larger than the
mean?
If you leave out the large time, how does that
change the mean?
The mean in this example is nonresistant because it
is sensitive to the influence of extreme observations.
The mean is the arithmetic average, but it may not
be a “typical“number!
Travel Times to Work #2
Travel times to work in New York State
are (on the average) longer than in
North Carolina. Here are the travel
times in minutes of 20 randomly chosen
New York workers:
10 30 5
25 40 20 10 15
30 20 15 20 85 15 65
15 60 60 40 45
Interquartile Range (IQR)
Measures the spread of the middle ½ of the
data.
An observation is an outlier if:
Less than Q1 – 1.5(IQR) or
Greater than Q3 + 1.5(IQR)
Looking at the spread….
Quartiles show spread of middle ½ of data
Spacing of the quartiles and extremes about
the median give an indication of the
symmetry or skewness of the distribution.
Symmetric distributions:1st/3rd quartiles equally
distant from the median.
In right-skewed distributions: 3rd quartile will be
farther above the median than the 1st quartile
is below it.
Is there a difference between the number of programmed
telephone numbers in girls’ cell phones and the number of
programmed numbers in boys’ cell phones? Do you think there
is a difference? If so, in what direction?
1) Count the number of programmed telephone numbers in your
cell phone and write the total on a piece of paper.
2) Make a back-to-back stemplot of this information, then draw
boxplots. When you test for outliers, how many do you find for
males and how many do you find for females using the 1.5 X
IQR test?
3) Find the 5# Summary for each group. Compare the two
distributions (SOCS!).
4) It is important in any study that you have “data integrity” (the
data is reported accurately and truthfully). Do you think this is
the case here? Do you see any suspicious observations? Can
you think of any reason someone may make up a response or
stretch the truth? If you DO see a difference between the two
groups, can you suggest a possible reason for this difference?
5) Do you think a study of cell phone programmed numbers for a
sophomore algebra class would yield similar results? Why or
why not?
Spring ’09 Student Data
Girls: 53 457 24 136 222 106 237
75 296 154 275 70 134
Boys: 298 65 81
60 176 33
95
35
141 247
Standard Deviation: A measure of spread
Standard deviation looks at how far observations are
from their mean.
It’s the natural measure of spread for the Normal
distribution
We like s instead of s-squared (variance) since the
units of measurement are easier to work with (original
scale)
S is the average of the squares of the deviations of
the observations from their mean.
S, like the mean, is strongly influenced by
extreme observations. A few outliers can
make s very large.
Skewed distributions with a few observations
in the single long tail = large s. (S is therefore
not very helpful in this case)
As the observations become more spread
about the mean, s gets larger.
Mean vs. Median
Standard Deviation vs. 5-Number Summary
The mean and standard deviation are more common than the
median and the five number summary as a measure of center
and spread.
No single # describes the spread well.
Remember: A graph gives the best overall picture of a
distribution. ALWAYS PLOT YOUR DATA!
The choice of mean/median depends upon the shape of the
distribution.
When dealing with a skewed distribution, use the median and
the 5# summary.
When dealing with reasonably symmetric distributions, use the
mean and standard deviation.
The variance and standard deviation are…
LARGE if observations are widely spread about
the mean
SMALL if observations are close to the mean
Degrees of Freedom (n-1)
Definition: the number of independent pieces of
information that are included in your measurement.
Calculated from the size of the sample. They are a
measure of the amount of information from the
sample data that has been used up. Every time a
statistic is calculated from a sample, one degree of
freedom is used up.
If the mean of 4 numbers is 250, we have degrees
of freedom (4-1) = 3. Why?
____ ____ ____ ____ mean = 250
If we freely choose numbers for the first 3 blanks,
the 4th number HAS to be a certain number in order
to obtain the mean of 250.
A person’s metabolic rate is the rate at which the body
consumes energy. Metabolic rate is important in
studies of weight gain, dieting, and exercise. Here are
the metabolic rates of 7 men who took part in a study
of dieting:
1792 1666 1362 1614 1460 1867 1439
Find the mean
Column 1: Observations (x)
Column 2: Deviations
Column 3: Squared deviations
(TI-83: STAT/Calc/1-var-Stats L1 after entering list into L1)
You do! (By Hand)
Let X = 3,7,15, 23
What is the variance and standard
deviation?
You do! (using 1 Var Stats)
During the years 1929-1939 of the Great
Depression, the weekly average hours worked
in manufacturing jobs were 45, 43, 41, 39, 39,
35, 37, 40, 39, 36, and 37. What is the variance
and standard deviation?
Miami Heat Salaries
1) Suppose that each member
receives
a $100,000 bonus. How will this
effect the
center, shape, and spread?
2) Suppose that each player
is offered 10%
increase in base salary.
What happened to
the centers and spread?
Player
Salary
Shaq
27.7
Eddie Jones
13.46
Wade
2.83
Jones
2.5
Doleac
2.4
Butler
1.2
Wright
1.15
Woods
1.13
Laettner
1.10
Smith
1.10
Anderson
.87
Dooling
.75
Wang
.75
Haslem
.62
Mourning
.33