
Statistics and Data
(Algebraic)
Sec. 9.7a
Some Definitions…
Statistic – a number associated with a data set
(when statistics are used to describe the individuals in the data set,
they are called descriptive statistics)
Parameter – a number associated with an entire population
(we gather information from samples of the
population, then use inferential statistics to
make inferences about parameters)
Some Definitions…
A 1996 study reported that 33% of adolescents say there is no
adult at home when they return from school. The report was
based on a survey of 600 randomly selected people aged 12 to
17 years old and had a margin of error of ±4%. Did the survey
measure a parameter or a statistic, and what does that “margin
of error” mean?
The survey did not measure all adolescents in the population,
so it did not measure a parameter. They sampled 600
adolescents and found a statistic…
However, note that the first sentence is making an inference
about all American adolescents…
Interpret the margin of error (33% ± 4%) as meaning “between 29% and
37% of all American adolescents would say that there is no
adult at home when they return from school.”
Some Definitions…
What is the mathematical meaning of the word “average”?
Three possible meanings, all of them measures of center.
The mean of a list of n numbers $x_1, x_2, \ldots, x_n$ is
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
The mean is also called the arithmetic mean, arithmetic average,
or average value.
EX: “The average on last week’s test was 83.4.”
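If you want to check the arithmetic by computer, here is a minimal Python sketch of the mean formula above. The function name `mean` is our own choice for illustration (Python's standard `statistics.mean` does the same job).

```python
def mean(values):
    """Arithmetic mean: add the n values, then divide by n."""
    if len(values) == 0:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


# Roger Maris's annual home run totals (used in the Guided Practice below)
maris = [14, 28, 16, 39, 61, 33, 23, 26, 8, 13, 9, 5]
print(round(mean(maris), 1))  # 275 / 12, prints 22.9
```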
The median of a list of n numbers $x_1, x_2, \ldots, x_n$
arranged in order (either ascending or descending) is
• the middle number if n is odd, and
• the mean of the two middle numbers if n is even.
EX: “The average test score puts you right in the middle
of the class.”
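A minimal Python sketch of the median rule, sorting in ascending order (the definition allows either order, and both give the same answer). The name `median` is hypothetical; `statistics.median` in the standard library behaves the same way.

```python
def median(values):
    """Middle value of the sorted list; mean of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle number
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the middle two


maris = [14, 28, 16, 39, 61, 33, 23, 26, 8, 13, 9, 5]
print(median(maris))  # 19.5
```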
The mode of a list of numbers is the number that appears most
frequently in the list.
EX: “The average American student starts college at age 18.”
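A minimal Python sketch of the mode, using `collections.Counter` to tally how often each value appears. Two choices here are our own assumptions: the function returns a list (a data set can have several values tied for most frequent), and it reports no mode when every value appears only once. The ages in the first example are invented for illustration.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, or [] if no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # every value appears once, so we say there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([18, 18, 19, 18, 20]))  # [18]  (hypothetical starting ages)
print(modes([14, 28, 16, 39, 61, 33, 23, 26, 8, 13, 9, 5]))  # []  (no mode)
```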
Note: A statistic is called resistant if it is not strongly affected
by outliers. Which of our three averages
would be considered resistant?
Guided Practice
Find the mean, median, and mode of the annual home run totals
for Roger Maris’s major league career:
14, 28,16,39,61,33, 23, 26,8,13,9,5
Mean:
$$\bar{x} = \frac{14 + 28 + \cdots + 9 + 5}{12} = \frac{275}{12} \approx 22.9$$
Is this statistic resistant?
Not really: the single large value 61 pulls the mean upward.
To find the median, first write the data set in order:
5,8,9,13,14,16, 23, 26, 28,33,39,61
Because there are 12 numbers, we average the middle two:
Median: $\frac{16 + 23}{2} = 19.5$
Is this statistic resistant?
Much more so than the mean…
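To see resistance in action, here is a small Python experiment with the standard-library `statistics` module. It takes Maris's data and replaces his 61 with a hypothetical, absurdly large 610 (an invented value, used only for illustration). The median does not move at all, while the mean roughly triples.

```python
import statistics


def center(data):
    """Return (mean, median) for a list of numbers."""
    return statistics.mean(data), statistics.median(data)


maris = [14, 28, 16, 39, 61, 33, 23, 26, 8, 13, 9, 5]
# Hypothetical: swap the largest value (61) for an extreme outlier (610).
distorted = [610 if x == 61 else x for x in maris]

for label, data in [("actual", maris), ("with 610", distorted)]:
    m, md = center(data)
    print(label, round(m, 1), md)
# actual 22.9 19.5
# with 610 68.7 19.5
```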
How about the mode?
This data set has no mode, since no value appears more than once.
So what’s the mode for Hank Aaron’s home run totals?
(see Table 9.8 on p.764)
Mode: 44
The mode is typically the least important measure
of center, but it sometimes has statistical significance…
Guided Practice
A teacher gives a 10-point quiz and records the scores in a
frequency table shown below. Find the mode, median, and
mean of the data set.
Score      10   9   8   7   6   5   4   3   2   1   0
Frequency   2   2   3   8   4   3   3   2   1   1   1
First, how many total scores are there?
Add the frequencies: there are 30 scores.
To find the mode, look for the score with the highest frequency.
Mode: 7
Since 30 is even, the median will be the mean of the 15th and 16th numbers.
Count the frequencies from left to right until we come to 15.
The 15th number is 7, and the 16th number is 6.
Median: $\frac{7 + 6}{2} = 6.5$
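The "count the frequencies until we reach position 15" step can be sketched in Python as a running total over the table. This is a minimal illustration under our own naming (`value_at` and `median_from_table` are hypothetical helpers); it keeps the table's highest-score-first order, which gives the same median as ascending order.

```python
# Quiz scores and their frequencies, in the same order as the table.
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
freqs = [2, 2, 3, 8, 4, 3, 3, 2, 1, 1, 1]


def value_at(position, scores, freqs):
    """Value in the given 1-based position when the table is written out as a list."""
    running = 0
    for score, f in zip(scores, freqs):
        running += f  # running total of frequencies so far
        if position <= running:
            return score
    raise IndexError("position is past the end of the data")


def median_from_table(scores, freqs):
    n = sum(freqs)
    if n % 2 == 1:
        return value_at((n + 1) // 2, scores, freqs)
    return (value_at(n // 2, scores, freqs) + value_at(n // 2 + 1, scores, freqs)) / 2


print(value_at(15, scores, freqs), value_at(16, scores, freqs))  # 7 6
print(median_from_table(scores, freqs))  # 6.5
```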
To find the mean, multiply each number by its frequency, add the
products, and divide the total by 30:
$$\bar{x} = \frac{10(2) + 9(2) + 8(3) + 7(8) + 6(4) + 5(3) + 4(3) + 3(2) + 2(1) + 1(1) + 0(1)}{30} = \frac{178}{30}$$
Mean: about 5.93
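The same multiply, add, and divide steps as a minimal Python sketch (the helper name `mean_from_table` is our own):

```python
scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
freqs = [2, 2, 3, 8, 4, 3, 3, 2, 1, 1, 1]


def mean_from_table(scores, freqs):
    """Multiply each score by its frequency, add the products, divide by the number of scores."""
    total = sum(s * f for s, f in zip(scores, freqs))
    return total / sum(freqs)


print(sum(s * f for s, f in zip(scores, freqs)))  # 178
print(round(mean_from_table(scores, freqs), 2))  # 5.93
```

Written out in full, this is a weighted mean where each score is weighted by its frequency, which is the idea used in the next problem.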
Guided Practice
Let’s try a problem that uses the concept of weighted mean:
At a certain school, it is a policy that the final exam must count
25% of the final grade. If Sam has an 88.5 average going into
the final exam, what is the minimum exam score needed to
earn a 90 for the semester?
Assume that an 89.5 will be rounded up to a 90 on the transcript:
$$88.5(0.75) + x(0.25) \ge 89.5 \implies x \ge 92.5$$
Sam needs to make at least a 92.5 on the final exam.
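Solving the inequality for x in general gives x = (target − (1 − w)·current) / w, where w is the weight of the final exam. A minimal Python sketch with hypothetical parameter names:

```python
def min_final_score(current_avg, final_weight, target):
    """Smallest exam score x with (1 - w) * current_avg + w * x >= target."""
    if not 0 < final_weight <= 1:
        raise ValueError("final_weight must be in (0, 1]")
    return (target - (1 - final_weight) * current_avg) / final_weight


# Sam: 88.5 average, final counts 25%, needs 89.5 to round up to a 90
print(min_final_score(88.5, 0.25, 89.5))  # 92.5
```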
The Five-Number Summary
The measures of center from last class tell part of the story, but
we also need measures of spread.
Range – the difference between the maximum and minimum
values in a data set.
Quartiles – separate a data set into fourths (just as the median
separates a data set into halves)
First Quartile (Q1) – the median of the lower half of the data
Second Quartile (Q2) – the median
Third Quartile (Q3) – the median of the upper half of the data
Interquartile Range (IQR) – measures the spread between the
first and third quartiles (comprises the middle half of the data):
IQR = Q3 – Q1
Definition: Five-Number Summary
The five-number summary of a data set is the collection:
{minimum, Q1, median, Q3, maximum}
Guided Practice
Find the five-number summary for the male and female life
expectancies in South American nations (Table 9.12 on p.768)
and compare the spreads.
Males:
{59.0, 60.5, 61.5, 66.7, 67.9, 68.5, 69.0, 70.3, 71.4, 71.9, 72.1, 72.6}
Females:
{66.2, 66.7, 67.7, 72.8, 74.3, 74.4, 74.6, 76.5, 76.6, 78.8, 79.0, 79.4}
Five-Number Summaries:
Males:
{59.0, 64.1, 68.75, 71.65, 72.6}
Range: 72.6 – 59.0 = 13.6, IQR = 71.65 – 64.1 = 7.55
Females:
{66.2, 70.25, 74.5, 77.7, 79.4}
Range: 79.4 – 66.2 = 13.2, IQR = 77.7 – 70.25 = 7.45
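A minimal Python sketch that reproduces these five-number summaries, ranges, and IQRs. It follows the slides' method (Q1 and Q3 are the medians of the lower and upper halves). For an odd number of values we assume the median itself is left out of both halves; that is our convention, and it is not exercised here because both lists have 12 values. Function names are our own.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """(minimum, Q1, median, Q3, maximum), with Q1/Q3 as medians of the lower/upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]  # for odd n this leaves out the median...
    upper = ordered[half + (n % 2):]  # ...and so does this (assumed convention)
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


males = [59.0, 60.5, 61.5, 66.7, 67.9, 68.5, 69.0, 70.3, 71.4, 71.9, 72.1, 72.6]
females = [66.2, 66.7, 67.7, 72.8, 74.3, 74.4, 74.6, 76.5, 76.6, 78.8, 79.0, 79.4]

for label, data in [("Males", males), ("Females", females)]:
    lo, q1, med, q3, hi = five_number_summary(data)
    print(label, [round(v, 2) for v in (lo, q1, med, q3, hi)],
          "range", round(hi - lo, 2), "IQR", round(q3 - q1, 2))
# Males [59.0, 64.1, 68.75, 71.65, 72.6] range 13.6 IQR 7.55
# Females [66.2, 70.25, 74.5, 77.7, 79.4] range 13.2 IQR 7.45
```

Rounding to two decimals only hides floating-point noise (for example, 64.1 may be stored as 64.10000000000001).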
Not only do the women live longer (median 74.5 versus 68.75), but there is also slightly
less variability in their life expectancies (IQR 7.45 versus 7.55; the ranges, 13.2 versus 13.6, agree).
A plausible explanation is that male life expectancy is more strongly affected by
political conditions within countries (war, crime, etc.).
The shapes of distributions
Of the two histograms shown below, which displays a data set
with more variability? Explain your answer.
[Figure: two histograms, panels (a) and (b)]
The extreme values in (a) make the range large, but the compact
distribution indicates a small IQR. The data in (b) exhibit
higher variability.
The shapes of distributions
Compare the medians and means for the data displayed in the
three histograms below.
[Figure: three histograms, panels (a), (b), (c)]
(a) Symmetric distribution: Mean = Median
(b) Skewed right distribution: Mean > Median
(c) Skewed left distribution: Mean < Median
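A small Python illustration of these three patterns, using the standard `statistics` module on invented data sets (not from the text). The mean-versus-median comparison is a rule of thumb that usually holds for clearly skewed data, not a guarantee, and the exact equality test only makes sense for tiny integer examples like these; real symmetric data gives a mean approximately equal to the median.

```python
import statistics


def relation(data):
    """Compare the mean with the median: returns '=', '>' or '<'."""
    m, md = statistics.mean(data), statistics.median(data)
    if m == md:
        return "="
    return ">" if m > md else "<"


# Hypothetical data sets, invented only to illustrate the three shapes.
shapes = {
    "symmetric": [1, 2, 3, 4, 5],
    "skewed right": [1, 1, 2, 2, 2, 3, 3, 4, 9],  # long tail toward large values
    "skewed left": [1, 6, 7, 7, 8, 8, 8, 9, 9],  # long tail toward small values
}

for name, data in shapes.items():
    print(name, "mean", relation(data), "median")
```

The long tail drags the mean toward it, while the median stays with the bulk of the data.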
Guided Practice
Determine the five-number summary, the range, and the IQR for
the annual home run production data for Mark McGwire and
Barry Bonds (Table 9.6 on p.763).
McGwire
{ 3, 9, 9, 22, 29, 32, 32, 33, 39, 39, 42, 49, 52, 58, 65, 70 }
Note: The median and quartiles each come from averaging two middle values:
Q1 = (22 + 29)/2 = 25.5, median = (33 + 39)/2 = 36, Q3 = (49 + 52)/2 = 50.5.
Five-Number Summary: { 3, 25.5, 36, 50.5, 70 }
Range: 70 – 3 = 67
IQR: 50.5 – 25.5 = 25
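No Outliers (by the 1.5 × IQR rule, an outlier would have to lie below 25.5 – 37.5 = –12 or above 50.5 + 37.5 = 88)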
Bonds
{ 16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73 }
Note: The median and quartiles each come from averaging two middle values:
Q1 = (25 + 25)/2 = 25, median = (34 + 34)/2 = 34, Q3 = (40 + 42)/2 = 41.
Five-Number Summary: { 16, 25, 34, 41, 73 }
Range: 73 – 16 = 57
IQR: 41 – 25 = 16
Outlier: 73 (by the 1.5 × IQR rule, it lies above 41 + 1.5(16) = 65)
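The slides do not state which outlier rule they use. The common 1.5 × IQR rule reproduces both verdicts, so this minimal Python sketch assumes it: a value is flagged if it lies below Q1 − 1.5·IQR or above Q3 + 1.5·IQR. The quartiles are copied from the summaries above, and the helper names are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return (lower, upper) fences using the 1.5 * IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values, q1, q3):
    """Values strictly outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return [x for x in values if x < lower or x > upper]


mcgwire = [3, 9, 9, 22, 29, 32, 32, 33, 39, 39, 42, 49, 52, 58, 65, 70]
bonds = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]

print(outlier_fences(25.5, 50.5), outliers(mcgwire, 25.5, 50.5))  # (-12.0, 88.0) []
print(outlier_fences(25, 41), outliers(bonds, 25, 41))  # (1.0, 65.0) [73]
```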