chap_03_PPTx - OCVTS MATES-STAT

Averages and
Variation
3
Larson/Farber 4th ed.
Copyright © Cengage Learning. All rights reserved.
Section
3.1
Measures of Central
Tendency: Mode,
Median, and Mean
Focus Points
• Compute mean, median, and mode from raw data
• Interpret what mean, median, and mode tell you
• Explain how mean, median, and mode can be affected by extreme data values
• What is a trimmed mean?
• Compute a weighted average
Measures of Central Tendency
Measure of central tendency
A value that represents a typical, or central, entry of a
data set.
Most common measures of central tendency:
• Mean
• Median
• Mode
What Do These Statements Have in
Common?
• The average price of an ounce of gold is $1200.
• The Zippy car averages 39 miles per gallon on the
highway.
• A survey showed the average shoe size for women is
size 9.
Measures of Central Tendency: Mode,
Median, and Mean
In each of the preceding statements, one number is
used to describe the entire sample or population.
Such a number is called an average.
There are many ways to compute averages, but we
will study only three of the major ones. The easiest
average to compute is the mode.
Exercise 1 – Mode
Count the letters in each word of this sentence and give
the mode.
The numbers of letters in the words of the sentence are:
5 3 7 2 4 4 2 4 8 3 4 3 4
Exercise 1 – Mode
Unsorted: 5 3 7 2 4 4 2 4 8 3 4 3 4
Sorted:   2 2 3 3 3 4 4 4 4 4 5 7 8
Scanning the data, we see that 4 is the mode because
more words have 4 letters than any other number.
For larger data sets, it is useful to order—or sort—the
data before scanning them for the mode.
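The scan for the mode can also be done by counting. The following is a minimal Python sketch, not part of the original slides; the helper name `modes` is illustrative, it assumes a non-empty list, and it treats a data set in which every value occurs equally often as having no mode, which matches the convention used in these slides.

```python
from collections import Counter

def modes(data):
    """Return all values tied for the highest frequency ([] if there is no mode)."""
    counts = Counter(data)
    top = max(counts.values())
    # Assumption: if several distinct values all occur equally often, report no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

# Letter counts from Exercise 1
letters = [5, 3, 7, 2, 4, 4, 2, 4, 8, 3, 4, 3, 4]
print(modes(letters))                # [4]
print(modes([1, 1, 1, 2, 2, 2, 3]))  # [1, 2]  (two values tie, so both are modes)
```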
Is There Always a Mode?
Professor Fair gives equal numbers of A’s, B’s,
C’s, D’s, and F’s
No. There is no mode in this case.
Measures of Central Tendency: Mode
When no number occurs more than once in a data set,
there is no mode.
If two values tie for the greatest number of occurrences, we say the set is
bimodal.
Example (bimodal): 1, 1, 1, 2, 2, 2, 3
• 1 and 2 both have the most occurrences, so they are
both modes
  - Since they tie for first place, they both get it
• 3 is not a mode because it occurs less frequently
Measures of Central Tendency: Mode, Median, and
Mean
Another average that is useful is the median, or central
value, of an ordered distribution. The median is the
middle value.
When you are given the median, you know there are an
equal number of data values in the ordered
distribution that are above it and below it.
Measure of Central Tendency: Median
Median
The value that lies in the middle of the data when the
data set is ordered.
Measures the center of an ordered data set by dividing it
into two equal parts.
If the data set has an
• odd number of entries: median is the middle data
entry.
• even number of entries: median is the mean of
the two middle data entries.
Measures of Central Tendency: Mode, Median, and
Mean
Procedure:
Exercise 2 – Median
Case: n is even
What do barbecue-flavored potato chips cost?
According to Consumer Reports, Vol. 66, No. 5, the
prices per ounce in cents of the rated chips are
19 19 27 28 18 35
(a) To find the median, we first order the data, and then
note that there are an even number of entries.
So the median is constructed using the two middle
values.
Exercise 2 – Median
(Sorted) prices per ounce in cents of the rated chips are
18 19 19 27 28 35
The two middle values are 19 and 27, so the median is (19 + 27)/2 = 23 cents.
(b) According to Consumer Reports, the brand with the
lowest overall taste rating costs 35 cents per ounce.
Eliminate that brand, and find the median price per
ounce for the remaining barbecue-flavored chips.
Exercise 2 – Median
Case: n is odd
Again order the data.
The median is simply the middle value.
18 19 19 27 28
middle value
Median = middle value = 19 cents
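Both cases of Exercise 2 follow the same rule: sort, then take the middle entry (n odd) or the mean of the two middle entries (n even). A small Python sketch of that rule is below; the function name `median` is just an illustrative choice, and a non-empty list is assumed.

```python
def median(data):
    """Middle value of the sorted data; mean of the two middle values when n is even."""
    ordered = sorted(data)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Prices per ounce (cents) from Exercise 2
prices = [19, 19, 27, 28, 18, 35]
print(median(prices))  # 23.0  (n = 6, even: mean of 19 and 27)

# Part (b): drop the 35-cent brand, leaving n = 5 (odd)
remaining = [p for p in prices if p != 35]
print(median(remaining))  # 19
```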
Measures of Central Tendency: Mode, Median, and
Mean
The median uses the position rather than the specific
value of each data entry. If the extreme values of a
data set change, the median usually does not change.
This is why the median is often used as the average for
house prices.
If one mansion costing several million dollars sells in a
community of much-lower-priced homes, the median
selling price for houses in the community would be
affected very little, if at all.
Measures of Central Tendency: Mode, Median, and
Mean
Note:
For small ordered data sets, we can easily scan the set to
find the location of the median.
However, for large ordered data sets of size n, it is
convenient to have a formula to find the middle of the
data set.
Measures of Central Tendency: Mode, Median, and
Mean
For instance, if n = 99 then the middle value is the
(99 +1)/2 or 50th data value in the ordered data.
If n = 100, then (100 + 1)/2 = 50.5 tells us that the two
middle values are in the 50th and 51st positions.
An average that uses the exact value of each entry is the
mean (sometimes called the arithmetic mean).
Measure of Central Tendency: Mean
Mean (average)
The sum of all the data entries divided by the number of
entries.
Sigma notation: Σx = add all of the data entries (x) in
the data set.
Population mean: \( \mu = \frac{\sum x}{N} \)
Sample mean: \( \bar{x} = \frac{\sum x}{n} \)
Exercise 3 – Population Mean
To graduate, Linda needs at least a B in biology. She did
not do very well on her first three tests; however, she
did well on the last four. Here are all her scores:
58
67
60
84
93
98
100
Compute the mean and determine if Linda’s grade will
be a B (80 to 89 average) or a C (70 to 79 average).
Exercise 3 – Solution
\( \mu = \frac{\sum x}{N} = \frac{58 + 67 + 60 + 84 + 93 + 98 + 100}{7} = \frac{560}{7} = 80 \)
Since the average is 80, Linda will get the needed B.
Exercise 4: Finding a Sample Mean
The prices (in dollars) for a sample of roundtrip flights
from Chicago, Illinois to Cancun, Mexico are listed.
What is the mean price of the flights?
872 432 397 427 388 782 397
Solution: Finding a Sample Mean
872 432 397 427 388 782 397
The sum of the flight prices is
Σx = 872 + 432 + 397 + 427 + 388 + 782 + 397 = 3695
To find the mean price, divide the sum of the prices by
the number of prices in the sample
\( \bar{x} = \frac{\sum x}{n} = \frac{3695}{7} \approx 527.9 \)
The mean price of the flights is about $527.90.
Exercise 5: Compare Measures of Center
The credit loads of 40 randomly selected students
from a college are shown below. Find the mean,
median, and mode:
17 12 14 17 13 16 18 20 13 12
12 17 16 15 14 12 12 13 17 14
15 12 17 16 12 18 20 19 12 15
18 14 16 17 15 19 12 13 13 15
Exercise 5: Compare Measures of Center
Solution:
Mean    Median    Mode
15.0    15        12
If the state is going to fund the college according to the
“average” credit load, which “average” do you think
the college will report? Why?
What is a Resistant Measure?
Definition: A resistant measure is one that is not
influenced by extremely high or low data values.
• The mean is not a resistant measure of center
• We can make the mean as large as we want by
changing the size of only one data value
Can You Explain These Calculations?
Properties: January 2014     Price (Thousands of Dollars)
1 Main Street                $250
13 Elm Street                $255
17 Coconut Drive             $236
4 Shady Grove Lane           $277
Median Home Price            $253
Mean Home Price              $255

Properties: January 2015     Price (Thousands of Dollars)
1 Main Street                $250
13 Elm Street                $255
17 Coconut Drive             $236
4 Shady Grove Lane           $277
254 Winner's Circle          $1,437
Median Home Price            $255
Mean Home Price              $491
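The two tables can be checked directly. The sketch below uses Python's standard `statistics` module; the variable names are illustrative. It suggests the slide's 2014 figures ($253 and $255) are the unrounded 252.5 and 254.5 rounded to whole thousands, and it shows how a single sale moves the mean far more than the median.

```python
from statistics import mean, median

# Prices in thousands of dollars, taken from the two tables
prices_2014 = [250, 255, 236, 277]
prices_2015 = prices_2014 + [1437]  # 254 Winner's Circle added

mean_2014, median_2014 = mean(prices_2014), median(prices_2014)
mean_2015, median_2015 = mean(prices_2015), median(prices_2015)

# The one extreme sale shifts the mean by about 236.5 but the median by only 2.5.
print("mean shift:  ", mean_2015 - mean_2014)
print("median shift:", median_2015 - median_2014)
```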
Measures of Central Tendency: Mode, Median, and
Mean
Measure   Resistant to Outliers?   Comments
Mean      No                       Can be influenced by a single outlier
Median    Yes                      Median home price is often quoted by realtors
Mode      Yes                      Mode is determined by a single point on a scale
Measures of Central Tendency: Mode, Median, and
Mean
A measure of center that is more resistant than the mean
but still sensitive to specific data values is the
trimmed mean.
A trimmed mean is the mean of the data values left after
“trimming” a specified percentage of the smallest and
largest data values from the data set.
14 20 20 20 20 23 25 30 30 30 35 35 35 40 40 42 50 50 80 80
20 20 20 20 23 25 30 30 30 35 35 35 40 40 42 50 50 80
Measures of Central Tendency: Mode, Median, and
Mean
Usually a 5% trimmed mean is used. This implies that
we trim the lowest 5% of the data as well as the
highest 5% of the data. A similar procedure is used
for a 10% trimmed mean.
Procedure:
Exercise 6: Find measures of central tendency
The class sizes of 20 randomly chosen Introductory
Algebra classes in California are shown.
14 20 20 20 20 23 25 30 30 30
35 35 35 40 40 42 50 50 80 80
a) Compute the mean
b) Compute a 5% trimmed mean
Exercise 6 Solution: Find Trimmed Mean
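The worked solution did not survive on this slide, so here is a hedged Python sketch of both parts. The helper `trimmed_mean` is illustrative; it assumes the number of values trimmed from each end is the trim percentage of n rounded to a whole number, which for n = 20 and 5% means one value from each end.

```python
def trimmed_mean(data, percent):
    """Mean after dropping `percent` % of the values from each end of the sorted data."""
    ordered = sorted(data)
    # Assumption: round percent * n / 100 to decide how many values to drop per end.
    k = round(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

# Class sizes from Exercise 6
sizes = [14, 20, 20, 20, 20, 23, 25, 30, 30, 30,
         35, 35, 35, 40, 40, 42, 50, 50, 80, 80]

plain_mean = sum(sizes) / len(sizes)
print(plain_mean)                          # 35.95
print(round(trimmed_mean(sizes, 5), 2))    # 34.72 (drops the 14 and one 80)
```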
Measures of Central Tendency: Mode, Median, and
Mean
Distribution                Comments
Symmetric                   All measures of center are the same.
Left Skewed, Right Skewed   Mean (balance point) moves in the direction of skew. Median divides total bar areas in half.
                            For skewed distributions, the median is the best measure of center.
All                         The mode corresponds to the tallest bar.
Example: Comparing the Mean, Median, and
Mode
Find the mean, median, and mode of the sample ages of
a class shown. Which measure of central tendency best
describes a typical entry of this data set? Are there any
outliers?
Ages in a class
20 20 20 20 20 20 21 21 21 21
22 22 22 23 23 23 23 24 24 65
Example: Comparing the Mean, Median, and
Mode
Ages in a class
20 20 20 20 20 20 21 21 21 21
22 22 22 23 23 23 23 24 24 65
Mean: \( \bar{x} = \frac{\sum x}{n} = \frac{20 + 20 + \dots + 24 + 65}{20} \approx 23.8 \) years
Median: \( \frac{21 + 22}{2} = 21.5 \) years
Mode: 20 years (the entry occurring with the greatest frequency)
What is the Shape of the Distribution?
Mean: 23.8 years
Median: 21.5 years
Mode: 20 years
Since mean > median > mode, the distribution is
skewed to the right.
Example: Comparing the Mean, Median, and
Mode
Mean ≈ 23.8 years
Median = 21.5 years
Mode = 20 years
• The mean takes every entry into account, but is influenced
by the outlier of 65.
• The median calculation is based on position, and it is not
affected by the outlier.
• In this case the mode exists, but it doesn't appear to represent
a typical entry.
Describing the Center: Mean or Median?
Because the median is based only on the position of values in an
ordered data set, it is resistant to values that are extraordinarily
large or small.
Distribution Shape   Outliers?   Measure
Symmetric            NO          Mean
Skewed               YES         Median
Comparing the Mean, Median, and Mode
All three measures describe a typical entry of a data set.
Advantage of using the mean:
• The mean is a reliable measure because it takes
into account every entry of a data set.
Disadvantage of using the mean:
• The mean is greatly affected by outliers (an outlier is a data entry
that is far removed from the other entries in the data set).
Weighted Average
Sometimes we wish to average numbers, but we want to
assign more importance, or weight, to some of the
numbers.
Category     Weight
Class Work   20%
Homework     20%
Test         30%
Quiz         30%
Weighted Average
The average you need is the weighted average.
Exercise 7: Find Weighted Average
You are taking a class in which your grade is
determined from five sources: 50% from your test
mean, 15% from your midterm, 20% from your final
exam, 10% from your computer lab work, and 5%
from your homework.
Your scores are 86 (test mean), 96 (midterm), 82 (final
exam), 98 (computer lab), and 100 (homework).
What is the weighted mean of your scores? If the
minimum average for an A is 90, did you get an A?
Ex 7 Solution: Finding a Weighted Mean
Source         Score, x   Weight, w   x∙w
Test Mean      86         0.50        86(0.50) = 43.0
Midterm        96         0.15        96(0.15) = 14.4
Final Exam     82         0.20        82(0.20) = 16.4
Computer Lab   98         0.10        98(0.10) = 9.8
Homework       100        0.05        100(0.05) = 5.0
                          Σw = 1      Σ(x∙w) = 88.6
\( \bar{x} = \frac{\sum (x \cdot w)}{\sum w} = \frac{88.6}{1} = 88.6 \)
Your weighted mean for the course is 88.6. You
did not get an A.
Example: Weighted Mean
The data below represents customer satisfaction ratings
from 4 different restaurants in a chain. Find the average
customer satisfaction rating.
Location   Average Rating, x   Number of Customers, w   Product, xw
1          7.8                 117                      912.6
2          8.5                 86                       731
3          6.6                 68                       448.8
4          7.4                 90                       666
TOTAL                          361                      2758.4
Example: Weighted Mean
\( \text{weighted avg} = \frac{\sum xw}{\sum w} = \frac{2758.4}{361} \approx 7.64 \)
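Both weighted-average examples use the same computation, Σxw divided by Σw. A short Python sketch is given below; the function name `weighted_mean` is illustrative, and it assumes the weights do not sum to zero. Note that the weights need not sum to 1: the restaurant example weights each rating by a customer count.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# Exercise 7: course grade, weights given as proportions
scores = [86, 96, 82, 98, 100]
grade_weights = [0.50, 0.15, 0.20, 0.10, 0.05]
course_grade = weighted_mean(scores, grade_weights)
print(round(course_grade, 1))  # 88.6, below the 90 needed for an A

# Restaurant example: ratings weighted by number of customers
ratings = [7.8, 8.5, 6.6, 7.4]
customers = [117, 86, 68, 90]
chain_rating = weighted_mean(ratings, customers)
print(round(chain_rating, 2))  # 7.64
```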
Mean of Grouped Data
Mean of a Frequency Distribution
Approximated by
\( \bar{x} = \frac{\sum (x \cdot f)}{n}, \qquad n = \sum f \)
where x and f are the midpoints and frequencies of a
class, respectively
Finding the Mean of a Frequency Distribution
In Words                                              In Symbols
1. Find the midpoint of each class.                   \( x = \frac{(\text{lower limit}) + (\text{upper limit})}{2} \)
2. Find the sum of the products of the midpoints
   and the frequencies.                               \( \sum (x \cdot f) \)
3. Find the sum of the frequencies.                   \( n = \sum f \)
4. Find the mean of the frequency distribution.       \( \bar{x} = \frac{\sum (x \cdot f)}{n} \)
Exercise 8: Find the Mean of a Frequency
Distribution
Use the frequency distribution to approximate the mean
number of minutes that a sample of Internet subscribers
spent online during their most recent session.
Class     Midpoint   Frequency, f
7 – 18    12.5       6
19 – 30   24.5       10
31 – 42   36.5       13
43 – 54   48.5       8
55 – 66   60.5       5
67 – 78   72.5       6
79 – 90   84.5       2
Solution: Find the Mean of a Frequency
Distribution
Class     Midpoint, x   Frequency, f   (x∙f)
7 – 18    12.5          6              12.5∙6 = 75.0
19 – 30   24.5          10             24.5∙10 = 245.0
31 – 42   36.5          13             36.5∙13 = 474.5
43 – 54   48.5          8              48.5∙8 = 388.0
55 – 66   60.5          5              60.5∙5 = 302.5
67 – 78   72.5          6              72.5∙6 = 435.0
79 – 90   84.5          2              84.5∙2 = 169.0
                        n = 50         Σ(x∙f) = 2089.0
\( \bar{x} = \frac{\sum (x \cdot f)}{n} = \frac{2089}{50} \approx 41.8 \) minutes
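The four steps for the mean of a frequency distribution can be sketched in Python as follows. The tuple layout `(lower limit, upper limit, frequency)` and the variable names are illustrative choices, not anything prescribed by the slides.

```python
# (lower limit, upper limit, frequency) for each class in Exercise 8
classes = [(7, 18, 6), (19, 30, 10), (31, 42, 13), (43, 54, 8),
           (55, 66, 5), (67, 78, 6), (79, 90, 2)]

# Step 1: midpoint of each class
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

# Step 2: sum of midpoint * frequency; Step 3: sum of frequencies
sum_xf = sum(x * f for x, f in zip(midpoints, freqs))
n = sum(freqs)

# Step 4: approximate mean of the grouped data
grouped_mean = sum_xf / n
print(n, sum_xf, round(grouped_mean, 1))  # 50 2089.0 41.8
```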
Summary: Section 3.1
• Computed mean, median, and mode from raw data
• Interpreted mean, median, and mode
• Explained how mean, median, and mode can be
affected by extreme data values
• Computed
  - Trimmed mean
  - Weighted average
  - Mean of frequency distribution
Section 3.2
Measures of Variation
Objectives
Determine the range of a data set
Determine the variance and standard deviation of a
population and of a sample
Use Chebyshev’s Theorem to interpret standard
deviation
Approximate the sample standard deviation for grouped
data
Range
Range
The difference between the maximum and minimum
data entries in the set.
The data must be quantitative.
Range = (Max. data entry) – (Min. data entry)
Example: Finding the Range
A corporation hired 10 graduates. The starting salaries
for each graduate are shown. Find the range of the
starting salaries.
Starting salaries (1000s of dollars)
41 38 39 45 47 41 44 41 37 42
Solution: Finding the Range
Ordering the data helps to find the least and greatest
salaries.
37 38 39 41 41 41 42 44 45 47
minimum
maximum
Range = (Max. salary) – (Min. salary)
= 47 – 37 = 10
The range of starting salaries is 10 or $10,000.
Deviation, Variance, and Standard
Deviation
Deviation
The difference between the data entry, x, and the mean
of the data set.
Population data set:
• Deviation of x = x – μ
Sample data set:
• Deviation of x = x – x̄
Exercise 1: Finding the Deviations
A corporation hired 10 graduates. The starting salaries
for each graduate are shown. Find the deviations of the
starting salaries.
Starting salaries (1000s of dollars)
41 38 39 45 47 41 44 41 37 42
Solution:
• First determine the mean starting salary.
\( \mu = \frac{\sum x}{N} = \frac{415}{10} = 41.5 \)
Solution: Finding the Deviations
• Determine the deviation for each data entry.
Salary ($1000s), x   Deviation: x – μ
41                   41 – 41.5 = –0.5
38                   38 – 41.5 = –3.5
39                   39 – 41.5 = –2.5
45                   45 – 41.5 = 3.5
47                   47 – 41.5 = 5.5
41                   41 – 41.5 = –0.5
44                   44 – 41.5 = 2.5
41                   41 – 41.5 = –0.5
37                   37 – 41.5 = –4.5
42                   42 – 41.5 = 0.5
Σx = 415             Σ(x – μ) = 0
Solution: Finding the Squares of
Deviations
Salary ($1000s), x   Deviation, (x – 41.5)   Square of Deviation, (x – 41.5)²
41                   -0.5                    0.25
38                   -3.5                    12.25
39                   -2.5                    6.25
45                   3.5                     12.25
47                   5.5                     30.25
41                   -0.5                    0.25
44                   2.5                     6.25
41                   -0.5                    0.25
37                   -4.5                    20.25
42                   0.5                     0.25
TOTAL                0                       SSx = 88.5
Deviation, Variance, and Standard
Deviation
• The variance can be thought of as a kind of average of
the squares of the deviations
• Standard deviation is a measure of the typical amount
an entry deviates from the mean
• The more the entries are spread out, the greater the
standard deviation
Population Standard Deviation: Example 1
Test Grades, Example 1 (N = 10, μ = 80)
Score, x   Deviation, (x – 80)   Square of Deviation, (x – 80)²
80         0                     0
80         0                     0
80         0                     0
80         0                     0
80         0                     0
80         0                     0
80         0                     0
80         0                     0
80         0                     0
80         0                     0
TOTAL      0                     0
Summary Parameters
Mean                 80
Standard Deviation   0
Variance             0
Count                10
Population Standard Deviation: Example 2
Test Grades, Example 2 (N = 10, μ = 80)
Score, x   Deviation, (x – 80)   Square of Deviation, (x – 80)²
90         10                    100
90         10                    100
90         10                    100
90         10                    100
90         10                    100
70         -10                   100
70         -10                   100
70         -10                   100
70         -10                   100
70         -10                   100
TOTAL      0                     1000
Parameter Summary
Mean                 80
Standard Deviation   10
Variance             100
Count                10
Deviation, Variance, and Standard
Deviation
Population Variance
\( \sigma^2 = \frac{\sum (x - \mu)^2}{N} \), where the numerator \( \sum (x - \mu)^2 \) is the sum of squares, SSx
Population Standard Deviation
\( \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}} \)
Finding the Population Variance &
Standard Deviation
In Words                                        In Symbols
1. Find the mean of the population data set.    \( \mu = \frac{\sum x}{N} \)
2. Find deviation of each entry.                x – μ
3. Square each deviation.                       (x – μ)²
4. Add to get the sum of squares.               SSx = Σ(x – μ)²
Finding the Population Variance &
Standard Deviation
In Words                                          In Symbols
5. Divide by N to get the population variance.    \( \sigma^2 = \frac{\sum (x - \mu)^2}{N} \)
6. Find the square root to get the population
   standard deviation.                            \( \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \)
Exercise 2: Finding the Population
Standard Deviation
A corporation hired 10 graduates. The starting salaries
for each graduate are shown. Find the population
variance and standard deviation of the starting salaries.
Starting salaries (1000s of dollars)
41 38 39 45 47 41 44 41 37 42
Recall μ = 41.5
Solution: Finding the Population
Standard Deviation
• Determine SSx
• N = 10
Salary, x   Deviation: x – μ    Squares: (x – μ)²
41          41 – 41.5 = –0.5    (–0.5)² = 0.25
38          38 – 41.5 = –3.5    (–3.5)² = 12.25
39          39 – 41.5 = –2.5    (–2.5)² = 6.25
45          45 – 41.5 = 3.5     (3.5)² = 12.25
47          47 – 41.5 = 5.5     (5.5)² = 30.25
41          41 – 41.5 = –0.5    (–0.5)² = 0.25
44          44 – 41.5 = 2.5     (2.5)² = 6.25
41          41 – 41.5 = –0.5    (–0.5)² = 0.25
37          37 – 41.5 = –4.5    (–4.5)² = 20.25
42          42 – 41.5 = 0.5     (0.5)² = 0.25
            Σ(x – μ) = 0        SSx = 88.5
Solution: Finding the Population
Standard Deviation
\( \sigma^2 = \frac{\sum (x - \mu)^2}{N} = \frac{88.5}{10} = 8.85 \approx 8.9 \)
\( \sigma = \sqrt{\sigma^2} = \sqrt{8.85} \approx 3.0 \)
The population standard deviation is about 3.0, or $3000.
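The six steps for the population variance and standard deviation translate almost line for line into code. The following Python sketch uses illustrative variable names and treats the ten salaries as the whole population, so it divides by N.

```python
from math import sqrt

# Starting salaries in $1000s, treated as a population
salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]
N = len(salaries)

mu = sum(salaries) / N                        # step 1: population mean
deviations = [x - mu for x in salaries]       # step 2: deviation of each entry
ss_x = sum(d ** 2 for d in deviations)        # steps 3-4: sum of squared deviations
variance = ss_x / N                           # step 5: divide by N
sigma = sqrt(variance)                        # step 6: square root

print(mu, ss_x, variance, round(sigma, 1))    # 41.5 88.5 8.85 3.0
```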
Geometric View: Sample Variance and
Standard Deviation
[Figure: a square with side length s and area s²]
• Sample Variance is represented by the area of
the square
• Sample Standard Deviation is represented by
the side length of the square
Deviation, Variance, and Standard
Deviation
Sample Variance
\( s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \)
Sample Standard Deviation
\( s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \)
Finding the Sample Variance & Standard
Deviation
In Words                                    In Symbols
1. Find the mean of the sample data set.    \( \bar{x} = \frac{\sum x}{n} \)
2. Find deviation of each entry.            \( x - \bar{x} \)
3. Square each deviation.                   \( (x - \bar{x})^2 \)
4. Add to get the sum of squares.           \( SS_x = \sum (x - \bar{x})^2 \)
Finding the Sample Variance & Standard
Deviation
In Words                                           In Symbols
5. Divide by n – 1 to get the sample variance.     \( s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \)
6. Find the square root to get the sample
   standard deviation.                             \( s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \)
Exercise 3 : Finding the Sample Standard
Deviation
The starting salaries are for the Chicago branches of a
corporation. The corporation has several other branches,
and you plan to use the starting salaries of the Chicago
branches to estimate the starting salaries for the larger
population. Find the sample standard deviation of the
starting salaries.
Starting salaries (1000s of dollars)
41 38 39 45 47 41 44 41 37 42
Solution: Finding the Sample Standard
Deviation
• Determine SSx
• n = 10
Salary, x   Deviation: x – x̄    Squares: (x – x̄)²
41          41 – 41.5 = –0.5    (–0.5)² = 0.25
38          38 – 41.5 = –3.5    (–3.5)² = 12.25
39          39 – 41.5 = –2.5    (–2.5)² = 6.25
45          45 – 41.5 = 3.5     (3.5)² = 12.25
47          47 – 41.5 = 5.5     (5.5)² = 30.25
41          41 – 41.5 = –0.5    (–0.5)² = 0.25
44          44 – 41.5 = 2.5     (2.5)² = 6.25
41          41 – 41.5 = –0.5    (–0.5)² = 0.25
37          37 – 41.5 = –4.5    (–4.5)² = 20.25
42          42 – 41.5 = 0.5     (0.5)² = 0.25
            Σ(x – x̄) = 0        SSx = 88.5
Solution: Finding the Sample Standard
Deviation
Sample Variance
\( s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{88.5}{10 - 1} \approx 9.8 \)
The sample variance is about 9.8 (in squared units of K$).
Sample Standard Deviation
\( s = \sqrt{s^2} = \sqrt{\frac{88.5}{9}} \approx 3.1 \)
The sample standard deviation is about 3.1 K$, or $3100.
Sample Variance: Computational Formula
Exercise 3: Variance Computational
Formula
L1     L2: x   x²
1      41      1681
2      38      1444
3      39      1521
4      45      2025
5      47      2209
6      41      1681
7      44      1936
8      41      1681
9      37      1369
10     42      1764
SUM    415     17311
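The computational formula itself did not survive on the slide. The Python sketch below assumes the standard shortcut, SSx = Σx² − (Σx)²/n, and uses the column totals from the table; it reproduces the same SSx = 88.5 obtained earlier from the defining formula.

```python
from math import sqrt

salaries = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]
n = len(salaries)

sum_x = sum(salaries)                    # column total of x
sum_x2 = sum(x * x for x in salaries)    # column total of x squared

# Assumed shortcut: SSx = sum(x^2) - (sum(x))^2 / n
ss_x = sum_x2 - sum_x ** 2 / n
s2 = ss_x / (n - 1)                      # sample variance
s = sqrt(s2)                             # sample standard deviation

print(sum_x, sum_x2, ss_x)               # 415 17311 88.5
print(round(s2, 1), round(s, 1))         # 9.8 3.1
```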
Example: Using Technology to Find the
Standard Deviation
Sample office rental rates (in
dollars per square foot per year)
for Miami’s central business
district are shown in the table.
Use a calculator or a computer
to find the mean rental rate and
the sample standard deviation.
(Adapted from: Cushman &
Wakefield Inc.)
Office Rental Rates
35.00 33.50 37.00 23.75 26.50 31.25
36.50 40.00 32.00 39.25 37.50 34.75
37.75 37.25 36.75 27.00 35.75 26.00
37.00 29.00 40.50 24.50 33.00 38.00
Solution: Using Technology to Find the
Standard Deviation
Sample Mean
Sample Standard
Deviation
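The numeric results are missing from this slide. As one possible "technology" route, the sketch below uses Python's standard `statistics` module, where `stdev` divides by n – 1 and is therefore the sample standard deviation. The variable names are illustrative.

```python
from statistics import mean, stdev

# Office rental rates (dollars per square foot per year), as listed in the table
rates = [35.00, 33.50, 37.00, 23.75, 26.50, 31.25,
         36.50, 40.00, 32.00, 39.25, 37.50, 34.75,
         37.75, 37.25, 36.75, 27.00, 35.75, 26.00,
         37.00, 29.00, 40.50, 24.50, 33.00, 38.00]

x_bar = mean(rates)   # sample mean
s = stdev(rates)      # sample standard deviation (n - 1 in the denominator)

print(f"mean = {x_bar:.2f}, s = {s:.2f}")
```

Running it gives a mean of roughly 33.7 and a sample standard deviation of roughly 5.1 dollars per square foot.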
Interpreting Standard Deviation
• Standard deviation is a measure of the typical
amount an entry deviates from the mean
• The more the entries are spread out, the greater the
standard deviation
Coefficient of Variation
Notice that the numerator and denominator in the
definition of CV have the same units, so CV itself has no
units of measurement.
Interpreting Coefficient of Variation
• The coefficient of variation measures the relative
variation for distributions with different means
• In finance, CV measures the relative risk of a stock
portfolio (i.e. amount of variation relative to the
price of the stock)
Example: Coefficient of Variation
Coefficient of Variation
This gives us the advantage of being able to directly
compare the variability of two different populations
using the coefficient of variation.
In the next example, we will compute the CV of a
population and of a sample and then compare the
results.
Exercise 4 – Coefficient of Variation
The Trading Post on Grand Mesa is a small, family-run
store in a remote part of Colorado. It has just eight
different types of spinners for sale. The prices (in
dollars) are
2.10 1.95 2.60 2.00 1.85 2.25 2.15 2.25
Since the Trading Post has only eight different kinds of
spinners for sale, we consider the eight data values to
be the population (𝑁 = 8)
Exercise 4a – Coefficient of Variation
(a) Use a calculator with appropriate statistics keys to
verify that, for the Trading Post data, μ ≈ $2.14 and
σ ≈ $0.22.
Solution:
Since the computation formulas for x̄ and μ are
identical, most calculators provide the value of x̄ only.
Use the output of this key for μ. The computation
formulas for the sample standard deviation s and the
population standard deviation σ are slightly different.
Exercise 4b – Coefficient of Variation
(b) Compute the CV of prices for the Trading Post and
comment on the meaning of the result.
Solution:
Since the Trading Post is very small, it carries a small
selection of spinners that are all priced similarly.
The CV tells us that the standard deviation of the
spinner prices is only 10.28% of the mean.
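The CV formula and its arithmetic are not shown on the slide. Assuming the usual definition, CV = (σ / μ) · 100% for a population, the Python sketch below reproduces the 10.28% figure from the rounded values μ ≈ 2.14 and σ ≈ 0.22. With unrounded values the CV comes out slightly lower, at about 10.1%.

```python
from statistics import mean, pstdev

# Spinner prices (dollars), treated as a population of N = 8
prices = [2.10, 1.95, 2.60, 2.00, 1.85, 2.25, 2.15, 2.25]

mu = mean(prices)       # population mean
sigma = pstdev(prices)  # population standard deviation (divides by N)

# Assumed definition: CV = sigma / mu * 100 (a unit-free percentage)
cv_exact = sigma / mu * 100
cv_rounded = round(sigma, 2) / round(mu, 2) * 100  # what the slide appears to use

print(round(mu, 2), round(sigma, 2))  # 2.14 0.22
print(round(cv_rounded, 2))           # 10.28
```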
Pafnuty Chebyshev
• Born in Borovsk Russia in
1821
• Proved the Law of Large
Numbers
• Discovered Chebyshev’s
Theorem
Chebyshev’s Theorem
The portion of any data set lying within k standard
deviations (k > 1) of the mean is at least:
\( 1 - \frac{1}{k^2} \)
k                         2     3       4       5     10
\( 1 - \frac{1}{k^2} \)   75%   88.9%   93.8%   96%   99%
Chebyshev’s Theorem
k                         2     3       4       5     10
\( 1 - \frac{1}{k^2} \)   75%   88.9%   93.8%   96%   99%
At least 75% of observations in any data set must lie
within 2 standard deviations of the mean
At least 88.9% of observations in any data set must
lie within 3 standard deviations of the mean
At least 93.8% of observations in any data set must
lie within 4 standard deviations of the mean
Example: Apartment Rental Costs
Suppose that a random sample of rural apartment rentals
shows \( \bar{x} = 490.80 \) and s = 54.74. Compute a
Chebyshev interval for k = 1.5 and interpret the
results.
Solution: At least \( 1 - \frac{1}{1.5^2} \approx 0.56 = 56\% \) of the rental
prices lie between \( \bar{x} - ks \) and \( \bar{x} + ks \)
• 490.80 − 1.5(54.74) ≈ 409.00 dollars
• 490.80 + 1.5(54.74) ≈ 573.00 dollars
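The bound and the interval can be computed with two small helpers. This is a Python sketch with illustrative function names; it rejects k ≤ 1 because the theorem as stated above requires k > 1.

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / k ** 2

def chebyshev_interval(center, spread, k):
    """Interval from center - k*spread to center + k*spread."""
    return center - k * spread, center + k * spread

# Apartment rental example: x-bar = 490.80, s = 54.74, k = 1.5
bound = chebyshev_bound(1.5)
low, high = chebyshev_interval(490.80, 54.74, 1.5)
print(round(bound, 2))                # 0.56
print(round(low, 2), round(high, 2))  # 408.69 572.91
```

The slide rounds the endpoints to the nearest dollar, giving 409 and 573.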
Example: Using Chebyshev’s Theorem
The age distribution for Florida is shown in the
histogram. Apply Chebyshev’s Theorem to the data
using k = 2. What can you conclude regarding the age of
Floridians?
Solution: Using Chebyshev’s Theorem
k = 2: μ – 2σ = 39.2 – 2(24.8) = -10.4 (use 0 since age
can’t be negative)
μ + 2σ = 39.2 + 2(24.8) = 88.8
At least 75% of the population of Florida is between 0
and 88.8 years old.
Exercise 5: Using Chebyshev’s Theorem
A newspaper periodically runs an ad in its own advertising
section offering a free month’s subscription. Over a
period of two years the mean number of responses was
525 with a sample standard deviation of s = 30.
a) What is the smallest percentage of data we expect to
fall within 2 standard deviations of the mean (i.e.,
between 465 and 585)?
Answer: 75%
b) Determine the interval from A to B about the mean in
which at least 88.9% of the data fall.
Answer: 435 to 615
Exercise 5: Using Chebyshev’s Theorem
A newspaper periodically runs an ad in its own advertising
section offering a free month’s subscription. Over a
period of two years the mean number of responses was
525 with a sample standard deviation of s = 30.
c) What is the smallest percentage of the data
that falls within 2.5 standard deviations of the mean?
Answer: 84%
d) Determine the interval from A to B in part c. Explain its
meaning in this application.
Answer: 450 to 600
Standard Deviation for Grouped Data:
Defining Formula
Sample standard deviation for a frequency distribution
\( s = \sqrt{\frac{\sum (x - \bar{x})^2 f}{n - 1}} \)
where n = Σf (the number of
entries in the data set)
When a frequency distribution has classes, estimate the
sample mean and standard deviation by using the
midpoint of each class.
Exercise 6: Finding the Standard
Deviation for Grouped Data
You collect a random sample of the
number of children per household in
a region. Find the sample mean and
the sample standard deviation of the
data set.
Number of Children in 50 Households
1 3 1 1 1 1 2 2 1 0
1 1 0 0 0 1 5 0 3 6
3 0 3 1 1 1 1 6 0 1
3 6 6 1 2 2 3 0 1 1
4 1 1 2 2 0 3 0 2 4
Solution: Finding the Standard Deviation
for Grouped Data
First construct a frequency distribution.
x   f         xf
0   10        0(10) = 0
1   19        1(19) = 19
2   7         2(7) = 14
3   7         3(7) = 21
4   2         4(2) = 8
5   1         5(1) = 5
6   4         6(4) = 24
    Σf = 50   Σ(xf) = 91
Find the mean of the frequency distribution.
\( \bar{x} = \frac{\sum xf}{n} = \frac{91}{50} \approx 1.8 \)
The sample mean is about 1.8 children.
Solution: Finding the Standard Deviation
for Grouped Data
Determine the sum of squares.
x   f    x – x̄            (x – x̄)²         (x – x̄)² f
0   10   0 – 1.8 = –1.8   (–1.8)² = 3.24   3.24(10) = 32.40
1   19   1 – 1.8 = –0.8   (–0.8)² = 0.64   0.64(19) = 12.16
2   7    2 – 1.8 = 0.2    (0.2)² = 0.04    0.04(7) = 0.28
3   7    3 – 1.8 = 1.2    (1.2)² = 1.44    1.44(7) = 10.08
4   2    4 – 1.8 = 2.2    (2.2)² = 4.84    4.84(2) = 9.68
5   1    5 – 1.8 = 3.2    (3.2)² = 10.24   10.24(1) = 10.24
6   4    6 – 1.8 = 4.2    (4.2)² = 17.64   17.64(4) = 70.56
                                           Σ(x – x̄)² f = 145.40
Solution: Finding the Standard Deviation
for Grouped Data
Find the sample standard deviation.
x 2 x
( x  x )2
( x  x ) f
145.40
s

 1.7
n 1
50  1
( x  x )2 f
The standard deviation is about 1.7 children.
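The grouped-data calculation can be reproduced from the frequency table alone. The Python sketch below uses an illustrative dictionary mapping each value x to its frequency f. It keeps the unrounded mean 1.82, so its sum of squares (about 145.38) differs slightly from the slide's 145.40, which uses the rounded mean 1.8; the standard deviation still rounds to 1.7.

```python
from math import sqrt

# Number of children x mapped to its frequency f (from the frequency distribution)
freq = {0: 10, 1: 19, 2: 7, 3: 7, 4: 2, 5: 1, 6: 4}

n = sum(freq.values())                                    # n = sum of f
x_bar = sum(x * f for x, f in freq.items()) / n           # mean = sum(x*f) / n
ss = sum((x - x_bar) ** 2 * f for x, f in freq.items())   # sum of (x - mean)^2 * f
s = sqrt(ss / (n - 1))                                    # sample standard deviation

print(n, round(x_bar, 2), round(ss, 2), round(s, 1))      # 50 1.82 145.38 1.7
```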
Standard Deviation for Grouped Data:
Computational Formula
Example: Sample Variance of a Frequency
Distribution – Computational Formula
Lower Limit   Upper Limit   Midpoint, x   f    xf      x²f
7             18            12.5          6    75      937.5
19            30            24.5          10   245     6002.5
31            42            36.5          13   474.5   17319.25
43            54            48.5          8    388     18818
55            66            60.5          5    302.5   18301.25
67            78            72.5          6    435     31537.5
79            90            84.5          2    169     14280.5
TOTAL                                     50   2089    107196.5
Summary
• Determined the range of a data set
• Determined the variance and standard deviation of a
population and of a sample
• Used Chebyshev’s Theorem to interpret standard
deviation
• Approximated the sample standard deviation for
grouped data
Section 3.3
Measures of Position
Box-and-Whisker Plots
Objectives
• Determine the quartiles of a data set
• Interpret other fractiles such as percentiles
• Determine the interquartile range of a data set
• Create a box-and-whisker plot
Percentiles
A percentile measures the position of a single data item
based on the percentage of data items below that
single data item.
Standardized tests taken by large numbers of students
convert raw scores to a percentile score.
If approximately n percent of the items in a distribution
are less than the number x, then x is the nth
percentile of the distribution, denoted Pn.
Quartiles
Fractiles are numbers that partition (divide) an ordered
data set into equal parts.
Quartiles approximately divide an ordered data set into
four equal parts.
• First quartile, Q1: About one quarter of the data
fall on or below Q1.
• Second quartile, Q2: About one half of the data
fall on or below Q2 (median).
• Third quartile, Q3: About three quarters of the
data fall on or below Q3.
Percentiles and Other Fractiles
Fractiles     Summary                             Symbols
Quartiles     Divides data into 4 equal parts     Q1, Q2, Q3
Deciles       Divides data into 10 equal parts    D1, D2, D3,…, D9
Percentiles   Divides data into 100 equal parts   P1, P2, P3,…, P99
Quartiles and Deciles
Example: Interpreting Percentiles
The ogive represents the
cumulative frequency
distribution for SAT test
scores of college-bound
students in a recent year. What
test score represents the 72nd
percentile? How should you
interpret this? (Source: College
Board Online)
Solution: Interpreting Percentiles
The 72nd percentile
corresponds to a test score
of 1700.
This means that 72% of the
students had an SAT score
of 1700 or less.
Exercise 1: Interpreting Percentiles
Suppose you challenge freshman composition by
taking an exam.
a. If your score was in the 89th percentile, what
percentage of scores was at or below your score?
Answer: 89%
b. If the scores ranged from 0 to 100 and your raw
score was 95, does that mean that your score is at
the 95th percentile?
Answer: No. A percentile score is based on the position of
your score among all the scores, not on the raw score itself.
Exercise 2: Finding Quartiles
Case: n is odd
The test scores of 15 employees enrolled in a CPR
training course are listed. Find the first, second, and
third quartiles of the test scores.
13 9 18 15 14 21 7 10 11 20 5 18 37 16 17
Solution:
• Q2 divides the data set into two halves.
Lower half: 5 7 9 10 11 13 14
Q2: 15
Upper half: 16 17 18 18 20 21 37
Solution: Finding Quartiles
Case: n is Odd
The first and third quartiles are the medians of the lower
and upper halves of the data set.
Lower half: 5 7 9 10 11 13 14 (median is Q1 = 10)
Q2 = 15
Upper half: 16 17 18 18 20 21 37 (median is Q3 = 18)
About one fourth of the employees scored 10 or less,
about one half scored 15 or less, and about three
fourths scored 18 or less.
Finding Quartiles - Case: n is Even
n = 16
Lower half: 13.7 17.9 18.3 19.2 20.5 22.0 23.6 23.8
Upper half: 24.1 24.6 26.1 26.8 27.0 28.5 29.5 33.5
Q1 = 19.85 (median of the lower half)
Q2 = 23.95 (median of the whole data set)
Q3 = 26.9 (median of the upper half)
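The median-of-halves procedure used in the odd and even cases can be sketched in Python as follows. The function name quartiles is an assumption for this example. Statistical software often uses other interpolation rules, so library results may differ slightly from the values on these slides.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    values = sorted(data)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data entries")
    half = n // 2
    lower = values[:half]
    # When n is odd, the middle entry is Q2 and belongs to neither half.
    upper = values[half + 1:] if n % 2 else values[half:]
    return median(lower), median(values), median(upper)


# Case n is odd: the 15 CPR test scores.
cpr_scores = [13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 37, 16, 17]
print(quartiles(cpr_scores))  # (10, 15, 18)

# Case n is even: the 16 entries from the slide above.
even_data = [13.7, 17.9, 18.3, 19.2, 20.5, 22.0, 23.6, 23.8,
             24.1, 24.6, 26.1, 26.8, 27.0, 28.5, 29.5, 33.5]
print(quartiles(even_data))
```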
Interquartile Range
Interquartile Range (IQR)
The difference between the third and first quartiles.
IQR = Q3 – Q1
Exercise 3: Find the Interquartile Range
Find the interquartile range of the test scores.
Recall Q1 = 10, Q2 = 15, and Q3 = 18
Solution:
• IQR = Q3 – Q1 = 18 – 10 = 8
The test scores in the middle half of the data set
vary by at most 8 points.
Box-and-Whisker Plot
Box-and-whisker plot
Exploratory data analysis tool.
Highlights important features of a data set.
Requires (five-number summary):
• Minimum entry
• First quartile Q1
• Median Q2
• Third quartile Q3
• Maximum entry
Five-Number Summary
• A box-and-whisker plot is a graphical representation
of the five-number summary
• Outlier: Any point that is more than 1.5 times the
length of the box (IQR) from either end of the box is
considered to be an outlier.
• When changing the units of a distribution, the center
and spread will be affected, but the shape will stay the
same.
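The 1.5 × IQR outlier rule can be checked with a short Python sketch. The function names outlier_fences and find_outliers are assumptions for this example. The data are the 15 CPR test scores, with Q1 = 10 and Q3 = 18 as found earlier.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs located k * IQR beyond the box ends."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, q1, q3):
    """Return the entries that lie outside the 1.5 * IQR fences."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]


cpr_scores = [13, 9, 18, 15, 14, 21, 7, 10, 11, 20, 5, 18, 37, 16, 17]

print(outlier_fences(10, 18))             # (-2.0, 30.0)
print(find_outliers(cpr_scores, 10, 18))  # [37]
```

With IQR = 8, any score below -2 or above 30 is flagged, so only 37 qualifies.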
Compare the Boxplots!
Drawing a Box-and-Whisker Plot
1. Find the five-number summary of the data set.
2. Construct a horizontal scale that spans the range of
the data.
3. Plot the five numbers above the horizontal scale.
4. Draw a box above the horizontal scale from Q1 to Q3
and draw a vertical line in the box at Q2.
5. Draw whiskers from the box to the minimum and
maximum entries.
[Figure: box-and-whisker plot labeled, from left to right: minimum entry, whisker, Q1, box with median Q2, Q3, whisker, maximum entry]
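Steps 2 through 5 can be roughly sketched as a one-line text rendering in Python. This is a stand-in for a drawn plot, not a standard tool: the function name ascii_boxplot, the width parameter, and the characters used for the box and whiskers are all assumptions for this example. The five-number summary is the one for the 15 CPR test scores.

```python
def ascii_boxplot(minimum, q1, q2, q3, maximum, width=33):
    """Render a one-line text box plot from a five-number summary."""
    span = maximum - minimum
    if span <= 0:
        raise ValueError("maximum must exceed minimum")

    def col(value):
        # Map a data value to a column index between 0 and width - 1.
        return round((value - minimum) / span * (width - 1))

    line = [" "] * width
    # Whiskers run from the extreme entries to the ends of the box.
    for i in range(col(minimum), col(q1)):
        line[i] = "-"
    for i in range(col(q3) + 1, col(maximum) + 1):
        line[i] = "-"
    # The box spans Q1 to Q3.
    for i in range(col(q1), col(q3) + 1):
        line[i] = "="
    # Mark the five numbers themselves.
    line[col(minimum)] = "|"
    line[col(maximum)] = "|"
    line[col(q1)] = "["
    line[col(q3)] = "]"
    line[col(q2)] = "|"
    return "".join(line)


# With width=33 and a range of 32, each column is one point on the scale.
print(ascii_boxplot(5, 10, 15, 18, 37))
```

The long run of dashes on the right is the right whisker, which stretches from Q3 = 18 to the maximum entry 37.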
Exercise 4: Draw Box-and-Whisker Plot
Draw a box-and-whisker plot that represents the 15 test
scores.
Recall Min = 5 Q1 = 10 Q2 = 15 Q3 = 18 Max = 37
Solution:
[Figure: box-and-whisker plot with marks at 5, 10, 15, 18, and 37]
Exercise 5: Interpret Box-and-Whisker Plot
Recall Min = 5 Q1 = 10 Q2 = 15 Q3 = 18 Max = 37
Solution:
[Figure: box-and-whisker plot with marks at 5, 10, 15, 18, and 37]
About half the scores are between 10 and 18. By looking
at the length of the right whisker, you can conclude 37 is
a possible outlier.
Exercise 6: Create Box-and-Whisker Plot
• Each student may come to the front of the class
• Guess / record how many states ___________ has
visited
• Sort the data
• Create the five-number summary
• Create boxplot
• Check for outliers
Summary
• Determined the quartiles of a data set
• Interpreted other fractiles such as deciles and
percentiles
• Determined the interquartile range of a data set
• Created a box-and-whisker plot
