14.3 Numerical Summaries of Data
§ 14.3 Numerical Summaries of Data
Numerical Summaries of a Data Set
In the last section we looked at ways to represent a data set graphically. Today we will look at numerical ways to summarize similar information.
There are two major types of numerical summary:
1. Measures of location.
2. Measures of spread.
For example, the average/mean is a measure of location and the range is a measure of spread.
The Average / Mean
The average or mean of a data set of size N is found by
adding the numbers and dividing by N.
Or more formally, if the data set is $\{x_1, x_2, x_3, \ldots, x_N\}$, then the mean is given by:
$$\frac{x_1 + x_2 + x_3 + \cdots + x_N}{N}$$
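As a minimal sketch of the definition above, the calculation might look like this in Python. The function name `mean` and the sample values are illustrative assumptions, not part of the lecture.

```python
def mean(data):
    """Add the numbers and divide by N, the size of the data set."""
    n = len(data)
    if n == 0:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(data) / n

# Hypothetical data set of size N = 4
print(mean([2, 4, 6, 8]))  # 5.0
```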
The Average / Mean
What about when we are given a frequency table?
Let’s look at the test scores from yesterday:
Score      4  24  28  32  36  40  44  48  56  60  64  72  76  96
Frequency  1   1   2   6  10  16  13   9   1   2   1   8   4   1
The Average / Mean From a Frequency Table
Data       $x_1$  $x_2$  . . .  $x_k$
Frequency  $f_1$  $f_2$  . . .  $f_k$
Step 1: Calculate the total of the data.
$$\text{Total} = x_1 f_1 + x_2 f_2 + x_3 f_3 + \cdots + x_k f_k$$
Step 2: Calculate N.
$$N = f_1 + f_2 + f_3 + \cdots + f_k$$
Step 3: Calculate the average.
$$\text{Average} = \frac{\text{Total}}{N}$$
Entering Data and Finding the Mean on the TI-83:
1. Hit [Stat].
2. Select “1: Edit…”.
3. Enter data into L1. If you are working from a frequency table, enter the corresponding frequencies into L2.
4. Go to the “List” menu ([2nd], [Stat]).
5. Select “3: mean(”.
6. You should now be on the ‘main’ screen. Proceed as follows:
(a) If you are working from just a list of data, type “L1” ([2nd], [1]), close the parentheses and hit [Enter].
(b) If you are working from a freq. table, type “L1” followed by “,” and “L2” ([2nd], [2]). Then close the parentheses and hit [Enter].
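The three steps for a frequency table can be sketched in Python as follows, using the test scores from the table earlier in this section. The dictionary layout and the name `mean_from_frequencies` are assumptions made for illustration.

```python
# Test scores from the frequency table: score -> frequency
scores = {4: 1, 24: 1, 28: 2, 32: 6, 36: 10, 40: 16, 44: 13,
          48: 9, 56: 1, 60: 2, 64: 1, 72: 8, 76: 4, 96: 1}

def mean_from_frequencies(table):
    total = sum(x * f for x, f in table.items())  # Step 1: total of the data
    n = sum(table.values())                       # Step 2: N, the sum of the frequencies
    return total / n                              # Step 3: the average

print(sum(scores.values()))                     # 75
print(round(mean_from_frequencies(scores), 2))  # 46.61
```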
Example: Average Salary
The average salary at a local computer
manufacturer with 50 employees is
$42,000.
The owner draws a yearly salary of
$800,000.
What is the average salary of the other
49 employees?
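One way to work this example is to recover the total payroll from the average, remove the owner's salary, and average what is left over the other 49 employees. The sketch below does that arithmetic; the variable names are illustrative.

```python
n_employees = 50
average_salary = 42_000
owner_salary = 800_000

total_payroll = n_employees * average_salary       # 2,100,000
others_total = total_payroll - owner_salary        # 1,300,000
others_average = others_total / (n_employees - 1)  # spread over the other 49 employees

print(round(others_average, 2))  # 26530.61
```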
Example: 105 Exam Scores
Suppose you have averaged a 132 out
of 150 on the first 3 exams in Math 105.
What score would you need on the
fourth exam to have an average of 135?
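A possible worked solution, writing $x$ for the unknown fourth score (the symbol $x$ is introduced here only for illustration):

```latex
\frac{3 \cdot 132 + x}{4} = 135
\quad\Longrightarrow\quad
396 + x = 540
\quad\Longrightarrow\quad
x = 144
```

So a score of 144 out of 150 on the fourth exam would bring the average to 135.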
Percentiles
The pth percentile of a data set is the value such that p percent of the numbers fall at or below the value. The rest of the data falls at or above the value.
We will call p percent of N the locator, and write it as L.
Example: Height
Sorting Data on the TI-83:
1. Enter data into L1 as before.
2. Hit [Stat]
3. Select “2: SortA( “
4. You should now be on the ‘main’ screen. Type L1 ([2nd], [1]).
5. Close the parentheses and hit enter.
Finding the pth Percentile
Step 1: Sort the original data set by size.
(Suppose $\{d_1, d_2, d_3, \ldots, d_N\}$ is the sorted set.)
Step 2: Compute the value of the locator.
$$L = \left(\frac{p}{100}\right) N$$
Step 3: The pth percentile is:
(a) The average of $d_L$ and $d_{L+1}$ if L is a whole number.
(b) $d_{L^+}$ if L is not a whole number, where $L^+$ is L rounded up.
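The three steps above can be sketched in Python roughly as follows. The function name `percentile`, the restriction to whole-number p between 1 and 99, and the sample data are assumptions made for this sketch.

```python
def percentile(data, p):
    """p-th percentile using the locator L = (p/100) * N; p is a whole number, 0 < p < 100."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    d = sorted(data)                       # Step 1: sort the data by size
    n = len(d)
    whole, remainder = divmod(p * n, 100)  # Step 2: L = whole + remainder/100
    if remainder == 0:                     # Step 3(a): L is a whole number
        return (d[whole - 1] + d[whole]) / 2  # average of d_L and d_(L+1), 1-based
    return d[whole]                        # Step 3(b): d at L rounded up (index whole, 0-based)

# Hypothetical data set with N = 8
sample = [3, 7, 8, 12, 15, 18, 21, 30]
print(percentile(sample, 50))  # 13.5  (L = 4, average of d_4 and d_5)
print(percentile(sample, 25))  # 7.5   (L = 2, average of d_2 and d_3)
print(percentile(sample, 90))  # 30    (L = 7.2, rounded up to 8)
```

Using `divmod` on `p * n` keeps the whole-number check in integer arithmetic, which avoids floating-point surprises when deciding between cases (a) and (b).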
Percentiles: The Median and Quartiles
The 50th percentile, called the median, is
the percentile that is most commonly
used. The median will be written M.
The other two commonly used
percentiles are the quartiles:
The first quartile, written as Q1, is the
25th percentile.
The third quartile, denoted Q3, is the
75th percentile.
Example: Let’s examine the test scores again. . .
Score      4  24  28  32  36  40  44  48  56  60  64  72  76  96
Frequency  1   1   2   6  10  16  13   9   1   2   1   8   4   1
Find the quartiles and the median.
The Five-Number Summary
One way to give a nice profile of a
data set is the “five-number summary,”
which consists of:
1. The lowest value, called the Min.
2. The first quartile, Q1.
3. The median, M.
4. The third quartile, Q3.
5. The highest value, called the Max.
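As a sketch, the five-number summary of the test score data can be computed with the locator rule for percentiles described earlier in this section. The helper `percentile` and the dictionary layout are illustrative assumptions.

```python
# Test scores from the frequency table: score -> frequency
scores = {4: 1, 24: 1, 28: 2, 32: 6, 36: 10, 40: 16, 44: 13,
          48: 9, 56: 1, 60: 2, 64: 1, 72: 8, 76: 4, 96: 1}

def percentile(sorted_data, p):
    """Locator rule: L = (p/100) * N, for whole-number p with 0 < p < 100."""
    whole, remainder = divmod(p * len(sorted_data), 100)
    if remainder == 0:
        return (sorted_data[whole - 1] + sorted_data[whole]) / 2
    return sorted_data[whole]

# Expand the table into the full sorted list of 75 scores
data = sorted(x for x, f in scores.items() for _ in range(f))

summary = (data[0], percentile(data, 25), percentile(data, 50),
           percentile(data, 75), data[-1])
print(summary)  # (4, 36, 44, 48, 96)
```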
Example: The Five-Number Summary for our test score example would look like this:
Score      4  24  28  32  36  40  44  48  56  60  64  72  76  96
Frequency  1   1   2   6  10  16  13   9   1   2   1   8   4   1
Min = 4, Q1 = 36, M = 44, Q3 = 48, Max = 96
The Five-Number Summary:
Box Plots
We can also represent the Five-Number Summary graphically in what is called a box plot or a box-and-whiskers plot.
[Figure: box plot with Min, Q1, M, Q3 and Max marked]
Example: Here is the box plot for our test score example:
Score      4  24  28  32  36  40  44  48  56  60  64  72  76  96
Frequency  1   1   2   6  10  16  13   9   1   2   1   8   4   1
[Figure: box plot with Min = 4, Q1 = 36, M = 44, Q3 = 48, Max = 96]
§ 14.4 Measures of Spread
Example - Find the average
and median of the following
data sets:
• Set 1 = {45, 46, 47, 48, 49, 51, 52, 53,
54, 55}
• Set 2 = {1, 12, 20, 31, 41, 59, 70, 78,
89, 99}
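A quick check of this example in Python, using the standard library's `statistics` module (for an even number of values, `statistics.median` averages the two middle values, which agrees with the locator rule for the 50th percentile):

```python
import statistics

set_1 = [45, 46, 47, 48, 49, 51, 52, 53, 54, 55]
set_2 = [1, 12, 20, 31, 41, 59, 70, 78, 89, 99]

for name, data in (("Set 1", set_1), ("Set 2", set_2)):
    print(name, statistics.mean(data), statistics.median(data))
```

Both sets have the same mean and the same median (50), even though Set 2 is far more spread out, so measures of location alone do not distinguish them.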
The Range
One way to measure the spread of data is to
examine the range, given by
R = Max - Min
The problem with using the range is that
outliers can severely affect it.
Example: Looking again at our ‘test score’ example. . .
Score      4  24  28  32  36  40  44  48  56  60  64  72  76  96
Frequency  1   1   2   6  10  16  13   9   1   2   1   8   4   1
We see that the range with the outliers (4 and 96) would be
R = 96 - 4 = 92.
However, without those pieces of data we would have
R = 76 - 24 = 52.
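The comparison above can be reproduced with a short sketch. It only needs the distinct scores, and it assumes (as the frequency table shows) that 4 and 96 each occur exactly once, so dropping them removes both outliers.

```python
# Distinct test scores from the frequency table, in ascending order
distinct_scores = [4, 24, 28, 32, 36, 40, 44, 48, 56, 60, 64, 72, 76, 96]

def data_range(values):
    """R = Max - Min."""
    return max(values) - min(values)

with_outliers = data_range(distinct_scores)           # 96 - 4
without_outliers = data_range(distinct_scores[1:-1])  # 76 - 24, after dropping 4 and 96

print(with_outliers, without_outliers)  # 92 52
```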
The Interquartile Range
In order to eliminate the problems
caused by outliers, we could make use
of the interquartile range--the difference
between the third and first quartile:
IQR = Q3 - Q1
This measure tells us how spread out the middle 50% of the data is.
Example: Your instructor didn’t feel like making a different
example. . .
Score      4  24  28  32  36  40  44  48  56  60  64  72  76  96
Frequency  1   1   2   6  10  16  13   9   1   2   1   8   4   1
The IQR for this set of data is:
IQR = Q3 - Q1 = 48 - 36 = 12
The Standard Deviation
The idea: Measure how spread out
your data set is by examining how far
each piece of information is from some
fixed reference point.
The reference point we will use is the
mean (average).
The Standard Deviation
We could try to average the Deviations
from the Mean:
(Data value - Mean)
Example: Once again, the test score data. . .
Score (x)   Frequency   (x - 46.61)
4           1           -42.61
24          1           -22.61
28          2           -18.61
32          6           -14.61
36          10          -10.61
40          16          -6.61
44          13          -2.61
48          9           1.39
56          1           9.39
60          2           13.39
64          1           17.39
72          8           25.39
76          4           29.39
96          1           49.39
However, negative deviations and
positive deviations will cancel each
other out--in fact (assuming we don’t
round off any of our figures) the average
of the deviations from the mean will
always be 0!
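To see the cancellation without any rounding, the sketch below repeats the calculation for the test score data using exact fractions. The use of Python's `fractions.Fraction` is a choice made for this illustration so that no figure is rounded off.

```python
from fractions import Fraction

# Test scores from the frequency table: score -> frequency
scores = {4: 1, 24: 1, 28: 2, 32: 6, 36: 10, 40: 16, 44: 13,
          48: 9, 56: 1, 60: 2, 64: 1, 72: 8, 76: 4, 96: 1}

n = sum(scores.values())
mean = Fraction(sum(x * f for x, f in scores.items()), n)  # exact mean, 3496/75

# Each deviation (x - mean) is counted once per occurrence of x
deviation_total = sum((x - mean) * f for x, f in scores.items())

print(deviation_total / n)  # 0
```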
The Standard Deviation
What would happen if we squared the
deviations from the mean?
The squared deviations are always
non-negative, so there would be no
canceling.
The average of these squared
deviations is called the variance, V.
The Standard Deviation
Unfortunately, there is a problem with
using the variance as well--the units of
measure.
For instance, if we were studying people’s height in inches (in), the variance would appear in units of square inches (in^2).
The solution to our dilemma is simple: we will just take the square root of the variance to get what is called the standard deviation, σ.
Finding The Standard Deviation
Step 1: Find the average/mean of the data set. Call
it A.
Step 2: For each number x in the data set find the
deviation from the mean, x - A.
Step 3: Square each of the deviations found in Step
2.
Step 4: Find the average of the squared deviations
found in step 3. This is the variance, V.
Step 5: Take the square root of the variance. This is the standard deviation, σ.
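The five steps can be sketched in Python as follows. The function name `standard_deviation` and the sample data are illustrative assumptions; note that this divides by N, as in Step 4.

```python
import math

def standard_deviation(data):
    n = len(data)
    a = sum(data) / n                        # Step 1: the mean, A
    deviations = [x - a for x in data]       # Step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]   # Step 3: squared deviations
    variance = sum(squared) / n              # Step 4: the variance, V
    return math.sqrt(variance)               # Step 5: the standard deviation

# Hypothetical data set: mean 5, variance 4
print(standard_deviation([2, 4, 4, 4, 5, 5, 7, 9]))  # 2.0
```

Because the squared deviations are averaged over N, this corresponds to what Python's standard library calls `statistics.pstdev`, not the sample version `statistics.stdev`.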
Finding The Standard Deviation
Another way to find the Standard Deviation by hand is to use the following formula:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - A)^2}{N}}$$
Finding The Standard Deviation
Finding all of the information from 14.2-14.3 on the TI-83:
1. Enter data as shown previously. Quit to the
main screen.
2. Hit [Stat]
3. Move right to the “CALC“ menu.
4. Select “1-Var Stats”.
5. Now on the main screen, type “L1”. (If you are
using data from a frequency table also type “,”
and “L2”) Hit [Enter].
6. Interpret the information as follows:
x̄ is the mean/average, A;
σx is the Standard Deviation;
n is the size of your data set;
If you arrow down the Min, Max, Median and
Example:
Find the standard deviation for the following data set.
{1, 6, 14, 19}
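One possible worked solution, following the five steps for finding the standard deviation (A is the mean, V the variance, σ the standard deviation):

```latex
A = \frac{1 + 6 + 14 + 19}{4} = \frac{40}{4} = 10

\text{Deviations: } -9,\ -4,\ 4,\ 9
\qquad
\text{Squared deviations: } 81,\ 16,\ 16,\ 81

V = \frac{81 + 16 + 16 + 81}{4} = \frac{194}{4} = 48.5

\sigma = \sqrt{48.5} \approx 6.96
```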