Ch 3 - csusm
Other Numerical Measures
Median
Mode
Range
Percentiles
Quartiles, Interquartile range
BUS304 – Data Characterization
Median
The middle value
-- The value that divides the sorted data in half, with equal
numbers of values above and below
Steps:
1. Put your data in ordered array (sort)
2. If n (or N) is odd, the median is the middle number
(i.e. the $\frac{n+1}{2}$th number)
3. If n (or N) is even, the median is the average of the two middle numbers
(i.e. the average of the $\frac{n}{2}$th and the $\left(\frac{n}{2}+1\right)$th numbers)
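The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation; the function name `median` is chosen here for the example, and the first sample is the data set used later in the quartile example.

```python
def median(values):
    """Return the median by sorting and picking the middle value(s)."""
    ordered = sorted(values)              # Step 1: put the data in an ordered array
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the (n+1)/2-th value, which is index n // 2 when counting from 0
        return ordered[mid]
    # Even n: average of the (n/2)-th and (n/2 + 1)-th values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([22, 12, 14, 16, 17, 16, 13, 20, 18]))  # odd number of values
print(median([3, 1, 4, 2]))                          # even number of values
```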
Sensitivity to outliers
[Figure: three dot plots on an axis from 0 to 10, labeled Median = 3, Median = 2.5, and Median = 3]
The median is not affected by extreme values.
Mode
The value that occurs most often
Steps:
1. Put your data in ordered array (sort)
2. Find the data value(s) that repeat most frequently
The mode is not affected by extreme values either.
[Figure: dot plot on an axis from 0 to 14] No Mode!
[Figure: dot plot on an axis from 0 to 6] Mode = 5
[Figure: dot plot on an axis from 0 to 14] Mode = 5 and 9
[Figure: counts for Boston, Austin, San Diego, Los Angeles] Mode = San Diego
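A short Python sketch of the mode, covering the cases shown above (no mode, one mode, two modes, and non-numeric data). The sample lists are hypothetical, since the plotted data are not given in the text, and treating "no value repeats" as "no mode" is an assumption made for this example.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Assumption: if no value repeats, report "no mode"
        return []
    return sorted(v for v, c in counts.items() if c == top)


print(modes([0, 3, 7, 10, 14]))            # no value repeats
print(modes([1, 5, 5, 5, 6]))              # a single mode
print(modes([2, 5, 5, 9, 9, 12]))          # two modes
print(modes(["Boston", "San Diego", "Austin", "San Diego", "Los Angeles"]))
```

Because the mode only counts occurrences, it also works for categories such as city names, where a mean or median is not defined.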
Find Mode and Median from Frequency Table
Below is a frequency table showing the number of days the teams took to finish their projects.

Days to Complete    Frequency    Relative Frequency
5                   4            ?
6                   12           ?
7                   8            ?
8                   6            ?
9                   4            ?
10                  2            ?

Find the mean, median and mode.
Create a histogram and locate the mean, median and mode.
Describe the shape of the histogram, and find the relationship between mean, median and mode.
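One possible way to work through the table in Python is sketched below. The function name `summarize` is made up for this example; the frequencies are the ones from the table, and the relative frequency is taken to be each frequency divided by the total count.

```python
def summarize(freq):
    """Compute n, relative frequencies, mean, median and mode(s) from a frequency table."""
    n = sum(freq.values())                          # total number of teams
    relative = {x: f / n for x, f in freq.items()}  # frequency / total
    mean = sum(x * f for x, f in freq.items()) / n  # weighted average of the values
    # Expand the table back into a sorted list to locate the middle value(s)
    expanded = [x for x in sorted(freq) for _ in range(freq[x])]
    mid = n // 2
    if n % 2 == 1:
        median = expanded[mid]
    else:
        median = (expanded[mid - 1] + expanded[mid]) / 2
    top = max(freq.values())
    mode = [x for x, f in freq.items() if f == top]  # value(s) with the highest frequency
    return n, relative, mean, median, mode


days = {5: 4, 6: 12, 7: 8, 8: 6, 9: 4, 10: 2}
n, relative, mean, median, mode = summarize(days)
print("n =", n)
for x in sorted(relative):
    print(x, round(relative[x], 3))
print("mean =", mean, "median =", median, "mode =", mode)
```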
Shape of a distribution
Symmetric: Mean = Median = Mode
Left-Skewed: Mean < Median < Mode
(Longer tail extends to left)
Right-Skewed: Mode < Median < Mean
(Longer tail extends to right)
Note that the mean is affected the most by extreme values, so it is pulled
towards the tail more than the other two measures.
Measures of center location
Mean
Median
Mode
The mean is generally used, unless extreme
values (outliers) exist;
the next most common is the median, since the
median is not sensitive to extreme values;
the mode is sometimes used when one value has a
really large frequency.
Think of the example of house prices.
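The house-price idea can be illustrated with a small Python sketch using the standard `statistics` module. The prices below (in thousands of dollars) are hypothetical numbers invented for this example, not data from the course.

```python
import statistics

# Hypothetical house prices in thousands of dollars (made up for illustration)
typical_street = [300, 320, 340, 360, 380]
with_mansion = typical_street + [5000]   # add one extreme value

print(statistics.mean(typical_street), statistics.median(typical_street))
print(statistics.mean(with_mansion), statistics.median(with_mansion))
```

In this sketch a single very expensive house pulls the mean far above most of the prices, while the median moves only slightly, which is why the median is often quoted for house prices.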
Range
Simplest measure of variation
Describes how widely the data are spread
Formula
Range = Maximum Value – Minimum Value
Example:
[Figure: dot plot on an axis from 0 to 14]
Range = 14 - 1 = 13
Disadvantage of Range
Ignores the way in which data are distributed
[Figure: two dot plots, each on an axis from 7 to 12]
Range = 12 - 7 = 5
Range = 12 - 7 = 5
Sensitive to outliers
1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
Range is affected the most by outliers.
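The two data sets above can be checked with a few lines of Python. The helper name `value_range` is chosen for this sketch; the lists reproduce the values listed on the slide.

```python
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


# Six 1s, eight 2s, four 3s, one 4, and a final value of 5 or 120
without_outlier = [1] * 6 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 6 + [2] * 8 + [3] * 4 + [4, 120]

print(value_range(without_outlier))  # 5 - 1
print(value_range(with_outlier))     # 120 - 1
```

Changing a single value out of twenty moves the range from 4 to 119, even though the other nineteen values are identical.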
Break
Other measures
1. Percentiles:
Measure the percentage of data below a value.
e.g. if the 60th percentile is 1240 (SAT score), that means
60% of students scored less than 1240.
Correspondingly, 40% of students scored 1240 or
higher.
How to find a percentile? The pth percentile in an ordered array of n
values is the value in the ith position, where
$i = \frac{p}{100}(n+1)$
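A hedged Python sketch of this position rule follows. The formula for i comes from the slide; how to handle a fractional i is an assumption inferred from the worked examples in this chapter (80.8 is rounded to 81, and a position ending in .5 is treated as the average of its two neighbors). Other textbooks and software use different conventions, and the function names are made up for this example.

```python
def percentile_position(p, n):
    """Position i = (p / 100) * (n + 1) in an ordered array of n values."""
    return p / 100 * (n + 1)


def percentile(values, p):
    """Value at the p-th percentile position (fraction handling is an assumption)."""
    ordered = sorted(values)
    n = len(ordered)
    i = percentile_position(p, n)
    i = min(max(i, 1), n)          # keep the position inside 1..n
    lower = int(i)                 # whole part of the position
    frac = i - lower
    if frac == 0:
        return ordered[lower - 1]  # positions count from 1, Python indexes from 0
    if frac == 0.5:
        # Assumption: halfway positions average the two neighboring values
        return (ordered[lower - 1] + ordered[lower]) / 2
    # Assumption: otherwise round to the nearest position
    return ordered[round(i) - 1]


print(percentile_position(80, 100))                         # position for n = 100
print(percentile([22, 12, 14, 16, 17, 16, 13, 20, 18], 25))
```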
Example
Find the 80th percentile from the annual income data
Steps:
1. Sort the data
2. Find the location for the 80th percentile:
$i = \frac{p}{100}(n+1) = \frac{80}{100}(100+1) = 80.8 \approx 81$
3. Find the 81st person's income
Think: what does this income mean?
Exercise 1: find the value where 30% of people have that income or higher.
Exercise 2: find the value where 30% of people have an income less than it.
Exercise 3: find the value where 50% of people have an income less than it.
What is this measure also called?
Quartiles
The 25th, 50th, and 75th percentiles
Called the first, second, and third quartiles, respectively.
Written as Q1, Q2, Q3, respectively.
The quartiles split the ranked data into 4 equal groups.
25% | Q1 | 25% | Q2 | 25% | Q3 | 25%
Example: Find the first quartile in the data sample:
22 12 14 16 17 16 13 20 18
Median = the 50th percentile = the second quartile
Interquartile Range
Recall:
What is the range? What is its disadvantage?
Interquartile Range:
Interquartile Range = Q3 – Q1
Example:
12 13 14 16 16 17 18 20 22
Q1=13.5
Q3=19
Interquartile range = Q3 – Q1 = 19 – 13.5 = 5.5
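The quartiles in this example can be cross-checked with Python's standard library. `statistics.quantiles` with `method="exclusive"` places quartiles using an (n + 1)-based position with linear interpolation, which agrees with the values above for this data set; it is offered as a convenient check and is not necessarily the exact procedure intended in the course.

```python
import statistics

data = [12, 13, 14, 16, 16, 17, 18, 20, 22]

# n=4 asks for the three cut points that split the data into four groups
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

print("Q1 =", q1, "Q2 =", q2, "Q3 =", q3)
print("Interquartile range =", iqr)
```

Unlike the range, the interquartile range depends only on the middle half of the data, so a single extreme value at either end does not change it.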
Summary
Understand and compute the following two sets of data
measures:
Measures of central tendency
• Mean, Median, and Mode
Measures of variation
• Range, Variance, and Standard deviation
Other ways to describe data:
Percentiles, Quartiles, Interquartile range