Analyze Data
USE MEAN & MEDIAN TO COMPARE THE CENTER OF DATA SETS.
IDENTIFY OUTLIERS AND THEIR EFFECT ON DATA SETS.
Focus 6 Learning Goal (HS.S-ID.A.1, HS.S-ID.A.2, HS.S-ID.A.3, HS.S-ID.B.5):
Students will summarize, represent, and interpret data on a single count or measurement variable.
4
In addition to level 3.0 and above and beyond what was taught in class, the student may:
· Make connections with other concepts in math.
· Make connections with other content areas.
3
The student will summarize, represent, and interpret data on a single count or measurement variable.
- Comparing data includes analyzing the center of the data (mean/median), interquartile range, shape of the distribution of a graph, standard deviation, and the effect of outliers on the data set.
- Read, interpret, and write summaries of two-way frequency tables, which includes calculating joint, marginal, and relative frequencies.
2
The student will be able to:
- Make dot plots, histograms, box plots, and two-way frequency tables.
- Calculate standard deviation.
- Identify normal distribution of data (bell curve) and convey what it means.
1
With help from the teacher, the student has partial success with summarizing and interpreting data displayed in a dot plot, histogram, box plot, or frequency table.
0
Even with help, the student has no success understanding statistical data.
Reminder:
To find the median, put all the numbers in order from least to greatest. The middle number is the median. If there are two middle numbers, the median is their mean.
To find the mean, add up all of the numbers, then divide by how many numbers are in the data set.
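The reminder above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the sample numbers are made up for the example. With an even count there are two middle numbers, so the sketch averages them.

```python
def mean(values):
    """Add up all the numbers, then divide by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle number of the ordered data; for an even count,
    the mean of the two middle numbers."""
    ordered = sorted(values)          # least to greatest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: one middle number
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [2, 7, 3, 10, 3]         # hypothetical data, ordered: 2 3 3 7 10
even_sample = [2, 7, 3, 10]           # hypothetical data, ordered: 2 3 7 10

print(mean(odd_sample), median(odd_sample))    # 5.0 3
print(mean(even_sample), median(even_sample))  # 5.5 5.0
```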
Measure of Central Tendency
Mean and median are both measures of central tendency. This
means they identify the “middle” of the data.
A measure of central tendency attempts to describe the whole set of data with a
single value that represents the middle or center of its distribution.
Median:
Advantage: Is less affected by outliers and skewed data. It is the preferred
measure of center when the distribution is not symmetrical.
Mean:
Advantage: Can be used for both continuous and discrete numeric data.
Limitations: Is influenced by outliers and skewed distribution.
The shape of data distributions
A normal distribution is mound-shaped and symmetric.
As a rule of thumb:
If the mean and median are equal, then the data is symmetric.
If the mean is greater than the median, the data is skewed right.
If the mean is less than the median, the data is skewed left.
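The mean-versus-median comparison can be written as a small Python check. This is only a sketch of the rule as stated in this lesson. The function name `describe_shape`, the `tolerance` parameter, and the sample data sets are assumptions added for illustration.

```python
from statistics import mean, median


def describe_shape(values, tolerance=0.0):
    """Compare mean and median using the lesson's rule.

    `tolerance` is a hypothetical allowance for treating
    nearly equal values as equal; it is not from the slides.
    """
    m = mean(values)
    med = median(values)
    if abs(m - med) <= tolerance:
        return "symmetric"
    if m > med:
        return "skewed right"
    return "skewed left"


# Hypothetical data sets
print(describe_shape([1, 2, 3, 4, 5]))       # symmetric     (mean 3, median 3)
print(describe_shape([1, 2, 3, 4, 20]))      # skewed right  (mean 6, median 3)
print(describe_shape([1, 17, 18, 19, 20]))   # skewed left   (mean 15, median 18)
```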
Test your memory…
The mean of a data set is 12 and the median is 12. What are the
possible shapes for this data set?
A. Mound
B. Symmetric
C. Skewed Right
D. Skewed Left
E. Both A & B
The mean of a data set is 12 and the median is 10. What is the data
shape?
A. Octagonal
B. Symmetric
C. Skewed Right
D. Skewed Left
Outliers
The shape of the data helps us find and identify
outliers.
An outlier is something that sticks out from the rest of
the data.
It is a data point that has an “extreme value” when
compared with the rest of the data set.
Mathematically speaking, an outlier is defined as any
point that falls more than 1.5 times the IQR below the lower
quartile or more than 1.5 times the IQR above the upper quartile.
Data:
37, 37, 38, 38, 40, 40, 42, 42, 42, 62
The median is: 40
Q1: 38
Q3: 42
IQR = Q3 – Q1= 42 – 38 = 4
The box plot looks like this:
The lower limit on outliers is
Q1 – (1.5)(IQR).
38 – (1.5)(4) = 32
This means an outlier would be
any number less than 32.
The upper limit on outliers is
Q3 + (1.5)(IQR).
42 + (1.5)(4) = 48
This means an outlier would be
any number greater than 48.
Data:
37, 37, 38, 38, 40, 40, 42, 42, 42, 62
The outlier for this data set is 62.
It surpasses the cutoff of 48.
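The limit calculation can be sketched in Python as follows. One assumption is stated up front: the quartiles are taken as the medians of the lower and upper halves of the ordered data, which reproduces Q1 = 38 and Q3 = 42 for this data set. Other quartile conventions, including some library defaults, can give slightly different values. The function names are illustrative.

```python
from statistics import median


def quartiles(values):
    """Q1 and Q3 as the medians of the lower and upper halves.

    Assumption: for an odd count, the middle value is left out
    of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])


def outlier_limits(values):
    """Lower limit Q1 - 1.5 * IQR and upper limit Q3 + 1.5 * IQR."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values):
    """Values below the lower limit or above the upper limit."""
    low, high = outlier_limits(values)
    return [v for v in values if v < low or v > high]


data = [37, 37, 38, 38, 40, 40, 42, 42, 42, 62]
print(quartiles(data))        # (38, 42)
print(outlier_limits(data))   # (32.0, 48.0)
print(find_outliers(data))    # [62]
```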
When there is an outlier on one side of
the data set, we can cut the “whisker” off
at the limit and then plot the
outlier as a separate data point.
Calculate the mean of the data set: 41.8
Calculate the mean of the data set without the outlier: 39.6
Removing the outlier changes the mean significantly.
Removing the outlier does not change the median significantly.
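A quick way to check this comparison is to compute both measures with and without the outlier. The sketch below uses Python's standard `statistics` module. The variable names are illustrative, and the 62 is removed by hand because it was already identified as the outlier.

```python
from statistics import mean, median

data = [37, 37, 38, 38, 40, 40, 42, 42, 42, 62]
without_outlier = [v for v in data if v != 62]   # drop the known outlier

mean_change = mean(data) - mean(without_outlier)
median_change = median(data) - median(without_outlier)

print(round(mean(data), 1), median(data))                        # 41.8 40.0
print(round(mean(without_outlier), 1), median(without_outlier))  # 39.6 40
print(round(mean_change, 1), median_change)                      # 2.2 0.0
```

The mean drops by about 2.2 while the median stays at 40.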
The final box plot would look like this.
Going Fishing
A fisherman records the length, in centimeters, of
10 bass caught in a stream:
15 22 19 18 15 45 27 18 18 51
He wants to know the average length of a fish
he can catch.
Determine the mean and median of the data.
Mean:
248 ÷ 10 = 24.8 cm
Median:
15 15 18 18 18 19 22 27 45 51
The two middle values are 18 and 19, so the median is 18.5 cm.
Going Fishing
Are there any outliers?
Divide the data into quarters to find the IQR.
15 15 18 18 18 19 22 27 45 51
Q1 = 18
Q3 = 27
IQR = 27 – 18 = 9
The lower limit on outliers is Q1 – (1.5)(IQR).
18 – (1.5)(9) = 4.5
The upper limit on outliers is Q3 + (1.5)(IQR).
27 + (1.5)(9) = 40.5
Any number less than 4.5 or greater than 40.5 is an outlier.
45 and 51 are outliers.
Going Fishing
Remove the outliers and recalculate the mean and
median.
15 15 18 18 18 19 22 27
Mean: 152 ÷ 8 = 19 cm
Median: 18 cm
With the outliers removed, the mean is now closer to the
center of the data.
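The whole fishing exercise can be run from start to finish in one short Python sketch. It assumes the same quartile convention used in this lesson, with Q1 and Q3 as the medians of the lower and upper halves of the ordered data. The helper name `remove_outliers` is made up for this example.

```python
from statistics import mean, median


def remove_outliers(values):
    """Keep only values within 1.5 * IQR of the quartiles.

    Assumption: Q1 and Q3 are the medians of the lower and
    upper halves of the ordered data.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    spread = 1.5 * (q3 - q1)
    return [v for v in ordered if q1 - spread <= v <= q3 + spread]


lengths_cm = [15, 22, 19, 18, 15, 45, 27, 18, 18, 51]
kept = remove_outliers(lengths_cm)

print(mean(lengths_cm), median(lengths_cm))  # 24.8 18.5
print(kept)                                  # [15, 15, 18, 18, 18, 19, 22, 27]
print(mean(kept), median(kept))              # 19 18.0
```

Removing the two long fish moves the mean from 24.8 cm to 19 cm, while the median moves only from 18.5 cm to 18 cm.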
The average length of a fish caught in this stream is
________.