Transcript bin width

Graphical Displays of Information
Chapter 3.1 – Tools for Analyzing Data
Mathematics of Data Management (Nelson)
MDM 4U
Histograms




contain continuous data grouped in class intervals, which display how the data is spread over a range
the width of each bar is known as the bin width
different bin widths produce differently shaped distributions
bin widths should be equal and there should be at least five (5) bins
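As a rough illustration of how the bin width changes the picture, the sketch below counts one small data set into equal-width bins twice: once with a width of 10 and once with a width of 40. The data values and the helper name `bin_counts` are made up for this illustration and do not come from the slides.

```python
def bin_counts(data, start, width, n_bins):
    """Count how many values fall in each equal-width bin [lo, lo + width)."""
    counts = [0] * n_bins
    for x in data:
        index = int((x - start) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# Made-up sample data (an assumption, not the data behind the slides)
data = [32, 35, 41, 44, 47, 52, 55, 58, 61, 63, 66, 74, 78, 85, 97, 112]

narrow = bin_counts(data, start=30, width=10, n_bins=9)  # 30-40, 40-50, ..., 110-120
wide = bin_counts(data, start=0, width=40, n_bins=3)     # 0-40, 40-80, 80-120

print(narrow)  # [2, 3, 3, 3, 2, 1, 1, 0, 1]
print(wide)    # [2, 11, 3]
```

With only three wide bins, the long tail on the right collapses into a single bar, so much of the structure is hidden.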
Histogram Example
[Figure: three histograms of the same data (x-axis: SomeData, y-axis: Count), each drawn with a different bin width]
these histograms represent the same data
however, one shows much less of the structure of the data
too many bins (bin width too small) is also a problem
Histogram Applet – Old Faithful
http://www.isixsigma.com/offsite.asp?A=Fr&Url=http://www.stat.sc.edu/~west/javahtml/Histogram.html
Bin Width Calculation


the bin width is calculated by dividing the range (max – min) by the number of intervals you desire (5-6)
the bins should not overlap
Discrete
wrong: 0-10, 10-20, 20-30, 30-40 (the values 10, 20 and 30 each fall in two bins)
correct: 0-10, 11-20, 21-30, 31-40
Continuous
correct: 0-9.99, 10-19.99, 20-29.99, 30-39.99 (i.e. 0 ≤ x < 10, 10 ≤ x < 20, and so on)
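The calculation above can be sketched in a few lines of Python. The function name `bin_edges`, the sample values, and the choice to round the width up to a whole number are assumptions made for this illustration. Each bin is treated as lo ≤ x < hi, so no value can fall in two bins.

```python
import math

def bin_edges(data, n_bins=5):
    """Return (width, edges) for n_bins equal-width, non-overlapping bins."""
    lo, hi = min(data), max(data)
    # range / number of intervals, rounded up so the last bin reaches the max
    width = math.ceil((hi - lo) / n_bins)
    edges = [lo + i * width for i in range(n_bins + 1)]
    return width, edges

data = [3, 7, 12, 18, 21, 25, 29, 33, 38, 40]  # made-up values
width, edges = bin_edges(data, n_bins=5)

print(width)  # 8
print(edges)  # [3, 11, 19, 27, 35, 43]
```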
Mound-shaped distribution


The middle interval(s) have the greatest frequency (i.e. the tallest bar)
The bars get smaller as you move out to the edges.
U-shaped distribution


Lowest frequency in the centre, highest towards the outside
E.g. height of a combined grade 1 and 6 class
Uniform distribution


All bars are approximately the same height
E.g. roll a die 50 times
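The die example can be tried quickly with a simulation. This is only a sketch: the seed value is arbitrary and is fixed so that the run is repeatable. With just 50 rolls the bars will be roughly, not exactly, the same height.

```python
import random
from collections import Counter

random.seed(1)  # arbitrary fixed seed (an assumption) so the run is repeatable
rolls = [random.randint(1, 6) for _ in range(50)]
counts = Counter(rolls)

# Print a simple text histogram: one '#' per roll of each face
for face in range(1, 7):
    print(face, "#" * counts[face])
```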
Symmetric distribution


A distribution that is the same on either side of the centre
U-Shaped, Uniform and Normal Distributions are symmetric
Skewed distribution (left and right)



Highest frequencies at one end
Left-skewed drops off to the left
Right-skewed drops off to the right
E.g. the years on a handful of quarters (left-skewed, since most coins are recent)
Exercises

Define in your notes:




Frequency distribution (p. 146)
Cumulative frequency (p. 146)
Relative frequency (p. 146)
Try page 146 #1, 2, 3, 11 (use Excel or Fathom), 13
Measures of Central Tendency
Chapter 3.2 – Tools for Analyzing Data
Mathematics of Data Management (Nelson)
MDM 4U
Sigma Notation



the sigma notation is used to compactly express a mathematical series
ex: 1 + 2 + 3 + 4 + … + 15
this can be expressed: $\sum_{k=1}^{15} k = 1 + 2 + 3 + 4 + \dots + 14 + 15$



the variable k is called the index of summation
the number 1 is the lower limit and the number 15 is the upper limit
we would say: “the sum of k for k = 1 to k = 15”
Examples:
write $\sum_{n=4}^{7} (2n + 1)$ in expanded form:
= [2(4) + 1] + [2(5) + 1] + [2(6) + 1] + [2(7) + 1]
= 9 + 11 + 13 + 15
= 48
note that any letter can be used for the index of summation, though k, a, n, i, j & x are often used



Example: write the following in sigma notation
$3 + \frac{3}{2} + \frac{3}{4} + \frac{3}{8}$
$= \frac{3}{2^0} + \frac{3}{2^1} + \frac{3}{2^2} + \frac{3}{2^3}$
$= \sum_{n=0}^{3} \frac{3}{2^n}$
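A sigma expression maps directly onto a loop over the index of summation. The sketch below evaluates the three sums from these slides in Python. The variable names are chosen for this illustration, and `Fraction` is used only to keep the last result exact.

```python
from fractions import Fraction

# Sum of k for k = 1 to 15 (range stops one short of its upper argument)
total_k = sum(k for k in range(1, 16))

# Sum of (2n + 1) for n = 4 to 7
total_odd = sum(2 * n + 1 for n in range(4, 8))

# Sum of 3 / 2^n for n = 0 to 3, kept as an exact fraction
total_geo = sum(Fraction(3, 2 ** n) for n in range(0, 4))

print(total_k)    # 120
print(total_odd)  # 48
print(total_geo)  # 45/8
```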
The Mean
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
found by dividing the sum of all the data points by the number of elements of data
Deviation
the distance of a data point from the mean
calculated by subtracting the mean from the value: $x_i - \bar{x}$
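As a small worked sketch, the mean and the deviation of each point can be computed as follows. The data values are made up for this illustration.

```python
data = [2, 4, 6, 8, 10]  # made-up values

# Mean: sum of all data points divided by the number of data points
mean = sum(data) / len(data)

# Deviation: each value minus the mean (negative when below the mean)
deviations = [x - mean for x in data]

print(mean)        # 6.0
print(deviations)  # [-4.0, -2.0, 0.0, 2.0, 4.0]
```

The deviations add up to zero here, which is true for any data set because the mean balances the values on either side of it.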
The Weighted Mean
$\bar{x}_w = \frac{\sum_{i=1}^{n} x_i w_i}{\sum_{i=1}^{n} w_i}$
where x_i represents each data point and w_i represents its weight or frequency
see examples on pages 153 and 154
example: 7 students have a mark of 70 and 10 students have a mark of 80
mean = (70 * 7 + 80 * 10) / (7 + 10) = 1290 / 17 ≈ 75.9
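The marks example can be checked with a short function. The name `weighted_mean` is chosen for this sketch; the marks and student counts are the ones from the example above.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

marks = [70, 80]
students = [7, 10]  # the frequencies act as the weights

result = weighted_mean(marks, students)
print(round(result, 1))  # 75.9
```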
Means with grouped data


for data that is already grouped into class intervals (assuming you do not have the original data), you must use the midpoint of each class to estimate the weighted mean
see the example on pages 154-155
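A minimal sketch of the midpoint estimate, using made-up class intervals and frequencies (not the textbook example): each class midpoint stands in for every value in that class, and the frequencies are the weights.

```python
# (lower bound, upper bound, frequency): made-up classes for illustration
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 5), (30, 40, 2)]

midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

# Weighted mean of the midpoints, weighted by class frequency
estimated_mean = sum(m * f for m, f in zip(midpoints, freqs)) / sum(freqs)

print(midpoints)       # [5.0, 15.0, 25.0, 35.0]
print(estimated_mean)  # 17.5
```

The result is an estimate only, since the original values inside each class are unknown.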
Median



the midpoint of the data
calculated by placing all the values in order and finding the middle value
if there is an even number of values, the median is the mean of the middle two numbers
1 4 6 8 9 12
median = (6 + 8) / 2 = 7
if there is an odd number of values, the median is the middle number
1 4 6 8 9
median = 6
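The two cases above can be written as one short function. This is a sketch with a function name chosen for illustration; Python's standard library also offers `statistics.median`.

```python
def median(values):
    """Middle value of the sorted data, or the mean of the middle two."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: middle number
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of middle two

print(median([1, 4, 6, 8, 9, 12]))  # 7.0
print(median([1, 4, 6, 8, 9]))      # 6
```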
Mode









Simply chosen by finding the number that occurs most often
There may be no mode, one mode, two modes (bimodal), etc.
Which distributions from yesterday have one mode?
Mound-shaped, Left/Right-Skewed
Two modes?
U-Shaped, some Symmetric
Multiple modes?
Uniform
Modes are appropriate for discrete data or non-numerical data
E.g. shoe sizes (discrete), shoe colors (non-numerical)
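Finding the mode is a counting exercise, and it works for non-numerical data too. In the sketch below the function name `modes` and the sample lists are made up for illustration. Returning an empty list when no value repeats is one convention for "no mode" and is an assumption here.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often (empty list if none repeats)."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value appears once: treated here as "no mode"
    return [v for v, c in counts.items() if c == highest]

print(modes([7, 8, 8, 9, 10]))                              # [8]
print(modes(["black", "white", "black", "red", "white"]))  # ['black', 'white']
print(modes([1, 2, 3]))                                     # []
```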
Distributions and Central Tendency

the relationship between the three measures changes depending on the shape (skew) of the data
[Figure: three histograms (x-axis: data, y-axis: Count), one for each case below]
symmetric (mound shaped): mean = median = mode
right skewed: mean > median > mode
left skewed: mean < median < mode
What Method is Most Appropriate?






Outliers are data points that are quite different from the other points
Outliers have the greatest effect on the mean
The median is least affected by outliers
Skewed data is best represented by the median
If the data is symmetric, use either the median or the mean
If the data is not numeric, or if the frequency is the most critical, use the mode
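To see why the median is preferred when outliers are present, the sketch below adds one extreme value to a small made-up data set and compares the two measures before and after. It uses the `mean` and `median` functions from Python's standard `statistics` module.

```python
from statistics import mean, median

data = [4, 5, 5, 6, 7]      # made-up values
with_outlier = data + [60]  # the same data plus one extreme value

print(mean(data), median(data))                  # 5.4 5
print(mean(with_outlier), median(with_outlier))  # 14.5 5.5
```

The single outlier almost triples the mean, while the median moves by only half a unit.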
Example 1

find the mean, median and mode
Survey response   1   2   3   4
Frequency         2   8   14  3





mean = [(1x2) + (2x8) + (3x14) + (4x3)] / 27 = 2.7
median = 3
mode = 3
which way is it skewed?
Left
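The three answers in Example 1 can be checked by expanding the frequency table back into the 27 individual responses. This is a sketch; the variable names are chosen for illustration.

```python
responses = [1, 2, 3, 4]
frequency = [2, 8, 14, 3]

# Expand the table into the individual responses: 1, 1, 2, 2, ..., 4
values = [r for r, f in zip(responses, frequency) for _ in range(f)]

n = len(values)
mean = sum(values) / n
median = sorted(values)[n // 2]  # n is odd here, so take the middle value
mode = responses[frequency.index(max(frequency))]

print(n, round(mean, 1), median, mode)  # 27 2.7 3 3
```

The mean (2.7) is below the median (3), which is consistent with a left-skewed distribution.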
Example 2

Find the mean, median and mode
Height    No. of Students
141-150   3
151-160   7
161-170   4
mean = [(145x3) + (155x7) + (165x4)] / 14 = 155.7
median = 155
mode = 151-160
which way is it skewed?
Mound-shaped
Exercises

try page 159 #4, 5, 6, 8

Remembrance Day by the Numbers
http://www42.statcan.ca/smr08/smr08_064_e.htm
