Transcript File

4. Interpreting sets of data
Cambridge University Press
1
 G K Powers 2013
Grouped frequency tables
1. Classes or groups are listed in the first column in ascending order.
2. The tally column shows the number of times a score occurs in a class.
3. The frequency column shows the total count of the scores in each class.
HSC Hint – The class centre is the middle of the class and is calculated by adding the two extremes of the class and dividing by 2.
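
As a rough illustration of the class centre rule and the frequency column, the sketch below groups whole-number scores into equal-width classes. The scores, the class width of 10 and the function names are assumptions made up for this example; they do not come from the slides.

```python
from collections import Counter


def class_centre(lower, upper):
    """Class centre: add the two extremes of the class and divide by 2."""
    return (lower + upper) / 2


def grouped_frequency_table(scores, start, width, n_classes):
    """Return rows of (lower, upper, class centre, frequency) in ascending order."""
    # Index of the class each score falls into, counted from the first class.
    counts = Counter((s - start) // width for s in scores)
    rows = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width - 1  # inclusive upper extreme (whole-number scores assumed)
        rows.append((lower, upper, class_centre(lower, upper), counts.get(i, 0)))
    return rows


# Hypothetical scores, invented for illustration only.
scores = [52, 57, 61, 63, 64, 68, 70, 71, 75, 79, 83, 88]
table = grouped_frequency_table(scores, start=50, width=10, n_classes=4)
for lower, upper, centre, freq in table:
    print(f"{lower}-{upper}  centre {centre}  frequency {freq}")
```

The frequencies in the last column should add up to the number of scores, which is a quick check that no score was missed.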
Cumulative frequency
Cumulative frequency is the frequency of the score plus
the frequency of all the scores less than that score. It is
the progressive total of the frequencies.
Score    Frequency    Cumulative frequency
18       1            1
19       5            6
20       3            9
21       7            16
HSC Hint – The last number in the cumulative frequency
column equals the total number of scores.
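
The progressive total can be sketched in a few lines of Python. The scores and frequencies are the ones from the table on this slide; the variable names are only illustrative.

```python
from itertools import accumulate

scores = [18, 19, 20, 21]
frequencies = [1, 5, 3, 7]

# Each cumulative frequency is the running total of the frequencies so far.
cumulative = list(accumulate(frequencies))

for score, freq, cum in zip(scores, frequencies, cumulative):
    print(score, freq, cum)
```

The last entry of `cumulative` is 16, which matches the total number of scores as the hint says.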
Cumulative frequency graphs
[Figures: cumulative frequency histogram and cumulative frequency polygon]
HSC Hint – A cumulative frequency polygon joins the top right corners of the rectangles in a cumulative frequency histogram.
Mean
Mean is a measure of the centre. It is calculated by summing
all the scores and dividing by the number of scores.
Mean = (Sum of scores) / (Number of scores)

$\bar{x} = \frac{\sum x}{n}$ or $\bar{x} = \frac{\sum fx}{\sum f}$

$\sum$ – ‘Sum of’ (Greek capital letter sigma)
$x$ – A score or data value
$\bar{x}$ – Mean of a set of scores
$n$ – Total number of scores
$f$ – Frequency
HSC Hint – Make sure all data has been cleared before using
the calculator for statistics.
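
Both forms of the mean can be checked with a short sketch. The frequency data below reuses the scores 18 to 21 with frequencies 1, 5, 3 and 7 from the cumulative frequency slide; the function names are assumptions for this example.

```python
def mean(scores):
    """Mean of a list of scores: sum of scores divided by number of scores."""
    return sum(scores) / len(scores)


def mean_from_frequencies(scores, frequencies):
    """Mean from a frequency table: sum of f*x divided by sum of f."""
    total_fx = sum(f * x for x, f in zip(scores, frequencies))
    return total_fx / sum(frequencies)


scores = [18, 19, 20, 21]
frequencies = [1, 5, 3, 7]
table_mean = mean_from_frequencies(scores, frequencies)  # 320 / 16
print(table_mean)
```

Writing out every score in full and calling `mean` on that list gives the same value, since the frequency form is only a shortcut for repeated scores.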
Mode

• The mode is the score that occurs most often, that is, the score with the highest frequency.
• To find the mode:
  • Determine the number of times each score occurs.
  • The mode is the score that occurs the greatest number of times.
If two or more scores share the highest frequency, each of them is regarded as a mode.
HSC Hint – Data is called bimodal if it contains two modes.
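
The two steps for finding the mode can be sketched as follows. The function name and the sample data are made up for illustration, and the data is assumed to be non-empty.

```python
from collections import Counter


def modes(scores):
    """Return every score that has the highest frequency, in increasing order."""
    counts = Counter(scores)          # step 1: how many times each score occurs
    highest = max(counts.values())    # step 2: the greatest frequency
    return sorted(s for s, c in counts.items() if c == highest)


print(modes([3, 5, 5, 7, 8]))     # one mode
print(modes([1, 2, 2, 3, 3, 4]))  # two modes, so the data is bimodal
```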
Median

• The median is the middle score or value when the data is arranged in order.
• A cumulative frequency polygon can be used to estimate the median.
HSC Hint – The total number of scores is the value of the cumulative frequency for the last score or class.
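
For ungrouped scores the median can also be read directly from the cumulative frequencies rather than estimated from the graph. The sketch below is one possible way to do this; the function names are assumptions, and the data is the score table from the cumulative frequency slide.

```python
from bisect import bisect_left
from itertools import accumulate


def score_at_position(scores, cumulative, position):
    """Score found at a 1-based position when the data is listed in order."""
    # First row whose cumulative frequency reaches the position.
    return scores[bisect_left(cumulative, position)]


def median_from_frequencies(scores, frequencies):
    """Median of a frequency table whose scores are in increasing order."""
    cumulative = list(accumulate(frequencies))
    n = cumulative[-1]  # total number of scores (last cumulative frequency)
    if n % 2 == 1:
        return score_at_position(scores, cumulative, (n + 1) // 2)
    lower = score_at_position(scores, cumulative, n // 2)
    upper = score_at_position(scores, cumulative, n // 2 + 1)
    return (lower + upper) / 2


table_median = median_from_frequencies([18, 19, 20, 21], [1, 5, 3, 7])
print(table_median)
```

With 16 scores the middle positions are the 8th and 9th, and both fall in the row for the score 20.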
Range and interquartile range
Range = Highest score – Lowest score
• The interquartile range is the difference between the third quartile and the first quartile: $IQR = Q_3 - Q_1$.
• To calculate the interquartile range (IQR):
1. Arrange the data in increasing order.
2. Divide the data into two equal-sized groups. If n is odd, omit the median.
3. Find Q1, the median of the first group.
4. Find Q3, the median of the second group.
5. Calculate the interquartile range: $IQR = Q_3 - Q_1$.
HSC Hint – Unlike the range, the interquartile range does not depend on the extreme values.
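
The five steps can be sketched in Python as below. This follows the "omit the median when n is odd" rule from the slide; other quartile conventions exist and may give slightly different values. The function names and data are illustrative, and at least four scores are assumed.

```python
def median(ordered):
    """Median of a list that is already in increasing order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def interquartile_range(data):
    """Return (Q1, Q3, IQR) using the steps listed on the slide."""
    ordered = sorted(data)                      # step 1
    half = len(ordered) // 2
    lower = ordered[:half]                      # step 2: first group
    # If n is odd the median itself is left out of both groups.
    upper = ordered[half + 1:] if len(ordered) % 2 == 1 else ordered[half:]
    q1 = median(lower)                          # step 3
    q3 = median(upper)                          # step 4
    return q1, q3, q3 - q1                      # step 5


print(interquartile_range([7, 3, 9, 1, 5, 11, 13]))
print(interquartile_range([7, 3, 9, 1, 5, 11, 100]))  # extreme value changed
```

Replacing 13 with 100 changes the range from 12 to 99 but leaves the interquartile range unchanged, which is the point of the hint.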
Standard deviation
• The standard deviation is a measure of the spread of data about the mean.
• Two calculations are used for standard deviation.
  • Population standard deviation ($\sigma_n$) is the better measure when we have all of the data or the entire population.
  • Sample standard deviation ($\sigma_{n-1}$) is the better measure when a sample is taken from a large population.
HSC Hint – Either the population standard deviation or the sample standard deviation can be used if the question does not specify which.
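
The two calculations correspond to `pstdev` and `stdev` in Python's standard `statistics` module. The scores below are invented for illustration.

```python
import statistics

# Hypothetical scores with mean 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Population standard deviation: divides the squared deviations by n.
population_sd = statistics.pstdev(scores)

# Sample standard deviation: divides the squared deviations by n - 1.
sample_sd = statistics.stdev(scores)

print(round(population_sd, 3), round(sample_sd, 3))
```

Because it divides by n - 1 rather than n, the sample value is always slightly larger than the population value for the same data.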
Investigating sets of data
• An outlier is a score that is separated from the majority of the data. Outliers have little effect on the mean, median and mode for large sets of data. However, in a small data set, an outlier has a large effect on the mean, a smaller effect on the median and usually no effect on the mode.
• The shape of a graph is described in terms of smoothness, symmetry and the number of modes.
HSC Hint – An outlier is a score that is not close to any
other scores. It is not typical.
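
The effect of an outlier on a small data set can be seen with a quick sketch. The data and the helper name `summary` are made up for this example.

```python
import statistics


def summary(scores):
    """Return (mean, median, mode) for a list of scores."""
    return (
        statistics.mean(scores),
        statistics.median(scores),
        statistics.mode(scores),
    )


small = [4, 5, 5, 6, 7]
with_outlier = small + [40]  # 40 is far from every other score

print(summary(small))
print(summary(with_outlier))
```

Adding the outlier roughly doubles the mean (5.4 to about 11.2), moves the median only from 5 to 5.5, and leaves the mode at 5.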
Symmetry and skewness
• No skew (symmetric): Data is symmetrical and balanced about a vertical line.
• Positively skewed: Most of the data is on the left side. The long tail is on the right side.
• Negatively skewed: Most of the data is on the right side. The long tail is on the left side.
HSC Hint – The mean, mode and median are equal when the data is symmetrical and has a single peak.
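
The hint can be checked on a small symmetric data set with a single peak, and compared with a positively skewed one. Both data sets are invented for illustration.

```python
import statistics

# Symmetric about 3, with one peak at 3.
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

# Most scores are low, with a long tail to the right.
positively_skewed = [1, 1, 1, 2, 2, 3, 4, 6, 9]

for name, data in [("symmetric", symmetric), ("positively skewed", positively_skewed)]:
    print(name, statistics.mean(data), statistics.median(data), statistics.mode(data))
```

For the symmetric set all three measures are 3. For this particular skewed set the long right tail pulls the mean above the median, which in turn is above the mode.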
Number of modes
• Unimodal: Data has only 1 mode or peak.
• Bimodal: Data has 2 modes or peaks.
• Multimodal: Data has many modes or peaks.
HSC Hint – List all the modes if the data is multimodal.
Double stem-and-leaf plots
A stem-and-leaf plot has the tens digits of the data written in numerical order down the page. The ‘units’ digits become the ‘leaves’ and are written in numerical order across the page. A double stem-and-leaf plot shows two sets of data on either side of a shared stem.
HSC Hint – The numbers in the ‘leaves’ of a stem-and-leaf plot must be written in increasing order.
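
A minimal sketch of building a double stem-and-leaf plot is given below. It assumes non-negative whole numbers below 100, and it writes the left-hand leaves so that they increase away from the stem, which is a common convention rather than something stated on the slide. The function names and data are illustrative.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Map each tens digit (stem) to its units digits (leaves) in increasing order."""
    plot = defaultdict(list)
    for value in sorted(data):
        plot[value // 10].append(value % 10)
    return dict(plot)


def double_stem_and_leaf(left, right):
    """Return text lines with the left data set, the shared stem and the right data set."""
    left_plot, right_plot = stem_and_leaf(left), stem_and_leaf(right)
    lines = []
    for stem in sorted(set(left_plot) | set(right_plot)):
        # Reverse the left leaves so they increase moving away from the stem.
        left_leaves = " ".join(str(d) for d in reversed(left_plot.get(stem, [])))
        right_leaves = " ".join(str(d) for d in right_plot.get(stem, []))
        lines.append(f"{left_leaves:>12} | {stem} | {right_leaves}")
    return lines


for line in double_stem_and_leaf([21, 25, 34, 38, 38], [23, 30, 31, 47]):
    print(line)
```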
Double box-and-whisker plots
A box-and-whisker plot is a graph that uses the five-number summary: lower extreme, lower quartile, median, upper quartile and upper extreme. A double box-and-whisker plot shows two sets of data on the same scale.
HSC Hint – To draw a box plot, arrange the data in order before calculating the five-number summary.
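
The five-number summary for each data set in a double box-and-whisker plot can be computed as sketched below. The quartiles here are found by splitting the ordered data into two halves and omitting the median when the number of scores is odd; other conventions exist. The class names and scores are invented for illustration.

```python
def median(ordered):
    """Median of a list that is already in increasing order."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(data):
    """Return (lower extreme, lower quartile, median, upper quartile, upper extreme)."""
    ordered = sorted(data)  # arrange the data in order first
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


# Two hypothetical data sets to compare on one scale.
class_a = [12, 15, 17, 20, 22, 25, 30]
class_b = [10, 14, 18, 19, 21, 28]
print("Class A:", five_number_summary(class_a))
print("Class B:", five_number_summary(class_b))
```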
Radar charts
A radar chart looks like a spider web and is used to
compare the performance of one or more entities.
HSC Hint – Line segments in a radar chart must be constructed
accurately to ensure the information is valid.
Area chart
An area chart is a graph consisting of different ‘areas’, each representing a data set over a period of time. The thickness of each area indicates the size of the data.
HSC Hint – To read data from an area chart, draw a
vertical line and estimate the difference
between the heights.
Comparison – Measures of location
Mean
Advantages: Easy to understand and calculate. Depends on every score. Varies least from sample to sample.
Disadvantages: Distorted by outliers. Not suitable for categorical data.

Median
Advantages: Easy to understand. Not affected by outliers.
Disadvantages: May not be central. Varies more than the mean in a sample.

Mode
Advantages: Easy to determine. Not affected by outliers. Suitable for categorical data.
Disadvantages: May be no mode or more than one mode. May not be central.

Comparison – Measures of spread
Range
Advantages: Easy to understand. Easy to calculate.
Disadvantages: Dependent on the smallest and largest values. May be distorted by outliers.

Interquartile range
Advantages: Easy to determine for small data sets. Easy to understand. Not affected by outliers.
Disadvantages: Difficult to calculate for large data sets. Dependent on lower and upper quartiles. Data needs to be sorted.

Standard deviation
Advantages: Depends on every score. Not affected by outliers.
Disadvantages: Difficult to determine without a calculator. Difficult to understand.

Two-way tables
A two-way table presents data using rows and columns.
Data in a cell is interpreted by reading the headings for
the row and the column.
HSC Hint – Calculate the totals across each row and down each column. Then add the row totals and add the column totals. The two results should be equal.
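
The totals check in the hint can be sketched as follows. The table contents (year groups and travel methods) are invented for illustration.

```python
# Hypothetical two-way table: rows are year groups, columns are travel methods.
table = {
    "Year 11": {"Bus": 12, "Walk": 8},
    "Year 12": {"Bus": 9, "Walk": 11},
}

# Totals across each row.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}

# Totals down each column.
columns = list(next(iter(table.values())))
column_totals = {col: sum(table[row][col] for row in table) for col in columns}

# Adding the row totals and adding the column totals should agree.
grand_from_rows = sum(row_totals.values())
grand_from_columns = sum(column_totals.values())
print(row_totals, column_totals, grand_from_rows, grand_from_columns)
```

If the two grand totals differ, at least one row or column total has been added incorrectly.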