Representation and Summary of Data
- Location
This chapter is about calculating averages, also known as ‘measures of location’.
You will have seen many of these topics at GCSE, but we will begin to use proper mathematical notation when solving problems.
By the end of the chapter you will have seen:
• Key words to remember
• Mean, Median and Mode (including from a table)
• The difference between a discrete frequency table and a continuous frequency table, and its effect on calculations
• How to use coding
Data
Key Terms
• Quantitative Variable
– Data which is numerical
– eg) Height, profits, number of beads in a bag
• Qualitative Variable
– Data which is not numerical
– eg) Car colour, brand name of clothes
Quantitative data is either discrete or continuous:
• Discrete Data
– Numerical data that only takes certain values
– eg) Shoe size, goals scored
• Continuous Data
– Numerical data that can take any value within a range
– eg) Height, Weight, Time taken
2A
Data in a table
Rebecca records the shoe size, x, of the female students in her year. The table shows her results.

x     Number of students, f
35    3
36    17
37    29
38    34
39    12

x → the data you are looking at
f → frequency

Find:
a) The number of students who take size 37
→ 29
b) The shoe size taken by the smallest number of female students
→ 35
c) The shoe size taken by the largest number of female students
→ 38
d) The total number of female students in the year
→ Add up the frequencies: 3 + 17 + 29 + 34 + 12
→ 95
2A
Data in a table
• Add a Cumulative Frequency column to the table
→ Keep a running total, adding on the frequency of each additional group

x     Number of students, f    Cumulative Frequency
35    3                        3
36    17                       20
37    29                       49
38    34                       83
39    12                       95

x → the data you are looking at
f → frequency
2A
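The running-total idea behind the Cumulative Frequency column can be sketched in a few lines of Python. This is only an illustration using the shoe-size table above; the variable names are our own.

```python
from itertools import accumulate

# Shoe sizes (x) and their frequencies (f) from Rebecca's table
shoe_sizes = [35, 36, 37, 38, 39]
frequencies = [3, 17, 29, 34, 12]

# accumulate() yields the running total after each additional group
cumulative = list(accumulate(frequencies))

for x, f, cf in zip(shoe_sizes, frequencies, cumulative):
    print(x, f, cf)
```

The last cumulative frequency is always the total number of observations, here 95.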
Data in a Grouped Frequency Table

Length of Wing (mm)    Number of Butterflies, f
30-31                  2
32-33                  25
34-36                  30
37-39                  13

• Groups are known as classes
• You need to know the following when working with grouped data:
– How to find class boundaries
– How to work out the mid-point of a class
– How to find the class width
2A
Data in a Grouped Frequency Table
• Write down the class boundaries, midpoint and class width for the group 34-36 in the wing-length table (classes 30-31, 32-33, 34-36, 37-39)
a) Class boundaries
→ As there are gaps between the groups, each group is taken to begin and end halfway across the gap to its neighbour
→ 33.5 – 36.5
b) Midpoint
→ Add up the boundaries and divide by 2
→ (33.5 + 36.5) ÷ 2 = 35 mm
c) Class width
→ The upper boundary minus the lower boundary
→ 36.5 – 33.5 = 3 mm
2A
Data in a Grouped Frequency Table

Time taken (s)    Number of females, f
55 < s ≤ 65       2
65 < s ≤ 70       25
70 < s ≤ 75       30
75 < s ≤ 90       13

• Write down the class boundaries, midpoint and class width for the group 70 < s ≤ 75
a) Class boundaries
→ No gaps, so the same as in the table
→ 70 – 75
b) Midpoint
→ Add up the boundaries and divide by 2
→ (70 + 75) ÷ 2 = 72.5 seconds
c) Class width
→ The upper boundary minus the lower boundary
→ 75 – 70 = 5 seconds
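The two class-boundary examples (with and without gaps between classes) can be checked with a small helper. This is a sketch only: the function name `class_summary` and its `gap` parameter are our own, and it assumes the gap between neighbouring classes is the same on both sides.

```python
def class_summary(lower, upper, gap=0.0):
    """Return (lower boundary, upper boundary, midpoint, class width).

    gap is the distance between the end of one class and the start of
    the next: 1 for classes such as 32-33, 34-36; 0 for 70 < s <= 75.
    """
    lower_boundary = lower - gap / 2
    upper_boundary = upper + gap / 2
    midpoint = (lower_boundary + upper_boundary) / 2
    width = upper_boundary - lower_boundary
    return lower_boundary, upper_boundary, midpoint, width

wing = class_summary(34, 36, gap=1)   # (33.5, 36.5, 35.0, 3.0)
time_taken = class_summary(70, 75)    # (70.0, 75.0, 72.5, 5.0)
print(wing)
print(time_taken)
```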
2A
Measures of Location (averages)
• Mode
– The most common value in a set of data.
• Median
– The middle value when the data is put into ascending order
– For n observations, divide n by 2.
– If n ÷ 2 is whole, the median is the midpoint of that term and the term above. If it is not whole, round up and take that term.
• Eg) For the median of 14 values
– 14 ÷ 2 = 7
– 7 is whole so the median will be the midpoint of the 7th and 8th terms.
• Eg) For the median of 29 values
– 29 ÷ 2 = 14.5
– Round up to 15 and find the 15th value
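The "divide n by 2" rule can be written out directly. This is a minimal sketch: the function name `median_by_rule` is our own, and the two test lists (the whole numbers 1 to 14 and 1 to 29) are made-up data chosen only to have 14 and 29 values.

```python
import math

def median_by_rule(values):
    """Median using the n / 2 rule for a list of raw observations."""
    ordered = sorted(values)          # ascending order first
    n = len(ordered)
    half = n / 2
    if half == int(half):
        k = int(half)
        # midpoint of the kth and (k+1)th terms (lists are 0-indexed)
        return (ordered[k - 1] + ordered[k]) / 2
    k = math.ceil(half)               # not whole: round up
    return ordered[k - 1]             # the kth term

print(median_by_rule(range(1, 15)))   # 14 values: midpoint of 7th and 8th -> 7.5
print(median_by_rule(range(1, 30)))   # 29 values: the 15th value -> 15
```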
2B
Measures of Location (averages)
• Mean
– The sum of the observations divided by the total number of observations
– This is written as: $\frac{\sum x}{n}$
– The symbol $\sum$ means ‘the sum of’
– The x represents the observations
– The n stands for the number of observations
– Often, the mean is denoted by $\bar{x}$ (x-bar)
2B
Measures of Location (averages)
• Calculate the mean, median and mode of the set of data below…
2, 6, 18, 21, 16, 17, 6, 5, 5, 1, 5, 3
a) Mode → 5
b) Median
1, 2, 3, 5, 5, 5, 6, 6, 16, 17, 18, 21
12 ÷ 2 = 6
So find the mid-point of the 6th and 7th terms
→ 5.5
c) Mean
$\bar{x} = \frac{\sum x}{n} = \frac{105}{12} = 8.75$
You must get into the habit of showing workings like this!
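As a quick check of this worked example, Python's standard `statistics` module gives the same three averages. This is only a way of verifying answers; written workings are still expected in an exam.

```python
from statistics import mean, median, mode

data = [2, 6, 18, 21, 16, 17, 6, 5, 5, 1, 5, 3]

data_mode = mode(data)      # most common value
data_median = median(data)  # midpoint of the 6th and 7th ordered terms
data_mean = mean(data)      # sum of observations / number of observations

print(data_mode, data_median, data_mean)  # 5 5.5 8.75
```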
2B
Measures of Location (averages)
• Ben collects 8 pieces of data and calculates that $\sum x$ is 13.5
• Calculate the mean:
$\bar{x} = \frac{\sum x}{n} = \frac{13.5}{8} = 1.69$ (2dp)
2B
Measures of Location (averages)
• You need to be able to calculate a combined mean.
1) If the mean pay of 20 workers is £5 per hour, and the mean of a different 20 workers is £6 per hour, what is the overall mean?
The two groups are the same size, so the overall mean is the midpoint of £5 and £6 = £5.50
2) If the mean pay of 5 workers is £8 per hour and the mean pay of a different 12 workers is £6 per hour, what is the overall mean?
This is not as simple, because the groups are different sizes. You need to work out the total pay, and the total number of people.
Total Pay → (5 × £8) + (12 × £6) = £112
Total People → (5 + 12) = 17
$\bar{x} = \frac{\sum x}{n} = \frac{112}{17} = £6.59$ (2dp)
2B
Measures of Location (averages)
• You need to be able to calculate a combined mean.
In general, you can use a formula…
If data set 1 has $n_1$ observations and mean $\bar{x}_1$, and set 2 has $n_2$ observations and mean $\bar{x}_2$, then:
$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$
– $\bar{x}$ is the overall mean
– $n_1 \bar{x}_1$ is the mean of set 1 multiplied by the number of observations in set 1
– $n_2 \bar{x}_2$ is the mean of set 2 multiplied by the number of observations in set 2
– $n_1 + n_2$ is the total number of observations
2B
Measures of Location (averages)
Using the formula
A sample of 25 observations has a mean of 6.4. The mean of a second sample is 7.2, with 30 observations. Calculate the overall mean.
$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$
$\bar{x} = \frac{(25 \times 6.4) + (30 \times 7.2)}{25 + 30}$
$\bar{x} = 6.84$ (2dp)
You must get into the habit of showing workings like this!
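The combined mean formula is short enough to sketch as a function. The name `combined_mean` is our own; the two calls reuse the workers example and the two-sample example from these slides.

```python
def combined_mean(n1, mean1, n2, mean2):
    """Overall mean of two data sets, weighted by their sizes."""
    total = n1 * mean1 + n2 * mean2   # total of all observations
    return total / (n1 + n2)          # divided by total number of observations

workers = combined_mean(5, 8, 12, 6)       # 112 / 17
samples = combined_mean(25, 6.4, 30, 7.2)  # 376 / 55

print(round(workers, 2))  # 6.59
print(round(samples, 2))  # 6.84
```

When the two sets are the same size, the result is just the midpoint of the two means, as in the first workers example.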
2B
Measures of location (averages)
•
You should realise that the 3 measures of location have different
advantages and disadvantages.
Mode
Can be used with any data, qualitative or quantitative. No use when
there isn’t a common value.
Median
Used with quantitative data and is unaffected by extreme values. Only
uses the middle value(s) though.
Mean
Uses all the data but can be affected by extreme values.
2B
Measures of location (averages) from tables
Rebecca records the shirt collar size, x, of male students in her year group. Her results are in the table.

x       Number of students, f
15      3
15.5    17
16      29
16.5    34
17      12

Find the modal collar size.
→ 16.5, as this is the collar size which occurred most often (34 times)
2C
Measures of location (averages) from tables
Find the median collar size.
→ Fill in the Cumulative Frequency column

x       Number of students, f    Cumulative Frequency
15      3                        3
15.5    17                       20
16      29                       49
16.5    34                       83
17      12                       95

→ Total ÷ 2
→ 95 ÷ 2 = 47.5
→ The median will be the 48th value
→ Find which group the 48th value will be in, using the Cumulative Frequency column (the 21st to 49th values are all size 16)
→ The median is 16
2C
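Finding the median from a discrete frequency table can be sketched as below. This assumes the values are listed in ascending order; the function name `median_from_table` is our own, and the data is the collar-size table above.

```python
import math
from itertools import accumulate

def median_from_table(values, frequencies):
    """Median of a discrete frequency table (values in ascending order)."""
    cumulative = list(accumulate(frequencies))
    half = sum(frequencies) / 2

    def term(k):
        # the kth observation lies in the first row whose
        # cumulative frequency reaches k
        for x, cf in zip(values, cumulative):
            if k <= cf:
                return x

    if half == int(half):
        k = int(half)
        return (term(k) + term(k + 1)) / 2
    return term(math.ceil(half))

collar_median = median_from_table([15, 15.5, 16, 16.5, 17], [3, 17, 29, 34, 12])
print(collar_median)  # 48th value -> 16
```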
Measures of location (averages) from tables
Find the mean collar size.

x       Number of students, f    fx
15      3                        45
15.5    17                       263.5
16      29                       464
16.5    34                       561
17      12                       204
Total   95                       1537.5

Sum of collar sizes ÷ Total students
→ 1537.5 ÷ 95
→ 16.18 (2dp)
2C
Measures of location (averages) from tables
Find the mean collar size.
This is the formula you are actually using:
$\bar{x} = \frac{\sum fx}{\sum f}$
– $\sum fx$ is the sum of ‘f times x’ (the total of the fx column, 1537.5)
– $\sum f$ is the sum of ‘f’ (the total of the f column, 95)
$\bar{x} = \frac{1537.5}{95} = 16.18$ (2dp)
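The fx column and the formula for the mean from a frequency table can be reproduced as a short sketch, using the collar-size data. The variable names are our own.

```python
collar_sizes = [15, 15.5, 16, 16.5, 17]   # x
frequencies = [3, 17, 29, 34, 12]         # f

# the fx column: each value multiplied by its frequency
fx = [f * x for x, f in zip(collar_sizes, frequencies)]

sum_fx = sum(fx)           # sum of 'f times x'
sum_f = sum(frequencies)   # sum of 'f'
mean_collar = sum_fx / sum_f

print(sum_fx, sum_f, round(mean_collar, 2))  # 1537.5 95 16.18
```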
2C
Measures of location (averages) from grouped tables
• All grouped data is treated as continuous data, and you need to be able to calculate all 3 averages from this kind of table.
• The mode is essentially the same: it is the group with the highest frequency (the modal class).
• We will be focusing on the median and mean. It is important to know that when data is grouped, you do not know the actual values. Therefore, the median and mean from a grouped table are only estimates and not necessarily accurate.
2D
Mean from a grouped table
• To calculate the mean from a grouped table, we use the same formula as for an ungrouped table.
$\bar{x} = \frac{\sum fx}{\sum f}$
• The difference is that x is now the midpoint of each class, rather than actual values

Length of Pine Cone (mm)    Number of Cones, f
30-31                       2
32-33                       25
34-36                       30
37-39                       13
2D
Mean from a grouped table
• Fill in 2 columns on the table (sometimes you will have to remember which columns you need)

Length of Pine Cone (mm)    Number of Cones, f    Midpoint (x)    fx
30-31                       2                     30.5            61
32-33                       25                    32.5            812.5
34-36                       30                    35              1050
37-39                       13                    38              494
Total                       70                                    2417.5

$\bar{x} = \frac{\sum fx}{\sum f}$
$\bar{x} = \frac{2417.5}{70}$
$\bar{x} = 34.54$ (2dp)
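The grouped-mean estimate can be sketched by building the Midpoint and fx columns in code. The variable names are our own. The midpoint of the stated class limits is used here, which matches the midpoint of the class boundaries because the gap is the same on both sides of each class.

```python
# (lower limit, upper limit, frequency) for each class of pine cone lengths
classes = [(30, 31, 2), (32, 33, 25), (34, 36, 30), (37, 39, 13)]

# x is the midpoint of each class, not an actual observed value
midpoints = [(low + high) / 2 for low, high, _ in classes]

sum_fx = sum(f * x for (_, _, f), x in zip(classes, midpoints))
sum_f = sum(f for _, _, f in classes)
estimated_mean = sum_fx / sum_f   # an estimate only, as the data is grouped

print(midpoints)                  # [30.5, 32.5, 35.0, 38.0]
print(round(estimated_mean, 2))   # 34.54
```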
2D
Median from a grouped table
• We will be using a formula to estimate the median, but first we will try to understand the process.
• First, find which group it is in…
→ Complete the Cumulative Frequency column

Length of Pine Cone (mm)    Number of Cones, f    Cumulative Frequency
30-31                       2                     2
32-33                       25                    27
34-36                       30                    57
37-39                       13                    70

→ 70 ÷ 2 = 35 (for continuous data you just divide by 2)
→ It will be in the 34-36 group
• Our next step is to consider ‘how far’ the median will be into the group
2D
Median from a grouped table
• The median is the 35th value, in the 34-36 group (class boundaries 33.5 and 36.5)
• We have had 27 observations before this group, so there are 35 – 27 = 8 values to go
• There are 30 values in the group
• The median will be 8/30ths of the way into a group with a class width of 3
• 8/30 of 3 = 0.8, so the median is 0.8 into the group
Median from a grouped table
• The median (the 35th value, in the 34-36 group) is 0.8 into the group
• The lower boundary of the group is 33.5
• 33.5 + 0.8 = 34.3
• So our estimate of the median is 34.3. This process is known as interpolation.
Median from a grouped table
• The formula (most important bit!)
$\text{Median} = \text{Lower Boundary} + \left( \frac{\text{Places into Group}}{\text{Group Frequency}} \right) \times \text{Class Width}$
For the 35th value, in the 34-36 group:
$33.5 + \left( \frac{8}{30} \right) \times 3 = 34.3$
You must get into the habit of showing workings like this!
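The interpolation formula can be sketched as a function that walks through the classes until it reaches the one containing the median. The function name `grouped_median` is our own, and the class boundaries (29.5, 31.5, 33.5, 36.5, 39.5) are taken halfway across the gaps between the pine cone classes, as described earlier.

```python
def grouped_median(boundaries, frequencies):
    """Estimate the median of grouped data by linear interpolation.

    boundaries: (lower boundary, upper boundary) for each class, in order.
    """
    target = sum(frequencies) / 2   # for grouped data, just divide by 2
    before = 0                      # observations before the current class
    for (lower, upper), f in zip(boundaries, frequencies):
        if f > 0 and before + f >= target:
            places_into_group = target - before
            class_width = upper - lower
            return lower + (places_into_group / f) * class_width
        before += f

cone_boundaries = [(29.5, 31.5), (31.5, 33.5), (33.5, 36.5), (36.5, 39.5)]
cone_median = grouped_median(cone_boundaries, [2, 25, 30, 13])
print(round(cone_median, 1))  # 33.5 + (8 / 30) x 3 -> 34.3
```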
2D
Coding
• You need to understand why data is coded, how to code it and
how to un-code it.
• Coding is done before any average is calculated, and is usually
used with large values of data in order to simplify calculations
• Once data has been coded, averages are calculated
• Then after the average is worked out, the code is reversed in
order to give the actual average
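The code, average, un-code cycle described in these bullets can be sketched as follows. The code y = (x − 100) ÷ 10 and the five data values are illustrative choices, and the function names `code` and `uncode` are our own.

```python
def code(x):
    """Apply the coding y = (x - 100) / 10."""
    return (x - 100) / 10

def uncode(y):
    """Reverse the coding: multiply by 10, then add 100."""
    return y * 10 + 100

data = [110, 120, 130, 140, 150]

coded = [code(x) for x in data]         # smaller numbers to work with
coded_mean = sum(coded) / len(coded)    # average of the coded values
original_mean = uncode(coded_mean)      # reverse the code to get the actual mean

print(coded, coded_mean, original_mean)
```

The un-coded result agrees with the mean calculated directly from the original values.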
Coding
• Use the following coding to calculate the mean of the data below
110, 120, 130, 140, 150
Coding → $y = x - 100$
x represents the original value
y is the coded value
So this code is telling us to subtract 100 from all the numbers before calculating the mean
10, 20, 30, 40, 50
The mean of these numbers is 30
However, as 100 was subtracted, you must now undo this (add 100) to get the correct mean
→ So the mean of the original set of data is 130
Coding
• Use the following coding to calculate the mean of the data below
110, 120, 130, 140, 150
Coding → $y = \frac{x - 100}{10}$
x represents the original value
y is the coded value
So this code is telling us to subtract 100 from all the numbers, and then divide by 10, before calculating the mean
1, 2, 3, 4, 5
The mean of these numbers is 3
We subtracted 100 then divided by 10…
So to undo this we must multiply by 10 then add 100…
→ (3 × 10) + 100
→ So the mean of the original set of data is 130
Coding
Use the following code to estimate the mean of this set of grouped data on the lengths of phone calls.
$y = \frac{x - 7.5}{5}$
First the midpoints (x) must be turned into new values (y) using the code.

Time (mins)    Calls, f    Midpoint, x    y       fy
0-5            4           2.5            -1      -4
5-10           15          7.5            0       0
10-15          5           12.5           1       5
15-20          2           17.5           2       4
20-60          0           40             6.5     0
60-70          1           65             11.5    11.5
Total          27                                 16.5

We are now working out the mean, so use the formula for this.
$\bar{y} = \frac{\sum fy}{\sum f} = \frac{16.5}{27} = 0.61111$
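The coded calculation for the phone-call table, followed by reversing the code, can be sketched as below. The variable names are our own; the midpoints, frequencies and the code y = (x − 7.5) ÷ 5 are those given for this table.

```python
midpoints = [2.5, 7.5, 12.5, 17.5, 40, 65]   # x, midpoint of each class
calls = [4, 15, 5, 2, 0, 1]                  # f, number of calls

# turn each midpoint into a coded value: y = (x - 7.5) / 5
coded = [(x - 7.5) / 5 for x in midpoints]

sum_fy = sum(f * y for f, y in zip(calls, coded))
coded_mean = sum_fy / sum(calls)             # mean of the coded values

# reverse the code: multiply by 5, then add 7.5
estimated_mean = coded_mean * 5 + 7.5

print(coded)                     # [-1.0, 0.0, 1.0, 2.0, 6.5, 11.5]
print(round(coded_mean, 5))      # 0.61111
print(round(estimated_mean, 2))  # 10.56
```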
Coding
We calculated a mean of 0.61111 using the code $y = \frac{x - 7.5}{5}$
→ So we subtracted 7.5 and then divided by 5
→ We therefore need to multiply by 5 and then add 7.5
→ (0.61111 × 5) + 7.5
→ The mean for the original data ($\bar{x}$) is 10.5555 (10.56 to 2dp)
Summary
• We have now covered all of chapter 2
• We have seen the 3 measures of location (averages)
• We have seen how to calculate them in tables, using
midpoints and interpolation where appropriate
• We have looked at combined means
• We have also used coding in answering questions