Statistics 1

How long is a name?
• To answer this question, we might collect some data on the length of a name.
• First we need to establish our target population.
• Let’s say the students in this mathematics class.
• What names should we use?
• Names as listed on the roll.
Data
Averaging
• An average is a measure of central tendency.
• There are 3 measures which we can use.
• MEAN
• MEDIAN
• MODE
Mean
• Usually when we say average, we are
referring to the mean.
• To find the mean, we add up all the
numbers and divide by how many there
are.
Example
• Find the mean of 4, 0, 2, 1, 6
\[ \frac{4 + 0 + 2 + 1 + 6}{5} = \frac{13}{5} = 2.6 \]
In Excel we can use the formula
• =AVERAGE(range), where range is the highlighted cells
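As a rough cross-check of the mean example, the same calculation can be sketched in Python. Python is not used in the original slides, and the variable names here are only illustrative.

```python
from statistics import mean

data = [4, 0, 2, 1, 6]

# Add up all the numbers and divide by how many there are.
manual_mean = sum(data) / len(data)

# The standard library function should agree with the hand calculation.
library_mean = mean(data)

print(manual_mean)   # 2.6
print(library_mean)  # 2.6
```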
Data on names
Median
• The median is the middle value when the data is put in order.
• If there is an odd number of data values, the middle value is unique.
• If there is an even number of data values, we need to average the two middle values.
Example
• Find the median of 4, 8, 2, 9, 1
• First put them in order
• 1, 2, 4, 8, 9
• The middle number is ‘4’
Example
• Find the median of 4, 8, 2, 9, 1, 6
• First put them in order
• 1, 2, 4, 6, 8, 9
• The middle numbers are ‘4’ and ‘6’
• Averaging gives a median of 5.
Sort the data in Excel or use the formula =MEDIAN(data)
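The two median examples (odd and even number of values) can be sketched in Python as follows. The function name median_by_hand is made up for illustration; it follows the steps on the slides and is compared with the standard library's median.

```python
from statistics import median

def median_by_hand(values):
    """Middle value of the sorted data; average the two middles if the count is even."""
    ordered = sorted(values)          # first put them in order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the middle is unique
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the two middles

odd_data = [4, 8, 2, 9, 1]
even_data = [4, 8, 2, 9, 1, 6]

print(median_by_hand(odd_data), median(odd_data))    # 4 4
print(median_by_hand(even_data), median(even_data))  # 5.0 5.0
```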
Mode
• The mode is the most common number.
• A data set can have more than one mode, for example 2 modes.
• If no number is repeated, there is no mode.
Example
• Find the mode of 6, 4, 3, 7, 8, 6, 7, 2
• There are two modes: 6 and 7
Using Excel
• Formula =MODE(data)
• You must be careful, as Excel will only give one mode.
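As an alternative to a single-mode formula, here is a minimal Python sketch that reports every mode of the example data. It counts each value by hand and also uses multimode from Python's statistics module (available from Python 3.8); the variable names are illustrative only.

```python
from collections import Counter
from statistics import multimode

data = [6, 4, 3, 7, 8, 6, 7, 2]

# Count how often each number occurs, then keep the most common ones.
counts = Counter(data)
highest_count = max(counts.values())
modes = sorted(value for value, count in counts.items() if count == highest_count)

print(modes)            # [6, 7]
print(multimode(data))  # [6, 7]
```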
Which average is the best?
• Generally we use the mean as it
includes all the data but if we have
extreme values, the median is a better
measure as it is not affected by extreme
values.
Example
• These are the incomes of a group of
university students.
• $2400, $1500, $2000, $1800, $22 000
• Find the best ‘average’.
\[ \text{Mean} = \frac{2400 + 1500 + 2000 + 1800 + 22000}{5} = \$5940 \]
Median = $2000
No mode
• The mean is not representative, whereas the median is.
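To see how strongly the one extreme income pulls the mean, the example can be sketched in Python. The second calculation, which drops the $22 000 income, is not in the original slides and is added only for comparison.

```python
from statistics import mean, median

incomes = [2400, 1500, 2000, 1800, 22000]

mean_income = mean(incomes)      # 5940
median_income = median(incomes)  # 2000

# For comparison only: the same measures without the extreme value.
typical_incomes = [x for x in incomes if x != 22000]
mean_without_extreme = mean(typical_incomes)      # 1925
median_without_extreme = median(typical_incomes)  # 1900.0

print(mean_income, median_income)
print(mean_without_extreme, median_without_extreme)
```

The mean moves from $5940 to $1925 when one value is removed, while the median only moves from $2000 to $1900.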
Frequency tables
Length  Tally           Frequency
3       ll              2
4       llll            5
5       llll llll llll  14
6       llll ll         7
7       llll            5
8       ll              2
Mode is 5
Median is also 5
Mean is 5.4
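The mode and median can also be read off the frequency table with a short Python sketch. The dictionary below is typed in from the table (length of name mapped to frequency); the variable names are illustrative.

```python
from statistics import median

# Length of name -> frequency, copied from the frequency table.
frequency = {3: 2, 4: 5, 5: 14, 6: 7, 7: 5, 8: 2}

# The mode is the length with the highest frequency.
mode_length = max(frequency, key=frequency.get)

# Expand the table back into an ordered list of 35 lengths to find the median.
lengths = [length for length, count in sorted(frequency.items()) for _ in range(count)]
median_length = median(lengths)

print(len(lengths))    # 35
print(mode_length)     # 5
print(median_length)   # 5
```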
Calculating the mean by hand
\[ \frac{3 \times 2 + 4 \times 5 + 5 \times 14 + 6 \times 7 + 7 \times 5 + 8 \times 2}{2 + 5 + 14 + 7 + 5 + 2} = 5.4 \]
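The hand calculation of the mean from a frequency table can be sketched in Python like this. The frequencies are copied from the table of name lengths; the variable names are illustrative.

```python
# Length of name -> frequency, copied from the frequency table.
frequency = {3: 2, 4: 5, 5: 14, 6: 7, 7: 5, 8: 2}

# Top of the fraction: each length multiplied by its frequency, then added up.
total_length = sum(length * count for length, count in frequency.items())

# Bottom of the fraction: the total number of names.
number_of_names = sum(frequency.values())

mean_length = total_length / number_of_names

print(total_length)     # 189
print(number_of_names)  # 35
print(mean_length)      # 5.4
```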
Using the calculator
• STAT mode
• Place data in list 1
• Place frequency in list 2
• CALC, SET
• 1Var Xlist list1
• 1Var Freq list2
• Exe
• 1Var
Measures of spread
• It is not enough to just give the
‘average’.
• The mean, median and mode are the same for all 3 sets of data:
• 48 49 50 50 51 52
• 40 45 50 50 50 55 60
• 0 0 50 50 50 100 100
• But the data sets are quite different.
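A quick Python sketch can confirm that the three data sets share the same mean, median and mode. It uses the standard statistics module; the variable names are illustrative.

```python
from statistics import mean, median, mode

data_sets = [
    [48, 49, 50, 50, 51, 52],
    [40, 45, 50, 50, 50, 55, 60],
    [0, 0, 50, 50, 50, 100, 100],
]

# One (mean, median, mode) triple per data set.
summaries = [(mean(data), median(data), mode(data)) for data in data_sets]

for data, summary in zip(data_sets, summaries):
    print(data, summary)  # every data set gives 50, 50, 50
```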
Measures of spread
• Range = (highest number) - (lowest number)
• For our data set, the first names have a range of 8 - 3 = 5
Measures of spread
• Again, if there are extreme values, the
range can distort the true spread of the
data.
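The range, and its sensitivity to extreme values, can be sketched in Python. The helper name data_range is made up for illustration. The student incomes are reused from the earlier example, and the version without the $22 000 income is added here only for comparison.

```python
def data_range(values):
    """Range = (highest number) - (lowest number)."""
    return max(values) - min(values)

data_sets = [
    [48, 49, 50, 50, 51, 52],
    [40, 45, 50, 50, 50, 55, 60],
    [0, 0, 50, 50, 50, 100, 100],
]
ranges = [data_range(data) for data in data_sets]
print(ranges)  # [4, 20, 100]

# One extreme value can dominate the range.
incomes = [2400, 1500, 2000, 1800, 22000]
range_with_extreme = data_range(incomes)
range_without_extreme = data_range([x for x in incomes if x != 22000])
print(range_with_extreme, range_without_extreme)  # 20500 900
```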
5-number summary
• We often summarise the data with a 5-number summary: lowest value, lower quartile, median, upper quartile and highest value.
• These split the data into 4 groups.
Example 1
• 1 14 29 35 43 48 49 78 82 82 92 95 95
• 13 numbers
• Lowest is 1
• Lower quartile is 35
• Median is 49
• Upper quartile is 82
• Highest is 95
Example 2
• 9 11 17 22 23 28 30 36
• Split into 4 groups: 9 11 | 17 22 | 23 28 | 30 36
• Lower quartile is 14, median is 22.5, upper quartile is 29
• 5-number summary is
• 9 14 22.5 29 36
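Both 5-number summary examples can be reproduced with the Python sketch below. The slides do not state which quartile rule they use, so this sketch assumes one: each quartile is the median of a half of the ordered data, and when the count is odd the overall median is included in both halves. That assumption was chosen because it matches the answers in Examples 1 and 2; other common quartile rules give different quartiles for Example 1. The function name is illustrative.

```python
from statistics import median

def five_number_summary(values):
    """Return (lowest, lower quartile, median, upper quartile, highest)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumption: with an odd count, the median belongs to both halves.
    lower_half = ordered[: (n + 1) // 2]
    upper_half = ordered[n // 2 :]
    return (
        ordered[0],
        median(lower_half),
        median(ordered),
        median(upper_half),
        ordered[-1],
    )

example_1 = [1, 14, 29, 35, 43, 48, 49, 78, 82, 82, 92, 95, 95]
example_2 = [9, 11, 17, 22, 23, 28, 30, 36]

print(five_number_summary(example_1))  # (1, 35, 49, 82, 95)
print(five_number_summary(example_2))  # (9, 14.0, 22.5, 29.0, 36)
```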
For first names in our class
• The 5-number summary is 3 4 5 6 8
• Lower quartile is 4
• Upper quartile is 6
• Interquartile range is the difference between the quartiles: 6 - 4 = 2
Statistics so far
• Central tendencies:
• Mean = 5.4
• Median = 5
• Mode = 5
• Because the mean and median are about the same, we wouldn’t expect extreme values.
Statistics so far
• Measures of spread:
• Range = 5
• Interquartile range = 2
Statistics so far
• 5-number summary
• 3 4 5 6 8