28 Data handling and statistics PowerPoint

Programme 28: Data handling and statistics
STROUD
Worked examples and exercises are in the text
Introduction
Arrangement of data
Histograms
Measure of central tendency
Dispersion
Frequency polygons
Frequency curves
Normal distribution curve
Standardized normal curve
Introduction
Statistics is concerned with the collection, ordering and analysis of data.
Data consists of sets of recorded observations or values. Any quantity that can
have a number of values is a variable. A variable may be one of two kinds:
(a) Discrete – a variable whose possible values can be counted
(b) Continuous – a variable whose values can be measured on a continuous scale
Arrangement of data
Table of values
Tally diagram
Grouped data
Grouping with continuous data
Relative frequency
Rounding off data
Class boundaries
Table of values
A set of data:

28  31  29  27  30  29  29  26  30  28
28  29  27  26  32  28  32  31  25  30
27  30  29  30  28  29  31  27  28  28

can be arranged in ascending order:

25  26  26  27  27  27  27  28  28  28
28  28  28  28  29  29  29  29  29  29
30  30  30  30  30  31  31  31  32  32
It can be entered into a table.
The number of occasions on which any
particular value occurs is called the
frequency, denoted by f.
Value (x)    Number of times (f)
25           1
26           2
27           4
28           7
29           6
30           5
31           3
32           2
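The two steps above are ordering the readings and then counting how often each value occurs. Here is a minimal Python sketch of both, using only the standard library's `sorted` and `collections.Counter`. The variable names are illustrative.

```python
from collections import Counter

# The 30 readings listed above, in the order they were recorded.
data = [28, 31, 29, 27, 30, 29, 29, 26, 30, 28,
        28, 29, 27, 26, 32, 28, 32, 31, 25, 30,
        27, 30, 29, 30, 28, 29, 31, 27, 28, 28]

ordered = sorted(data)     # ascending order
freq = Counter(ordered)    # maps each value to its frequency f

print("Value  f")
for value in sorted(freq):
    print(f"{value:>5}  {freq[value]}")
```

The frequencies always add up to the number of readings, which makes a quick check on the table.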
Tally diagram
When dealing with large numbers of readings, it is more convenient to compile a tally diagram than to write all the values out in ascending order. The diagram lists the range of values of the variable, and a stroke is added against a value each time that reading occurs.
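A small Python sketch of a tally diagram follows. It assumes whole-number readings and groups the strokes in fives ("|||||") rather than drawing four strokes with a diagonal bar. The function name `tally_diagram` is hypothetical.

```python
from collections import Counter

def tally_diagram(readings):
    """Return one tally line for every integer value in the range of the readings.

    Assumption: readings are whole numbers; strokes are grouped in fives
    ('|||||') instead of four strokes crossed by a fifth.
    """
    counts = Counter(readings)
    lines = []
    for value in range(min(readings), max(readings) + 1):
        f = counts[value]                      # Counter gives 0 for values never seen
        strokes = "|" * f
        # Split the strokes into blocks of five so the count is easy to read.
        grouped = " ".join(strokes[i:i + 5] for i in range(0, f, 5))
        lines.append(f"{value:>3}  {grouped:<10}  {f}")
    return lines

data = [28, 31, 29, 27, 30, 29, 29, 26, 30, 28,
        28, 29, 27, 26, 32, 28, 32, 31, 25, 30,
        27, 30, 29, 30, 28, 29, 31, 27, 28, 28]

for line in tally_diagram(data):
    print(line)
```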
Grouped data
If the range of values of the variable is large, it is often helpful to consider these
values arranged in regular groups or classes.
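The mechanics of grouping can be sketched in Python. This is an illustration only: it applies an arbitrary choice of four classes of width 2, starting at 25, to the 30 readings above. The helper `group_counts` is hypothetical and assumes whole-number data with inclusive class limits.

```python
def group_counts(readings, first_lower_limit, width, n_classes):
    """Frequencies for equal classes of whole-number data.

    Class k runs from first_lower_limit + k*width to that value + width - 1
    (inclusive class limits).
    """
    classes = []
    for k in range(n_classes):
        lower = first_lower_limit + k * width
        upper = lower + width - 1
        f = sum(1 for x in readings if lower <= x <= upper)
        classes.append((lower, upper, f))
    return classes

data = [28, 31, 29, 27, 30, 29, 29, 26, 30, 28,
        28, 29, 27, 26, 32, 28, 32, 31, 25, 30,
        27, 30, 29, 30, 28, 29, 31, 27, 28, 28]

for lower, upper, f in group_counts(data, first_lower_limit=25, width=2, n_classes=4):
    print(f"{lower}-{upper}: {f}")
```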
Grouping with continuous data
With continuous data, the group boundaries are given to the same number of significant figures or decimal places as the data.
Relative frequency
If the frequency of any one group is divided by the sum of all the frequencies, the ratio is called the relative frequency of that group. Relative frequencies can also be expressed as percentages.
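Here is a minimal Python sketch of relative frequency. It treats each value in the frequency table of the 30 readings as its own group. The variable names are illustrative.

```python
# Frequency table of the 30 readings: value -> frequency f.
freq = {25: 1, 26: 2, 27: 4, 28: 7, 29: 6, 30: 5, 31: 3, 32: 2}

total = sum(freq.values())                          # sum of the frequencies
relative = {x: f / total for x, f in freq.items()}  # relative frequency of each group

for x, r in relative.items():
    print(f"{x}: {r:.3f} ({100 * r:.1f}%)")
```

By construction the relative frequencies add up to 1 (or 100%).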
Rounding off data
If the value 21.7 is expressed to two significant figures, the result is rounded up to 22. Similarly, 21.4 is rounded down to 21.
To keep group boundaries consistent, a value lying exactly midway is always rounded up, so 21.5 is rounded up to 22 and 42.5 is rounded up to 43.
Therefore, when a result on a continuous scale is quoted to two significant figures as 37, it includes all possible values from:
36.50000… up to 37.49999…
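The "midway values round up" rule is not what Python's built-in `round()` does: it rounds exact ties to the nearest even digit. A minimal sketch of the rule uses the standard-library `decimal` module with `ROUND_HALF_UP`, which for positive data sends ties upward. Values are passed as strings to avoid binary floating-point surprises. The helper name `round_half_up` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value: str, step: str = "1") -> Decimal:
    """Round a decimal string to the same number of decimal places as `step`
    (e.g. "1" or "0.1"), sending exact midpoints upward for positive values."""
    return Decimal(value).quantize(Decimal(step), rounding=ROUND_HALF_UP)

for v in ["21.7", "21.4", "21.5", "42.5"]:
    print(v, "->", round_half_up(v))

# The built-in round() uses round-half-to-even instead:
print(round(42.5))   # 42, not 43
```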
Class boundaries
A class or group boundary lies midway between adjacent data values, that is, between the upper limit of one class and the lower limit of the next. For example, for data recorded to one decimal place in the class or group labelled:
7.1 – 7.3
(a) The class values 7.1 and 7.3 are the lower and upper limits of the class, and their difference (0.2) gives the class width.
(b) The class boundaries are 0.05 below the lower class limit and 0.05 above the upper class limit, that is, 7.05 and 7.35.
(c) The class interval is the difference between the upper and lower class boundaries: 7.35 − 7.05 = 0.3.
(d) The central value (or mid-value) of the class interval is one half of the sum of the upper and lower class boundaries: (7.05 + 7.35)/2 = 7.2.
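Items (a) to (d) can be collected in one small Python function. It is a sketch under the assumption that `resolution` (a hypothetical parameter name) is the smallest step in which the data are recorded, so the boundaries lie half a step outside the class limits. `Decimal` keeps the arithmetic exact.

```python
from decimal import Decimal

def class_summary(lower_limit: str, upper_limit: str, resolution: str) -> dict:
    """Width, boundaries, interval and central value of one class.

    `resolution` is the recording step of the data (0.1 for one decimal place).
    """
    lo, up = Decimal(lower_limit), Decimal(upper_limit)
    half_step = Decimal(resolution) / 2
    lower_b, upper_b = lo - half_step, up + half_step
    return {
        "class width": up - lo,                    # upper limit - lower limit
        "lower boundary": lower_b,
        "upper boundary": upper_b,
        "class interval": upper_b - lower_b,       # upper boundary - lower boundary
        "central value": (lower_b + upper_b) / 2,  # half the SUM of the boundaries
    }

for name, value in class_summary("7.1", "7.3", "0.1").items():
    print(f"{name}: {value}")
```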
Histograms
Frequency histogram
Relative frequency histogram
Frequency histogram
A histogram is a graphical representation of a frequency distribution in which vertical rectangular blocks are drawn so that:
(a) the centre of the base indicates the central value of the class, and
(b) the area of the rectangle represents the class frequency.
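Because the area, not the height, represents frequency, each block's height is the frequency divided by the class interval. This is sometimes called frequency density. A minimal Python sketch follows, with hypothetical class boundaries and frequencies in which the last class is twice as wide as the others. When all classes share one width, the heights are simply proportional to the frequencies.

```python
def histogram_blocks(boundaries, frequencies):
    """Return (centre, width, height) for each block so that width * height = frequency."""
    blocks = []
    for (left, right), f in zip(zip(boundaries, boundaries[1:]), frequencies):
        width = right - left
        centre = (left + right) / 2     # centre of the base = central value of the class
        height = f / width              # frequency density, so the AREA equals f
        blocks.append((centre, width, height))
    return blocks

# Hypothetical classes: 0-10, 10-20 and a wider 20-40.
bounds = [0, 10, 20, 40]
freqs = [5, 8, 6]
blocks = histogram_blocks(bounds, freqs)
for centre, width, height in blocks:
    print(f"centre {centre}, width {width}, height {height}")
```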
For example, the measurement of the lengths of 50 brass rods gave the
following frequency distribution:
This gives rise to the histogram:
Relative frequency histogram
A relative frequency histogram is identical in shape to the frequency histogram
but differs in that the vertical axis measures relative frequency.
Measure of central tendency
Mean
Coding for calculating the mean
Decoding
Coding with a grouped frequency distribution
Mode of a set of data
Mode of a grouped frequency distribution
Median of a set of data
Median with grouped data
Mean
The arithmetic mean $\bar{x}$ of a set of $n$ observations is their average:
$\text{mean} = \frac{\text{sum of observations}}{\text{number of observations}}$, that is $\bar{x} = \frac{\sum x}{n}$
When calculating from a frequency distribution, this becomes:
$\bar{x} = \frac{\sum xf}{n} = \frac{\sum xf}{\sum f}$
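As a check on $\bar{x} = \sum xf / \sum f$, here is a short Python sketch that applies it to the frequency table of the 30 readings earlier in this programme. The variable names are illustrative.

```python
# Frequency table of the 30 readings: value x -> frequency f.
freq = {25: 1, 26: 2, 27: 4, 28: 7, 29: 6, 30: 5, 31: 3, 32: 2}

sum_xf = sum(x * f for x, f in freq.items())   # sum of x*f
sum_f = sum(freq.values())                      # sum of f, equal to n
mean = sum_xf / sum_f

print(sum_xf, sum_f, round(mean, 4))
```

The result agrees with adding the 30 raw readings and dividing by 30, because $\sum xf$ is just that sum collected value by value.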
Coding for calculating the mean
A great deal of tedious work can be avoided by coding with a false mean.
Choose a convenient value of $x$ near the middle of the range (the false mean), subtract it from every value of $x$, and then divide by a suitable data interval to give the coded value $x_c$:
$x_c = \frac{x - \text{false mean}}{\text{data interval}}$
Then find the mean of the coded values, $\bar{x}_c$.
For example:
$\bar{x}_c = \frac{\sum x_c f}{\sum f} = \frac{-2.0}{60} = -0.0333$ to 4 dp
Decoding
Decoding requires the coding process to be reversed. This means multiplying by the appropriate data interval and then adding the false mean:
$\bar{x}_c = \frac{\sum x_c f}{\sum f} = \frac{-2.0}{60} = -0.0333$ to 4 dp, where $x_c = \frac{x - 30.8}{0.2}$
Therefore:
$\bar{x} = (-0.0333) \times 0.2 + 30.8 = 30.79$ to 2 dp
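Coding and decoding can be sketched in Python. The hypothetical helper `coded_mean` codes each value, averages the coded values, then decodes. It is applied to the frequency table of the 30 readings with an assumed false mean of 28 and data interval of 1. The last line repeats the decoding arithmetic of the worked example (coded mean −2.0/60, interval 0.2, false mean 30.8).

```python
def coded_mean(freq, false_mean, interval):
    """Mean of a frequency distribution found by coding with a false mean.

    freq maps each value x to its frequency f.
    Returns (decoded mean, coded mean).
    """
    sum_f = sum(freq.values())
    # Code each value: x_c = (x - false mean) / data interval
    sum_xc_f = sum((x - false_mean) / interval * f for x, f in freq.items())
    mean_c = sum_xc_f / sum_f
    # Decode: multiply by the data interval, then add the false mean back
    return mean_c * interval + false_mean, mean_c

freq = {25: 1, 26: 2, 27: 4, 28: 7, 29: 6, 30: 5, 31: 3, 32: 2}
mean, mean_c = coded_mean(freq, false_mean=28, interval=1)
print(round(mean_c, 4), round(mean, 4))

# Decoding step of the worked example
print(round((-2.0 / 60) * 0.2 + 30.8, 2))
```

Whatever false mean is chosen, the decoded mean is the same. The choice only affects how small the numbers are in the intermediate working.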
Coding with a grouped frequency distribution
The procedure is similar, with the false mean taken as the centre value of a convenient class:
$\bar{x}_c = \frac{\sum x_c f}{\sum f} = \frac{11}{50} = 0.22$
Decoding again requires the coding process to be reversed. This means multiplying by the appropriate data interval and then adding the false mean:
$\bar{x}_c = \frac{\sum x_c f}{\sum f} = \frac{11}{50} = 0.22$, where $x_c = \frac{x_m - 2.30}{0.03}$ and $x_m$ is the centre value of each class.
Therefore:
$\bar{x} = (0.22) \times 0.03 + 2.30 = 2.3066$ to 4 dp
giving:
$\bar{x} = 2.307$ to 3 dp
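For grouped data, the coding uses the class centre values $x_m$. This Python sketch applies the idea to a hypothetical grouping of the 30 readings into classes 25–26, 27–28, 29–30 and 31–32 (boundaries 24.5, 26.5, ..., 32.5, class interval 2). The false mean is the centre value 27.5. The function name is illustrative. The last line repeats the decoding arithmetic of the brass-rod example.

```python
def grouped_coded_mean(centres, frequencies, false_mean, interval):
    """Mean of grouped data, coding the class centre values x_m."""
    sum_f = sum(frequencies)
    # Coded centre values: x_c = (x_m - false mean) / class interval
    sum_xc_f = sum((xm - false_mean) / interval * f
                   for xm, f in zip(centres, frequencies))
    mean_c = sum_xc_f / sum_f
    return mean_c * interval + false_mean    # decode

centres = [25.5, 27.5, 29.5, 31.5]
frequencies = [3, 11, 11, 5]
grouped_mean = grouped_coded_mean(centres, frequencies, false_mean=27.5, interval=2)
print(round(grouped_mean, 4))

# Decoding step of the brass-rod example: 0.22 * 0.03 + 2.30
print(round((11 / 50) * 0.03 + 2.30, 3))
```

The grouped result (28.7) differs slightly from the mean of the ungrouped readings (about 28.73). Grouping treats every reading as if it sat at its class centre.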
Mode of a set of data
The mode of a set of data is that value of the variable that occurs most often.
The mode of:
2, 2, 6, 7, 7, 7, 10, 13
is clearly 7. The mode may not be unique; for instance, the modes of:
23, 25, 25, 25, 27, 27, 28, 28, 28
are 25 and 28, each of which occurs three times.
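Both examples can be checked in Python with the standard library's `statistics.multimode` (Python 3.8+), which returns every most-common value in order of first appearance. The hand-written `modes` function is an illustrative equivalent built on `collections.Counter`.

```python
from collections import Counter
from statistics import multimode

def modes(values):
    """All values that occur most often (there may be more than one)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

print(multimode([2, 2, 6, 7, 7, 7, 10, 13]))            # [7]
print(multimode([23, 25, 25, 25, 27, 27, 28, 28, 28]))  # [25, 28]
```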
Mode of a grouped frequency distribution
The modal class of grouped data is the class with the greatest frequency.
For example, the modal class of the grouped distribution in the text is the third class.
Plotting the histogram of the data enables the mode to be found:
The mode can also be calculated algebraically. If:
L = lower boundary value of the modal class
l = AB = difference in frequency at the lower boundary
u = CD = difference in frequency at the upper boundary
c = class interval
the mode is then:
$\text{mode} = L + \left(\frac{l}{l + u}\right) c$
Median of a set of data
The median is the value of the middle datum when the data is arranged in
ascending or descending order.
If there is an even number of values, the median is the average of the two middle data.
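The two cases (odd and even number of values) translate directly into Python. The illustrative `median` function below is cross-checked against the standard library's `statistics.median`, using the 30 readings from earlier in this programme. With 30 values, the median is the average of the 15th and 16th ordered values.

```python
import statistics

def median(values):
    """Middle datum of the ordered data, or the average of the two middle data."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                       # odd number of values
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2     # even number of values

data = [28, 31, 29, 27, 30, 29, 29, 26, 30, 28,
        28, 29, 27, 26, 32, 28, 32, 31, 25, 30,
        27, 30, 29, 30, 28, 29, 31, 27, 28, 28]

print(median(data), statistics.median(data))
```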
Median with grouped data
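In the case of grouped data, the median divides the total area of the histogram into two equal parts. It therefore lies in the block containing the middle value (here the largest block), which it divides into two parts, A and B, such that:
$6 + 12 + 15 + A = B + 13 + 9 + 5$
In this frequency distribution $A + B = 20$, so that $A = 7$ and:
$\text{width of } A = \frac{7}{20} \times \text{class interval} = 0.35 \times 0.3 = 0.105$
Therefore, median $= 30.85 + 0.105 = 30.96$ to 2 dp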
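Finding A amounts to walking up the cumulative frequency until half the total is reached. This Python sketch does that for the frequencies above. It assumes, as the histogram suggests, equal classes of interval 0.3, so the first class starts at 30.85 − 3 × 0.3 = 29.95. The function name and that starting boundary are assumptions. The unrounded result 30.955 is the value quoted above as 30.96 to 2 dp.

```python
def grouped_median(frequencies, first_lower_boundary, c):
    """Median of grouped data with equal class intervals c."""
    half = sum(frequencies) / 2          # the median splits the total area in two
    cum = 0                              # cumulative frequency below the current class
    for k, f in enumerate(frequencies):
        if cum + f >= half:              # the middle value falls in class k
            A = half - cum               # part of this block lying below the median
            L = first_lower_boundary + k * c
            return L + (A / f) * c
        cum += f
    raise ValueError("empty distribution")

m = grouped_median([6, 12, 15, 20, 13, 9, 5], first_lower_boundary=29.95, c=0.3)
print(f"{m:.3f}")
```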
Dispersion
Range
Standard deviation
Alternative formula for the standard deviation
Range
The mean, mode and median give important information about the central tendency of data, but they say nothing about the spread, or dispersion, of the data about the centre.
For example, the two sets of data:
26, 27, 28, 29, 30 and 5, 19, 20, 36, 60
both have a mean of 28, but the first is clearly more tightly clustered about the mean than the second. The simplest measure of dispersion is the range: the difference between the highest and the lowest values.
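A short Python check of the two sets above confirms that they share a mean of 28 while their ranges differ greatly. The helper name `data_range` is illustrative.

```python
def data_range(values):
    """Difference between the highest and the lowest values."""
    return max(values) - min(values)

tight = [26, 27, 28, 29, 30]
spread = [5, 19, 20, 36, 60]
for xs in (tight, spread):
    print(sum(xs) / len(xs), data_range(xs))
```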
Standard deviation
The standard deviation is the most widely used measure of dispersion.
The variance of a set of data is the average of the squares of the differences between each datum and the mean:
$\text{variance} = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n}$
This has the disadvantage of being measured in the square of the units of the data. The standard deviation is the square root of the variance:
$\text{standard deviation} = \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}$
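Here is a minimal Python sketch of the variance and standard deviation as defined above, dividing by $n$, applied to the two data sets from the range example. It is cross-checked against `statistics.pstdev`, which also divides by $n$. By contrast, `statistics.stdev` divides by $n - 1$ and would give larger values. The function names are illustrative.

```python
import math
import statistics

def variance(xs):
    """Average squared deviation from the mean (divides by n)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def std_dev(xs):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(xs))

tight = [26, 27, 28, 29, 30]
spread = [5, 19, 20, 36, 60]
for xs in (tight, spread):
    print(variance(xs), round(std_dev(xs), 4), round(statistics.pstdev(xs), 4))
```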
Alternative formula for the standard deviation
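Since $\sum_{i=1}^{n} x_i = n\bar{x}$:
$\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \frac{\sum_{i=1}^{n} (x_i^2 - 2x_i\bar{x} + \bar{x}^2)}{n} = \frac{\sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2}{n} = \frac{\sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2}{n} = \frac{\sum_{i=1}^{n} x_i^2}{n} - \bar{x}^2$
That is:
$\sigma = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n} - \bar{x}^2} = \sqrt{\overline{x^2} - \bar{x}^2}$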
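The identity can be checked numerically in Python: the "mean of the squares minus the square of the mean" form gives the same standard deviation as the direct definition for the two data sets used earlier. The function names are illustrative. One caveat applies to floating-point arithmetic: the alternative form subtracts two nearly equal numbers when the mean is large compared with the spread, so it can lose accuracy there.

```python
import math

def std_dev_alt(xs):
    """sigma = sqrt(mean of the squares - square of the mean)."""
    n = len(xs)
    mean = sum(xs) / n
    mean_sq = sum(x * x for x in xs) / n
    return math.sqrt(mean_sq - mean ** 2)

def std_dev_direct(xs):
    """sigma = sqrt(sum of squared deviations / n)."""
    n = len(xs)
    mean = sum(xs) / n
    return math.sqrt(sum((x - mean) ** 2 for x in xs) / n)

for xs in ([26, 27, 28, 29, 30], [5, 19, 20, 36, 60]):
    print(round(std_dev_alt(xs), 6), round(std_dev_direct(xs), 6))
```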
Frequency polygons
If the centre points of the tops of the rectangular blocks of a frequency histogram are joined by straight lines, the resulting figure is called a frequency polygon.
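The vertices of a frequency polygon are the points (central value, frequency) of the blocks. This Python sketch computes them for a hypothetical grouping of the 30 readings (boundaries 24.5, 26.5, 28.5, 30.5, 32.5). The function name is illustrative.

```python
def polygon_vertices(boundaries, frequencies):
    """(central value, frequency) for each block: the points joined by the polygon."""
    centres = [(a + b) / 2 for a, b in zip(boundaries, boundaries[1:])]
    return list(zip(centres, frequencies))

vertices = polygon_vertices([24.5, 26.5, 28.5, 30.5, 32.5], [3, 11, 11, 5])
print(vertices)
```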
Frequency curves
If the frequency polygon is smoothed out, the resulting figure is a frequency curve.
Normal distribution curve
Values within 1 standard deviation of the mean
Values within 2 standard deviations of the mean
Values within 3 standard deviations of the mean
Values within 1 standard deviation of the mean
When very large numbers of observations are made and the range is divided into
a very large number of ‘narrow’ classes, the resulting frequency curve, in many
cases, approximates closely to a standard curve known as the normal distribution
curve.
The normal distribution curve is symmetrical about its centre line, which coincides with the mean of the observations.
There are two points on the normal distribution curve where the concavity changes (points of inflection), one from concave to convex and the other from convex to concave. The horizontal distance of each of these points from the mean line is one standard deviation.
Approximately 68% of the area beneath the normal distribution curve lies within one standard deviation of the mean.
Values within 2 standard deviations of the mean
Approximately 95% of the area beneath the normal distribution curve lies within two standard deviations of the mean.
Values within 3 standard deviations of the mean
Approximately 99.7% of the area beneath the normal distribution curve lies within three standard deviations of the mean.
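The 68%, 95% and 99.7% figures are rounded values of a standard result. For a normal distribution, the fraction of the area within $k$ standard deviations of the mean is $\operatorname{erf}(k/\sqrt{2})$. A minimal Python sketch uses the standard library's `math.erf`. The function name is illustrative.

```python
import math

def proportion_within(k):
    """Fraction of the area under a normal curve within k standard deviations of the mean."""
    # For a standard normal variable Z, P(-k < Z < k) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{100 * proportion_within(k):.2f}%")
```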
The following diagram summarizes this information:
Standardized normal curve
The standardized normal curve is the same shape as the normal curve, but its axis of symmetry is the vertical axis; the horizontal axis carries a scale of z-values, where:
$z = \frac{x - \bar{x}}{\sigma}$
and the area beneath the curve is 1.
Its equation is:
$\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$
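Here is a minimal Python sketch of the two formulas above: standardizing a value to $z$, and evaluating $\phi(z)$. A simple midpoint-rule estimate confirms that the area beneath the curve is (numerically) 1. The example value, mean and standard deviation passed to `z_score` are hypothetical. The integration limits of ±8 are an assumption, chosen because the area outside them is negligible.

```python
import math

def z_score(x, mean, sd):
    """Standardize x: z = (x - mean) / sd."""
    return (x - mean) / sd

def phi(z):
    """Standardized normal curve: exp(-z**2 / 2) / sqrt(2*pi)."""
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

def area(f, a, b, steps=16000):
    """Midpoint-rule estimate of the area under f between a and b."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

print(z_score(30, 28, 2))          # hypothetical x = 30, mean 28, sd 2
print(round(phi(0), 6))            # peak height at z = 0
print(round(area(phi, -8, 8), 6))  # total area beneath the curve
```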
Learning outcomes
Distinguish between discrete and continuous data
Construct frequency and relative frequency tables for grouped and ungrouped discrete
data
Determine class boundaries, class intervals and central values for discrete and
continuous data
Construct a histogram and a frequency polygon
Determine the mean, median and mode of grouped and ungrouped data
Determine the range, variance and standard deviation of discrete data
Measure dispersion of data using the normal and standard normal curves.