Lesson One Summary Statistics File

Summary Statistics
One of the main purposes of statistics is to draw
conclusions about a (usually large) population from a
(usually small) sample of observed values.
Population – The collection of all individuals, items or data
under consideration in a statistical study.
Sample – That part of the population from which information is
collected.
Descriptive Statistics – Methods of organizing and summarizing information in a clear and effective way.
Inferential Statistics – Methods of drawing conclusions about a population based on information obtained from a sample of the population.
Parameter – A descriptive measure for a population.
Statistic – A descriptive measure for a sample.
Summary statistics for Discrete Data
Median: The middle value, after the observations have been arranged in order of magnitude. If the total number of values is even, the median is halfway between the two middle values.
Mode: The value (or values) that occur most frequently.
For grouped data, the class with the highest frequency is called the modal class.
Mean:
The Mean of the Population is generally unknown.
The Sample Mean serves as an unbiased estimate of the
Population Mean.
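Population Mean: $\mu = \dfrac{\sum_{i=1}^{k} f_i x_i}{n}$
Sample Mean: $\bar{x} = \dfrac{\sum_{i=1}^{k} f_i x_i}{n}$
where $f_i$ is the frequency of the value $x_i$ and $n = \sum_{i=1}^{k} f_i$.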
The shoe sizes of the members of a football team are:
10, 10, 8, 11, 10, 9, 9, 10, 11, 9, 10
Determine the mean, median and mode.
Answer: 9.73, 10, 10
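As a quick check of this answer, here is a short sketch using Python's standard `statistics` module (the choice of tool is ours, not part of the lesson):

```python
from statistics import mean, median, mode

# Shoe sizes from the example above
sizes = [10, 10, 8, 11, 10, 9, 9, 10, 11, 9, 10]

avg = mean(sizes)     # sum of values / number of values = 107 / 11
mid = median(sizes)   # middle value of the sorted list (the 6th of 11)
most = mode(sizes)    # most frequent value (10 occurs five times)

print(round(avg, 2), mid, most)  # 9.73 10 10
```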
Mean of Grouped Data
$\bar{x} = \dfrac{\sum_{j=1}^{k} f_j x_j}{n}$
where each value $x_j$ is the midpoint of its class.
Each day, x, the number of diners in a restaurant, was recorded and the following grouped frequency table was obtained.
x (number of diners)   16-20   21-25   26-30   31-35   36-40
Number of days            67      74      38      39      42
Using the above grouped data, find an estimate of the mean.
Answer: 26.4
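The grouped-mean formula can be sketched in a few lines. `grouped_mean` is a hypothetical helper written for illustration; it takes each class midpoint as $(\text{lower} + \text{upper})/2$, which is how the lesson's answers are obtained.

```python
def grouped_mean(classes, freqs):
    """Estimate the mean as sum(f_j * x_j) / n, with x_j the class midpoint."""
    midpoints = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    return sum(f * x for f, x in zip(freqs, midpoints)) / n

# Restaurant data: classes of diners per day, and the number of days in each class
classes = [(16, 20), (21, 25), (26, 30), (31, 35), (36, 40)]
days = [67, 74, 38, 39, 42]

estimate = grouped_mean(classes, days)  # 6855 / 260
print(round(estimate, 1))  # 26.4
```

The same helper also reproduces the answer of 31 for the interval exercise that follows.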
Find the mean for the following set of data:
Interval    20-24   25-29   30-34   35-39   40-44
Frequency       1       6      10       2       1
Answer: 31
A test marked out of 100 is written by 800 students. The cumulative frequency graph for the marks is given below.
(a) Write down the number of students who scored 40 marks or less on the test.
(b) The middle 50% of test results lie between marks a and b, where a < b. Find a and b.
Answers: (a) 100
(b) a = 55, b = 75
SPEC06/HL1/3
The heights of 60 children entering a school were measured. The
following cumulative frequency graph illustrates the data obtained.
Estimate
(a) the median height;
(b) the mean height.
Answers: Median = 1.04, Mean = 1.05
M04/HL1/15
The box and whisker plots shown represent the heights of female students and the heights of male students at a certain school.
(a) What percentage of female students are shorter than any male
students?
(b) What percentage of male students are shorter than some female
students?
(c) From the diagram, estimate the mean height of the male students.
Answer: (a) 25%
(b) 75%
(c) 172 cm
N00/HL1/4
A die is rolled twenty times with the following results:
Outcome      1   2   3   4   5   6
Frequency    2   4   a   7   2   b
Given that the mean is 3.6, find the values of a and b.
Answer: a = 2, b = 3
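Both conditions are linear in a and b: the frequencies must total 20, so a + b = 5, and $\sum fx$ must equal 20 × 3.6 = 72, so 48 + 3a + 6b = 72, i.e. a + 2b = 8. The sketch below (with a hypothetical helper `solve_die`) checks this by searching every pair of non-negative integer frequencies:

```python
def solve_die(total=20, target_mean=3.6):
    """Return all (a, b) making the die table sum to `total` with the given mean."""
    solutions = []
    for a in range(total + 1):
        for b in range(total + 1 - a):
            freqs = [2, 4, a, 7, 2, b]  # frequencies of outcomes 1..6
            if sum(freqs) != total:
                continue
            weighted = sum(outcome * f for outcome, f in zip(range(1, 7), freqs))
            # Compare sum(f*x) with n * mean to avoid dividing
            if abs(weighted - target_mean * total) < 1e-9:
                solutions.append((a, b))
    return solutions

print(solve_die())  # [(2, 3)]
```

Only one pair satisfies both equations, which agrees with the algebraic solution b = 3, a = 2.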
Variance
The Variance measures how far the data are spread out from the Mean.
Population Variance
$\sigma^2 = \dfrac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{n} = \dfrac{\sum_{i=1}^{k} f_i x_i^2}{n} - \mu^2$
The population variance is generally unknown. The Sample Variance, which divides by n, is a biased estimate of the Population Variance: on average it slightly underestimates it.
Sample Variance
$s_n^2 = \dfrac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n} = \dfrac{\sum_{i=1}^{k} f_i x_i^2}{n} - \bar{x}^2$
Therefore, we also use an unbiased estimate of the Population Variance, which divides by n − 1 instead of n.
Unbiased Estimate of the Population Variance
$s_{n-1}^2 = \dfrac{n}{n-1}\, s_n^2 = \dfrac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n-1} = \dfrac{\sum_{i=1}^{k} f_i x_i^2}{n-1} - \dfrac{n}{n-1}\,\bar{x}^2$
Standard Deviation:
The Standard Deviation is the square root of the variance.
Sample Standard Deviation: $s_n = \sqrt{\dfrac{\sum_{i=1}^{k} f_i (x_i - \bar{x})^2}{n}}$
Population Standard Deviation: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{k} f_i (x_i - \mu)^2}{n}}$
The nine planets of the solar system have approximate
equatorial diameters (in thousands of km) as follows:
4.9, 12.1, 12.8, 6.8, 142.8, 120.0, 52.4, 49.5, 2.5
Determine the standard deviation of these diameters.
Answer: 49.7
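A sketch of the same calculation with Python's `statistics` module. `pstdev` divides by n, which is the convention behind the answer of 49.7; `stdev` (dividing by n − 1) is shown only for comparison.

```python
from statistics import pstdev, stdev

# Equatorial diameters in thousands of km, as listed above
diameters = [4.9, 12.1, 12.8, 6.8, 142.8, 120.0, 52.4, 49.5, 2.5]

sigma = pstdev(diameters)      # divides by n, as in the lesson's answer
s_unbiased = stdev(diameters)  # divides by n - 1, for comparison only

print(round(sigma, 1))       # 49.7
print(round(s_unbiased, 1))  # 52.7
```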
Find the standard deviation for the following set of data:
Interval     1-7   8-14   15-21   22-28
Frequency      4      5      10       6
Answer: 7.01
Grouped data for the number of days to maturity of 40 short-term investments are given below. Compute the sample mean and standard deviation.
Days to maturity   Frequency
30-39                  3
40-49                  1
50-59                  8
60-69                 10
70-79                  7
80-89                  7
90-99                  4
Total                 40
Answer:
Mean = 68.0 days
Standard Deviation = 16.2 days
Find the unbiased standard deviation:
Answer:
Unbiased Standard Deviation = 16.4 days
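A sketch for grouped data, assuming (as the lesson's answers do) that each class is represented by its midpoint, e.g. 34.5 for 30-39. The helper `grouped_stats` is hypothetical; it returns the mean, $s_n$ and the unbiased $s_{n-1}$.

```python
import math

def grouped_stats(classes, freqs):
    """Mean, s_n and s_{n-1} for grouped data, using class midpoints."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(mids, freqs)) / n
    ss = sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs))  # sum f(x - mean)^2
    return mean, math.sqrt(ss / n), math.sqrt(ss / (n - 1))

classes = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]
freqs = [3, 1, 8, 10, 7, 7, 4]

mean, s_n, s_unbiased = grouped_stats(classes, freqs)
print(round(mean, 1), round(s_n, 1), round(s_unbiased, 1))  # 68.0 16.2 16.4
```

The same helper gives $s_n \approx 7.01$ for the interval exercise above, and for the tire data below it gives a mean of about 30.7 thousand km and an unbiased variance (the square of the third value) of about 70.9.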
A machine tests the distance, w, measured in thousands of km, that car tires travel before the tire wear reaches a critical amount. For a random sample of tires, the results are summarized below:
Distance (thousands of km)   Number of tires
0 ≤ w < 25                         12
25 ≤ w < 30                        23
30 ≤ w < 35                        48
35 ≤ w < 45                        15
45 ≤ w < 60                         3
(a) Find the grouped mean for this data.
(b) Find the unbiased estimate of the variance based on the grouped data.
Answers:
(a) 30700 km
(b) 70.9 (in units of (thousands of km)²)
A sample of 70 batteries was tested to see how long they last. The results were:
Determine:
(a) the sample standard deviation;
(b) an unbiased estimate of the standard deviation of the population from which this sample is taken.
Answers:
(a) 21.4 hours
(b) 21.6 hours
M00/HL1/4
For a set of 9 numbers, $\sum (x - \bar{x})^2 = 60$ and $\sum x^2 = 285$. Find the mean of the numbers.
Answer: mean = 5
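A worked derivation, using the identity $\sum (x - \bar{x})^2 = \sum x^2 - n\bar{x}^2$ (the sample variance formula multiplied through by n) with n = 9:

```latex
\begin{aligned}
\sum (x - \bar{x})^2 &= \sum x^2 - n\bar{x}^2 \\
60 &= 285 - 9\bar{x}^2 \\
\bar{x}^2 &= 25 \quad\Rightarrow\quad \bar{x} = 5
\end{aligned}
```

Strictly, the given sums fix only $\bar{x}^2 = 25$; the stated answer takes the positive root.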
For a given frequency distribution: $\sum f (x - \bar{x})^2 = 182.3$, $\sum f x^2 = 1025$ and $\sum f = 30$. Find $\sum fx$.
Answer: 159
A teacher drives to school. She records the time taken on each of 20
randomly chosen days. She finds that
$\sum_{i=1}^{20} x_i = 626$ and $\sum_{i=1}^{20} x_i^2 = 19780.8$,
where $x_i$ denotes the time, in minutes, taken on the $i$th day.
Calculate an unbiased estimate of
(a) the mean time taken to drive to school;
(b) the variance of the time taken to drive to school.
Answers:
(a) 31.3
(b) 9.84
M03/HL1/19
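The two answers follow directly from the summary sums: $\bar{x} = \sum x_i / n$ and $s_{n-1}^2 = \left(\sum x_i^2 - n\bar{x}^2\right)/(n-1)$, which is the last form of the unbiased-variance formula with all $f_i = 1$. A minimal sketch:

```python
n = 20
sum_x = 626.0      # sum of x_i
sum_x2 = 19780.8   # sum of x_i^2

xbar = sum_x / n                                 # unbiased estimate of the mean
s2_unbiased = (sum_x2 - n * xbar ** 2) / (n - 1)  # (19780.8 - 19593.8) / 19

print(round(xbar, 1), round(s2_unbiased, 2))  # 31.3 9.84
```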
Chebychev’s Rule
Property 1: At least 75% of the data lie within two standard
deviations to either side of the mean.
Property 2: At least 89% of the data lie within three standard
deviations to either side of the mean.
Property 3: In general, for any number k > 1, at least $1 - \dfrac{1}{k^2}$ of the data lie within k standard deviations to either side of the mean.
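Properties 1 and 2 are just Property 3 evaluated at k = 2 and k = 3. A small sketch, with a hypothetical helper `chebychev_lower_bound`:

```python
def chebychev_lower_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the rule is stated for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(k, f"{chebychev_lower_bound(k):.0%}")
# 2 75%
# 3 89%
```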
Z score: The z-score for a data value is the number of standard
deviations that the data value is away from the mean.
Sample z-score: $z = \dfrac{x - \bar{x}}{s}$
Population z-score: $z = \dfrac{x - \mu}{\sigma}$
It is known that a coffee machine dispenses an average of 6 fluid
ounces of coffee with a standard deviation of 0.2 fluid ounces. A
cup of coffee dispensed from the machine is found to contain 7.1
fluid ounces of coffee. Determine and interpret the z-score for
this cup of coffee. Does this cup of coffee contain an unusually
large amount?
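Answer:
The z-score is (7.1 − 6)/0.2 = 5.5, so this cup contains 5.5 standard deviations more coffee than the mean.
By Chebychev's rule, at least $1 - 1/5.5^2 \approx 96.7\%$ of all cups lie within 5.5 standard deviations of the mean, so this cup contains more coffee than at least 96.7% of all cups dispensed. It is an unusually large amount.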
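A sketch of the coffee calculation: the population z-score, followed by the Chebychev bound with k equal to that z-score. The function name `z_score` is a hypothetical helper.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

z = z_score(7.1, 6.0, 0.2)
bound = 1 - 1 / z ** 2  # Chebychev: at least this fraction lies within z SDs of the mean

print(round(z, 2), f"{bound:.1%}")  # 5.5 96.7%
```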
Two major topics of inferential statistics:
1. Using the sample mean to make inferences about
the population mean.
2. Using the sample standard deviation to make
inferences about the population standard deviation.