Slides - Dr Frost Maths

S2 Chapter 6: Populations and Samples
Dr J Frost ([email protected])
www.drfrostmaths.com
Last modified: 2nd November 2015
Populations and samples
A population is: the full collection of people or things.
A sample is: some subset of the population intended to represent the population.
Advantages of sampling
• Cheaper/quicker than taking a census.
• Useful when testing of items results in their destruction (e.g. lifetime of a light bulb).
Data obtained from all members of the population is known as a census.
Disadvantages of sampling
• Potential for bias.
• Natural variation between any two samples due to variation in data.
Sampling key terms
! Each individual thing in the population that can be sampled is known as a sampling unit.
! The list of all those within the population that can be sampled is known as the sampling frame.
Random sampling
Suppose that the heights of people in a population are represented using a random variable $X$, where $X$ is (as you might expect) normally distributed, e.g. $X \sim N(1.5, 0.3)$.
[Figure: the density $f(x)$ plotted against height $x$.]
Bro Helping Hand: This might conceptually seem confusing as a population is a list of things. The population can be represented as a distribution where the outcomes are possible samples. For example, if a population is all possible lottery tickets, then the distribution representing it is a uniform distribution whose outcomes are all the possible tickets.
We want a sample with $n$ things in it.
How could we represent the possible choice of the 1st member of our sample?
A random variable $X_1$, where $X_1 \sim N(1.5, 0.3)$.
Bro Helping Hand: Notice we're representing the possible choice of the item in the sample, not the item itself. $X_1$ must have the same distribution as $X$, because our sample item is drawn from the population.
How could we represent the possible choice of the $n$th member of our sample?
A random variable $X_n$, where $X_n \sim N(1.5, 0.3)$.
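The idea that each sample member is its own random variable with the population's distribution can be sketched in Python. This is only an illustration: the function name `draw_sample` and the seed are made up here, and reading the 0.3 in $N(1.5, 0.3)$ as the variance (so the standard deviation is $\sqrt{0.3}$) is an assumption.

```python
import math
import random

def draw_sample(n, mean=1.5, variance=0.3, seed=None):
    """Return n independent draws X_1..X_n, each from N(mean, variance)."""
    rng = random.Random(seed)
    sd = math.sqrt(variance)  # assumption: 0.3 is the variance, not the standard deviation
    # Every draw uses the same distribution and does not depend on earlier draws.
    return [rng.gauss(mean, sd) for _ in range(n)]

sample = draw_sample(5, seed=1)  # one possible realisation of X_1, ..., X_5
print(sample)
```

Running it again with a different seed gives a different list of heights, which is exactly why each $X_i$ is treated as a random variable rather than a fixed number.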
Random sampling
! A simple random sample, of size $n$, is one taken so that every possible sample of size $n$ has an equal chance of being selected.
It consists of the observations $X_1, X_2, \ldots, X_n$ from a population, where the $X_i$:
• are independent random variables
• have the same distribution as the population.
This means for example that if the first person chosen for our sample is Indian, that doesn't make it any less or more likely our second choice will be Indian, i.e. our second choice is independent of the first.
This will all become a lot clearer once we do an example…
Random sampling
We might wish to calculate some numerical property of a population or a
sample, e.g. mean, variance, mode, range.
! A population parameter is a quantity calculated from the population.
! A statistic is a quantity calculated (solely) from the observations in a sample.
e.g. $\frac{X_2 + X_5 + X_8}{3}$ is a statistic (the average of the 2nd, 5th and 8th items in the sample).
$\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2$ is a statistic. But $\frac{\sum X^2}{n} - \mu^2$ is not, as it involves the population mean $\mu$, which is not known purely from the sample.
The idea of a statistic is that we hope it resembles the equivalent population parameter. For example, if we're trying to find the mean age in England, we might take a sample, calculate the sample mean age $\bar{X}$, and hope this represents the 'true' unknown population mean age $\mu$…
(Recall that sample mean = $\bar{X}$ and population mean = $\mu$)
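One way to see the difference between a statistic and a non-statistic is to look at what each calculation needs as input. The sketch below is illustrative only: the function names and the list `observations` are hypothetical, not from the slides.

```python
def average_of_items(sample, positions=(2, 5, 8)):
    """(X_2 + X_5 + X_8) / 3, using 1-based positions as on the slide."""
    return sum(sample[i - 1] for i in positions) / len(positions)

def mean_square_minus_square_mean(sample):
    """sum(X^2)/n - (sum(X)/n)^2: needs nothing beyond the sample itself."""
    n = len(sample)
    return sum(x * x for x in sample) / n - (sum(sample) / n) ** 2

def mean_square_minus_mu_squared(sample, mu):
    """sum(X^2)/n - mu^2: not a statistic, because mu must come from outside the sample."""
    return sum(x * x for x in sample) / len(sample) - mu ** 2

observations = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical sample values
print(average_of_items(observations))
print(mean_square_minus_square_mean(observations))
```

The first two functions take only the sample, so they are statistics. The third cannot be evaluated without the extra argument `mu`, which mirrors the point that the population mean is not known purely from the sample.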
Sampling Distribution of a Statistic
! The sampling distribution of a statistic gives all the values of a statistic and the probability that each would happen by chance alone.
Suppose we had 10 families which form the population of an island (The Isle of Bob), for which we know the number of children in each family. Suppose we took a (very small!) sample of 2 families.
Number of children in each of the 10 families: 0, 1, 1, 0, 1, 1, 2, 0, 0, 0
Statistics for this sample could be the mode number of children, median, maximum, mean, …
Possible samples:
$X_1$   $X_2$   $P(X_1, X_2)$        Median   Max
0       0       0.5 × 0.5 = 0.25     0        0
0       1       0.5 × 0.4 = 0.2      0.5      1
0       2       0.5 × 0.1 = 0.05     1        2
1       0       0.4 × 0.5 = 0.2      0.5      1
1       1       0.4 × 0.4 = 0.16     1        1
1       2       0.4 × 0.1 = 0.04     1.5      2
2       0       0.1 × 0.5 = 0.05     1        2
2       1       0.1 × 0.4 = 0.04     1.5      2
2       2       0.1 × 0.1 = 0.01     2        2
Sampling distribution for sample maximum:
Max $M$   $P(M)$
0         0.25
1         0.56
2         0.19
Let's reflect on what we did.
#1: We're interested in some statistic for each sample (let's say the sample maximum).
#2: We considered all possible samples, and the probability of each sample occurring.
#3: Thus we now have a distribution over possible values of the statistic across all the possible samples we could have had, i.e. the 'sampling distribution'.
Note: Because each thing in the sample is independently drawn from the population, we technically have sampling with replacement, and hence the same item could be in the sample twice. In practice however (and in exams) you won't have to worry about this, as the population in exams is assumed to be infinitely large.
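The three steps above (pick a statistic, list every possible sample with its probability, then collect the probabilities by statistic value) can be sketched in Python. The helper name `sampling_distribution` is made up for this illustration, and exact fractions are used so that no rounding creeps in.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

children = [0, 1, 1, 0, 1, 1, 2, 0, 0, 0]  # the 10 families on the Isle of Bob

def sampling_distribution(population, size, statistic):
    """Exact sampling distribution of `statistic` over ordered samples drawn with replacement."""
    counts = Counter(population)
    total = len(population)
    dist = Counter()
    # Step 2: every ordered sample of distinct values, e.g. (0, 1) and (1, 0) separately.
    for sample in product(counts, repeat=size):
        prob = Fraction(1)
        for value in sample:
            prob *= Fraction(counts[value], total)  # independent draws multiply
        # Step 3: add this sample's probability to the value of the statistic it gives.
        dist[statistic(sample)] += prob
    return dict(dist)

max_dist = sampling_distribution(children, 2, max)  # Step 1: statistic = sample maximum
for value in sorted(max_dist):
    print(value, float(max_dist[value]))
```

Passing a different function in place of `max` (for example `statistics.median`) gives the sampling distribution of that statistic from the same list of samples.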
Exam Example
Edexcel S2 May 2013 Q1
Key Points:
a) Ensure you don't forget possibilities through other possible orderings, etc.
b) If we know all the possible values of the statistic, we can find the probability of the last by just subtracting from 1 (as it's a probability distribution!)
a As per tip, ordering matters!
(1p, 5p, 5p), (5p, 1p, 5p), (5p, 5p, 1p)
(2p, 5p, 5p), (5p, 2p, 5p), (5p, 5p, 2p)
(5p, 5p, 5p)
b For first three possibilities, probability is $3 \times 0.5 \times 0.3^2 = 0.135$
For next three: $3 \times 0.2 \times 0.3^2 = 0.054$
Last: $0.3^3 = 0.027$
$P(M = 5) = 0.216$
c Possible values of the statistic $M$ (the median) are 1p, 2p, 5p.
$P(M = 1) = 3 \times 0.5^2 \times 0.2 + 3 \times 0.5^2 \times 0.3 + 0.5^3 = 0.5$
$P(M = 2) = 1 - 0.5 - 0.216 = 0.284$ (since this is the only other possibility)
$m$          1     2       5
$P(M = m)$   0.5   0.284   0.216
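The working above can be checked by brute force. The sketch below assumes the coin probabilities implied by the working (1p: 0.5, 2p: 0.2, 5p: 0.3) and a sample of three coins; the question text itself is not reproduced in these slides, so treat those inputs as inferred rather than quoted.

```python
from fractions import Fraction
from itertools import product
from statistics import median

# Assumption: probabilities read off the working above (1p: 0.5, 2p: 0.2, 5p: 0.3).
coin_probs = {1: Fraction(1, 2), 2: Fraction(1, 5), 5: Fraction(3, 10)}

median_dist = {}
for sample in product(coin_probs, repeat=3):  # all 27 ordered samples of three coins
    prob = Fraction(1)
    for coin in sample:
        prob *= coin_probs[coin]
    m = median(sample)  # middle value of the three coins
    median_dist[m] = median_dist.get(m, 0) + prob

for m in sorted(median_dist):
    print(f"P(M = {m}) = {float(median_dist[m])}")
```

Because every ordering is enumerated separately, this avoids the "forgotten orderings" trap in Key Point (a), and the three probabilities sum to 1 as Key Point (b) relies on.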
Test Your Understanding
Edexcel S2 June 2007 Q4
Step 1: List possible samples (and statistic for each if possible).
Step 2: Use this to work out the probability of obtaining each value of the statistic.
$m$          5         10
$P(M = m)$   0.15625   0.84375
Sampling Distribution by Inspection
Sometimes it is not practical to list out all the possible samples, but we can tell what
the sampling distribution is by thinking about what the statistic represents.
Q A school wishes to introduce a school uniform and is seeking to find out the support this idea has among the students at the school. The random variable $X$ is defined as:
$X = \begin{cases} 1, & \text{if student would support idea} \\ 0, & \text{otherwise} \end{cases}$
a. Suggest a suitable population and the parameter of interest.
b. A random sample of 15 students is asked if they would support the idea. The random sample is represented by $X_1, X_2, \ldots, X_{15}$. Write down the sampling distribution of the statistic $Y = \sum_{i=1}^{15} X_i$.
a The population is the responses of the people, represented as 0s and 1s (note, it is not the people themselves!)
Population parameter of interest (based on the original question posed by the school) is the proportion $p$ of students who support the idea.
b Think what $Y$ actually represents…
$Y$ = the number of students who support the idea. Since the sample is random, observations are independent, each with a constant probability $p$ of "success". These are conditions for a Binomial Distribution! $Y \sim B(15, p)$
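The "by inspection" answer can be sanity-checked numerically: build the distribution of $Y$ by adding one 0/1 response at a time and compare it with the binomial formula. The function names are made up for this sketch, and `p = 0.6` is a purely hypothetical value, since the true proportion $p$ is unknown.

```python
from math import comb, isclose

def bernoulli_sum_pmf(n, p):
    """PMF of X_1 + ... + X_n for independent 0/1 variables with P(X_i = 1) = p."""
    pmf = [1.0]  # distribution of an empty sum: the total is certainly 0
    for _ in range(n):
        nxt = [0.0] * (len(pmf) + 1)
        for total, prob in enumerate(pmf):
            nxt[total] += prob * (1 - p)  # next student does not support the idea
            nxt[total + 1] += prob * p    # next student supports the idea
        pmf = nxt
    return pmf

def binomial_pmf(n, p):
    """P(Y = k) for k = 0..n when Y ~ B(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

p = 0.6  # hypothetical value, for illustration only
by_addition = bernoulli_sum_pmf(15, p)
by_formula = binomial_pmf(15, p)
print(all(isclose(a, b, rel_tol=1e-9) for a, b in zip(by_addition, by_formula)))
```

The agreement holds for any `p` between 0 and 1, which is why the answer can be written as $B(15, p)$ without knowing $p$.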
More Wordy Questions
Bob has a cupcake factory. He is trying to establish the proportion of cupcakes that are poisonous; he assumes 15%. He has ID numbers for all the cupcakes. He takes a sample of 20 cupcakes.
a) Identify the sampling frame.
The list of ID numbers of all the cupcakes.
Bro Note: The mark schemes like the idea of a 'list' and the idea that things in the sampling frame can be clearly identified.
b) Identify the sampling distribution of the number of poisonous cupcakes in the sample.
If $C$ is the number of poisonous cupcakes in the sample, then $C \sim B(20, 0.15)$.
Exercise 6B
1 A forester wants to estimate the height of the trees in a forest. He measures the heights of 50 randomly selected trees and works out the mean height. Is this a statistic?
Yes as it is based only on the sample.
5 A bag contains a large number of coins. 50% are 50p coins. 25% are 20p coins, 25% are 10p coins.
a Find the mean $\mu$ and variance $\sigma^2$ for the value of this population of coins.
$\mu = 32.5$, $\sigma^2 = 318.75$
A random sample of 2 coins is chosen from the bag.
b List all the possible samples that can be chosen.
(50, 50), (50, 20), (20, 50), (50, 10), (10, 50), (20, 20), (20, 10), (10, 20), (10, 10)
c Find the sampling distribution for the mean $\bar{X} = \frac{X_1 + X_2}{2}$.
$m$                50     35     30     20       15      10
$P(\bar{X} = m)$   0.25   0.25   0.25   0.0625   0.125   0.0625
7 A supermarket sells a large number of 3-litre and 2-litre cartons of milk. They are sold in the ratio 3:2.
Find the mean and variance of the milk content of this population of cartons.
A random sample of 3 cartons is taken from the shelves ($X_1, X_2, X_3$).
List all of the possible samples.
Find the sampling distribution of the mean $\bar{X}$.
Find the sampling distribution of the mode $M$.
Find the sampling distribution of the median $N$ of these samples.
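The answers to the coin question can be reproduced with a short enumeration. This is a sketch for checking your own working, with variable names chosen here for illustration; it uses exact fractions and treats the two coins as independent draws, as the "large number of coins" wording suggests.

```python
from fractions import Fraction
from itertools import product

# Coin value in pence mapped to its proportion in the bag.
coins = {50: Fraction(1, 2), 20: Fraction(1, 4), 10: Fraction(1, 4)}

# Part a: population mean and variance, using Var = E[X^2] - (E[X])^2.
mu = sum(v * p for v, p in coins.items())
var = sum(v**2 * p for v, p in coins.items()) - mu**2

# Parts b and c: every ordered sample of 2 coins and the mean it produces.
mean_dist = {}
for x1, x2 in product(coins, repeat=2):
    xbar = Fraction(x1 + x2, 2)
    mean_dist[xbar] = mean_dist.get(xbar, 0) + coins[x1] * coins[x2]

print(float(mu), float(var))
for xbar in sorted(mean_dist, reverse=True):
    print(float(xbar), float(mean_dist[xbar]))
```

The same pattern, with `repeat=3` and probabilities 3/5 and 2/5, could be adapted for the milk carton question.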
Continue onto Exercise 6C if you're done.