Statistics sampling and methods WBHS

A focus on Sampling and
Sampling Methods
Menu
Sampling Methods
Definitions
Measures of Centre
Assessment Tips
Measures of Spread
Practice Tasks
On Your Calculator
For clarification, click on any step you do not understand to see that
element broken down
The example used throughout this presentation is trying to find the
mean height of WBHS pupils
Sampling Methods
In this presentation you
will see a number of
sampling methods, their
benefits and drawbacks.
Simple Random Sample
Cluster Sampling
Systematic Sampling
Stratified Sampling
Note:
For more detailed instructions
on any of the examples click on
the step you misunderstand
Measures of Central Tendency
In this presentation
you will learn how to
calculate a number of
measures of average
or centre, as well as
their benefits and
drawbacks
Mean
Median
Mode
Measures of Spread
In this presentation you
will learn how to find a
number of measures of
spread as well as their
drawbacks and advantages.
You will also need to
decide which measure of
spread and which measure
of centre go together.
Standard Deviation
Interquartile Range
Range
Simple Random Sample
The simplest unbiased
sample.
1- Number the entire
population.
2- Generate random
numbers.
3- Proceed until you have
as many as you need
ignoring any repeats.
Example (Heights of WBHS students)
1. Get a copy of the School Roll.
2. Number every person
3. Generate Random numbers from 1
to the maximum you need.
4. Proceed until you have the desired
sample size ignoring repeats.
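The steps above can be sketched in Python. This is a minimal sketch, not the presentation's own method: the function name simple_random_sample and the seed are illustrative assumptions, and 529 and 30 are the roll size and sample size used elsewhere in this presentation.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick distinct roll numbers from 1..population_size, ignoring repeats."""
    rng = random.Random(seed)  # seed is only here to make a run repeatable
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(1, population_size)  # random number from 1 to the maximum
        if number not in chosen:                  # ignore any repeats
            chosen.append(number)
    return chosen

# Roll numbers of the students whose heights would be measured
sample = simple_random_sample(529, 30, seed=1)
print(sample)
```

Each number in the list is a position on the numbered School Roll.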
Simple Random Sample
Advantages
Cheap
Easy to carry out
Unbiased
Disadvantages
May not represent strata
Needs an entire population
list
Cluster Sampling
The easiest unbiased
sample.
1. Sort your data into clusters based on location.
2. Randomly choose the cluster.
3. Perform a simple random sample on the chosen cluster.
Example (Heights of WBHS students)
1. Get a copy of the School Roll.
2. Sort into clusters, e.g. year levels.
3. Randomly select the cluster.
4. Randomly generate a sample from the chosen cluster.
Take care with clusters, as Juniors are
much shorter than Seniors.
Cluster Sampling
Advantages
Very Cheap
Very Easy to carry out
Unbiased
Disadvantages
Needs an entire population
list
Can be biased if clusters
strongly affect the
statistics.
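A minimal Python sketch of cluster sampling, assuming the roll has already been sorted into year levels. The year-level sizes below are hypothetical (only the 115 Year 10s and the total of 529 appear in this presentation), and the names cluster_sample and year_sizes are illustrative.

```python
import random

def cluster_sample(clusters, sample_size, seed=None):
    """Randomly choose one cluster, then take a simple random sample from it."""
    rng = random.Random(seed)
    chosen_name = rng.choice(sorted(clusters))          # randomly choose the cluster
    members = clusters[chosen_name]
    return chosen_name, rng.sample(members, sample_size)  # no repeats within the cluster

# Hypothetical year-level sizes that add up to the roll of 529
year_sizes = {"Year 9": 120, "Year 10": 115, "Year 11": 105, "Year 12": 100, "Year 13": 89}

# Give every student a roll number, grouped by year level
clusters = {}
next_number = 1
for year, size in year_sizes.items():
    clusters[year] = list(range(next_number, next_number + size))
    next_number += size

year, sample = cluster_sample(clusters, 30, seed=1)
print(year, sample)
```

Every student in the sample comes from a single year level, which is why the choice of cluster can bias an estimate of mean height.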
Systematic Sampling
A relatively quick way to
pick an unbiased sample
1. List the entire population.
2. Decide on your step size (Total ÷ Sample size = n).
3. Randomly generate a starting point.
4. Step every nth data point till you have your sample.
Example (Heights of WBHS students)
1. Get an alphabetical copy of the School Roll.
2. Step Size = Total ÷ Sample size
3. Randomly generate a starting point.
4. Starting from that point, use the step size to pick the rest of the sample.
Systematic Sampling
Advantages
Cheap
Easy to Choose Sample
Unbiased
Disadvantages
Needs an entire population
list
If the population list is ordered
in a pattern that lines up with the
step size then the sample can
become biased
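A minimal Python sketch of systematic sampling, assuming the step size is rounded down and that stepping wraps back to the start of the roll when it runs off the end. The function name systematic_sample is illustrative.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Return roll positions chosen by stepping from a random starting point."""
    rng = random.Random(seed)
    step = population_size // sample_size    # 529 // 30 = 17, rounded down
    start = rng.randint(1, population_size)  # random starting point on the roll
    # Step every 'step' positions, wrapping past the end of the roll
    return [(start - 1 + k * step) % population_size + 1 for k in range(sample_size)]

positions = systematic_sample(529, 30, seed=1)
print(positions)
```

Because 30 steps of 17 cover fewer than 529 positions, no student is chosen twice.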
Stratified Sampling
The most reliable
sampling method.
1. Sort the data into strata based on information you already know.
2. Calculate the proportions for each stratum.
3. Perform a Simple Random Sample on each of the strata.
Example (Heights of WBHS students)
1. Get a copy of the School Roll separated into year levels.
2. Calculate the sample size for each year group (stratum).
3. Perform a simple random sample on each year group to their specific
sample size.
Stratified Sampling
Advantages
Unbiased
Completely
representative of each
of the strata
Most reliable estimates
Disadvantages
Needs entire population
list
Information about entire
population needs to be
known beforehand
Time consuming
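A minimal Python sketch of the final step, a simple random sample within each stratum. The per-year sample sizes are taken as given here; apart from the 115 Year 10s and their sample size of 7, every number below is hypothetical, and the names stratified_sample, year_sizes and sizes are illustrative.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Take a simple random sample of the given size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

# Hypothetical year-level sizes (only Year 10 = 115 comes from the presentation)
year_sizes = {"Year 9": 120, "Year 10": 115, "Year 11": 105, "Year 12": 100, "Year 13": 89}

# Roll numbers grouped by year level
strata = {}
next_number = 1
for year, size in year_sizes.items():
    strata[year] = list(range(next_number, next_number + size))
    next_number += size

# Hypothetical sample sizes per year group (Year 10 = 7 as in the presentation)
sizes = {"Year 9": 7, "Year 10": 7, "Year 11": 6, "Year 12": 6, "Year 13": 5}

sample = stratified_sample(strata, sizes, seed=1)
for year, chosen in sample.items():
    print(year, chosen)
```

Every year level appears in the sample, which is what makes the result representative of each stratum.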
Generate a Random Number
1. Decide on the starting number (in this case 1)
2. Decide how many you need (in the case of the school, 529 students)
3. Choose your calculator
Casio FX-82
Casio Graphic
Texas
Random Number on a Casio
Graphics Calculator
1. Decide on the starting number (in this case 1)
2. Decide how many you need (in the case of the school, 529 students)
3. In Run Mode enter
Intg(529 × Ran# + 1)
529 is the population size or strata size
1 is the starting value
Key presses
Intg: OPTN – F6 – F4 – F5
Ran#: OPTN – F6 – F3 – F4
On Screen
Intg(529 × Ran# + 1)
Random Number on a Casio FX - 82
1. Decide on the starting number (in this case 1)
2. Decide how many you need (in the case of the school, 529 students)
3. Ran# is the 2nd function (shift) of the · key
4. Enter Ran# × 529 + 1 =
529 is the population size or strata size
1 is the starting value
On screen
RAN#×529+1
Note
Ignore any decimal in the answer
Random Number on a Texas
1. Decide on the starting number (in this case 1)
2. Decide how many you need (in the case of the school, 529 students)
3. Enter RANDI(1 , 529)
RANDI is found with PRB then →
The comma is the 2nd function of )
1 is the starting value
529 is the end value
On Screen
RANDI(1,529)
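For comparison, the two calculator formulas can be mirrored in Python. This is only a sketch: the function names casio_style and texas_style are illustrative, and Python's random module stands in for the calculators' Ran# and RANDI keys.

```python
import math
import random

def casio_style(population_size, start=1, rng=random):
    """Mirror Intg(529 x Ran# + 1): scale a random decimal, then drop the decimal part."""
    return math.floor(population_size * rng.random() + start)

def texas_style(start, end, rng=random):
    """Mirror RANDI(1, 529): a random whole number from start to end inclusive."""
    return rng.randint(start, end)

print(casio_style(529))     # a whole number from 1 to 529
print(texas_style(1, 529))  # a whole number from 1 to 529
```

Both give a roll number between 1 and 529, so either can be used to pick students from the numbered School Roll.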
Simple Random Sample
The simplest unbiased
sample.
1. Number the entire population.
2. Generate random numbers.
3. Proceed until you have as many as you need, ignoring any repeats.
Example (Heights of WBHS students)
1. Get a copy of the School Roll.
2. Number every person from 1 (to
529)
3. Generate Random numbers from 1
to the maximum you need (529).
4. Proceed until you have the desired
sample size ignoring repeats.
Strata Proportions
1. Number of people in strata divided by total in population.
2. Multiplied by number of people wanted in total sample.
Example (Heights of WBHS students)
1. 529 people on School Roll.
2. 115 Year 10s
3. Sample size of 30
4. So Year 10 sample size
115 ÷ 529 × 30 = 6.52
So take 7 Year 10 students
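The same arithmetic as a short Python sketch. The function name stratum_sample_size is illustrative, and rounding to the nearest whole student is an assumption that matches the example (6.52 becomes 7).

```python
def stratum_sample_size(stratum_count, population_count, total_sample_size):
    """Stratum share of the population multiplied by the total sample size."""
    exact = stratum_count / population_count * total_sample_size
    return exact, round(exact)  # exact value, and nearest whole number of students

exact, rounded = stratum_sample_size(115, 529, 30)
print(f"{exact:.2f} -> {rounded}")  # 6.52 -> 7
```

When every stratum is rounded separately, the rounded sizes may not add up to exactly the total sample size, so check the total before sampling.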
Systematic Step Sizes
1. Number of people in population divided by Sample Size
Example (Heights of WBHS students)
1. 529 people on School Roll.
2. Sample size of 30
3. So Step size
529 ÷ 30 = 17.63333
So take every 17th student from the
starting position
Systematic Stepping
1. Starting at the random start point, step out till you get the desired
sample size.
Example (Heights of WBHS students)
1. Random starting point 803, step size 29
2. The 803rd student on the alphabetical list is where we start.
3. Then the 832nd student, then the 861st student. The next step, 890,
runs past the end of the roll, so carry on from the beginning: 890
becomes the 15th student (which implies a roll of 875 students), then
the 44th student…
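The wrap-around stepping can be checked with a short Python sketch. The roll size of 875 is an assumption inferred from the example, where position 890 becomes the 15th student; the function name step_positions is illustrative.

```python
def step_positions(start, step, roll_size, count):
    """List the roll positions visited, wrapping to the start after the last student."""
    return [(start - 1 + k * step) % roll_size + 1 for k in range(count)]

print(step_positions(803, 29, 875, 5))  # [803, 832, 861, 15, 44]
```

By this arithmetic the student after the 15th is the 44th, since 15 + 29 = 44.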
Mean
1. Add up all of the values in the sample.
2. Divide by the sample size.
Calculator Method
Advantages
Easy to calculate for large samples.
Accurate and well understood
Disadvantages
Affected by outliers
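The two steps as a minimal Python sketch, using the five heights from the Standard Deviation by Table slide. The extra height of 250 is a hypothetical outlier added only to show its effect.

```python
heights = [180, 150, 165, 170, 160]

total = sum(heights)         # step 1: add up all of the values
mean = total / len(heights)  # step 2: divide by the sample size
print(total, mean)           # 825 165.0

# A single hypothetical outlier pulls the mean upwards
with_outlier = heights + [250]
mean_with_outlier = sum(with_outlier) / len(with_outlier)
print(round(mean_with_outlier, 1))  # 179.2
```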
Median
1. List all the values in order.
2. Find the central value
Advantages
Accurate
Not affected much by Outliers
Disadvantages
Not so widely known as an average
Time consuming to list large sample in order
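A minimal Python sketch of the two steps, assuming that an even number of values has its two central values averaged (as in the quartile example later in this presentation). The function name median is illustrative.

```python
def median(values):
    """List the values in order, then find the central value."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]                          # odd count: one central value
    return (ordered[middle - 1] + ordered[middle]) / 2  # even count: middle of the central pair

print(median([180, 150, 165, 170, 160]))                 # 165
print(median([165, 170, 173, 180, 182, 183, 191, 192]))  # 181.0
```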
Mode
1. List all the values
2. Find the most common item
Advantages
Can calculate mode for data that is not numeric or ordered
Not affected much by Outliers
Very easy to calculate
Disadvantages
Can be inaccurate for numeric data or data that can be ordered
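A minimal Python sketch of finding the mode, including for data that is not numeric. Both data sets are hypothetical and the function name modes is illustrative; it returns a list because more than one item can be equally common.

```python
from collections import Counter

def modes(values):
    """Count every item, then return the most common item or items."""
    counts = Counter(values)
    highest = max(counts.values())
    return [item for item, count in counts.items() if count == highest]

# Hypothetical non-numeric data
print(modes(["brown", "blue", "brown", "green", "blue", "brown"]))  # ['brown']

# Hypothetical heights
print(modes([165, 170, 165, 180]))  # [165]
```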
Statistics on a Calculator
Choose your calculator
Casio
FX-82
Casio
Graphic
Texas
Statistics on a
Casio Graphics Calculator
1. In Stat Mode
2. In list 1 enter all data values
3. In list 2 enter their frequencies
4. F2 (CALC)
5. F6 (SET) Should read
1Var XList :List1
1Var Freq :List2
2Var XList :List3
2Var YList :List4
2Var Freq :List5
6. Exit
7. F1 (1VAR)
(All statistics are listed: x̄ is the mean, xσn is the std. dev.)
Entering Data on
Casio Graphics Calculator
Enter each data value in List 1 followed by EXE
Enter the frequency of each data value in List 2 followed by EXE
Note
If all of the frequencies are 1 then you don’t need to enter the
frequencies. In the Set Menu change the 1Var Freq to 1 instead of
List 2
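What the calculator works out from the two lists can be sketched in Python. This assumes xσn is the standard deviation that divides by the number of values n; the function name frequency_stats is illustrative, and the heights are the five values from the Standard Deviation by Table slide, each with a frequency of 1.

```python
import math

def frequency_stats(values, frequencies):
    """Return n, the mean and the standard deviation (dividing by n) from a frequency table."""
    n = sum(frequencies)                                              # number of data values
    mean = sum(x * f for x, f in zip(values, frequencies)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, frequencies)) / n
    return n, mean, math.sqrt(variance)

# List 1 holds the data values, List 2 holds their frequencies
n, mean, sd = frequency_stats([150, 160, 165, 170, 180], [1, 1, 1, 1, 1])
print(n, mean, sd)  # 5 165.0 10.0
```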
Statistics on a
Casio FX 82 Calculator
1. Put your calculator into statistics mode
• Mode 2
2. Clear the statistics memory
• Shift, Mode (CLR), then 1 (Scl)
(Shown on screen: Scl 1, Mode 2, All 3)
3. Enter the data carefully
• Type each value followed by M+ (e.g. for 180cm, 180 M+)
4. Calculate desired statistics
• Shift, 2, then choose
1. x̄ mean
2. xσn standard deviation
Entering Data on
Casio FX 82 Calculator
Enter each data value followed by M+
The screen then shows n= 1
‘n’ is the number of data values that you have entered
Note
Be very careful entering the
data values as you cannot
review them later to make
sure that they are correct.
Statistics on a
Texas Calculator
1. Put your calculator into statistics mode
1. 2nd Function, DATA
2. Choose 1 - VAR
2. Enter the data carefully
1. DATA
3. Calculate desired statistics
1. STATVAR
2. Shift between statistics with arrow keys
(The screen shows n x̄ Sx σx)
n number of data values
x̄ mean
σx standard deviation
Entering Data on a
Texas Calculator
Press the Data Key to begin
Begin entering data.
X1 is the data value (e.g. X1 = 180), followed by the down arrow
Freq1 is that data value’s frequency, followed by the down arrow
X2 is next, then Freq2
To check data use the up arrow
Definitions
• Population
The entire list of those people or things that you wish to sample
• Census
A survey of an entire population
• Sample
A small group of a population
• Parameters
Facts about an entire population gained from a census
(Notation: mean ‘μ’ or standard deviation ‘σ’)
• Statistics
Estimates of population parameters calculated from a sample
(Notation: mean ‘x̄’ or standard deviation ‘s’)
• Representative
A sample that appears to represent all elements of the population in
the correct proportions
• Bias
A sampling method that does not give every element of the population
an equal chance of selection
Standard Deviation
• This is a calculation of the average difference between the data
values and the mean.
• This measure of spread applies to the mean.
Use Calculator to Calculate
Use table to calculate
Advantages
Easy to calculate for large samples on calculator.
Accurate
Very useful for certain types of data
Disadvantages
Affected by outliers
Possibly not so well understood
Interquartile Range
1. Calculate the upper and lower quartiles.
2. Upper quartile minus lower quartile.
3. This measure of spread applies to the median
Advantages
Disadvantages
Well understood
Easy to calculate for large
samples.
Unaffected by outliers
Range
1. Find the highest and lowest value.
2. Highest value minus the lowest value.
3. This measure of spread applies to all measures of centre.
Advantages
Well understood
Easy to calculate for large samples.
Disadvantages
Strongly affected by outliers, as it uses only the two most extreme
values
Standard Deviation by Table
Data Values: from your sample or census
Mean: calculated as usual, doesn’t change
Data Values (x)   Mean (x̄)   Data values minus the Mean (x – x̄)   Square (x – x̄)²
180               165        15                                    225
150               165        -15                                   225
165               165        0                                     0
170               165        5                                     25
160               165        -5                                    25
Total 825                    0                                     500
Mean 165                                                           100
The last column is the square of each of the values to its left.
Final Standard Deviation is the square root of the mean of the squares
(100), so s = 10
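The table can be reproduced with a short Python sketch that follows the same columns. The variable names are illustrative; like the table, it divides the total of the squares by the number of values.

```python
import math

heights = [180, 150, 165, 170, 160]
mean = sum(heights) / len(heights)  # 165.0, the same for every row

# One row per data value: x, x minus the mean, and that difference squared
for x in heights:
    difference = x - mean
    print(x, mean, difference, difference ** 2)

total_squares = sum((x - mean) ** 2 for x in heights)  # 500.0
mean_square = total_squares / len(heights)             # 100.0
sd = math.sqrt(mean_square)                            # 10.0
print(total_squares, mean_square, sd)
```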
Calculating Quartiles
1. List all the values in order.
2. Find the central value
3. Discard that central value
4. Find the central value of the remaining two halves.
5. These 2 numbers are the upper and lower quartiles
Example (Heights of WBHS students)
1. Data Values
165, 170, 173, 180, 182, 183, 191, 192
2. Central value is the middle of 180 and 182, so the median is 181
3. Discard 181 and calculate the middle of each half.
4. 165, 170, 173, 180 // 182, 183, 191, 192
Lower quartile (middle of 170 and 173)
171.5
Upper quartile (middle of 183 and 191)
187
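The quartile steps can be sketched in Python. This follows the method on this slide (split at the median, then take the middle of each half); other textbooks and software use slightly different quartile rules. The function names middle and quartiles are illustrative.

```python
def middle(ordered):
    """Central value of an ordered list (middle of the central pair if the count is even)."""
    n = len(ordered)
    if n % 2 == 1:
        return ordered[n // 2]
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

def quartiles(values):
    """Return (lower quartile, median, upper quartile)."""
    ordered = sorted(values)
    n = len(ordered)
    lower_half = ordered[: n // 2]        # values below the central position
    upper_half = ordered[(n + 1) // 2:]   # values above it; an odd count drops the central value
    return middle(lower_half), middle(ordered), middle(upper_half)

lq, med, uq = quartiles([165, 170, 173, 180, 182, 183, 191, 192])
print(lq, med, uq)  # 171.5 181.0 187.0
print(uq - lq)      # interquartile range: 15.5
```

The midpoint of 170 and 173 is 171.5, so the interquartile range for these heights is 187 – 171.5 = 15.5.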
Things to Consider
Is my sample representative of the population?
• Need to consider whether any strata present in the data are
represented in approximately the correct proportions.
• Need to consider the presence of any apparent outliers in the sample
chosen, and the effect they will have on estimates of population
parameters.
Things to Consider
Is my sample representative of the population?
• Estimates are more reliable when taken from a large sample as the
effects of outliers are lessened.
• Consider the size of the s.d.
A larger value of s suggests considerable variation in the data
values. Thus taking another sample could produce quite different
statistics.
• Ask yourself, “If I were to repeat this sampling process, would I get
the same results?”
Things to Consider
How could I improve my sampling method?
• Need to choose a sampling method which eliminates bias, and which
gives the best chance of choosing a representative sample. (Bias
exists when some of the population members have greater or lesser
chance of being included in the sample.)
• Need to discuss which statistics would give the best estimates of
population parameters, including the effect of outliers.
Things to Consider
Would I get the same or similar results if I repeated
the same process?
• Are there outliers or extreme values that may affect the sample
statistics? If so then I probably wouldn’t get similar results.
• Is the standard deviation (or measure of spread) large when
compared to the mean? If it is, then getting the same results again
is unlikely.
Things to Consider
When answering questions or stating conclusions:
• Answers need to be precise and refer to actual data values present in
the sample and/or population.
• Strata must be clearly defined.
• Answers cannot be vague or rote-learnt without referring specifically
to the context of the assessment.
• Students must be very clear that the sample statistics are ESTIMATES
of the population parameters.
• They must NOT state that the population mean is … unless they have
taken a census of the whole population!
Practice Tasks
Real Estate Stats
On Your Calculator
In this part of the
presentation you can
check on exactly how
to use your calculator
effectively to help with
Statistics
Generating Random Numbers
Entering Data
Calculating Statistics
Entering Data on a Calculator
Choose your calculator
Casio
FX-82
Casio
Graphic
Texas