Introduction to Statistical Inference


Introduction to Statistical Inference (Session 03)
SADC Course in Statistics
Learning Objectives
By the end of this session, you will be able to:
• explain what is meant by statistical inference
• explain what is meant by an estimate of a population parameter
• explain what is meant by the sampling distribution of an estimate
• calculate and interpret the standard error of a sample mean from the data of a simple random sample
To put your footer here go to View > Header and Footer
2
What is statistical inference?
• Inference is about drawing conclusions about population characteristics using information gathered from a sample
• It will be assumed for the remainder of this module that the sample is representative of the population
• We shall further assume that the sample has been drawn as a simple random sample from an infinite population
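A simple random sample gives every unit in the sampling frame the same chance of selection. The minimal Python sketch below assumes a hypothetical frame of 1,000 numbered household IDs (illustrative only, not course data) and uses the standard library's `random.sample`, which selects without replacement. When the population is infinite, or very large relative to the sample, the difference between sampling with and without replacement becomes negligible.

```python
import random

# Hypothetical sampling frame: household IDs 1..1000 (illustrative only).
frame = list(range(1, 1001))

rng = random.Random(2024)        # fixed seed so the draw is reproducible
srs = rng.sample(frame, k=20)    # simple random sample of 20 households, without replacement

print(sorted(srs))
```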
Estimating population parameters
Quantity          Population      Sample
Mean              $\mu$           $\bar{x}$
Variance          $\sigma^2$      $s^2$
Std. deviation    $\sigma$        $s$
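• Population characteristics (parameters) are unknown, so we use Greek letters to denote them, e.g. $\mu$ for the population mean and $\sigma$ for the population standard deviation
• Sample characteristics are computed from the observed data and so are known; we denote them with Latin letters ($\bar{x}$, $s^2$, $s$). They serve as estimates of the corresponding population values.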
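As a small illustration of computing the sample quantities in the table, this Python sketch uses the standard `statistics` module on a hypothetical sample of eight landholding sizes (made-up values, not survey data). Note that `statistics.variance` and `statistics.stdev` use the $n - 1$ divisor, the usual choice for the sample estimates $s^2$ and $s$.

```python
import statistics

# Hypothetical sample of landholding sizes in acres (illustrative values, not survey data).
sample = [2.5, 4.0, 7.0, 12.5, 3.0, 9.5, 6.0, 15.0]

x_bar = statistics.mean(sample)    # sample mean, estimate of mu
s2 = statistics.variance(sample)   # sample variance (n - 1 divisor), estimate of sigma^2
s = statistics.stdev(sample)       # sample standard deviation, estimate of sigma

print(f"x_bar = {x_bar:.2f}, s^2 = {s2:.2f}, s = {s:.2f}")
```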
An example of statistical inference
• What is the mean landholding size owned by rural households in Kilindi district, in the Tanga region of Tanzania?
• Data from 404 households surveyed in this district gave a mean landholding size of 7.62 acres, with a standard deviation of 6.81 acres.
• Our best estimate of the mean landholding size in Kilindi district is therefore 7.62 acres.
What results are likely if we sample again with a different set of households?
A brief return to Practical 2…
• In Practical 2, you drew two samples of 5 Ugandan districts. Look back at the mean and standard deviation of each sample.
• You will notice that the answers differ from one sample to the next, i.e. there is variability in the sample means.
• If we took many more samples, we could produce a histogram of the means of these samples.
An example follows…
The distribution of means
• Suppose 10 university students were each given a standard meal, and the time each took to consume it was recorded.
• Suppose the 10 values gave a mean of 11.24, with a standard deviation of 0.864.
• Let’s assume this exercise was repeated 50 times with different samples of students.
• A histogram of the resulting 500 observations appears below, followed by a histogram of the 50 sample means (one from each sample).
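The repeated-sampling exercise can be mimicked by simulation. This Python sketch assumes, purely for illustration, that eating times in the student population are normally distributed with mean 11.24 and standard deviation 0.864 (the summary values quoted for the single sample above). It draws 50 samples of 10 students, keeps all 500 raw observations, and records the 50 sample means; these two lists are what the two histograms summarise.

```python
import random
import statistics

rng = random.Random(1)            # fixed seed for reproducibility
MU, SIGMA = 11.24, 0.864          # assumed population mean and SD (illustrative)
N_SAMPLES, N_PER_SAMPLE = 50, 10

raw = []     # all 500 individual observations
means = []   # one mean per sample of 10 students
for _ in range(N_SAMPLES):
    sample = [rng.gauss(MU, SIGMA) for _ in range(N_PER_SAMPLE)]
    raw.extend(sample)
    means.append(statistics.mean(sample))

print(len(raw), len(means))       # prints: 500 50
print(f"mean of the 50 sample means: {statistics.mean(means):.3f}")
```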
Histogram of raw data
The data appear to follow a normal distribution.
Histogram of the 50 sample means
The distribution of the sample means is called the sampling distribution of the mean.
Notice that the variability of this distribution is smaller than the variability of the raw data.
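The narrowing of the sampling distribution can be checked numerically. Under the same illustrative assumption (a normal population with mean 11.24 and standard deviation 0.864), this Python sketch compares the standard deviation of the 500 raw observations with that of the 50 sample means. The latter should come out close to 0.864/√10 ≈ 0.27, roughly a third of the spread of the raw data.

```python
import math
import random
import statistics

rng = random.Random(7)
MU, SIGMA, N = 11.24, 0.864, 10   # assumed population values and sample size (illustrative)

samples = [[rng.gauss(MU, SIGMA) for _ in range(N)] for _ in range(50)]
raw = [x for sample in samples for x in sample]
means = [statistics.mean(sample) for sample in samples]

sd_raw = statistics.stdev(raw)       # spread of individual eating times
sd_means = statistics.stdev(means)   # spread of the 50 sample means
print(f"SD of raw data:     {sd_raw:.3f}")
print(f"SD of sample means: {sd_means:.3f}")
print(f"sigma / sqrt(n):    {SIGMA / math.sqrt(N):.3f}")
```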
Back to estimation…
The estimate of the mean landholding size in Kilindi district is 7.62 acres.
Is this sufficient for reporting purposes, given that this answer is based on one particular sample?
What we have is an estimate based on a sample of size 404. But how good is this estimate?
We need a measure of the precision, i.e. the variability, of this estimate…
Sampling Variability
The precision of the sample mean $\bar{x}$ as an estimate of $\mu$ depends on:
(i) the sample size ($n$), since the more data we collect, the more we know about the population; and
(ii) the inherent variability in the data ($\sigma^2$).
These two quantities must enter the measure of precision of any estimate of a population parameter. We aim for high precision, i.e. a low standard error!
Standard error of the mean
The precision of $\bar{x}$ as an estimate of $\mu$ is given by the standard error of the mean:
$\text{s.e.}(\bar{x}) = \sigma/\sqrt{n}$
– Also written as s.e.m., or sometimes s.e.
Since $\sigma$ is unknown, estimate it using the sample data: $s/\sqrt{n}$
For example, for landholding size,
s.e. $= 6.81/\sqrt{404} = 6.81/20.1 = 0.339$ acres
Summary
If we had repeated samples (of the same size) taken from the same population:
• sample means would vary
• the standard error of the mean is a measure of the variability of sample means over (hypothetically drawn) repeated samples
• the distribution of sample means over repeated samples is called the sampling distribution of the mean, $\bar{x} \sim N(\mu, \sigma^2/n)$ (exactly if the data are normal, approximately for large $n$)
• the lower the value of the standard error, the greater the precision of the estimate
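The claim that the standard error measures the spread of sample means over repeated samples can be checked by simulation. This Python sketch assumes a hypothetical right-skewed (exponential) population with mean 5, and hence standard deviation 5; it draws 2,000 samples of size n = 100 and compares the standard deviation of the 2,000 sample means with σ/√n = 0.5. Because this population is not normal, the normal form of the sampling distribution holds here only approximately, by the central limit theorem.

```python
import math
import random
import statistics

rng = random.Random(42)
POP_MEAN = 5.0          # hypothetical exponential population: mean = SD = 5
N, REPS = 100, 2000     # sample size and number of repeated samples

# One sample mean per repeated sample of size N.
means = [
    statistics.fmean(rng.expovariate(1 / POP_MEAN) for _ in range(N))
    for _ in range(REPS)
]

empirical_se = statistics.stdev(means)        # spread of means over repeated samples
theoretical_se = POP_MEAN / math.sqrt(N)      # sigma / sqrt(n)
print(f"mean of sample means: {statistics.fmean(means):.3f}")
print(f"SD of sample means:   {empirical_se:.3f}")
print(f"sigma / sqrt(n):      {theoretical_se:.3f}")
```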
Practical work follows to ensure the learning objectives are achieved…