1387916151GEOSTAT_5 - The State University of Zanzibar
GEOGRAPHICAL STATISTICS
GE 2110
Zakaria A. Khamis
INFERENTIAL STATISTICS
• Suppose that you interview 50 people in Zanzibar town and ask
them how far they commute to work, then compute the MEAN of
the data collected
• The sample mean provides us with both a simple summary
measure and our best estimate for what the true average
commuting distance is for the entire town
• What will happen if we interview another sample of 50 different
people?
• If we interview 50 other people, we will likely come up with a
different estimate
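The point can be illustrated with a small simulation. This is only a sketch: the population of commuting distances below is invented (uniform between 0.5 and 20 km) and is not survey data from Zanzibar town, and the helper name `sample_mean` is hypothetical.

```python
import random
import statistics

def sample_mean(population, n, rng):
    """Mean of a simple random sample of size n drawn without replacement."""
    return statistics.mean(rng.sample(population, n))

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 commuting distances in km (invented values).
population = [rng.uniform(0.5, 20.0) for _ in range(10_000)]

first = sample_mean(population, 50, rng)   # first group of 50 people
second = sample_mean(population, 50, rng)  # another group of 50 people

print(f"population mean: {statistics.mean(population):.2f} km")
print(f"first sample   : {first:.2f} km")
print(f"second sample  : {second:.2f} km")
```

Both sample means should land near the population mean, but they will not agree with each other exactly.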
ESTIMATES
• Samples are used to obtain estimates of numerical
characteristics of populations, where it is impractical to measure
or count the whole population
• It is conventional to use the term ‘population parameter’ to
denote a numerical characteristic of a population and ‘sample
statistic’ to denote a numerical characteristic of a sample
• A sample statistic generally provides the only estimate available
of the corresponding population parameter
• The most commonly used parameters and statistics are the
mean and standard deviation
                      Population Parameter    Sample Statistic
Number of Variates    N                       n
Mean                  μ                       x̄
Standard Deviation    σ                       s
SAMPLING DISTRIBUTIONS AND
STANDARD ERROR
• It is unlikely that any one sample, no matter how unbiased, will
yield values of sample mean and standard deviation which will
give precise estimates of the corresponding population
parameters of mean and standard deviation
• Indeed, if one took a large number of samples from the same
population, it is probable that each would produce a different
mean and standard deviation
• A distribution of such sample statistics – means of samples – is
known as ‘a sampling distribution’
• If the samples are all unbiased, it is reasonable to expect that their
means will be distributed symmetrically about the corresponding
population mean
• Some sample means will be bigger than the population mean, and
a roughly equal number will be smaller
• The situation is the same for other sample statistics like standard
deviation
• Now in practice we do not take a number of samples but only one,
and we don’t know the population parameters but can only estimate
them from one sample
• Nevertheless, by hypothesizing a sampling distribution based upon a
large number of imaginary samples, mathematicians have been able to
provide us with a measure of the accuracy of parameter estimates
based on a single real sample
• The Central Limit Theorem states that if we imagine taking all possible
samples of the same size from a single population, their means will be
approximately normally distributed about the population mean,
whatever the character of the population distribution, provided that
the samples are fairly large
• The standard deviation of a sampling distribution is called the
standard error of the sample statistic
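The theorem can be explored with a simulation. The sketch below assumes an invented, strongly right-skewed population (exponential with mean 10) and approximates "all possible samples" with 2,000 random samples of size 50; the `skewness` helper is defined here for illustration only.

```python
import random
import statistics

def skewness(values):
    """Mean cubed z-score; close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - m) / sd) ** 3 for v in values])

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population, deliberately far from normal (invented values).
population = [rng.expovariate(1 / 10) for _ in range(20_000)]
pop_mean = statistics.fmean(population)

# Approximate the sampling distribution of the mean with 2,000 samples of size 50.
n = 50
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(2_000)]

pop_skew = skewness(population)
means_skew = skewness(sample_means)
share_above = sum(m > pop_mean for m in sample_means) / len(sample_means)

print(f"skewness of population  : {pop_skew:.2f}")
print(f"skewness of sample means: {means_skew:.2f}")
print(f"share of sample means above the population mean: {share_above:.2f}")
```

The sample means should be far less skewed than the population itself, with roughly half of them above the population mean and half below.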
• The standard deviation of the sampling distribution of means is given
by
$\sigma / \sqrt{n}$
• Where σ is the standard deviation of the population and n is the
sample size
• The standard deviation of a sampling distribution is called the Standard
Error (S.E) of the sample statistic. If the statistic is the mean, this
is referred to as the S.E of the mean
$\mathrm{S.E} = \sigma / \sqrt{n}$
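As a rough check on the formula, the sketch below compares σ/√n with the standard deviation of many simulated sample means. The population (normal, mean 50, standard deviation 10) is invented for the illustration, samples are drawn with replacement, and `standard_error` is a hypothetical helper name.

```python
import math
import random
import statistics

def standard_error(sigma, n):
    """Standard error of the mean: sigma divided by the square root of n."""
    return sigma / math.sqrt(n)

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population (invented values).
population = [rng.gauss(50, 10) for _ in range(20_000)]
sigma = statistics.pstdev(population)  # population standard deviation

n = 100
theoretical = standard_error(sigma, n)

# Empirical check: spread of the means of 2,000 samples drawn with replacement.
sample_means = [statistics.fmean(rng.choices(population, k=n)) for _ in range(2_000)]
empirical = statistics.stdev(sample_means)

print(f"sigma / sqrt(n)         : {theoretical:.3f}")
print(f"s.d. of simulated means : {empirical:.3f}")
```

The two figures should agree closely. Because n sits under a square root, quadrupling the sample size only halves the standard error.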
• Standard Error is the key to accurate estimation of population
parameters from sample statistics
• Suppose that we wish to estimate the average income of people in
the Urban-West region
• We take a random sample of 100 and find that x̄ = 50 and s = 10
• The value of x̄ gives us a rough estimate of μ, but the
questions are:-
• HOW ROUGH?
• WHAT IS THE LIKELIHOOD OF THE TRUE VALUE OF μ
BEING AS HIGH AS 60 OR AS LOW AS 40?
• Return to the normal distribution curve: the shape of the normal
curve arises from a mathematical equation that relates it to its
standard deviation (standard error).
• It is found that approximately
– 68% of the area under a normal curve lies within 1 standard deviation
(standard error) of the mean
– 95% of the area lies within 2 standard deviations (standard errors) of
the mean
– 99.7% of the area lies within 3 standard deviations (standard errors)
of the mean
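These areas can be applied to the income example (sample of 100, x̄ = 50, s = 10). The sketch below assumes that the sample standard deviation s may stand in for the unknown σ when computing the standard error; the helper `normal_area_within` is a hypothetical name, and it computes the normal-curve areas from the error function rather than taking them from a table.

```python
import math

def normal_area_within(k):
    """Area under a normal curve within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

# Figures from the income example.
x_bar, s, n = 50, 10, 100

# Assumption: s is used as an estimate of the unknown population sigma.
se = s / math.sqrt(n)

for k in (1, 2, 3):
    low, high = x_bar - k * se, x_bar + k * se
    print(f"within {k} S.E: {low:.0f} to {high:.0f} (area {normal_area_within(k):.1%})")

# How many standard errors away from the sample mean are 40 and 60?
z_for_60 = (60 - x_bar) / se
z_for_40 = (40 - x_bar) / se
print(f"60 lies {z_for_60:.0f} S.E above the sample mean, 40 lies {abs(z_for_40):.0f} S.E below")
```

Under that assumption the standard error is 1, so the three bands are 49 to 51, 48 to 52 and 47 to 53. Values of 40 or 60 lie 10 standard errors from the sample mean, well outside even the 3 standard error band.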