Week 6, Lecture 2, Sampling

QBM117
Business Statistics
Statistical Inference
Sampling
Objectives
• To give an overview of the next topic, statistical
inference.
• To understand the importance of correct sampling
techniques.
• To introduce different sampling techniques.
Populations and Samples
• A population is the entire collection of items about
which information is desired.
• A sample is a subset of the population that we collect
data from.
Parameters and Statistics
• A parameter is a number that describes a population.
- A parameter is a fixed number.
• A statistic is a number that describes a sample.
- A statistic is a random variable whose value
changes from sample to sample.
Statistical Inference
• Population parameters are almost always unknown.
• We take a random sample from the population of
interest and calculate the sample statistic.
• We then use the sample statistic as an estimate of
the population parameter.
• Statistical Inference involves drawing conclusions
about a population based on sample information.
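A minimal Python sketch of the parameter/statistic distinction, assuming a hypothetical population of 1000 incomes generated at random purely for illustration (the data, the seed and the function name `sample_mean` are not from the lecture). The population mean is computed once and never changes; each new random sample gives a different sample mean, which serves as an estimate of it.

```python
import random
import statistics

# Hypothetical population of 1000 incomes (synthetic, for illustration only).
rng = random.Random(1)
population = [rng.gauss(50_000, 12_000) for _ in range(1000)]

# Parameter: a fixed number that describes the whole population.
mu = statistics.mean(population)

def sample_mean(pop, n, rng):
    """Draw a random sample of size n (without replacement) and return its mean."""
    return statistics.mean(rng.sample(pop, n))

# Statistic: its value changes from one random sample to the next.
estimates = [sample_mean(population, 40, rng) for _ in range(3)]

print(f"population mean (parameter): {mu:,.0f}")
for i, xbar in enumerate(estimates, start=1):
    print(f"sample {i} mean (statistic):  {xbar:,.0f}")
```

In practice only one sample is taken and μ is unknown; the population is simulated here only so that the estimates can be compared with the true value.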
Sampling Distributions
• Sample statistics are random variables.
• The probability distribution of a sample statistic is
called its sampling distribution.
• We use the sampling distribution to make inferences
about the population parameters.
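A sampling distribution can be approximated by simulation: draw many random samples of the same size, record the statistic from each, and look at how those values are spread. The sketch below assumes a hypothetical, synthetic population of 1000 values and an illustrative helper named `simulate_sampling_distribution`; it is not the method of the course text, just one way to see the idea.

```python
import random
import statistics

# Hypothetical population (synthetic data, for illustration only).
rng = random.Random(2)
population = [rng.uniform(0, 100) for _ in range(1000)]
mu = statistics.mean(population)

def simulate_sampling_distribution(pop, n, reps, rng):
    """Approximate the sampling distribution of the sample mean by
    drawing `reps` random samples of size n and recording each mean."""
    return [statistics.mean(rng.sample(pop, n)) for _ in range(reps)]

means = simulate_sampling_distribution(population, n=25, reps=2000, rng=rng)

# The simulated sample means centre on mu and are far less spread out
# than the individual population values.
print(f"population mean mu:           {mu:.2f}")
print(f"mean of the sample means:     {statistics.mean(means):.2f}")
print(f"st. dev. of population:       {statistics.pstdev(population):.2f}")
print(f"st. dev. of the sample means: {statistics.stdev(means):.2f}")
```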
Estimation and Hypothesis Testing
• There are two types of statistical inference
- Estimation
- Hypothesis Testing
• Estimation is appropriate when we want to estimate a
population parameter.
• Hypothesis testing is appropriate when we want to
assess some claim about a population based on the
evidence provided by a sample.
Sampling
• Sampling is the process of selecting a sample from a
population.
• Samples may be selected in a variety of ways.
• The sample should be representative of the
population.
• This is best achieved by random sampling.
Random Sampling
• A sample is random if every member of the
population has an equal chance of being selected in
the sample.
• Most statistical techniques assume that random
samples are used.
• We will look at three types of random sampling.
Simple Random Sampling
• A simple random sample of size n is a sample selected so
that every possible group of n members of the population
is equally likely to be chosen; in particular, each member
is equally likely to be included.
• The easiest way to generate a simple random sample
is to use a random number generator.
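As a sketch of using a random number generator directly, Python's standard-library `random.Random.sample` draws k distinct items without replacement, with every subset of size k equally likely. The wrapper name `simple_random_sample` and the list of employee IDs are hypothetical, chosen only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return a simple random sample of n distinct members.

    random.Random.sample draws without replacement, so every subset
    of size n is equally likely to be chosen.
    """
    rng = random.Random(seed)
    return rng.sample(population, n)

# Example: 5 members from a hypothetical list of 20 employee IDs.
employees = [f"E{i:02d}" for i in range(1, 21)]
chosen = simple_random_sample(employees, 5, seed=42)
print(chosen)
```

Fixing the seed makes the selection reproducible, which is useful if the sample ever has to be audited or re-created.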
Example: Generating a Simple
Random Sample
A government income-tax auditor is responsible for
1000 tax returns.
The auditor wants to randomly select 40 tax returns
to audit.
Each tax return in the population of 1000 is given a
number from 1 to 1000.
We then use Excel’s random number generator to
select the random sample of 40 tax returns.
Excel worksheet (first 11 of the 50 rows shown):
• Generate 50 random numbers uniformly distributed between 0 and 1.
• Multiply each by 1000, the population size (column "X(1000)"), giving 50 random numbers between 0 and 1000.
• Round each result up to the next integer, giving 50 random integers uniformly distributed between 1 and 1000; each integer has probability 1/1000 of being generated.
• More than 40 numbers are generated so that any duplicates can be discarded.

Uniform(0, 1)   X(1000)     Rounded up
0.3820002       382.00018   383
0.1006806       100.68056   101
0.5964843       596.48427   597
0.8991058       899.10581   900
0.8846095       884.60952   885
0.9584643       958.46431   959
0.0144963       14.496292   15
0.4074221       407.4221    408
0.8632466       863.24656   864
0.1385846       138.58455   139
0.2450331       245.03311   246
...             ...         ...

The auditor will select returns numbered 383, 101, 597, ...
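The worksheet procedure can be sketched in Python as follows. This is an illustrative re-creation, not Excel itself: the function name `excel_style_sample` and the seed are hypothetical, and Python's `random.random()` is used in place of Excel's generator (both return values in [0, 1)). Duplicates are skipped until 40 distinct return numbers have been found.

```python
import math
import random

def excel_style_sample(pop_size, n, seed=None):
    """Mimic the slide's procedure: draw U(0, 1) numbers, multiply by the
    population size, round up to an integer, and keep the first n distinct
    integers (duplicates are discarded)."""
    rng = random.Random(seed)
    selected = []
    seen = set()
    while len(selected) < n:
        u = rng.random()                 # uniform on [0, 1)
        k = math.ceil(u * pop_size)      # an integer from 1 to pop_size
        if k == 0 or k in seen:          # k == 0 only if u is exactly 0.0
            continue
        seen.add(k)
        selected.append(k)
    return selected

# The auditor's problem: 40 returns numbered 1 to 1000.
returns_to_audit = excel_style_sample(1000, 40, seed=2024)
print(returns_to_audit)
```

Applying the same rounding rule to the first worksheet value, 0.3820002 × 1000 = 382.0002 rounds up to 383, matching the table.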
Stratified Random Sampling
• A stratified random sample is obtained by dividing the
population into homogeneous groups and drawing a
simple random sample from each group.
• The homogeneous groups are called strata.
• Not only can we acquire information about the whole
population, we can also make inferences within each
stratum or compare strata.
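A minimal sketch of the procedure: split the population into strata, then draw a separate simple random sample from each. The function name `stratified_sample`, its arguments and the customer/region data are all hypothetical, chosen only to make the steps concrete.

```python
import random
from collections import defaultdict

def stratified_sample(members, stratum_of, n_per_stratum, seed=None):
    """Draw a separate simple random sample from each stratum.

    members: the population, as a list
    stratum_of: function mapping a member to its stratum label
    n_per_stratum: dict mapping stratum label to the sample size for that stratum
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for m in members:
        strata[stratum_of(m)].append(m)
    return {label: rng.sample(strata[label], n)
            for label, n in n_per_stratum.items()}

# Hypothetical population: 100 customers, each tagged with a region (the strata).
customers = [(i, "North" if i % 4 == 0 else "South") for i in range(100)]
sample = stratified_sample(customers, lambda c: c[1],
                           {"North": 5, "South": 15}, seed=7)
for region, chosen in sample.items():
    print(region, [c[0] for c in chosen])
```

Because each stratum's sample is kept separately, estimates can be made within each stratum or compared across strata, as well as combined for the whole population.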
Example: Generating a Stratified
Random Sample
Suppose the Internal Revenue Service wants to
estimate the median amounts of deductions
taxpayers claim in different categories, e.g. property
taxes, charitable donations, etc.
These amounts vary greatly over the taxpayer
population.
Therefore a simple random sample will not be very
efficient.
• The taxpayers can be divided into strata based on
their adjusted gross incomes, and a separate simple
random sample (SRS) can be drawn from each stratum.
• Because deductions generally increase with income,
taxpayers within a stratum have more similar deductions
than the population as a whole, so the resulting
stratified random sample would require a much smaller
total sample size to provide equally precise estimates.
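The efficiency claim can be checked by simulation. The sketch below assumes a hypothetical, synthetic taxpayer population whose average deductions rise sharply across four income strata, and, for simplicity, estimates the mean deduction rather than the median used in the example above. At the same total sample size, the stratified estimates (proportional allocation) vary much less than the SRS estimates, which is the same as saying a smaller stratified sample can match the precision of a larger SRS.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical taxpayer population (synthetic numbers, for illustration only):
# average deductions rise steeply from the lowest to the highest income stratum.
stratum_means = [1_000, 3_000, 6_000, 12_000]
stratum_sizes = [2_500, 4_000, 3_000, 500]
strata = [[rng.gauss(m, 500) for _ in range(size)]
          for m, size in zip(stratum_means, stratum_sizes)]
population = [x for s in strata for x in s]
N = len(population)

def srs_estimate(n):
    """Mean deduction estimated from a simple random sample of size n."""
    return statistics.mean(rng.sample(population, n))

def stratified_estimate(n):
    """Proportional allocation: sample n * N_h / N from stratum h, then
    weight each stratum's sample mean by its share N_h / N of the population."""
    total = 0.0
    for s in strata:
        n_h = round(n * len(s) / N)
        total += len(s) / N * statistics.mean(rng.sample(s, n_h))
    return total

n, reps = 100, 500
srs_sd = statistics.stdev([srs_estimate(n) for _ in range(reps)])
strat_sd = statistics.stdev([stratified_estimate(n) for _ in range(reps)])
print(f"st. dev. of SRS estimates:        {srs_sd:,.0f}")
print(f"st. dev. of stratified estimates: {strat_sd:,.0f}")
```

The gain comes from removing the large differences between strata from the estimate's variability; if the strata were not more homogeneous than the population, there would be little benefit.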
There are several ways to build the stratified random
sample.
One of them, proportional allocation, is to make each
stratum's share of the sample equal to its share of the
population.
A sample of size 1000 is to be drawn.

Stratum   Income            Population proportion   Sample size from stratum
1         under $15,000     25%                     250
2         $15,000-29,999    40%                     400
3         $30,000-50,000    30%                     300
4         over $50,000      5%                      50
Total                       100%                    1000
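The arithmetic behind the table is sample size for a stratum = total sample size × population proportion of that stratum. A small sketch, with a hypothetical helper name `proportional_allocation`:

```python
def proportional_allocation(total_n, proportions):
    """Sample size for each stratum = total sample size x population proportion."""
    return {stratum: round(total_n * p) for stratum, p in proportions.items()}

income_strata = {
    "under $15,000": 0.25,
    "$15,000-29,999": 0.40,
    "$30,000-50,000": 0.30,
    "over $50,000": 0.05,
}
sizes = proportional_allocation(1000, income_strata)
print(sizes)
print("total:", sum(sizes.values()))
```

Here every product is a whole number. With other proportions, rounding can leave the total one or two away from the intended sample size, and the allocation then has to be adjusted by hand.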
Cluster Sampling
• Cluster sampling groups the population into small
clusters, draws a simple random sample of clusters,
and observes everything in the sampled clusters.
• It is useful when it is difficult or costly to develop a
complete list of the population members.
• It is also useful whenever the population elements
are widely dispersed geographically.
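A minimal sketch of cluster sampling, assuming a hypothetical population of households grouped into 20 city blocks (the block and household labels and the function name `cluster_sample` are illustrative only). Only the list of clusters is needed, not a list of every household: a simple random sample of blocks is drawn, and every household in a chosen block is observed.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Take a simple random sample of whole clusters, then keep every
    member of each selected cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical population grouped into city blocks (the clusters).
blocks = {
    f"Block {b}": [f"household {b}-{h}" for h in range(1, 6)]
    for b in range(1, 21)
}
sample = cluster_sample(blocks, 3, seed=11)
for block, households in sample.items():
    print(block, households)
```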
Errors Involved in Sampling
• Two types of errors occur when sampling from a
population
- sampling error
- non-sampling error
Sampling Error
• Sampling error is the error that arises because the
data are collected from part, rather than the whole of
the population.
• Whenever we make inferences about a population
based on information from a sample there will
naturally be some degree of error.
• In general, the larger the sample, the smaller the
sampling error tends to be.
Figure: the sampling error is the gap between the sample mean income, $\bar{x}$, and the population mean income, $\mu$:
$$\text{sampling error} = \bar{x} - \mu$$
Non-Sampling Error
• Non-sampling errors arise from mistakes in data
acquisition, from non-response and from selection bias.
• These types of errors are more serious than sampling
errors because increasing the sample size will not
reduce them.
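To see why a larger sample does not help, the sketch below assumes a hypothetical faulty scale that records every measurement 5% too high (the population, the 5% figure and the function name `recorded` are illustrative assumptions). The error in $\bar{x}$ settles near the bias of about 0.05 × μ instead of shrinking toward zero as n increases.

```python
import random
import statistics

rng = random.Random(8)
# Hypothetical population (synthetic data, for illustration only).
population = [rng.gauss(200, 40) for _ in range(20_000)]
mu = statistics.mean(population)

def recorded(value):
    """Hypothetical faulty scale: every measurement is recorded 5% too high."""
    return value * 1.05

errors = {}
for n in (100, 1_000, 10_000):
    sample = rng.sample(population, n)
    errors[n] = statistics.mean([recorded(x) for x in sample]) - mu
    print(f"n = {n:>6}: error in x-bar = {errors[n]:6.2f}")
```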
Errors in Data Acquisition
• These types of errors occur during data collection
and processing.
- Faulty equipment may lead to incorrect
measurements being taken.
- Data may be recorded incorrectly.
- Processing errors may occur.
Data Acquisition Error
Figure: Population → Sample. If an observation in the sample is recorded incorrectly, the sample mean is affected, so the estimate contains sampling error + data acquisition error.
Non-Response Error
• Non-response error is the error introduced when
responses are not obtained from some members of
the sample.
• The sample observations that are collected may not
be representative of the population.
• This can lead to biased results.
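A minimal simulation of non-response bias, assuming a hypothetical income survey in which higher earners are less likely to reply (the synthetic population, the 80% and 40% response rates and the function name `responds` are assumptions, not data). The full random sample would estimate μ well, but the mean of the respondents alone is pulled well below it.

```python
import random
import statistics

rng = random.Random(9)
# Hypothetical population of incomes (synthetic data, for illustration only).
population = [rng.gauss(60_000, 15_000) for _ in range(50_000)]
mu = statistics.mean(population)

def responds(income):
    """Hypothetical response pattern: higher earners are less likely to reply."""
    return rng.random() < (0.8 if income < 60_000 else 0.4)

sample = rng.sample(population, 2_000)
respondents = [x for x in sample if responds(x)]

print(f"population mean:          {mu:,.0f}")
print(f"mean of full sample:      {statistics.mean(sample):,.0f}")
print(f"mean of respondents only: {statistics.mean(respondents):,.0f}")
print(f"response rate:            {len(respondents) / len(sample):.0%}")
```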
Non-Response Error
Figure: Population → Sample. No response from some members of the sample may lead to biased results.
Selection Bias
• Selection bias occurs when some members of the
population cannot possibly be selected for inclusion in
the sample.
• For example, surveying voters by randomly selecting
telephone numbers is biased as voters who do not
have a telephone cannot possibly be selected in the
sample.
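A sketch of the telephone example, assuming a hypothetical electorate in which 20% of voters have no telephone and those voters favour candidate A more often (all numbers are synthetic assumptions). Because the sampling frame contains only telephone owners, the estimate stays near the share among phone owners (about 0.45 here) rather than the true share, however large the sample.

```python
import random

rng = random.Random(12)

# Hypothetical electorate (synthetic, for illustration only): 20% of voters
# have no telephone, and they favour candidate A more often than others do.
voters = []
for _ in range(100_000):
    has_phone = rng.random() < 0.8
    p_support_a = 0.45 if has_phone else 0.70
    voters.append((has_phone, rng.random() < p_support_a))

true_share = sum(a for _, a in voters) / len(voters)

# Sampling frame = telephone owners only; voters without a phone can never be chosen.
frame = [v for v in voters if v[0]]
estimates = {}
for n in (500, 5_000, 50_000):
    sample = rng.sample(frame, n)
    estimates[n] = sum(a for _, a in sample) / n
    print(f"n = {n:>6}: estimated share for A = {estimates[n]:.3f} (true {true_share:.3f})")
```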
Selection Bias
Figure: Population → Sample. When parts of the population cannot be selected, the sample cannot represent the whole population.
Reading for next lecture
• Chapter 7, Section 7.5
Exercises
• 6.11
• 6.12