lecture 6 SAMPLING - Dycker@control


SAMPLING
OUTLINE:

sampling and census
sampling surveys, frame, size
probability and non-probability sampling methods
census
SAMPLING AND CENSUS

collection methods for data
Sampling

any data collection that is not a controlled
experiment
e.g. percentage of greenhouse gases in the
atmosphere above Winnipeg
SAMPLING AND CENSUS
Census

survey whose domain is the characteristics of an
entire population

any study of entire population of a particular set of
‘objects’.
e.g. female polar bears in western Hudson Bay,
human residents of Heidelberg,
the number of Epacris impressa plants on a single
hillside in Riding Mountain National Park
SAMPLING AND CENSUS

if we collect, analyse or study only some members of a
population, then we are carrying out a sample survey

aim is to make observations at a limited number of
carefully chosen locations that are representative of a
distribution

use sample to predict the overall character of the
population – accuracy will depend on quality of
sample
SAMPLING SURVEYS

done for several reasons:

costs less than a census of the equivalent population

they are carried out to answer specific questions

sample survey will usually offer greater scope than a
census (larger geographical area, greater variety of
questions)
SAMPLING SURVEYS
development of sampling survey:

state objectives of survey

define target population

define data to be collected

define the required precision and accuracy

define the measurement ‘instrument’

define the sample frame, sample size and sampling
method, then select the sample
SAMPLING SURVEYS


process of generating a sample requires several critical
decisions to be made:

sample frame

sample size

sampling method
errors in any of these decisions will compromise the entire survey
SAMPLE FRAME

if frame is wrongly defined, sample may not be
representative of the target population.

frame might be ‘wrong’ in three ways:

contains too many individuals (membership is
under-defined)

contains too few individuals (membership is
over-defined)

contains the wrong set of individuals
(membership is ill-defined)
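As a rough illustration of the three cases above, a frame can be compared with the target population using set operations. This is only a sketch: the function name `frame_coverage` and the household IDs are hypothetical and do not come from the lecture.

```python
def frame_coverage(target_population, sample_frame):
    """Compare a sample frame with the target population.

    'extra' holds units in the frame that are not in the target
    (the frame contains too many individuals); 'missing' holds units
    in the target that the frame leaves out (too few individuals).
    """
    target = set(target_population)
    frame = set(sample_frame)
    return {"extra": frame - target, "missing": target - frame}


# Hypothetical household IDs, for illustration only.
target = {"H1", "H2", "H3", "H4"}
frame = {"H2", "H3", "H4", "H5"}

report = frame_coverage(target, frame)
print(sorted(report["extra"]))    # units wrongly included in the frame
print(sorted(report["missing"]))  # units wrongly left out of the frame
```

A frame with both extra and missing units corresponds to the third case, the wrong set of individuals.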
SAMPLE FRAME
Two-stage process:

divide the target population into sampling units
e.g. households, trees, light bulbs, soil samples,
cities, individuals

create a finite list of sampling units that make up
the target population.
e.g. names, addresses, identity numbers,
# of 50 mL sample bottles
SAMPLING UNITS

member of a sample/sample frame

in geomatics – points, lines (transects) and areas
(quadrats)
e.g. measuring snow depth at 10 cm intervals along a
10 m line
measuring all features that fall within 10 m of a line
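The snow-depth transect above can be sketched as a list of measurement positions. This is a minimal sketch that assumes both ends of the line are measured; the function name `transect_positions_cm` is hypothetical, and positions are kept in whole centimetres to avoid rounding issues.

```python
def transect_positions_cm(length_cm, spacing_cm):
    """Positions (in cm from the start) of sampling points along a line.

    Assumes a point at the start of the line and one every
    `spacing_cm` after that, up to and including the end if it
    falls on the spacing.
    """
    return list(range(0, length_cm + 1, spacing_cm))


# 10 m line sampled every 10 cm, as in the example above.
positions = transect_positions_cm(1000, 10)
print(len(positions), positions[:3], positions[-1])
```

If the end points are not measured, the first and last positions would simply be dropped.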
SAMPLE SIZE

quantity is not better than quality

in statistics – a sample size of 30 or greater is a common rule of thumb

in geomatics – appropriate sample size is directly
related to a distribution’s variability
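One way to see how sample size depends on variability is the standard textbook formula for estimating a mean, n = (z·s / E)². It is not taken from the lecture; the function name, the default z = 1.96 (roughly 95% confidence) and the example values are assumptions for illustration only.

```python
import math


def required_sample_size(std_dev, margin_of_error, z=1.96):
    """Approximate sample size needed to estimate a mean.

    std_dev:         expected standard deviation of the variable
    margin_of_error: acceptable error, in the same units as std_dev
    z:               z-score for the confidence level (assumed 1.96)
    """
    return math.ceil((z * std_dev / margin_of_error) ** 2)


# Hypothetical snow-depth surveys with the same tolerated error (2 cm)
# but different variability.
n_low_variability = required_sample_size(std_dev=5.0, margin_of_error=2.0)
n_high_variability = required_sample_size(std_dev=10.0, margin_of_error=2.0)
print(n_low_variability, n_high_variability)
```

Doubling the variability roughly quadruples the number of units needed for the same precision.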
SAMPLING METHOD

aim is to obtain a sample that is representative of
the target population.

when selecting a sampling method, we need
some minimal prior knowledge of the target
population

how we actually decide which sampling units will
be chosen makes up the sampling method.
SAMPLING METHOD

most sampling methods attempt to select units such
that each has a definable probability of being chosen -
probability sampling methods.

we can ignore probability of selection and choose
samples on some other criterion – non-probability
sampling methods.
NON-PROBABILITY SAMPLING

units that make up the sample are collected with no
specific probability structure in mind
e.g. units are self-selected
units are most easily accessible
units are selected on economic grounds
units are considered to be typical of pop’n
units are chosen without an obvious design
NON-PROBABILITY SAMPLING

considered inferior to probability methods - there is no statistical
basis upon which the success of the sampling method can
be evaluated.

may be unavoidable – regard as a ‘last resort’ when
designing a sample scheme.
PROBABILITY SAMPLING

basis is the selection of sampling units to make up the
sample based on defining the chance that each unit in
the sample frame will be included
e.g. have 100 students, need 10 to fill out a survey:
each student has a 1 in 10 chance of being selected
(probability of selection is 0.1)
PROBABILITY SAMPLING

each time we apply the same method to the same
frame, we will generate a different sample

concerned with probability of each sample being
chosen, rather than with the probability of
choosing individual units

there are a number of probability sampling strategies
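The difference between the probability of choosing a unit and the probability of choosing a whole sample can be worked through for the 100-student example. This sketch assumes every possible sample of 10 is equally likely; the variable names are illustrative only.

```python
import math

N = 100  # units in the sample frame (students)
n = 10   # units needed in the sample

# Chance that any one student ends up in the sample.
inclusion_probability = n / N

# Number of distinct samples of size n that could be drawn from the frame.
possible_samples = math.comb(N, n)

# If every sample is equally likely, this is the chance of each one.
probability_of_each_sample = 1 / possible_samples

print(inclusion_probability)
print(possible_samples)
print(probability_of_each_sample)
```

The number of possible samples is very large, which is why repeating the same method on the same frame almost always yields a different sample.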
PROBABILITY SAMPLING
Simple random sampling


simplest way
select n units such that every one of the possible
samples has an equal chance of being chosen

generate a sample by selecting from the sample frame
by any method that guarantees that each sampling
unit has a specified probability of being included

how we do the sampling is of no significance (e.g.
random number tables, dice, …)
PROBABILITY SAMPLING
e.g.
94407382
94409687
93535459
94552345
94768091
93732085
94556321
94562119
93763450
94127845
94675420
94562119
93763450
94127845
Use a random number table to generate
six random numbers between 1 and 14,
e.g. 4, 6, 7, 9, 11, 13
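The example above can be sketched in code, with a pseudo-random number generator standing in for the random number table. The helper names `draw_positions` and `units_at` are hypothetical, and the seed is there only so the illustration is repeatable.

```python
import random

# Sample frame copied from the example as listed (positions 1 to 14).
frame = [
    94407382, 94409687, 93535459, 94552345, 94768091, 93732085, 94556321,
    94562119, 93763450, 94127845, 94675420, 94562119, 93763450, 94127845,
]


def draw_positions(frame_size, n, seed=None):
    """Draw n distinct positions between 1 and frame_size, all equally likely."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, frame_size + 1), n))


def units_at(frame, positions):
    """Look up the sampling units at the given 1-based positions."""
    return [frame[p - 1] for p in positions]


# A fresh draw of six positions (will differ from the table's draw).
positions = draw_positions(len(frame), 6, seed=1)
sample = units_at(frame, positions)

# The positions read from the random number table in the example.
slide_sample = units_at(frame, [4, 6, 7, 9, 11, 13])
print(positions, sample)
print(slide_sample)
```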
PROBABILITY SAMPLING
Stratified Sampling

used when you suspect the target population actually
consists of a series of separate ‘sub-populations’

stratification is the process of splitting the sample to
take account of possible sub-populations

stratified sampling – total pop is first divided into a set
of mutually exclusive sub-pops/strata

samples taken from the sub-populations may be of equal size or not,
depending on the sub-populations’ relative sizes
PROBABILITY SAMPLING
Stratified Sampling

within each stratum, select a
sample, usually ensuring that the
probability of selection is the
same for each unit in each sub-pop
– stratified random sample
e.g. national polls and rating
surveys
PROBABILITY SAMPLING
e.g.
94407382
94409687
94535459
94552345
94768091
94732085
94556321
93562119
93763450
93127845
93675420
93562119
93763450
93127845
First split the pop into sub-pops
(based on the second digit
in this example)
Then sample from these sub-pops
(three from each using a
random number table – 1, 2, 5)
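The two steps of the example above (split, then sample within each sub-pop) can be sketched as follows. The function names are hypothetical, a pseudo-random generator stands in for the random number table, and the seed is only for repeatability.

```python
import random
from collections import defaultdict

# Sample frame copied from the example as listed.
frame = [
    94407382, 94409687, 94535459, 94552345, 94768091, 94732085, 94556321,
    93562119, 93763450, 93127845, 93675420, 93562119, 93763450, 93127845,
]


def second_digit(unit):
    """Stratification key used in the example: the second digit of the ID."""
    return str(unit)[1]


def split_into_strata(frame, key):
    """Group sampling units into mutually exclusive strata."""
    strata = defaultdict(list)
    for unit in frame:
        strata[key(unit)].append(unit)
    return dict(strata)


def stratified_random_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of the same size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}


strata = split_into_strata(frame, second_digit)
sample = stratified_random_sample(strata, 3, seed=1)

# Positions 1, 2 and 5 within each sub-pop, as read from the table.
slide_sample = {name: [units[i - 1] for i in (1, 2, 5)]
                for name, units in strata.items()}
print(sample)
print(slide_sample)
```

Here every stratum contributes the same number of units; a proportional design would instead scale `n_per_stratum` by each stratum's share of the frame.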
PROBABILITY SAMPLING
Systematic Sampling

decide sample size from the
population size; population
has to be organized in some
way
e.g. points along a river,
simple numerical order

simpler in design and easier
to administer
PROBABILITY SAMPLING
Systematic Sampling

choose a starting point along the sequence by
selecting the rth unit from one end of the sequence

then take the rest of the sample by repeatedly adding a fixed number (the sampling interval) to r
PROBABILITY SAMPLING
e.g.
94407382
94409687
94535459
94552345
94768091
94732085
94556321
93562119
93763450
93127845
93675420
93562119
93763450
93127845
First order the sampling units (in
this case decreasing numerical
order)
Next, select the first point (r
value) – 2
Then take every third unit
after this (2, 5, 8, 11, 14)
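The selection in the example above can be sketched as a starting position plus a fixed interval. The function name `systematic_positions` is hypothetical; the start (2) and interval (3) are the values used in the example.

```python
# Sample frame copied from the example as listed (positions 1 to 14).
frame = [
    94407382, 94409687, 94535459, 94552345, 94768091, 94732085, 94556321,
    93562119, 93763450, 93127845, 93675420, 93562119, 93763450, 93127845,
]


def systematic_positions(frame_size, start, interval):
    """1-based positions: the start, then every `interval`-th unit after it."""
    return list(range(start, frame_size + 1, interval))


positions = systematic_positions(len(frame), start=2, interval=3)
sample = [frame[p - 1] for p in positions]
print(positions)
print(sample)
```

Only the starting position involves a choice; once it is fixed, the rest of the sample is fully determined.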
CENSUS

aim is to identify and record all members of a
population

most countries routinely carry out a census of their
populations
e.g. Canada performs a census every 5 years (1981,
1986, 1991, 1996, 2001)

original function was to enumerate for electoral purposes,
but the census now encompasses a large range of information about
national populations
CENSUS

collects important information about the social and
economic situation of people living in an area

Population Counts

Age, Sex, Marital Status, Families (number, type and structure)

Structural Type of Dwelling and Household Size

Immigration and Citizenship, Education, Mobility, Migration

Mother Tongue, Home Language and Official/Non-Official Languages

Ethnic Origin and Population Group (visible minorities)

Labor Market Activities, Household Activity, Place of Work and Mode of
Transportation

Sources of Income, Total Income and Family and Household Income

Families: Social and Economic Characteristics, Occupied Dwellings and Household
Costs
CENSUS
disadvantages of census:

time consuming - requires years of planning

laborious - requires thousands of
workers/volunteers

costly - millions of dollars to survey everyone
CENSUS
Errors in census data:

people respond dishonestly due to lack of
confidence in confidentiality

full accounting of residences is difficult to document
(e.g. the homeless)

poorly qualified people may be recruited to conduct the survey
CENSUS REGIONS

a census consists of “enumeration” data –
counts tabulated or ‘aggregated’ by geographic
areas

census regions/enumeration areas are not distributed
uniformly and vary in shape, size and orientation

Canada divided into 51,500 enumeration areas

census regions are defined by political boundaries and
natural and cultural landmarks
CENSUS REGIONS
Enumeration Area (EA)

smallest reported census area

canvassed by one census representative

125-440 dwellings, depending on situation in rural/urban area
Census Tract (CT)

represent urban or rural communities in CMAs and CAs

populations range between 2,500 - 8,000
Census Subdivision (CSD)

term applied to municipalities or equivalent
CENSUS REGIONS
Census Division (CD)

areas intermediate between municipality (CSD) and province level

represent counties, regional districts, regional municipalities
Census Metropolitan Area/Census Agglomeration (CMA/CA)

CMA and CA are very large urban cores together with adjacent
integrated urban and rural areas

urban core population >100,000 for CMA, >10,000 for CA

CMA may be combined with adjacent CAs to form ‘consolidated CMA’
Federal Electoral Districts (FED)

area entitled to elect a representative member to the House of Commons
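The CMA/CA thresholds above can be expressed as a simple rule on the urban core population. This is only a sketch of the thresholds as stated here (strictly greater than 100,000 and 10,000); the function name is hypothetical and the official definitions include further criteria that are not modelled.

```python
def classify_urban_area(urban_core_population):
    """Classify an urban area by the size of its urban core.

    Returns "CMA", "CA", or None when the core is too small for either.
    """
    if urban_core_population > 100_000:
        return "CMA"
    if urban_core_population > 10_000:
        return "CA"
    return None


# Hypothetical urban core populations, for illustration only.
for population in (250_000, 50_000, 8_000):
    print(population, classify_urban_area(population))
```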
CENSUS REGIONS


census information is aggregated within the boundaries of
the data collection regions to:

reduce costs

protect confidentiality
GIS concerns

census region totals are more abstract and spatially
inaccurate

mask the true nature of population distribution
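Aggregation of this kind can be sketched as a count of individual records by region. The records and region codes below are hypothetical; the point is that once only the totals are kept, the position of each individual within a region can no longer be recovered.

```python
from collections import Counter

# Hypothetical individual records: (person_id, region_code).
records = [
    (1, "EA-001"), (2, "EA-001"), (3, "EA-002"),
    (4, "EA-001"), (5, "EA-003"), (6, "EA-002"),
]


def aggregate_by_region(records):
    """Count individuals per region, discarding the individual records."""
    return Counter(region for _, region in records)


region_totals = aggregate_by_region(records)
for region, total in sorted(region_totals.items()):
    print(region, total)
```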
REPORTING METHOD

aggregated data reported as census region totals –
data presentation is a count by region

also report census totals at region centroids

center of area – balance point for census region
shape

center of population – averaging x and y
coordinates of the individual pop’n.
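The two kinds of centroid can be sketched as follows. The center of population is the mean of the individual coordinates, as described above; for the center of area this sketch assumes the region is a simple polygon and uses the standard polygon centroid formula, which the lecture does not specify. All coordinates and function names are hypothetical.

```python
def center_of_population(points):
    """Average the x and y coordinates of the individual locations."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def center_of_area(polygon):
    """Area centroid of a simple polygon given as (x, y) vertices in order."""
    twice_area = 0.0
    cx = 0.0
    cy = 0.0
    for i in range(len(polygon)):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % len(polygon)]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    # Centroid = sum / (6 * area), and 6 * area = 3 * twice_area.
    return (cx / (3.0 * twice_area), cy / (3.0 * twice_area))


# Hypothetical census region (a 6 x 2 rectangle) and resident locations.
region = [(0, 0), (6, 0), (6, 2), (0, 2)]
residents = [(0, 0), (1, 0), (2, 3)]

print(center_of_area(region))
print(center_of_population(residents))
```

When the population is clustered in one part of a region, the two centroids can be far apart, which matters when census totals are mapped at a single point.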
Map of Census divisions
CENSUS AND GIS

census represents a very important source of data for
GIS because:

it provides data of use in many areas of human
geography: social, economic, political

the census goes back to Confederation, so
historical analyses can be performed

the census provides data in a large variety of
readily-mapped spatial zones (e.g. CMA, county)