Statistics without surveys

DTC Quantitative Methods
Data Sources:
Secondary Analysis
(& Official Statistics)
Friday 13th January 2012
What is ‘secondary analysis’?
• Hakim: “…any further analysis of an existing dataset
which presents interpretations, conclusions or
knowledge additional to, or different from, those
presented in the first report on the inquiry as a whole and
its main results”.
• Dale et al.: “secondary analysis implies a re-working of
data already analysed”.
• Hyman: “the extraction of knowledge on topics other
than those which were the focus of the original surveys”.
Online course extracts: Dale et al. 1988; Dale et al. 2008
Sources for secondary analyses
• Surveys
• The Census
• Administrative and/or public records
• Longitudinal studies
• Qualitative studies
The UK Data Archive (http://www.data-archive.ac.uk) now
catalogues data from surveys and qualitative studies, as
well as the Census, historical data, international country-level databases, etc.
Some sources specifically geared
towards secondary analysis
• The British Social Attitudes Survey
• Understanding Society
http://www.understandingsociety.org.uk/
• The Timescapes qualitative longitudinal study:
http://www.timescapes.leeds.ac.uk/
Benefits of secondary analysis
• It avoids costs in money and time that would make primary
research impractical, especially for a lone researcher.
• It allows one to benefit from the fieldwork expertise of
professional organizations.
• Cross-national and historical research become more of a
practical possibility.
• Secondary analyses of longitudinal data facilitate studies of
change over time.
• Large, nationally representative samples facilitate
sophisticated, generalisable analyses, and (sometimes)
analyses relating to small/relatively inaccessible minorities.
Sampling error (again!)
There will always be some
sampling error
...but with a large sample one can be more confident
that it will be proportionally smaller.
The expected extent of sampling error in a sample is
expressed in terms of confidence levels
(e.g. that you’re 95% confident of being no more than
a stated amount wrong about the proportion of the
population who are Roman Catholic, given how many
people in your sample were Roman Catholic)
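As a rough illustration of the point above, the sketch below computes an approximate 95% margin of error for a sample proportion using the standard normal approximation. The function name and the figures (10% Roman Catholic, samples of 100 and 10,000) are hypothetical and are not taken from any survey discussed here.

```python
import math

def proportion_margin_of_error(successes, n, z=1.96):
    """Return (sample proportion, approximate margin of error).

    Uses the normal approximation; z=1.96 corresponds to 95% confidence.
    """
    p = successes / n
    standard_error = math.sqrt(p * (1 - p) / n)
    return p, z * standard_error

# Hypothetical figures: 10% of the sample is Roman Catholic in each case.
for n in (100, 10_000):
    p, moe = proportion_margin_of_error(successes=n // 10, n=n)
    print(f"n={n}: proportion={p:.3f}, 95% margin of error=+/-{moe:.4f}")
```

The margin shrinks with the square root of the sample size, so a sample 100 times larger gives a margin about 10 times smaller.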
A population of ten people with $0 - $9
The sampling distribution of samples of size 1
The sampling distribution of samples of size 2
The sampling distributions of samples of size 3 and 4
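The charts for these slides are not reproduced in this transcript, but the idea can be sketched in code. The example below lists every possible sample of a given size from a population of ten people holding $0 to $9 and summarises the resulting sample means. It assumes sampling without replacement, which the slides do not specify, and the function name is illustrative.

```python
from itertools import combinations
from statistics import mean, pstdev

population = list(range(10))  # ten people holding $0, $1, ..., $9

def sampling_distribution(sample_size):
    """Means of every possible sample of this size.

    Assumption: samples are drawn without replacement.
    """
    return [sum(sample) / sample_size
            for sample in combinations(population, sample_size)]

for size in (1, 2, 3, 4):
    means = sampling_distribution(size)
    print(f"size={size}: {len(means)} possible samples, "
          f"mean of sample means={mean(means):.2f}, "
          f"spread (sd)={pstdev(means):.3f}")
```

The sample means centre on the population mean whatever the sample size, while their spread narrows as the sample size grows.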
Bridging the
quantitative/qualitative ‘divide’
• Dale et al. comment that “qualitative
research can greatly enhance the value of
secondary analysis by providing greater
depth of information, particularly by
suggesting the underlying processes that
are responsible for the observed
relationships”.
Why is there a shortfall in secondary
analyses in the UK? (particularly in
some disciplines, e.g. Sociology)
• A lack of quantitatively-orientated researchers.
• The legacy of critiques of quantitative research
methods.
• More specifically, the legacy of critiques of
official statistics.
• Although it’s now more of a question of inertia
than of ongoing scepticism?
Themes within critiques of
official statistics
• Concerns about coverage
• Concerns about measurement
• Epistemological concerns
• ‘Political concerns’
A ‘damning’ quote?
“Its [i.e. the state’s] economic and political
functions are embedded in the production of
official statistics, structuring both what data
are produced and how this is done... only by
understanding that statistics are produced
as part of the administration and control of a
society organised around exploitative class
relations can we grasp their full meaning”
(Miles and Irvine, 1979).
However…
• Analyses of official data can produce substantively
interesting results.
• The producers and users of official statistics are
normally very concerned about the errors in data
and the data’s limitations.
• The conceptual issues arising from the use of official
statistics are not dissimilar to those arising in other
forms of sociological research.
• Analyses of official data have been used to critique
governments with respect to issues such as
unemployment, health inequalities, etc.
(The first three of the above bullet points are suggestions by Bulmer)
... nevertheless
As Hindess commented:
• “Official statistics are never mere givens to be
taken as they are or else dismissed as
inadequate. Like other productions they must be
explained in terms of the conditions and
instruments of their production”.
• “As structured social products they [i.e. official
statistics] can [and should!] be critically
assessed”.
Official statistics or official data?
• Published official statistics have justifiably
been viewed with some scepticism.
• However, the analysis of official data by a
secondary analyst can avoid some of the
problems.
• Given access to the ‘raw’ official data, she
or he can manipulate them in ways
different to how they were processed to
produce published official statistics.
Are UK official statistics getting
more independent?
• A Statistics Board resulting from the Statistics Bill of
July 2007, renamed the UK Statistics Authority in
February 2008 (see:
http://www.statisticsauthority.gov.uk/) is:
• “... an independent body operating at arm's length
from government as a non-ministerial department,
directly accountable to Parliament. … [its] overall
objective is to promote and safeguard the quality of
official statistics that serve the public good. It is also
required to safeguard the comprehensiveness
of official statistics”.
Some more specific developments: I
• OPCS [now ONS] Disability Surveys were criticised for not
adequately reflecting disabled people’s perspectives on their
disabilities.
(see Abberley’s chapter in Levitas and Guy, 1996).
• However, they were nevertheless used for some interesting
and useful secondary analyses (see Pole and Lampard, 2002,
Ch. 7).
• More recently, the Office for Disability Issues (ODI) brought
together a group of disabled people as a reference network, in
part to facilitate the effective design of a new longitudinal
disability survey, the Life Opportunities Survey (LOS):
(see http://www.ons.gov.uk/about/surveys/a-z-of-surveys/lifeopportunities-survey/index.html)
Some more specific developments: II
• A number of UK government surveys now (since 2009)
ask a question on sexual orientation, following a
question being asked in the 2007 Citizenship Survey
(and resulting from ONS's Sexual Identity Project,
established in 2006).
• This development reflects more general governmental
concerns about the availability of ‘equality data’.
• However, it does not seem that a question will be asked
on this topic in the 2011 Census!
• The consensus also seems to be that the results
generated by the question will under-estimate non-heterosexual orientations.
“What is the moral? Must have a moral…”
• Whether the source of their data is official or
non-official, secondary analysts should gain an
extensive knowledge of the research design and
data collection process.
• This allows the secondary analyst to adopt an
informed and suitably critical approach to their
assessment of the validity and value of their data
source(s).
The data source for the slide title (I think) is “A Funny Thing Happened
on the Way to the Forum” (Sondheim)
Key issues in secondary analysis
(according to Dale et al.)
• What was the original purpose of the study and
what conceptual framework was used? Who was
responsible for collecting the data?
• What data did the study collect and how were
variables such as occupational class
operationalized?
• What was the sample design that was used and
what was the level and pattern of non-response?
Is there documentation
available in relation to?
• Sample selection
• Patterns of (non-)response
• Interview schedules [Questionnaires]
• Instructions to interviewers
• The coding of answers
• The construction of derived variables
Some other relevant questions...
• Is secondary analysis an appropriate approach given the
researcher’s objectives?
• Does the secondary analyst know the topic area well enough
to be able to interpret and evaluate the information available?
• What similarities and differences are there between the
conceptual frameworks of the original researchers and of the
secondary analyst?
• Are the data recent and extensive enough for the secondary
analyst’s purposes?
• How consistent is the information with information from other
sources?
• Is the information representative enough to support
generalisations? Is weighting needed to correct for a lack of
representativeness?
An example: the General Household Survey
Advantages of the GHS include
• a large sample size
• the fact that it has been repeated more or less
annually since the early 1970s, which allows
trends to be examined
• a broad agenda which means that relationships
between concepts belonging to different policy
areas can be examined
• a hierarchical structure, which allows linkages
between different members of the same
household to be examined (Dale et al.)
Disadvantages of the GHS
include
• That it has been cancelled, although some
of its components have been reassigned
to other, less satisfactory surveys...
Some examples of sources and issues
from Richard’s research
• Social Change and Economic Life Initiative
(SCELI) main survey (1986)
• National Survey of Sexual Attitudes and
Lifestyles II (2000) [and various other couple-related surveys]
• General Household Survey (1991 & 2005)
• British Election Study (1987)
See also Pole and Lampard, 2002, Ch. 7.
Reasons for the end of a
cohabiting or marital relationship
(as shown on a NATSAL II showcard)
• Unfaithfulness or adultery
• Money problems
• Difficulties with our sex life
• Different interests, nothing in common
• Grew apart
• Not having children
• Lack of respect or appreciation
• Domestic violence
• Arguments
• Not sharing household chores enough
• One of us moved because of a change in circumstances
(for example, changed jobs)
• Death of partner
• Another reason (please say what)
...and the categories that had to be added
• Drink, drugs or gambling problem
• Mental health or related problem
• Problem with children/step-children
• Never at home (e.g. always out with friends)
• Problems with parents/in-laws/family
• Age-related problems (e.g. big age difference)
• Another relationship involved
• Lived in/moved to a different country/area
• Still in relationship, but stopped living together
• Change of mind/feelings/personality
• Partner just left without any explanation
Weighting: A commonly-needed
adjustment
• This is used when a particular group has
been ‘over-sampled’ (or ‘under-sampled’).
This occurs in ‘disproportionate sampling’.
• It assigns some cases more weight than
others on the basis of the different
probabilities of selection each case had.
• The appropriate approach is to give each
case a weight that is (proportional to) the
inverse of that case’s selection probability.
Weighting example
• I have a population of 10,000 university students that includes 10%
minority ethnic students.
• I want to sample 100 people and to compare ‘white’ and minority
ethnic respondents.
• If I sample randomly I will probably get only about 10 minority ethnic
respondents. This won’t give me much of a basis for a comparison.
• So I stratify my sample and sample 50/1000 minority ethnic students,
giving a probability of selection of .05
• ...and 50/9,000 ‘white’ students, giving a probability of .0056
• We now have 50 ‘white’ and 50 minority ethnic respondents – this is
useful because it provides more balanced information about the two
sub-populations.
• However, it now looks from the sample as if the population is 50%
minority ethnic, which is not the case.
• To ‘re-weight’ the responses to make them represent the composition
of the ‘real’ population I can multiply each minority ethnic respondent
by the inverse of their chance of selection (1000/50 = 20) and each
‘white’ respondent by the inverse of their chance (9000/50 = 180).
• These weights give a sample size that is 100 times too large
(10,000/100), so dividing by 100 gives final weights of 0.2 and 1.8.
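The arithmetic in this example can be checked with a short sketch. The stratum sizes and sample sizes are those given above; the dictionary layout and the function name are illustrative assumptions, not part of any survey package.

```python
# Stratum sizes and numbers sampled, as in the worked example above.
strata = {
    "minority ethnic": {"population": 1_000, "sampled": 50},
    "white": {"population": 9_000, "sampled": 50},
}

def design_weights(strata):
    """Inverse-probability weights, rescaled to sum to the sample size."""
    # Raw weight = 1 / selection probability = population / sampled.
    raw = {name: s["population"] / s["sampled"] for name, s in strata.items()}
    total_population = sum(s["population"] for s in strata.values())
    total_sample = sum(s["sampled"] for s in strata.values())
    scale = total_population / total_sample  # 10,000 / 100 = 100
    return {name: weight / scale for name, weight in raw.items()}

weights = design_weights(strata)

# Weighted share of minority ethnic respondents in the sample.
weighted_total = sum(weights[name] * strata[name]["sampled"] for name in strata)
minority_share = weights["minority ethnic"] * strata["minority ethnic"]["sampled"] / weighted_total

print(weights)
print(f"Weighted minority ethnic share: {minority_share:.2f}")
```

After weighting, the 50 minority ethnic respondents count for 10% of the weighted sample, which matches their share of the population.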