Secondary Data - University of New Mexico

Chapter 13
Secondary Data Analysis and
Content Analysis
Introduction



Secondary data analysis is the method of using
preexisting data in a different way or to answer a
different research question than intended by those
who collected the data.
The most common sources of secondary data—
previously collected data that are used in a new
analysis—are social science surveys and data
collected by government agencies, often with
survey research methods.
It is also possible to reanalyze data that have been
collected in experimental studies or with qualitative
methods.
Introduction, cont.


Even a researcher’s reanalysis of data that he or
she collected previously qualifies as secondary
analysis if it is for a new purpose or in response
to a methodological critique.
Thanks to the data collected by social
researchers, governments, and organizations
over many years, secondary data analysis has
become the research method used by many
contemporary social scientists to investigate
important research questions.
Why Consider Secondary Data?


Data collected in previous investigations are
available for use by other social researchers
on a wide range of topics.
Available datasets often include many more
measures and cases and reflect more rigorous
research procedures than another researcher
will have the time or resources to obtain in a
new investigation.
Why Consider Secondary Data? cont.


Much of the groundwork involved in creating
and testing measures with the dataset has
already been done.
Most important, most funded social science
research projects collect data that can be used
to investigate new research questions that the
primary researchers who collected the data did
not consider.
Why Consider Secondary Data? cont.



Content analysis is similar to secondary data
analysis in its use of information that has already
been collected.
Therefore, like secondary data analysis, content
analysis can be called an “unobtrusive method”
that does not need to involve interacting with live
people.
In addition, most content analyses, like most
secondary data analyses, use quantitative analysis
procedures and you will find some datasets
resulting from content analyses in collections of
secondary datasets.
Why Consider Secondary Data? cont.



Content analyses can even be used to code data
collected in surveys, so you can find content
analysis data included in some survey datasets.
However, content analysis methods usually begin
with text, speech broadcasts, or visual images, not
data already collected by social scientists.
The content analyst develops procedures for coding
various aspects of the textual, oral (spoken), or
visual material and then analyzes this coded
“content.”
Secondary Data Sources


With the advent of modern computers and,
even more important, the Internet, secondary
data analysis has become an increasingly
accessible social research method.
Literally thousands of large-scale datasets are
now available for the secondary data analyst.
Secondary Data Sources, cont.


There are many sources of data for secondary
analysis within the United States and
internationally.
These sources range from data compiled by
governmental units and private organizations
for administrative purposes, which are
subsequently made available for research
purposes, to data collected by social
researchers for one purpose that are then made
available for reanalysis.
Secondary Data Sources, cont.


What makes secondary data analysis such an
exciting and growing option today are the
considerable resources being devoted to expanding
the amount of secondary data and to making it
available to social scientists.
For example, the National Data Program for the
Social Sciences, funded in part by the National
Science Foundation, sponsors the ongoing GSS
(General Social Survey) in order to make current
data on a wide range of research questions
available to social scientists.
U.S. Bureau of the Census


The U.S. government has conducted a census
of the population every 10 years since 1790;
since 1940, this census also has included a
census of housing.
The Census Bureau’s monthly Current
Population Survey (CPS) provides basic data
on labor force activity that is then used in U.S.
Bureau of Labor Statistics reports.
Integrated Public Use Microdata
Series


Individual-level samples from U.S. Census data
for the years 1850 to 1990, as well as historical
census files from several other countries, are
available through the Integrated Public Use
Microdata Series (IPUMS) at the University of
Minnesota’s Minnesota Population Center
(MPC).
These data are prepared in an easy-to-use
format that provides consistent codes and
names for all the different samples.
Bureau of Labor Statistics (BLS)


The Bureau of Labor Statistics (BLS) is part of the
U.S. Department of Labor. It collects and analyzes
data on employment, earnings, prices, living
conditions, industrial relations, productivity and
technology, and occupational safety and health
(U.S. Bureau of Labor Statistics 1991, 1997b).
The Current Population Survey (CPS) provides a
monthly employment and unemployment record
for the United States,
classified by age, sex, race, and other
characteristics.
Other U.S. Government Sources


Many more datasets useful for historical and
comparative research have been collected by
federal agencies and other organizations.
The National Technical Information Service
(NTIS) of the U.S. Department of Commerce
maintains a Federal Computer Products Center
that collects and catalogs many of these
datasets and related software.
Independent Investigator Data
Sources


Many researchers who have received funding to
investigate a wide range of research topics make their
data available on websites where they can be
downloaded by other researchers for secondary data
analyses.
One of the largest, introduced earlier, is the Add Health
study, funded at the University of North Carolina by the
National Institute of Child Health and Human
Development (NICHD) and 23 other agencies and
foundations to investigate influences on adolescents’
health and risk behaviors
(www.cpc.unc.edu/projects/addhealth).
Independent Investigator Data
Sources, cont.

Another significant data source, the Health and
Retirement Study (HRS), began in 1992 with funding
from the National Institute on Aging (NIA)
(http://hrsonline.isr.umich.edu/).


To investigate changes in family experience, researchers at
the University of Wisconsin designed the National
Survey of Families and Households (www.ssc.wisc.edu/nsfh/).
Other noteworthy examples, among many, are the
Detroit Area Studies, with annual surveys between 1951
and 2004 on a wide range of personal, political, and
social issues (www.icpsr.umich.edu/icpsrweb/detroitareastudies/).
ICPSR



The University of Michigan’s ICPSR is the
premier source of secondary data useful to
social science researchers.
ICPSR was founded in 1962 and now includes
more than 325 colleges and universities in
North America and hundreds of institutions on
other continents.
ICPSR archives the most extensive collection of
social science datasets in the United States
outside of the federal government.
ICPSR, cont.



ICPSR also catalogs reports and publications
containing analyses that have used ICPSR datasets
since 1962—more than 34,000 citations were in this
archive in July 2005.
This superb resource provides an excellent starting
point for the literature search that should precede a
secondary data analysis.
In most cases, you can learn from detailed study
reports a great deal about the study methodology,
including the rate of response in a sample survey
and the reliability of any indexes constructed.
ICPSR, cont.



Published articles provide not only examples of how
others have described the study methodology but
also research questions that have already been
studied with the dataset and issues that remain to
be resolved.
Even if you are using ICPSR, you shouldn’t stop
your review of the literature with the sources listed
on the ICPSR site.
Conduct a search in Sociological Abstracts or
another bibliographic database to learn about
related studies that used different databases.
International Data Sources



Comparative researchers can find datasets on
the population characteristics, economic and
political features, and political events of many
nations.
Some of these are available from U.S.
government agencies.
For example, the Social Security Administration
reports on the characteristics of social security
throughout the world (Wheeler 1995).
Qualitative Data Sources



Far fewer qualitative datasets are available for
secondary analysis.
By far the richest source, if you are interested in
cross-cultural research, is the Human Relations
Area Files (HRAF) at Yale University.
The ICPSR collection includes a limited number
of studies containing at least some qualitative
data (19 such studies as of July 2005), but
these include some very rich data.
Challenges for Secondary Data
Analyses

The use of the method of secondary data
analysis has clear advantages for social
researchers:
1. It can allow analyses of social processes in
otherwise inaccessible settings.
2. It saves time and money.
3. It allows the researcher to avoid data
collection problems.
Challenges for Secondary Data
Analyses, cont.
4. It can facilitate comparison with other samples.
5. It may allow inclusion of many more variables
and a more diverse sample than otherwise
would be feasible.
6. It may allow data from multiple studies to be
combined.
Challenges for Secondary Data
Analyses, cont.


The secondary data analyst also faces some
unique challenges.
The easy availability of data for secondary analysis
should not obscure the fundamental differences
between a secondary and a primary analysis of
social science data.
Challenges for Secondary Data
Analyses, cont.



So the greatest challenge faced in secondary data
analysis results from the researcher’s inability to
design data collection methods that are best suited
to answer her research question.
The secondary data analyst also cannot test and
refine the methods to be used on the basis of
preliminary feedback from the population or
processes to be studied.
Nor is it possible for the secondary data analyst to
engage in the iterative process of making
observations, developing concepts, making more
observations, and refining the concepts, which is
the hallmark of much qualitative methodology.
Challenges for Secondary Data
Analyses, cont.

If the primary study was not designed to
measure adequately a concept that is critical to
the secondary analyst’s hypothesis, the study
may have to be abandoned until a more
adequate source of data can be found.
Challenges for Secondary Data
Analyses, cont.



Data quality is always a concern with secondary
data, even when the data are collected by an official
government agency.
Government actions result, at least in part, from
political processes that may not have as their first
priority the design or maintenance of high-quality
data for social scientific analysis.
The basis for concern is much greater in research
across national boundaries, because different
data-collection systems and definitions of key
variables may have been used.
Challenges for Secondary Data
Analyses, cont.

Any secondary analysis will be improved if the
analyst (you or the author of the work that you
are reviewing) answers several questions before
deciding to develop an analysis of secondary data
in the first place and then continues to develop
these answers as the analysis proceeds:
1. What were the agency’s or researcher’s goals in
collecting the data?
2. What data were collected, and what were they
intended to measure?
Challenges for Secondary Data
Analyses, cont.
3. When was the information collected?
4. What methods were used for data collection?
5. How is the information organized (by date,
event, etc.)?
6. What is known about the success of the
data-collection effort? How are missing data
indicated? What kind of documentation is
available? How consistent are the data with
data available from other sources?
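Some of these questions can be checked directly against a data file. The sketch below is only an illustration in Python: the records, the variable names, and the missing-data codes (98 and 99) are invented for the example. In a real secondary analysis the codes would have to come from the dataset's own codebook.

```python
# Hypothetical survey extract. The values 98 and 99 are ASSUMED to be the
# codebook's missing-data codes ("don't know" / "no answer"); real datasets
# document their own codes, and they may differ by variable.
MISSING_CODES = {98, 99}

records = [
    {"year": 2002, "age": 34, "income_category": 5},
    {"year": 2002, "age": 99, "income_category": 98},
    {"year": 2004, "age": 51, "income_category": 3},
    {"year": 2004, "age": 27, "income_category": 99},
]


def missing_rate(rows, variable, missing_codes=MISSING_CODES):
    """Return the share of rows where `variable` is absent or a missing code."""
    if not rows:
        return 0.0
    missing = sum(
        1
        for row in rows
        if row.get(variable) is None or row.get(variable) in missing_codes
    )
    return missing / len(rows)


def collection_years(rows):
    """Return the sorted list of years in which the records were collected."""
    return sorted({row["year"] for row in rows})


if __name__ == "__main__":
    print("Collected in:", collection_years(records))
    for name in ("age", "income_category"):
        print(name, "missing rate:", missing_rate(records, name))
```

A high missing rate on a variable that is central to the hypothesis is one sign that the dataset may not be an adequate source.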
Challenges for Secondary Data
Analyses, cont.


Answering these questions helps to ensure that
the researcher is familiar with the data he or
she will analyze and can help to identify any
problems with them.
It is unlikely that you or any secondary data
analyst will be able to develop complete
answers to all of these questions prior to
starting an analysis, but it still is critical to make
the attempt to assess what you know and don’t
know about data quality before deciding
whether to conduct the analysis.
Content Analysis

We can learn a great deal about popular culture
and many other issues through studying the
characteristics of messages delivered through
the mass media and other sources.
Content Analysis, cont.



You can think of a content analysis as a “survey” of
some documents or other records of prior
communication—a survey with fixed-choice
responses that produce quantitative data.
This method was first applied to the study of
newspaper and film content and then developed
systematically for the analysis of Nazi propaganda
broadcasts in World War II.
Since then, content analysis has been used to
study historical documents, records of speeches,
and other “voices from the past” as well as media
of all sorts (Neuendorf 2002:31–37).
Content Analysis, cont.


Content analysis bears some similarities to
qualitative data analysis, because it involves coding
and categorizing text and identifying relationships
among constructs identified in the text.
However, since it usually is conceived as a
quantitative procedure, content analysis overlaps
with qualitative data analysis only at the margins.
Stages of Content Analysis
1. Identify a population of documents or other
textual sources
2. Determine the units of analysis
3. Select a sample of units from the population
4. Design coding procedures for the variables to
be measured
5. Develop appropriate statistical analyses
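The five stages can be sketched in a few lines of Python. This is a toy illustration, not a recommended coding scheme: the headlines, the category names, and the keyword lists are all invented, and real coding procedures are usually far more detailed and are checked for reliability.

```python
import random
from collections import Counter

# Stage 1: a hypothetical population of documents (invented headlines).
population = [
    "Crime rates fall as city expands youth programs",
    "Local team wins championship after dramatic final",
    "New study links violence on television to fear of crime",
    "Mayor promises lower taxes and safer streets",
    "Championship parade draws record crowd downtown",
    "Police report rise in burglary across suburbs",
]

# Stage 2: the unit of analysis here is one headline.

# Stage 3: draw a simple random sample of units (seeded so it is repeatable).
rng = random.Random(0)
sample = rng.sample(population, k=4)

# Stage 4: a coding procedure. Each variable is a fixed-choice (yes/no)
# category, assigned when any of its ASSUMED keywords appears in the unit.
CODEBOOK = {
    "crime": {"crime", "violence", "burglary", "police"},
    "sports": {"team", "championship", "final"},
    "politics": {"mayor", "taxes"},
}


def code_unit(text):
    """Code one unit: map each category to True if a keyword is present."""
    words = set(text.lower().split())
    return {category: bool(words & keywords)
            for category, keywords in CODEBOOK.items()}


# Stage 5: a very simple statistical summary, the count of units per category.
def tabulate(units):
    """Count how many units were coded into each category."""
    counts = Counter()
    for unit in units:
        for category, present in code_unit(unit).items():
            if present:
                counts[category] += 1
    return counts


if __name__ == "__main__":
    print("Sampled units:", len(sample))
    print("Category counts in sample:", dict(tabulate(sample)))
```

Keyword matching stands in for the coding step only to keep the example short; human coders or a tested dictionary would normally do this work.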
Ethical Issues in Secondary Data
Analysis and Content Analysis


Analysis of data collected by others, as well as
content analysis of text, does not create the
same potential for harm as does the collection
of primary data, but neither ethical nor related
political considerations can be ignored.
Because in most cases the secondary
researchers did not collect the data, a key
ethical obligation is to cite the original, principal
investigators, as well as the data source, such
as the ICPSR.
Ethical Issues in Secondary Data
Analysis and Content Analysis, cont.



Subject confidentiality is a key concern when original
records are analyzed.
Whenever possible, all information that could identify
individuals should be removed from the records to be
analyzed so that no link is possible to the identities of
living subjects or the living descendants of subjects.
When you use data that have already been archived,
you need to find out what procedures were used to
preserve subject confidentiality. The work required to
ensure subject confidentiality probably will have been
done for you by the data archivist.
Ethical Issues in Secondary Data
Analysis and Content Analysis, cont.



It is not up to you to decide whether there are any
issues of concern regarding human subjects when
you acquire a dataset for secondary analysis from a
responsible source.
The Institutional Review Board (IRB) for the
Protection of Human Subjects at your college or
university or other institution has the responsibility
to decide whether they need to review and approve
proposals for secondary data analysis.
The federal regulations are not entirely clear on this
point, so the acceptable procedures will vary
between institutions based on what their IRBs have
decided.

If medical records are included in the
data then the IRB must approve the
use of the data.
Ethical Issues in Secondary Data
Analysis and Content Analysis, cont.



Data quality is always a concern with secondary
data, even when the data are collected by an official
government agency.
Researchers who rely on secondary data inevitably
make trade-offs between their ability to use a
particular dataset and the specific hypotheses they
can test.
If a concept that is critical to a hypothesis was not
measured adequately in a secondary data source,
the study might have to be abandoned until a more
adequate source of data can be found.
Conclusions


The easy availability for secondary analyses of
datasets collected in thousands of social
science investigations is one of the most
exciting features of social science research in
the 21st century.
You can often find a previously collected
dataset that is suitable for testing new
hypotheses or exploring new issues of interest.
Conclusions, cont.


Moreover, the research infrastructure that has
developed at ICPSR and other research
consortia, both in the United States and
internationally, ensures that a great many of
these datasets have been carefully checked for
quality and archived in a form that allows easy
access.
Many social scientists now review available
secondary data before they consider collecting
new data with which to investigate a particular
research question.