CU Denver Anschutz CRISP Seminar

Advancing the Frontiers of Social Science:
The Rocky Mountain Federal Statistical Research Data Center (RMRDC)
Opportunities and Challenges
Jani Little, Executive Director
[email protected]
Katie Genadek, Expected Administrator
What is a Federal Statistical Research Data Center
(FSRDC)?
--A secure computing lab where restricted data, collected by federal
agencies, can be accessed FOR STATISTICAL PURPOSES ONLY
--Made possible by a contractual agreement between a leading research
institution and the U.S. Census Bureau
--The Census Bureau’s Center for Economic Studies (CES) directs all FSRDCs
and the FSRDC Program
--FSRDCs are managed by an on-site Census employee—the administrator—
who guides researchers on proposal development, enforces security
guidelines, and serves as liaison with the research community.
Katie Genadek, PhD
RMRDC Administrator
University of Colorado
[email protected]
IBS Room 423
The RMRDC Consortium
Partner Members:
Supporting Members:
UC Colorado Springs
Colorado State Government
Colorado School of Mines
National Center for Atmospheric Research
National Renewable Energy Laboratory
Partner Consortium Members
Faculty, Grad Students, and Affiliated
Researchers:
Free access to RMRDC services and secure
laboratory
Researchers with continued use are expected
to write grant proposals and include lab fees
Advantages to Researchers and Institutions:
--Greatly expands the policy and basic questions that
can be addressed
--Builds on past research findings with richer data
--Improves competitive edge for grants and publications
--Improves graduate education (big data/statistical
techniques) and placement
--Attracts and retains data-intensive faculty
Advantages Provided to Research:
--Microdata not available publicly
    firms and establishments
    individuals and households (especially longitudinal studies)
    children
--Variables not available in public versions of data sets (e.g., low-level geography)
--Full population counts or larger samples (Decennial Census, ACS, CPS)
--Full range of response items (e.g., industry codes, occupational codes, detailed race answers, income that is not top-coded)
--Ability to make linkages
    with external data (e.g., via geocodes, establishment ID)
    between multiple internal data sets via non-public link keys
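To make the idea of linking via geocodes concrete, here is a minimal Python sketch. It is illustrative only: the records, field names (`county_fips`, `pm25`), and values are hypothetical and do not come from any restricted data set or FSRDC tool. It shows a left join of survey records to an external table on a shared geocode, plus the match rate a researcher would typically want to check.

```python
# Hypothetical survey records; the values are invented for illustration.
survey_records = [
    {"person_id": 1, "county_fips": "08013", "has_asthma": True},
    {"person_id": 2, "county_fips": "08031", "has_asthma": False},
    {"person_id": 3, "county_fips": "08999", "has_asthma": False},  # no external match
]

# Hypothetical external table keyed by the same geocode.
external_by_county = {
    "08013": {"pm25": 6.1},
    "08031": {"pm25": 8.4},
}


def link_by_geocode(records, external, key="county_fips"):
    """Left-join each record to the external table on a shared geocode."""
    linked = []
    for rec in records:
        extra = external.get(rec[key])
        merged = dict(rec)  # copy so the input records are not modified
        merged["pm25"] = extra["pm25"] if extra else None
        merged["matched"] = extra is not None
        linked.append(merged)
    return linked


linked = link_by_geocode(survey_records, external_by_county)
match_rate = sum(r["matched"] for r in linked) / len(linked)
```

Unmatched records are kept with a missing value rather than dropped, so the match rate stays visible before any analysis.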
FSRDCs Used to Address Many Research Topics
• Business, Trade, Finance, and Management
• Crime and Crime Victimization
• Demography, Population Distributions and Trends, Migration, and Immigration
• Economics, Labor Markets, Entrepreneurship, Employment and Industry
• Education and Education Policy
• Hazard Mitigation, Environmental Impact Assessment, Pollution Abatement
• Health and Well-Being, Health Insurance, Health Policy
• Housing, Housing Markets, and Residential Patterns
• Poverty, Social Welfare Policy, and Social Mobility
• Transportation Analysis and Planning
• Urban and Regional Economics and Planning
• Energy Efficiency and Greenhouse Gas Emissions in Manufacturing
Requirements for Any FSRDC Project:
--Research projects must undergo a formal approval process with the agency
that owns the data, e.g., Census, NCHS, AHRQ, BLS
--Researchers must pass a background investigation that qualifies
them for “Special Sworn Status (SSS),” which makes them unpaid Census
Bureau employees.
--Results must be formally reviewed for disclosure violations before they
leave the secure facility.
RMRDC: The Physical Facility
Projected Opening: May 2017
Location: IBS Building on CU Boulder Campus
--10 thin client workstations to access FSRDC servers
--Secure communications that tunnel over campus internet
--Contains the Administrator’s office
--Badge Reader at Entrance
--24/7 Security System with camera
--No electronic devices allowed
--NOTHING leaves the secure lab without approval
FSRDC Server Software
Gauss
Stata
Matlab & toolboxes
PBS Pro
Intel Composer XE
NX Enterprise
R
SAS
SAS (Dataflux)
SUDAAN
GeoDa
Tomlab
Knitro
Madd
QGIS
StatTransfer
Python - Anaconda
Fortran
Perl
TeX/LaTeX
Components of Proposals:
--Personnel and Time frame
--Project Description (scientific merit, methods,
feasibility, why restricted data are required)
--Dataset(s), Variables, Geography
--Results Expected and Disclosure Avoidance
Strategies
Proposal Differences by Agency:
Criterion | Census | NCHS and AHRQ
Time to Approval | 3 months on average | 1-3 months on average
Benefit to Agency | PPS Required | Not Required
Fee | None | $1200 min extract fee NCHS; $300 AHRQ*
Scope | Broad (max of 30 pages) | Precise
Major Partners in the FSRDC System
• U.S. Census Bureau
    • Economic Data
    • Demographic Data
    • Longitudinal Employer-Household Dynamics (LEHD) Data
• Bureau of Labor Statistics (BLS)
• National Center for Health Statistics (NCHS)
• Agency for Healthcare Research and Quality (AHRQ)
• Other Federal Partners
Economic data available in RDCs
• Microdata not available elsewhere
• Detailed geographies and industries
• Data linked over time
• Employee and employer linked data
• Full business register for the US
• Can link own data to individual businesses
Examples of Economic Microdata
Data Set | Frequency | Unit of Enumeration | Availability
Standard Statistical Establishment List/Business Register (SSEL) | Annually | Establishment | 1974–2014
Longitudinal Business Database (LBD) | Annually | Establishment | 1976–2014
Examples of Economic Microdata
Data Set | Frequency | Unit of Enumeration | Availability
Census of Auxiliary Establishments (AUX) | Every 5 Years | Establishment | 1977–2012
Census of Construction Industries (CCN) | Every 5 Years | Establishment | 1972–2012
Census of Finance, Insurance, and Real Estate (CFI) | Every 5 Years | Establishment | 1992–2012
Census of Manufactures (CMF) | Every 5 Years | Establishment | 1963, 1967–2012
Census of Mining (CMI) | Every 5 Years | Establishment | 1987–2012
Census of Retail Trade (CRT) | Every 5 Years | Establishment | 1977–2012
Census of Services (CSR) | Every 5 Years | Establishment | 1977–2012
Census of Transportation, Communications, and Utilities (CUT) | Every 5 Years | Establishment | 1987–2012
Census of Wholesale Trade (CWH) | Every 5 Years | Establishment | 1977–2012
Census of Services--
• includes Health Care and Social Assistance Enterprises
• NAICS code 62
• 2012 Number of Establishments in U.S.: 831,303
• Receipts/Revenues ($1,000): 2,040,441,203
• Summary table:
https://factfinder.census.gov/faces/tableservices/jsf/pages/productview.xhtml?src=bkmk
Longitudinal Employer-Household Dynamics (LEHD)
LEHD data combine administrative data from states’
Unemployment Insurance systems with Census Bureau data.
Workers: Employer history and quarterly wages, Individual characteristics
(sex, age, race), Point in time residence and place of birth
Employers: Industry, employment, total payroll, location
Linkages between workers and employers
Links to other Census data
Census Data: Demographic data available in
RDCs
• More geographic detail—usually block group or tract
• Additional variables
• More observations
• Variables not censored (income)
• Additional detail within variables
Data Available
• Decennial Censuses
• Yearly ACS (American Community Survey)
• Current Population Survey Supplements
• American Housing Survey
• Survey of Income and Program Participation
• National Crime Victimization Survey
• National Longitudinal Mortality Study
• National Longitudinal Surveys (NLS)
• Decennial Censuses
    • 1950-2000 full count short form and 17% long form
      Long form: Household and individual level demographic, socio-economic, program participation, education, household characteristics, etc.
    • 2010 short form only
• Yearly ACS (American Community Survey)
    • Annual full samples: 1.5% of US population
    • Replaced long form from 2000 decennial, plus a few extra questions
• Current Population Survey Supplements
    • ASEC (Annual Social and Economic Supplement) or March 1967-2015
    • Fertility Supplement (1998-2012), Food Security (2001-2012), School enrollment (2004-2014), Tobacco Use (1998-2011), Unbanked (2009-2013), Volunteer (2002-2015), Voter Reg (1998-2012)
• American Housing Survey
    • Some years from 1984-2015; ~50,000 households per year
    • Core questions: Home condition, occupant characteristics, home improvements, housing costs, home values, characteristics of recent movers, etc.
    • Topical questions vary by year
• Survey of Income and Program Participation
    • 2-4 year household panels; interviews ~every 4 months; 1984-2014; 14,000 to 52,000 households each wave
    • Core: labor force, income dynamics, government transfers
    • Topical modules vary
• National Crime Victimization Survey
    • Yearly 2006-2014; ~90,000 households
    • Non-fatal and property crimes, reported and unreported; demographic information for respondent; demographic information of perpetrator
• National Longitudinal Mortality Study
    • CPS-ASEC data linked to national death index
    • CPS cohorts 1973-1998
• National Longitudinal Survey (NLS)
    • Original cohorts (1966, 1968)
    • Labor market, demographic, and other data collected over 35 years
    • ~5,000 respondents per cohort
Health Restricted Data:
• More geographic detail
• Additional variables
• Child data (under 18 years)
• Additional detail within variables
Restricted Health Data and Variables
• Geographic Codes for all NCHS Surveys
• National Health and Nutrition Examination Survey (NHANES)
• National Health Care Surveys
    • National Ambulatory Medical Care Survey (NAMCS) and National Hospital Ambulatory Medical Care Survey (NHAMCS)
    • National Hospital Discharge Survey (NHDS)
    • National Nursing Home Survey (NNHS) and National Nursing Assistant Survey (NNAS)
    • National Home and Hospice Care Survey (NHHCS) and National Home Health Aide Survey (NHHAS)
    • National Survey of Residential Care Facilities (NSRCF)
    • National Study of Long-Term Care Providers (NSLTCP)
    • National Hospital Care Survey (NHCS)
• National Health Interview Survey (NHIS)
• National Survey of Family Growth (NSFG)
• State and Local Area Integrated Telephone Survey (SLAITS)
    • National Survey of Children's Health (NSCH)
    • National Survey of Children with Special Health Care Needs (CSHCN)
• NCHS Data Linkage Activities
    • Linked Mortality Data Products
    • Linked Medicare Enrollment and Claims Files Data
    • Linked Medicaid Enrollment and Claims Data
    • Linked Social Security Benefit History Data
• National Vital Statistics System (NVSS) Data Release and Access Policy
• National Maternal and Infant Health Survey
Some major health data sources:
• Survey data
    • NHANES
    • NHIS
    • NSCH
• AHRQ Survey data
    • MEPS-HC
    • MEPS-IC
• Health Care Survey data
    • NAMCSs
    • NHDS
• Administrative data
    • Vital Records
• Linked Data
    • Mortality Data Products
    • Medicare Enrollment and Claims Data
    • Medicaid Enrollment and Claims Data
    • Social Security Benefit History Data
National Health and Nutrition Examination Survey
(NHANES)
• Provides prevalence data on selected diseases and risk
factors of U.S. Population
• Monitors trends in diseases, behaviors, and environmental
exposures
• Identifies emerging public health concerns
• Provides national baseline information on health and
nutrition
National Health and Nutrition Examination
Survey (NHANES), 1999-2014
• National probability sample, approx. 10,000
• Data collection from Mobile unit
• Interview: acculturation, air quality, allergies, demographics, diet,
cognitive functioning, physical activity, sleep disorder, smoking, social
support, weight history, family background, food security,
alcohol use, bowel health, overall health, depression screening, pesticide
exposure, reproductive health, exposure to chemicals, drug use, sexual
behavior, etc.
• Physical exam: hearing, body measurements, balance, blood pressure,
vision, heart, etc.
• Lab testing: blood, urine, oral rinse, etc.
National Health and Nutrition Examination Survey
(NHANES) Restricted Data
• Identifies geography below national level down to Census
block
• Youth -- Alcohol and Drug Use, ADHD, STDs, Mental Health
Disorders, Depression, Sexual Behavior
National Health Interview Survey, 1993-2015
• Annual Sample that is Nationally and Regionally Representative
• Family, Household and Person Self-Report Data
• Extensive Health and Social Psychological Measures including
    • Depression, anxiety
    • Other Mental Health Conditions
    • Other Emotional or Behavioral Problems
National Health Interview Survey, Restricted
Data
• Country of Birth and Related Immigration Variables (Person File)
• State and Year of Birth (Person File)
• Industry and Occupation Codes
• Detailed Race and Hispanic Origin (Person File)
• Exact Dates (e.g., date of birth in Person File)
• Low levels of geography from state down to tract
Exposures to Fine Particulate Air Pollution and Respiratory Outcomes in
Adults Using Two National Datasets: A Cross-sectional Study
Researchers: Keeve Nachman and Jennifer Parker
Datasets: NHIS, EPA Air Data System (external, linked using geocode)
--Evaluates the relationship between air pollution and asthma across
race/ethnicity…
--Revealed significant associations for non-Hispanic blacks but not for
Hispanics and non-Hispanic whites
National Survey of Children’s Health
• National telephone survey of households with at least 1 child
• N = 91,642
• Demographics, Health and Functioning, Home Environment
• Early Childhood Care, Developmental Screening
• Adolescent School, Exercise, Emotional Difficulties
• Family Functioning and Parental Health
• Neighborhood and Community
• All variables restricted
• County and zip code geography available
Medical Expenditure Panel Survey--Insurance
Component (AHRQ and Census)
• 1996-2006, 2008-2015
• Public (govt) and private sector employers ~40,000 each year
• Asks about insurance plans offered
• Asks about contributions provided by employers and employees
• Can be linked with Census business data
• Used to document changes in employer-provided insurance before
and after ACA
Medical Expenditure Panel Surveys—
Household Component (AHRQ)
• Annual sample of households from prior year NHIS
• 30,000 persons, 14,000 households
• Health services used, frequency, charges and source of payments
• Access to care and quality of care
• Panel design over 2 years
• Medical Provider Component supplements Household Component
    • Detailed charge and payment data
    • Hospitals, physicians, home health care providers, and pharmacies
National Ambulatory Medical Care Surveys
• Sample of physicians, 1 week of visits, randomly sampled
• Patient demographics, symptoms, diagnoses and medications
ordered, number of visits in past year
• Physician demographics, type and size of practice, specialty
• Zip code
Useful Websites
• Restricted NCHS Data
https://www.cdc.gov/rdc/b1datatype/dt122.htm
• Restricted AHRQ Data
https://meps.ahrq.gov/mepsweb/data_stats/onsite_datacenter.jsp