Semi-Permeable Boundaries Among Institutions:
Non-Public Data and the Census RDC at Berkeley
IASSIST 2009 – Tampere, Finland
Jon Stiles
May 27, 2009
Three Questions

What is an RDC? (and why does it exist?)

Partners and Constituents

Proposal process and environment
CCRDC
California Census Research Data Center
Berkeley
The CCRDC is a joint project of the U.S. Bureau of the Census,
the University of California, Berkeley, and UCLA to enable
qualified researchers with approved projects to access
confidential, unpublished Census Bureau data.
There are nine RDCs in the U.S.: Berkeley, UCLA, Boston,
Baruch, Cornell, Ann Arbor, Duke, Chicago, Washington, DC
(plus Minnesota!)
CCRDC on the web: http://www.ccrdc.ucla.edu/
Why do RDCs Exist?

Tension between high quality data
collection and distribution/use.
Census Bureau and other federal agencies
collect a huge amount of data, with many
“sensitive” items.
To maintain high response rates, agencies
promise confidentiality
Data have diverse uses and users

How to reconcile tensions?




Release of aggregate data (Summary data)
Release of microdata with masked
geography, selected items, top-coded
categories (PUMS)
Creation of synthetic data (LEHD)
Controlled access with tight security and
disclosure review (RDCs)
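
As a rough illustration of the first two options above, the sketch below shows what top-coding, geography masking, and aggregate release can look like on a toy file. Everything in it is an assumption made for illustration: the records, the field layout, the 200,000 cap, and the function names are invented and do not reflect actual Census Bureau thresholds or procedures.

```python
from collections import Counter

# Hypothetical confidential microdata: (tract, income). All values are invented.
records = [
    ("tract_A", 32_000),
    ("tract_A", 58_000),
    ("tract_B", 41_000),
    ("tract_B", 1_250_000),
    ("tract_B", 76_000),
]

TOP_CODE = 200_000  # assumed cap for illustration, not an actual Census value


def top_code(value, cap=TOP_CODE):
    """Replace any value above the cap with the cap itself."""
    return min(value, cap)


def public_use_file(rows):
    """Drop the geography field and top-code income (PUMS-style masking)."""
    return [top_code(income) for _tract, income in rows]


def summary_counts(rows):
    """Aggregate release: records per tract, with no individual values."""
    return Counter(tract for tract, _income in rows)


masked = public_use_file(records)
counts = summary_counts(records)
print(masked)
print(dict(counts))
```

Both outputs lose detail that the confidential file still holds (the exact high income, and which tract each person lives in), which is the gap that controlled access through an RDC is meant to fill.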
Purpose of Census Research Data Centers

Protected access to non-public-use data for researchers




Secure facility
Presence of Census Bureau employee
Disclosure Review
Benefits to Census Bureau


Necessary for access to Title 13 and Title 26 data
Not required for NCHS, AHRQ data if not linked to
Title 13 data
Data at the RDCs include

Demographic Surveys and Censuses




Decennial Census
American Community Survey
CPS, SIPP, AHS, NLS, and more…
Economic Surveys and Censuses



Longitudinal Business Database
Census of Manufactures, Services, Mining, Retail
Trade, Wholesale Trade, Transportation,
Communications and Utilities
Survey of Employers, Plant Capacity, Capital
Expenditures, Pollution Abatement Costs, Energy
Consumption, and more…
Additional Data:
National Center for Health Statistics

We are now hosting research using confidential NCHS and AHRQ
data in the CCRDC

Rules for access and disclosure are the same as those in their enclaves
http://www.cdc.gov/nchs/r&d/rdc.htm
http://www.meps.ahrq.gov
No requirement to demonstrate Census benefit.

Disclosure Avoidance review conducted by partner agencies

Long list of datasets, including NHIS, NHANES, NSFG,
LSOA…
Partners and Constituents



Census Bureau, UC Berkeley, Researchers
…and oversight agencies
Joint Project Agreement identifies
responsibilities
Financial, Security, Employees, Processes
Individual Agreements with Researchers, Special
Sworn Status
Partners and Constituents

Joint & Complementary Interests
Census Bureau – benefits to the Bureau are an integral
and overriding part of every project
Berkeley – availability seen as a key component for
research, faculty recruitment/retention
Researchers – Data allows:
Why use data at an RDC?

Not available elsewhere
Establishment-level business data
Linked household-firm (LEHD) data


More detail than elsewhere
Detailed geo-spatial variables
Virtually no top or bottom coding
Possible to link to other non-Census data

Proposal Process & Environment

Proposal





Create account at Census – Online submission
Contact with local RDC administrator for project development,
scope, benefits to Bureau
Special Sworn Status
Fairly long lead time, internal/external reviews,
Disclosure Risks vs. Benefits
Secure Data Center, thin client, Linux



GIS tools include SAS 9.2, R, GRASS
Also Stata, SUDAAN, GAUSS, MATLAB, etc.
Restricted entry and printing, isolated from the Internet, 24-hour
surveillance