HIPAA and its Implications on Epidemiological Research
HIPAA and its Implications on
Epidemiological Research Using
Large Databases
K. Arnold Chan, MD, ScD
Harvard School of Public Health
Channing Laboratory,
Brigham & Women’s Hospital
and Harvard Medical School
Brief outline of this presentation
● Using large linked automated data for public health research
● Data development processes to ensure HIPAA compliance
● Examples
● Some thoughts
Two types of data for public health research
● Primary data
– Prospectively collected
– Well-designed data collection tool
– Informed consent
● Secondary data
– Data originally collected for other purposes
– May be proprietary
– Privacy and confidentiality (particularly important if no prior authorization)
– Different data systems
Large linked healthcare databases
● Health insurance claims data
– Medicaid
– Medicare
– Managed Care Organizations (MCO)
● Automated medical records
● Hospital / Clinic IT systems
● Availability of written records
● Need to contact patients / individuals?
Public health research within MCOs
● Harvard Community Health Plan (subsequently became Harvard Pilgrim HealthCare)
● Kaiser Permanente (several states)
● Group Health Cooperative (Seattle area)
● Others
● HMO Research Network
– 10+ MCOs across the U.S.
Public health research within MCOs
● Different types of MCOs
– Group model
– Staff model
– Different relationship with hospitals
– Implications on data access
● MCOs with research programs
– Separate research departments
– Full-time investigators and support staff
Data elements in the MCO data
● Demographic information
● Membership
– Start date, termination date, benefit plan, ...
● Office visits
– Type of visit, diagnosis(es), special procedures
● Special examinations
– Radiology, Laboratory examinations
● Hospitalizations
● Drug dispensings
● Linkable by a unique ID
HIPAA and Research with Databases
● Authorization from individual research subjects not feasible
● Individual authorization may be waived by Institutional Review Board or Privacy Board
– Minimal Risk
– Data reported in aggregate fashion
    ● No single-case report
– “Minimum necessary” principle
– De-identification
HIPAA and Research with Databases
● Single MCO studies
– Investigators and research staff are MCO employees
● Multiple-MCO studies
– May involve transferral of data across MCOs or to a Data Center
● Other types of studies not covered in this presentation
– e.g. Generate a de-identified dataset for public or commercial use
HIPAA and data development
● Do not move individual level data unless absolutely necessary
– Generate summary tables at each study site
– Combine the tables for final report
– Smalley et al. Contraindicated use of cisapride: the impact of an FDA regulatory action. JAMA 2000; 284: 3036-9.
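The "summary tables at each site, combined for the final report" idea can be sketched as follows. This is a minimal illustration, not the procedure used in the cited study: the strata, the counts, and the function name `combine_site_tables` are all hypothetical.

```python
from collections import Counter


def combine_site_tables(site_tables):
    """Sum per-stratum counts across sites; no individual-level rows are involved."""
    total = Counter()
    for table in site_tables:
        total.update(table)  # Counter.update adds counts key by key
    return dict(total)


# Hypothetical summary tables produced locally at two study sites:
# (age group, sex) -> number of members with the exposure of interest
site_a = {("40-49", "F"): 12, ("40-49", "M"): 9}
site_b = {("40-49", "F"): 7, ("50-59", "M"): 4}

combined = combine_site_tables([site_a, site_b])
```

Only the aggregated dictionaries would leave each site in this sketch; the member-level records used to build them stay behind.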
HIPAA and data development
● Randomly generated Study ID to replace True ID
– Crosswalk between the two stored at secured location
– Destroy the crosswalk after successful linkage of data and quality check
– Implications for storage and back-up
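One way to picture the Study ID step is the sketch below. It assumes records are plain dictionaries with a hypothetical `member_id` field, and it uses Python's `secrets` module for the random IDs; the slides do not specify how the IDs were generated or where the crosswalk is kept.

```python
import secrets


def build_crosswalk(true_ids):
    """Assign each distinct true ID a random study ID unrelated to the original."""
    crosswalk = {}
    used = set()
    for true_id in true_ids:
        if true_id in crosswalk:
            continue  # same person keeps the same study ID
        study_id = secrets.token_hex(8)
        while study_id in used:  # guard against the (unlikely) collision
            study_id = secrets.token_hex(8)
        used.add(study_id)
        crosswalk[true_id] = study_id
    return crosswalk


def replace_ids(records, crosswalk, id_field="member_id"):
    """Return copies of the records carrying study_id instead of the true ID."""
    out = []
    for rec in records:
        new = {k: v for k, v in rec.items() if k != id_field}
        new["study_id"] = crosswalk[rec[id_field]]
        out.append(new)
    return out


# Hypothetical dispensing records
claims = [
    {"member_id": "A001", "drug": "atenolol"},
    {"member_id": "B002", "drug": "lorazepam"},
    {"member_id": "A001", "drug": "rosiglitazone"},
]
crosswalk = build_crosswalk(rec["member_id"] for rec in claims)
study_data = replace_ids(claims, crosswalk)
```

In this sketch the `crosswalk` dictionary is the sensitive object: it would be stored separately from `study_data` and deleted once linkage and quality checks are done, which is why back-up copies of it matter.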
HIPAA and data development
● Roll-up / transform variables
– Age --> Age groups
– National Drug Code --> Drug or Group of drugs
– ICD-9 diagnosis code --> Disease
e.g. A man born on Dec 10, 1934 with diagnosis code xxx.yy received drug 55555-333-22
– 65-70 y/o man with Heart Failure received Digoxin
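The roll-up can be sketched with small lookup tables. Everything here is illustrative: the two mapping dictionaries reuse the placeholder codes from the slide rather than real NDC or ICD-9 entries, the reference date is made up, and the non-overlapping five-year bands ("65-69") are an assumption that differs slightly from the slide's "65-70" wording.

```python
from datetime import date

# Placeholder lookups taken from the slide's example, not real code tables
NDC_TO_DRUG = {"55555-333-22": "Digoxin"}
ICD9_TO_DISEASE = {"xxx.yy": "Heart Failure"}


def age_on(birth_date, reference_date):
    """Age in completed years on the reference date."""
    years = reference_date.year - birth_date.year
    if (reference_date.month, reference_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


def age_group(age, width=5):
    """Roll an exact age up to a band such as '65-69'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def roll_up(record, reference_date):
    """Replace exact birth date and raw codes with coarser categories."""
    return {
        "age_group": age_group(age_on(record["birth_date"], reference_date)),
        "sex": record["sex"],
        "disease": ICD9_TO_DISEASE.get(record["icd9"], "Other"),
        "drug": NDC_TO_DRUG.get(record["ndc"], "Other"),
    }


raw = {"birth_date": date(1934, 12, 10), "sex": "M",
       "icd9": "xxx.yy", "ndc": "55555-333-22"}
rolled = roll_up(raw, reference_date=date(2002, 6, 1))
```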
HIPAA and data development
● Preserve temporal sequence of events but disguise the real dates
● e.g. Drug use during pregnancy study
– 29 year-old received 55555-333-22 on Nov 25, 1999 and delivered a baby on Dec 10, 1999
-->
– 26-30 year-old mother delivered in 1999, baby exposed to amoxicillin at -16 days
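A minimal sketch of the date-disguising idea: keep only the year of the index event (here, delivery) and express every other event as a day offset from it. The function names are hypothetical. With plain calendar subtraction the slide's example dates give an offset of -15 days, so the slide's -16 may reflect a different counting convention.

```python
from datetime import date


def relative_day(event_date, index_date):
    """Signed number of days from the index date to the event (negative = before)."""
    return (event_date - index_date).days


def disguise_dates(dispensing_date, delivery_date):
    """Keep the sequence of events but drop the exact calendar dates."""
    return {
        "delivery_year": delivery_date.year,
        "exposure_day": relative_day(dispensing_date, delivery_date),
    }


disguised = disguise_dates(date(1999, 11, 25), date(1999, 12, 10))
```

Because all offsets are taken against the same index date, the order of and spacing between events survive, which is what a drug-exposure-during-pregnancy analysis needs.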
HIPAA and data development
● Only extract information relevant to the study
– e.g. A study of osteoporosis does not require information on subjects' mental health status
● Co-morbid conditions may be relevant
– Use proxy measures to describe level of comorbidity
    ● Charlson's Index (based on concomitant diagnoses)
    ● Chronic Disease Score (based on co-medications)
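The "only extract what the study needs" rule can be pictured as a field whitelist applied at extraction time. The field names below are hypothetical, and `comorbidity_score` stands in for a precomputed proxy such as a Charlson-type index; the sketch does not compute either score.

```python
def extract_minimum(records, protocol_fields):
    """Keep only the fields named in the study protocol; everything else is dropped."""
    return [
        {field: rec[field] for field in protocol_fields if field in rec}
        for rec in records
    ]


# Hypothetical source rows holding more than an osteoporosis study needs
source_rows = [
    {"study_id": "s1", "age_group": "65-69", "fracture": True,
     "comorbidity_score": 2, "mental_health_dx": "anxiety"},
]
protocol_fields = ["study_id", "age_group", "fracture", "comorbidity_score"]
analytic_rows = extract_minimum(source_rows, protocol_fields)
```

Carrying a single summary score instead of the underlying diagnoses lets the analysis adjust for comorbidity without extracting each condition.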
HIPAA and data development
● Geocoding
– Describe socio-economic status of study subjects based on census tract data
– Send out (Study ID, address) to a geocoding firm
– (Study ID, X1, X2, X3) returned
    ● X1 : education level
    ● X2 : income level
    ● X3 : race/ethnicity information
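The geocoding exchange can be sketched as two steps: build an outgoing file holding only (Study ID, address), then attach the returned census-tract measures to the analytic data by Study ID. All names and values are hypothetical, and the geocoding vendor's side is represented only by a made-up return list.

```python
def geocoding_request(members, crosswalk):
    """Outgoing file: study ID and address only, no clinical data and no true ID."""
    return [(crosswalk[m["member_id"]], m["address"]) for m in members]


def attach_census(analytic_rows, geocoded):
    """Merge returned (study_id, X1, X2, X3) tuples onto the analytic rows."""
    lookup = {sid: (x1, x2, x3) for sid, x1, x2, x3 in geocoded}
    merged = []
    for row in analytic_rows:
        x1, x2, x3 = lookup.get(row["study_id"], (None, None, None))
        merged.append({**row, "education": x1, "income": x2, "race_ethnicity": x3})
    return merged


# Hypothetical inputs
members = [{"member_id": "A001", "address": "1 Example St"}]
crosswalk = {"A001": "s1"}
outgoing = geocoding_request(members, crosswalk)

returned = [("s1", "tract_edu_high", "tract_income_mid", "tract_mix_a")]
analytic = attach_census([{"study_id": "s1", "age_group": "65-69"}], returned)
```

The address never enters the analytic dataset in this sketch; only the tract-level measures come back.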
An example
Finkelstein et al. Decreasing Antibiotic Use Among US Children: The Impact of Changing Diagnosis Patterns. Pediatrics 2003; 112: 620-7.
● Data elements involved
– Date of birth, gender
– Membership
– Drug dispensings
– Diagnoses in close proximity to antibiotic dispensings
● Data from nine MCOs
Finkelstein et al. Pediatric antibiotics use study
● Data development at each MCO
– Extract antibiotic use information
– Extract diagnosis of interest (infections)
– Use date of birth, gender, and membership data to calculate person-time of interest
● Refined, aggregate data forwarded to the Data Center
– Rate of antibiotic use = # of antibiotic dispensings / 1,000 person-years for each age-gender group
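The rate calculation can be sketched as below. This is an illustration under simple assumptions, not the study's actual code: person-time is taken as enrollment days divided by 365.25, each member contributes to a single age-gender stratum, and all inputs are made up.

```python
from collections import defaultdict
from datetime import date


def person_years(start, end):
    """Enrollment time in years (365.25-day years, an assumption)."""
    return (end - start).days / 365.25


def rate_per_1000(n_events, total_person_years):
    """Events per 1,000 person-years."""
    return 1000 * n_events / total_person_years


def stratum_rates(memberships, dispensing_counts):
    """memberships: (stratum, start, end) tuples; dispensing_counts: stratum -> count."""
    py = defaultdict(float)
    for stratum, start, end in memberships:
        py[stratum] += person_years(start, end)
    return {
        s: rate_per_1000(dispensing_counts.get(s, 0), total)
        for s, total in py.items() if total > 0
    }


# Hypothetical membership spans and dispensing counts for one site
memberships = [
    (("3-5", "F"), date(2000, 1, 1), date(2001, 1, 1)),
    (("3-5", "F"), date(2000, 7, 1), date(2001, 1, 1)),
]
rates = stratum_rates(memberships, {("3-5", "F"): 3})
```

Only the numerators and person-year denominators per stratum (or the resulting rates) would need to travel to the Data Center.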
HIPAA and data development
● Individual identification is needed for certain types of research
– Obtain medical records
– Contact patient to conduct interview and/or request specimen
– Linkage with external data
    ● Cancer registry
    ● National Death Index
HIPAA and data development
● The process
– Data extraction, transformation, reduction, and de-identification carried out at each MCO
– Governed by State laws and local HIPAA-compliant Standard Operating Procedures
– Principle of Limited Dataset / Minimum necessary
● The goal
– Highly processed and de-identified data available for concatenation across study sites and complex analyses
k-anonymity and large datasets
● The goal
– A de-identified dataset at a certain level of individual anonymity
A 43 year-old man with hypertension, diabetes, and anxiety, taking atenolol, rosiglitazone, and lorazepam
vs.
A man 40-45 taking a beta-blocker and a thiazolidinedione
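k-anonymity can be checked by counting how many records share each combination of quasi-identifiers: a dataset is k-anonymous if the smallest such group has at least k members. The sketch below uses made-up records and field names and only measures k; it does not perform the generalization itself.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group sharing the same quasi-identifier values."""
    if not records:
        return 0
    groups = Counter(
        tuple(rec[q] for q in quasi_identifiers) for rec in records
    )
    return min(groups.values())


# Hypothetical records before and after roll-up
detailed = [
    {"age": 43, "sex": "M", "drug": "atenolol"},
    {"age": 44, "sex": "M", "drug": "metoprolol"},
]
coarsened = [
    {"age_group": "40-45", "sex": "M", "drug_class": "beta-blocker"},
    {"age_group": "40-45", "sex": "M", "drug_class": "beta-blocker"},
]

k_detailed = k_anonymity(detailed, ["age", "sex", "drug"])
k_coarsened = k_anonymity(coarsened, ["age_group", "sex", "drug_class"])
```

In the detailed version each man is unique (k = 1); after rolling age and drug up to broader categories the two records become indistinguishable (k = 2).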
HIPAA, Data Storage and Access
● Implications on Data Backup Plans
– Data need to be destroyed after the report is published
● Data only used to support pre-defined analyses
● Ancillary analyses are possible after IRB review and approval
Epidemiology studies using large databases
● In the old days ...
– Give me all the data, do what I say ...
– What if the investigator / reviewer wants to do THIS analysis?
– Use existing datasets to test new hypotheses
● Good research practice
– Define necessary data elements according to research protocol
– Pre-defined analytic plan
Epidemiology studies using large databases
● Keys to protection of human subjects
– Competent, responsible investigators and staff
– IRB review and oversight
– Data development guidelines
    ● e.g. Good Epidemiology Practice
– Information technology
● Some reasonable rules/guidelines are better than no guideline