Health Care Databases under HIPAA - dimacs

DIMACS Working Group on Privacy /
Confidentiality of Health Data
Rutgers University Center
Piscataway, New Jersey
December 10-12, 2003
Health Care Databases under HIPAA:
Statistical Approaches to De-identification
of Protected Health Information
Judith E. Beach, Ph.D., Esq.
Associate General Counsel, Regulatory Affairs
Chief Privacy Officer
Chair, Council on Data Protection and Council on
Research Ethics
Outline
 1. Evolution of De-identification Standards – HIPAA Privacy Regulation
 2. De-identification Standards for Health Information in Research
    a. Safe Harbor
    b. Statistician Method
       i. HIPAA Provisions
       ii. Quintiles Experience and Methodology
    c. Limited Data Set
 3. Preemption of State Laws on De-identification Standards for Health Information
 4. Health Information Privacy - Cases and Controversies
Evolution of De-Identification Standards in
HIPAA Privacy Regulation
Federal Policy: De-Identification of
Health Information
Government’s intent - to provide a balance of
stringent standards flexible enough not to be a
disincentive to use or disclose de-identified health
information, wherever possible.
De-Identified health data is one of the best
mechanisms for avoiding wrongful disclosure of
Protected Health Information (PHI).
See Draft (05/27/03) DHHS Policy and Procedure Manual “De-Identification Policy d11”
(effective date 6/1/03) - applies to DHHS agencies: HIPAA covered health care components
and Internal Business Associates
Federal Policy: Use of De-identified Health
Data Rather than PHI for Research
“We [HHS] expressed the hope that covered
entities, their business [associates] and others
would make greater use of de-identified health
information . . . when it is sufficient for the
[research] purpose and that such practice would
reduce the burden and the confidentiality
concerns that result from the use of individually
identifiable health information for some of these
purposes.” [HHS, in final privacy rule, 65 Fed. Reg. at 82543 (Dec. 28, 2000), citing
proposed privacy rule of Nov. 3, 1999]
HIPAA’s Jurisdiction
Individually Identifiable Health Information (IIHI):
 A subset of health information, including demographic information, that
identifies the individual or with respect to which there is a reasonable
basis to believe the information can be used to identify the individual
Protected health information (PHI):
 Means individually identifiable health information (IIHI = Health
Information + Identifier) that is transmitted or maintained electronically,
or transmitted or maintained in any other form or medium
 An investigator who submits health claims would be a HIPAA covered
entity (CE)
CE + Health Information + Identifier = PHI
CE + Identifier - Health Information = NOT PHI
Health Information + Identifier - CE = NOT PHI
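The three combinations above reduce to a single rule: information is PHI only when a covered entity, health information, and an identifier are all present. A minimal sketch of that rule follows; the function and parameter names are illustrative and do not come from the regulation.

```python
def is_phi(covered_entity: bool, health_information: bool, identifier: bool) -> bool:
    """Return True only when all three elements from the slide are present."""
    return covered_entity and health_information and identifier


# The three cases listed on the slide (argument order: CE, health info, identifier).
print(is_phi(True, True, True))    # CE + Health Information + Identifier
print(is_phi(True, False, True))   # CE + Identifier - Health Information
print(is_phi(False, True, True))   # Health Information + Identifier - CE
```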
De-identification Standards for Health
Information in Research
De-identified Health Information
Definition: health information that does not identify
an individual and with respect to which there is no
reasonable basis to believe that the information can
be used to identify an individual. [45 CFR § 164.514(a)]
The Privacy Rule permits de-identification of PHI so
that such information may be used and disclosed
freely, without being subject to the Privacy Rule’s
requirements.
Once de-identified, the data is out of the Privacy Rule.
HIPAA De-identification Standards
Two methods for the de-identification of health
information:
“Safe Harbor” -- remove 18 specified identifiers - intended
to provide a simple, definitive method for de-identifying
health information with protection from litigation
“Statistician Method” -- retain some of the safe harbor’s 18
specified identifiers; the standard is met if a
person with appropriate knowledge of and experience with
generally accepted statistical and scientific principles and
methods, e.g., a Biostatistician, determines and documents that
the risk of re-identification is very small.
Limited Data Set
Final rule: added another method requiring
removal of facial identifiers -- “Limited Data
Set”
Under confidentiality agreements - for research,
public health, and health care operations
Regarded as PHI - NOT de-identified
therefore, still subject to Privacy Rule requirements
such as minimum necessary rule.
Safe Harbor Method
Safe Harbor
Covered entities must remove all of a list of 18
enumerated identifiers and have no actual
knowledge that the information remaining could be
used alone or in combination to identify a subject of
the information.
The identifiers to be removed include
direct identifiers such as name, address, SSN
indirect identifiers such as birth date, admission and
discharge dates, and five-digit zip code
 [45 CFR § 164.514(b)(2)]
Safe Harbor
The safe harbor does allow for the disclosure of
All geographic subdivisions no smaller than a
State, as well as the initial three digits of a zip
code
IF the geographic unit formed by combining all
zip codes with the same initial three digits
contains more than 20,000 people
AGE, if less than 90, gender, ethnicity and other
demographic information not listed.
Safe Harbor’s 18 Identifiers
 1. Names
 2. All geographic subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code if, according to the currently available data from the Bureau of the Census:
    a. The geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and
    b. The initial three digits of a zip code for all such geographic units containing 20,000 or fewer people are changed to 000
 3. All elements of dates (except year) for dates directly relating to an individual, including birth date, admission date, discharge date, and date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older
 4. Telephone numbers
 5. Fax numbers
 6. Electronic mail addresses
 7. Social security numbers
 8. Medical record numbers
 9. Health plan beneficiary numbers
 10. Account numbers
 11. Certificate/license numbers
 12. Vehicle identifiers and serial numbers, including license plate numbers
 13. Device identifiers and serial numbers
 14. Web Universal Resource Locators (URLs)
 15. Internet Protocol (IP) address numbers
 16. Biometric identifiers, including finger and voice prints
 17. Full face photographic images and any comparable images
 18. Any other unique identifying number, characteristic, or code
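The zip code, age, and date rules in the list above can be sketched in a few lines. This is only an illustration under stated assumptions: the field names, the `ZIP3_POPULATION` table, and its population figures are hypothetical, and a real implementation would use current Census data and address all 18 identifiers, not just these three.

```python
from datetime import date

# Hypothetical populations per 3-digit zip prefix; real figures would come
# from current Bureau of the Census data.
ZIP3_POPULATION = {"277": 1_200_000, "036": 15_000}


def safe_harbor_zip(zip_code, zip3_population):
    """Keep the first three digits only if that area has more than 20,000 people."""
    zip3 = zip_code[:3]
    # Unknown prefixes are treated as small areas (a conservative assumption).
    if zip3_population.get(zip3, 0) > 20_000:
        return zip3
    return "000"


def safe_harbor_age(age):
    """Report ages below 90 as-is and aggregate the rest into one category."""
    return "90+" if age >= 90 else str(age)


def safe_harbor_record(record, zip3_population):
    """Return only generalized fields; every other field is dropped."""
    return {
        "zip3": safe_harbor_zip(record["zip"], zip3_population),
        "age": safe_harbor_age(record["age"]),
        "admission_year": record["admission_date"].year,  # year only
        "sex": record["sex"],
    }


patient = {"name": "Jane Doe", "zip": "03601", "age": 93,
           "admission_date": date(2003, 12, 10), "sex": "F"}
print(safe_harbor_record(patient, ZIP3_POPULATION))
```

Building the output from an explicit list of permitted fields, rather than deleting known identifiers, means an unexpected identifying column is dropped by default.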
Sources of Authority
In the Privacy Rule Preamble, HHS recognizes
two sources of authority as to what constitutes
such principles and methods for de-identification
adequate for posting a de-identified database on the Internet
[65 Fed. Reg. at 82,709-82,710 (Dec. 28, 2000)]
“Paper 22”: Statistical Policy Working Paper 22—Report
on Statistical Disclosure Limitation Methodology
“The Checklist”: The Checklist on Disclosure Potential of
Proposed Data Releases -“intended primarily for use in the
development of public-use data products.”
Safe Harbor
 BUT many researchers and other groups have complained
that the Safe Harbor renders de-identified data
virtually useless for research, so that the result will be
MORE research using PHI.
 No dates of service, no patient initials, no date of birth
 Can have “deltas” such as number of patient visits over time
 However, the safe harbor was NOT designed for research,
but to provide an approved method of de-identification for
any purpose by any covered entity, regardless of
sophistication.
For instance, such de-identified data would be
deemed to be safely posted on the Internet.
Statistician Method
Statistician Method
For this method, the covered entity
must remove all direct identifiers
reduce the number of variables on which a
match might be made
should limit the distribution of records through
a “data use agreement” or “restricted access
agreement”
[65 Fed. Reg. at 82,709-710 (Dec. 28, 2000)]
Opinion of Statistician
Statistician must
determine that there is a “very small risk” of re-identification
after applying “generally accepted statistical and
scientific principles and methods for rendering
information not individually identifiable”
document the methods and results of the analysis
that justify such determination.
[45 CFR § 164.514(b)(1)]
Statistician Method
This method has been generally ignored by
covered entities, which
prefer a safe harbor approach, with “safe”
being the operative word
consider the Statistician alternative too
complicated.
Statistician Method: Quintiles Experience
An expert statistician calculated the statistical
likelihood of re-identification IF all 18 safe harbor
identifiers were removed, that is, the “de-identification probability.”
Then, the statistician calculated the likelihood of
re-identification if certain dates of service of
medical or pharmacy claims were retained
and if, rather than age or year of birth, which is
allowed in the safe harbor, the month and year of
birth were included.
Statistician’s Opinion
This calculated number, the “de-identification
probability” served as a benchmark of a “very
small risk of re-identification” against which the
statistician method would be compared.
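The slides do not describe how the statistician computed these probabilities. As a rough illustration of benchmarking one field set against another, the sketch below uses a common heuristic that is an assumption here, not the Quintiles method: a record's risk is taken as 1 divided by the number of records sharing its combination of indirect identifiers. The records and field names are invented for the example.

```python
from collections import Counter


def max_reidentification_risk(records, quasi_identifiers):
    """Worst-case risk: 1 / size of the smallest group sharing the same values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return 1 / min(groups.values())


# Invented sample records; a real analysis would use the full database and
# population-level reference data.
records = [
    {"zip3": "277", "birth_year": 1950, "birth_month": 1, "sex": "F"},
    {"zip3": "277", "birth_year": 1950, "birth_month": 1, "sex": "F"},
    {"zip3": "277", "birth_year": 1950, "birth_month": 7, "sex": "F"},
    {"zip3": "277", "birth_year": 1950, "birth_month": 7, "sex": "F"},
]

# Benchmark: a safe-harbor-style field set (year of birth only).
benchmark = max_reidentification_risk(records, ["zip3", "birth_year", "sex"])
# Candidate: the same fields plus month of birth.
candidate = max_reidentification_risk(
    records, ["zip3", "birth_year", "birth_month", "sex"])

print(benchmark, candidate)
print("needs stricter fields elsewhere:", candidate > benchmark)
```

When the candidate risk exceeds the benchmark, other fields have to be coarsened until the two are comparable.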
Analysis: Comparison of Both Methods
To ensure the statistical likelihood of re-identification was
comparable to that of the calculated safe harbor
benchmark, the following data fields were made stricter
than as permitted by the safe harbor:
For all patients older than 85 years of age (rather than
90), the year of birth was modified to make them all 85
years old.
All five-digit patient zip codes were truncated to the first 3 digits
and further merged so that no resulting 3-digit code has
a total population of less than 200,000.
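The two stricter rules above could be applied along the following lines. The merge strategy (greedy, in sorted prefix order, labeling each group by its first prefix) and the population figures are assumptions made for illustration; the source does not say how areas were merged.

```python
def top_code_age(age, cap=85):
    """Report every age above the cap as the cap itself."""
    return min(age, cap)


def merge_zip3(populations, minimum=200_000):
    """Map each 3-digit prefix to a group label whose total population
    reaches `minimum`. Prefixes are merged greedily in sorted order, and each
    group is labeled with its first prefix (both are illustrative choices)."""
    groups, current, total = [], [], 0
    for zip3 in sorted(populations):
        current.append(zip3)
        total += populations[zip3]
        if total >= minimum:
            groups.append(current)
            current, total = [], 0
    if current:
        # Leftover prefixes fall short of the minimum: fold them into the last group.
        # If no group exists, the whole data set is below the minimum and the
        # caller must handle that case separately.
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return {zip3: group[0] for group in groups for zip3 in group}


# Hypothetical populations per 3-digit prefix.
populations = {"270": 150_000, "271": 90_000, "272": 300_000, "273": 50_000}
print(merge_zip3(populations))
print(top_code_age(91), top_code_age(60))
```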
Factors Considered by Statistician
In the analysis, the statistician pointed out the obvious:
The de-identified data received is conveyed under a
confidentiality agreement, which specifically prohibits
re-identification or further disclosure of the data except in
statistically aggregated form.
The database is maintained on a physically and
technically secure, password-protected server.
Statistician’s Opinion
“Applying generally accepted statistical and scientific
principles and methods for rendering information not
individually identifiable, . . . I conclude that the risk
is very small that the information . . . could be
used, alone or in combination with other reasonably
available information, by an anticipated recipient to
identify an individual who is a subject of the
information. . . . In practice the actual
reidentification probabilities are much, much lower .
. . arguably de minimis.”
Statistician Method
It is clear that most persons who have reviewed the
Privacy Rule have failed to appreciate the
significance of the statistician opinion to
de-identification, and, instead, have focused almost
exclusively on the "safe harbor."
In particular, many have failed to understand the
importance of the "restricted access" as it relates to
the statistician opinion approach to de-identification.
Ensuring HIPAA Compliance
All data handled is de-identified using a unique
patient identifier that is irreversibly encrypted.
Process flow: Patient identifiable electronic healthcare claims
(standard health claims data fields) -> Data Encryption Process ->
De-identified data -> Data Warehouse
Patient Information: Encrypted, Zip*, DOB**, Sex
* zip = 3 digit
** DOB = modified
Upon completion of the de-identification
process a unique patient identifier is
created, which is irreversibly encrypted.
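The slides do not name the algorithm behind the irreversibly encrypted patient identifier. One way such an identifier could be derived is a keyed one-way hash, sketched below with HMAC-SHA256 from the Python standard library. The choice of algorithm, the input fields, and the key handling are all assumptions for illustration.

```python
import hashlib
import hmac
from typing import Sequence


def anonymous_patient_id(identifying_fields: Sequence[str], secret_key: bytes) -> str:
    """Derive a stable, non-reversible ID from identifying fields.

    The same patient always maps to the same ID, so claims can be linked
    over time, but the ID cannot be inverted to recover the inputs.
    """
    # Normalize so trivial formatting differences do not split one patient in two.
    message = "|".join(f.strip().upper() for f in identifying_fields).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Hypothetical key; in practice it would be generated securely and kept away
# from anyone who receives the de-identified data.
key = b"example-secret-key"
id_a = anonymous_patient_id(["Jane Doe", "1950-01-15", "F"], key)
id_b = anonymous_patient_id([" jane doe ", "1950-01-15", "f"], key)
print(id_a == id_b)
```

A keyed hash is used rather than a plain hash so that someone holding a list of candidate names and birth dates cannot recompute the IDs without the key.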
Core Data Elements
Pharmacy Data: RX Pharmacy Data (NCPDP)
 Anonymous Patient ID
 Patient Age & Gender
 Date Written
 Date Filled
 NDC Code
 Quantity Dispensed
 Days Supply
 Refill Flag
 Prescribing Physician
 Pharmacy
 Payor Type
Medical Data: HX Facility Data (UB-92)
 Anonymous Patient ID
 Patient Age & Gender
 Diagnosis Codes (ICD9)
 Procedure Codes (CPT)
 DRG
 Admit Date
 Discharge Date
 Physician/Provider ID
 Location of Care
 Payor Type
Medical Data: MX Provider Data (HCFA 1500)
 Anonymous Patient ID
 Patient Age & Gender
 Diagnosis Codes (ICD9)
 Procedure Codes (CPT)
 Service Dates
 Physician/Provider ID
 Location of Care
 Payor Type
Data availability: Jan ‘98 - to date; July ‘98 - to date
Note: Payor Type not available on all records
Physician Demographics
 Specialty
 Region
 Number of years in practice
 Prescribing volume
 Type of practice
 Number of HMO / PPO / IPA affiliations
 % patient volume by insurance type
 Physician race
 Physician age
Patient Characteristics
 Location of contact
 Height and weight
 Age
 Gender
 Race
 Blood pressure
 Cholesterol levels (total, HDL, LDL, triglycerides)
 Insurance type
 Physician reimbursement method (fee-for-service
vs. capitation)
 Smoker or non-smoker
Disease Entities
 Visits (with and without drugs)
 Visits per physician per year
 Total patients seeking treatment
 Newly diagnosed patients
 Visit type (first vs. subsequent)
 Referrals and referring specialty
 Severity of condition
 Tests ordered or completed during visit
 Existing medical conditions not treated
 Number of times seen and days since last visit
 Number of patient drug requests for condition
Treatment Regimens
 Dosage form, strength and signa
 Formulary impact
 Quantity prescribed and number of refills (mean
and frequency)
 Weighted diagnosis value
 Dispensing instructions
 Occurrences per physician per year
 Therapy type:
 New
 First-line versus adjunct therapy
 Drug replacement and reason
 Continued
Treatment Regimens
 Desired action
 Concomitant drugs (to treat same diagnosis)
 Concurrent drugs (regardless of diagnosis)
 Drug issuance
 Sample days of therapy (mean and frequency)
 Prescribed days of therapy (mean and frequency)
 Daily average consumption (DACON)
 Non-drug therapy
Limited Data Set (LDS)
HHS’ Solution: Limited Data Set
For research, public health, or health care
operations purposes
Authorization not required
A limited data use agreement must be in place
between the covered entity and the recipient of
limited data set (LDS) [45 CFR §164.514(e)]
“Data Use Agreements would only be needed for those public
health, research, or health care operation uses and
disclosures that are not otherwise permitted by federal or
state laws.” [See Draft (05/27/03) DHHS Policy and Procedure Manual “De-Identification Policy d11”]
LDS = Still PHI
Regarded as PHI, that is, not de-identified
data, and therefore subject to requirements
for protection of PHI, such as:
Re-identification or any attempt to contact
individuals by the recipient is prohibited
BUT a re-identification code is permitted for
the covered entity
Subject to minimum necessary standards
BUT no accounting of disclosures or IRB
approval is required
Limited Data Set Specifications
May be useful for records-based research such
as epidemiological and other population
research
But may NOT be useful for patient recruitment
 Because re-identification of individuals, or any attempt to contact
them, by a third party is prohibited, even by the Researcher
(without IRB or internal privacy board approval), unless the
contact is made by the Covered Entity or the Covered Entity’s
Workforce.
LDS: Remove 16 Identifiers
 Name
 Postal address information
(other than city, state, zip
code)
 Telephone number
 Fax number
 E-mail address
 Social Security Number
 Medical record / prescription
numbers
 Health plan beneficiary
numbers
 Account numbers
 Certificate / license numbers
 Vehicle identity / serial
numbers
 Device numbers
 Web URL
 IP address
 Biometric identifiers (e.g.,
fingerprints, retinal scans)
 Full-face or similar
photographic images
[45 CFR §164.514(e)(2)]
LDS: Retain Indirect Identifiers
Five-digit zip code
Dates of service (e.g., admission / discharge)
Dates of birth and death
Geographic subdivision (e.g., state, county,
city, precinct), but not street address
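Building a limited data set amounts to dropping the 16 direct identifiers while leaving the indirect ones in place. A minimal sketch follows; the field names are hypothetical stand-ins for whatever columns a real claims record uses, and a data use agreement is still required before disclosure.

```python
# Hypothetical column names standing in for the 16 direct identifiers.
LDS_DIRECT_IDENTIFIERS = {
    "name", "street_address", "telephone", "fax", "email", "ssn",
    "medical_record_number", "health_plan_beneficiary_number",
    "account_number", "certificate_license_number", "vehicle_identifier",
    "device_identifier", "url", "ip_address", "biometric_identifier",
    "full_face_photo",
}


def to_limited_data_set(record):
    """Drop direct identifiers; indirect identifiers such as five-digit zip
    code, dates of service, and dates of birth and death are retained."""
    return {k: v for k, v in record.items() if k not in LDS_DIRECT_IDENTIFIERS}


claim = {
    "name": "Jane Doe",
    "street_address": "1 Main St",
    "city": "Piscataway",
    "state": "NJ",
    "zip": "08854",
    "birth_date": "1950-01-15",
    "admission_date": "2003-12-10",
    "ssn": "000-00-0000",
}
print(to_limited_data_set(claim))
```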
Statistical Method for Dummies
“Limited Data Set” . . .
the Statistician Method made easy.
Preemption of State Laws on De-identification Standards for Health
Information
Preemption of De-identification
Standards - A View
HIPAA Statute and privacy regulation
Preemption of state law only if
The provision of state law relates to the privacy of
individually identifiable health information
 HIPAA Statute § 1178 AND 45 CFR §§ 160.202 - .204
Preemption of State Law: HIPAA Statute
Health information considered identifiable and,
therefore, subject to all requirements of rule
ONLY if “reasonable basis to believe that the
information can be used to identify the
individual.”
Exception to preemption - when states can
assert contrary and more stringent definition of
“individually identifiable health information”
But exception analysis does not apply to de-identified
data
Preemption: De-identification Standards
Thus, states would be preempted from
enforcing a standard for de-identification that
exceeds the “reasonable basis” definition of
individually identifiable health information as
established in the HIPAA statute.
Note: in response to Quintiles’ written request,
HHS responded by revising preemption section
of the Rule to refer to “individually
identifiable” health information rather than
merely health information.
Privacy Cases & Controversies:
De-identified Health Databases
U.S. Controversy
Quintiles Transnational Corp. v. WebMD
No demonstrable violation of HIPAA or other privacy
law by transmission and aggregation of de-identified
health data
Inhibits additional state regulation of national electronic
data system
Order of Judge Terrence Boyle.
Re de-identified data: “the Dormant Commerce Clause
prevents the individual states from regulating the
interstate transmission of data.”
 [No. 5:01-CV-180-BO(3), U.S. EDNC Western Division]
UK Controversy
 Regina v. Department of Health, Ex Parte Source
Informatics Ltd. [Judge Latham, 4 All ER 185, May
29, 1999; Case No. CO\4490\97, Queen’s Bench
Division]
 Judge Latham dismissed the applicants' application for a
declaration concerning a policy document issued in March 1996
by the Department of Health, “The Protection [and] Use of
Health Information.”
UK: Source Informatics: Overturned on Appeal
Court of Appeal: Simon Brown, Aldous and
Schiemann LJJ, 21 December 1999
 Where a patient's identity was protected, it would not be
a breach of confidence for general practitioners and
pharmacists to disclose to a third party, without the
patient's consent, the information contained in the
patient's prescription form for marketing research
purposes.
UK Health and Social Care Bill: Clause 65
Department of Health included language in the
Health and Social Care Bill that would have
essentially reinstated the lower court’s opinion
(Judge Latham’s)
After heavy lobbying in the House of Lords
against Clause 65, the language was defeated.
Conclusion
The key is . . .
Safeguarding protected health information by
encouraging use of federal standards for
de-identification of health data for clinical research.