Duke Clinical Research Institute
Linking Large Datasets
Why, How, and What Not To Do
Bradley G Hammill
Duke Clinical Research Institute
Presenter disclosure information
Bradley G Hammill
Linking Large Datasets: Why, How, and What Not To Do
FINANCIAL DISCLOSURE:
None
UNLABELED/UNAPPROVED USES DISCLOSURE:
None
Acknowledgements
Thanks to:
Lesley Curtis
Adrian Hernandez
Gregg Fonarow
Kevin Schulman
Work initially funded by grant from GSK
Why link Medicare data to registry data?
Typical inpatient registry
Medications
Vitals
Lab results
Procedures
Clinical history
In-hospital events
etc.
Long-term follow-up?
Why link Medicare data to registry data?
Mortality
(or censoring)
Inpatient
Potential endpoints
Mortality
Readmission
Procedure
Adverse events (based on diagnoses)
Why not link Medicare data to registry data?
Linking will not help us address the limitations of
either data source
Medicare
No information on VA hospitals or managed care
patients
Selective coverage under age 65
Registries
Voluntary participation
May over-represent certain regions or hospital types
Data quality varies
How to link Medicare data with registry data
Direct identifiers
Name, address, SSN, date of birth, etc.
Goal: Identify each registry patient in the Medicare
data
Indirect identifiers
Service dates, date of birth (or age), sex
Goal: Identify each registry hospitalization in the
Medicare data
Linking registry data to Medicare claims
Step 1. Subset registry data
Step 2. Subset Medicare data
Step 3. Link hospital identifiers
Step 4. Link hospitalization records
Described in:
Hammill BG, Hernandez AF, Peterson ED, Fonarow GC,
Schulman KA, Curtis LH. Linking inpatient clinical registry
data to Medicare claims data using indirect identifiers. Am
Heart J 2009 June;157(6):995-1000.
You will have this conversation [Episode 1]
Me: You know, we can link these data to Medicare.
Adrian: How? We don’t know who the hospitals or the patients are?
Me: Turns out you don’t really need to know those things.
[Brief explanation of how to link]
Adrian: (flustered) This feels like a giant leap of faith.
[Figure series: Percent of unique records within sites, 2007 Medicare HF records, by combination of linking variables: admit date, discharge date, DOB or age, and sex]
Distinguishing records (DOB available)
Within sites, what percent of 2007 Medicare HF records are unique given…
Variables                               Unique
Admit, Discharge, DOB, Sex              >99.9%
Admit, DOB, Sex                         >99.9%
Discharge, DOB, Sex                     >99.9%
Admit, Discharge, 2/3 DOB, Sex          99.9%
Admit, Discharge, DOB                   >99.9%
Distinguishing records (Age available)
Within sites, what percent of 2007 Medicare HF records are unique given…
Variables                               Unique
Admit, Discharge, Age, Sex              99.4%
Admit, Discharge ±1d, Age, Sex          98.5%
Admit ±1d, Discharge, Age, Sex          98.4%
Admit, Discharge, Age ±1y, Sex          98.3%
Admit, Discharge, Age                   98.9%
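One way to compute percentages like these is to count how often each combination of linking variables occurs within a site and report the share of records whose combination occurs exactly once. The sketch below is a minimal illustration under that reading; the field names (`site`, `admit`, `discharge`, `age`, `sex`) and the toy records are hypothetical and are not the Medicare file layout.

```python
from collections import Counter

def percent_unique(records, keys):
    """Percent of records whose (site + key fields) combination occurs exactly once."""
    def combo(r):
        return (r["site"],) + tuple(r[k] for k in keys)
    counts = Counter(combo(r) for r in records)
    unique = sum(1 for r in records if counts[combo(r)] == 1)
    return 100.0 * unique / len(records)

# Hypothetical toy claims, not real data
records = [
    {"site": "A", "admit": "2007-01-03", "discharge": "2007-01-08", "age": 77, "sex": "F"},
    {"site": "A", "admit": "2007-01-03", "discharge": "2007-01-08", "age": 77, "sex": "M"},
    {"site": "A", "admit": "2007-02-10", "discharge": "2007-02-14", "age": 81, "sex": "M"},
    {"site": "B", "admit": "2007-01-03", "discharge": "2007-01-08", "age": 77, "sex": "F"},
]

full = percent_unique(records, ["admit", "discharge", "age", "sex"])
no_sex = percent_unique(records, ["admit", "discharge", "age"])
print(full, no_sex)
```

In the toy data, dropping sex makes the first two records at site A indistinguishable, so the unique share falls. Identical values at different sites do not collide because the site is always part of the key.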
Distinguishing records, in general
Fewer records per site = Higher % unique records
Population                  2007 HF Records per Site, Median (Q1, Q3)
All records                 456 (194, 1734)
Heart failure, any          89 (22, 391)
Heart failure, primary      64 (20, 168)
CABG procedure              71 (36, 124)
ICD / CRT procedure         19 (6, 50)
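The relationship between site size and uniqueness can be illustrated with a simple model. This is a sketch only, and it rests on an assumption that is not in the slides: each record's key is drawn independently and uniformly from K possible combinations. Under that assumption, the chance that a record shares its key with none of the other n - 1 records at the site is (1 - 1/K)^(n - 1), which shrinks as n grows. The value of K used below is made up.

```python
def expected_percent_unique(n_records, n_key_values):
    """Expected % of records whose key is shared by no other record at the site.

    Assumes keys are independent and uniform over n_key_values combinations,
    a simplification: real dates, ages and sexes are not uniformly distributed.
    """
    if n_records < 1 or n_key_values < 1:
        raise ValueError("both arguments must be positive")
    return 100.0 * (1.0 - 1.0 / n_key_values) ** (n_records - 1)

# Median site sizes from this slide; K = 10_000 is a hypothetical key space.
for label, n in [("ICD / CRT procedure", 19),
                 ("Heart failure, any", 89),
                 ("All records", 456)]:
    print(f"{label}: {expected_percent_unique(n, 10_000):.1f}% unique")
```

The absolute numbers depend entirely on the assumed K; only the direction of the effect (smaller sites, more unique records) carries over to real data.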
Linking registry data to Medicare claims
Step 1. Subset registry data
Limit to records for patients 65 years or older
Step 2. Subset Medicare data
Step 3. Link hospital identifiers
Step 4. Link hospitalization records
Example registry data to be used for linking
OPTIMIZE-HF population
Adults hospitalized for episodes of new or worsening
heart failure
2003–2004
52,879 records from 255 sites overall
39,178 records for patients 65+ (74% of total)
Linking registry data to Medicare claims
Step 1. Subset registry data
Step 2. Subset Medicare data
Limit to records for patients 65 years or older
Limit using similar entry criteria as registry, if
possible
Step 3. Link hospital identifiers
Step 4. Link hospitalization records
Example Medicare data to be used for linking
Medicare inpatient population
Hospitalizations with a diagnosis of HF in any position
(ICD-9-CM Dx 428.x, 402.x1, 404.x1, 404.x3)
2003–2004
Age 65+
5.5 million records from more than 5,000 sites overall
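The Medicare subset described here (age 65+, HF diagnosis in any position) could be expressed as a filter along the following lines. This is a sketch under stated assumptions: the claim layout (`age`, `dx`) is hypothetical, and the pattern simply encodes the four code families listed above, accepting codes written with or without the decimal point.

```python
import re

# 428.x, 402.x1, 404.x1, 404.x3 with the decimal point removed
HF_CODE = re.compile(r"^(428\d{1,2}|402\d1|404\d[13])$")

def is_hf_code(code):
    """True if an ICD-9-CM diagnosis code falls in the HF families listed above."""
    return bool(HF_CODE.match(code.replace(".", "").strip()))

def keep_claim(claim):
    """Keep claims for patients 65+ with an HF diagnosis in any position.

    `claim` is a hypothetical dict: {"age": int, "dx": [codes...]}.
    """
    return claim["age"] >= 65 and any(is_hf_code(c) for c in claim["dx"])

claims = [
    {"age": 70, "dx": ["486", "4280"]},     # HF as secondary diagnosis: kept
    {"age": 64, "dx": ["428.0"]},           # under 65: dropped
    {"age": 82, "dx": ["402.90", "4019"]},  # hypertensive heart disease without HF: dropped
]
subset = [c for c in claims if keep_claim(c)]
```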
Linking registry data to Medicare claims
Step 1. Subset registry data
Step 2. Subset Medicare data
Step 3. Link hospital identifiers
Link records on exact values of all fields (service
dates, date of birth, sex)
Use resulting matches to inform links
Step 4. Link hospitalization records
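Step 3 can be sketched as a tally of exact matches for every (registry site, Medicare site) pair, followed by a decision rule. The slides say only that the matches "inform" the site links, so the decision rule below (`min_matches`, `min_ratio`) is a hypothetical stand-in, as are the field names.

```python
from collections import Counter, defaultdict

KEY = ("admit", "discharge", "dob", "sex")

def site_match_counts(registry, medicare):
    """Count exact matches on all KEY fields for each (registry site, Medicare site) pair."""
    index = defaultdict(set)
    for m in medicare:
        index[tuple(m[k] for k in KEY)].add(m["site"])
    counts = defaultdict(Counter)
    for r in registry:
        for msite in index.get(tuple(r[k] for k in KEY), ()):
            counts[r["site"]][msite] += 1
    return counts

def pick_site(counter, min_matches=5, min_ratio=3.0):
    """Return the dominant Medicare site, or None if the evidence is weak or ambiguous.

    Both thresholds are illustrative assumptions, not values from the slides.
    """
    ranked = counter.most_common(2)
    if not ranked or ranked[0][1] < min_matches:
        return None
    if len(ranked) > 1 and ranked[0][1] < min_ratio * ranked[1][1]:
        return None
    return ranked[0][0]

# Match counts taken from the sample site results in this deck (using DOB)
site1 = pick_site(Counter({"A": 105, "E": 1, "F": 1}))          # clear winner
site3 = pick_site(Counter({"C": 29, "D": 25, "H": 1, "I": 1}))  # two close candidates
```

With these thresholds, a site with one dominant candidate is linked automatically, while a site with two close candidates is left unresolved for manual review.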
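OPTIMIZE-HF sample site link results
Medicare sites with exact matches (number of exact matches in parentheses)
OPTIMIZE Site 1
  Using DOB: A (105), E (1), F (1)
  Using Age: A (114), K (7), L (6), 1217 others (≤5)
OPTIMIZE Site 2
  Using DOB: B (589), G (2), 40 others (≤1)
  Using Age: B (631), M (28), N (28), 3420 others (≤26)
OPTIMIZE Site 3
  Using DOB: C (29), D (25), H (1), I (1)
  Using Age: C (32), D (27), O (4), 938 others (≤3)
OPTIMIZE Site 4
  Using DOB: --
  Using Age: P (4), Q (4), 541 others (≤3)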
OPTIMIZE-HF site link results
Of 255 registry sites…
247 (97%) identified in Medicare
All non-VA sites with 25+ records identified
Linking registry data to Medicare claims
Step 1. Subset registry data
Step 2. Subset Medicare data
Step 3. Link hospital identifiers
Step 4. Link hospitalization records
Determine rules to apply
Decide if one-to-one correspondence needed
Go!
Get follow-up data from Medicare
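Step 4 can be sketched as an ordered cascade of deterministic rules applied within linked sites, accepting a link only when it is one-to-one. The rule list and its order below are assumptions for illustration (the 2/3 DOB rule is omitted for brevity), and the record layout (`id`, `site`, `admit`, `discharge`, `dob`, `sex`) is hypothetical.

```python
from collections import defaultdict

# Illustrative rule cascade: strictest rule first, then one field relaxed at a time.
RULES = [
    ("admit", "discharge", "dob", "sex"),
    ("admit", "dob", "sex"),
    ("discharge", "dob", "sex"),
    ("admit", "discharge", "dob"),
]

def link_hospitalizations(registry, medicare, site_map, rules=RULES):
    """Return {registry id: Medicare id}, enforcing one-to-one correspondence."""
    links, used = {}, set()
    for fields in rules:
        # Index Medicare records not yet linked by an earlier rule
        index = defaultdict(list)
        for m in medicare:
            if m["id"] not in used:
                index[(m["site"],) + tuple(m[f] for f in fields)].append(m["id"])
        # Collect registry records that have exactly one Medicare candidate
        claimed = defaultdict(list)
        for r in registry:
            if r["id"] in links or r["site"] not in site_map:
                continue
            key = (site_map[r["site"]],) + tuple(r[f] for f in fields)
            candidates = index.get(key, [])
            if len(candidates) == 1:
                claimed[candidates[0]].append(r["id"])
        # Accept only Medicare records claimed by a single registry record
        for med_id, reg_ids in claimed.items():
            if len(reg_ids) == 1:
                links[reg_ids[0]] = med_id
                used.add(med_id)
    return links

medicare = [
    {"id": "m1", "site": "A", "admit": "2004-01-05", "discharge": "2004-01-09", "dob": "1930-02-01", "sex": "F"},
    {"id": "m2", "site": "A", "admit": "2004-03-01", "discharge": "2004-03-06", "dob": "1928-07-15", "sex": "M"},
]
registry = [
    {"id": "r1", "site": 1, "admit": "2004-01-05", "discharge": "2004-01-09", "dob": "1930-02-01", "sex": "F"},
    {"id": "r2", "site": 1, "admit": "2004-03-01", "discharge": "2004-03-07", "dob": "1928-07-15", "sex": "M"},
    {"id": "r3", "site": 9, "admit": "2004-01-05", "discharge": "2004-01-09", "dob": "1930-02-01", "sex": "F"},
]
links = link_hospitalizations(registry, medicare, site_map={1: "A"})
```

Here `r1` links on the strictest rule, `r2` links once the discharge date is dropped, and `r3` stays unlinked because its site was never matched to a Medicare site. The Medicare record ID on each link is what would then be used to pull follow-up claims.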
OPTIMIZE-HF hospitalization link results
Combinations used                       Records identified
Admit, Discharge, DOB, Sex              24,750 (86%)
Admit, DOB, Sex                         1,171 (4%)
Discharge, DOB, Sex                     590 (2%)
Admit, Discharge, 2/3 DOB, Sex          2,258 (7%)
Admit, Discharge, DOB                   284 (1%)
Of 39,178 eligible registry hospitalizations…
31,753 (81%) identified in Medicare
25,964 unique patients
You will have this conversation [Episode 2]
Me: This is done using deterministic matching.
Adrian: No, that’s clearly probabilistic matching.
Me: Actually, it’s not.
Adrian: Sure it is. We didn’t have names or SSNs.
Deterministic v. Probabilistic Linking
Deterministic
Rule-based
The rule determines the result
Probabilistic
Based on statistical theory
Characteristics assigned weights and potential links
are scored
Data-based score threshold determines the result
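The distinction can be made concrete with a small contrast. The deterministic function applies a fixed rule; the probabilistic one scores agreement and disagreement with log-likelihood weights in the style of Fellegi-Sunter record linkage and compares the total to a threshold. The m- and u-probabilities and the threshold below are invented for illustration; in practice they would be estimated from the data.

```python
import math

FIELDS = ("admit", "discharge", "dob", "sex")

def deterministic_link(a, b, fields=FIELDS):
    """Rule-based: link if and only if every listed field agrees exactly."""
    return all(a[f] == b[f] for f in fields)

# Hypothetical (m, u) pairs: P(agree | true match), P(agree | non-match)
WEIGHTS = {
    "admit": (0.95, 0.003),
    "discharge": (0.95, 0.003),
    "dob": (0.97, 0.0001),
    "sex": (0.99, 0.5),
}

def probabilistic_score(a, b, weights=WEIGHTS):
    """Sum of log2 agreement or disagreement weights across fields."""
    score = 0.0
    for field, (m, u) in weights.items():
        if a[field] == b[field]:
            score += math.log2(m / u)
        else:
            score += math.log2((1 - m) / (1 - u))
    return score

def probabilistic_link(a, b, threshold=20.0):
    """Score-based: link if the total score reaches a (hypothetical) threshold."""
    return probabilistic_score(a, b) >= threshold

a = {"admit": "2004-01-05", "discharge": "2004-01-09", "dob": "1930-02-01", "sex": "F"}
sex_differs = dict(a, sex="M")
dob_differs = dict(a, dob="1931-02-01")
```

With these made-up weights, a pair that disagrees only on sex still scores above the threshold, while a pair that disagrees on date of birth does not. A single exact-match rule would reject both, but a deterministic cascade can still accept the first pair by including an explicit rule that omits sex. Using indirect identifiers does not by itself make the method probabilistic.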
You will have this conversation [Episode 3]
Me: (excited) We were able to link 75% of the eligible records!
Adrian: Golly, that seems low.
Me: It’s about what I expected.
Adrian: But [another registry] said they linked 98%.
Why might registry records not link to Medicare?
[Diagram: Sample site. All HF patients, divided into those linked to Medicare and those not linked to Medicare]
Why might registry records not link to Medicare?
In Medicare claims, but…
Inconsistent coding of procedures or
diagnoses
Inconsistent service dates or patient info
Not in Medicare claims due to…
Medicare as secondary payer
Medicare managed care enrollment
Age
VA hospital (site-level)
Histogram of OPTIMIZE-HF site link rates
You will have this conversation [Episode 4]
Adrian: The registry didn’t capture [obesity, anemia, etc.]. Now we can use prior claims to get that information.
Me: We’re going to lose a bunch of patients if we try that.
Adrian: But it’s so worth it.
Me: Maybe not for that particular information, though.
Other uses of Medicare data
Utilizing claims prior to registry hospitalization
Requires prior enrollment in Medicare FFS
8% of OPTIMIZE-HF patients did not have 12 months of
prior claims
Inpatient data only can be limiting
Need to understand coding limitations
e.g., anemia is poorly coded
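The prior-enrollment requirement could be checked along these lines before using earlier claims to define comorbidities. This is a sketch that assumes enrollment is available as a set of (year, month) pairs in which the patient was in fee-for-service Medicare; that representation and the function names are hypothetical.

```python
from datetime import date

def prior_months(admit, n=12):
    """List the n (year, month) pairs immediately before the admission month."""
    year, month = admit.year, admit.month
    out = []
    for _ in range(n):
        month -= 1
        if month == 0:
            year, month = year - 1, 12
        out.append((year, month))
    return out

def has_prior_ffs(admit, ffs_months, n=12):
    """True if every one of the n months before admission is an FFS-enrolled month."""
    return all(ym in ffs_months for ym in prior_months(admit, n))

admit = date(2004, 3, 15)
enrolled = {(2003, m) for m in range(3, 13)} | {(2004, 1), (2004, 2)}
gap = enrolled - {(2003, 6)}   # one month outside FFS

print(has_prior_ffs(admit, enrolled), has_prior_ffs(admit, gap))
```

Patients who fail a check like this would be excluded from analyses that rely on prior claims, which is the loss of patients the slide describes.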
You will have this conversation [Episode 5]
Adrian: I want to validate our registry with these links.
Me: You can’t easily do that with these data.
Adrian: Sure we can, because now we know which Medicare patients are in the registry.
Me: True, but that’s not the whole story.
Validation issues
If you start with the registry population…
You usually do not know exactly who you should find
in Medicare claims data
Cannot validate VA sites
Cannot validate managed care patients
Cannot validate younger patients
Assumes all “linkable” records were linked
Validation issues
If you start with the Medicare population…
You usually do not know exactly who you should find
in registry data
Physician groups may be the registry participants, not
hospitals
Assumes all “linkable” records were linked
Registry may have allowed sampling at larger sites
Do you want to link data with Medicare?
Important caveats
Acquisition requires major investment in claims data
and infrastructure
Use of Medicare claims data governed by strict data
use agreements (DUA)
Delays in data release are common
[Currently available through 2008]
Why stop at inpatient Medicare data?
Medicare data
Inpatient
Outpatient / Physician
Pharmacy
Mortality
(or censoring)
Why stop with Medicare claims data?
Other claims data sources exist
Private insurer databases
But linking is more difficult because a smaller % of each site's hospitalizations is covered
Payer                             Age 18-64   Age 65+
Medicare                          15%         89%
Medicaid                          20%         1%
Private                           48%         8%
Other (incl. self-pay, charity)   17%         2%
[Source: 2007 HCUP NIS, excluding maternal/neonate-related admissions]
Conclusion
You can link your registry to Medicare claims
Get long-term follow-up for registry patients 65+
enrolled in fee-for-service Medicare
However…
Manage expectations
Understand claims data limitations
Contact Information
Brad Hammill