What is HIC - National e

Data Anonymisation and Linkage
Alison Bell
Senior Data Analyst / Programmer
Health Informatics Centre (HIC)
University of Dundee
What is HIC?
The Health Informatics Centre (HIC) is a partnership between the University of Dundee, NHS Tayside and the Information Services Division of NHS National Services (ISD). It is a shared research resource with strong scientific traditions, built on MEMO work since the early 1980s.
HIC provides authorised researchers and others with anonymised extracts of information derived from person-specific data sets captured by the NHS, University of Dundee researchers and others, to help them answer research questions and address important quality and patient safety issues.
HIC Structures
• Staff and facilities managed by HIC Executive
• User input: HIC User Group
• Governance
- Confidentiality & Privacy Advisory Committee (HICCPAC)
- Users Forum
- Annual External Audit
Issues that HIC addresses
• Governance: linkage then anonymisation carried out in NHS domain
• Trust in access to NHS data through approved SOPs, Privacy Advisory Committee, “Clinical Information Bureau”
• Deterministic linkage via single patient identifier
• Continually improving data quality through clinical use of data & HIC Users’ Group
• Ecological fallacy: person-based, not practice-based, data
Information governance
Physical security:
• Isolation of servers holding identifiable data and staff working with it
• Reliable backup and recovery mechanisms
• Separation of functions on NHSNet, JANET
Governed by Confidentiality & Privacy Advisory Committee
• Members include lawyer, GP, Caldicott Guardians, Director of Public Health
Management tools:
• Standard Operating Procedure
• Adverse incident reporting mechanism on intranet
• Project management system enforces SOP
• Annual external audit by information security experts & table of issues reviewed monthly by HIC Exec
HIC Standard Operating Procedure
Covers:
• Acquisition & anonymisation of datasets
• Requesting access to data
• Project level anonymisation (Pro-CHI)
• Release & archival of datasets
• Reversal of anonymisation
Includes:
• Definitions
• Appendix summarising 8 data protection principles
• Declaration & signature
HIC has Caldicott & Ethics approval to supply anonymised data to approved research projects
HIC project management system
• Allocates each project a unique ID
• Captures:
– Identity & contact details of “approved researcher”
– Project funder
– Project abstract
– Copies of approval from Ethics & Caldicott (if required), NHS R&D, protocol
– Data sources and versions
– Exact syntax used to generate & link data extracts
– Audit trail of all data releases
– Exact location of archived datasets once project complete
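The items captured above can be pictured as one record per project. The following is only an illustrative sketch: the class name, field names and the `record_release` helper are hypothetical and are not taken from the HIC system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ProjectRecord:
    """Hypothetical record mirroring the items listed on the slide."""
    project_id: int                 # unique ID allocated to the project
    researcher_name: str            # identity of the "approved researcher"
    researcher_contact: str
    funder: str
    abstract: str
    approvals: list = field(default_factory=list)       # Ethics, Caldicott, NHS R&D, protocol
    data_sources: dict = field(default_factory=dict)    # data source name -> version
    extract_syntax: list = field(default_factory=list)  # exact syntax used for extracts
    releases: list = field(default_factory=list)        # audit trail of data releases
    archive_location: Optional[str] = None              # set once the project is complete

    def record_release(self, dataset_name: str, released_to: str) -> dict:
        """Append a timestamped entry to the audit trail and return it."""
        entry = {
            "dataset": dataset_name,
            "released_to": released_to,
            "released_at": datetime.now(timezone.utc),
        }
        self.releases.append(entry)
        return entry


# Illustrative values only.
project = ProjectRecord(
    project_id=165,
    researcher_name="A. Researcher",
    researcher_contact="a.researcher@example.org",
    funder="Example funder",
    abstract="Example abstract",
)
project.record_release("extract_v1", released_to="A. Researcher")
```

Keeping the audit trail on the same record as the approvals and extract syntax means a single lookup by project ID answers who received which data and under what approval.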
Available HIC Data
• HIC hosts a large number of Tayside data sets received from various sources (ISD, PSD, GRO, Ninewells Labs etc.)
• These cover various populations and time periods, and use a variety of coding systems
• Each of these patient-specific data sets contains the patient CHI number, allowing linkage across multiple data sets
• HIC currently has approval to provide Tayside data only, but is seeking to extend this to Fife & Glasgow soon
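Because every data set carries the CHI number, linkage is a deterministic join on that one key. A minimal sketch, assuming each data set is a list of dictionaries with a `chi` field; the function name, field names and CHI values are made up for illustration.

```python
from collections import defaultdict


def link_by_chi(**datasets):
    """Group rows from several named data sets under their shared CHI number."""
    linked = defaultdict(lambda: {name: [] for name in datasets})
    for name, rows in datasets.items():
        for row in rows:
            linked[row["chi"]][name].append(row)
    return dict(linked)


# Illustrative data; the CHI values are invented.
prescribing = [
    {"chi": "0000000001", "drug": "simvastatin"},
    {"chi": "0000000002", "drug": "insulin"},
]
labs = [
    {"chi": "0000000001", "test": "cholesterol", "value": 5.2},
]

linked = link_by_chi(prescribing=prescribing, labs=labs)
```

Each CHI maps to the rows found for that person in every data set, with an empty list where a person has no record in a given source.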
How data are linked and anonymised
[Diagram. Data provider (mainly NHS): paper prescriptions (find and enter CHI) and lab results (find CHI) become CHI-labelled drug data and lab data, which are linked using CHI. Clinical Information Bureau: delete CHI, add Pro-CHI, giving fully anonymised but linked drug and lab data. Academia: analysis of the drug and lab data.]
Anonymisation Process
• Every research dataset has its own project level anonymisation (Pro-CHI) applied to the data before being released to a researcher.
• Purpose-written software generates the Pro-CHI based on the Project Management (PM) unique ID & the CHI
– A 3-letter code is generated by converting the PM ID to base 26, e.g. 165 translates to agj
– The last 7 digits are randomly generated
– E.g. (CHI) 1212345678 = (Pro-CHI) agj8394601 under project 165
• All research data relating to a specific project will have the same 3-letter code.
• All other patient identifiers are removed (e.g. name, address etc.)
• Other anonymisations are performed: anon DOB, anon GP code
• If any identifiable data is required, specific Caldicott approval must be granted
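The Pro-CHI format described above can be sketched as follows. This is not HIC's software: the slide only states that the prefix is the PM ID in base 26 and that the last 7 digits are random. The in-memory lookup table, the collision check and the use of Python's `secrets` module are assumptions added to make the sketch self-consistent.

```python
import secrets
import string


def project_prefix(project_id: int, width: int = 3) -> str:
    """Encode a project ID in base 26 as lowercase letters (a=0 ... z=25)."""
    if not 0 <= project_id < 26 ** width:
        raise ValueError("project ID does not fit in the prefix width")
    letters = []
    for _ in range(width):
        project_id, remainder = divmod(project_id, 26)
        letters.append(string.ascii_lowercase[remainder])
    return "".join(reversed(letters))


class ProChiGenerator:
    """Hypothetical per-project generator: one stable Pro-CHI per CHI."""

    def __init__(self, project_id: int):
        self.prefix = project_prefix(project_id)
        self._by_chi = {}    # CHI -> Pro-CHI (assumed lookup table)
        self._used = set()   # random suffixes already issued in this project

    def pro_chi(self, chi: str) -> str:
        if chi not in self._by_chi:
            # Draw random 7-digit suffixes until an unused one is found.
            while True:
                suffix = f"{secrets.randbelow(10 ** 7):07d}"
                if suffix not in self._used:
                    break
            self._used.add(suffix)
            self._by_chi[chi] = self.prefix + suffix
        return self._by_chi[chi]


generator = ProChiGenerator(165)
example = generator.pro_chi("1212345678")  # "agj" followed by 7 random digits
```

Since 165 = 0 x 676 + 6 x 26 + 9, the three base-26 digits are 0, 6 and 9, which map to a, g and j. Because the suffix is random rather than derived from the CHI, the same patient receives unrelated Pro-CHIs in different projects, so extracts from two projects cannot be joined on the identifier.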
A bit more about the prescribing data set …
• The Tayside prescribing data set is unique in the UK.
• It is a database of all Tayside encashed prescriptions, including CHI, date prescribed and drugs dispensed.
• Prior to 2005, paper prescriptions were scanned by the data entry clerks and all prescription details were entered manually using a purpose-built application.
• Since 2005, PSD have been automatically sending HIC the scanned prescription images and associated data.
– 300,000 prescriptions per month (14.5 million in the database since 2005)
– 13 GB of .tif images per month (front and back)
– 17% (50,000) still require data entry (CHI) each month
Users of HIC data 2004-9
93 projects totalling £16m (£3.2m pa), including:
– Diabetes research
– Maternal & Child Health
– Dental Health Services Research
– Cardiovascular
– Genetics
– Health Informatics
– Drug Safety
– Scottish Longitudinal Studies Centre
Examples of recent studies using prescription data
• Influence of apo-e & other genotypes on response to statins (Louise Donnelly, GSK studentship)
• Adherence: to insulin (Morris et al, Lancet); to sulphonylureas (Donnan et al, Diab Med; Evans et al, Diab Med)
• Drug safety studies: corticosteroids and risk of fracture (Donnan et al); statins (Li Wei); methadone (Fahey); methotrexate (Guthrie)
• Markers for co-morbidity, e.g. emergency admissions study (Donnan)
Future plans
• Enhanced HIC service including
– Programming, statistical, Clinical Trials Unit support, data management
• Scaling up to a Scotland-wide Health Programme (SHIP)
• Rolling out novel research data mechanism to further improve information governance: MILA
• Pilot study: obtaining identifiable retinal images from Ninewells eye clinic (300 images @ 5 MB each) & anonymising them for research
Conventional Record-Linkage
[Diagram. Data sources feed a trusted repository (PAC oversight and SOPs), which generates identifier substitutions and delivers the data to the recipient. Confidentiality? Governance? Scalability?]
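MILA: Multi-Institutional Linkage & Anonymisation
[Diagram. Each data source knows a person by its own ID, e.g. 17 at source A and 89 at source B. The linker holds identifiers only, as a list of persons and their source IDs: Person (IDA, IDB, …), Person 1 (17, 89, …), Person 2 (…). It issues the substitutions A (17 -> 2) and B (89 -> 2), so the recipient receives data from both sources under the same ID, 2. Labels: Confidentiality, Governance, Scalability.]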
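One reading of the MILA diagram is that the linker never handles the research data: it only tells each source how to relabel its own records. The sketch below illustrates that reading using the diagram's numbers (17, 89 and 2). The function names, field names and sample rows are hypothetical.

```python
def substitution_maps(person_index, project_ids):
    """Run by the linker: build one local-ID -> project-ID map per source.

    person_index: one dict per person, mapping source name -> local ID.
    project_ids: one project-level ID per person, in the same order.
    """
    maps = {}
    for local_ids, project_id in zip(person_index, project_ids):
        for source, local_id in local_ids.items():
            maps.setdefault(source, {})[local_id] = project_id
    return maps


def relabel(rows, mapping, id_field="id"):
    """Run by a data source: swap the local ID for the project ID.

    Rows whose ID is not in the mapping are left out of the extract.
    """
    out = []
    for row in rows:
        if row[id_field] in mapping:
            new_row = {k: v for k, v in row.items() if k != id_field}
            new_row["project_id"] = mapping[row[id_field]]
            out.append(new_row)
    return out


# Person 1 is known as 17 at source A and 89 at source B.
person_index = [{"A": 17, "B": 89}]
maps = substitution_maps(person_index, project_ids=[2])

# Each source relabels its own rows (the row contents are invented).
from_a = relabel([{"id": 17, "drug": "statin"}], maps["A"])
from_b = relabel([{"id": 89, "test": "cholesterol"}], maps["B"])
```

The recipient can join `from_a` and `from_b` on `project_id` without ever seeing 17 or 89, and the linker sees identifiers but no clinical data.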
Some research data mechanisms
• Project specific, ad hoc data collections
– Pros: Simple, personal, researcher in control
– Cons: Ad hoc; no governance, no reuse
• Data warehouse
– Pros: Copies of all data in one place
– Cons: Threat to trust, privacy
• GRID computing & eScience techniques
– Pros: No copies of data
– Cons: Is it trustworthy? Is it scalable?
• Multi Institution Linkage & Anonymisation (MILA)
– Pros: Transparent, data owners retain control
– Cons: In development; pilot complete
How MILA matches the requirements
Stakeholder: Patients, the public
• Trust that mechanism respects consent & privacy ?
• Data used once, for intended purpose only
• Promotes research and knowledge creation
Stakeholder: Data owners, e.g. NHS
• Trust that mechanism always secure, follows law ?
• No work to provide or update dataset (a benefit?)
• Due credit given
Stakeholder: Researchers
• Trust in data provenance, quality, completeness ?
• Wide range of datasets (data owners trust mechanism) ?
• Dataset descriptions, scoping searches
• Data anonymised but linkable
• Simple, rapid, cheap data extracts
• Long term data curation
Sir Alan Langlands, September 2005