What is the RecordCounter?
NEW ENHANCEMENTS IN THE SYNTHETIC DERIVATIVE AND WHAT THAT MEANS FOR THE RESEARCHER
Jacqueline Kirby
June 7th, 2013
Resources
• StarPanel
• Identified clinical data; designed for clinical use
• Record Counter
• De-identified clinical data; sophisticated phenotype searching
• Returns a number – record counts and aggregate demographics
• Synthetic Derivative
• De-identified clinical data; sophisticated phenotype searching
• Returns record counts AND de-identified narratives, test values,
medications, etc., for review and creation of study data sets
• Research Derivative
• Identified clinical data
• Programmer (human) supported
• BioVU
• Genotype data
• De-identified clinical data; sophisticated phenotype searching
• Able to link phenotype information to biological sample
What is the RecordCounter?
The Synthetic Derivative Record Counter (RecordCounter)
provides exploratory data figures and counts to members of
the VU research community for research planning purposes
and feasibility assessment.
• Available to ANYONE with a VUNET ID
• Allows the user to input basic medical data, such as ICD-9 codes or text keywords (e.g., lung cancer), as well as demographic information, and then search the Synthetic Derivative database to determine the approximate number of records that meet those criteria.
What is the Synthetic Derivative (SD)?
• Rich, multi-source database of de-identified clinical and demographic data
• User Interface tool that can be used for access and analysis
• Services are available to help deliver results for non-standard queries (temporal queries, controls matching, etc.)
• Contains ~2.3 million records
• ~1 million with detailed longitudinal data
• averaging 100k bytes in size
• an average of 27 codes per record
• Records are updated over time and are current through December 2012
• Soon to be current through May 31, 2013
The RecordCounter Vs. The SD
The RecordCounter – Users can use search criteria to return exploratory counts. (The results returned are not exact and are meant for a high-level assessment of the available data.)
The SD – Users can use search criteria to return an exact count and the associated longitudinal data for review.
What is BioVU?
• The move towards personalized medicine requires very large sample sets for discovery and validation
• BioVU: biobank intended to support a broad view of biology and enable personalized medicine
• Contains de-identified DNA extracted from leftover blood after clinically-indicated testing of Vanderbilt patients who have not opted out
• Linked to Synthetic Derivative: de-identified EMR
• Current sample number: 166,397
o 147,292 adult samples
o 19,220 pediatric samples
Synthetic Derivative vs. BioVU
Synthetic Derivative Data Types
Documents, such as:
• Clinical Notes
• Discharge Summaries
• History and Physicals
• Problem Lists
• Surgical Reports
• Progress Notes
• Letters
Diagnostic Codes, Procedural Codes
Forms (intake, assessment)
Reports (pathology, ECGs, echocardiograms)
Clinical Communications
Lab Values and Vital Signs
Medication Orders
TraceMaster (ECGs)
Tumor Registry
Technology + policy
De-identification
• Derivation of 128-character identifier (RUI) from the MRN generated by
Secure Hash Algorithm (SHA-512)
• HIPAA identifiers removed using combination of custom techniques and
established de-identification software
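The RUI derivation described above can be sketched in Python. This is a minimal illustration assuming a plain SHA-512 hex digest of the MRN string; the function name `derive_rui` and the sample MRN are hypothetical, and the slide does not say whether the production process adds a secret salt or key. A SHA-512 digest is 64 bytes, which is exactly 128 hexadecimal characters.

```python
import hashlib


def derive_rui(mrn: str) -> str:
    """Hypothetical sketch: 128-character identifier from an MRN.

    Assumes an unsalted SHA-512 digest; any salt or key used in
    practice is not described in the source and is not shown here.
    """
    digest = hashlib.sha512(mrn.encode("utf-8"))
    return digest.hexdigest()  # 64 bytes -> 128 hex characters


if __name__ == "__main__":
    rui = derive_rui("000123456")  # made-up MRN for illustration
    print(len(rui))  # 128
```

Because an unsalted hash of a known MRN format can be recomputed by anyone who can enumerate MRNs, a keyed hash (for example, Python's standard `hmac` module with `hashlib.sha512`) is a common design choice for pseudonyms like this; whether the SD uses one is not stated here.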
Date Shift
• Our algorithm shifts the dates within a record by a time period (up to
364 days backwards) that is consistent within each record, but differs
across records
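The key property of the date shift is that one offset applies to every date in a record, so intervals within a record (length of stay, time between labs) survive de-identification. A minimal sketch, assuming the offset is drawn uniformly at random from 1 to 364 days and stored per record; the class `DateShifter` and the record IDs are hypothetical, and the actual SD algorithm may choose or store offsets differently.

```python
import secrets
from datetime import date, timedelta


class DateShifter:
    """Hypothetical sketch: one random backward shift per record (1 to 364 days)."""

    def __init__(self) -> None:
        self._offsets: dict[str, int] = {}  # record id -> shift in days

    def offset_for(self, record_id: str) -> int:
        # Draw an offset the first time a record is seen, then reuse it.
        if record_id not in self._offsets:
            self._offsets[record_id] = 1 + secrets.randbelow(364)  # 1..364
        return self._offsets[record_id]

    def shift(self, record_id: str, d: date) -> date:
        # Move the date backwards by the record's fixed offset.
        return d - timedelta(days=self.offset_for(record_id))


if __name__ == "__main__":
    shifter = DateShifter()
    admit = shifter.shift("rec-A", date(2012, 3, 1))
    discharge = shifter.shift("rec-A", date(2012, 3, 8))
    print((discharge - admit).days)  # 7: the interval within the record is preserved
```

Different records get independent offsets, so the same real calendar date generally maps to different shifted dates across records.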
Restricted access & continuous oversight
• Access restricted to VU; not a public resource
• IRB approval for study (non-human)
• Data Use Agreement
• Audit logs of all searches and data exports
The New SD…
Synthetic Derivative 3.0 was launched on February 25, 2013. SD 3.0 leverages the power of an IBM Netezza data warehouse appliance to provide faster, near-immediate counts as users build their search criteria, along with new review features that include enhanced data visualization and covariate annotation capabilities.
SEARCH: Counts are provided for each search item in real time as you build your algorithm, letting you adjust your criteria immediately. Modifiers for ICD-9 codes allow searches to require 2 or more instances of a code.
REVIEW: Filter and highlight documents, medications and labs to make
review efficient.
ANNOTATE: Create your own set-based annotations that are sharable
across the study team.
General algorithm for determining a phenotype
• Definition of phenotype for cases and controls is critical
• May require consultation with experts
• Basic understanding of data elements; uses and
limitations of particular data points is important
• Reviewing records manually to make case determinations (or even to calculate the positive predictive value, PPV, of a search methodology) will be somewhat time-consuming
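The PPV of a search is the fraction of the records it returns that manual review confirms as true cases, PPV = TP / (TP + FP). A minimal sketch, assuming each reviewed record is recorded as True (confirmed case) or False (not a case); the function name `positive_predictive_value` and the review numbers are hypothetical.

```python
from typing import Iterable


def positive_predictive_value(review_results: Iterable[bool]) -> float:
    """PPV = TP / (TP + FP) for the records a search returned.

    Each item is one manual-review verdict: True if the reviewer confirmed
    the record as a case (true positive), False if not (false positive).
    """
    results = list(review_results)
    if not results:
        raise ValueError("need at least one reviewed record")
    true_positives = sum(results)  # True counts as 1, False as 0
    return true_positives / len(results)


if __name__ == "__main__":
    # Hypothetical review of 100 search hits: 85 confirmed, 15 rejected.
    print(positive_predictive_value([True] * 85 + [False] * 15))  # 0.85
```

Reviewing a random sample of hits rather than every hit is one way to keep this estimate affordable when the search returns many records.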
The problem with ICD9 codes
• ICD9 codes give both false negatives and false positives
• False negatives:
• Outpatient billing limited to 4 diagnoses/visit
• Outpatient billing done by physicians (e.g., it takes too long to find an unfamiliar ICD9 code)
• Inpatient billing done by professional coders:
• omit codes that don’t pay well
• can only code problems explicitly mentioned in documentation
• False positives:
• Diagnoses evolve over time; physicians may initially bill for suspected diagnoses that are later determined to be incorrect
• Billing the wrong code (perhaps because it is easier to find for a busy clinician)
• Physicians may bill for a different condition if it pays for a given treatment
• Example: Anti-TNF biologics (e.g., infliximab) were originally not covered for psoriatic arthritis, so rheumatologists would code the patient as having rheumatoid arthritis
Lessons from preliminary phenotype development
• Eliminating negated and uncertain terms:
• “I don’t think this is MS”, “uncertain if multiple sclerosis”
• Delineating section tag of the note
• “FAMILY MEDICAL HISTORY: Mother had multiple sclerosis.”
• Adding requirements for further signs of “severity of disease”
• For MS: an MRI with T2 enhancement, myelin basic protein or
oligoclonal bands on lumbar puncture, etc.
• This could potentially miss patients with outside work-ups,
however
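The first two lessons (dropping negated or uncertain mentions, and ignoring mentions in sections such as family history) can be illustrated with a small rule-based filter. This is a toy sketch under stated assumptions: the cue lists, the `HEADER:` section convention, and the function `affirmed_mentions` are hypothetical, and real clinical NLP tools (NegEx-style algorithms, for example) use much larger lexicons and scope rules. It does not attempt the third lesson (requiring supporting evidence such as MRI findings).

```python
import re

# Hypothetical, deliberately tiny cue lists for illustration only.
NEGATION = re.compile(r"\b(?:don['’]t think|not|no evidence of|denies|ruled out)\b", re.IGNORECASE)
UNCERTAIN = re.compile(r"\b(?:uncertain|possible|possibly|questionable)\b", re.IGNORECASE)
SECTION_HEADER = re.compile(r"^([A-Z][A-Z /]+):\s*(.*)$")  # e.g. "FAMILY MEDICAL HISTORY: ..."
EXCLUDED_SECTIONS = {"FAMILY MEDICAL HISTORY", "FAMILY HISTORY"}
MS_TERM = re.compile(r"\b(?:(?i:multiple sclerosis)|MS)\b")  # "MS" matched case-sensitively


def affirmed_mentions(note: str, term: re.Pattern) -> list[str]:
    """Return note lines that mention `term` without negation, uncertainty,
    or an excluded section heading."""
    section = ""
    hits = []
    for line in note.splitlines():
        header = SECTION_HEADER.match(line.strip())
        if header:  # a new section starts; keep only the text after the heading
            section, line = header.group(1), header.group(2)
        if not term.search(line):
            continue
        if section in EXCLUDED_SECTIONS:
            continue  # e.g. a relative's diagnosis, not the patient's
        if NEGATION.search(line) or UNCERTAIN.search(line):
            continue  # negated or uncertain mention
        hits.append(line)
    return hits


if __name__ == "__main__":
    note = (
        "HPI: I don't think this is MS.\n"
        "ASSESSMENT: uncertain if multiple sclerosis\n"
        "FAMILY MEDICAL HISTORY: Mother had multiple sclerosis.\n"
        "IMPRESSION: Relapsing-remitting multiple sclerosis, confirmed on MRI.\n"
    )
    print(affirmed_mentions(note, MS_TERM))
```

Only the IMPRESSION line survives; the other three are dropped by the negation, uncertainty, and section rules respectively.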
Once you have logged in…
The New SD gives a cleaner Home page interface with aggregate SD graphs.
New features for the Investigator:
• A welcome and announcement section to give the Investigator any immediate information/Help when accessing the SD
• Overall SD/BioVU population demographics to give up-to-date population details of the resource
Improved Search Features
Once you have selected “Start a New Search”, you will go to the Search Interface. Users can
select search criteria to see record counts by dragging and dropping Search Criteria (e.g.
ICD codes, Labs, Document Keywords, Medications) into the Search box.
New Search Features include:
• Counts for each criterion element, shown on the right-hand side of the search box (circled in red); summary counts for combined criteria (this OR that), shown at the bottom of the group box (circled in blue); and a final Total count in the right corner of your search (circled in green)
• Limit your search to BioVU Records, Non-compromised BioVU Samples, or only BioVU Samples available for external assay
• Limit your search based on the number of ICD code occurrences in the subject record, to require multiple instances of an ICD code
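One way to picture these counts is as set operations over record IDs: each criterion yields a set of matching records, an OR group is a union, and the Total combines the groups. The sketch below assumes groups are combined with AND and uses made-up records; `with_code`, `RECORDS`, and `BIOVU` are hypothetical names, not part of the SD. The ICD-9 codes are 340 (multiple sclerosis), 714.0 (rheumatoid arthritis), and 401.9 (essential hypertension, unspecified).

```python
from collections import Counter

# Hypothetical toy data: record id -> ICD-9 codes, one entry per billed occurrence.
RECORDS = {
    "r1": ["340", "340", "401.9"],
    "r2": ["340"],
    "r3": ["714.0", "714.0", "714.0"],
    "r4": ["401.9"],
}
BIOVU = {"r1", "r3"}  # hypothetical: records linked to a BioVU sample


def with_code(code: str, min_count: int = 1) -> set[str]:
    """Records in which `code` occurs at least `min_count` times."""
    return {rid for rid, codes in RECORDS.items() if Counter(codes)[code] >= min_count}


# One count per criterion element.
ms = with_code("340", min_count=2)  # multiple sclerosis, required at least twice
ra = with_code("714.0")             # rheumatoid arthritis
htn = with_code("401.9")            # hypertension

group = ms | ra             # group box summary: this OR that
total = group & htn         # assumed: groups combined with AND for the Total
biovu_only = total & BIOVU  # restrict to records with a BioVU sample

print(len(ms), len(ra), len(group), len(htn), len(total), len(biovu_only))  # 1 1 2 2 1 1
```

Requiring two occurrences drops r2, which has a single MS code; this mirrors the slide's point that a lone ICD9 code is weak evidence.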
Improved Set Review
After you have built your set, you can begin reviewing your records. The New SD has both a Summary view that gives a high-level graphical view of a subject AND a Detail view that you can customize with a new Tabular view.
What’s new in Review:
• Subject IDs listed on the left-hand side to move easily through the records
• Tabular view of the different data elements with custom sorting of tabs
• Arial buttons for determining Subject status
New Data Visualization Features
In the Summary tab and in the Vitals view, the new SD offers data visualization features that give a reviewer a quick view of a subject’s longitudinal data.
Improved Document View
Documents are divided into three tabs:
• High Value Documents
• Other Documents
• Problem Lists
On each Document tab, you can
1. Filter by keywords, document type, and subtype
2. Filter keyword searches and display only the context
3. Highlight keywords and display either the full documents or the word(s) in context
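Displaying "the word(s) in context" amounts to a keyword-in-context (KWIC) view. A minimal sketch, assuming a fixed character window around each case-insensitive match and `**` markers standing in for on-screen highlighting; `keyword_in_context` and the sample note are hypothetical.

```python
import re


def keyword_in_context(text: str, keyword: str, width: int = 30) -> list[str]:
    """Return one snippet per case-insensitive hit, with up to `width`
    characters of context on each side and the hit wrapped in ** markers."""
    snippets = []
    for m in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        before = text[max(0, m.start() - width):m.start()]
        after = text[m.end():m.end() + width]  # slicing past the end is safe
        snippets.append(f"{before}**{m.group(0)}**{after}")
    return snippets


if __name__ == "__main__":
    note = ("Patient seen for follow-up. MRI shows T2 lesions consistent with "
            "demyelination. Plan: repeat MRI in 6 months.")
    for snippet in keyword_in_context(note, "mri", width=15):
        print(snippet)
```

`re.escape` keeps keywords containing punctuation (for example "T2/FLAIR") from being read as regular-expression syntax.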
New Medications and Labs Display
The Medication and Lab views now have two displays for easier review. The Summary view displays aggregate mentions of meds/labs with beginning and end dates. The Details view shows each instance of a med/lab in full detail, with the ability to filter by data element.
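The Summary view's aggregation can be thought of as grouping individual mentions by medication (or lab) name and keeping the count plus the first and last dates, while the Details view keeps every mention. A minimal sketch with made-up data; `summarize`, `details`, and `mentions` are hypothetical names, and the SD's actual grouping rules are not described here.

```python
from collections import defaultdict
from datetime import date

# Hypothetical medication mentions for one subject: (name, date of mention).
mentions = [
    ("metformin", date(2010, 1, 5)),
    ("lisinopril", date(2010, 3, 2)),
    ("metformin", date(2011, 6, 20)),
    ("metformin", date(2009, 11, 30)),
]


def summarize(rows: list[tuple[str, date]]) -> dict[str, tuple[int, date, date]]:
    """Summary view: mention count plus first and last date per medication."""
    acc: dict[str, list] = defaultdict(lambda: [0, date.max, date.min])
    for name, when in rows:
        entry = acc[name]
        entry[0] += 1
        entry[1] = min(entry[1], when)
        entry[2] = max(entry[2], when)
    return {name: (n, first, last) for name, (n, first, last) in acc.items()}


def details(rows: list[tuple[str, date]], name: str) -> list[tuple[str, date]]:
    """Details view: every individual mention of one medication, in date order."""
    return sorted((r for r in rows if r[0] == name), key=lambda r: r[1])


if __name__ == "__main__":
    for name, (count, first, last) in summarize(mentions).items():
        print(name, count, first, last)
```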
Improved Annotations
Annotations allow for easier identification and saving of covariate information during set review. Create your own set-based annotations that are sharable across the study team. These can be exported to Excel when performing your data analysis.
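Exported annotations are naturally tabular: one row per subject and one column per covariate. A minimal sketch assuming CSV as the exchange format (Excel opens CSV files directly); the column names, values, and the function `annotations_to_csv` are made up, and the SD's actual export format is not specified here.

```python
import csv
import io

# Hypothetical set-based annotations: one row per subject, one column per covariate.
annotations = [
    {"subject_id": "S001", "status": "case", "smoker": "yes", "bmi": "31.2"},
    {"subject_id": "S002", "status": "control", "smoker": "no", "bmi": "24.8"},
]


def annotations_to_csv(rows: list[dict[str, str]]) -> str:
    """Serialize annotation rows to CSV text, header first, that Excel can open."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(annotations_to_csv(annotations))
```

When writing to a file instead of a string, open it with `newline=""`, as the `csv` module documentation recommends, so line endings are not doubled on Windows.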
What’s Next?
• Data Export into REDCap
• Adding PheWAS to the search criteria
• Predict Labs in the Lab view
• Custom and Timeline View
• ….
The SD has evolved greatly in the past six months, largely due to suggestions and needs from its users. Please let us know what YOU would like in the SD so that it can continue to evolve.
SD Access Protocol
1. Requests IRB exemption
2. Researcher enters StarBRITE to complete electronic application (IRB status is in StarBRITE)
3. Signs DUA
4. SD staff verify; access granted
5. Researcher accesses SD
Leveraging VICTR Resources
• Record Counter (RC) – part of SD but open to anyone
with a Vunet ID:
https://biovu.vanderbilt.edu/RC/RC.html
• SD (/BioVU) – Erica Bowton (via StarBrite)
• RD – email or call me, or fill out a Request form at
https://starbrite.vanderbilt.edu/
• (https://starbrite.vanderbilt.edu/managedata/datarequest.html)
Questions or Comments?
SD User Group Sessions will be held the fourth
Wednesday of each month at 1 pm. All are welcome.
Time: 1:00-2:00 PM
Location: Light Hall, Room 439
If you have any questions or feedback about the new SD, please contact us by email: [email protected]
THANK YOU!