Smith_Lecture1 - Buffalo Ontology Site
How Informatics Can Drive Your Research
Barry Smith
http://ontology.buffalo.edu/smith
1
Four Lectures
Today: Introduction
2/15: How Electronic Health Record (EHR) Data Can Drive Your Research
2/22: Case Study: Pain and Mental Health
2/29: Case Study: Alzheimer’s Disease
2
Agenda
How to
• 1. use data gathered in the clinic
• 2. use (and find) publicly available clinical data
• 3. use (and find) publicly available biological data relevant to clinical work
• 4. use data to address NIH funding requirements
Albert Goldfain: Bad and good practices
3
R T U New York State
Center of Excellence in
Bioinformatics & Life Sciences
How informatics can help the clinician
[Diagram; labels: observation & measurement, data organization, diagnosis, treatment, outcome, generic beliefs, use, verify, add, =, Δ]
How informatics can help the researcher
[Diagram; labels: observation & measurement, data organization, hypothesis, outcome, generic beliefs, further R&D, use, verify, add (instrument and study optimization), =, Δ]
The meaning of life
• Choose your disease and devote your life to finding a cure
• Form a cohort of patients
• Assemble maximally accurate data for all the patients in this cohort
• Use this data to forge links with other researchers who find your data valuable
• Important: In the era of genomic medicine, only structured data is valuable
7
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDR
KRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTL
SLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYM
FLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRA
CALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCAC
TARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTR
RIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDP
NQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGS
RFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCS
FSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEI
YMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPV
RNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQS
QFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMF
NLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVV
WIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGG
LCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIE
RMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTAST
NVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATT
TESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTS
ATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTN
SNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSEN
MNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEAL
AVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTR
GKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKG
GVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSM
LIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDG
RFDILLCRDSSREVGE
8
The 3 big problems
• ICD
– used for billing
– used sloppily, and inconsistently, and …
– is a poorly structured coding scheme
• Free text
– what your colleagues have been using to create all those paper records
– is not structured
– you will need to use high quality codes
• EHR coding systems
– EHRs are used inconsistently, and sloppily
– with lots of free text
– but most of all:
• there are many different systems
9-12
The problem of data silos
You need to create a patient database of your own
And populate this database through manual chart review
Drawing in data from all relevant sources (both patient data and biology data)
But how to do this?
You will be creating an Excel spreadsheet
But how to do this? (How will you avoid silos?)
14
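One way to picture "avoiding silos" is a spreadsheet in which every row, whatever its source, carries the same patient identifier, so that clinical and biological data can be joined later. The following is a minimal sketch under that assumption; the column names (`patient_id`, `diagnosis`, `gene_variant`) and the function `merge_by_patient` are hypothetical and only illustrate the idea.

```python
def merge_by_patient(clinical_rows, biology_rows, key="patient_id"):
    """Combine rows from two sources into one record per patient.

    Each row is a dict, e.g. one spreadsheet line read with csv.DictReader.
    Rows that lack the shared key are rejected rather than silently kept
    apart, because an unkeyed row is exactly what creates a silo.
    """
    merged = {}
    for source in (clinical_rows, biology_rows):
        for row in source:
            if key not in row:
                raise ValueError(f"row without {key!r}: {row}")
            # Create the patient record on first sight, then add the new columns.
            merged.setdefault(row[key], {}).update(row)
    return merged


# Hypothetical example rows (not real patient data).
clinical = [{"patient_id": "P001", "diagnosis": "chronic pain"}]
biology = [{"patient_id": "P001", "gene_variant": "variant-A"}]
records = merge_by_patient(clinical, biology)
```

The design choice here is only that the join key is agreed on before data collection starts; the same idea applies whether the rows live in Excel, CSV files, or a database.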
Problems with databases
How to find and integrate other people’s data?
How to reason with data when you find it?
How to understand the significance of the data you collected 3 years earlier?
Part of the solution must involve use of standardized coding schemes
15
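To illustrate what using a standardized coding scheme means in practice, here is a minimal sketch that stores a code alongside each free-text chart entry. The lookup table, the `EX:` codes, and the names `TERM_TO_CODE`, `CodedEntry`, and `code_entry` are all hypothetical placeholders; real codes would have to come from an actual terminology release.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup table. The "EX:" values are placeholders,
# NOT real identifiers from any coding scheme.
TERM_TO_CODE = {
    "alzheimer's disease": "EX:0001",
    "chronic pain": "EX:0002",
}


@dataclass(frozen=True)
class CodedEntry:
    patient_id: str
    code: Optional[str]  # None when the text could not be mapped
    source_text: str     # original wording is kept for later review


def code_entry(patient_id: str, free_text: str) -> CodedEntry:
    """Normalize case and whitespace, then look the term up in the table."""
    key = " ".join(free_text.lower().split())
    return CodedEntry(patient_id, TERM_TO_CODE.get(key), free_text)


entry = code_entry("P001", "  Alzheimer's   Disease ")
unmapped = code_entry("P002", "headache")
```

Entries whose `code` is `None` are the ones that still need manual chart review; keeping `source_text` means nothing from the original record is lost.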
An example: SNOMED-CT
Systematized Nomenclature of Medicine
http://snomed.vetmed.vt.edu/sct/menu.cfm
• Very large, internationally maintained
• Covers the whole of medicine
• Still poorly and sloppily used (mainly for diagnosis and treatment; not for etiology, signs, symptoms)
• Built by pathologists
• Will help to structure your data
16
Examples of Buffalo sources
• Roswell Biosample repository
• Pharmacology data (Medicare/Medicaid)
• Buffalo Ontology Group
17
You will need to embrace this strategy in any case if you want to get funding
NIH Mandates for Sharing of Research Data
Investigators submitting an NIH application seeking $500,000 or more in any single year are expected to include a plan for data sharing
(http://grants.nih.gov/grants/policy/data_sharing)
22
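The threshold on this slide can be written as a one-line check. This is a sketch of the rule as worded on the slide only; the function name and the per-year list of requested amounts are assumptions, and the exact definition of the amount that counts should be taken from the NIH policy itself.

```python
DATA_SHARING_THRESHOLD_USD = 500_000  # figure quoted on the slide


def needs_data_sharing_plan(requested_per_year_usd):
    """Return True if any single year's request is $500,000 or more."""
    return any(amount >= DATA_SHARING_THRESHOLD_USD
               for amount in requested_per_year_usd)


# Hypothetical three-year budget: only the second year reaches the threshold.
plan_required = needs_data_sharing_plan([250_000, 500_000, 300_000])
```

Note that the test is per year, not on the total: a five-year budget that never reaches the threshold in any single year would not trigger it under this reading.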
You don’t need to become a computer
scientist for this strategy to work
Werner Ceusters (CoE, Psychiatry)
Jason J. Corso (Computer Science)
Alexander D. Diehl (Neurology)
Albert Goldfain (Blue Highway Inc.)
Alan Ruttenberg (Director, UB Clinical and Translational Data Exchange, Dental School)
Alexander C. Yu (Infectious Disease)
+ Barry Smith
http://org.buffalo.edu
23
Next week
2/15
How Electronic Health Record (EHR) Data Can Drive Your Research
On the pitfalls and the promise of EHRs
with Peter Winkelstein
25