Mining biomedical terminology from literature

CDT Seminar Overview:
Health Informatics
Clinical informatics
 Large amount of clinical data – BIG DATA
 EHR, hospital discharge letters
 guidelines, protocols, etc.
 tests, measurements
 medical literature (case notes, ...)
 Ultimate aim: MAKING SENSE OF THIS DATA
to support clinical research and facilitate
clinical decision support
 Close collaboration with clinical teams and
pharmaceutical industry, local and wider
Health e-research centre (HeRC)
New £18M centre to be opened soon
[Diagram. Labels: Ingredients, Datasets, Experts, Methods, Link, Value, Insights, Data Quality, Science and Industry (R&D), Improved Care for Patients and Communities (Service)]
Health e-research centre (HeRC)
 CS areas in need
 Data management
 Machine learning, data mining
 Text mining
 Information management
 privacy preservation
 User interface design
 High-performance computing
 Knowledge management
 ontologies, logics, Bayesian modelling
 reasoning
Clinical text mining
 Extract data from Electronic Health Records
(EHRs)
 Challenges
 Highly condensed text
 often without proper sentences
 list of medications, symptoms, acronyms, etc.
 Terminological variability and ambiguity
 orthographic, acronyms, local conventions
 Various sections
 previous history, social/family background
 Recording “practice” varies
 e.g. aneurysm size recorded as ‘large’ or as ‘between 20-30mm’
Patient: X
Date: 12.02.2007.
Medication: Enalapril 20mg
Duration: 7 days
Frequency: 2 X 1
Mode: oral
Reason: hypertension
Dg. cardiac arrest, ….
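The record above is the kind of semi-structured note that clinical text mining has to turn into data. A minimal sketch of that step, assuming one "Field: value" pair per line; the function names, the regular expressions and the small unit list are illustrative assumptions, not the method used in the work presented here:

```python
import re

# Sample note, following the fields of the example record.
RECORD = """\
Patient: X
Date: 12.02.2007.
Medication: Enalapril 20mg
Duration: 7 days
Frequency: 2 X 1
Mode: oral
Reason: hypertension
"""

# "Field: value" on a single line (assumed layout).
FIELD = re.compile(r"^\s*([A-Za-z]+)\s*:\s*(.+?)\s*$")
# Drug name followed by a numeric dose and a unit (illustrative unit list).
DOSE = re.compile(
    r"^(?P<drug>.+?)\s+(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>mg|g|ml|mcg)$",
    re.IGNORECASE,
)


def parse_record(text):
    """Return a dict mapping lower-cased field names to raw string values."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1).lower()] = match.group(2)
    return fields


def split_medication(value):
    """Split 'Enalapril 20mg' into drug, amount and unit where possible."""
    match = DOSE.match(value.strip())
    if not match:
        # Leave the value unsplit rather than guess.
        return {"drug": value.strip(), "amount": None, "unit": None}
    return {
        "drug": match.group("drug"),
        "amount": float(match.group("amount")),
        "unit": match.group("unit").lower(),
    }


record = parse_record(RECORD)
medication = split_medication(record["medication"])
print(record)
print(medication)
```

Real notes rarely follow one layout, so a pattern-based parser like this only covers the well-formed cases; the challenges listed above (condensed text, acronyms, local conventions) are what it does not handle.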
Example: extract status of diseases
UoM performance (ranked 1st/28)
Micro-average: Accuracy (0.9723)
Macro-average: P (0.8482), R (0.7737), F-score (0.8052)

Class   #Eval   #Corr   #Gold   Precision   Recall   F-score
Y       2267    2132    2192    0.9404      0.9726   0.9562
N       56      40      65      0.7142      0.6153   0.6611
Q       12      9       17      0.7500      0.5294   0.6206
U       5709    5640    5770    0.9879      0.9774   0.9826
Yang, H., Spasic, I., Keane, J., Nenadic, G.: A Text Mining Approach to the Prediction of a
Disease Status from Clinical Discharge Summaries, JAMIA 16(4):596-600
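The reported averages can be recomputed from the per-class counts. The sketch below assumes that precision is #Corr/#Eval, recall is #Corr/#Gold, the macro-average is the unweighted mean over the four classes, and the micro-averaged accuracy is total #Corr over total #Gold; these definitions are assumptions, chosen because they reproduce the figures on the slide:

```python
# (evaluated, correct, gold) counts per class, copied from the results above.
COUNTS = {
    "Y": (2267, 2132, 2192),
    "N": (56, 40, 65),
    "Q": (12, 9, 17),
    "U": (5709, 5640, 5770),
}


def prf(evaluated, correct, gold):
    """Precision, recall and F-score (harmonic mean) for one class."""
    precision = correct / evaluated if evaluated else 0.0
    recall = correct / gold if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


per_class = {label: prf(*counts) for label, counts in COUNTS.items()}

# Macro-average: every class counts equally, however rare it is.
n_classes = len(per_class)
macro_p = sum(p for p, _, _ in per_class.values()) / n_classes
macro_r = sum(r for _, r, _ in per_class.values()) / n_classes
macro_f = sum(f for _, _, f in per_class.values()) / n_classes

# Micro-average: pool the counts, so frequent classes dominate.
micro_accuracy = (
    sum(correct for _, correct, _ in COUNTS.values())
    / sum(gold for _, _, gold in COUNTS.values())
)

for label, (p, r, f) in per_class.items():
    print(f"{label}: P={p:.4f} R={r:.4f} F={f:.4f}")
print(f"macro: P={macro_p:.4f} R={macro_r:.4f} F={macro_f:.4f}")
print(f"micro accuracy: {micro_accuracy:.4f}")
```

The gap between the micro-average and the macro-average comes from the rare classes N and Q: they contribute little to the pooled accuracy but pull the macro-averaged recall and F-score down. Some per-class values differ from the slide in the last digit, which suggests the slide truncates rather than rounds.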
Clinical “narratives”
very anxious
dry cough
feeling low
no heroin use
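Narrative phrases like these mix asserted and negated findings, so an extractor has to record whether a mention is negated. A deliberately simple sketch, assuming the negation cue is the first word of the phrase; the cue list and the function name are illustrative assumptions, and practical systems use much richer rules:

```python
# Illustrative, non-exhaustive list of leading negation cues (assumption).
NEGATION_CUES = ("no", "not", "denies", "without")


def tag_negation(phrase):
    """Return (concept, negated) for a short narrative phrase.

    Only a cue in first position is recognised; anything else is
    treated as an asserted mention.
    """
    tokens = phrase.lower().split()
    negated = bool(tokens) and tokens[0] in NEGATION_CUES
    concept = " ".join(tokens[1:] if negated else tokens)
    return concept, negated


phrases = ["very anxious", "dry cough", "feeling low", "no heroin use"]
for phrase in phrases:
    print(phrase, "->", tag_negation(phrase))
```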
Mining health-care Web 2.0
Sentiment mining of health-related social media
 e-epidemiology
 suicide prevention
 quality of life assessment
 ...
HeRC research themes
 CoOP
 “Coproducing observation with patients”
 MOD
 “Missed opportunities detector”
 SEA-3
 “Scalable endotypes of asthma, allergies and andrology”
 DOT
 Diabesity outcomes translator
 FIN
 Trials feasibility improvement network
Linked2Safety
 An advanced environment for clinical research
 based on clinical care information in EHRs and clinical
trial systems
a) early detection of patients’ safety issues
b) identification of adverse events
c) identification of suitable cohorts for clinical trials
 Use semantic technologies (Linked Data) and
data/text analytics
 Inter-disciplinary at Manchester involving
CS, Medicine and Mathematics
http://www.linked2safety-project.eu/
Clinical document management
 Dynamic documentation knowledge services
 find the right forms/questions depending on the
patient and clinical observations
 reasoning
 present it to the users
 Tasks/areas
 Modelling (ontologies, description logics, SW)
 Data analytics and integration
 User interface design
Systems biology
 Large-scale extraction and contextualization of
biomolecular events
 extraction of host-pathogen interactions
 molecular modelling of thyroid carcinogenesis using text mining
 Modelling dynamics of small blood vessels and
roles of smooth muscle cells
 combine literature mining and structured data
Contacts
 Goran Nenadic
 text mining, information management
 e-health research
 Bijan Parsia
 Knowledge management, reasoning
 GUI
 John Keane
 data management/analytics
 decision support systems