
Open access – making the most
of biomedical literature mining
Lars Juhl Jensen
EMBL Heidelberg
why open access?
why biomedicine?
why literature mining?
MEDLINE
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1
homolog) directly phosphorylated Swe1 and
this modification served as a priming step to
promote subsequent Cdc5-dependent Swe1
hyperphosphorylation and degradation
information retrieval
finding the papers
if you can’t find them …
… they don’t exist!
ad hoc retrieval
user-specified query
“yeast AND cell cycle”
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1
homolog) directly phosphorylated Swe1 and
this modification served as a priming step to
promote subsequent Cdc5-dependent Swe1
hyperphosphorylation and degradation
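The point of this slide can be shown with a minimal sketch of literal Boolean AND retrieval. This sketch assumes simple word tokenization and treats a multi-word clause as matched when all its words occur. The example sentence is clearly about the yeast cell cycle, but it contains neither "yeast" nor "cell cycle", so a literal query misses it.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens; punctuation such as '(' and '-' splits words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def matches_and_query(text: str, query: str) -> bool:
    """True if every AND-separated clause of `query` occurs in `text`.

    A multi-word clause like "cell cycle" counts as present when all of its
    words occur somewhere in the text (a simplification of phrase search).
    """
    words = tokenize(text)
    clauses = [clause.strip() for clause in query.split(" AND ")]
    return all(tokenize(clause) <= words for clause in clauses)

abstract = ("Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly "
            "phosphorylated Swe1 and this modification served as a priming step "
            "to promote subsequent Cdc5-dependent Swe1 hyperphosphorylation and "
            "degradation")

print(matches_and_query(abstract, "yeast AND cell cycle"))  # False
```

The tricks on the following slides are ways to close this gap between what a text is about and the words it happens to use.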
MEDLINE
abstracts
complete papers
tricks
stemming
yeast / yeasts
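Stemming reduces both query terms and text to a shared stem so that "yeast" also finds "yeasts". The function below is a hypothetical toy suffix stripper, far simpler than real stemmers such as Porter's. It only illustrates the idea.

```python
def naive_stem(word: str) -> str:
    """Toy plural stripper (hypothetical rules, not the Porter stemmer)."""
    w = word.lower()
    if w.endswith("ies") and len(w) > 4:
        return w[:-3] + "y"          # studies -> study
    # Strip a final -s, but leave words like "class", "virus", "analysis" alone.
    if w.endswith("s") and not w.endswith(("ss", "us", "is")) and len(w) > 3:
        return w[:-1]                # yeasts -> yeast, kinases -> kinase
    return w

print(naive_stem("yeasts") == naive_stem("yeast"))  # True
```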
synonyms
yeast / S. cerevisiae
dynamic query expansion
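One reading of synonym handling and dynamic query expansion is to rewrite each query term into an OR-group of the term and its synonyms at search time, keeping AND between groups. The synonym table and the substring matching below are hypothetical placeholders. A real system would draw on a curated resource.

```python
# Hypothetical synonym table; a real system would use a curated resource.
SYNONYMS = {
    "yeast": ["s. cerevisiae", "saccharomyces cerevisiae", "budding yeast"],
}

def expand_query(terms: list[str]) -> list[list[str]]:
    """Turn each query term into an OR-group: the term plus its known synonyms."""
    return [[term] + SYNONYMS.get(term.lower(), []) for term in terms]

def matches(text: str, groups: list[list[str]]) -> bool:
    """AND across groups, OR within a group (plain substring matching)."""
    lowered = text.lower()
    return all(any(alt in lowered for alt in group) for group in groups)

groups = expand_query(["yeast", "cell cycle"])
print(matches("Cell cycle arrest in S. cerevisiae", groups))  # True
```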
next logical step
ontologies
annotation
Cdc28  yeast gene
Cdc28  cell cycle
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1
homolog) directly phosphorylated Swe1 and
this modification served as a priming step to
promote subsequent Cdc5-dependent Swe1
hyperphosphorylation and degradation
“yeast AND cell cycle”
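The annotation slide implies that the query is answered through concepts rather than words. Cdc28 is annotated as a yeast gene involved in the cell cycle, so a sentence mentioning Cdc28 matches "yeast AND cell cycle". The sketch below assumes a hypothetical one-entry annotation table and only illustrates that lookup.

```python
import re

# Hypothetical annotation table mirroring "Cdc28 -> yeast gene" and
# "Cdc28 -> cell cycle"; a real system would load curated ontology annotations.
ANNOTATIONS = {"cdc28": {"yeast", "cell cycle"}}

def concepts_for(text: str) -> set[str]:
    """Collect the ontology concepts attached to every annotated name in the text."""
    concepts: set[str] = set()
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        concepts |= ANNOTATIONS.get(token, set())
    return concepts

def matches_concepts(text: str, query: str) -> bool:
    """Evaluate an AND query against inferred concepts instead of literal words."""
    clauses = [clause.strip().lower() for clause in query.split(" AND ")]
    found = concepts_for(text)
    return all(clause in found for clause in clauses)

sentence = "Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1 homolog) directly phosphorylated Swe1"
print(matches_concepts(sentence, "yeast AND cell cycle"))  # True
```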
entity recognition
identifying the substance(s)
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1
homolog) directly phosphorylated Swe1 and
this modification served as a priming step to
promote subsequent Cdc5-dependent Swe1
hyperphosphorylation and degradation
if you can’t find them …
… they don’t exist!
abstracts
MEDLINE
tricks
good synonym list
manual curation
orthographic variation
CDC28
Cdc28p
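A common baseline for entity recognition is a dictionary lookup: a hand-curated synonym list maps each name variant to a canonical gene, and tokens are normalized first so that orthographic variants such as CDC28 and Cdc28p hit the same entry. The dictionary and the two normalization rules (case folding, dropping the yeast protein suffix "p") are assumptions chosen for illustration.

```python
import re

# Hypothetical, hand-curated synonym list: normalized variant -> canonical gene name.
SYNONYMS = {"cdc28": "CDC28", "swe1": "SWE1", "cdc5": "CDC5", "clb2": "CLB2"}

def normalize(token: str) -> str:
    """Fold orthographic variants: letter case and the yeast protein suffix 'p'."""
    t = token.lower()
    if t.endswith("p") and t[:-1] in SYNONYMS:
        t = t[:-1]                   # Cdc28p -> cdc28
    return t

def tag_entities(text: str) -> list[tuple[str, str]]:
    """Return (surface form, canonical name) for every dictionary hit, in order."""
    hits = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        key = normalize(token)
        if key in SYNONYMS:
            hits.append((token, SYNONYMS[key]))
    return hits

print(tag_entities("CDC28 encodes Cdc28p"))
```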
disambiguation
hairy
SDS
Cdc2
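Names like these are ambiguous. "hairy" is also an ordinary English word, SDS is also the abbreviation for the detergent sodium dodecyl sulfate, and the same gene name can refer to different genes in different organisms. A minimal sketch of context-based disambiguation is shown below. It votes with hypothetical cue-word lists in a small window around the candidate name. Real systems use richer features, and organism-level ambiguity would additionally need species context, which this sketch ignores.

```python
# Hypothetical cue lists; real systems learn such features from annotated text.
GENE_CUES = {"gene", "protein", "mutant", "expression", "encodes", "kinase"}
NON_GENE_CUES = {"page", "gel", "detergent", "buffer"}

def looks_like_gene(tokens: list[str], i: int, window: int = 3) -> bool:
    """Decide whether tokens[i] is a gene mention from the words around it."""
    lo, hi = max(0, i - window), i + window + 1
    context = {t.lower() for t in tokens[lo:i] + tokens[i + 1:hi]}
    return len(context & GENE_CUES) > len(context & NON_GENE_CUES)

print(looks_like_gene("the hairy gene is expressed in stripes".split(), 1))  # True
```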
information extraction
formalizing the facts
co-mentioning
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1
homolog) directly phosphorylated Swe1 and
this modification served as a priming step to
promote subsequent Cdc5-dependent Swe1
hyperphosphorylation and degradation
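Co-mentioning turns tagged entities into candidate relations by counting how often two entities appear in the same sentence. The sketch below assumes the entity recognition step has already reduced each sentence to a set of canonical names. The input sets are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def co_mentions(sentences: list[set[str]]) -> Counter:
    """Count how often each unordered pair of entities shares a sentence."""
    counts: Counter = Counter()
    for entities in sentences:
        for a, b in combinations(sorted(entities), 2):
            counts[(a, b)] += 1
    return counts

# Invented input: entity sets for two tagged sentences.
counts = co_mentions([{"CLB2", "CDC28", "SWE1", "CDC5"}, {"CDC28", "SWE1"}])
print(counts[("CDC28", "SWE1")])  # 2
```

Co-mentioning says nothing about the type or direction of a relation. That is what the NLP step adds.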
NLP
Natural Language Processing
Mitotic cyclin (Clb2)-bound Cdc28 (Cdk1
homolog) directly phosphorylated Swe1 and
this modification served as a priming step to
promote subsequent Cdc5-dependent Swe1
hyperphosphorylation and degradation
Gene and protein names
Cue words for entity recognition
Verbs for relation extraction
[nxexpr The expression of
[nxgene the cytochrome genes
[nxpg CYC1 and CYC7]]]
is controlled by
[nxpg HAP1]
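The parse above combines gene names, cue words and a relation verb. The sketch below does not use a chunker like the one shown. It uses a hypothetical surface regular expression for one passive construction ("… is/are controlled by …") to show how a verb pattern turns the example sentence into explicit relations. It assumes gene names are written in upper case.

```python
import re

# Hypothetical surface pattern (not the chunk parser above): upper-case gene
# names joined by ", " or " and ", then "is/are controlled|regulated by" a gene.
PASSIVE = re.compile(
    r"(?P<targets>[A-Z0-9]+(?:(?:, | and )[A-Z0-9]+)*)\s+(?:is|are)\s+"
    r"(?:controlled|regulated)\s+by\s+(?P<regulator>[A-Z0-9]+)"
)

def extract_regulation(sentence: str) -> list[tuple[str, str, str]]:
    """Return (regulator, 'regulates', target) triples found by the pattern."""
    relations = []
    for m in PASSIVE.finditer(sentence):
        for target in re.split(r", | and ", m.group("targets")):
            relations.append((m.group("regulator"), "regulates", target))
    return relations

print(extract_regulation(
    "The expression of the cytochrome genes CYC1 and CYC7 is controlled by HAP1"))
```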
new discoveries
text mining
temporal trends
buzzwords
grant applications
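Temporal trends can be tracked as the fraction of abstracts per publication year that mention a term. Normalizing by the yearly total keeps overall literature growth from looking like a trend. The records below are invented, and plain substring matching stands in for proper entity recognition.

```python
from collections import Counter

def trend(records: list[tuple[int, str]], term: str) -> dict[int, float]:
    """Fraction of abstracts per year whose text mentions `term`."""
    totals = Counter(year for year, _ in records)
    hits = Counter(year for year, text in records if term.lower() in text.lower())
    return {year: hits[year] / totals[year] for year in sorted(totals)}

# Invented (year, abstract) records for illustration only.
records = [(2001, "microarray study"), (2001, "gene knockout"),
           (2002, "microarray data"), (2002, "Microarray analysis")]
print(trend(records, "microarray"))  # {2001: 0.5, 2002: 1.0}
```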
global correlations
[figure: Regulates vs. Regulated; counts 3592, 79, 32, 83; P < 9×10⁻⁹]
transcriptional networks
[figure: Phosphorylates vs. Phosphorylated; counts 3704, 27, 11, 44; P < 2×10⁻⁷]
signal cascades
[figure: Expression vs. Phosphorylation; counts 3625, 107, 8, 47; P < 5×10⁻⁴]
integration of text and data
network mining
linking genes to diseases
multifactorial diseases
genotype to phenotype
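A simple form of network mining for linking genes to diseases is guilt by association: rank candidate genes by how many of their network neighbors are already known disease genes. The edges could come from text mining (e.g. co-mentions) or from experimental data. The function, gene names and edges below are hypothetical. Practical methods weight edges and look beyond direct neighbors.

```python
def rank_candidates(edges: list[tuple[str, str]],
                    disease_genes: set[str],
                    candidates: list[str]) -> list[str]:
    """Order candidates by number of neighbors in `disease_genes` (ties by name)."""
    neighbors: dict[str, set[str]] = {}
    for a, b in edges:                       # treat the network as undirected
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    scores = {c: len(neighbors.get(c, set()) & disease_genes) for c in candidates}
    return sorted(candidates, key=lambda c: (-scores[c], c))

# Hypothetical network: A touches two known disease genes, B one, C none.
edges = [("A", "D1"), ("A", "D2"), ("B", "D1"), ("C", "X")]
print(rank_candidates(edges, {"D1", "D2"}, ["C", "B", "A"]))  # ['A', 'B', 'C']
```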
where are we now?
abstracts
complete papers
restricted access
open access
the tools are there
now we need the text!
Acknowledgments
Jasmin Saric
Rossitza Ouzounova
Michael Kuhn
Isabel Rojas
Miguel Andrade
Peer Bork