Transcript Barend Mons
•Peter Bram ‘t Hoen
•Ellen Sterrenburg
•Herman van Haagen
•Allessandro Botelho-Bovo
•Judith Boer
•Johan den Dunnen
•Gert Jan van Ommen
•Erik van Mulligen
•Martijn Schuemie
•Rob Jelier
•Antoine Velthoven
•Christina Hettne
•Jan Kors
•Johan van der Lei
•Christine Chichester
•Marc Weeber
•Kevin Kalupsen
•Reuben Christie
•Jacintha van Beemen
•Nickolas Barris
•Albert Mons
•Gerard Meijssen
•Erik Moeller
•Peter Jan Roes
•Karsten Uil
•Siebrand Mazeland
•Sabine Cretella
Barend Mons
Second Order Semantic Enrichment
and the role of Wikis for Professionals
The Consortium
Open Access Semantic Support Technology
For on-line Knowledge Tracking, Discovery and Management
WikiProfessional
Semantic Web workspaces for scientists
enabling real time knowledge exchange and
exploration
The Million Minds Approach
Why?
Many challenges in current biomedical research:
• Volume of data (both high-throughput and text)
• Complexity
• Distributed systems and databases
• Incompatible data formats
• Multi-disciplinarity
• Multi-linguality
• Ambiguity of terminology
• Inability to share knowledge
• Globalization of knowledge
• Repetition of facts is of great value for the readability of individual papers,
• but the fact itself is a single unit of information and needs no repetition.
The Million Minds Approach
A defining characteristic of wiki technology
is the ease with which pages can be created
and updated. Generally, there is no review
before modifications are accepted.
Websites such as www.dmd.nl are increasingly cited in the literature (personal communication, Johan den Dunnen).
The majority of (SP) proteins have more than one research group associated with them
[Figure: bar chart of the number of genes/proteins (y-axis, 0 to 6000) associated with 1 research group versus 2 or more groups]
So... can we use wikis for this?
First order semantic enrichment
Second order semantic enrichment
• Contextual annotation of web pages for interactive browsing, van Mulligen E, Diwersy M, Schijvenaars B, Weeber M, van der Eijk CC, Jelier R, Schuemie M, Kors J, Mons B, Medinfo 2004, 11:94-8
• Which gene did you mean?, Mons B, BMC Bioinformatics 2005 Jun 7, 6:142
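First order semantic enrichment can be illustrated with a minimal sketch of dictionary-based concept tagging, assuming a thesaurus that maps lowercased terms to sets of concept identifiers. The function `tag`, the `max_words` window and the identifiers are hypothetical; this is not presented as the method of the cited papers. A term that maps to more than one identifier is exactly the kind of ambiguity that still has to be resolved afterwards.

```python
import re


def tag(text: str, thesaurus: dict, max_words: int = 4) -> list:
    """Greedy longest-match lookup of word n-grams in a term -> concept-ID thesaurus.

    Returns (matched text, set of concept IDs) pairs in reading order.
    A match with more than one ID is an ambiguous term.
    """
    words = re.findall(r"[\w-]+", text)
    lowered = [w.lower() for w in words]
    hits, i = [], 0
    while i < len(words):
        # Try the longest window first, then shrink it.
        for n in range(min(max_words, len(words) - i), 0, -1):
            key = " ".join(lowered[i:i + n])
            if key in thesaurus:
                hits.append((" ".join(words[i:i + n]), thesaurus[key]))
                i += n
                break
        else:
            i += 1  # no thesaurus term starts at this word
    return hits


# Hypothetical thesaurus with placeholder concept IDs.
thesaurus = {"muscular dystrophy": {"C1"}, "dystrophy": {"C2"}, "dmd": {"C3", "C4"}}
hits = tag("DMD is a muscular dystrophy.", thesaurus)
```

Longest match means "muscular dystrophy" wins over the shorter "dystrophy", while "DMD" comes back with two candidate IDs and is flagged as ambiguous.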
The Knowlet
What does a Knowlet look like ‘under the hood’?
<Source concept>
<Target concept>
<Relations>:
<Type a1> Database facts (multiple attributes)
<Type a2> Community annotations (WikiProfessional)
<Type b1> Co-occurrence in a sentence
<Type b2> Co-occurrence in an abstract
<Type c1> Concept profile match
<Type c2> Sequence similarity (BLAST score; genes and proteins only)
<Type c3> Co-expression (with genes from expression databases)
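The record structure above can be sketched as a small data model. This is a minimal illustration only: the class names, and the assumption that type codes a, b and c correspond to the factual, co-occurrence and associative relation families, are assumptions rather than details from the source; the concept identifiers are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum


class RelationType(Enum):
    """The seven relation types from the slide, keyed by their a/b/c codes."""
    DATABASE_FACT = "a1"
    COMMUNITY_ANNOTATION = "a2"
    COOCCURRENCE_SENTENCE = "b1"
    COOCCURRENCE_ABSTRACT = "b2"
    CONCEPT_PROFILE_MATCH = "c1"
    SEQUENCE_SIMILARITY = "c2"   # BLAST score, genes and proteins only
    CO_EXPRESSION = "c3"

    @property
    def family(self) -> str:
        # Assumed mapping of the letter codes to the three relation families.
        return {"a": "factual", "b": "co-occurrence", "c": "associative"}[self.value[0]]


@dataclass
class Relation:
    source_concept: str
    target_concept: str
    types: set = field(default_factory=set)  # RelationType members supporting this link


@dataclass
class Knowlet:
    core_concept: str
    relations: dict = field(default_factory=dict)  # target concept -> Relation

    def add(self, target: str, rel_type: RelationType) -> None:
        """Record one more relation type between the core concept and `target`."""
        rel = self.relations.setdefault(target, Relation(self.core_concept, target))
        rel.types.add(rel_type)


k = Knowlet("concept:A")  # placeholder identifiers
k.add("concept:B", RelationType.DATABASE_FACT)
k.add("concept:B", RelationType.COOCCURRENCE_ABSTRACT)
```

One target concept accumulates several relation types, so a single link can be supported by both a database fact and text co-occurrence.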
[Figure: Knowlet building block, Knowlet of a core concept, and Knowlet space, with factual, co-occurrence, and associative relations between Knowlets]
• Rules to combine different sources of information into a single relationship
• Time-stamped information
• The relationship to the original texts or database entries
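The three requirements above (combining sources into one relationship, time stamps, and links back to the original texts or database entries) can be sketched as a single merge step. This is a minimal illustration under assumed record fields (`source`, `reference`, `observed`); the combination rule shown is an assumption, not the rule used in the Knowlet system, and the references are placeholders.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Evidence:
    source: str       # kind of source, e.g. "database" or "abstract"
    reference: str    # pointer back to the original text or database entry
    observed: date    # time stamp of this piece of evidence


def combine(evidence: list) -> dict:
    """Merge all evidence for one concept pair into a single relationship record.

    Illustrative rule only: count evidence per source, keep every provenance
    pointer, and keep the first and last time stamps.
    """
    if not evidence:
        raise ValueError("a relationship needs at least one piece of evidence")
    counts = {}
    for ev in evidence:
        counts[ev.source] = counts.get(ev.source, 0) + 1
    stamps = [ev.observed for ev in evidence]
    return {
        "counts": counts,
        "references": sorted({ev.reference for ev in evidence}),
        "first_seen": min(stamps),
        "last_seen": max(stamps),
    }


rel = combine([
    Evidence("abstract", "doc:1", date(2005, 1, 1)),  # placeholder references
    Evidence("abstract", "doc:2", date(2007, 3, 1)),
    Evidence("database", "db:X", date(2006, 6, 1)),
])
```

Keeping the references rather than only the counts is what lets a reader trace a relationship back to the texts or database entries it came from.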
The Knowlet
• A Knowlet represents a unit of thought interconnected with other units of thought; in other words, a ‘cloud’ of concepts that have one or more relationship types with the central (selected) concept
• The interconnection reflects a semantic relationship derived:
– From facts in databases
– From co-occurrence in a text
– From other associations
• Relations have a strength
– Based on the source of the relationship
– Based on the amount of «evidence»
• Knowlets belong to one or more semantic classes: proteins, diseases, authors, organizations, journals, experiments, etc.
• Each Knowlet is uniquely identified by a URL or URI (Uniform Resource Identifier)
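Relation strength and Knowlet identifiers can be sketched as follows. The per-source weights, the logarithmic damping of repeated evidence, and the `http://example.org/knowlet/` base URI are all hypothetical; the slide states only that strength depends on the source and on the amount of evidence, and that each Knowlet has a URL or URI.

```python
import math
from urllib.parse import quote

# Hypothetical weights: the slide says strength depends on the source of a
# relationship and on the amount of evidence, but gives no numbers.
SOURCE_WEIGHT = {"database": 1.0, "community": 0.8, "sentence": 0.5, "abstract": 0.3}
DEFAULT_WEIGHT = 0.1  # weight for any source not listed above


def strength(counts: dict) -> float:
    """Sum over sources of weight * log(1 + evidence count)."""
    return sum(SOURCE_WEIGHT.get(src, DEFAULT_WEIGHT) * math.log1p(n)
               for src, n in counts.items())


def knowlet_uri(semantic_class: str, concept_id: str,
                base: str = "http://example.org/knowlet/") -> str:
    """Build a URI from semantic class and concept ID (placeholder base URL)."""
    return f"{base}{quote(semantic_class)}/{quote(concept_id)}"


print(strength({"database": 1, "abstract": 12}))
print(knowlet_uri("protein", "P 1"))
```

With the logarithm, the tenth repetition of a finding adds far less than the first, which is one way to reflect that a repeated fact is still a single unit of information.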
3. Building an association matrix of large data sources
[Figure: an association matrix of roughly 1 million × 1 million concepts, with rows and columns for objects such as persons, organisations, genes, diseases, and drugs]
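A minimal sketch of building such a matrix from text, assuming each document has already been reduced to a set of concept IDs and that an association is simply a co-occurrence count. Only non-zero cells are stored, which is what keeps a matrix of roughly a million by a million concepts tractable. Function names and IDs are hypothetical.

```python
from collections import Counter
from itertools import combinations


def association_matrix(documents) -> Counter:
    """Sparse co-occurrence counts: (concept_a, concept_b) -> number of documents.

    Each document is given as an iterable of concept IDs. Every unordered pair
    is stored once, with its IDs in sorted order.
    """
    cells = Counter()
    for concepts in documents:
        for a, b in combinations(sorted(set(concepts)), 2):
            cells[(a, b)] += 1
    return cells


def lookup(cells: Counter, a: str, b: str) -> int:
    """Symmetric read access to the matrix; absent cells count as zero."""
    return cells[(min(a, b), max(a, b))]


# Placeholder documents, each already reduced to a set of concept IDs.
docs = [{"A", "B", "C"}, {"A", "B"}, {"C"}]
m = association_matrix(docs)
```

Storing each unordered pair once halves memory compared with a full symmetric matrix, and a `Counter` returns zero for pairs that never co-occur.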
[Figure: function assignment for nucleolar proteins; categories: function unknown, chaperones, chromatin structure, fibrous proteins, mRNA metabolism, ribosomal proteins, ribosome biogenesis, translation, and others; labelled proteins include PARN and SRP]
• Assignment of protein function and discovery of new nucleolar proteins based on automatic analysis of MEDLINE.
Martijn Schuemie, Christine Chichester, Frederique Lisaceck, Yohann Coute, Peter-Jan Roes, Jean Charles Sanchez, Barend Mons
Special issue on Systems Biology in Proteomics, 2008 (accepted for publication)
Kappa-based clustering based on Gene ID
Cluster studies on the basis of Homologene IDs
Cluster 1: Mdx mice; Dysferlin-deficient mice
Cluster 2: myositis
Cluster 3: DMD
Cluster 4: EOM-specific genes in mdx
Cluster 5: Development of EOM and rat atrophy
GeneSet Clusterer, Rob Jelier, Erasmus MC
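The slide does not spell out the kappa procedure. A minimal sketch under stated assumptions: each study is a set of gene IDs, agreement between two studies is Cohen's kappa over the union of all genes, and studies are grouped as connected components of pairs whose kappa reaches a threshold. The 0.35 default, the function names and the toy data are illustrative choices, not taken from the source.

```python
from itertools import combinations


def cohen_kappa(a: set, b: set, universe: set) -> float:
    """Cohen's kappa for agreement between two gene lists over a shared gene universe."""
    n = len(universe)
    if n == 0:
        return 1.0  # nothing to disagree about
    both = len(a & b)
    neither = n - len(a | b)
    observed = (both + neither) / n
    pa, pb = len(a) / n, len(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1:
        return 1.0  # chance agreement is already perfect
    return (observed - expected) / (1 - expected)


def cluster_studies(studies: dict, threshold: float = 0.35) -> list:
    """Group studies into connected components of pairs with kappa >= threshold."""
    universe = set().union(*studies.values())
    parent = {name: name for name in studies}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for s, t in combinations(studies, 2):
        if cohen_kappa(studies[s], studies[t], universe) >= threshold:
            parent[find(s)] = find(t)

    groups = {}
    for name in studies:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Toy gene-ID sets (placeholders, not real study data).
studies = {"s1": {1, 2, 3}, "s2": {1, 2, 3, 4}, "s3": {7, 8, 9}, "s4": {7, 8, 10}}
print(cluster_studies(studies))  # [['s1', 's2'], ['s3', 's4']]
```

Kappa corrects raw overlap for the agreement expected by chance, so two long gene lists do not end up in one cluster merely because they are long.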
Clustering of genes based on similarity of concept profiles
Cluster 1: atrophy and myopathy
Cluster 2: extraocular muscle of mdx
Cluster 3: human and mouse muscular dystrophies and myositis
Cluster 4: long gene lists
Cluster 5: muscle differentiation; Ky-mutant and Fxr-/- mice
Cluster 6: ageing and sarcopenia
GeneSet Clusterer, Rob Jelier, Erasmus MC
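Concept profiles can be treated as weighted vectors over concepts. The sketch below assumes cosine similarity as the profile-matching measure (the slide only says "similarity of concept profiles"); the gene and concept identifiers and their weights are placeholders. The resulting pairwise similarities could then be passed to any standard clustering method.

```python
import math
from itertools import combinations


def cosine(p: dict, q: dict) -> float:
    """Cosine similarity of two concept profiles (concept ID -> weight)."""
    dot = sum(w * q[c] for c, w in p.items() if c in q)
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


def similarity_matrix(profiles: dict) -> dict:
    """Pairwise cosine similarity for every pair of genes, keyed (gene, gene)."""
    return {(g, h): cosine(profiles[g], profiles[h])
            for g, h in combinations(sorted(profiles), 2)}


# Placeholder genes and concepts with made-up weights.
profiles = {
    "gene1": {"c1": 1.0, "c2": 1.0},
    "gene2": {"c1": 1.0, "c2": 0.9},
    "gene3": {"c3": 1.0},
}
sims = similarity_matrix(profiles)
```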
Evaluate biological processes that bring studies together
No overlap at the GeneID level
Annotate
Many associations at the concept profile level
DatasetComparer, Rob Jelier, Erasmus MC
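The point above (no overlap at the GeneID level, yet many associations at the concept profile level) can be made concrete with a toy example. This sketch assumes that a dataset's profile is the sum of its genes' concept profiles and scores profiles with cosine similarity; the data are invented placeholders, not results.

```python
import math


def jaccard(a: set, b: set) -> float:
    """Overlap of two gene lists at the gene-ID level."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dataset_profile(genes: set, profiles: dict) -> dict:
    """Sum the concept profiles of all genes in a dataset."""
    total = {}
    for g in genes:
        for concept, w in profiles.get(g, {}).items():
            total[concept] = total.get(concept, 0.0) + w
    return total


def cosine(p: dict, q: dict) -> float:
    """Cosine similarity of two concept profiles."""
    dot = sum(w * q[c] for c, w in p.items() if c in q)
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


# Toy example: the two studies share no genes, but their genes share concepts.
profiles = {"g1": {"c1": 1.0}, "g2": {"c2": 1.0}, "g3": {"c1": 0.5, "c2": 0.5}, "g4": {"c3": 1.0}}
study_a, study_b = {"g1", "g2"}, {"g3"}
gene_overlap = jaccard(study_a, study_b)
profile_overlap = cosine(dataset_profile(study_a, profiles), dataset_profile(study_b, profiles))
```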
• OmegaWiki (terminology system)
• Wiki Authors
• Wiki Medical/Clinical
• Wiki Proteins
• Wiki Chemicals
• Wiki Etc.
Allow for:
• Community annotation
• Quick growth of terminology systems
• Semantic linking between concepts
[Figure: annotation workflow. Labels include Association Matrix, Literature, Meta-analysis, Knowlet, Expert Challenge, Protein A, Update, Expert comments, U.W. Fingerprint, WikiZ/P, Peer to Peer Review, Final Approval, Central Annotation, Proposals to databases?, Discussion, Voting in Wiki, Reduction of false positives, Proximity measures, 1st order semantic enrichment, and New publications or annotations; knowledge is shown in three states: Solid (a), Liquid (b), and Gas (c)]
Science Wikis
• REGISTRATION (1X)
• Unique Author ID
• E-mail address
• PHP/userpage
• People Knowlets

• Unique concept ID
• Language variants
• Homonyms
• Definitions (brief)
• Object Knowlets
• UID from WiktionaryZ

• Research information
• Talk page
• Liquid Threads
• Object Knowlets
• UID from WiktionaryZ

• Articles about UIDs
• Encyclopaedic/NPOV
• Anonymous allowed
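The terminology side of these wikis (unique concept IDs, language variants, homonyms) can be sketched as an index from terms to concept UIDs. A minimal sketch, assuming each entry lists its terms per language; the UIDs and terms are placeholders and do not reflect the actual OmegaWiki/WiktionaryZ data model.

```python
from collections import defaultdict


def build_index(entries: dict) -> dict:
    """Map (language, lowercased term) to the set of concept UIDs it can denote.

    `entries` maps each concept UID to {language code: [terms]}.
    """
    index = defaultdict(set)
    for uid, variants in entries.items():
        for lang, terms in variants.items():
            for term in terms:
                index[(lang, term.lower())].add(uid)
    return dict(index)


def homonyms(index: dict) -> dict:
    """Terms that point to more than one concept UID, with the UIDs sorted."""
    return {key: sorted(uids) for key, uids in index.items() if len(uids) > 1}


# Placeholder UIDs and terms, not real wiki entries.
entries = {
    "UID-1": {"en": ["Term A", "Term B"], "nl": ["Term A-nl"]},
    "UID-2": {"en": ["term b"]},
}
index = build_index(entries)
```

Keying on the concept UID rather than the term lets one concept carry many language variants, while a term shared by several UIDs surfaces as a homonym.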
Nature News February 15, 2007
[Figure: Knowlet with core concept Malaria (mean distance 5); connected concepts include chloroquine, primaquine, para-amino-benzoic acid, Cellular Membrane (GO), and Mosquitoes, plus a question-marked link to a possible new drug]
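The question-marked link in the figure can be read as a second-order association: a concept not yet linked to the core concept, but sharing many neighbours with it. A minimal sketch of that idea, assuming the Knowlet space is available as an adjacency map; scoring by the number of shared intermediate concepts is an assumption, not the method behind the figure, and all identifiers are placeholders.

```python
def indirect_candidates(graph: dict, core: str, top: int = 5) -> list:
    """Rank concepts not directly linked to `core` by how many of its
    direct neighbours they are linked to.

    `graph` maps each concept to the set of concepts it is directly linked to.
    """
    direct = graph.get(core, set())
    scores = {}
    for mid in direct:
        for cand in graph.get(mid, set()):
            if cand != core and cand not in direct:
                scores[cand] = scores.get(cand, 0) + 1
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Placeholder Knowlet space: "core" is linked to a, b and c.
graph = {
    "core": {"a", "b", "c"},
    "a": {"core", "x"},
    "b": {"core", "x", "y"},
    "c": {"core"},
}
candidates = indirect_candidates(graph, "core")
```

A candidate such as `x` here is a hypothesis for expert review, not a finding: it only says that `x` and the core concept share context.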