rose_dieng_transp-W3C-Life
Semantic Web Technologies for Analysis of Transcriptome
Rose Dieng-Kuntz (1), Khaled Khelif (1), Olivier Corby (1), Pascal Barbry (2)
(1) INRIA - Sophia Antipolis, ACACIA project, http://www.inria.fr/acacia, http://www.inria.fr/acacia/corese
(2) IPMC, Sophia Antipolis, http://www.ipmc.fr
1
Outline
• Context: Memory of Biochip Experiments
• The MEAT Project
• Semi-automatic generation of semantic annotations
• Conclusions: Requirements for Semantic Web
2
Context: Biochip experiments
• DNA microarrays (gene chips, biochips) make it possible to
simultaneously measure the expression level and transcription
rate of many genes in an organism.
• Applications in biology, medicine, pharmacology…:
Gene discovery
Disease diagnosis or prognosis
Drug discovery: Pharmacogenomics
Toxicological research: Toxicogenomics
3
Towards Biochip Experiment Memory
Diagram elements: Biologist, Experiment sheets, Experiment DB, Documents, Domain Ontologies
Need for knowledge management for a community of biologists: a biochip experiment memory
Need for support in validating and interpreting the results of biochip experiments
4
The MEAT Project
Diagram elements: MEDIANTE, MEAT-Annot&Search, MEAT-Miner, MEAT-Onto, UMLS, Gene Onto…
5
Phases: before experiment
The biologist checks and validates the probes available on the biochip and selects a subset
Ordering of slides to launch a new biochip experiment
Submission of journal articles on genes presumed to be of interest
Constitution of an electronic document corpus
Creation of semantic annotations on these articles with MEAT-Annot
6
Phases: after experiment
Storage of the experiment description and of its results in MEDIANTE, following the Array Express format
Statistical analysis of results with MEAT-Miner
Interpretation of results, supported by further bibliographical searches
Addition of new semantic annotations on the experiment
7
MEAT-Annot&Search
MEAT-Annot: Annotation Acquisition Tool
- Manual annotation editor
- Automatic generation of annotations from a corpus
BRIGENE: Annotation base
- Article annotation base
- Result annotation base
General knowledge base
ARRAY-EXPRESS
- Experiment description
- Result description
MEAT-Search
- CORESE Search engine
- MEAT-dedicated Query interface
- Result browsing Interface
8
MEATAnnot: Technical Choices
NLP tools: term extractor + relation extractor
Extraction of terms corresponding to UMLS Ontology concepts, from texts
Extraction of relations between them, from texts
Automatic generation of a semantic annotation and representation in RDF
9
Relationship extraction
Test corpus, analysed with Syntex
• Syntex (Bourigault D. 2000): corpus syntactic analyser
• Used to reveal the « verb syntagms » (verb phrases) commonly used in the biochip domain
10
Relationship extraction
• Choosing candidate relationships among those revealed by Syntex
• Writing relationship extraction grammars using JAPE
// Pattern fragment matching "play (a|an)? (modifier)? role".
// Tag.lemme is tested against the lemma ("lemme" in French) of the word.
{Tag.lemme == "play"}
{SpaceToken}
({Token.string == "a"} | {Token.string == "an"})?
({SpaceToken})?
(
  {Token.string == "vital"} |
  {Token.string == "important"} |
  {Token.string == "critical"} |
  {Token.string == "some"} |
  {Token.string == "unexpected"} |
  {Token.string == "multifaceted"} |
  {Token.string == "major"}
)?
({SpaceToken})?
{Tag.lemme == "role"}
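For readers without a GATE installation, the same pattern can be approximated with a regular expression. This is only a rough sketch: it is not the MeatAnnot grammar, it replaces the lemma tests with an explicit list of inflected forms, and the names `MODIFIERS`, `PLAY_ROLE` and `find_play_role` are invented for the illustration.

```python
import re

# Optional modifiers copied from the JAPE pattern on this slide.
MODIFIERS = ["vital", "important", "critical", "some",
             "unexpected", "multifaceted", "major"]

# Crude stand-in for the lemma tests: inflected forms are listed explicitly.
PLAY_ROLE = re.compile(
    r"\b(?:play|plays|played|playing)\s+"          # lemma "play"
    r"(?:(?:a|an)\s+)?"                            # optional article
    r"(?:(?:" + "|".join(MODIFIERS) + r")\s+)?"    # optional modifier
    r"(?:role|roles)\b",                           # lemma "role"
    re.IGNORECASE,
)


def find_play_role(sentence):
    """Return the matched 'play ... role' span, or None if absent."""
    match = PLAY_ROLE.search(sentence)
    return match.group(0) if match else None


print(find_play_role("HGF plays an important role in lung development"))
# prints: plays an important role
```

Unlike the JAPE version, the regular expression works on raw characters rather than on token and lemma annotations, so it would miss any inflection that is not listed.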
11
System architecture
Diagram elements: Documents, MeatAnnot, Gate API, UMLS Knowledge server, RDF Annotations, Biologist
JAPE grammars shown in the diagram: the "play ... role" pattern from the previous slide, and a second pattern of the same shape ending in {Tag.lemme == "effects"}
12
Example
« HGF plays an important role in lung development »
The information extracted from this sentence is:
HGF:
an instance of the concept « Amino Acid, Peptide, or Protein »
lung development:
an instance of the concept « Organ or Tissue Function »
HGF play role lung development:
an instance of the relation « play role » between the two terms
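The extraction in this example can be sketched in a few lines of Python. This is a toy under stated assumptions, not MeatAnnot itself: the dictionary `TERM_CONCEPTS` stands in for the UMLS lookup and contains only the two terms from the slide, the relation trigger is a simplified regular expression, and the helper names are hypothetical.

```python
import re

# Toy stand-in for the UMLS lookup: term -> concept (from the slide example).
TERM_CONCEPTS = {
    "HGF": "Amino Acid, Peptide, or Protein",
    "lung development": "Organ or Tissue Function",
}

# Simplified trigger for the "play role" relation.
RELATION = re.compile(r"\bplays?\s+(?:an?\s+)?(?:\w+\s+)?role\b", re.IGNORECASE)


def find_term(fragment):
    """Return the first known term occurring in the fragment, or None."""
    hits = [(fragment.index(term), term)
            for term in TERM_CONCEPTS if term in fragment]
    return min(hits)[1] if hits else None


def extract(sentence):
    """Return subject, relation and object for a 'play role' sentence."""
    rel = RELATION.search(sentence)
    if rel is None:
        return None
    subject = find_term(sentence[:rel.start()])   # term before the trigger
    obj = find_term(sentence[rel.end():])         # term after the trigger
    if subject is None or obj is None:
        return None
    return {
        "subject": (subject, TERM_CONCEPTS[subject]),
        "relation": "play_role",
        "object": (obj, TERM_CONCEPTS[obj]),
    }


print(extract("HGF plays an important role in lung development"))
```

The sketch takes the nearest known term on each side of the trigger; a real relation extractor would rely on the syntactic analysis rather than on position alone.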
13
RDF Annotation Generated
<rdf:RDF
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
  xmlns:m='http://www.inria.fr/acacia/meat#'
  xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'>
  <m:Amino_Acid_Peptide_or_Protein rdf:about='HGF#'>
    <m:play_role>
      <m:Organ_or_Tissue_Function rdf:about='lung_development#'/>
    </m:play_role>
  </m:Amino_Acid_Peptide_or_Protein>
</rdf:RDF>
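An annotation of this shape can be produced from a subject, a relation and an object with the Python standard library. The sketch below is an assumption about how such a generator might look, not the MeatAnnot code; `build_annotation` is a hypothetical helper, and only the namespaces and element names come from the slide.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MEAT_NS = "http://www.inria.fr/acacia/meat#"

# Use the same prefixes as the slide when serialising.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("m", MEAT_NS)


def build_annotation(subject, subject_type, relation, obj, obj_type):
    """Serialise one typed subject-relation-object statement as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    subject_el = ET.SubElement(
        root, f"{{{MEAT_NS}}}{subject_type}",
        {f"{{{RDF_NS}}}about": subject})
    relation_el = ET.SubElement(subject_el, f"{{{MEAT_NS}}}{relation}")
    ET.SubElement(
        relation_el, f"{{{MEAT_NS}}}{obj_type}",
        {f"{{{RDF_NS}}}about": obj})
    return ET.tostring(root, encoding="unicode")


print(build_annotation(
    "HGF#", "Amino_Acid_Peptide_or_Protein",
    "play_role",
    "lung_development#", "Organ_or_Tissue_Function"))
```

Building the tree with qualified names rather than concatenating strings keeps the output well-formed even if a term contains characters that need escaping.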
14
CORESE Semantic search engine
Legacy sys. (XML):
<accident>
  <date> 19 Mai 2000 </date>
  <description>
    <facteur>le facteur</facteur>
  </description>
</accident>
Annotations in RDF formed by instances of schema in RDFS:
<ns:article rdf:about="http://intranet/articles/ecai.doc">
  <ns:title>MAS and Corporate Semantic Web</ns:title>
  <ns:author>
    <ns:person rdf:about="http://intranet/employee/id109" />
  </ns:author>
</ns:article>
Schema in RDFS:
<rdfs:Class rdf:ID="thing"/>
<rdfs:Class rdf:ID="person">
  <rdfs:subClassOf rdf:resource="#thing"/>
</rdfs:Class>
Web stack: UNICODE, URI, NAMESPACES, XML, RDF, RDFS, ONTOLOGY, RULES, QUERIES, INFERENCES
CORESE: RDFS / CG Support, RDF / CG Base, Rules / CG Rules, Queries / CG Query, PROJECTION, CG Results
Semantic Web server: Users (query, answer, push), Documents, Ontologies, RDF/S
15
Ontology-based query
Biologists formulate queries through the Interface
The Interface submits queries to Corese
Corese loads UMLS and the Annotation Base
Corese returns results
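One benefit of querying through an ontology is that a query naming a general concept can retrieve annotations typed with a more specific one. The sketch below illustrates that idea in plain Python; it is not the Corese API or its query language, and the class hierarchy in `SUBCLASS_OF` is an illustrative assumption rather than an extract of UMLS.

```python
# Illustrative class hierarchy (child -> parent); an assumption for this
# sketch, not the real UMLS network.
SUBCLASS_OF = {
    "Amino_Acid_Peptide_or_Protein": "Chemical",
    "Organ_or_Tissue_Function": "Physiologic_Function",
}

# Toy annotation base: resource -> type, plus (subject, relation, object).
TYPES = {
    "HGF": "Amino_Acid_Peptide_or_Protein",
    "lung_development": "Organ_or_Tissue_Function",
}
TRIPLES = [("HGF", "play_role", "lung_development")]


def is_a(cls, ancestor):
    """True if cls equals ancestor or is one of its descendants."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def query(subject_type, relation, object_type):
    """Return triples whose subject and object types fit the query types."""
    return [
        (s, r, o) for (s, r, o) in TRIPLES
        if r == relation
        and is_a(TYPES[s], subject_type)
        and is_a(TYPES[o], object_type)
    ]


print(query("Chemical", "play_role", "Physiologic_Function"))
# prints: [('HGF', 'play_role', 'lung_development')]
```

The query names only the general classes, yet the HGF annotation is returned because its types are subclasses of them; a keyword search on "Chemical" would not find it.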
16
Semantic Web requirements
• Adaptation of the Corese semantic search engine to OWL
• Corese query language vs SPARQL
• Contextual annotations: need to express multiple contexts / viewpoints
• Temporal queries on the base of past biochip experiments + temporally evolving ontologies & annotations
• Scalability of NLP tools: articles stemming from scientific watch on the open (semantic) Web…
17
Many thanks to
• ACACIA team: in particular Khaled Khelif, Laurent Alamarguy, Olivier Corby, Alain Giboin…
• IPMC: Pascal Barbry, Kevin Le Brigand, Hélène, Chimène, Yves
• Bayer Crop Science: Rémi Bars
• Didier Bourigault (ERSS), developer of Syntex
• The developers of GATE (Sheffield Univ.)
18
Support for a health network
Diagram elements: Medical Ontology, Semantic Annotations, Documents (Patient record, Best practices Guide …), Translator, Life Line, Corese search engine, Virtual Staff, Nautilus DB, Member of the health network
<dossierPatient>
  <date> 19 Mai 2000 </date>
  <donneesAdministratives>
    <Patient><nom>Dupont</nom>
    <prenom> Michel </prenom>
    </Patient>
  </donneesAdministratives>
  …
19
Virtual Staff Architecture
20