Gene Regulation, Cancer, Rules


FP7 meeting - Gent - Carlos Rodríguez - April 18
WP4: Conceptual Mining from Text for Knowledge Engineering
State of the Art
WP Coordinators:
Alfonso Valencia
Carlos Rodríguez
Why Concept/Semantic Mining?

Knowledge Acquisition Bottleneck

Top-down, manually designed ontologies are:
sparse (non-exhaustive)
shallow (not fine-grained)
not mappable (to terms or other ontologies)
not easily updated or customized
Text-based ontologies better reflect the diversity of knowledge found in the literature and in domain terminology
Information for Ontology Learning
State of the Art Methods

implicit relations
explicit relations
Corpus Distribution
Machine Learning Algorithms
Symbolic (rule- and syntax-based)
Hybrid, combining some or all
Bootstrap the ontology-learning process using existing resources
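As a rough illustration of the corpus-distribution family of methods, implicit relations can be suggested by terms that co-occur in text more often than chance would predict. The sketch below scores term pairs with pointwise mutual information (PMI) over sentences. The toy sentences, the term list and the function name `pmi_scores` are assumptions made up for illustration; this is not presented as the method used in the work package.

```python
import math
from collections import Counter
from itertools import combinations

def pmi_scores(sentences, terms):
    """Score term pairs by pointwise mutual information over sentences.

    Matching is naive case-insensitive substring search; a real system
    would use tokenization and entity recognition instead.
    """
    n = len(sentences)
    term_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        # Sort so that each pair always appears in the same order.
        present = sorted(t for t in terms if t.lower() in lowered)
        term_counts.update(present)
        pair_counts.update(combinations(present, 2))
    scores = {}
    for (a, b), joint in pair_counts.items():
        p_joint = joint / n
        p_a = term_counts[a] / n
        p_b = term_counts[b] / n
        scores[(a, b)] = math.log2(p_joint / (p_a * p_b))
    return scores

# Toy sentences written for this example only (not corpus data).
toy_sentences = [
    "PCNA binds MSH2 during DNA replication.",
    "MSH2 and PCNA link mismatch repair and replication.",
    "LBR is phosphorylated by CDC2 in mitosis.",
    "CDC2 controls entry into mitosis.",
]
toy_terms = ["PCNA", "MSH2", "LBR", "CDC2"]
scores = pmi_scores(toy_sentences, toy_terms)
```

Pairs that never share a sentence get no score at all, which is one reason purely distributional methods are usually combined with symbolic or machine-learning components.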
An example
Meiosis
Cyclin
Checkpoint
Interphase
Nucleoplasm
Division
Histone
Replication
Chromatid
Blaschke, et al., Funct. Integ. Genomics 2001
Cell cycle
17 genes
PCNA
CDC2
MSH2
LBR
TOP2A
...
GO codes
DNA replication
DNA metabolism
Cell Cycle control
PCNA-MSH2: The binding of PCNA to MSH2 may reflect linkage between mismatch repair and replication.
LBR-CDC2: LBR undergoes mitotic phosphorylation mediated by p34(cdc2) protein kinase.
24 genes
ABCA5
CAT
ELF2
PIM1
WNT2
...
Words
Unknown
Dipeptidyl
Prolyl
nmr
Collagen-binding
Induce rules at different linguistic levels
Lexical- and syntax-derived relationships from text

Complex relationships in CCO
degradates
participate_in
catalyses
adjacent_to
agent_in
What new ones can be learnt?
LBR undergoes mitotic phosphorylation mediated by p34(cdc2) protein kinase.
mitotic phosphorylation mediated_by protein kinase
Can it be subsumed by others?
Are there other subcategories?
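One simple way to derive a candidate relation such as mediated_by from the example sentence is a surface pattern of the form "<process> <participle> by <agent>". The sketch below is only an illustration of that idea: the regular expression and the function name `extract_relation` are assumptions, and a realistic system would rely on syntactic parsing instead of a single regex.

```python
import re

# Hypothetical surface pattern: "<process> <participle> by <agent>".
# The process is taken as the one or two words before the participle.
PATTERN = re.compile(
    r"\b(?P<process>(?:\w+ )?\w+) (?P<verb>\w+ed) by (?P<agent>[^.]+)"
)

def extract_relation(sentence):
    """Return a (process, relation, agent) triple, or None if no match."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    # Build the relation name from the participle, e.g. "mediated_by".
    relation = match.group("verb").lower() + "_by"
    return (match.group("process"), relation, match.group("agent").strip())

sentence = (
    "LBR undergoes mitotic phosphorylation "
    "mediated by p34(cdc2) protein kinase."
)
triple = extract_relation(sentence)
```

The sketch keeps the full agent phrase, "p34(cdc2) protein kinase". Generalizing it to the class "protein kinase", as in the triple on the slide, would need a head-noun or entity-normalization step that is not shown here.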
Beyond the State of the Art

Optimal hybrid methodology for:
Extracting entities
Discovering relations
Providing ontology-relevant information (But what and how?)
Comparing top-down with bottom-up ontologies
Providing definitional information
Application to CC-cancer domains (and possibly to gene regulation)
In the context of the project and other WPs…
Reasoning with text-generated ontologies: competing or complementing?
Reduction of lexical and semantic relationships to an ontological relation inventory
How to present and use text-mined information for ontology design (especially for database annotation)?
How to curate, evaluate and compare ontologies?
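The reduction of lexical relationships to an ontological relation inventory can be pictured as a lookup from lexical triggers to a fixed set of relation names, with unmapped triggers set aside as candidates for new relations. In the sketch below the relation names are the CCO ones listed in the earlier slide, while the trigger phrases, the toy triples and the function name `reduce_relations` are hypothetical choices for illustration only.

```python
# Hypothetical mapping from lemmatized lexical triggers to relation names.
# Only the relation names (the values) come from the slides.
LEXICAL_TO_ONTOLOGICAL = {
    "degrade": "degradates",
    "break down": "degradates",
    "catalyse": "catalyses",
    "catalyze": "catalyses",
    "participate in": "participate_in",
    "take part in": "participate_in",
    "be adjacent to": "adjacent_to",
}

def reduce_relations(triples):
    """Split (subject, trigger, object) triples into mapped and unmapped."""
    mapped, unmapped = [], []
    for subj, trigger, obj in triples:
        relation = LEXICAL_TO_ONTOLOGICAL.get(trigger.lower())
        if relation is None:
            # No inventory entry: keep it as a candidate new relation.
            unmapped.append((subj, trigger, obj))
        else:
            mapped.append((subj, relation, obj))
    return mapped, unmapped

# Toy triples written for this example only.
toy_triples = [
    ("APC/C", "degrade", "cyclin B"),
    ("PCNA", "Participate in", "DNA replication"),
    ("protein kinase", "mediate", "mitotic phosphorylation"),
]
mapped, unmapped = reduce_relations(toy_triples)
```

Here the "mediate" triple stays unmapped, which corresponds to the open question of whether a new relation is needed or an existing one subsumes it.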
Information for Ontology Engineers
New Classes (ontology) and Instances (KB)
Definitions and glosses
Concept usage and entity examples
Terms and synonyms
Hierarchical and non-hierarchical relations
Possible reasoning rules
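The kinds of information listed above could be bundled into one record per text-mined candidate, so that an ontology engineer can review them together. The sketch below is one possible shape for such a record; the class name `ConceptCandidate` and all field names are assumptions, not a format defined by the project.

```python
from dataclasses import dataclass, field

@dataclass
class ConceptCandidate:
    """Hypothetical record for one text-mined class or instance."""
    label: str
    kind: str  # "class" (ontology) or "instance" (knowledge base)
    definition: str = ""
    synonyms: list = field(default_factory=list)
    usage_examples: list = field(default_factory=list)
    # Each relation is (relation_name, target_label, is_hierarchical).
    relations: list = field(default_factory=list)
    reasoning_rules: list = field(default_factory=list)

candidate = ConceptCandidate(
    label="mitotic phosphorylation",
    kind="class",
    usage_examples=[
        "LBR undergoes mitotic phosphorylation mediated by "
        "p34(cdc2) protein kinase."
    ],
    relations=[("mediated_by", "protein kinase", False)],
)
```

Fields left empty, such as the definition here, make it visible which pieces of information still have to be mined or curated by hand.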
To and Fro other WPs
WP | To | From
1. CCO extension | New entities, terms, definitions and relations | Seeds for learning and ontology curation
2. Ontology Engineering | Integration of text-mining into ontology design methods | Ontology evaluation
3. Corpus Processing and Curation | Subcorpus and term inventories | Annotated corpus
5. Knowledge Base Population | New entities, terms, definitions and relations | Ontology evaluation and curation
6. Reasoning | New relations and inference rules from text | Evaluation of mappings and reasoning