report_mid_term - Marco Brandizi`s Site


Marco Brandizi
PhD Program in Computer Science, University of Milano-Bicocca
XIX Cycle
Progress Report
Feb 2005
Agenda
Microarrays and Gene Expression overview
A Knowledge Management System for uA data management
Motivations
What to model and where to start from
First elaborations
Ongoing work and future
Gene Expression and Microarrays
Genes Machine
[Figure: DNA gene, mRNA, protein, Cell/Life]
Microarray Data / Details
Microarray Data
Microarrays Data Mgmt Issues
Exp. data vs. seq. data:
Context dependent (living system, exp. conditions)
Lack of a standard unit of measure
Several normalization methods
Multiple platforms and methods
No standard for data annotation
Vocabularies and terminology coherence
Details about: experiment, source, protocols, exp. conditions
Microarrays Data Mgmt Issues / 2
Evidence about data quality
What to store?
Raw Images
Computed values
Normalized values
How to find data
Systems aware of complex vocabularies (ontologies)
Data mining and exp. comparison tools
Data access control
Issues => MIAME/MAGE
MIAME Experiment Modelling
GCA DB
The need for a KMS for uA data management
The uA Experiment Cycle
“Closing the loop”
uA KMS: What to model?
Knowledge management... what?
Genes
Textual annotations, literature
Interactions, pathways.
Genes collections (functional families, clusters)
Experiment and Experimental Conditions
Keyword/ontology based searches
Tested conditions searches
Expression Values
Navigation
Same transcriptome/trend/correlation/pattern
Knowledge management... what?
Chips
Keyword searches
Annotations about chip quality, protocols to be used, etc.
People
“Is expert in ...”
“Works with ...”
“Is studying ...”
“Has ranking X” (based on publications, user preferences, etc.)
Knowledge management... what?
“Does IL-2 regulate something and under what conditions?”
Interactions of gene: IL2
Note added by Norman on Dec 10, 2003, deduced from text, not confirmed [see original text]:
IL2 --------> UPREGULATES -----------> IL10, IL12
|
ON --> Cells --> Type: DC
UNDER CONDITION: LPS Stimulus
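The annotated interaction above can be held as a structured record and queried to answer the slide's question. The following is only a minimal sketch: the class RegulationNote, its field names and the helper regulated_by are hypothetical and not part of the model presented here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class RegulationNote:
    """One annotated regulation statement (hypothetical structure)."""
    source: str                 # regulating gene
    relation: str               # e.g. "UPREGULATES"
    targets: Tuple[str, ...]    # regulated genes
    cell_type: str              # cells the statement applies to
    condition: str              # experimental condition
    author: str                 # who added the note
    date: str                   # when the note was added
    confirmed: bool             # False if only deduced from text


# The note shown on the slide, transcribed field by field.
notes = [
    RegulationNote(
        source="IL2",
        relation="UPREGULATES",
        targets=("IL10", "IL12"),
        cell_type="DC",
        condition="LPS Stimulus",
        author="Norman",
        date="2003-12-10",
        confirmed=False,
    )
]


def regulated_by(gene: str, notes: List[RegulationNote]):
    """Return (target, relation, condition) for every note whose source is `gene`."""
    return [
        (target, note.relation, note.condition)
        for note in notes
        if note.source == gene
        for target in note.targets
    ]


answers = regulated_by("IL2", notes)
```

Keeping author, date and the confirmed flag on the record preserves the distinction between annotated and experimentally verified knowledge.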
uA KMS: Where to start from?
What to do first?
Gene Expression Formal Model
Focused on GE measures
Oriented to “closing the loop” goal
Several things to start from
Ontologies and Inference Systems
Similar models already defined
Other similar systems
Defining a GE Model
Start Point: Ontologies and Inference Systems
XML->RDF->...->OWL, and related tools (e.g. Protégé, Racer, Jena)
Logics, particularly Description Logic
Inferential Systems and Languages (e.g. Prolog)
Defining a GE Model
Start Point: Similar models already defined
"Modeling Gene Expression", Proceedings of NETTAB/2004, www.loa-cnr.it. A Description Logic model of GE, but without a focus on microarrays and expression intensities
Defining a GE Model
Start Point: Similar models already defined
Very similar to the previous work, but with tools for annotation/querying of microarray chips
Yet, it does not seem to be focused on annotation of data/assays/etc.
Defining GE Model
Start point: Other similar systems
Synapsia by Agilent, very similar, but not focused on uAs
Hybrow, www.hybrow.org, a computer-aided hypothesis evaluation system
The Notebook Project, www.notebook.org, a bio-KMS based on SOAP and P2P
Sarini, M., Blanzieri, E., Giorgini, P., and Moser, C. (2004), From actions to suggestions: supporting the work of biologists through laboratory notebooks
uA KMS: Toward a GE Model
Defining GE Model
Gene Expression Formal Model
Basic elements: genes, hybridizations,
experiments
Defining GE Model
Gene Expression Formal Model
Basic elements: annotated sets
Gene Expression Entities
Entities Grouping
EntityCollection ::=
Cluster of DataSet
| Cluster of Entity
Cluster of DataSet ::=
Cluster of DataSet
| GeneCluster of DataSet.GeneSet
| HybCluster of DataSet.HybSet
Cluster of Entity ::=
Cluster of Entity
| Set of Entity
All entities in a cluster are of the same type. E.g. a cluster of genes contains hierarchically grouped sets of genes, and only genes.
NOTE: The grammar used here is VERY informal!
Entities Grouping
GeneSet ::=
Set of Gene
HybSet ::=
Set of Hybridization
Set of X ::= { x : x IS-A X }
Singleton ( C ) ::= { S : S = Set of C AND #S = 1 }
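The grouping rules above can be sketched in code. This is only one possible reading of the informal grammar, assuming a cluster is a tree whose leaves are flat sets of a single entity type; the names Gene, Hybridization, set_of, is_singleton and Cluster are illustrative choices, not definitions taken from the model.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Gene:
    public_id: str


@dataclass(frozen=True)
class Hybridization:
    name: str


Entity = Union[Gene, Hybridization]


def set_of(entity_type, items) -> FrozenSet[Entity]:
    """Set of X ::= { x : x IS-A X }; rejects members of any other type."""
    items = frozenset(items)
    for x in items:
        if not isinstance(x, entity_type):
            raise TypeError(f"{x!r} is not a {entity_type.__name__}")
    return items


def is_singleton(s) -> bool:
    """Singleton ( C ): a set with exactly one member (#S = 1)."""
    return len(s) == 1


@dataclass(frozen=True)
class Cluster:
    """Cluster of Entity ::= Cluster of Entity | Set of Entity.

    Children are nested clusters or flat sets; all leaves share one type.
    """
    entity_type: type
    children: Tuple[Union["Cluster", FrozenSet[Entity]], ...]

    def __post_init__(self):
        # Enforce "all entities in a cluster are of the same type".
        for child in self.children:
            if isinstance(child, Cluster):
                if child.entity_type is not self.entity_type:
                    raise TypeError("mixed entity types in cluster")
            else:
                set_of(self.entity_type, child)

    def members(self) -> FrozenSet[Entity]:
        """Flatten the hierarchy into one set of entities."""
        out = set()
        for child in self.children:
            out |= child.members() if isinstance(child, Cluster) else child
        return frozenset(out)


il2, il10 = Gene("IL2"), Gene("IL10")
inner = Cluster(Gene, (set_of(Gene, [il2]),))
outer = Cluster(Gene, (inner, set_of(Gene, [il10])))
```

Here outer.members() yields both genes, while building a gene cluster that contains a Hybridization raises a TypeError.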
Annotations
Annotation ::=
EntityCollection => AnnotationSet
Annotations allow Gene Expression data to be tracked together with useful information.
Annotations/Basics
Annotation ( EmptySet ) ::=
EmptySet
Annotation ( Singleton ( Entity e ) ) ::=
Attributes ( e ) U BaseAnnotation ( e )
Annotations/Basics
BaseAnnotation ( Any ) ::=
To be decided; the first idea is a set of:
Name/Value/Type, and Description as in MAGE
External Reference, with URI, or attachment
Graph attachment, "vectoring" values, e.g. PCA with component values, scatter plots wit
Annotation Author
Annotation Date
Security/Access references
Similar to the classes Extendable, Describable, Identifiable of MAGE-OM
An Entity annotating another Entity, e.g. the experiment author
Annotations/Basics
Attributes ( Entity e ) ::=
Set of < attrib, value, type > for each declared attribute of Entity
Attributes may be declared in JavaBean fashion, optionally providing a mapping for the type and semantics of the attribute
Annotation ( GeneSet GS ) ::=
BaseAnnotation ( GS )
U Annotation ( g ) : g BELONGS GS
U BiologicalAnnotation ( GS )
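The composition rules above amount to set unions over annotation items, which can be sketched as follows. This assumes each annotation item is a plain (name, value, type) triple; the function names and the sample gene are hypothetical, and BaseAnnotation is passed in as data because its content is still to be decided.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, Tuple

Item = Tuple[str, str, str]  # (name, value, type)


@dataclass(frozen=True)
class Gene:
    public_id: str
    symbol: str


def attributes(entity) -> FrozenSet[Item]:
    """Attributes ( e ): one <attrib, value, type> triple per declared field."""
    return frozenset(
        (name, str(value), type(value).__name__)
        for name, value in vars(entity).items()
    )


def annotate_entity(entity, base: FrozenSet[Item]) -> FrozenSet[Item]:
    """Annotation ( Singleton ( e ) ) ::= Attributes ( e ) U BaseAnnotation ( e )."""
    return attributes(entity) | base


def annotate_gene_set(
    genes: Iterable[Gene],
    base_gs: FrozenSet[Item],
    base_by_gene: Dict[Gene, FrozenSet[Item]],
    biological: FrozenSet[Item],
) -> FrozenSet[Item]:
    """Annotation ( GS ) ::= BaseAnnotation ( GS ) U Annotation ( g ) U BiologicalAnnotation ( GS )."""
    genes = list(genes)
    if not genes:
        # Annotation ( EmptySet ) ::= EmptySet
        return frozenset()
    result = set(base_gs) | set(biological)
    for g in genes:
        result |= annotate_entity(g, base_by_gene.get(g, frozenset()))
    return frozenset(result)


il2 = Gene(public_id="ID_1", symbol="IL2")  # hypothetical identifier
ann = annotate_gene_set(
    [il2],
    base_gs=frozenset({("author", "Norman", "str")}),
    base_by_gene={},
    biological=frozenset({("pathway", "KEGG IL-2", "str")}),
)
```

A flat union, as in the grammar, loses track of which gene an attribute came from; a real implementation would probably keep the owning entity alongside each item.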
Annotations/Biological Ann.
BiologicalAnnotation ( GS ) ::=
Allows tagging the gene set with a biological meaning, i.e. the reason why the genes have been grouped
Ex.:
belonging to functional family of apoptosis
in the KEGG pathway about IL-2
under GO ID #10234
Annotations/Data Sets
Annotation ( Cluster of DataSet ds ) ::=
BaseAnnotation ( ds ) U Annotation ( < all entities in ds > )
Meaning of clustering
Clustering method / algorithm
Algorithm annotations, e.g. parameter values
Cluster includes the case of a flat set (not a tree), and sub-cases:
gene/hyb filtering (genes have been filtered in from another data set)
value transformations (normalization, PCA, average over replicas)
Annotations/Examples
Types of annotations / searches:
Generic
<attribute> LIKE <pattern>
<value> BETWEEN ( <lo>, <hi> )
<author> IS author
Genes
public_id LIKE pattern
REGULATION ( g1, g2, ... gn )
g1 REGULATES | DOWN_REGULATES | UP_REGULATES |
PROMOTE | INHIBITS ( g1, g2, ... gn )
geneX IN_PATHWAY ( p )
Annotations/Examples
DataSet
geneSet1 SIMILAR_PROFILE geneSet2 IN_DATASET ds
hybSet1 SIMILAR_PROFILE hybSet2 IN_DATASET ds
Not necessarily computed; may simply be annotated.
CORRELATION ( dSet1, dSet2 ... dSetN, value )
annotates the expression values correlation
Annotations/Time Courses
Time profiles may be shifted relative to each other, or it may happen that when one gene is up another is down.
geneSet1 SIMILAR_TIME_PROFILE geneSet2, a part. case of
SIMILAR_PROFILE
geneSet1 TIME_AFTER geneSet2 ...
geneSet1 TIME_BEFORE geneSet2 ...
geneSet1 TIME_SHIFT geneSet2 ...
geneSet1 TIME_OPPOSED geneSet2
geneSet1 TIME_OPPOSED_SHIFT geneSet2
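As noted for SIMILAR_PROFILE, such relations may simply be annotated by a user rather than computed. If one did want to suggest them automatically, one possible approach (an assumption made here, not part of the model) is a lagged Pearson correlation between two average time profiles. The functions pearson, best_lag and time_relation, and the 0.9 threshold, are illustrative choices only.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def best_lag(p1, p2, max_lag):
    """Return (lag, r) with the largest |r|; lag > 0 means p2 follows p1."""
    best = (0, pearson(p1, p2))
    for lag in range(1, max_lag + 1):
        candidates = (
            (lag, p1[:-lag], p2[lag:]),    # p2 delayed with respect to p1
            (-lag, p1[lag:], p2[:-lag]),   # p2 ahead of p1
        )
        for signed_lag, a, b in candidates:
            r = pearson(a, b)
            if abs(r) > abs(best[1]):
                best = (signed_lag, r)
    return best


def time_relation(p1, p2, max_lag=2, threshold=0.9):
    """Suggest one of the time-course relations, or None if correlation is weak."""
    lag, r = best_lag(p1, p2, max_lag)
    if abs(r) < threshold:
        return None
    if lag == 0:
        return "SIMILAR_TIME_PROFILE" if r > 0 else "TIME_OPPOSED"
    return "TIME_SHIFT" if r > 0 else "TIME_OPPOSED_SHIFT"


profile = [0, 1, 3, 2, 0, 0, 0]
delayed = [0, 0, 1, 3, 2, 0, 0]            # same shape, one time point later
mirrored = [5 - v for v in profile]        # up when the other is down
```

The sign of the returned lag could likewise be mapped onto TIME_AFTER and TIME_BEFORE; that mapping is left out of the sketch.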
Annotations/Comparisons
“+/-” graphs. Common graphs of the gene interactions that are evident from comparison experiments. Modeled via the previously shown constructs.
Annotations/Set Graphs
Observing Euler-Venn diagrams is very common. Modeled via Aset theory operations
Operators
Operations and relations with data
When storing the result of an operation, the source of the result may be annotated and the annotations composed coherently:
gset = gset1 U gset2
save ( gset, annotation )
gset is saved with:
further annotation provided by the user
SOURCE ( UNION, gset1, gset2 )
all annotations coming from gset1 and gset2 belong to gset too
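The union-with-provenance behaviour described above can be sketched as follows. The class AnnotatedSet and the function union are hypothetical names, and annotations are reduced to plain tuples for brevity.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class AnnotatedSet:
    """A named set of gene identifiers carrying its own annotations."""
    name: str
    members: FrozenSet[str]
    annotations: FrozenSet[Tuple] = frozenset()


def union(name, a, b, user_annotations=()):
    """gset = gset1 U gset2, saved with user, SOURCE and inherited annotations."""
    source = ("SOURCE", "UNION", a.name, b.name)
    return AnnotatedSet(
        name=name,
        members=a.members | b.members,
        annotations=(
            frozenset(user_annotations)   # further annotation provided by the user
            | {source}                    # where the result comes from
            | a.annotations               # inherited from gset1
            | b.annotations               # inherited from gset2
        ),
    )


gset1 = AnnotatedSet("gset1", frozenset({"IL2"}), frozenset({("AUTHOR", "misterX")}))
gset2 = AnnotatedSet("gset2", frozenset({"IL10"}))
gset = union("gset", gset1, gset2, [("NOTE", "merged for comparison")])
```

Intersection and difference would follow the same pattern, changing only the set operator and the operation name recorded in SOURCE.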
Operators
Aset theory operations:
EntityCollection U EntityCollection ... U EntityCollection =
EntityCollection
EntityCollection INTERSECTION EntityCollection ...
INTERSECTION EntityCollection = EntityCollection
EntityCollection - EntityCollection = EntityCollection
Compositions:
new Cluster ( geneSet1, geneSet2, geneSet3 ... geneSetN,
geneSetAnnotation )
new Cluster ( cluster1, cluster2, clusterAnnotation )
Relations on single entities
gene1 DOWN_REGULATES ( gene2, gene3 ) AUTHOR misterX
Ongoing and future...
What's next?
Refinements of GCA, study of BASE
Study of ontology tools and ontology reasoners
Better definition of GE Model
Review with biologists
Cooperation with Ontology Groups (proposals
are welcome...)
To be continued...