Integrative Analysis of
Biological Data
Sai Moturu
MAGIC
Multisource Association of Genes by
Integration of Clusters
Goal: Integrate heterogeneous types of
high-throughput data for accurate gene
function prediction
Bayesian reasoning
Incorporates expert knowledge
Yeast Data
Integrative analysis: why?
High throughput methods sacrifice
specificity for scale
Microarray data alone is good for
hypothesis generation but lacks specificity
for accurate gene function prediction
By using heterogeneous functional data,
the prediction accuracy is improved
Need for MAGIC
Studies have combined different types of
data in a heuristic fashion on a case by
case basis
No general scheme or probabilistic
representation is applied
Existing methods combine only specific types of data
MAGIC – general method to integrate
disparate data sources
Input to MAGIC
Input: Gene-Gene relation matrices for
each data source
The elements of the matrix are scores
that indicate whether there could be a
relationship between two genes
The score can be binary, discrete or
continuous
Input format is flexible and allows genes
to be in more than one group or cluster
Thus does not exclude biclustering or
fuzzy clustering methods
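A minimal sketch of how one such input matrix could be built from cluster memberships, assuming a binary score (1 if two genes share at least one cluster). The function name `relation_matrix` and the gene labels are hypothetical and are not taken from MAGIC itself; overlapping clusters are allowed, as the slide describes.

```python
from itertools import combinations

def relation_matrix(genes, clusters):
    """Binary gene-gene matrix: 1 if two genes share at least one cluster."""
    index = {g: i for i, g in enumerate(genes)}
    n = len(genes)
    matrix = [[0] * n for _ in range(n)]
    for members in clusters:
        # A gene may appear in several clusters, so overlapping
        # (biclustering or fuzzy-style) groupings are handled naturally.
        for a, b in combinations(sorted(set(members)), 2):
            i, j = index[a], index[b]
            matrix[i][j] = matrix[j][i] = 1
    return matrix

# Hypothetical toy data: g2 belongs to two clusters, g4 to none.
genes = ["g1", "g2", "g3", "g4"]
clusters = [["g1", "g2"], ["g2", "g3"]]
m = relation_matrix(genes, clusters)
```

A discrete or continuous score (for example, a correlation value) could be stored in the same matrix shape instead of 0/1.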
Structure of the MAGIC Bayesian
network
Prior probabilities assessed by experts
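To illustrate the Bayesian reasoning, here is a simplified sketch that combines evidence from several data sources into a posterior probability that two genes are functionally related. It assumes the sources are conditionally independent (a naive Bayes simplification), which is a stand-in and not the actual MAGIC network structure. The function name `posterior_related` and all probabilities are made up for illustration; in MAGIC the priors come from experts.

```python
def posterior_related(prior, evidence):
    """Posterior P(related | observations) under a naive Bayes assumption.

    prior    -- P(related) before seeing any data
    evidence -- list of (P(obs | related), P(obs | unrelated)) pairs,
                one pair per data source
    """
    p_rel = prior
    p_unrel = 1.0 - prior
    for like_rel, like_unrel in evidence:
        p_rel *= like_rel
        p_unrel *= like_unrel
    # Normalize over the two hypotheses (related vs. unrelated).
    return p_rel / (p_rel + p_unrel)

# Hypothetical numbers: two sources, both favoring "related".
p = posterior_related(0.1, [(0.8, 0.2), (0.6, 0.3)])
```

With these illustrative values, two moderately supportive sources raise the probability from the prior of 0.1 to roughly 0.47, which shows how agreement across sources strengthens a prediction that no single source supports on its own.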
Evaluation
No gold standard for gene groupings exists
GO is the best available reflection of
current biological knowledge
Use a cutoff of 3 levels in the hierarchical
structure to say that two genes are
functionally related
Results
AVID
Annotation Via Integration of Data
Integrates data to build high-confidence
networks in which proteins are connected
if they are likely to share a common
annotation
AVID predicts functional annotations in
all three GO categories
AVID stages
AVID results