Transcript Sai-Bio

Integrative Analysis of
Biological Data
Sai Moturu
MAGIC

Multisource Association of Genes by
Integration of Clusters

Goal: Integrate heterogeneous types of
high-throughput data for accurate gene
function prediction

Bayesian reasoning
Incorporates expert knowledge
Yeast Data


Integrative analysis: why?

High throughput methods sacrifice
specificity for scale

Microarray data alone is good for
hypothesis generation but lacks specificity
for accurate gene function prediction

By using heterogeneous functional data,
the prediction accuracy is improved
Need for MAGIC

Studies have combined different types of
data in a heuristic fashion on a case by
case basis

No general scheme or probabilistic
representation is applied

Existing methods combine only specific types of data

MAGIC – general method to integrate
disparate data sources
Input to MAGIC

Input: Gene-Gene relation matrices for
each data source
The elements of the matrix are scores
that indicate whether there could be a
relationship between two genes
The score can be binary, discrete or
continuous
Input format is flexible and allows genes
to be in more than one group or cluster
Thus does not exclude biclustering or
fuzzy clustering methods
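
The input format described above can be sketched in a few lines. This is a minimal illustration, not MAGIC's actual code: the function name relation_matrix and the gene names are hypothetical, and the score is assumed to be binary (1 if two genes share at least one cluster). Because clusters are plain sets, a gene may belong to several of them, which is what allows biclustering or fuzzy clustering output to be used.

```python
from itertools import combinations


def relation_matrix(genes, clusters):
    """Build a binary gene-gene relation matrix from possibly overlapping clusters.

    genes: ordered list of gene names (hypothetical identifiers).
    clusters: iterable of sets of gene names; a gene may appear in several sets.
    Returns a symmetric list of lists where entry [i][j] is 1 if genes i and j
    share at least one cluster, and 0 otherwise.
    """
    index = {gene: i for i, gene in enumerate(genes)}
    n = len(genes)
    matrix = [[0] * n for _ in range(n)]
    for cluster in clusters:
        # Ignore names that are not in the gene list.
        members = sorted(index[g] for g in cluster if g in index)
        for i, j in combinations(members, 2):
            matrix[i][j] = 1
            matrix[j][i] = 1
    return matrix


# Toy example: geneB sits in two clusters at once.
genes = ["geneA", "geneB", "geneC", "geneD"]
clusters = [{"geneA", "geneB"}, {"geneB", "geneC"}]
m = relation_matrix(genes, clusters)
```

One such matrix would be built per data source; discrete or continuous scores would replace the 1 with a source-specific value.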
Structure of the MAGIC Bayesian
network

Prior probabilities assessed by experts
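
The network diagram itself is not reproduced in this transcript, so the following is only a simplified sketch of the Bayesian reasoning involved, not the MAGIC network. It assumes a naive Bayes model in which each data source gives a binary score and the sources are conditionally independent given whether two genes are functionally related. The function name, the source names, and all probability values are hypothetical stand-ins for expert-assessed numbers.

```python
def posterior_related(prior, likelihoods, evidence):
    """Posterior P(related | evidence) under a naive Bayes assumption.

    prior: P(related) before seeing any data (expert-assessed in this sketch).
    likelihoods: dict mapping source -> (P(score=1 | related), P(score=1 | unrelated)).
    evidence: dict mapping source -> observed binary score (0 or 1).
    """
    p_rel = prior
    p_unrel = 1.0 - prior
    for source, score in evidence.items():
        p1_rel, p1_unrel = likelihoods[source]
        # Multiply in the likelihood of the observed score under each hypothesis.
        p_rel *= p1_rel if score else 1.0 - p1_rel
        p_unrel *= p1_unrel if score else 1.0 - p1_unrel
    return p_rel / (p_rel + p_unrel)


# Hypothetical expert-assessed values for two sources.
likelihoods = {
    "coexpression": (0.6, 0.2),
    "physical_binding": (0.5, 0.05),
}
post = posterior_related(
    prior=0.1,
    likelihoods=likelihoods,
    evidence={"coexpression": 1, "physical_binding": 1},
)
```

With these made-up numbers, two agreeing sources raise the belief that the genes are related from the prior of 0.1 to about 0.77, which illustrates why combining sources can be more specific than any single one.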
Evaluation

No gold standard for gene groupings exists

GO is the best available reflection of
current biological knowledge

Use a cutoff of 3 levels in the hierarchical
structure to say that two genes are
functionally related
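
The slide does not spell out how the cutoff is applied. One possible reading, sketched below under that assumption, is that two genes count as functionally related if they share a GO term (directly or through ancestors) that lies at least 3 levels below the root. The toy hierarchy, term names, gene names, and function names are all hypothetical.

```python
def ancestors(term, parents):
    """Return the term together with all of its ancestors in the hierarchy."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def depth(term, parents):
    """Shortest number of edges from the term up to a root (a term with no parents)."""
    term_parents = parents.get(term, [])
    if not term_parents:
        return 0
    return 1 + min(depth(p, parents) for p in term_parents)


def functionally_related(gene_a, gene_b, annotations, parents, min_depth=3):
    """True if the genes share a term at depth >= min_depth (assumed reading of the cutoff)."""
    terms_a = set().union(*(ancestors(t, parents) for t in annotations[gene_a]))
    terms_b = set().union(*(ancestors(t, parents) for t in annotations[gene_b]))
    return any(depth(t, parents) >= min_depth for t in terms_a & terms_b)


# Toy hierarchy: root -> a -> b -> c -> {d, e}, and root -> x -> y.
parents = {"a": ["root"], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["c"],
           "x": ["root"], "y": ["x"]}
annotations = {"gene1": ["d"], "gene2": ["e"], "gene3": ["y"], "gene4": ["b"]}

related_12 = functionally_related("gene1", "gene2", annotations, parents)
related_13 = functionally_related("gene1", "gene3", annotations, parents)
related_14 = functionally_related("gene1", "gene4", annotations, parents)
```

Here gene1 and gene2 share term c at depth 3 and are called related, while gene1 and gene4 only share terms nearer the root and are not.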
Results
AVID

Annotation Via Integration of Data

Integrates data to build high-confidence
networks in which proteins are connected
if they are likely to share a common
annotation

AVID predicts functional annotations in
all three GO categories
AVID stages
AVID results