Sai_Presentation

Sai Moturu
Introduction
• Current approaches to microarray data analysis
– Analysis of experimental data followed by a
posterior process where biological information is
incorporated to make inferences
• Integrative analysis technique in this paper
– Integrate gene annotation with expression data to
discover intrinsic associations between the two data
sources based on co-occurrence patterns
Methods and Data
– Association Rules Discovery
– Gene expression data
– Gene annotation: Gene ontology categories,
metabolic pathways and transcriptional regulators
– Applied to two previously studied experiments
Association Rules Discovery
– Antecedent -> Consequent
X -> Y
– Measures of Quality
• Support: P(X∪Y)
• Confidence: P(Y|X) = P(X∪Y)/P(X)
• Improvement: Confidence/P(Consequent) = P(X∪Y)/(P(X)*P(Y))
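The three measures can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the toy transactions, the item names and the helper `rule_measures` are all hypothetical, and P(X∪Y) is read as the fraction of transactions that contain every item of both X and Y. Confidence divides that joint frequency by the frequency of the antecedent X, and improvement divides confidence by the frequency of the consequent Y.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, improvement) for the rule antecedent -> consequent."""
    n = len(transactions)
    x, y = set(antecedent), set(consequent)
    n_x = sum(1 for t in transactions if x <= t)         # transactions containing X
    n_y = sum(1 for t in transactions if y <= t)         # transactions containing Y
    n_xy = sum(1 for t in transactions if (x | y) <= t)  # transactions containing X and Y
    support = n_xy / n
    confidence = n_xy / n_x if n_x else 0.0
    improvement = confidence / (n_y / n) if n_y else 0.0
    return support, confidence, improvement


# Hypothetical toy data: one set of items per gene.
transactions = [
    {"pathway:TCA", "up:t7"},
    {"pathway:TCA", "up:t7"},
    {"pathway:TCA"},
    {"up:t7"},
    {"pathway:glycolysis"},
]
support, confidence, improvement = rule_measures(
    transactions, {"pathway:TCA"}, {"up:t7"}
)
```

An improvement above 1 means the consequent is more frequent among transactions containing the antecedent than in the data set overall.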
Association Rules Discovery
– Itemsets
• Each gene, with the set of experiments in which it is over- or
underexpressed
• Gene characteristics
– Constraint
• The antecedent must consist of gene annotations
– Expression Thresholds
• Genes with log expression values > 1 are treated as overexpressed and
those < -1 as underexpressed (a two-fold change on a log2 scale)
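As a rough sketch of how one gene could be turned into an itemset under these thresholds: the function name `gene_to_itemset`, the `over:`/`under:` item labels and the sample values are assumptions made for illustration, not taken from the paper.

```python
def gene_to_itemset(log_ratios, annotations, threshold=1.0):
    """Combine a gene's annotations with its discretized expression pattern.

    log_ratios maps an experiment name to a log expression value;
    values inside [-threshold, threshold] contribute no item.
    """
    items = set(annotations)
    for experiment, value in log_ratios.items():
        if value > threshold:
            items.add(f"over:{experiment}")
        elif value < -threshold:
            items.add(f"under:{experiment}")
    return items


# Hypothetical gene: unchanged at t1, overexpressed at t6, underexpressed at t7.
itemset = gene_to_itemset(
    {"t1": 0.2, "t6": 1.4, "t7": -2.1},
    ["pathway:TCA"],
)
```

Each gene then becomes one transaction whose items are its annotations plus one item per experiment in which it crosses a threshold.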
Mining Association Rules
– The association rules that we are interested in have
low support values and high confidence values
– A variant of the Apriori algorithm is used, one that has
previously helped in mining low-support, high-confidence,
biologically significant patterns
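The search for such rules can be illustrated with a brute-force enumeration. This is explicitly not the Apriori variant used in the paper: it is a simplified sketch that tries small annotation sets as antecedents and single expression items as consequents. The function `mine_rules`, its parameters, the use of an absolute count for support and the toy data are all assumptions.

```python
from itertools import combinations


def mine_rules(transactions, is_annotation, min_support_count, min_confidence,
               min_improvement=1.0, max_antecedent_size=2):
    """Enumerate rules annotation set -> expression item that pass the thresholds."""
    n = len(transactions)

    def count(items):
        return sum(1 for t in transactions if items <= t)

    all_items = set().union(*transactions)
    annotations = sorted(i for i in all_items if is_annotation(i))
    expressions = sorted(i for i in all_items if not is_annotation(i))

    rules = []
    for size in range(1, max_antecedent_size + 1):
        for antecedent in combinations(annotations, size):
            x = frozenset(antecedent)
            n_x = count(x)
            # A rule can never be more frequent than its antecedent.
            if n_x < max(min_support_count, 1):
                continue
            for item in expressions:
                y = frozenset([item])
                n_xy = count(x | y)
                if n_xy < min_support_count:
                    continue
                confidence = n_xy / n_x
                improvement = confidence / (count(y) / n)
                if confidence >= min_confidence and improvement >= min_improvement:
                    rules.append((x, y, n_xy, confidence, improvement))
    return rules


# Hypothetical toy data: one itemset per gene.
transactions = [
    {"pathway:TCA", "over:t7"},
    {"pathway:TCA", "over:t7"},
    {"pathway:TCA", "over:t7"},
    {"pathway:TCA"},
    {"pathway:glycolysis", "under:t7"},
    {"over:t7"},
]
rules = mine_rules(
    transactions,
    is_annotation=lambda item: item.startswith("pathway:"),
    min_support_count=2,
    min_confidence=0.7,
)
```

Keeping the support threshold low while demanding high confidence retains rules that cover only a handful of genes but hold for most of them.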
Filtering
– A major drawback of association rules is the huge
number of rules generated
– Many of these rules are also redundant
– Two filters address this
• Redundant filter
• Single antecedent filter
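The slides do not define the two filters, so the sketch below rests on assumed definitions: a rule is treated as redundant when a rule with the same consequent and a strictly smaller antecedent has confidence at least as high, and the single antecedent filter keeps only rules whose antecedent has one item. The function names, the rule tuple layout and the sample rules are hypothetical.

```python
def redundant_filter(rules):
    """Drop rules dominated by a simpler rule (assumed definition, see lead-in).

    Each rule is a tuple (antecedent frozenset, consequent, confidence).
    """
    kept = []
    for antecedent, consequent, confidence in rules:
        dominated = any(
            other_cons == consequent
            and other_ant < antecedent      # proper subset of the antecedent
            and other_conf >= confidence
            for other_ant, other_cons, other_conf in rules
        )
        if not dominated:
            kept.append((antecedent, consequent, confidence))
    return kept


def single_antecedent_filter(rules):
    """Keep only rules with exactly one antecedent item (assumed definition)."""
    return [rule for rule in rules if len(rule[0]) == 1]


# Hypothetical rules, all with the same consequent.
rules = [
    (frozenset({"pathway:TCA"}), "over:t7", 0.80),
    (frozenset({"pathway:TCA", "regulator:R1"}), "over:t7", 0.75),
    (frozenset({"pathway:TCA", "regulator:R2"}), "over:t7", 0.95),
]
non_redundant = redundant_filter(rules)
single_only = single_antecedent_filter(non_redundant)
```

Under these assumptions the second rule is dropped because adding `regulator:R1` does not raise confidence, while the third survives the redundant filter and is removed only by the single antecedent filter.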
Diauxic shift dataset
– Gene expression accompanying the metabolic shift
from fermentation to respiration that occurs when
fermenting yeast cells exhaust the available glucose
– Expression levels recorded at 7 time points
– External information
• Metabolic pathways
• Transcriptional regulators
Results
– Association rules among metabolic pathways and
expression patterns
• 1126 out of over 6000 genes were annotated with at least
one pathway
• Association rules with minimum support of 5, minimum
confidence of 40% and minimum improvement of 1
• Redundant and single antecedent filters applied
• 21 association rules
Results
– Association rules among transcriptional regulators
and expression patterns
• 3490 genes were annotated with at least one regulator
• Association rules with minimum support of 5, minimum
confidence of 80% and minimum improvement of 1
• Redundant filter applied
• 28 association rules
Results
– Association rules among transcriptional regulators,
metabolic pathways and expression patterns
• 3882 genes
• Association rules with minimum support of 5, minimum
confidence of 80% and minimum improvement of 1
• Redundant filter applied
• 37 association rules
Serum stimulation dataset
– Gene expression program of human fibroblasts after
serum exposure
– External information
• Gene ontology terms
Results
– Association rules among biological process
annotation and expression patterns
• 4092 genes of over 8000
• Support of 4, min confidence of 10% and min improvement
of 1
• Single antecedent and redundant filters applied
• 12 associations
Results
– Association rules among terms from all GO
categories
• 4630 genes of over 8000
• Support of 4, min confidence of 10% and min improvement
of 1
• Redundant filter applied
• 31 associations
Conclusions
– Some of the biological implications matched the
ones found experimentally
– The others could be explored further
– Integrative data analysis is very useful for
meaningful discoveries using gene expression data