Transcript Genomatix

Genomatix
Microarray Evaluation for
Gene Regulation Analysis
Dr. Martin Seifert
Genomatix Software GmbH
Landsberger Strasse 6, D-80339 München
http://www.genomatix.de
© 2005 by Genomatix Software GmbH
Microarrays today
The general goal in microarray analysis
Microarray experiment (cell) -> ? -> metabolic pathways, classification / diagnostics, regulatory networks, disease mechanisms
Biological functionality is not directly evident from microarrays
Methods for microarray data analysis
How to reach the general goal in microarray analysis?
Literature analysis
Statistical analysis
Cellular processes
Genomatix knowledge transfer approach
Sequence analysis (Genome annotation and promoter analysis)
A real life example
Evaluation of the role of PDGF in fibroblasts
PDGF stimulation of fibroblasts (Demoulin et al. JBC 279, No. 34, 2004; 35392–35402)
Microarray experiment
Statistical analysis; clustering
Evaluation of chip clusters
PDGF
Chip data
Cluster
Genomatix
What is the biological functionality behind the chip data?
Technology
Linking genomic sequence analysis and literature mining
Promoter source for
functional promoter analysis
Analysis of
promoter sequences/
database scans
Automatic evaluation
of gene relationships
Analysis strategy
Workflow of the project
1 Find statistical clusters
2 Project statistical clusters onto biology and categorize results
by z-scoring (BiblioSphere)
3 Analyze functional groups for co-regulation (ElDorado & GEMS)
and find additional potentially co-regulated genes (ModelInspector)
4 Carry out additional statistical analysis
5 Merge results into biological context
Methods for microarray data analysis
Step 1: Statistical Analysis
Cluster Analysis
Statistically analyzed microarray data
Significance Analysis of Microarrays (SAM; FDR: 4.3%)
105 of 9928 gene spots are significantly up-regulated (Chip: Hver1.2.1)
Time points: 1, 4, 10, 24 hours of PDGF induction
Workflow
2 Project statistical clusters onto biology and categorize results
by z-scoring (BiblioSphere)
Gene Cluster
Characterisation of experimental cluster with BiblioSphere
Cluster contains 107 genes
Too many genes for biologically meaningful co-regulation
Strategy: knowledge-driven sub-clustering
Find functional correlations
BiblioSphere: Large Cluster Query
Functional correlations are retrieved by categorization
Knowledge-driven sub-clustering
Ontology based functional ranking: Genomatix z-scoring
highest
z-score
retrieval of genes overrepresented in the GO-category sterol biosynthesis
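The idea of ranking categories by z-score can be illustrated with a minimal sketch. It assumes the z-score compares the observed number of cluster genes in a GO category with the number expected under random sampling (hypergeometric mean and standard deviation). The exact formula used by BiblioSphere is not stated in these slides, and the function name and the example numbers below are hypothetical.

```python
import math

def category_z_score(cluster_size, hits_in_cluster, category_size, population_size):
    """z-score of the observed category hits versus the hypergeometric expectation.

    Assumption: genes are drawn without replacement from a population in which
    `category_size` of `population_size` genes are annotated with the category.
    """
    n, k, K, N = cluster_size, hits_in_cluster, category_size, population_size
    p = K / N
    expected = n * p
    variance = n * p * (1 - p) * (N - n) / (N - 1)
    return (k - expected) / math.sqrt(variance)

# Hypothetical numbers, for illustration only (not taken from the experiment):
z = category_z_score(cluster_size=100, hits_in_cluster=6,
                     category_size=20, population_size=10000)
print(round(z, 2))
```

A category with many more hits than expected gets a large positive z-score and moves to the top of the ranking; a category hit about as often as expected scores near zero.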
Gene group analysis
BiblioSphere subgroup analysis: connecting TFs
Re-enter the six overrepresented genes into BiblioSphere
Knowledge-driven sub-clustering
Towards regulatory networks: connecting TFs
Co-citation of HMGCS1, HMGCR, SC4MOL and DHCR7 with SREBF1
(BiblioSphere on sentence level; at least 4 co-citations with input genes)
Prediction of SREBF1 (EBOX) binding sites in the promoters of HMGCS1, HMGCR and
DHCR7 (ElDorado)
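Sentence-level co-citation filtering with a minimum count can be sketched as follows. This is an illustration only, not the BiblioSphere implementation: it assumes each sentence has already been reduced to the set of gene names it mentions, and the toy corpus and the function name are hypothetical.

```python
from collections import Counter

def cocited_candidates(sentences, input_genes, min_cocitations=4):
    """Return non-input genes mentioned together with an input gene
    in at least `min_cocitations` sentences."""
    input_genes = set(input_genes)
    counts = Counter()
    for mentioned in sentences:
        mentioned = set(mentioned)
        if mentioned & input_genes:            # sentence names an input gene
            for gene in mentioned - input_genes:
                counts[gene] += 1              # one co-citation per sentence
    return {gene: n for gene, n in counts.items() if n >= min_cocitations}

# Hypothetical toy corpus: each entry is the set of genes named in one sentence.
sentences = [
    {"SREBF1", "HMGCR"},
    {"SREBF1", "HMGCS1"},
    {"SREBF1", "DHCR7", "SC4MOL"},
    {"SREBF1", "HMGCR", "HMGCS1"},
    {"TP53", "HMGCR"},
    {"SREBF1", "TP53"},
]
inputs = ["HMGCS1", "HMGCR", "SC4MOL", "DHCR7"]
print(cocited_candidates(sentences, inputs))  # → {'SREBF1': 4}
```

Raising `min_cocitations` keeps only the most consistently co-cited candidates, at the cost of missing factors that are rarely discussed in the literature.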
Experimental verification
SREBP1 (=SREBF1) expression is experimentally confirmed
Workflow
3 Analyze functional groups for co-regulation (Gene2Promoter & GEMS)
and find additional potentially co-regulated genes (ModelInspector)
Sequence analysis
Promoter analysis by GEMS based on ElDorado data
Literature analysis
Promoter analysis
ElDorado + Gene2Promoter
GEMS
Results from literature analysis are used to guide sequence analysis
Sequence analysis
Analysis strategies: Inter-genomic and intra-genomic
Comparative analysis of promoters within one species -> co-regulation
human
DHCR24
DHCR7
EBP
HMGCR
HMGCS1
SC4MOL
mouse
rat
6 genes sterol synthesis
Comparative genomics of promoters -> phylogenetic conservation
107 genes
Intra-genomic approach
Comparative promoter analysis (intra-genomic co-regulation)
Extraction of the promoters of DHCR24, DHCR7, EBP, HMGCR, HMGCS1, and SC4MOL
(ElDorado + Gene2Promoter)
Analysis of the promoters of DHCR24, DHCR7, EBP, HMGCR, HMGCS1, and SC4MOL
with GEMS FrameWorker
Frameworks underlie functional conservation of promoters
Regulatory genome annotation
Promoter resource ElDorado / Gene2Promoter
promoter
ElDorado
Regulatory regions
Promoter modules
Regulatory SNPs
Alternative
promoters/
transcripts
Interconnected to:
BiblioSphere
GEMS
Regulatory genome annotation
Promoter retrieval ElDorado / Gene2Promoter
Analysis of promoter organization
Promoter analysis with FrameWorker
Analysis of promoter organization
EBOX (SREBF1) frameworks are found in a subset of the genes
EBOX ECAT ZBPF
Genes sharing framework:
DHCR7, EBP, HMGCS1
Frameworks are conserved in order and distance of TFBSs
Beyond the microarray
ModelInspector search
EBOX ECAT ZBPF framework
Genomatix Human promoter database GPD
Results of database search
ModelInspector results
Framework: EBOX-ECAT-ZBPF
Number of hits in human promoters: 10
Hits in steroid biosynthesis: 3
z-score: 13.55
Highly selective model
No additional genes for steroid metabolism found so far
Selectivity is reduced by modifying the model: the distance variability
is increased (using FastM)
Model modification
modification of the model with FastM
distance variability is increased to 5-100 bp
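A framework in this sense is a set of TFBSs in a fixed order with constrained distances. The sketch below shows, under simplifying assumptions, how such a model could be matched against one promoter and how widening the allowed distances to 5-100 bp admits more promoters. It is not the ModelInspector or FastM algorithm: strand orientation and matrix similarity are ignored, distances are measured between hit positions, and the positions and strict distance ranges are hypothetical.

```python
def find_framework(hits, families, distance_ranges):
    """Return position tuples where `families` occur in order on one promoter
    and each consecutive pair lies within the allowed distance range.

    hits: list of (family, position) tuples for one promoter.
    families: ordered family names, e.g. ["EBOX", "ECAT", "ZBPF"].
    distance_ranges: one (min_bp, max_bp) tuple per consecutive pair.
    """
    matches = []

    def extend(index, chosen):
        if index == len(families):
            matches.append(tuple(chosen))
            return
        for family, pos in hits:
            if family != families[index]:
                continue
            if chosen:
                lo, hi = distance_ranges[index - 1]
                if not lo <= pos - chosen[-1] <= hi:
                    continue  # wrong order or distance outside the range
            extend(index + 1, chosen + [pos])

    extend(0, [])
    return matches

FAMILIES = ["EBOX", "ECAT", "ZBPF"]
strict = [(25, 35), (55, 65)]     # hypothetical narrow distance ranges
relaxed = [(5, 100), (5, 100)]    # distance variability widened to 5-100 bp

# Hypothetical TFBS hits on two promoters.
promoter_a = [("EBOX", 120), ("ECAT", 150), ("ZBPF", 210), ("ECAT", 400)]
promoter_b = [("EBOX", 50), ("ECAT", 130), ("ZBPF", 160)]

print(find_framework(promoter_a, FAMILIES, strict))
print(find_framework(promoter_b, FAMILIES, strict))
print(find_framework(promoter_b, FAMILIES, relaxed))
```

With the strict ranges only the first promoter matches; with the relaxed ranges the second one matches as well, which is the trade-off between selectivity and sensitivity described on the slides.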
Beyond the microarray
Additional ModelInspector search
EBOX ECAT ZBPF framework with modified distance variability
Genomatix Human promoter database GPD
Results of database search
ModelInspector results
Framework: EBOX-ECAT-ZBPF
Number of hits in human promoters: 389
Hits in four categories related to “steroid metabolism”: 7
z-score: 4.43 - 6.35
Additional genes found that are related to steroid metabolism:
LSS, MVK, SC5DL, SREBF2
LSS and MVK are present on the chip and up-regulated, but not statistically
significant
SC5DL is not present on the microarray
Possibility to re-evaluate statistical results
Re-analysis of promoter organization
Additional framework analysis
All sterol-metabolism related genes identified by microarray analysis
and ModelInspector are included:
HMGCS1, MVK, SC5DL, DHCR7, EBP, SREBF2, LSS, HMGCR, SC4MOL,
DHCR24
An additional framework consisting of three TFBSs is found:
ECAT EGRF ZBPF
It matches 8 of the 10 input genes:
HMGCS1, DHCR7, HMGCR, EBP, LSS, MVK, SC5DL, SREBF2
Beyond the microarray
Is the framework also part of other human promoters?
ECAT EGRF ZBPF
Genomatix Human promoter database GPD
Second framework is searched in human promoters by ModelInspector
Matches may overlap with first framework but are basically distinct
Several frameworks may be important for sterol-related pathways/networks
Results of second database search
ModelInspector results
Framework: ECAT-EGRF-ZBPF
Number of hits in human promoters: 961
Hits in four categories related to “steroid metabolism”: 16
z-score: 4.36 - 6.25
CYP46A1, FDPS, HMGCR, HSD17B8, OPRS1, SREBF1!, STARD5
SREBF1/2 are potential regulators of the previous framework!
SREBF1/2 may be mediators between the two frameworks identified so far
Workflow
4 Carry out additional statistical analysis
Relaxed statistical approach
Clustering by profile of the initially selected 105 genes
Expression cluster is extended by Pavlidis Template Matching (PTM)
Cluster of 105 significantly regulated genes is taken as template
The threshold p-value is 0.1
Profile cluster
Initial profile
Cluster is extended to 798 genes (including all 105 initial genes)
Relaxed statistics require cross-validation by a second line of evidence
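Template matching of this kind can be sketched as a correlation filter. The sketch assumes that the template is the mean profile of the significant cluster, that each profile has exactly the four time points of this experiment, and that a one-sided Pearson test is used. These are assumptions, not details taken from the slides, and the gene names and values are hypothetical. For four points, Pearson's r is uniformly distributed on [-1, 1] under the null hypothesis of independent, normally distributed values, so the one-sided p-value is (1 - r) / 2.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mean_profile(profiles):
    """Average the profiles time point by time point to build the template."""
    rows = list(profiles.values())
    return [sum(column) / len(rows) for column in zip(*rows)]

def template_match(candidates, template, p_threshold=0.1):
    """Keep genes whose profile correlates positively with the template."""
    if len(template) != 4:
        raise ValueError("the closed-form p-value used here holds only for 4 time points")
    selected = {}
    for gene, profile in candidates.items():
        r = max(-1.0, min(1.0, pearson(profile, template)))
        p_value = (1.0 - r) / 2.0   # one-sided p-value, valid for n = 4
        if p_value <= p_threshold:
            selected[gene] = p_value
    return selected

# Hypothetical log-ratios at 1, 4, 10 and 24 hours.
core_cluster = {"geneA": [0.1, 0.8, 1.5, 1.2], "geneB": [0.0, 1.0, 1.4, 1.0]}
candidates = {"geneC": [0.2, 0.7, 1.2, 1.0], "geneD": [1.0, 0.5, 0.2, 0.1]}

template = mean_profile(core_cluster)
extended = template_match(candidates, template, p_threshold=0.1)
print(sorted(extended))
```

With so few time points a threshold of 0.1 accepts any profile with r of at least 0.8, which is why the extended cluster needs independent confirmation.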
Workflow
5 Merge results into biological context
Merging profile and database searches
Comparison of ModelInspector results with profile cluster
52 genes share a common framework and are co-expressed
8 genes belong to the GO-category "steroid biosynthesis":
DHCR24, DHCR7, EBP, HMGCR, HMGCS1, LSS, MVK, SC4MOL
Eight genes associated with steroid metabolism are supported by three lines
of evidence:
1. Common up-regulation
2. Common framework
3. Common functional class (GO-annotation)
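Combining the three lines of evidence amounts to intersecting gene sets. A minimal sketch follows. The set contents are hypothetical placeholders chosen for illustration; they are not the actual result lists of the analysis.

```python
def supported_by_all(*evidence_sets):
    """Return the genes that appear in every evidence set."""
    sets = [set(s) for s in evidence_sets]
    return set.intersection(*sets)

# Hypothetical evidence sets (GENE_X, GENE_Y, GENE_Z are placeholders).
co_expressed = {"HMGCR", "HMGCS1", "LSS", "MVK", "GENE_X"}      # profile cluster
framework_hits = {"HMGCR", "HMGCS1", "LSS", "MVK", "GENE_Y"}    # promoter framework
go_steroid = {"HMGCR", "HMGCS1", "LSS", "MVK", "GENE_Z"}        # GO annotation

print(sorted(supported_by_all(co_expressed, framework_hits, go_steroid)))
# → ['HMGCR', 'HMGCS1', 'LSS', 'MVK']
```

Genes supported by only one source drop out, which is how relaxed statistics on one side can be balanced by independent evidence on the other.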
Sterol biosynthesis
and regulatory networks
Acetyl-CoA + Acetoacetyl-CoA
HMG-CoA
Mevalonate
ECAT EGRF ZBPF
Lanosterol
EBOX ECAT ZBPF
Cholesterol
Confirmation of results by GNF tissue profiles
Example: profile of HMGCS1
Find correlates with cut-off 0.6
Sterol biosynthesis
and regulatory networks
ECAT EGRF ZBPF
GNF profile
EBOX ECAT ZBPF
Additional gene group: Tubulins
Expression profile at 1, 4, 10, 24 hours of PDGF induction
CDEF EGRF MAZF
CDEF EGRF MAZF
Sterol biosynthesis / cell structure proteins
and regulatory networks
ECAT EGRF ZBPF
EBOX ECAT ZBPF
Conclusions
Evaluation of microarray data
No individual method can reveal networks and pathway mechanisms
An alternating combinatorial approach can achieve this
However, the final focus is usually on a few genes (30 or fewer)
Several independent functional groups may be derived from one chip
All of this is possible based on available tools
Genomatix technology elucidates the biology behind the chip data!
Let’s have a break…