Functional module identification with tomato gene and metabolite expression profiles
Cass Peluso
Project Leaders: Zhangjun Fei, Ph.D., and Je-Gun Joung, Ph.D.
Boyce Thompson Institute, Cornell University, Ithaca, NY 14853, USA
Introduction
Abstract
Cells carry out a multitude of complex functions through the
coordinated effort of a set of genes. Such activity is often carried
out through the organization of the genome into regulatory
modules. Modules are sets of co-regulated genes that share a
common function. The identification of modules, their regulators,
and the conditions under which regulation occurs is thus very
important since a good deal of a cell’s activity is organized into
this network of interacting modules. It is essential that these
modules be identified and their functions be determined in order
to understand cellular responses to internal and external signals
(Segal et al. 2003). Here we report the identification of functional
modules in the tomato using gene expression and metabolite
profile datasets generated from a set of Solanum pennellii
introgression lines.
Results
Figure 3. Representative functional modules
Figure 3 labels: Module 35; Module 20; Module 6.
Regulator labels: pathogenesis-related transcription factor (CBF1); TMV response-related gene product (WRKY); floral homeotic protein AGAMOUS (TAG1).
Gene labels: heat shock protein; salicylic acid-binding protein; gibberellin 2-oxidase; wound-induced protein; syringolide-induced protein 19-1-5; heavy metal transport/detoxification protein; pathogenesis-related protein osmotin precursor; Avr9/Cf-9 rapidly elicited protein 231; CBF2; transcription factor - Apidaecin gene family.
Carotenoid pathway labels: phytoene; PDS3; phytofluene; ζ-carotene; ZDS; neurosporene; lycopene; β-LYC; γ-carotene; ε-LYC; α-carotene.
Methods
Figure 1. The schema of module identification
Figure 2. Computational pipeline for module identification
First, a computational pipeline was implemented to identify
transcription factors on tomato TOM2 oligonucleotide arrays
(see Fig. 2 for details).
Then, the gene expression profiles generated using the TOM2
arrays and the targeted metabolite profiles from twenty-three S.
pennellii introgression lines were processed and normalized.
The processed and normalized gene expression and metabolite
profiles and the set of candidate regulatory genes on the TOM2
arrays were then loaded into Genomica, a program that searches
simultaneously for a partition of genes into modules and for each
module's regulation program (Segal et al. 2003). A module's
regulation program specifies the set of regulators that control the
module and how the expression of the module's genes depends on
those regulators. The program outputs a list of modules and their
associated regulation programs. Fig. 3 shows several interesting
modules that were identified.
Each of the identified modules was then analyzed for GO term
enrichment using a tool in the Tomato Functional Genomics
Database. GO terms with an adjusted p-value (false discovery
rate, FDR) < 0.05 were considered significantly over-represented
in a module. A heatmap of the significance of GO term enrichment
was generated using the web-based application Matrix2PNG, with
orange indicating that a function is significantly enriched in a
module (Fig. 4). The list of modules and their regulators was then
loaded into Cytoscape to draw a module-regulator network map,
with modules in light blue and regulators in orange (Fig. 5).
(A) The inferred regulatory modules.
(B) Module 35 has a pathogenesis-related TF as a regulator. It also contains a number
of genes that are potentially involved in plant responses to biotic and abiotic stresses.
This module is thus likely related to pathogen response, which could have important
implications for the creation of disease-resistant tomato varieties.
(C) Module 6 shows two regulators acting on gene products that relate to the cell wall.
The likely function of this module is related to cell wall organization and biogenesis.
(D) Module 20 contains phytofluene, a metabolite in the carotenoid biosynthesis
pathway.
Figure labels: ABA; lutein; β-carotene; MADS-box; cell wall organization and biogenesis; cell wall protein; WRKY; WRKY4; CBF1; NAC2; NAC; TAG1
Tomato TOM2 array transcription factor identification
Step 1: BLAST tomato TOM2 probe sequences against the
SwissProt and TrEMBL protein databases. Parse the
results using BioPerl to extract probe IDs and hit
accessions.
Step 2: Map TOM2 array probe IDs to GO term IDs using
the Gene Ontology Annotation Database (GOA) based
on their homologues in SwissProt and TrEMBL.
Step 3: Associate GO IDs and GO names using the Gene
Ontology definition file (OBO v1.2) downloaded from
http://geneontology.org.
Step 4: Add the corresponding GO name to each GO ID in
the result file from Step 2.
Step 5: Identify TOM2 array probes whose GO names match
the desired regulators.
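Steps 1 to 5 of the transcription factor identification pipeline could be sketched roughly as follows. This is a minimal illustration, not the poster's implementation: the poster used BioPerl, whereas this sketch uses plain Python. It assumes BLAST was run with 12-column tabular output, which the poster does not state. The function names, the e-value cutoff, the keyword match, and all accessions and example rows are hypothetical, and the small `acc_to_go` dictionary stands in for the GOA mapping.

```python
import csv
import io

def best_hits(blast_tabular, max_evalue=1e-5):
    """Step 1: keep the lowest e-value hit per probe.

    Assumes 12-column BLAST tabular output (query, subject, ..., evalue, bitscore).
    The e-value cutoff is an illustrative assumption, not a value from the poster.
    """
    best = {}
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if len(row) < 12 or row[0].startswith("#"):
            continue
        probe_id, accession, evalue = row[0], row[1], float(row[10])
        if evalue <= max_evalue and (probe_id not in best or evalue < best[probe_id][1]):
            best[probe_id] = (accession, evalue)
    return {probe: acc for probe, (acc, _) in best.items()}

def parse_obo_names(obo_handle):
    """Step 3: map GO IDs to GO names from the [Term] stanzas of an OBO file."""
    names, in_term, term_id = {}, False, None
    for line in obo_handle:
        line = line.strip()
        if line.startswith("["):
            in_term, term_id = (line == "[Term]"), None
        elif in_term and line.startswith("id: "):
            term_id = line[4:]
        elif in_term and line.startswith("name: ") and term_id:
            names[term_id] = line[6:]
    return names

def find_regulator_probes(probe_to_acc, acc_to_go, go_names, keywords):
    """Steps 2, 4, 5: attach GO names to probes and keep those matching a keyword."""
    regulators = {}
    for probe, acc in probe_to_acc.items():
        for go_id in acc_to_go.get(acc, ()):
            name = go_names.get(go_id, "")
            if any(keyword in name.lower() for keyword in keywords):
                regulators.setdefault(probe, []).append(name)
    return regulators

def blast_row(probe, accession, evalue):
    """Build one made-up 12-column tabular row for the example below."""
    return "\t".join([probe, accession, "80.0", "200", "40", "0",
                      "1", "200", "1", "200", evalue, "100"])

# Toy inputs; every ID and value here is invented for illustration.
blast = io.StringIO("\n".join([
    blast_row("probe_001", "ACC_A", "1e-50"),
    blast_row("probe_001", "ACC_B", "1e-20"),
    blast_row("probe_002", "ACC_C", "1e-30"),
    blast_row("probe_003", "ACC_D", "0.5"),
]) + "\n")
obo = io.StringIO(
    "format-version: 1.2\n\n"
    "[Term]\nid: GO:0003700\nname: transcription factor activity\n\n"
    "[Term]\nid: GO:0005618\nname: cell wall\n\n"
    "[Typedef]\nid: part_of\nname: part of\n"
)
acc_to_go = {"ACC_A": ["GO:0003700"], "ACC_C": ["GO:0005618"]}

hits = best_hits(blast)
go_names = parse_obo_names(obo)
regulators = find_regulator_probes(hits, acc_to_go, go_names, ("transcription factor",))
print(regulators)
```

Keeping only the best hit per probe is one possible design choice; retaining several hits per probe would widen the GO annotation at the cost of more false assignments.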
Tomato functional module identification
Step 6: Impute missing values in the gene expression dataset.
Step 7: Make the input expression dataset: convert absolute
values to log values (for gene and metabolite profiles),
select genes expressed in the introgression lines, and
merge the expression profiles.
Step 8: Make Genomica input file
8.1: Insert associated genes (SGNs) with symbols
(LEs) and sort.
8.2: Get symbols for the regulators.
8.3: Extract and add the expression data for the
regulators, add the associated symbols, and merge
them into the output file from step 8.1.
Figure 4. A heatmap representing the significant biological
functions of modules
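Steps 6 and 7 of the pipeline (imputation, log transformation, selection of expressed genes, and merging) could be sketched as follows. The poster does not say which imputation method, log base, or expression threshold was used, so the row-mean imputation, the base-2 logarithm, and the `min_signal` and `min_lines` parameters are all assumptions made for illustration, as are the gene, metabolite, and line names. The layout of the Genomica input file (Step 8) is not described in the poster and is not attempted here.

```python
import math

def impute_row_mean(profile):
    """Step 6 (assumed method): replace missing values (None) with the mean
    of the values observed for the same gene."""
    observed = [v for v in profile.values() if v is not None]
    if not observed:
        raise ValueError("cannot impute a profile with no observed values")
    mean = sum(observed) / len(observed)
    return {line: mean if v is None else v for line, v in profile.items()}

def log2_profile(profile):
    """Step 7: convert absolute values to log values (base 2 is an assumption).
    Values must be strictly positive."""
    return {line: math.log2(v) for line, v in profile.items()}

def build_input(gene_profiles, metabolite_profiles, min_signal=100.0, min_lines=2):
    """Step 7: keep genes whose signal reaches min_signal in at least min_lines
    introgression lines (hypothetical criterion), log-transform, and merge
    gene and metabolite profiles into one table keyed by feature name."""
    merged = {}
    for gene, profile in gene_profiles.items():
        filled = impute_row_mean(profile)
        if sum(v >= min_signal for v in filled.values()) >= min_lines:
            merged[gene] = log2_profile(filled)
    for metabolite, profile in metabolite_profiles.items():
        merged[metabolite] = log2_profile(profile)
    return merged

# Toy data; names and values are invented for illustration.
genes = {
    "gene_a": {"IL_1": 256.0, "IL_2": None, "IL_3": 1024.0},
    "gene_b": {"IL_1": 8.0, "IL_2": 16.0, "IL_3": 4.0},  # below threshold, dropped
}
metabolites = {"lycopene": {"IL_1": 4.0, "IL_2": 2.0, "IL_3": 8.0}}

merged = build_input(genes, metabolites)
print(sorted(merged))
```

Imputing before filtering means an imputed value can count toward the expression criterion; filtering on observed values only would be the stricter alternative.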
Figure 5. The regulator-module network highlights key
regulators that are linked to several different modules. Module
35 shares the pathogenesis-related transcription factor with
modules 4, 31, and 43. These modules need to be investigated
to determine whether they interact functionally. Modules 6 and 20
also share the TMV response-related gene product with
numerous other modules.
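The two downstream analyses summarized in Figs. 4 and 5 (GO term enrichment with an FDR cutoff, and regulators shared between modules) could be sketched as follows. The poster used a tool in the Tomato Functional Genomics Database and Cytoscape and does not describe the underlying statistics, so the hypergeometric test and the Benjamini-Hochberg adjustment here are assumed stand-ins, and all function names and term counts are hypothetical. The module-regulator links in the example are only the ones stated in the Figure 5 caption, not the full network.

```python
from collections import defaultdict
from math import comb

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when a module of n genes is drawn from N genes,
    K of which are annotated with the GO term."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)) / total

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def shared_regulators(module_regulators):
    """Group modules by regulator and keep regulators linked to more than one module."""
    modules = defaultdict(set)
    for module, regulators in module_regulators.items():
        for regulator in regulators:
            modules[regulator].add(module)
    return {reg: sorted(mods) for reg, mods in modules.items() if len(mods) > 1}

# Enrichment on invented counts: (annotated genes in module, annotated genes overall).
genome_size, module_size = 100, 10
term_counts = {"term_a": (5, 10), "term_b": (1, 10)}
pvalues = [hypergeom_upper_tail(k, module_size, K, genome_size)
           for k, K in term_counts.values()]
adjusted = benjamini_hochberg(pvalues)
significant = [term for term, q in zip(term_counts, adjusted) if q < 0.05]
print(significant)

# Regulator sharing, using only the links named in the Figure 5 caption.
links = {
    4: ["pathogenesis-related TF"],
    31: ["pathogenesis-related TF"],
    35: ["pathogenesis-related TF"],
    43: ["pathogenesis-related TF"],
    6: ["TMV response-related gene product"],
    20: ["TMV response-related gene product"],
}
shared = shared_regulators(links)
print(shared)
```

Regulators that map to many modules in such a table are the hubs one would expect to stand out in the network map.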
References
Segal E. et al. (2003) Module networks: identifying regulatory
modules and their condition-specific regulators from gene
expression data. Nat Genet 34: 166-176.
Acknowledgements
Thanks to BTI and Dr. Je Min Lee for the IL datasets and for helpful
comments.