Functional Genomics and Bioinformatics - People

Functional Genomics and
Bioinformatics
Applied to Understanding
Oxidative Stress Resistance
in Plants
Ruth Grene Alscher
Lenwood S. Heath
Virginia Tech
December 14, 2001
Overview
• Organization of our group
• About environmental stress and reactive oxygen
species (ROS)
• Plant responses to ROS
• Analysis of responses to stress on a chip: microarray technology
• Expresso: management system for microarrays
– Managing expression experiments
– Analyzing expression data
– Reaching conclusions
• Where do we go from here?
Senior Collaborators
Ron Sederoff (NCSU), Ruth Alscher, Carol Loopstra (Texas A and M), Lenny Heath, Naren Ramakrishnan, Boris Chevone
Students and other group members (VT, NCSU)
Len van Zyl, Jonathan Watkinson, Keying Ye, Margaret Ellis, Cecilia Vasquez, Dawei Chen, Logan Hanks
Iterative strategy for detection of stress-mediated effects on
gene expression using microarrays and CS expertise
1. Detection of stress-mediated gene expression effects on microarrays
2. Computational tools to infer interaction among genes, pathways
3. Test inferences with varying conditions and genotypes
4. Genetic regulatory networks; revised / new tools and experiments
[Cycle diagram; also labeled: Expresso]
Plant Response to Stress
• Plants adapt to changing environmental
conditions through global cellular responses
involving successive changes in, and
interactions among, expression patterns of
numerous genes.
• Our group studies these changes through a
combination of bioinformatics and genomic
techniques.
Long Term Goals
• Biological: To identify molecular stress
resistance mechanisms in tree and crop
species.
• Bioinformatic: To support iterative
experimentation in plant genomics, capture
and analyze experimental data, integrate
biological information from diverse sources,
and close the experimental loop.
The Paradox of Aerobiosis
• Oxygen is essential, but toxic.
• Aerobic cells face constant danger from
reactive oxygen species (ROS).
• ROS can act as mutagens, cause lipid
peroxidation, and denature proteins.
ROS Arise as a Result of
Exposure to:
• Ozone
• Sulfur dioxide
• High light
• Herbicides
• Extremes of temperature
• Salinity
• Drought
Free Radicals
Responses to Environmental Signals
Redox Regulation of Cellular Systems
[Diagram labels: Environmental Stress; Prooxidants (ROS); Membrane Receptors; Metabolite; Defense; Protein kinases, phosphatases; Antioxidants; Transcription factors; Gene Expression; Defense, Repair, Apoptosis]
Scenarios for Effects of Abiotic Stress
on Gene Expression in Plants
Drought Stress Responses in
Loblolly Pine: Questions to be
Addressed
• Can a hierarchy of drought stress resistance
mechanisms be identified?
• Can a clear distinction be made between rapidly
responding and long term adaptational
mechanisms?
• Can particular subgroups within gene families
be associated with drought tolerance?
Hypotheses
• There is a group of genes whose expression confers
resistance to drought stress.
• Based on previous work, increased expression of defense
genes is co-regulated and is correlated with resistance
to oxidative stress. Failure to cope is correlated with
little or no defense gene activation. Candidate resistance
genes follow this pattern of expression.
• A common core of defense genes exists, which responds
to several different stresses.
Components of Stress Study
• Pine drought stress experiments
• Expresso prototype
• Select pine cDNAs: 384 (1999), 2400 (2001)
• Design and print microarrays
• Design functional hierarchy
• Integrate and analyze
• Inductive logic programming (ILP)
• Capture spot intensities
Ψ = water potential (bars)
Imposition of Successive Cycles of Mild or Severe Drought Stress
on 1-year-old Loblolly Pine Seedlings
[Figure, two panels: water potential (bars; axis ticks 0, -2, -10, -15) versus days.
Cycles of Mild Drought Stress: water withheld and then given in four cycles (I to IV), each with an RNA harvest.
Cycles of Severe Drought Stress: water withheld and then given in three cycles (I to III), each with an RNA harvest; PS (photosynthesis) is also marked.]
Categories within Protective
and Protected Processes
[Diagram: functional hierarchy. Node labels: Gene Expression; Signal Transduction; Protease-associated; ROS and Stress; Environmental Change; Protective Processes; Nucleus; Cell Wall Related; Trafficking; Phenylpropanoid Pathway; Development; Protected Processes; Secretion; Cells; Cytoskeleton; Tissues; Plant Growth Regulation; Chloroplast Associated; Metabolism; Carbon Metabolism; Respiration and Nucleic Acids; Mitochondrion]
Categories within “Protective Processes”
[Diagram: functional hierarchy. Node labels: Stress (Abiotic, Biotic); Protective Processes; Cell Wall Related; “Isoflavone Reductases”; Antioxidant Processes; Phenylpropanoid Pathway; Drought (Dehydrins, Aquaporins); Heat; Non-Plant; Heat shock proteins (Chaperones); Xenobiotics; GSTs; Chaperones; NADPH/Ascorbate/Glutathione Scavenging Pathway; Sucrose Metabolism; Cellulose; Arabinogalactan proteins; Cytosolic ascorbate peroxidase; superoxide dismutase-Fe; superoxide dismutase-Cu-Zn; glutathione reductase; Extensins and proline rich proteins; Hemicellulose; Pectins; Xylose; Other Cell Wall Proteins; Lignin Biosynthesis; isoflavone reductases; phenylalanine ammonia-lyases; S-adenosylmethionine decarboxylases; glycine hydroxymethyltransferases; 4-coumarate-CoA ligases; CCoAOMTs; cinnamyl-alcohol dehydrogenase]
Hypotheses versus Results –
1999 Experiment
• Among the genes responding positively to mild
stress, there exists a population of genes
whose expression is negative or unchanged
under severe stress.
– Candidate stress resistance genes. Genes in 69
categories (e.g., HSP70s and HSP100s, aquaporins, but
not HSP80s) responded positively to mild stress.
The effect of severe stress was not detectable or negative.
Hypotheses versus Results –
1999 Experiment
• Genes associated with other stresses responded to
drought stress
– Isoflavone reductase homologs and GSTs responded
positively to mild drought stress.
– These categories were previously documented to respond to
biotic stress and xenobiotics, respectively.
– However, both isoflavone reductase homologs and GSTs
also responded positively to severe drought stress. Thus,
they do not fall into the category of candidate stress
resistance genes.
Candidate Categories
• Include
– Aquaporins
– Dehydrins
– Heat shock proteins/chaperones
• Exclude
– Isoflavone reductases
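The include/exclude split above follows the criterion stated for the 1999 experiment: a positive response under mild stress and a negative or unchanged response under severe stress. A minimal Python sketch of that filter, assuming each category has already been assigned a response class per comparison. The dictionary layout and the function name are illustrative, and the specific class values are stand-ins (the slides say only "not detectable or negative" for the severe comparison; the HSP80 severe value is a placeholder):

```python
# Response class per category for two comparisons (control vs. mild stress,
# control vs. severe stress). Values are illustrative stand-ins, not data.
responses = {
    "aquaporins": {"mild": "positive", "severe": "unchanged"},
    "HSP70s": {"mild": "positive", "severe": "unchanged"},
    "isoflavone reductases": {"mild": "positive", "severe": "positive"},
    "GSTs": {"mild": "positive", "severe": "positive"},
    "HSP80s": {"mild": "unchanged", "severe": "unchanged"},  # severe: placeholder
}


def candidate_categories(responses):
    """Keep categories that are positive under mild stress and
    negative or unchanged (i.e. not positive) under severe stress."""
    return sorted(
        name
        for name, r in responses.items()
        if r["mild"] == "positive" and r["severe"] != "positive"
    )


print(candidate_categories(responses))
```

With these stand-in values, isoflavone reductases and GSTs drop out because they stay positive under severe stress, and HSP80s drop out because they do not respond to mild stress.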
Flow of a Microarray Experiment
[Flow diagram. Boxes: Hypotheses; Select cDNAs; PCR; Replication and Randomization; Robotic Printing; Extract RNA; Reverse Transcription and Fluorescent Labeling; Hybridization; Identify Spots; Intensities; Statistics; Clustering; Data Mining, ILP; Test of Hypotheses]
Design of Microarrays I: Randomization
• Selected 384 archived ESTs
• Organized into four 96-well microtitre source
plates after PCR
• Pipetted into 8 sets of four randomized microtitre
plates
• Each set is a different randomized arrangement of
the 384 ESTs
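The plate layout above can be sketched in a few lines of Python. This is only an illustration under assumptions: the EST identifiers, the function name, and the use of a seeded pseudo-random shuffle are all hypothetical, and the slides do not say how the randomization was actually generated.

```python
import random

N_ESTS = 384
WELLS_PER_PLATE = 96
N_SETS = 8


def randomized_sets(est_ids, n_sets=N_SETS, seed=2001):
    """Return n_sets independent random arrangements of the ESTs,
    each split into 96-well plates (four plates for 384 ESTs)."""
    rng = random.Random(seed)  # fixed seed so the layout can be reproduced
    sets = []
    for _ in range(n_sets):
        order = list(est_ids)
        rng.shuffle(order)
        plates = [
            order[i:i + WELLS_PER_PLATE]
            for i in range(0, len(order), WELLS_PER_PLATE)
        ]
        sets.append(plates)
    return sets


est_ids = [f"EST{i:03d}" for i in range(1, N_ESTS + 1)]  # hypothetical names
sets = randomized_sets(est_ids)
print(len(sets), "sets of", len(sets[0]), "plates")
```

Each set contains every EST exactly once, so printing from four sets places four copies of each EST at unrelated positions.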
Design of Microarrays II: Replication
• Printed type A microarrays from first four sets (16
plates); printed type B microarrays from second
four sets
• Each array type has four replicates of each EST,
randomly placed
• Each comparison was performed with four different
hybridizations, with dyes reversed in two
• Total of 16 replicates of each EST in each
comparison
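The count of 16 follows from four replicates per array times four hybridizations per comparison. A small Python sketch that enumerates the measurements for one EST; which hybridizations carry the reversed dyes is an assumption made for illustration, and the tuple layout is hypothetical:

```python
from itertools import product

REPLICATES_PER_ARRAY = 4  # each array type carries four copies of each EST

# (hybridization number, dye orientation). Two of the four hybridizations
# have the dyes reversed; the choice of which two is assumed here.
HYBRIDIZATIONS = [
    (1, "forward"),
    (2, "forward"),
    (3, "reversed"),
    (4, "reversed"),
]


def measurements_for(est_id):
    """List every (EST, hybridization, dye orientation, replicate) tuple."""
    replicates = range(1, REPLICATES_PER_ARRAY + 1)
    return [
        (est_id, hyb, dye, rep)
        for (hyb, dye), rep in product(HYBRIDIZATIONS, replicates)
    ]


m = measurements_for("EST001")
print(len(m))  # 16
```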
Spot and Clone Analysis
• Image Analysis: gridding, spot identification,
intensity and background calculation,
normalization
• Statistics:
– Fold or ratio estimation
– Combining replicates
• Higher-level Analysis:
– Clustering methods
– Inductive logic programming (ILP)
Spot Identification and
Intensity Analysis
• Microarray Suite: Manual grid; extract intensities
for each spot; compute ratios; compute calibrated
ratios
• Spot Statistics:
– Every calibrated ratio is the uncalibrated ratio divided by
the mean of all the uncalibrated ratios; the result is simply
that the mean of the calibrated ratios is 1.0
– Our tools use the logarithm of each calibrated ratio
– Positive: expression increase
– Negative: expression decrease
– Zero: no change in expression
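The calibration and log steps above can be written out as a short Python sketch. Assumptions: intensities are already background-corrected, the ratio is treated over control, and the logarithm is base 2 (the slides do not state the base). The function names and the intensity values are hypothetical.

```python
import math


def calibrate(ratios):
    """Divide each uncalibrated ratio by the mean of all uncalibrated
    ratios, so that the calibrated ratios have mean 1.0."""
    mean_ratio = sum(ratios) / len(ratios)
    return [r / mean_ratio for r in ratios]


def log_calibrated(ratios):
    """Log (base 2, an assumption) of each calibrated ratio."""
    return [math.log2(c) for c in calibrate(ratios)]


# Hypothetical background-corrected intensities for three spots.
treated = [200.0, 100.0, 50.0]
control = [100.0, 100.0, 100.0]
ratios = [t / c for t, c in zip(treated, control)]

calibrated = calibrate(ratios)
logs = log_calibrated(ratios)
print(sum(calibrated) / len(calibrated))  # mean of calibrated ratios
print(logs)
```

A positive log value marks a spot whose calibrated ratio is above 1 (expression increase) and a negative value marks one below 1 (expression decrease).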
Analysis of Expression Data
• The multiple (typically 16) log calibrated ratios for
a replicated clone do NOT follow a normal
distribution.
• Distribution is spread relatively evenly over a large
range.
• Statistical analysis based on mean and standard
deviation will be overly pessimistic in identifying
clones that are up- or down-expressed.
• From the observation of an even spread of the log
ratios, we assume that a clone whose expression
does not differ between the two samples of a probe pair
will show a distribution centered at a mean log ratio of 0.0.
Computational Methods: Alternate Assumptions
• Our more general assumption avoids the trap of
having to classify the response of each SPOT; rather,
we classify the response of an EST as one of
– Up-regulated
– Down-regulated
– No clear change
• Response CLASSIFICATION rather than
QUANTIFICATION allows us to develop unified
relationships among genes and among treatments.
• Provides sufficient results for the use of inductive logic
programming (ILP).
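The slides do not say which procedure assigns an EST to one of the three classes. As one possible illustration only, the Python sketch below uses an exact two-sided sign test, a swapped-in technique chosen because it relies solely on the assumption that an unchanged clone has log ratios centered at 0.0 and not on normality. The function names and the 0.05 threshold are assumptions.

```python
from math import comb


def sign_test_p(n_pos, n_neg):
    """Exact two-sided sign test p-value under the null hypothesis that
    positive and negative log ratios are equally likely."""
    n = n_pos + n_neg
    if n == 0:
        return 1.0
    k = max(n_pos, n_neg)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def classify_est(log_ratios, alpha=0.05):
    """Classify one EST from its replicate log calibrated ratios."""
    n_pos = sum(1 for x in log_ratios if x > 0)
    n_neg = sum(1 for x in log_ratios if x < 0)
    if sign_test_p(n_pos, n_neg) >= alpha:
        return "no clear change"
    return "up-regulated" if n_pos > n_neg else "down-regulated"


# Hypothetical replicate sets of 16 log ratios each.
print(classify_est([0.5] * 14 + [-0.2] * 2))
print(classify_est([0.3, -0.3] * 8))
```

Because only the sign of each replicate matters, a few extreme spots cannot dominate the call, which suits the wide, even spread of log ratios described above.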
Data Mining:
Inductive Logic
Programming
• ILP is a data mining algorithm expressly designed
for inferring relationships.
• By expressing relationships as rules, it provides
new information and resultant testable
hypotheses.
• ILP groups related data and chooses in favor of
relationships having short descriptions.
• ILP can also flexibly incorporate a priori biological
knowledge (e.g., categories and alternate
classifications).
Rule Inference in ILP
• Infers rules relating gene expression levels to
categories, both within a probe pair and across
probe pairs, without explicit direction
• Example Rule:
[Rule 142] [Pos cover = 69 Neg cover = 3]
level(A,moist_vs_severe,not positive) :-
level(A,moist_vs_mild,positive).
• Interpretation:
“If the moist versus mild stress comparison was
positive for some clone named A, it was
negative or unchanged in the moist versus
severe comparison for A, with a confidence of
95.8%.”
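The quoted confidence is consistent with reading it as positive cover divided by total cover, 69 / (69 + 3). A minimal Python sketch under that assumption; the function names and the three-clone toy table are hypothetical:

```python
def rule_confidence(pos_cover, neg_cover):
    """Fraction of clones matching the rule body for which the head holds."""
    return pos_cover / (pos_cover + neg_cover)


def cover(levels):
    """Count positive and negative cover for the example rule.

    levels maps clone -> {"moist_vs_mild": cls, "moist_vs_severe": cls}.
    Body: moist_vs_mild is positive. Head: moist_vs_severe is not positive.
    """
    pos = neg = 0
    for cls in levels.values():
        if cls["moist_vs_mild"] == "positive":
            if cls["moist_vs_severe"] != "positive":
                pos += 1
            else:
                neg += 1
    return pos, neg


print(f"{rule_confidence(69, 3):.1%}")  # 95.8%

# Hypothetical toy table of classified clones.
toy = {
    "c1": {"moist_vs_mild": "positive", "moist_vs_severe": "unchanged"},
    "c2": {"moist_vs_mild": "positive", "moist_vs_severe": "positive"},
    "c3": {"moist_vs_mild": "negative", "moist_vs_severe": "negative"},
}
print(cover(toy))  # (1, 1)
```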
ILP subsumes
two forms of reasoning
• Unsupervised learning
– “Find clusters of genes that have similar/consistent
expression patterns”
• Supervised learning
– “Find a relationship between a priori functional
categories and gene expression”
• Hybrid reasoning: Information Integration
– “Is there a relationship between genes in a given
functional category and genes in a particular
expression cluster?”
– ILP mines this information in a single step
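The hybrid question above, whether genes in a functional category line up with genes in an expression class, can be illustrated with a plain cross-tabulation. This Python sketch is not ILP; it only shows the kind of relationship a rule would summarize. The records and function names are hypothetical.

```python
from collections import Counter

# Hypothetical toy records: (clone, functional category, expression class).
records = [
    ("c1", "aquaporin", "up"),
    ("c2", "aquaporin", "up"),
    ("c3", "HSP80", "no change"),
    ("c4", "GST", "up"),
    ("c5", "HSP80", "no change"),
]


def category_by_class(records):
    """Count clones for each (category, expression class) pair."""
    return Counter((cat, cls) for _, cat, cls in records)


def fraction_in_class(records, category, cls):
    """Fraction of a category's clones that fall in one expression class."""
    in_cat = [r for r in records if r[1] == category]
    if not in_cat:
        return 0.0
    return sum(1 for r in in_cat if r[2] == cls) / len(in_cat)


print(category_by_class(records))
print(fraction_in_class(records, "aquaporin", "up"))
```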
NSF-Supported Work of 2001:
Expresso Progress to Date
Margaret Ellis and Logan Hanks (computer
science graduate students):
• MEL: Semistructured data model for
experiment capture
• Parsing: Automatic parser generators to
drive archival storage
• Database: Loading and cataloging MEL data
in a Postgres RDBMS
• Pipeline: Linkages to data analysis and data
mining software
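As a rough illustration of loading captured data into a relational store and querying it for analysis, the Python sketch below uses the standard-library sqlite3 module as a stand-in for Postgres. The schema, table names, column names, and values are all hypothetical; the actual MEL data model is not described in these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the real RDBMS
conn.executescript(
    """
    CREATE TABLE clone (clone_id TEXT PRIMARY KEY, category TEXT);
    CREATE TABLE spot (
        spot_id INTEGER PRIMARY KEY,
        clone_id TEXT REFERENCES clone(clone_id),
        comparison TEXT,
        log_ratio REAL
    );
    """
)

# Hypothetical rows standing in for captured experiment data.
conn.executemany(
    "INSERT INTO clone VALUES (?, ?)",
    [("c1", "aquaporin"), ("c2", "HSP80")],
)
conn.executemany(
    "INSERT INTO spot (clone_id, comparison, log_ratio) VALUES (?, ?, ?)",
    [
        ("c1", "control_vs_mild", 0.9),
        ("c1", "control_vs_mild", 1.1),
        ("c2", "control_vs_mild", 0.0),
    ],
)

# Summarize replicate spots per functional category for one comparison.
rows = conn.execute(
    """
    SELECT c.category, COUNT(*) AS n_spots, AVG(s.log_ratio) AS mean_log_ratio
    FROM spot s JOIN clone c ON c.clone_id = s.clone_id
    WHERE s.comparison = 'control_vs_mild'
    GROUP BY c.category
    ORDER BY c.category
    """
).fetchall()
print(rows)
```

A query like this is the sort of linkage the pipeline bullet describes: the database hands grouped measurements to the statistics and data mining steps.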
[Diagram labels: Differential Expression; Replication]
Final Harvest; Control versus Mild Stress; 2001
[Images: Cy3 TIFF image and Cy5 TIFF image]
Final Harvest; Control versus Mild Stress; 2001
Cy5 to Cy3 ratios. Final harvest after four drought
cycles. RNA harvested 24 hours after final watering.
Cy5 = treated; Cy3 = control.
Aquaporins responded positively. HSP80s were
unaffected (same as in the 1999 results).
Drought Stress Responses in
Loblolly Pine: Questions to be
Addressed
• Can a hierarchy of drought stress resistance
mechanisms be identified?
• Can a clear distinction be made between rapidly
responding and long term adaptational
mechanisms?
• Can particular subgroups within gene families
be associated with drought tolerance?
Proposed Project: 2002-2005
Plant Biology (with co-PIs: Ron Sederoff,
NCSU; Carol Loopstra, TAMU)
• An investigation of drought stress responses in
loblolly pine in a variety of provenances.
• Quantitative RT-PCR to confirm and expand
results obtained with microarrays.
• In situ hybridization to stressed and
unstressed cell and tissue types.
Proposed Project: 2002-2005
Sources of cDNAs for 2002-2005 arrays
• NCSU ESTs selected on the basis of function.
• Stressed cDNA libraries from roots and stems
of drought tolerant families from East Texas
and Lost Pines, and from the Atlantic Coastal
Plain (humid conditions).
• Homologs of drought-responsive Arabidopsis
genes.
Drought Stress Responses in
Loblolly Pine: Future
Bioinformatics Goals
• Support incorporation of biological information
in the form of functional hierarchies and gene
families.
• Close the computational and experimental loop
to support iterative experimental regimes.
• Integrate information from multiple experiments
involving multiple provenances, drought stresses,
and EST sets.
Gene Discovery in the Arabidopsis Transcriptome
[Flow diagram. Boxes: Drought Stress (short and long term); Hybridize to Arabidopsis Transcriptome; Scanning, Image Processing; Data Capture; Postgres Database; Database Queries; Statistical Analysis and Clustering; Data Mining, ILP; Possible Identification of Novel Drought Responsive Genes in Arabidopsis]
Identification of Drought Responsive Genes and
Pathways Across Provenances in Loblolly Pine
[Flow diagram. Boxes: Arabidopsis Drought Responsive Genes; Select Pine cDNAs Via Contigs; Robotic Replication and Printing; Drought Stress Experiments on NC, TX Pine; Hybridization; Scanning, Image Processing; Data Capture; Postgres Database; Database Queries; Statistical Analysis and Clustering; Data Mining, ILP; Identification of Drought Responsive Pine Genes; Close The Loop]
Proposed Project: 2002-2005
Bioinformatics I (Alscher, Heath,
Ramakrishnan)
• Constraint-based selection of cDNAs,
including intelligent use of contigs.
• Assignment of pine ESTs to subgroups within
protein families (ProDom, Pfam).
• Extend information integration in ILP to
include Mendel classification of gene families.
• Integrating data across provenances and
known degrees of drought tolerance.
Proposed Project: 2002-2005
Bioinformatics II (Ramakrishnan, Heath)
• Specialize ILP for particular biological
information sources.
• Automatic tuning of ILP parameters.
• Pushing data mining functionality into the
database.
• Interleaving and iteration of query, data
analysis, and data mining operations.