Microarray technique and Functional genomics
Microarray and functional
genomics
Wenjing Tao
University of Missouri
Microarray: a high-throughput
whole-genome approach
48 grids,
with about 31k
probes in total
Each grid contains 650
probes
A microarray is a tool for analyzing gene
expression. It consists of a small
membrane or glass slide containing
samples of many genes arranged in a
regular pattern
Microarray terminology
• Feature - an array element
• Probe - a feature corresponding to a defined
sequence (immobilized on a solid surface in
an ordered array)
• Target - a pool of nucleic acids of unknown
sequence
Microarrays provide opportunities to:
- Find the genes and assign them functions
- Predict protein structures and functions
- Reconstruct metabolic, signaling, and other pathways
- Reconstruct informational networks
- Link genotype to phenotype
- Use genotype/phenotype to predict relevant outcome
- Make cross-species comparisons
Kinds of array features
Synthetic oligonucleotides:
Affymetrix genechip
Long oligo array
PCR products from:
Cloned cDNAs
Genomic DNA
cDNA & oligonucleotide arrays
100-300 µm spot
20-25 mers
Schulze and Downward, 2001 Nat Cell Biol 3, 190
cDNA and long oligo array experiment
Target preparation: RNA (Target 1, Target 2) → RT → labeling with fluorescent dye → hybridization → scan
Probe preparation: ORFs or ESTs → design long oligos → microtiter plate → microarray slides
Affymetrix
GeneChip
RED represents Test DNA
hybridized to the target DNA.
GREEN represents Reference
DNA hybridized to the target
DNA.
YELLOW represents a
combination of Test and
Reference DNA hybridized
equally to the target DNA.
BLACK represents areas
where neither the Reference
nor Test DNA hybridized to
the target DNA.
A fluorescent microarray image is a composite of
two false-color, laser-scanned images
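The color key above can be sketched as a small classifier on the two channel intensities. This is a minimal illustration only: the function name `spot_color`, the background cutoff of 200, and the 2-fold ratio threshold are assumptions chosen for the example, not values taken from the scanner software.

```python
def spot_color(test, reference, background=200.0, fold=2.0):
    """Return the false color a spot would show in the merged image.

    test / reference: raw intensities of the two channels.
    background and fold are illustrative thresholds (assumptions).
    """
    test_on = test > background
    ref_on = reference > background
    if not test_on and not ref_on:
        return "black"   # neither sample hybridized
    if test_on and not ref_on:
        return "red"     # only the test sample hybridized
    if ref_on and not test_on:
        return "green"   # only the reference sample hybridized
    ratio = test / reference
    if ratio >= fold:
        return "red"     # test clearly dominates
    if ratio <= 1.0 / fold:
        return "green"   # reference clearly dominates
    return "yellow"      # roughly equal hybridization
```

For example, `spot_color(3000, 3200)` falls in the "roughly equal" band, while `spot_color(50, 60)` is below background in both channels.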
Image file post-processing
• Single slide normalization – GenePix Pro 4.1
• Slide-slide and dye-swap comparison – TMEV & MIDAS
• Cross-slides quality evaluation - GeneSpring + R script
for CV filter
• Mixed linear model analysis of variance (ANOVA) to identify
significantly differentially expressed genes – R or SAS
program
• Data analysis in the post-genomic era (gene
annotation, ontology and pathway analysis) – KOG, COG,
KEGG, TAIR, Onto-Tools, GenMAPP…
• Data validation – qPCR or Northern blot
Whole genome approaches to
biological questions
• Gene expression
• Gene variation
• Gene function
Functional Genomics of Root Growth and
Root Signaling under drought
NSF-DBI-0211842, PI: Henry Nguyen
http://rootgenomics.missouri.edu/prgc/research.html
Drought-stress inducible genes and their possible functions
in stress tolerance and response.
Yamaguchi-Shinozaki et al. JIRCAS Working Report, 2002
Characterize the transcript profiles of the
apical and basal regions of the root
growth zone under water-deficit conditions
using maize long oligonucleotide arrays
Dr. Henry Nguyen’s lab, Plant Sciences,
University of Missouri
Objectives
To identify genes contributing to root growth
maintenance under water-deficit conditions
To determine genes responsible for progressive
inhibition of root elongation under water-deficit
conditions
To compare differential gene expression in the root
region showing progressive inhibition of elongation
under water stress with that in the well-watered root
region showing normal growth deceleration
Pair-wise comparison of maize root
segments using oligo array
[Figure: pairwise comparison design of numbered root segments; panel labels WW48 and WS48]
Characterization of the maize long
oligo array
• The maize oligo array, printed at the University
of Arizona, contains 56,311 70-mer
oligonucleotide probes representing >30,000
identifiable unique maize genes. 16,915
oligos have no annotation.
• The 70-mer oligonucleotides were designed in conjunction with
Operon/Qiagen, based on the TIGR Maize
Database
Slides feature and dye-swap experiment
WS/WW=Cy5/Cy3
WS/WW=Cy3/Cy5
Dye Swap
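A dye-swap pair can be combined so that a dye bias shared by both slides cancels. The sketch below assumes slide A has WS in Cy5 and WW in Cy3, slide B has the dyes swapped, and the dye bias adds the same amount to log2(Cy5/Cy3) on both slides. The function names are hypothetical, and this is the standard averaging of swapped log ratios rather than a confirmed step of the lab's pipeline.

```python
import math


def log_ratio(cy5, cy3):
    """log2(Cy5 / Cy3) for one spot on one slide."""
    return math.log2(cy5 / cy3)


def dye_swap_average(slide_a, slide_b):
    """Combine one spot from a dye-swap pair into a single log2(WS/WW).

    slide_a: (cy5, cy3) intensities with WS labeled Cy5, WW labeled Cy3.
    slide_b: (cy5, cy3) intensities with WS labeled Cy3, WW labeled Cy5.
    A bias added to log2(Cy5/Cy3) on both slides cancels in the difference.
    """
    m_a = log_ratio(*slide_a)   # log2(WS/WW) + dye bias
    m_b = log_ratio(*slide_b)   # log2(WW/WS) + dye bias
    return (m_a - m_b) / 2.0
```

With a true 4-fold induction and a 2-fold Cy5 bias, `dye_swap_average((8000, 1000), (2000, 4000))` recovers a log2 ratio of 2.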
Two-color microarray data features
1. Channel A intensity vs. channel B
intensity
2. Log channel A intensity vs. log
channel B intensity
3. R-I plot
4. Z-score histogram
5. Box plot
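The quantities behind the R-I plot and the z-score histogram can be computed per spot as follows. This sketch assumes the common convention R = log2(A/B) and I = log10(A×B), with z-scores taken over the log ratios of one slide; the function names are hypothetical.

```python
import math
import statistics


def ri_values(a, b):
    """Return (R, I) for one spot: R = log2(A/B), I = log10(A*B)."""
    return math.log2(a / b), math.log10(a * b)


def z_scores(values):
    """Standardize values: subtract the mean, divide by the sample SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Example: log ratios for four spots, then their z-scores.
spots = [(1000, 1000), (4000, 1000), (500, 2000), (3000, 1500)]
ratios = [ri_values(a, b)[0] for a, b in spots]
z = z_scores(ratios)
```

Spots with a large absolute z-score sit in the tails of the histogram and are the candidates for differential expression on that slide.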
Flip dye consistency checking
- processed data count: 27852 (only slides A)
- pre-filtering corr. coeff: 0.11360581
- post-filtering data count: 26747
- confidence factor: 0.9647781
- dispersion factor: 0.035401408
Summary of the evaluation of replicates
(technical & biological)
• ~50,000 of the 56,311 probes have intensity >200
(in at least one channel).
• Confidence of dye-swap is > 96%
• A 99.9% confidence limit was estimated from
the coefficient of variation (CV) across replicates
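A CV filter on replicate measurements can be sketched as below. The cutoff `max_cv=0.3` and the function names are assumptions for illustration; the slides do not state the threshold used in the R script.

```python
import statistics


def coefficient_of_variation(values):
    """CV = sample standard deviation / mean (values assumed positive)."""
    return statistics.stdev(values) / statistics.fmean(values)


def cv_filter(replicates, max_cv=0.3):
    """Keep genes whose replicate intensities have CV <= max_cv.

    replicates: dict mapping gene id -> list of replicate intensities.
    max_cv is an illustrative cutoff, not the value used in the study.
    """
    return {
        gene: values
        for gene, values in replicates.items()
        if coefficient_of_variation(values) <= max_cv
    }


# Example: g1 is reproducible, g2 is not.
data = {"g1": [1000, 1100, 950], "g2": [200, 900, 50]}
kept = cv_filter(data)
```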
Mixed linear model analysis of two-color microarray
data: producing lists of differentially expressed
genes with low false discovery rates
To obtain accurate and precise estimates of gene expression differences between
treatment and control, and to analyze gene effects while considering all
blocking factors simultaneously, a linear mixed ANOVA model is
applied in two steps:
First, a global mixed model is fitted across all genes:
log2(signal values) = treat + dye + treat*dye + tech_reps_effect +
array_effect (within treat*dye and tech_reps_effect)
Second, the residuals from the first model are fitted with a
gene-specific model, one gene at a time:
Residuals = treat + dye + tech_reps_effect + array_effect (within
tech_reps_effect)
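The two-step idea can be illustrated with a deliberately simplified, fixed-effects sketch. It is not the mixed model fitted in R or SAS: there are no random effects and no significance tests. Step 1 subtracts the mean of each array-by-dye channel, which in a two-color design absorbs the global treat, dye, treat*dye and array terms. Step 2 estimates each gene's treatment effect from the residuals. The record layout and function names are assumptions.

```python
from collections import defaultdict
from statistics import fmean


def stage1_residuals(records):
    """Global step: remove the mean of every (array, dye) channel.

    records: list of (gene, array, dye, treat, log2_signal) tuples.
    Each channel carries one treatment, so its mean absorbs the global
    treat, dye, treat*dye and array effects (fixed-effects simplification).
    """
    by_channel = defaultdict(list)
    for _gene, array, dye, _treat, y in records:
        by_channel[(array, dye)].append(y)
    channel_mean = {key: fmean(vals) for key, vals in by_channel.items()}
    return [
        (gene, array, dye, treat, y - channel_mean[(array, dye)])
        for gene, array, dye, treat, y in records
    ]


def stage2_treatment_effects(residuals, test="WS", control="WW"):
    """Gene step: mean residual under test minus mean under control.

    Effects are relative to the array-wide average change, because the
    global step already removed the overall treatment difference.
    """
    by_gene = defaultdict(lambda: defaultdict(list))
    for gene, _array, _dye, treat, r in residuals:
        by_gene[gene][treat].append(r)
    return {
        gene: fmean(groups[test]) - fmean(groups[control])
        for gene, groups in by_gene.items()
    }
```

In a balanced dye-swap design each treatment appears once in each dye, so a gene-specific dye effect cancels in the step-2 difference. A full analysis would fit the gene-level model with random array effects and control the false discovery rate across genes.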
Gene function categorization of
significantly differentially expressed
genes
CELLULAR PROCESSES AND SIGNALING - 10%
INFORMATION STORAGE AND PROCESSING - 4%
METABOLISM - 11%
POORLY CHARACTERIZED - 6%
NOT ASSIGNED - 69%
KOG analysis
INFORMATION STORAGE AND PROCESSING
• Replication, recombination and repair - 14%
• RNA processing and modification - 16%
• Chromatin structure and dynamics - 13%
• Transcription - 34%
• Translation, ribosomal structure and biogenesis - 23%
METABOLISM
• Secondary metabolites biosynthesis, transport and catabolism - 14%
• Energy production and conversion - 15%
• Amino acid transport and metabolism - 14%
• Inorganic ion transport and metabolism - 16%
• Lipid transport and metabolism - 14%
• Coenzyme transport and metabolism - 4%
• Nucleotide transport and metabolism - 5%
• Carbohydrate transport and metabolism - 18%
CELLULAR PROCESSES AND SIGNALING
• Cytoskeleton - 6%
• Nuclear structure - 1%
• Extracellular structures - 1%
• Defense mechanisms - 8%
• Cell cycle control, cell division, chromosome partitioning - 3%
• Cell wall/membrane/envelope biogenesis - 9%
• Cell motility - 0%
• Intracellular trafficking, secretion, and vesicular transport - 11%
• Posttranslational modification, protein turnover, chaperones - 30%
• Signal transduction mechanisms - 31%
Summary
• The microarray is a high-throughput tool for
identifying novel genes
• We have identified about 1,900 genes related to
drought response and root growth
maintenance
• Combined with functional analysis, these results
should reveal pathways and genes related to
drought stress tolerance
• This knowledge will lead to novel
approaches for improving drought tolerance
in maize.