Uses of microarrays and
related methodologies in
animal breeding
Bruce Walsh, [email protected]
University of Arizona
(Depts. of Ecology & Evolutionary Biology, Molecular & Cellular Biology,
Plant Sciences, Animal Sciences, and Epidemiology & Biostatistics)
The basic idea behind gene
expression arrays
• With a complete (or partial) genome
sequence in hand, one can array sequences
from genes of interest on a small chip, glass
slide, or a membrane
• mRNA is extracted from cells of interest
and hybridized to the array
• Genes showing different levels of mRNA
can be detected
Types of microarrays
• Synthetic oligonucleotide arrays
– Chemically synthesize oligonucleotide sequences
directly on slide/chip/membrane (e.g., using
photolithography)
– Affymetrix, Agilent
• Spotted cDNA arrays
– PCR products from clones of genes of interest
are spotted on a glass slide using a robot
– Extracted cellular mRNA is reverse-transcribed into cDNAs for hybridization
[Figure: two-color array hybridization of cell type 1 vs. cell type 2]
• Extract mRNA from each cell type; label one with red fluorescent dye (Cy5)
and the other with green fluorescent dye (Cy3)
• Hybridize mRNA (an equal mix from cell types 1 and 2) to the array
• The color of the spot corresponds to the relative concentrations
of mRNAs for that gene in the two cell types
– mRNAs from these genes more abundant in cell type 1
– mRNAs from these genes more abundant in cell type 2
– mRNAs for these genes of roughly equal abundance in both cell types
Analysis of microarray data
• Image processing and normalization
• Detecting significant changes in expression
• Clustering and classification
– Clustering: detecting groups of co-expressed
genes
– Classification: finding those genes at which
changes in mRNA expression level predicts
phenotype
Significance testing: GLM
$$Y_{ijkl} = \mu + A_k + R_{kl} + T_i + G_j + TG_{ij} + e_{ijkl}$$
• $Y_{ijkl}$: spotting of gene j under treatment i on replicate l of array k
• $A_k$: array k
• $R_{kl}$: replicate l in array k
• $T_i$: treatment i
• $G_j$: gene j
• $TG_{ij}$: interaction between gene j and treatment i
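As a rough illustration of the gene-by-treatment term in the model above, the sketch below estimates the interaction from cell means in a balanced layout. The data, effect sizes, and the function name `gene_by_treatment_effects` are hypothetical, and the cell-mean estimator assumes a fully balanced design with sum-to-zero effects; real array data would need a proper linear-model fit.

```python
import numpy as np

def gene_by_treatment_effects(y):
    """Estimate the interaction TG_ij from balanced data y[i, j, k, l]
    (treatment i, gene j, array k, replicate l)."""
    cell = y.mean(axis=(2, 3))                # mean of each treatment-gene cell
    treat = cell.mean(axis=1, keepdims=True)  # treatment means
    gene = cell.mean(axis=0, keepdims=True)   # gene means
    return cell - treat - gene + cell.mean()

# Hypothetical noise-free data: 2 treatments, 3 genes, 2 arrays, 2 replicates.
mu = 8.0
T = np.array([0.5, -0.5])                     # treatment effects
G = np.array([1.0, 0.0, -1.0])                # gene effects
TG = np.array([[0.3, -0.3, 0.0],
               [-0.3, 0.3, 0.0]])             # gene-by-treatment interaction
A = np.array([0.2, -0.2])                     # array effects
R = np.array([[0.1, -0.1],
              [0.05, -0.05]])                 # replicate-within-array effects

y = (mu
     + T[:, None, None, None]
     + G[None, :, None, None]
     + TG[:, :, None, None]
     + A[None, None, :, None]
     + R[None, None, :, :])

tg_hat = gene_by_treatment_effects(y)
print(np.round(tg_hat, 3))
```

In this toy layout the array and replicate effects average out of every cell equally, so the estimate recovers the simulated interaction; genes with a large estimated interaction are the ones flagged as responding to treatment.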
Problem of very many tests (genes)
vs. few actual data vectors
• Expectation: A large number of the GxT
interactions will be significant
– Controlling the experiment-wide p value is overly
conservative (further, tests may be
strongly correlated)
• Generating a reduced set of genes for
future consideration (data mining)
– FDR (false discovery rate)
– PFP (proportion of false positives)
– Empirical Bayes approaches
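To make the contrast between experiment-wide control and FDR concrete, here is a minimal sketch of one common FDR procedure, the Benjamini-Hochberg step-up rule. The slides do not say which FDR method is intended, so the choice of procedure, the function name, and the p values are all assumptions for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return indices of tests declared significant at FDR level q
    using the Benjamini-Hochberg step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Keep the largest rank whose p value is below its threshold.
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical per-gene p values for the gene-by-treatment test.
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

fdr_hits = benjamini_hochberg(p, q=0.05)
bonferroni_hits = [i for i, pv in enumerate(p) if pv <= 0.05 / len(p)]
print(fdr_hits, bonferroni_hits)
```

With these made-up values the FDR rule retains two genes while the Bonferroni cutoff retains one, which is the sense in which experiment-wide control is the more conservative filter for a data-mining step.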
Which loci control array-detected
changes in mRNA expression?
• Cis-acting factors
– Control regions immediately adjacent to the
gene
• Trans-acting factors
– Diffusible factors unlinked (or loosely linked)
to the gene of interest
• Global (Master) regulators
– Trans-acting factors that influence a large
number of genes
David Threadgill’s (UNC) mouse
experiment
• Recombinant Inbred lines from a
cross of DBA/2J and C57BL
• The level of mRNA expression
(measured by array analysis) is
treated as a quantitative trait and
QTL analysis performed for each
gene in the array
[Figure: distribution of >12,000 gene interactions. Axes: genomic location of genes on array vs. genomic location of mRNA level modifiers. Labeled features: CIS-modifiers, TRANS-modifiers, MASTER modifiers]
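The cis / trans / master distinction can be sketched as a simple rule on where a gene sits versus where its expression modifier maps. Everything below is hypothetical: the 5 Mb window, the threshold of three target genes for a "master" candidate, the function name, and the locations are illustrative assumptions, not values from the experiment.

```python
from collections import Counter

def classify_modifier(gene_loc, qtl_loc, window_mb=5.0):
    """Label a mapped expression modifier as 'cis' or 'trans'.
    Locations are (chromosome, position in Mb) pairs."""
    gene_chr, gene_pos = gene_loc
    qtl_chr, qtl_pos = qtl_loc
    if gene_chr == qtl_chr and abs(gene_pos - qtl_pos) <= window_mb:
        return "cis"
    return "trans"

# Hypothetical (gene location, modifier location) pairs.
hits = [
    ((1, 20.0), (1, 21.5)),   # modifier maps next to the gene
    ((2, 50.0), (7, 10.0)),
    ((3, 80.0), (7, 10.0)),
    ((4, 15.0), (7, 10.0)),
    ((5, 33.0), (5, 90.0)),   # same chromosome, but far from the gene
]

labels = [classify_modifier(gene, qtl) for gene, qtl in hits]

# A modifier location acting in trans on many genes is a master candidate.
trans_counts = Counter(qtl for (gene, qtl), lab in zip(hits, labels)
                       if lab == "trans")
master_candidates = [loc for loc, n in trans_counts.items() if n >= 3]
print(labels, master_candidates)
```

In the figure's terms, cis hits fall near the diagonal, trans hits fall off it, and master candidates show up as a band at one modifier location.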
Candidate loci: Differences in
Gene Expression between lines
• Correlate differences in levels of
expression with trait levels
• Map factors underlying changes in
expression
– These are (very) often trans-acting factors
• Difference between structural alleles and
regulatory alleles
– Different structural alleles may go undetected
on an array analysis
Expanded selection opportunities
offered by microarrays
• GxE
– Candidate genes may be suggested by examining
levels of mRNA expression over different
major environments
– With candidates in hand, potential for
selection of genes showing reduced variance in
expression over critical environments
• Breaking (or at least reducing) potentially
deleterious genetic correlations
– Look for variation in genes that have little (if
any) trans-acting effects on other genes
Towards the future
• Selection decisions using information on
gene networks / pathways
• Microarrays are one tool for
reconstructing gene networks
• Additional tools for examining protein-protein interactions
– Two hybrid screens
– FRET & FRAP
– 2D Protein gels
Analysis and Exploitation of
Gene and Metabolic Networks
• Graph theory
• Most estimation and statistical issues
unresolved
• Major (current) analytic tool:
Kacser-Burns Sensitivity Analysis
Gene networks are graphs
Kacser-Burns Sensitivity Analysis
(a.k.a. Metabolic Control Analysis)
“All models are wrong, some are useful” (Box)
“No theory should fit all the facts because some of the facts are wrong” (N. Bohr)
Flux = production rate of a particular product, F
[Figure: pathway diagram]
How best to increase the flux through this pathway?
Perhaps we increase the concentration of e1 here
However, it may be more efficient to increase the concentration of e4
The flux control coefficient, introduced by Kacser and Burns, provides a quantitative solution to this problem
Flux Control Coefficients, $C_i^j$
The control coefficient for the flux at step i in
a pathway associated with enzyme j,
$$C_i^j = \frac{\partial F_i}{\partial E_j}\,\frac{E_j}{F_i} = \frac{\partial \ln F_i}{\partial \ln E_j}$$
Roughly speaking, the control coefficient is
the percentage change in flux divided by the
percentage change in enzyme activity.
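The definition can be checked numerically as a log-log slope. The sketch below uses a toy flux function that is purely an assumption: a linear pathway in which each step contributes a "resistance" of 1/e, so flux is the reciprocal of the summed resistances. The function names and enzyme activities are hypothetical.

```python
import math

def flux(enzymes):
    """Toy linear pathway (an assumption): flux is the reciprocal of
    the summed per-step resistances 1/e."""
    return 1.0 / sum(1.0 / e for e in enzymes)

def control_coefficient(enzymes, j, h=1e-6):
    """Central-difference estimate of d ln F / d ln E_j."""
    up = list(enzymes)
    down = list(enzymes)
    up[j] *= math.exp(h)      # shift ln E_j by +h
    down[j] *= math.exp(-h)   # shift ln E_j by -h
    return (math.log(flux(up)) - math.log(flux(down))) / (2 * h)

enzymes = [1.0, 2.0, 4.0]     # hypothetical activities
coeffs = [control_coefficient(enzymes, j) for j in range(len(enzymes))]
print([round(c, 4) for c in coeffs])
```

In this toy pathway the least active enzyme carries the largest coefficient: a 1% increase in its activity changes flux by roughly 0.57%, against roughly 0.14% for the most active enzyme.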
[Figure: flux as a function of enzyme activity]
• When the activity of E is near zero, C is close to 1
• When the activity of E is large, C is close to zero
• Why many mutations are recessive: a 50% reduction in activity (the heterozygote)
results in only a very small change in the flux
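A small numerical sketch of the recessivity argument, under an assumed toy model: a linear pathway of equally active steps where flux is the reciprocal of the summed per-step resistances 1/e. The model, the ten-step pathway, and the function name are assumptions for illustration only.

```python
def relative_flux(n_steps, activity_fraction):
    """Flux relative to wild type when one step of an n-step toy pathway
    retains only the given fraction of its activity (others stay at 1)."""
    wild_type = 1.0 / n_steps
    mutant = 1.0 / ((n_steps - 1) + 1.0 / activity_fraction)
    return mutant / wild_type

het = relative_flux(10, 0.5)    # heterozygote: 50% activity at one step
hom = relative_flux(10, 0.01)   # near-null homozygote: 1% activity
print(round(het, 3), round(hom, 3))
```

Halving the activity of one step leaves about 91% of wild-type flux in this toy case, while the near-null genotype drops below 10%, so the heterozygote looks much more like the wild type than like the mutant homozygote.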
Kacser-Burns Flux
summation theorem:
$$\sum_j C_i^j = 1$$
• Coefficients are not intrinsic properties of an enzyme, but rather a (local) system property
• If a control coefficient is greatly increased in value, this decreases the values of other control coefficients
• While most values of C for proteins are positive, negative regulators (repressors) give negative values, allowing for C values > 1
“Rate-limiting” steps in pathways
• Truly rate-limiting steps are rare
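The summation property and the trade-off among coefficients can be seen in a toy case. The sketch assumes a linear pathway whose flux is the reciprocal of the summed per-step resistances 1/e, for which each coefficient is that step's share of the total resistance; the model, function name, and activities are hypothetical.

```python
def control_coefficients(enzymes):
    """Control coefficients in the toy pathway (an assumption): each
    step's share of the total resistance sum(1/e)."""
    resistances = [1.0 / e for e in enzymes]
    total = sum(resistances)
    return [r / total for r in resistances]

before = control_coefficients([1.0, 1.0, 1.0, 1.0])
after = control_coefficients([0.1, 1.0, 1.0, 1.0])  # first step made 10x slower
print([round(c, 3) for c in before], [round(c, 3) for c in after])
```

Slowing the first step raises its coefficient from 0.25 to about 0.77 and pushes each of the others down to about 0.08, with the total staying at one. The coefficients describe the state of the whole system, not a fixed property of any one enzyme.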
Small-Kacser theorem: the factor f by which flux is
increased by an r-fold increase in activity of E is
$$f = \frac{1}{1 - \frac{r-1}{r}\,C_E^j}$$
Hence, the limiting increase in f (as r becomes very large) is
$$f = \frac{1}{1 - C_E^j}$$
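A quick numerical reading of the Small-Kacser expression, with a hypothetical control coefficient of 0.2 and hypothetical function names. The sketch assumes 0 <= C < 1 so that the denominator stays positive.

```python
def flux_increase(r, c):
    """Factor f by which flux rises after an r-fold increase in the
    activity of an enzyme with control coefficient c (0 <= c < 1)."""
    return 1.0 / (1.0 - c * (r - 1.0) / r)

def limiting_increase(c):
    """Upper limit of f as r becomes very large."""
    return 1.0 / (1.0 - c)

c = 0.2                        # hypothetical control coefficient
f2 = flux_increase(2.0, c)     # doubling activity
f10 = flux_increase(10.0, c)   # ten-fold increase
f_max = limiting_increase(c)
print(round(f2, 3), round(f10, 3), round(f_max, 3))
```

With C = 0.2, doubling activity raises flux by about 11%, a ten-fold increase by about 22%, and no amount of overexpression can exceed 25%, which is why the size of C matters more than the size of r.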
Using estimated Control
Coefficients as selection aids
• Loci with larger C values should respond
faster to selection
• Such loci are obvious targets for screens
of natural variation (candidate loci)
• Selection with reduced correlations
– Tallis or Kempthorne-Nordskog restricted selection
index
– Select on loci with large C for flux of interest, smallest C
for other fluxes not of concern
– Positive selection on C for flux of interest, selection to
reduce flux changes in other pathways
[Figure: pathway diagram with enzymes e1 to e4. Labels: "Flux we wish to increase" (F); "We wish this flux to remain unchanged" (H)]
The initial approach might be to try either e3 or e4, rather than e1 or e2
A more correct approach, however, is to pick the step(s) that maximize C_F while minimizing C_H
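One crude way to express "maximize C_F while minimizing C_H" is to rank steps by C_F minus a penalty on the size of C_H. This linear score is an assumption chosen for simplicity, not a restricted selection index, and the coefficient values, enzyme labels, and function name are all hypothetical.

```python
# Hypothetical control coefficients of four enzymes on the flux to
# increase (C_F) and on the flux to hold constant (C_H).
coeffs = {
    "e1": {"C_F": 0.40, "C_H": 0.35},
    "e2": {"C_F": 0.30, "C_H": 0.30},
    "e3": {"C_F": 0.20, "C_H": 0.02},
    "e4": {"C_F": 0.10, "C_H": 0.01},
}

def rank_targets(coeffs, penalty=1.0):
    """Rank enzymes by C_F minus a penalty on |C_H| (assumed score)."""
    def score(enzyme):
        c = coeffs[enzyme]
        return c["C_F"] - penalty * abs(c["C_H"])
    return sorted(coeffs, key=score, reverse=True)

ranking = rank_targets(coeffs)
print(ranking)
```

With these made-up numbers the steps with the largest C_F rank last once their side effects on the other flux are penalized, and changing `penalty` shifts how strongly side effects are weighed.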
Index selection on pathways
• The elements of selection include both
phenotype and C, and (possibly)
markers as well
• Problems:
– C is a local estimate, changing as the pathway
evolves
– Still have all the standard concerns with a
selection index (e.g., stability of inverse of
genetic covariance matrix)
– These are important caveats to consider even
under the rosy scenario where all C’s are known
What to call it?
MAS = Marker Assisted Selection
CAS = Control Coefficient Assisted Selection
CASH $ = Control Activity Selection Helper
Summary
• Microarray analysis = data mining
• Potential (immediate) usage:
– Suggesting candidate loci
– More efficient use of GxE
– Reducing/breaking deleterious correlations
• Cis (easy) vs. trans (hard) control of
expression levels
• Future = analysis of pathways
– Index selection (and all its problems)
Farewell from the “desert”
U of A Campus