Functional Genomics

Functional Genomics
in Non-Model Organisms
What is Functional Genomics?
• Functional genomics refers to the development and application of global (genome-wide or system-wide) experimental approaches to assess gene function by making use of the information and reagents provided by structural genomics. It is characterized by high-throughput or large-scale experimental methodologies combined with statistical or computational analysis of the results (Hieter and Boguski 1997).
• Functional genomics as a means of assessing phenotype differs from more classical approaches primarily with respect to the scale and automation of biological investigations. A classical investigation of gene expression might examine how the expression of a single gene varies with the development of an organism in vivo. Modern functional genomics approaches, however, would examine how 1,000 to 10,000 genes are expressed as a function of development (UC Davis Genome Center).
Functional Genomics
Hunt & Livesey (eds.)
• Subtracted cDNA Libraries
• Differential Display
• Representational Difference Analysis
• Suppression Subtractive Hybridization
• cDNA Microarrays
• Serial Analysis of Gene Expression
• 2-D Gel Electrophoresis
My View of Functional Genomics
• Differential Gene expression
– SAGE/MPSS
– RDA/SSH
– *Open systems*
• Identifying the Function of Genes
– Functional Complementation
– RNA interference/RNA silencing
Disclaimer
• Relevant primarily to eukaryotes
• Most common systems (literature/class)
• Personal experience with them
• I like them
Why We Need Functional Genomics
Organism       # genes     % of genes with      Completion date
                           inferred function    of genome
E. coli        4288        60                   1997
yeast          6,600       40                   1996
C. elegans     19,000      40                   1998
Drosophila     12-14K      25                   1999
Arabidopsis    25,000      40                   2000
mouse          ~30,000?    10-20                2002
human          ~30,000?    10-20                2000
My Two Cents (as expressed by Hieter & Boguski 97)
• Functional genomics will not replace the time-honored use of genetics, biochemistry, cell biology
and structural studies in gaining a detailed
understanding of biological mechanisms.
• The extent to which any functional genomics
approach actually defines the function of a
particular protein (or set of proteins) will vary
depending on the method and gene involved.
mRNA abundance classes
(Okamuro & Goldberg)
• Superabundant
– 15-90% of mRNA mass
– <10 structural gene transcripts
– >5000 molecules per cell per sequence
• Abundant
– 50-75% of mRNA mass
– ~200-1000 structural gene transcripts (5% of diversity)
– 500-2500 molecules per cell per sequence
• Rare/complex
– <25% of mRNA mass; individual seqs <0.01%
– 95% of mRNA diversity
– 1-10 molecules per cell per sequence
SAGE & MPSS
• Serial Analysis of Gene Expression
• Massively Parallel Signature Sequencing
• Start from mRNA (euks)
• Generate a short sequence tag (9-21 nt) for each mRNA ‘species’ in a cell
Generate cDNA primed with
biotin-oligo(dT)
Restriction digest double-stranded cDNA with
a 4-base cutter “anchoring enzyme”; bind to
streptavidin coated beads
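[Diagram: double-stranded cDNAs with poly(A)/poly(T) ends (AAAA/TTTT); after digestion, the bead-bound 3' fragments end at the anchoring-enzyme site (GTAC on the lower strand).]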
Divide pool in half & ligate to different linkers (1 or 2),
both of which have a restriction site for the “tagging enzyme”
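[Diagram: the two half-pools, carrying linker 1 or linker 2 joined at the anchoring site (CATG/GTAC), each with its poly(A)/poly(T) end (AAAA/TTTT).]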
Restriction digest with a Type IIS restriction enzyme, which recognizes
the linker sequences and cuts downstream in a sequence independent
fashion; fill-in 5’ overhang to blunt ends.
Linker 1 pool (tag X):
GGATGCATGXXXXXXXXXX
CCTACGTACXXXXXXXXXX
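The tag released at this step can be predicted from a known transcript sequence, which is how observed tags are usually matched back to genes. The following is a minimal sketch, assuming a CATG anchoring site and a 10-base tag as in the diagram; the function name `sage_tag` and the example sequence are hypothetical, and real tag lengths depend on the tagging enzyme used.

```python
def sage_tag(cdna, anchor="CATG", tag_len=10):
    """Predict the SAGE tag of a cDNA given 5'->3' on the sense strand.

    The tag is taken as the tag_len bases immediately 3' of the
    anchoring-enzyme site closest to the poly(A) tail. Returns None when
    the transcript has no site (it yields no tag) or when fewer than
    tag_len bases follow the site.
    """
    seq = cdna.upper()
    site = seq.rfind(anchor)              # 3'-most anchoring site
    if site == -1:
        return None
    start = site + len(anchor)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


# Hypothetical transcript with two CATG sites; only the 3'-most one counts.
example = "GGCATGAAACCCGGGTTTACGCATGACGTACGTTAGCAAAAAAAA"
print(sage_tag(example))
print(sage_tag("GGGCCCTTTAAA"))           # no anchoring site
```

Only the 3'-most site matters because the biotinylated oligo(dT) end is what stays on the beads after the anchoring digest.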
Blunt end ligate pool 1 to pool 2, and PCR amplify with
primers specific to linker sequences 1 and 2
Linker 2 pool (tag O):
GGATGCATGOOOOOOOOOO
CCTACGTACOOOOOOOOOO
Ditag (linker 1 - Tag 1 - Tag 2 - linker 2):
GGATGCATGXXXXXXXXXXOOOOOOOOOOCATGCATCC
CCTACGTACXXXXXXXXXXOOOOOOOOOOGTACGTAGG
Restriction digest with same anchoring enzyme (above);
concatenate ditags and ligate to cloning/sequencing vector
Concatenated ditags (Tag 1 + Tag 2, then Tag 3 + Tag 4), separated by the anchoring site:
-----CATGXXXXXXXXXXOOOOOOOOOOCATGXXXXXXXXXXOOOOOOOOOOCATG----
-----GTACXXXXXXXXXXOOOOOOOOOOGTACXXXXXXXXXXOOOOOOOOOOGTAC----
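A sequenced concatemer can be read back into individual tags by splitting at the anchoring sites. The sketch below assumes CATG as the anchoring site and 10-base tags, as drawn above; `tags_from_concatemer` and the example read are hypothetical. It ignores complications such as ditags of variable length or a CATG created by chance at a tag junction.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def tags_from_concatemer(read, anchor="CATG", tag_len=10):
    """Split a concatemer read into tags.

    Only pieces lying between two anchoring sites are treated as ditags;
    the first and last pieces are vector flank. The first tag of each
    ditag is read directly; the second sits on the opposite strand, so it
    is reverse-complemented.
    """
    pieces = read.upper().split(anchor)
    tags = []
    for ditag in pieces[1:-1]:
        if len(ditag) < 2 * tag_len:
            continue                      # truncated or empty ditag
        tags.append(ditag[:tag_len])
        tags.append(reverse_complement(ditag[-tag_len:]))
    return tags


# Hypothetical read: flank, two ditags, flank.
read = ("GGTT" + "CATG" + "AAAAAAAAAA" + "CCCCCCCCCC"
        + "CATG" + "ACGTACGTAC" + "TTTTTGGGGG" + "CATG" + "AACC")
tag_counts = Counter(tags_from_concatemer(read))
print(tag_counts)
```

Summing such counters over all sequenced clones gives the tag counts that the later library comparisons are based on.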
SAGE
• Described by Velculescu et al. (1995)
• Originally 9 bp tags, now LongSAGE 21 bp
• 10-50 tags in a clone
• Only requires a sequencer (and some time)
MPSS
• Proprietary technology; published 2000
• Generates 17 nt “signature sequence”
• Collects >1,000,000 signatures per sample
• Requires 2 µg of mRNA and $$
What is significantly different?
Ruijter et al. 2002. Physiol. Genomics 11:37-44.
Planning SAGE experiments: how many tags need to be sequenced?
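One rough way to approach this question is to ask how many tags must be sequenced to see a transcript of a given abundance at least once. The sketch below assumes every sequenced tag is an independent draw from the mRNA pool, so the chance of missing a transcript at fractional abundance p in N tags is (1 - p)^N. The function names are hypothetical, and this addresses detection only, not the power to detect a difference between libraries.

```python
import math


def detection_probability(abundance, n_tags):
    """Probability of seeing a transcript at least once in n_tags tags.

    abundance is the transcript's fraction of all tags (0.0001 = 0.01%).
    """
    return 1.0 - (1.0 - abundance) ** n_tags


def tags_needed(abundance, confidence=0.95):
    """Smallest number of tags giving at least `confidence` detection."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - abundance))


# A rare transcript at 0.01% of the mRNA pool:
n = tags_needed(0.0001, confidence=0.95)
print(n, detection_probability(0.0001, n))
```

Under these assumptions, a transcript at 0.01% needs on the order of 30,000 tags for a 95% chance of appearing once, which is why the rare/complex class is expensive to sample.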
Comparing 2 libraries…
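When two libraries are compared, each tag gives a pair of counts out of two library totals, and the question is whether the two proportions differ. The slides do not say which test was used, so the sketch below uses a two-proportion Z-test purely as an illustrative assumption; the function name and the counts are hypothetical. With tens of thousands of tags tested at once, a strict cutoff such as p<0.001 or a multiple-testing correction is needed.

```python
import math


def tag_z_test(count1, total1, count2, total2):
    """Two-proportion Z-test for one tag in two libraries.

    Returns (z, two_sided_p). Uses the pooled proportion under the null
    hypothesis that the tag has the same abundance in both libraries.
    """
    if count1 + count2 == 0:
        return 0.0, 1.0                   # tag absent from both libraries
    p1 = count1 / total1
    p2 = count2 / total2
    pooled = (count1 + count2) / (total1 + total2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / total1 + 1.0 / total2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail
    return z, p_value


# Hypothetical tag seen 50 times in one library and 10 in the other,
# with 50,000 tags sequenced from each.
z, p = tag_z_test(50, 50_000, 10, 50_000)
print(z, p, p < 0.001)
```

The normal approximation is weakest for tags with very low counts, where an exact test may be a better choice.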
MPSS - Alexandrium fundyense
39931 unique tags; 3172 different at p<0.001
[Figure: two scatter plots of -N signatures (tpm) against -P signatures (tpm), one at full scale (0-100,000 tpm on -P, 0-90,000 tpm on -N) and one zoomed to 0-100 tpm.]
Not every tag is a unique sequence
Not every sequence has a unique tag
• Alternative splicing, >1 tag per gene
• No restriction site, no tags per gene
• Sequencing error (random, 0.7% for SAGE,
Velculescu et al. 1995)
• Antisense transcripts
Tag Abundance Distribution
[Figure: two bar charts of # of tags (log scale) by abundance class (>1%, >0.1%, >0.01%, >0.001%, <0.001%): all tags in the -P and -N libraries (# tags -P, # tags -N), and tags significantly different at p<0.001 (# sig tags -P, # sig tags -N).]
Expression Ratio
[Figure: two bar charts of # of tags (log scale) by expression ratio, binned from P=N through N:P or P:N ratios of >1.5, >2, >5, >10, >20, >40 and >50, plus tags with zero counts in one library: all tags, and tags different at p<0.001.]
RDA
• Initially used for DNA comparisons (Lisitsyn et al.
1993)
• Later modified for cDNA to reduce complexity
(Hubank and Schatz 1994)
• May need >1 enzyme to cover all genes
• Should pick up transcript present at <=0.005%
• Time-intensive + a LOT of manipulation
Success with RDA
• DNA markers in ginbuna (Murakami et al.
2002)
• mRNA induced under hypoxia in tiger
salamander (McKean et al. 2002)
• Rice & date palm 2002; oak 2001; tobacco
2000; pea & maize 1998; earliest 1996
• No more recent refs
• Starting material: Tester cDNA with Adaptor 1; Driver cDNA (in excess); Tester cDNA with Adaptor 2
• First hybridization, all components denatured: yields molecule types a, b, c, d
• Second hyb: mix, add freshly denatured driver; anneal: yields a, b, c, d + e
• Fill in the ends
• Add primers; PCR amplify:
– a: no amplification
– b: no amplification
– c: linear amplification
– d: no amplification
– e: exponential amplification
Efficacy of SSH…
Ji et al. 2002 BMC Genomics 3:12
• Diatchenko et al. 1996; could detect as little
as 0.001% target
• Critical factor is relative concentration of
target in tester and driver populations
• Effective enrichment when:
– Target present at >= 0.01%
– Concentration ratio >= 5-fold
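These two thresholds can be applied to signature counts to estimate which transcripts SSH would be expected to enrich. The sketch below assumes counts expressed in tags per million (tpm), so that 0.01% corresponds to 100 tpm; the function name, the data layout and the example values are hypothetical. Signatures absent from the driver are treated as passing the fold-change cutoff, which is a choice made here rather than something stated in the source.

```python
def ssh_candidates(counts, min_tpm=100, min_fold=5.0):
    """Select signatures meeting both enrichment criteria.

    counts maps a signature to (tester_tpm, driver_tpm).
    min_tpm=100 corresponds to 0.01% of the tester population.
    """
    selected = []
    for signature, (tester, driver) in counts.items():
        if tester < min_tpm:
            continue                      # target too rare in the tester
        if driver == 0 or tester / driver >= min_fold:
            selected.append(signature)
    return selected


# Hypothetical signature counts in tpm: (tester, driver)
example_counts = {
    "sigA": (500, 50),    # abundant, 10-fold: kept
    "sigB": (500, 200),   # abundant, 2.5-fold: dropped
    "sigC": (50, 1),      # 50-fold but below 0.01%: dropped
    "sigD": (100, 0),     # absent from driver: kept
    "sigE": (100, 20),    # exactly 5-fold at 0.01%: kept
}
print(ssh_candidates(example_counts))
```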
What this looks like
208 signatures at >=0.01%, >= 5-fold induction
[Figure: two scatter plots of -N signatures (tpm) against -P signatures (tpm), with axes of 0-5,000 tpm and 0-1,000 tpm.]
Success with SSH
• Armbrust 1999, diatoms
• Lots of biomedical refs 2003
• Xylella, Aspergillus, Dunaliella
Post-transcriptional gene silencing
Group                     Organism(s)                                    Phenomenon              Induced by
Fungi                     Neurospora                                     quelling                transgenes
Plants                    Petunia, Nicotiana, Arabidopsis, rice,         PTGS, co-suppression    transgenes, viruses
                          tomato, potato, etc.
Animals: Invertebrates    C. elegans                                     RNAi                    dsRNA
                          Drosophila                                     RNAi                    dsRNA
                          Paramecium                                     co-suppression          transgenes
                          Planaria                                       RNAi                    dsRNA
                          Hydra                                          RNAi                    dsRNA
                          T. brucei                                      RNAi                    dsRNA
Animals: Vertebrates      Zebrafish                                      RNAi                    dsRNA
                          mouse                                          RNAi                    dsRNA
Kamath et al. 2003
• 16,757 strains = 86% of predicted ORFs
• Looked for sterility or lethality (Nonv), slow growth (Gro) or defects (Vpep)
• 1,722 strains (10.3%) had such phenotypes
• Genes involved in basic metabolism & cell maintenance are enriched for the Nonv phenotype
• Genes involved in more complex ‘metazoan’ processes (signal transduction, transcriptional regulation) are enriched for the Vpep phenotype
• Nonv phenotypes are highly underrepresented on the X chromosome
• The X chromosome is enriched for Vpep phenotypes
• Basal functions of eukaryotes are shared:
– lethal (Nonv) genes tended to be of ancient origin
– ‘animal-specific’ genes tended to be non-lethal (Vpep)
– almost no ‘worm-specific’ genes were lethal
• Genes producing a defective phenotype are clustered:
– Nonv genes cluster in central regions, except on the X chromosome, which is depleted of Nonv phenotypes
Functional Complementation
• Often yeast, E. coli
• The goal of the SGDP is to generate as complete a set as
possible of yeast deletion strains with the overall goal of
assigning function to the ORFs through phenotypic
analysis of the mutants.
• As of 01/03, 95% of the approx. 6200 ORFs have been
deleted; more than 20,000 strains are available from
Research Genetics, Open Biosystems and the ATCC.
Functional Complementation
• Intramembrane cleaving proteases:
Drosophila rhomboid complements the
aarA of Providencia stuartii and vice versa
(Gallio et al. 2002)
• Cyclophilin-RNA interacting proteins in
Paramecium, conserved from yeast to
humans (Krzywicka et al. 2001)