Transcript Lecture8
Lecture 9
Genome-wide analysis of nucleic acids.
A Genome Revolution in Biology
and Medicine
• We are in the midst of a "Golden Era" of
biology
• The Human Genome Project has
produced a huge storehouse of data that
will be used to change every aspect of
biological research and medicine
• The revolution is about treating biology as
an information science, not about specific
biochemical technologies.
Historical Milestones
Year      Milestone
1866      Mendel’s discovery of genes
1871      Discovery of nucleic acids
1951      First protein sequence (insulin)
1953      Double helix structure of DNA
1960s     Elucidation of the genetic code
1977      Advent of DNA sequencing
1975-79   First cloning of human genes
1986      Fully automated DNA sequencing
1995      First whole genome (Haemophilus influenzae)
1999      First human chromosome (Chr 22)
2000      Drosophila / Arabidopsis genomes
2001      Human and mouse genomes
Many more genomes since then!
• Genomic data
– Whole genome data sets. According to
http://www.ebi.ac.uk/genomes/ as of 30 Sept 2004:
• Archaea – 19
• Bacteria – 167
• Eukaryota – 36
• Organelles – 569
• Phages – 137
• Plasmids – 204
• Viroids – 36
• Viruses – 911
• TOTAL: 2079
The ….omics
Genomics
• The application of high-throughput
automated technologies to molecular biology.
• The experimental study of complete
genomes.
Genomics Technologies
• Automated DNA sequencing
• Automated annotation of sequences
• DNA microarrays
– gene expression (measure RNA levels)
– single nucleotide polymorphisms (SNPs)
• Protein chips (SELDI, etc.)
• Protein-protein interactions
New Types of Biological Data
• Microarrays - gene expression
• Multi-level maps: genetic, physical,
sequence splicing, expression, function
• Networks of protein-protein interactions
• Cross-species relationships
– homologous genes
– chromosome organization (synteny)
– common regulatory sequences
Biological Information
Protein 2-D gel
mRNA Expression
Protein 3-D Structure
Mass Spec.
Genome sequence
The Cell
What is gene expression?
• The amount of RNA produced from a gene.
• Level of RNA produced from a gene is controlled
by:
– Transcription
– Degradation
• Transcriptome - Expressed transcripts in a cell
under defined experimental conditions.
– mRNA (5-10% of total RNA).
– rRNA, tRNA - make up most of total RNA
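The statement that RNA level is set by transcription and degradation can be illustrated with a simple kinetic model. The sketch below is only an illustration under an assumed model (constant synthesis, first-order decay); the function names and rate values are hypothetical and not from the lecture.

```python
# Assumed model (not from the lecture): dR/dt = k_syn - k_deg * R,
# so the steady-state level is R = k_syn / k_deg.

def steady_state_rna(k_syn: float, k_deg: float) -> float:
    """Steady-state RNA level for constant synthesis and first-order decay."""
    if k_deg <= 0:
        raise ValueError("k_deg must be positive")
    return k_syn / k_deg


def simulate_rna(k_syn: float, k_deg: float, r0: float = 0.0,
                 dt: float = 0.01, t_end: float = 100.0) -> float:
    """Euler integration of dR/dt = k_syn - k_deg * R; returns R at t_end."""
    r = r0
    for _ in range(int(t_end / dt)):
        r += (k_syn - k_deg * r) * dt
    return r


# Hypothetical rates: same transcription rate, different RNA stability.
stable = steady_state_rna(k_syn=2.0, k_deg=0.25)
unstable = steady_state_rna(k_syn=2.0, k_deg=0.5)
print(stable, unstable)
```

Under this assumption, two genes transcribed at the same rate can still differ in RNA level if their transcripts decay at different rates.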
Analysis of gene expression at
the single gene level.
• Northern Blots
– Measure RNA levels by hybridization of a
labeled probe to total RNA.
• Reporter Genes
– Use of an enzyme to measure the amount of
transcription from a promoter.
• Quantitative RT-PCR.
Assaying the regulation of 1000s of
genes in a single experiment
• DNA microarrays
– DNA molecules printed at high density used to
determine the level of RNA or DNA in a
sample.
– Can be thought of as “reverse Northern blots”
• Other technologies
– SAGE
– Microbeads
DNA Microarrays
• Spotted DNA arrays (glass slides)
– Competitive binding of samples
– Fluorescent detection - Cy3 and Cy5
– Small sample sizes (10-30µl).
– PCR or cDNA arrays
– Long oligonucleotide arrays
• Short oligonucleotide arrays
– ex. Affymetrix
• DNA spotted onto nylon membranes
(macroarrays)
Applications of DNA microarrays
• Expression profiling
– Determining the relative levels of RNA in two or more
samples.
• DNA/DNA hybridizations
– Investigate gene content between different strains
– Determine gene dosage
– 16S arrays - microbial communities (being
developed).
• Identification of protein binding sites
– ChIP-Chip. Immunoprecipitation of protein/DNA
complexes. Assaying those interactions with
microarrays.
cDNA spotted microarrays
Labeling RNA or DNA with
Cy3 or Cy5.
• Cy3 and Cy5 - the fluorescent molecules most often
used to label samples for microarray analysis.
– Absorb light at one wavelength and emit at another.
– The excitation and emission spectra of the two dyes
do not overlap significantly.
– In arrays Cy3 and Cy5 are usually false colored green
(Cy3) and red (Cy5) for ease of visualization.
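For a two-colour spotted array, the relative RNA level at each spot is commonly summarized as a log2 ratio of the two channel intensities. The following is a minimal sketch, assuming background-subtracted intensities are available per spot; the gene names and intensity values are hypothetical.

```python
import math


def log2_ratio(cy5, cy3, background_cy5=0.0, background_cy3=0.0):
    """Return log2(Cy5/Cy3) after background subtraction.

    Returns None when either corrected signal is not positive,
    because the ratio is then undefined.
    """
    red = cy5 - background_cy5
    green = cy3 - background_cy3
    if red <= 0 or green <= 0:
        return None
    return math.log2(red / green)


# Hypothetical spot intensities: (Cy5, Cy3)
spots = {
    "geneA": (4000.0, 1000.0),  # higher in the Cy5-labeled sample
    "geneB": (500.0, 2000.0),   # higher in the Cy3-labeled sample
    "geneC": (1500.0, 1500.0),  # unchanged
}
ratios = {gene: log2_ratio(red, green) for gene, (red, green) in spots.items()}
print(ratios)
```

Using the log scale makes a fourfold increase and a fourfold decrease symmetric around zero.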
Affymetrix Gene Chips
Microarray Experiment - labeling, hybridizing, scanning
Affymetrix = Oligonucleotide Microarray
Each gene on an Affy chip is represented by a probe set
Rationale of Affy analysis
- Mismatch (MM) probes are used to measure background signals
due to non-specific sources and scanner offset.
- Using an MM probe as an estimate of background
seems great in theory.
- The expression value for a gene is a combination of
the (PM-MM) signals for each of the probes (i.e. the
average), where PM is the perfect-match probe.
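The averaging of (PM-MM) differences over a probe set can be sketched as follows. This is only the plain average described above, not the algorithm of any particular analysis package; the function name and intensities are hypothetical.

```python
def probe_set_signal(pm, mm):
    """Average of (PM - MM) over all probe pairs in one probe set."""
    if not pm or len(pm) != len(mm):
        raise ValueError("pm and mm must be non-empty and the same length")
    diffs = [p - m for p, m in zip(pm, mm)]
    return sum(diffs) / len(diffs)


# Hypothetical intensities for one gene represented by four probe pairs
pm_intensities = [1200.0, 900.0, 1500.0, 1100.0]
mm_intensities = [200.0, 300.0, 500.0, 100.0]
signal = probe_set_signal(pm_intensities, mm_intensities)
print(signal)
```

Note that a probe pair with MM greater than PM contributes a negative difference, which is one reason the simple average works better in theory than in practice.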
Microarray Data Analysis
• Data mining and visualization
• Controls and normalization of results
• Statistical validation
• Linkage between gene expression data
and gene sequence/function/metabolic
pathways databases
• Clustering and pattern detection
• Discovery of common sequences in coregulated genes
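Two of the steps listed above, normalization and pattern detection, can be sketched in a few lines. The example assumes median centering as the normalization and Pearson correlation as the similarity measure, which are common choices but not the only ones; all names and values are hypothetical.

```python
import math
from statistics import median


def median_center(log_ratios):
    """Normalize one array by subtracting its median log ratio."""
    m = median(log_ratios)
    return [x - m for x in log_ratios]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coexpressed_pairs(profiles, threshold=0.9):
    """Return gene pairs whose profiles correlate above the threshold."""
    genes = sorted(profiles)
    pairs = []
    for i, g1 in enumerate(genes):
        for g2 in genes[i + 1:]:
            if pearson(profiles[g1], profiles[g2]) > threshold:
                pairs.append((g1, g2))
    return pairs


# Hypothetical log2 ratios for three genes across four conditions
profiles = {
    "geneA": [0.0, 1.0, 2.0, 1.0],
    "geneB": [0.5, 1.5, 2.5, 1.5],
    "geneC": [2.0, 1.0, 0.0, 1.0],
}
pairs = coexpressed_pairs(profiles)
print(pairs)
```

Genes grouped this way are candidates for sharing regulatory sequences, which still has to be checked against sequence data.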
Regulons and Stimulons
• Operon - group of genes co-expressed on a
single transcript.
– One location of the genome
• Regulon - genes that are regulated by a single
transcription factor.
– Genes and operons throughout the genome
• Stimulon - collection of genes that are regulated
in response to environmental changes.
– Can be multiple regulons affected at once.
• Regulatory network - alternative term for
regulon.
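The nesting of these terms can be made concrete with a small data structure. The sketch below assumes a regulon may list either whole operons or single genes; every identifier in it (operon, gene, factor and stimulus names) is hypothetical.

```python
# Operon: genes co-expressed on a single transcript (one genome location)
OPERONS = {
    "operon1": ["g1", "g2", "g3"],
    "operon2": ["g4", "g5"],
}

# Regulon: operons and single genes controlled by one transcription factor
REGULONS = {
    "TF_A": ["operon1", "g6"],
    "TF_B": ["operon2"],
}

# Stimulon: all regulons that respond to one environmental change
STIMULUS_TO_TFS = {"heat": ["TF_A", "TF_B"]}


def regulon_genes(tf):
    """Expand a regulon into its member genes."""
    genes = set()
    for member in REGULONS[tf]:
        # A member is either an operon name or a single gene name.
        genes.update(OPERONS.get(member, [member]))
    return genes


def stimulon_genes(stimulus):
    """Union of the regulons affected by a stimulus."""
    genes = set()
    for tf in STIMULUS_TO_TFS[stimulus]:
        genes |= regulon_genes(tf)
    return genes


print(sorted(regulon_genes("TF_A")))
print(sorted(stimulon_genes("heat")))
```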
Identifying genes
whose expression
changes at specific
stages of the cell cycle
Microarray analysis of 150 damage-regulated mRNAs after a single unrepaired HO-induced DSB
[Figure: expression time course at 1, 2, 3, 4 and 6 hr; annotation: 4 kb/hr. Genes shown: YCR033W, YCR034W, YCR035C, YCR036W, YCR037C, YCR038C, YCR039C, YCR040W, YCR041W, YCR042C, YCR043C, YCR044C, YCR045C, FEN3, RRP43, RBK1, PHO87, BUD5, MATa2, MATa1, TSM1, HO]
Audrey Gasch, Moreshwar Vaze
Circadian Rhythms
Genes whose expression changes during the day in fruit flies
[Figure: expression over three day/night cycles]
Cancer can be characterized from the transcriptome
Bioinformatics
• Genomics produces high-throughput,
high-quality data, and bioinformatics
provides the analysis and interpretation of
these massive data sets.
• It is impossible to separate genomics
laboratory technologies from the
computational tools required for data
analysis.
What types of data can we use to
build a transcriptional network?
-Protein-Protein interaction data
-Expression data
-ChIP data
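One simple way to combine two of these data types is to keep a regulator-to-gene edge only when both ChIP binding and an expression change support it. This is a sketch under that assumption, with protein-protein interaction data left out; all names and gene sets are hypothetical.

```python
# Hypothetical ChIP data: genes whose promoters are bound by each factor
chip_bound = {
    "TF_A": {"g1", "g2", "g4"},
    "TF_B": {"g3"},
}

# Hypothetical expression data: genes whose RNA level changes
# when the factor is deleted or overexpressed
expression_changed = {
    "TF_A": {"g1", "g2", "g5"},
    "TF_B": {"g3", "g4"},
}


def build_network(chip, expression):
    """Keep only targets supported by both binding and expression evidence."""
    network = {}
    for tf in sorted(chip):
        supported = chip[tf] & expression.get(tf, set())
        network[tf] = sorted(supported)
    return network


network = build_network(chip_bound, expression_changed)
print(network)
```

Requiring both kinds of evidence reduces false positives but also drops true targets that are bound without changing expression under the tested condition.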
ChIP-on-chip
Comparative Genomics
• The assumption that underlies comparative genomics is that the two
genomes had a common ancestor and that each organism is a combination
of the ancestor and the action of evolution.
• Evolution can be broadly thought of as the combination of two processes:
mutational forces that generate random mutations in the genome
sequence, and selection pressures that
1. Eliminate random mutations (negative selection),
2. Have no effect on mutations (neutral selection), or
3. Increase the frequency of mutant alleles in the population as a result
of a gain in fitness (positive selection).
• The combined action of mutation and selection is represented generally by a
RATE MATRIX of base-pair changes between the two observed genomes.
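As a first step toward such a matrix, base-pair changes can be counted between two aligned sequences. The sketch below produces observed substitution frequencies only; turning them into true rates would additionally need a divergence time and an evolutionary model, which are not covered here. The sequences and function names are hypothetical.

```python
BASES = "ACGT"


def substitution_counts(seq1, seq2):
    """Count base pairs (a in seq1, b in seq2) at each aligned position.

    Positions containing gaps or ambiguous characters are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {a: {b: 0 for b in BASES} for a in BASES}
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in BASES and b in BASES:
            counts[a][b] += 1
    return counts


def row_frequencies(counts):
    """Convert each row of counts into frequencies that sum to 1."""
    freqs = {}
    for a, row in counts.items():
        total = sum(row.values())
        freqs[a] = {b: (n / total if total else 0.0) for b, n in row.items()}
    return freqs


# Hypothetical aligned fragments from two genomes
genome1 = "AACCGGTT"
genome2 = "AACTGGTA"
counts = substitution_counts(genome1, genome2)
freqs = row_frequencies(counts)
print(freqs["C"])
```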
Comparative Genomics
[Figure: phylogenetic tree including human, mouse, rat and C. elegans]
Evolutionary relationship between metazoans that are sequenced,
or due for sequencing. Evolutionary distances are in
millions of years.
Comparative Genomics
• Comparative genomics may be
defined as the derivation of genomic
information by comparing the
information content of the genome
sequences of two or more species.
The similarity is such that human chromosomes can be cut
(schematically at least) into about 150 pieces (only about 100 are
large enough to appear here), then reassembled into a reasonable
approximation of the mouse genome.
http://www.ornl.gov/TechResources/Human_Genome/graphics/slides/ttmousehuman.html
Harnessing the genome to answer real problems
How do we control infectious disease?
How do we slow or stop the effects of cancer?
How can we detect and treat genetic disorders?
Only 2% of human diseases are due to single gene defects;
the rest involve networks of gene expression.
Most pharmaceutical drugs act on individual proteins
or sets of proteins.
Proteomics
The study of the ‘proteome’
While an organism has only one genome,
it has many transcriptomes, proteomes and metabolomes
mRNA level does not equal expressed protein level, nor does it
indicate the nature of the functional protein product
Genomic Sequence -> (Transcriptional Control) -> mRNA -> (Translational Control) -> Protein Product -> (Post-Translational Control) -> Functional Protein Product
Temporal Changes in mRNA and protein
[Figure: three panels plotted against time t: Gene, Expression, Protein]
When you measure expression affects what you find
Does mRNA level correlate with protein level?
[Figure: mRNA (EST clones) vs. protein (2D gels), log-scale axes; 20 liver proteins and corresponding mRNAs; R = 0.48. Anderson & Seilhamer, Electrophoresis 1997 18:533-537]
[Figure: mRNA (Northern) vs. protein (Affinity-HPLC), log-scale axes; Glutathione-S-transferase in 60 human cell lines (lung, ovarian, CNS, leukemia, renal, melanoma, breast); R = 0.43. From Tew et al 1996; Anderson & Anderson, Electrophoresis 1998 19:1853-1861]
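A correlation coefficient R of this kind can be computed from paired mRNA and protein measurements. The following is a minimal sketch, assuming Pearson correlation on log10-transformed values; the numbers are hypothetical and are not taken from the studies cited above.

```python
import math


def pearson_log10(x, y):
    """Pearson correlation of log10-transformed, positive measurements."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    syy = sum((b - mean_y) ** 2 for b in ly)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical relative abundances for four genes
mrna_levels = [0.1, 1.0, 10.0, 100.0]
protein_levels = [1.0, 0.1, 100.0, 10.0]
r = pearson_log10(mrna_levels, protein_levels)
print(round(r, 2))
```

A value well below 1 means that mRNA abundance alone is a poor predictor of protein abundance for individual genes, even when the overall trend is positive.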
Genomics, proteomics era.
-Lots of data (lots of real data and lots of noise!). Needs validation!
-Dangers:
+ Becoming too descriptive and reductionist
+ Forgetting about the biological problem
Year by year we are becoming better equipped to
accomplish the things we are striving for.
But what are we actually striving for?
- Bertrand de Jouvenel, 1903-1987
Success is the ability to go
from failure to failure without
losing your enthusiasm.
- Winston Churchill, 1874-1965