Transcript Final

Final
• Final: 2 of the following 3 choices,
– 1 hour exam covering recent materials (June 11),
– 2 page review of an assigned paper (due June 11),
– Self-study of a remaining chapter in the text,
answers to the “odd” problems (due June 11).
DNA Arrays
…DNA systematically arrayed at high density,
– virtual genomes for expression studies,
• RNA hybridization to DNA for expression studies,
– comparative genomics,
• DNA hybridization to DNA,
– inter- and intra-species comparisons, etc.
– potential yet to be developed.
Arrays
DNA Chip:
oligonucleotides, up to
1000s kb fragments.
solid substrate
Probes/Targets
...Probes: are the tethered nucleic acids with known
sequence,
– the DNA on the chip,
...Target: is the free nucleic acid sample whose
identity/abundance is being detected,
– the labeled nucleic acid that is washed over the chip.
DNA-Probes
• DNA Microarrays,
– cDNA arrays, DNA arrays,
– nucleic acid is spotted onto the substrate.
• DNA chips,
– oligonucleotide arrays,
– nucleic acid is synthesized directly onto the substrate.
DNA Chips
…oligonucleotides
systematically synthesized
in situ at high density.
Affymetrix DNA Chip
Allele-Specific Oligonucleotides
(DNA Chips)
--AGTAGCTaTAGCT-- (target)
--TCATCGACATCGA-- (probe): mismatch, no binding
--AGTAGCTGTAGCT-- (target)
--TCATCGACATCGA-- (probe): match, binding
…allele specific oligonucleotides (ASOs)
recognize single base pair differences in
DNA sequences.
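A minimal sketch of the ASO idea, using the probe and the two target sequences from the slide above. The function and variable names are hypothetical, and treating hybridization as all-or-nothing base pairing is a simplifying assumption; real binding also depends on temperature, salt, and probe length.

```python
# Illustrative only: hybridization is modeled as exact base pairing, position by position.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def complement(seq):
    """Return the base-by-base complement of a DNA sequence (same orientation as drawn)."""
    return "".join(PAIR[base] for base in seq.upper())

def mismatches(target, probe):
    """Count positions where the probe is not the complement of the target."""
    return sum(1 for t, p in zip(complement(target), probe.upper()) if t != p)

probe = "TCATCGACATCGA"          # tethered ASO from the slide
target_match = "AGTAGCTGTAGCT"   # target that pairs at every position
target_snp = "AGTAGCTaTAGCT"     # target with a single base change

print(mismatches(target_match, probe))  # 0
print(mismatches(target_snp, probe))    # 1
```

A single mismatch is enough to tell the two targets apart, which is the property the chip relies on.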
Ordered Array of ASOs
linker
molecule
...over a million ASOs and controls can be gridded per cm².
Photolithography
…the process of using an optical image and a
photosensitive substrate to produce a
pattern,
• oligonucleotide synthesis can be inhibited by
a ‘protection group’ molecule,
• the ‘protection group’ can be linked by a
photosensitive bond, and thus cleaved by light.
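The light-directed synthesis described above can be sketched as a toy simulation. This assumes one mask per base per layer and that exposed cells couple exactly one incoming base; the function name `synthesize` and the example probes are hypothetical, not the manufacturer's actual procedure.

```python
def synthesize(desired, cycle_order="ACGT"):
    """Simulate mask-directed synthesis; return (built sequences, masks used).

    `desired` holds one probe sequence per cell, using only the letters A, C, G, T.
    """
    built = ["" for _ in desired]
    masks_used = 0
    while any(len(b) < len(d) for b, d in zip(built, desired)):
        for base in cycle_order:
            # The mask exposes only the cells whose next base is `base`.
            mask = [len(b) < len(d) and d[len(b)] == base
                    for b, d in zip(built, desired)]
            if not any(mask):
                continue  # nothing to expose, so no mask is needed for this base
            masks_used += 1
            # Light removes the protection group in exposed cells, so only
            # those cells couple the incoming (protected) base.
            built = [b + base if exposed else b
                     for b, exposed in zip(built, mask)]
    return built, masks_used

probes = ["ACGT", "AGGT", "TCGA"]
built, masks_used = synthesize(probes)
print(built == probes)  # True
print(masks_used)
```

Every cell ends up with its own sequence even though each chemical step is applied to the whole chip; the masks alone decide which cells grow.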
Targets
...fluorescent targets,
– genomic DNA,
– cDNA, mRNA or cRNA
for expression studies,
…targets are washed over the chip for hybridization.
cDNA Microarrays
...denatured, double stranded DNA (500 - 5000 bp) is
dotted, or sprayed on a glass or nylon substrate,
...up to tens of thousands of spots per array,
quill technology...
Hybridization Detection
…fluorescent images are read by an optical scanner,
and intensities are compared using algorithms to
differentiate artifacts.
Screening for Genetic Disease
• Cystic fibrosis: 75% of mutations are at
the ΔF508 deletion site,
– 8% are in three additional specific locations in
the gene, the rest are spread across the length
of the gene,
• Pre-Array tests yielded only an ~83%
chance of detecting a mutation.
Cystic fibrosis Detection
• Create a DNA chip with ASOs for wildtype Cystic fibrosis gene,
– approximately 4.5 kb of the 250 kb gene codes
for the structural portion of the gene,
• 225 20-mers span 4.5 kb,
• 20 mismatches per 20-mer requires 4500 ASOs, or
grids, plus controls.
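The arithmetic behind these counts, as a short check. The numbers are the slide's own; the variable names are hypothetical, and non-overlapping tiling of the coding sequence is assumed.

```python
coding_length_bp = 4500   # ~4.5 kb of coding sequence (from the slide)
oligo_length = 20         # 20-mer ASOs
variants_per_oligo = 20   # the slide's count of mismatch ASOs per 20-mer

# Non-overlapping 20-mers needed to tile the coding sequence end to end.
n_tiles = coding_length_bp // oligo_length
n_asos = n_tiles * variants_per_oligo

print(n_tiles)  # 225
print(n_asos)   # 4500
```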
Creating the Mask
…computer algorithms are used to design the
mask,
– creation of mask is now the limiting process,
requires months to accomplish, and about
$100,000 per mask,
– masks have limited lifetimes, each array costs
about $100 currently.
Cystic fibrosis Chip
…using photolithography, create a chip with
ASOs to identify any difference from wildtype DNA,
…match results with mutations at known
deleterious loci,
…catalog new deleterious loci.
1 Gene of Many
…with controls, the Cystic fibrosis gene may
require up to 20,000 grids,
…new chips can accommodate up to 1 million
grids,
…can look at 50 similarly sized genes on one
chip.
4000 Genetic Diseases
…as genes are linked to diseases, quick,
inexpensive tests can be performed to
determine who carries specific mutations,
…computer analysis will provide genome
profiles that predict a variety of traits.
Genome Profiling
…with 1500 SNPs now, and up to thousands
available, genetic profiles can be made,
…choose SNPs in or near genes involved in
traits or diseases,
…compare profiles over large populations.
How are we different?
…at the RNA level.
Northern Analysis
DNA hybridizing to RNA,
DNA Arrays and Expression
…grid gene-specific ASOs onto the DNA
chip, or cDNAs onto microarrays,
…assay with labeled cDNA; genes that are
expressed at a specific time or place, or under
a specific condition, will bind to the chip for
display.
Genes and Targets
• once the Human Genome Project is done,
all of the genes can be gridded,
– presently, several completely sequenced
genomes have been gridded,
• yeast,
• E. coli,
• various bacteria,
• drug identification, fundamental research,
etc.,
Gene Expression Technologies
• DNA Chips (Affymetrix) and MicroArrays can measure
mRNA concentration of thousands of genes simultaneously
• General scheme: Extract RNA, synthesize labeled cDNA,
Hybridize with DNA on chip.
The Experiment
• After hybridization
– Scan the Chip and obtain an image file
– Image Analysis (find spots, measure signal and noise)
• Output File
– Affymetrix chips: Measure each gene’s signal and make
a present/absent call.
– cDNA MicroArrays: competing hybridization of target
and control. For each gene the log ratio of target and
control.
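A minimal sketch of the per-gene log ratio for a two-channel cDNA microarray. The base of the logarithm is not stated on the slide; base 2 is assumed here. The gene names and intensities are made up for illustration.

```python
import math

def log_ratios(target, control):
    """Return log2(target / control) per gene; None where a reading is not positive."""
    ratios = {}
    for gene, t in target.items():
        c = control.get(gene)
        if c is None or t <= 0 or c <= 0:
            ratios[gene] = None  # ratio undefined; leave it for later filtering
        else:
            ratios[gene] = math.log2(t / c)
    return ratios

# Made-up intensities for three hypothetical genes.
target = {"geneA": 800.0, "geneB": 100.0, "geneC": 50.0}
control = {"geneA": 200.0, "geneB": 100.0, "geneC": 200.0}
ratios = log_ratios(target, control)
print(ratios)  # {'geneA': 2.0, 'geneB': 0.0, 'geneC': -2.0}
```

On this scale a 4-fold increase and a 4-fold decrease are symmetric around zero.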
Preprocessing: From one experiment to many
• Chip and Channel Normalization
– Aim: bring readings of all experiments to be on the
same scale
– Cause: different RNA amounts, labeling efficiency and
image acquisition parameters
– Method: Multiply readings of each array/channel by a
scaling factor such that:
• The sum of the scaled readings will be the same for all arrays
• Find scaling factor by a linear fit of the highly expressed genes
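Both ways of choosing the scaling factor can be sketched as follows. This is an illustration under stated assumptions: the common total is taken to be the mean of the array totals, the linear fit is a least-squares line through the origin, and "highly expressed" is taken to mean the top quarter of reference readings. The function names and the `top_fraction` parameter are hypothetical.

```python
def scale_to_common_sum(arrays, target_sum=None):
    """Multiply each array by a factor so that every array has the same total."""
    sums = [sum(a) for a in arrays]
    if target_sum is None:
        target_sum = sum(sums) / len(sums)  # assumption: use the mean total
    factors = [target_sum / s for s in sums]
    scaled = [[x * f for x in a] for a, f in zip(arrays, factors)]
    return scaled, factors

def fit_scale_factor(reference, array, top_fraction=0.25):
    """Least-squares slope through the origin, using the highest reference readings."""
    pairs = sorted(zip(reference, array), reverse=True)
    n_top = max(1, int(len(pairs) * top_fraction))
    top = pairs[:n_top]
    # Factor f minimizing sum((ref - f * x)^2) over the selected genes.
    return sum(r * x for r, x in top) / sum(x * x for _, x in top)

array_1 = [100.0, 200.0, 400.0, 800.0]
array_2 = [50.0, 100.0, 200.0, 400.0]   # same sample read at half the intensity
scaled, factors = scale_to_common_sum([array_1, array_2])
print([sum(a) for a in scaled])            # [1125.0, 1125.0]
print(fit_scale_factor(array_1, array_2))  # 2.0
```

Restricting the fit to highly expressed genes keeps noisy low readings from dominating the factor.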
Preprocessing: From one experiment to many
• Filtering of Genes
– Remove genes that are absent in most
experiments
– Remove genes that are constant in all
experiments
– Remove genes with low readings which are not
reliable.
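The three filters above can be sketched as one predicate. The thresholds (`min_present_fraction`, `min_reading`, `min_range`) are hypothetical placeholders, not values from the lecture, and the example readings are made up.

```python
def keep_gene(readings, present_calls, min_present_fraction=0.5,
              min_reading=20.0, min_range=1e-9):
    """Return True if a gene passes all three filters."""
    present = sum(present_calls) / len(present_calls)
    if present < min_present_fraction:
        return False   # absent in most experiments
    if max(readings) - min(readings) <= min_range:
        return False   # constant in all experiments
    if max(readings) < min_reading:
        return False   # readings too low to be reliable
    return True

# gene name -> (readings per experiment, present/absent call per experiment)
genes = {
    "g_ok": ([120, 300, 90, 400], [True, True, True, True]),
    "g_absent": ([15, 300, 10, 12], [False, True, False, False]),
    "g_flat": ([100, 100, 100, 100], [True, True, True, True]),
    "g_low": ([5, 8, 6, 7], [True, True, True, True]),
}
kept = [name for name, (r, p) in genes.items() if keep_gene(r, p)]
print(kept)  # ['g_ok']
```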
Noise and Repeats
log – log plot
• >90% 2 to 3 fold,
• Multiplicative noise,
• Repeat experiments,
• Log scale: dist(4,2)=dist(2,1).
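Why the log scale suits multiplicative noise: equal fold changes become equal distances. A minimal check of dist(4,2)=dist(2,1), assuming base 2 logarithms (the slide does not state the base; any base gives the same equality).

```python
import math

def log_dist(a, b):
    """Distance between two positive readings on a log2 scale."""
    return abs(math.log2(a) - math.log2(b))

print(log_dist(4, 2), log_dist(2, 1))  # 1.0 1.0  (both are 2-fold changes)
print(abs(4 - 2), abs(2 - 1))          # 2 1      (linear scale treats them differently)
```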
We can ask many questions
Supervised Methods
(use predefined labels)
• Which genes are expressed differently in two
known types of conditions?
• What is the minimal set of genes needed to
distinguish one type of conditions from the others?
Unsupervised Methods
(use only the data)
• Which genes behave similarly in the experiments?
• How many different types of conditions are there?
Unsupervised Analysis
• Goal A: Find groups of genes that have correlated
expression profiles.
These genes are believed to belong to the same biological
process and/or are co-regulated.
• Goal B: Divide conditions into groups with similar gene
expression profiles.
Example: divide drugs according to their effect on gene
expression.
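One common way to quantify "correlated expression profiles" is the Pearson correlation between two genes across experiments. The lecture does not name a specific measure, so Pearson is an assumption here, and the profiles are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length profiles (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical expression profiles over four experiments.
gene_1 = [1.0, 2.0, 3.0, 4.0]
gene_2 = [2.0, 4.0, 6.0, 8.0]   # same shape at twice the level
gene_3 = [4.0, 3.0, 2.0, 1.0]   # opposite trend

print(round(pearson(gene_1, gene_2), 6))  # 1.0
print(round(pearson(gene_1, gene_3), 6))  # -1.0
```

Correlation ignores overall level, so gene_1 and gene_2 count as behaving alike even though their readings differ two-fold. The same function applied to columns instead of rows compares conditions (Goal B).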
Clustering Methods
What is clustering?
Linear
Round
Cluster Analysis Yields Dendrogram
T (RESOLUTION)
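The slides do not say which clustering algorithm produced the dendrogram. As a stand-in, here is a small average-linkage agglomerative clustering in plain Python; the function names, the Euclidean distance, and the example profiles are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(profiles):
    """Agglomerative clustering; returns merges as (members_a, members_b, height)."""
    clusters = [[name] for name in profiles]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average distance over all pairs of members.
                d = sum(euclidean(profiles[a], profiles[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

def cut(merges, names, threshold):
    """Clusters obtained by applying only the merges whose height is below threshold."""
    groups = {n: {n} for n in names}
    for a, b, height in merges:
        if height < threshold:
            union = groups[a[0]] | groups[b[0]]
            for n in union:
                groups[n] = union
    return sorted({tuple(sorted(g)) for g in groups.values()})

profiles = {"g1": [0, 0], "g2": [0, 1], "g3": [10, 10], "g4": [10, 11]}
merges = average_linkage(profiles)
print(cut(merges, list(profiles), 5.0))  # [('g1', 'g2'), ('g3', 'g4')]
```

The merge list is the dendrogram in tabular form: each entry is one branch point and its height. Cutting at a lower threshold gives more, smaller clusters; cutting higher gives fewer, larger ones.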
Applications
• Monitor expression patterns under the experimental
conditions of your choosing to determine the function of
thousands of genes,
• Common expression patterns can be used to identify genes
that are members of the same pathway,
• Explore expression of candidate/unknown genes.
Gene/Drug Discovery
…genes involved in cancer and other diseases have
been identified through a variety of techniques,
– genome expression analysis provides a means of
discovering other genes that are concomitantly
expressed,
– genome expression analysis provides a means of
monitoring drug/treatment regimes.
Applications
• Can study the role of more than 1700
cancer-related genes in association with the rest of
the genome,
• Define interactions and describe pathways,
• Measure drug response,
• Build databases for use in molecular tumor
classifications,
– benign vs. cancerous, slow vs. aggressive
Extended Applications
• Water quality testing (4 hours vs. 4 days),
• Environmental watchdogs,
• Fundamental research on non-human subjects,
• Direct sequencing of related species for
evolutionary studies,
• Comparisons of gene regulation between closely
related species,
• etc.
What’s the Question
• Human and chimp DNA is ~98.7% similar,
• But, we differ in many and profound ways,
• Can this difference be attributed, at least in
part, to differences in gene expression,
rather than differences in the actual genes
and gene products?
Huh?
• Prevailing notion: a gene is mutated, better
alleles survive and, in fact, out-compete old
alleles…evolution marches on.
• Paper’s hypothesis: it’s not the genes that
are changing, but the REGULATION of
the genes.
Regulation?
• Although the # of genes (~35,000) in the
genome remains controversial, it appears to
be a lot less than early dogma (100,000 - 150,000 genes),
• One thought, “many” of the additional
genes found in complex organisms, are
transcription factors.
First
...What does it mean that our genomes are
98.7% similar at the DNA level, and how do
we know this?
DNA Sequence Comparisons
Bacterial Artificial Chromosomes
BACs
• F plasmid ancestry,
– maintain bacterial
replication system
and copy number
control system.
BAC End Sequencing
• “Mate Pairs”,
– sequence both ends of the
BAC using vector derived
sequencing primers,
– yields about 600 bp per
sequence.
Contiguous Sequences
(contigs)
...looks for end-to-end overlaps of at least 40 bp with
no more than 6% differences in match.
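The overlap rule can be sketched as a simple suffix/prefix comparison. This is not the assembler's actual algorithm: it counts substitutions only (no insertions or deletions), and the function name and test sequences are hypothetical.

```python
def end_overlap(left, right, min_len=40, max_diff=0.06):
    """Longest suffix of `left` matching a prefix of `right` within the thresholds.

    Returns the overlap length, or 0 if no overlap qualifies.
    """
    for length in range(min(len(left), len(right)), min_len - 1, -1):
        a, b = left[-length:], right[:length]
        diffs = sum(1 for x, y in zip(a, b) if x != y)
        if diffs <= max_diff * length:
            return length
    return 0

shared = "ACGTTGCA" * 6            # 48 bp present at the end of one read and the start of the other
left = "GGGGGGGGGG" + shared
right = shared + "TTTTTTTTTT"
print(end_overlap(left, right))    # 48

# A 30 bp overlap is below the 40 bp minimum, so it is rejected.
short_left = "GGGGGGGGGG" + shared[:30]
short_right = shared[:30] + "TTTTTTTTTT"
print(end_overlap(short_left, short_right))  # 0
```

Allowing up to 6% differences lets reads with sequencing errors still join, while the 40 bp minimum keeps chance matches rare.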
What’s the significance?
...a one in 10¹⁷ event.
Science 291 (5507), 1304-1351
8 September 1999 - 25 June 2000
x 543bp / read =
...if 100%
sequenced.
Chimp DNA Sequences
• 3.3x Coverage of the genome.
Human/Chimp BES Similarity
This represents coding (highly conserved) and non-coding (low
conservation) regions of the genome.
Are our Phenotypes 98.7% Similar?
• Some apparent differences,
– HIV susceptibility, epithelial neoplasms (cancers),
malaria, and Alzheimer's,
• In fact, there is only one well understood
biochemical difference,
– A 92 bp deletion in a gene that codes for a hydroxylase,
results in an un-hydroxylated secretion protein in our
immune system.
The Experiment
• Check patterns of gene expression level, using DNA chips, for
12,000 genes in humans, chimps, orangutans, and macaques,
(TRANSCRIPTOME),
– brain, liver, and blood
• Check for protein levels using 2-D gel analysis, (PROTEOME)
• Controls,
– Microarray analysis, (17,997 transcripts),
– Rodent tests.
Affymetrix
• U95A array...
Targets
• Labeled Human cDNA, Chimp cDNA,
Macaque cDNA,
– Collect tissue,
– Extract RNA,
– Label RNA.
Cluster Analysis
• Distances represent the relative differences
in expression changes.
So What?
Primates
Mice
• Changes in gene expression are greatest in the
Human gene cluster.
Probably Rejected by the Journal
• Why?
– Probe was human, target at least 1.3%
different,
– At the “allele specific oligonucleotide” level,
single base changes may skew the data.
Microarray
• Spotted 17,997 PCR products onto nylon, probed
with labeled cDNAs,
– PCR primers are available, in kits, that will amplify just
about any part of the human genome,
– 1000 bp fragments were generated,
• Base pair differences won’t affect probe sensitivity over this
large a target.
Microarray Data
5:1 difference in
expression
profiles.
Proteomics
(2d-gels)
• Proteins separated by charge, then by mass.
• Qualitative (positions), Quantitative (amount)
8500 Protein Spots
What do You Think?
Monday
• Schedule change...
*RNAi (June 3)
Background: Review of RNAi
Specific and heritable genetic interference
by double-stranded RNA in Arabidopsis
thaliana