Powerpoint slides - School of Engineering and Applied Science

CS177 Lecture 11
Experimental Methods
(PCR, X-ray crystallography, Microarrays)
Tom Madej 11.22.04
Lecture overview
• Polymerase chain reaction (PCR) and its applications.
• X-ray crystallography and the Protein Data Bank (PDB).
• Microarrays and applications.
Polymerase Chain Reaction (PCR)
• A method that allows us to generate a relatively large
amount of a particular DNA sequence, even from an
extremely small sample.
• Exquisitely sensitive; even the DNA from a single cell
may suffice!
• Numerous applications in biotechnology.
PCR: main ideas
• You need to know what you are looking for, e.g. the DNA
sequence for a particular gene (the target).
• Sample, primers, nucleotides to build new DNA strands,
and Taq polymerase mixed together.
• Mixture is subjected to repeated cycles of heating, cooling,
and reheating; each cycle takes on the order of a few minutes.
• If the target is present in the initial sample, the amount of
it in the mixture will grow exponentially with the number
of cycles.
ds-DNA target
primers
primers are complementary to opposite ends of target seq.
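The exponential growth claimed above can be sketched numerically. This is a minimal illustration, not part of the original slides: the function name pcr_copies and the per-cycle efficiency parameter are assumptions introduced here, and real reactions plateau once primers and nucleotides run low.

```python
def pcr_copies(initial_copies: int, cycles: int, efficiency: float = 1.0) -> float:
    """Idealized number of target copies after a given number of PCR cycles.

    efficiency is the assumed fraction of templates copied in each cycle:
    1.0 means every template is duplicated (perfect doubling), 0.0 means
    no amplification at all. Reagent depletion is ignored.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial_copies * (1.0 + efficiency) ** cycles


# One starting molecule, perfect doubling: 2**30 copies after 30 cycles.
for n in (0, 10, 20, 30):
    print(n, pcr_copies(1, n))
```

With perfect doubling, a single template molecule yields about a billion copies after 30 cycles, which is why a sample as small as one cell may suffice.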
PCR cycle
• Mixture is heated to 90°C for 1-2 minutes to separate the
DNA strands (denature).
• Temperature is dropped to 50-60°C so that primers can
anneal to complementary regions.
• Temperature is raised to 70°C for 1-2 minutes to allow
Taq polymerase to synthesize new DNA strands, starting
at the primers; this goes from 5’ to 3’ for both strands.
• Note: Taq polymerase is a DNA polymerase from
Thermus aquaticus, a bacterium that lives in hot springs.
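Because synthesis runs 5' to 3' on both strands, the two primers must point toward each other across the target. The sketch below derives such a pair from a target sequence written 5' to 3' on the top strand. The function names and the example sequence are hypothetical, and the sketch ignores melting temperature, GC content and the other practical criteria used in real primer design.

```python
from typing import Tuple

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def primer_pair(target: str, length: int = 18) -> Tuple[str, str]:
    """Return (forward, reverse) primers, each written 5' to 3'.

    The forward primer copies the 5' end of the top strand, so it anneals
    to the bottom strand. The reverse primer is the reverse complement of
    the 3' end of the top strand, so it anneals to the top strand.
    """
    if length > len(target):
        raise ValueError("primer length exceeds target length")
    forward = target[:length].upper()
    reverse = reverse_complement(target[-length:])
    return forward, reverse


# Hypothetical short target, with 4-base primers only to keep the example readable.
print(primer_pair("ATGCGTACGTTAGC", length=4))
```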
Polymerase Chain Reaction (PCR)
PCR notes
• Primer selection is critical. The primers should be at
least 15-20 bases to ensure specificity.
• If you are unsure of the exact sequence, you can use
“degenerate” primers, i.e. a mixture of primers (vary at
third codon position).
• Note that almost all of the product is exactly the target
sequence you want, i.e. with flush ends.
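A degenerate primer is usually written with the standard IUPAC ambiguity codes (e.g. N = any base, Y = C or T), and it stands for the mixture of all concrete sequences it can spell. The sketch below enumerates that mixture; the function name expand_degenerate and the example primer are illustrative assumptions, not something taken from the slides.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list:
    """List every concrete primer represented by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# GGN encodes glycine and TAY encodes tyrosine; only the third codon
# positions vary, giving 4 * 2 = 8 primers in the mixture.
mixture = expand_degenerate("GGNTAY")
print(len(mixture), mixture)
```

The size of the mixture grows multiplicatively with each ambiguous position, which is one reason to keep degeneracy confined to the third codon position where possible.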
PCR applications
• Making a lot of protein! Use RT-PCR, “reverse
transcriptase” PCR, to create DNA with introns removed
and then insert it into bacteria to clone the gene. E.g. to
make proteins for X-ray crystallography.
• Medical diagnosis: e.g. detect HIV viral nucleic acid long
before AIDS symptoms arise; or rapid tuberculosis test.
• Forensics; detect trace amounts of DNA at a crime
scene.
Methods to determine protein structures
• X-ray crystallography (most important, over 80% of
structures in the PDB are obtained this way).
• NMR spectroscopy (Nuclear Magnetic Resonance).
• Electron microscopy; uses a beam of electrons to create
images (there may be issues with sample preparation and
resolution when it is applied to protein structure
determination).
Protein crystallography steps
• Grow crystals of the protein that diffract well (a difficult
step, can take from weeks to years!).
• Obtain the X-ray diffraction data.
• Compute electron density maps.
• Refinement: calculate an atomic model to fit electron
density; compare the diffraction data computed from the
model with the actual data; refine the model to fit the
data (iterate).
Protein crystals
http://www-structure.llnl.gov/crystal_lab/Crys_lab.html
Protein crystal
(figure labels: molecule, crystal)
The unit cell is the basic unit of symmetry in the crystal.
Facts about protein crystals
• In contrast e.g. to salt or quartz crystals, protein crystals
are mostly water (due to the irregular shape of the
molecule) and therefore fragile.
• Since they are mostly water, the protein structures
obtained are expected to be similar to their conformations in vivo.
• To preserve the crystal in the X-ray beam, it is kept at a
very low temperature (100 K).
X-ray diffraction
• The incident beam of X-rays is diffracted by the electrons
in the protein molecules in the crystal.
• Some of the diffracted waves will interfere constructively,
and others will interfere destructively.
• This results in a diffraction pattern of spots of varying
intensity on the detector.
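The condition for constructive interference from parallel lattice planes is Bragg's law, n λ = 2 d sin θ. The slides do not state the law explicitly (they only link to an illustration of it), so the sketch below is an added aid; the function name and the example values are assumptions.

```python
import math


def bragg_angle_deg(wavelength: float, d_spacing: float, order: int = 1) -> float:
    """Angle theta (degrees) at which planes d_spacing apart reflect constructively.

    Solves n * wavelength = 2 * d_spacing * sin(theta) for theta.
    wavelength and d_spacing must use the same length unit (e.g. angstroms).
    """
    sin_theta = order * wavelength / (2.0 * d_spacing)
    if sin_theta > 1.0:
        raise ValueError("no reflection: wavelength too long for this spacing")
    return math.degrees(math.asin(sin_theta))


# Illustrative values: 1.54 angstrom X-rays and planes 2.6 angstroms apart.
print(bragg_angle_deg(1.54, 2.6))
```

Smaller plane spacings diffract to larger angles, so the spots far from the beam center carry the fine detail of the structure.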
Illustration of diffraction
http://www.eserc.stonybrook.edu/ProjectJava/Bragg/index.html
X-ray diffraction pattern
Analysis of the diffraction pattern
• The diffraction pattern is analyzed by
mathematical/computational methods (Fourier analysis) to
produce an electron density map.
• This gives a 3-dimensional image of the molecule that
will be subjected to further processing and analysis.
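The Fourier step can be illustrated with a one-dimensional toy crystal. This is a sketch under strong assumptions: two point-like atoms in a 1D unit cell, with made-up scattering weights, and with both the amplitudes and the phases of the structure factors known. In a real experiment only the spot intensities are measured and the phases must be recovered separately. All names below are hypothetical.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """F(h) = sum of f * exp(2*pi*i*h*x) over atoms, for h = -h_max..h_max.

    atoms is a list of (scattering_weight, fractional_position) pairs.
    """
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for f, x in atoms)
        for h in range(-h_max, h_max + 1)
    }


def electron_density(factors, n_points):
    """Fourier synthesis on a regular grid (arbitrary units, no 1/V scaling)."""
    density = []
    for k in range(n_points):
        x = k / n_points
        rho = sum(F * cmath.exp(-2j * math.pi * h * x) for h, F in factors.items())
        density.append(rho.real)  # imaginary parts cancel because F(-h) = conj(F(h))
    return density


def peak_positions(density, count):
    """Fractional positions of the `count` highest local maxima (periodic grid)."""
    n = len(density)
    maxima = [i for i in range(n)
              if density[i] > density[i - 1] and density[i] > density[(i + 1) % n]]
    maxima.sort(key=lambda i: density[i], reverse=True)
    return sorted(i / n for i in maxima[:count])


atoms = [(8.0, 0.2), (6.0, 0.7)]  # hypothetical weights and positions
density = electron_density(structure_factors(atoms, h_max=10), n_points=100)
print(peak_positions(density, 2))
```

Raising h_max adds higher-frequency terms and sharpens the peaks, which is the 1D analogue of a higher-resolution electron density map.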
Electron density maps at different resolutions
http://www-structure.llnl.gov/Xray/101index.html
Refinement
• Refinement is an iterative process; one constructs an
atomic model based on the electron density, then
computes diffraction data from the model, which is
compared to the actual diffraction data.
• The crystallographic R-factor is a measure of how well
the model fits the diffraction data.
• Can be subject to error! The electron density for certain
pairs of amino acid residues is extremely similar.
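The R-factor is conventionally defined as R = Σ | |F_obs| − |F_calc| | / Σ |F_obs|, summed over all reflections. A minimal sketch follows, assuming the calculated amplitudes have already been put on the same scale as the observed ones (real refinement programs fit that scale factor); the function name and numbers are illustrative.

```python
def r_factor(f_obs, f_calc):
    """Crystallographic R-factor between observed and calculated amplitudes.

    Both arguments are sequences of structure-factor values for the same
    reflections, in the same order. Only magnitudes are compared.
    """
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Made-up amplitudes for three reflections.
print(r_factor([10.0, 20.0, 30.0], [9.0, 22.0, 30.0]))
```

A value of 0 means the model reproduces the data exactly; lower values indicate a better fit, which is what each refinement iteration tries to achieve.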
Fitting amino acid residues into the electron density map
X-ray crystallography summary
http://www.bnl.gov/discover/Spring_04/crystallography.asp
NMR
• Based on magnetic moments of atomic nuclei.
• NMR spectra give information about distances between atoms in the
molecule.
• Applied to protein molecules in solution (no crystals needed!).
• Only works well for smaller proteins, e.g. roughly 100 residues or
fewer.
• A different set of mathematical/computational tools is involved.
• Note: The different “models” represent different structures
compatible with the distance constraints, not actual conformations of
the molecule.
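The idea that several models can all be compatible with the same distance constraints can be sketched as a simple check. The data layout here (atom names mapped to coordinates, constraints as upper bounds in angstroms) is an assumption made for illustration and not the format of any particular NMR software.

```python
import math


def violated_constraints(coords, constraints, tolerance=0.0):
    """Return the constraints that a model fails to satisfy.

    coords maps an atom name to its (x, y, z) position in angstroms.
    constraints is a list of (atom_a, atom_b, max_distance) upper bounds.
    """
    violations = []
    for atom_a, atom_b, upper in constraints:
        distance = math.dist(coords[atom_a], coords[atom_b])
        if distance > upper + tolerance:
            violations.append((atom_a, atom_b, distance))
    return violations


# Two hypothetical models; both satisfy the single constraint below.
constraints = [("H1", "H2", 5.0)]
model_a = {"H1": (0.0, 0.0, 0.0), "H2": (3.0, 4.0, 0.0)}
model_b = {"H1": (0.0, 0.0, 0.0), "H2": (0.0, 2.0, 0.0)}
print(violated_constraints(model_a, constraints))
print(violated_constraints(model_b, constraints))
```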
PDB
PDB File: Header
HEADER    ISOMERASE/DNA                           01-MAR-00   1EJ9
TITLE     CRYSTAL STRUCTURE OF HUMAN TOPOISOMERASE I DNA COMPLEX
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: DNA TOPOISOMERASE I;
COMPND   3 CHAIN: A;
COMPND   4 FRAGMENT: C-TERMINAL DOMAIN, RESIDUES 203-765;
COMPND   5 EC: 5.99.1.2;
COMPND   6 ENGINEERED: YES;
COMPND   7 MUTATION: YES;
COMPND   8 MOL_ID: 2;
COMPND   9 MOLECULE: DNA (5'-
COMPND  10 D(*C*AP*AP*AP*AP*AP*GP*AP*CP*TP*CP*AP*GP*AP*AP*AP*AP*AP*TP*
COMPND  11 TP*TP*TP*T)-3');
COMPND  12 CHAIN: C;
COMPND  13 ENGINEERED: YES;
COMPND  14 MOL_ID: 3;
COMPND  15 MOLECULE: DNA (5'-
COMPND  16 D(*C*AP*AP*AP*AP*AP*TP*TP*TP*TP*TP*CP*TP*GP*AP*GP*TP*CP*TP*
COMPND  17 TP*TP*TP*T)-3');
COMPND  18 CHAIN: D;
COMPND  19 ENGINEERED: YES
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
SOURCE   3 EXPRESSION_SYSTEM_COMMON: BACULOVIRUS EXPRESSION SYSTEM;
SOURCE   4 EXPRESSION_SYSTEM_CELL: SF9 INSECT CELLS;
SOURCE   5 MOL_ID: 2;
SOURCE   6 SYNTHETIC: YES;
SOURCE   7 MOL_ID: 3;
SOURCE   8 SYNTHETIC: YES
KEYWDS    PROTEIN-DNA COMPLEX, TYPE I TOPOISOMERASE, HUMAN
...
REMARK   1
REMARK   2
REMARK   2 RESOLUTION. 2.60 ANGSTROMS.
REMARK   3
REMARK   3 REFINEMENT.
REMARK   3   PROGRAM     : X-PLOR 3.1
REMARK   3   AUTHORS     : BRUNGER
…
REMARK 280
REMARK 280 CRYSTALLIZATION CONDITIONS: 27% PEG 400, 145 MM MGCL2, 20
REMARK 280 MM MES PH 6.8, 5 MM TRIS PH 8.0, 30 MM DTT
REMARK 290
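Because each PDB header line starts with a record name (HEADER, TITLE, REMARK, ...), a few fields can be pulled out with very little code. The sketch below is a simplification: it splits on whitespace rather than using the fixed column positions of the PDB format, handles only three record types, and the function name parse_pdb_header is hypothetical. For real work an established parser is a better choice.

```python
import re


def parse_pdb_header(lines):
    """Extract ID, deposition date, classification, title and resolution."""
    info = {"id": None, "date": None, "classification": None,
            "title": "", "resolution": None}
    for line in lines:
        record = line[:6].strip()
        if record == "HEADER":
            # Simplification: assume the last two tokens are date and ID code.
            fields = line[6:].split()
            if len(fields) >= 3:
                info["classification"] = " ".join(fields[:-2])
                info["date"], info["id"] = fields[-2], fields[-1]
        elif record == "TITLE":
            # Title text starts after the record name and continuation field.
            info["title"] = (info["title"] + " " + line[10:].strip()).strip()
        elif record == "REMARK":
            match = re.match(r"REMARK\s+2\s+RESOLUTION\.\s+([\d.]+)\s+ANGSTROMS", line)
            if match:
                info["resolution"] = float(match.group(1))
    return info


sample = [
    "HEADER    ISOMERASE/DNA                           01-MAR-00   1EJ9",
    "TITLE     CRYSTAL STRUCTURE OF HUMAN TOPOISOMERASE I DNA COMPLEX",
    "REMARK   2 RESOLUTION. 2.60 ANGSTROMS.",
    "REMARK 280 CRYSTALLIZATION CONDITIONS: 27% PEG 400, 145 MM MGCL2, 20",
]
print(parse_pdb_header(sample))
```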
From Coordinates to Models
1EJ9: Human topoisomerase I
Annotating Secondary Structure
1EJ9: Human topoisomerase I
α-Helices
β-strands
coils/loops
Creating 3D Domains
3D Domain 0: 1EJ9A0 = entire polypeptide
Creating 3D Domains
3D Domains: 1EJ9A1, 1EJ9A2, 1EJ9A3, 1EJ9A4, 1EJ9A5
< 3 Secondary Structure Elements
Microarrays
• Used to study gene expression levels in cells.
• Cells can differ dramatically in the amounts of various
proteins that they synthesize; e.g. due to different cell
types or different external/internal conditions.
• In fact, in higher level organisms only a fraction of the
genes in a cell are expressed at a given time, and that
subset depends on the cell type.
• Via microarrays it is possible to study the expression
levels of tens of thousands of genes simultaneously.
Microarray technology
• Physically, a microarray is just a glass slide with spots of
DNA on it; each spot is a probe (or target).
• The DNA is single-stranded cDNA (complementary DNA) and
may consist of an entire gene or part of one (an
oligonucleotide consisting of 50 bases or so).
• If the microarray is exposed to a solution containing
mRNA, then the mRNA molecules will bind to those
probes to which they are complementary.
Microarray probes
ssDNA gene sequences or oligos
Microarray technology
• Thousands of probes can fit on a single slide.
• The slides can be spotted by robots.
• Of course, what genes you can study with a given
microarray depends on the collection of probes on it.
• There are a number of commercial manufacturers; e.g.
Affymetrix, Agilent, Amersham.
• They’re expensive!
Microarray experiments
• Start with two cell types, e.g. “healthy” and “diseased”.
• Isolate mRNA from each cell type, generate cDNA with fluorescent
dyes attached, e.g. green for healthy and red for diseased.
• Mix the cDNA samples and incubate with the microarray.
• After incubation the cDNA in the samples has had a chance to bind
(hybridize) with the probes on the chip.
• The chip is read by a scanner that uses lasers to excite the
fluorescent tags; the intensity levels of the dyes are recorded for
each probe gene and stored in a computer.
Microarray data representation
• There is a “standard” color scale representation, as follows.
• Red means the gene produced more mRNA in the experimental
condition; green means the gene produced more mRNA in the
control.
• Black means equal amounts of mRNA for both experiment and
control.
• If e.g. there were 5 times as much mRNA for the experimental
condition compared to the control, we would say there was a 5-fold
induction; 1/5 as much would be 5-fold repression.
• The data is recorded numerically as the log base 2 of the expression
ratio.
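The log base 2 convention makes induction and repression symmetric: a 5-fold induction and a 5-fold repression have the same magnitude with opposite sign. A minimal sketch follows, with hypothetical function names and made-up intensity values; real pipelines also subtract background and normalize the two channels first.

```python
import math


def log2_ratio(experiment: float, control: float) -> float:
    """log2 of the expression ratio (experimental over control intensity)."""
    if experiment <= 0 or control <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(experiment / control)


def describe(log_ratio: float) -> str:
    """Translate a log2 ratio into the fold-change and color used on the slides."""
    fold = 2 ** abs(log_ratio)
    if log_ratio > 0:
        return f"{fold:.1f}-fold induction (red)"
    if log_ratio < 0:
        return f"{fold:.1f}-fold repression (green)"
    return "no change (black)"


for experiment, control in [(500.0, 100.0), (100.0, 500.0), (250.0, 250.0)]:
    ratio = log2_ratio(experiment, control)
    print(round(ratio, 2), describe(ratio))
```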
Microarray data
Microarray data analysis
• Since there are typically so many genes, it is useful to
cluster the genes based on similar expression patterns.
• Different clustering algorithms may be used, e.g.
hierarchical with different metrics, or k-means, k-medians.
• It may also be useful to cluster the samples (we’ll see
this shortly).
• Other statistical methods may be useful, e.g. support
vector machines (SVM).
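As one concrete instance of the clustering mentioned above, here is a minimal k-means sketch over gene expression profiles (one list of log2 ratios per gene). It is only an illustration under stated assumptions: Euclidean distance, centroids initialized from the first k profiles so the result is deterministic, a fixed number of iterations, and made-up data. Practical analyses typically use a statistics library and try several initializations.

```python
import math


def kmeans(profiles, k, iterations=20):
    """Cluster expression profiles into k groups; returns (labels, centroids)."""
    centroids = [list(p) for p in profiles[:k]]  # assumed deterministic start
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each gene joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in profiles]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical log2 ratios for five genes measured in three samples.
profiles = [
    [2.0, 2.1, 1.9],     # induced
    [-2.0, -1.8, -2.2],  # repressed
    [1.8, 2.2, 2.0],     # induced
    [-1.9, -2.1, -2.0],  # repressed
    [2.1, 1.9, 2.2],     # induced
]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

Transposing the data (one profile per sample instead of per gene) gives the sample clustering mentioned in the slide.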
Acute Lymphoblastic Leukemia (ALL)
• Constitutes 75% of annual diagnoses of childhood
leukemia.
• Long-term outlook has improved dramatically since
about 1970. At that time the long term disease free
survival rate (LTDFS) was under 10%; at present it is
over 80%.
• There is still a risk of relapse in 20% of patients.
ALL (cont.)
• The LTDFS rate improved because it was recognized
that ALL is heterogeneous, and the therapy should be
tailored to the subtype so as to improve the odds of a
successful treatment (e.g. bone marrow transplant vs.
chemotherapy).
• Important subtypes include: T-ALL, E2A-PBX1, BCR-ABL,
TEL-AML1, MLL rearrangement, and hyperdiploid with
> 50 chromosomes.
Cancer Cell, March 2002, v. 1 133-143.
Science v. 306, Oct. 22, 2004 630-631.
Abstract from S.A. Mitchell et al.