Computational Biology, Part 1
Introduction
Robert F. Murphy
Copyright 1996, 2000, 2001.
All rights reserved.
Course Introduction
What these courses are about
What I expect
What you can expect
What these courses are about
overview of ways in which computers are
used to solve problems in biology
supervised learning of illustrative or
frequently-used programs
(03-510) supervised learning of
programming techniques and algorithms
selected from these uses
I expect
students will have basic knowledge of biology and
chemistry (at the level of Modern Biology/Chemistry) and
willingness to learn more
students will have basic familiarity with use of computers
(e.g., at the level of Computing Skills Workshop) and
eagerness to gain new skills
(03-510) students have some programming experience and
willingness to work to improve
heterogeneous class - I plan to include refreshers on each
new topic
students will ask questions in class and via email
You can expect
Three major course sections
Sequence Analysis (13 classes)
Biological Modeling (11 classes)
Biological Imaging (4 classes)
Class sessions: lectures/demonstrations/exercises/quizzes
Homework assignments
4 homework assignments for 03-311 (80% of grade)
8 homework assignments for 03-310 (70% of grade)
10 homework assignments for 03-510 (70% of grade)
Test March 1 (20% for 03-311, 10% for others)
Final (20% of grade for 03-310, 03-510)
Communication on class matters via email list
Textbooks for first half of course
For 03-310/311 students
“Required textbook” is Baxevanis & Ouellette
For 03-510 students
“Recommended” textbook is Durbin et al.
Additional suggested book
Computational Molecular Biology, Peter Clote
& Rolf Backofen (ISBN 0-471-87252-0)
Chapter 1 is an excellent introduction to molecular biology
for non-Biology majors
Specific sources for CMU
computational biology classes
Web page (http://www.bio.cmu.edu/Courses/03310 or
03311 or 03510)
Lecture Notes (as PowerPoint files)
Homework Assignments (as Word files)
Additional materials as needed
FTP server (www.bio.cmu.edu)
Files needed for homework assignments
CompBiol project volume on AFS
/afs/andrew.cmu.edu/usr/murphy/CompBiol
Additional classes for 03-510
We will have one additional class meeting
per week for 03-510 for the first half of the
semester only
Purpose is to cover some more advanced
material and programming assignments
Other relevant courses
Second half mini-course “47-863: Topics in
Operations Research: Computational
Biology” will be taught by Dr. R. Ravi
Tuesday-Thursday 1:30-2:50 starting 3/13
Recommended for 03-510 students
Fall 2001 course on advanced topics in
computational molecular biology will be
taught by Dr. Dannie Durand
Prerequisite: 03-310/311/510
Information flow
A major task in computational molecular
biology is to “decipher” information
contained in biological sequences
Since the nucleotide sequence of a genome
contains all information necessary to
produce a functional organism, we should in
theory be able to duplicate this decoding
using computers
Review of basic biochemistry
Central Dogma: DNA makes RNA makes
protein
Sequence determines structure determines
function
Structure
macromolecular structure divided into
primary structure (1D sequence)
secondary structure (local 2D & 3D)
tertiary structure (global 3D)
DNA composed of four nucleotides or "bases":
A,C,G,T
RNA composed of four also: A,C,G,U (T
transcribed as U)
proteins are composed of amino acids
DNA properties - base
composition
Some properties of long, naturally occurring
DNA molecules can be predicted accurately
given only the base composition, usually
expressed as either
%GC (the percent of all base pairs that are G:C), or
GC (the mole fraction of all bases that are either G or C)
%GC = 100 × GC
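The two measures of base composition defined above can be computed directly from a sequence string. The following is a minimal sketch; the function names `gc_fraction` and `percent_gc` are illustrative choices, and skipping symbols other than A, C, G, and T (such as N) is an assumption, not something the slides specify.

```python
def gc_fraction(seq: str) -> float:
    """Mole fraction of bases that are G or C.

    Assumption: symbols other than A, C, G, T (e.g. N) are ignored.
    """
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no A, C, G, or T")
    gc = sum(1 for base in counted if base in "GC")
    return gc / len(counted)


def percent_gc(seq: str) -> float:
    """%GC = 100 * GC, as on the slide."""
    return 100.0 * gc_fraction(seq)


if __name__ == "__main__":
    print(gc_fraction("ATGC"))   # 0.5
    print(percent_gc("GGCC"))    # 100.0
```

For double-stranded DNA the mole fraction of G plus C is the same on both strands, so counting one strand is enough.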
DNA properties - melting
temperature and buoyant density
Two such properties are
$T_m$, the melting temperature, defined as the
temperature at which half of the DNA is single-stranded and half is double-stranded
$T_m\ (^{\circ}\mathrm{C}) = 69.3 + 41 \cdot \mathrm{GC}$ (for 0.15 M NaCl)
$\rho_0$, the buoyant density, defined as the density
of a solution in which a DNA molecule will feel
no net force when centrifuged (the density at
the point in a density gradient at which the
DNA stops moving, or “bands”)
$\rho_0\ (\mathrm{g\,cm^{-3}}) = 1.660 + 0.098 \cdot \mathrm{GC}$ (for CsCl)
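Both relations above are linear in the GC mole fraction, so they are easy to evaluate. This sketch simply encodes the two formulas as stated on the slide (0.15 M NaCl for the melting temperature, CsCl for the buoyant density); the function names and the range check are illustrative assumptions.

```python
def _check_fraction(gc: float) -> None:
    # GC is a mole fraction, so it must lie between 0 and 1.
    if not 0.0 <= gc <= 1.0:
        raise ValueError("GC must be a mole fraction between 0 and 1")


def melting_temperature(gc: float) -> float:
    """Tm in degrees C for 0.15 M NaCl: Tm = 69.3 + 41 * GC."""
    _check_fraction(gc)
    return 69.3 + 41.0 * gc


def buoyant_density(gc: float) -> float:
    """Buoyant density in g/cm^3 in CsCl: rho = 1.660 + 0.098 * GC."""
    _check_fraction(gc)
    return 1.660 + 0.098 * gc


if __name__ == "__main__":
    # A molecule with equal amounts of G+C and A+T (GC = 0.5)
    print(round(melting_temperature(0.5), 1))  # 89.8
    print(round(buoyant_density(0.5), 3))      # 1.709
```

Note that the argument is the mole fraction, not the percentage; passing %GC would overestimate both values.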
DNA structure - restriction maps
Restriction enzymes cut DNA at specific
sequences.
A restriction map is a graphical description
of the order and lengths of fragments that
would be produced by the digestion of a
DNA molecule with one or more restriction
enzymes
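The fragment lengths that make up a restriction map can be predicted from a sequence by locating each recognition site. The sketch below is a simplified illustration, not a full mapping program: it searches one strand only (adequate for palindromic sites), it does not detect sites that span the origin of a circular sequence, and the names `find_sites` and `fragment_lengths` are hypothetical. The example uses EcoRI, which recognizes GAATTC and cuts after the G; the test sequence is made up.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every occurrence of the recognition site."""
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq: str, site: str, cut_offset: int,
                     circular: bool = False) -> list[int]:
    """Lengths of the fragments from a complete digest, in map order.

    cut_offset is the number of bases from the start of the site to the cut.
    """
    n = len(seq)
    cuts = [p + cut_offset for p in find_sites(seq, site)]
    if not cuts:
        return [n]
    if circular:
        # Each fragment runs from one cut to the next, wrapping around;
        # a single cut linearizes the circle into one full-length fragment.
        k = len(cuts)
        return [(cuts[(i + 1) % k] - cuts[i]) % n or n for i in range(k)]
    bounds = [0] + cuts + [n]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    dna = "AAGAATTCTTTTGAATTCCC"  # made-up sequence with two EcoRI sites
    print(fragment_lengths(dna, "GAATTC", 1))                 # [3, 10, 7]
    print(fragment_lengths(dna, "GAATTC", 1, circular=True))  # [10, 10]
```

The linear and circular results differ because a circle cut at two sites yields two fragments, while a linear molecule cut at the same two sites yields three.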
Restriction map of a circular
plasmid with one enzyme
[Figure: circular map of plasmid pGEM4 with the AccII sites marked around the circle]
Restriction map of all enzymes
that cut only once
[Figure: circular map of plasmid pGEM4 labeled with the enzymes that cut only once: SspBI, BsrGI, Bsp1407I, AcsI, ApoI, EcoRI, Ecl136II, EcoICRI, SacI, SstI, Acc65I, Asp718I, AvaI, NheI, NaeI, NgoMI, NgoAIV, SgrAI, Eco47III, Aor51HI, DsaI, BsmFI, EcoNI, AflIII, AlwNI, AatII, SspI, XmnI, Asp700I, ScaI, Eco255I, XorII, PvuI, BspCI, AhdI, AspEI, Eam1105I, EclHKI, BpmI, GsuI, BglI, AviII, FspI]
Transcription
transcription is accomplished by RNA polymerase
RNA polymerase binds to promoters
promoters have distinct regions "-35" and "-10"
efficiency of transcription controlled by binding
and progression rates
transcription start and stop affected by tertiary
structure
regulatory sequences can be positive or negative
RNA processing
eukaryotic genes are interrupted by introns
these are "spliced" out to yield mRNA
splicing done by spliceosome
splicing sites are quite degenerate but not all
are used
Translation
conversion from RNA to protein is by
codon: 3 bases = 1 amino acid
translation done by ribosome
translation efficiency controlled by mRNA
copy number (turnover) and ribosome
binding efficiency
translation affected by mRNA tertiary
structure
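The codon rule (3 bases = 1 amino acid) can be illustrated with a short translation routine. This is a minimal sketch that assumes the standard genetic code and a reading frame starting at the first base; the names `transcribe` and `translate` are illustrative. Stop codons are written as `*`, and any codon containing an unrecognized symbol is reported as `X` (an assumption made here for robustness).

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def transcribe(dna: str) -> str:
    """Write the coding strand as RNA: T is transcribed as U."""
    return dna.upper().replace("T", "U")


def translate(rna: str, stop_at_stop: bool = True) -> str:
    """Translate an RNA (or DNA) sequence codon by codon from position 0."""
    seq = rna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X marks an unknown codon
        if aa == "*" and stop_at_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTAA")
    print(mrna)             # AUGGCCUAA
    print(translate(mrna))  # MA
```

Trailing bases that do not fill a complete codon are ignored.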
Protein localization
leader sequences can specify cellular
location (e.g., insert across membranes)
leader sequences usually removed by
proteolytic cleavage
Posttranslational processing
peptides fold after translation - may be
assisted or unassisted
processing enzymes recognize specific sites
(amino acid sequences)
protein signals can involve secondary and
tertiary structure, not just primary structure
Goals of Sequence Analysis
Assigned Reading:
Baxevanis & Ouellette, Chapter 10
Goals of Sequence Analysis
Management of sequence information
Assembly of sequence fragments into complete
units (proteins, genes, chromosomes)
Goals of Sequence Analysis
Confirmation and prediction of restriction enzyme
sites (for nucleic acids)
can aid sequence determination in areas of uncertainty
by permitting testing of specific bases
can permit selection of appropriate enzymes for
sequence checking
can permit selection of appropriate enzymes for
subcloning or generation of probes
Goals of Sequence Analysis
Finding open reading frames (ORFs) for cDNAs or
genomic DNA from organisms without introns
Finding protein coding regions in DNAs using codon usage
tables
not all ORFs are made into proteins
redundancy in genetic code is not fully reflected in the tRNAs
made by a particular organism (codon preference)
can use to identify "real" coding regions (pseudo-genes "drift" in
their codon usage)
can use expressed sequence tags (ESTs)
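Finding ORFs and tabulating codon usage, as listed above, can both be sketched in a few lines. The code below is an illustration under simplifying assumptions: an ORF is taken to run from the first ATG in a frame to the next in-frame stop codon, only the given strand is scanned (the reverse complement is not), and the names `find_orfs`, `codon_usage`, and `min_codons` are hypothetical.

```python
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for each ATG...stop ORF on this strand.

    start is 0-based, end is exclusive and includes the stop codon.
    min_codons counts codons before the stop (including the ATG).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


def codon_usage(orf: str) -> Counter:
    """Count how often each codon occurs in an in-frame coding sequence."""
    orf = orf.upper()
    return Counter(orf[i:i + 3] for i in range(0, len(orf) - 2, 3))


if __name__ == "__main__":
    dna = "CCATGAAATTTTAGGG"  # made-up sequence
    print(find_orfs(dna))     # [(2, 14, 2)]
    start, end, _ = find_orfs(dna)[0]
    print(codon_usage(dna[start:end]))
```

Comparing the codon counts of a candidate ORF with the organism's codon usage table is one way to judge whether the ORF is likely to be a real coding region.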
Goals of Sequence Analysis
Finding and using consensus sequences
Examples
promoters
transcription initiation sites
transcription termination sites
polyadenylation sites
ribosome binding sites
protein features
use sets of sequences identified (by other means) as related
use sets of sequences identified by sequence comparison
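One simple way to derive a consensus from a set of related sequences is a majority vote at each position. The sketch below assumes the sequences are already aligned and of equal length; the function name `consensus`, the `min_fraction` threshold, and the example hexamers are all illustrative assumptions rather than anything taken from the slides.

```python
from collections import Counter


def consensus(aligned: list[str], min_fraction: float = 0.5,
              unknown: str = "N") -> str:
    """Majority-rule consensus of equal-length, pre-aligned sequences.

    A position gets the most common symbol only if that symbol occurs in
    more than min_fraction of the sequences; otherwise it gets `unknown`.
    """
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("sequences must all have the same length")
    result = []
    for column in zip(*(s.upper() for s in aligned)):
        symbol, count = Counter(column).most_common(1)[0]
        result.append(symbol if count / len(column) > min_fraction else unknown)
    return "".join(result)


if __name__ == "__main__":
    # Made-up hexamers that each differ from a common pattern at one position
    sites = ["TATAAT", "TATGAT", "TACAAT", "TATACT"]
    print(consensus(sites))  # TATAAT
```

A consensus string discards how strongly each position is conserved; keeping the per-column counts gives a position-specific frequency table instead.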
Goals of Sequence Analysis
Comparison and alignment of sequences
compare sequence to database - goal: find related
sequences (SIMILARITY)
compare sequence to sequence - goal: find matching
domains (ALIGNMENT)
compare database to database - goal: estimate genetic
distance (EVOLUTION)
either: determine consensus sequences
comparisons can be pairwise or multiple-strand
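The slides do not name a specific comparison method here. As one illustration of a pairwise sequence-to-sequence comparison, the sketch below computes a global alignment score by dynamic programming (the Needleman-Wunsch approach with a linear gap penalty). The scoring values for match, mismatch, and gap are arbitrary assumptions chosen for the example.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch, linear gaps).

    Only the score is returned; the alignment itself is not reconstructed.
    """
    # prev[j] holds the best score for aligning a[:i-1] with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal,            # align a[i-1] with b[j-1]
                            prev[j] + gap,       # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "ACGT"))  # 4
    print(global_alignment_score("ACGT", "ACT"))   # 1
```

Keeping only two rows of the table is enough for the score; recovering the aligned sequences would require storing the full table or traceback pointers.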
Goals of Sequence Analysis
Translation to protein sequence and prediction of
protein properties - use measured propensities of
particular amino acids or amino acid stretches
Predict molecular weight
Predict isoelectric point (pI)
Predict extinction coefficient
Prediction of secondary and tertiary structure
RNA - use base pairing energies
protein - use propensities
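Two of the protein properties listed above, molecular weight and extinction coefficient, can be estimated from composition alone. The sketch below is an approximation under stated assumptions: it uses rounded average residue masses plus one water per chain, and the commonly used per-residue contributions at 280 nm (5500 for Trp, 1490 for Tyr, 125 per disulfide, in M^-1 cm^-1). The function names and the `n_disulfides` parameter are illustrative, and modifications are ignored.

```python
# Average residue masses in daltons (amino acid minus one water), rounded.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER_MASS = 18.02  # added once for the free N- and C-termini


def molecular_weight(protein: str) -> float:
    """Approximate average molecular weight of an unmodified peptide chain."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty sequence")
    try:
        return sum(RESIDUE_MASS[aa] for aa in protein) + WATER_MASS
    except KeyError as err:
        raise ValueError(f"unknown amino acid symbol: {err.args[0]}") from None


def extinction_coefficient(protein: str, n_disulfides: int = 0) -> int:
    """Approximate molar extinction coefficient at 280 nm (M^-1 cm^-1)."""
    protein = protein.upper()
    return (5500 * protein.count("W") + 1490 * protein.count("Y")
            + 125 * n_disulfides)


if __name__ == "__main__":
    print(round(molecular_weight("GG"), 2))  # 132.12 (glycylglycine)
    print(extinction_coefficient("WWY"))     # 12490
```

Predicting the isoelectric point additionally requires pKa values for the ionizable groups, which is why it is not included in this sketch.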