Why teach a course in bioinformatics?


Day 2
Genetic information, stored in DNA, is
expressed as proteins.
In sickle-cell anemia, a single
nucleotide change is responsible
for a single amino acid change.
Sickle-cell anemia is caused by
this one amino acid change.
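The sickle-cell mutation is usually described as a GAG → GTG change in codon 6 of the β-globin gene, which replaces glutamic acid (Glu) with valine (Val). A minimal sketch of that comparison follows; the codon lookup is deliberately partial (only the Glu and Val codons) and the function name is hypothetical, chosen for illustration.

```python
# Deliberately partial codon table: only the glutamic acid and valine codons
PARTIAL_CODE = {
    "GAA": "Glu", "GAG": "Glu",
    "GTT": "Val", "GTC": "Val", "GTA": "Val", "GTG": "Val",
}

def compare_codons(normal: str, mutant: str) -> tuple[list[int], str, str]:
    """Return the 0-based positions where two codons differ and the amino acid each encodes."""
    diffs = [i for i, (a, b) in enumerate(zip(normal, mutant)) if a != b]
    return diffs, PARTIAL_CODE[normal], PARTIAL_CODE[mutant]

diffs, before, after = compare_codons("GAG", "GTG")
print(f"{len(diffs)} base change (codon position {diffs[0] + 1}): {before} -> {after}")
```

Not every single-base change alters the protein: comparing GAA with GAG also gives one differing base, but both codons encode Glu, so that change would be silent.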
A single base-pair mutation
is often the cause of a human
genetic disease.
Alteration of the primary
sequence of the polypeptide
may alter the secondary and
tertiary structure of the
protein. The altered protein
may not function properly.
Basic amino acid structure:
A protein also has polarity: an
N-terminal end and a C-terminal end.
The immediate product of
translation is the primary protein
structure.
The primary sequence dictates the
secondary and tertiary structure of
the protein.
α-helix: a very regular
structure (3.6
amino acids per turn)
β-sheet: antiparallel
β-sheet: parallel
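The figure of 3.6 amino acids per turn means each residue sits 360/3.6 = 100 degrees further around the helix axis than the one before it, which is the basis of a "helical wheel" diagram. A minimal sketch, assuming an ideal α-helix with a rise of 1.5 Å per residue (the standard textbook value, not stated on the slide); the function names are hypothetical.

```python
RESIDUES_PER_TURN = 3.6       # from the slide
RISE_PER_RESIDUE_A = 1.5      # assumed textbook value, in angstroms
DEGREES_PER_RESIDUE = 360 / RESIDUES_PER_TURN   # 100 degrees per residue

def helix_geometry(n_residues: int) -> tuple[float, float]:
    """Return (number of turns, approximate length in angstroms) of an ideal alpha-helix."""
    return n_residues / RESIDUES_PER_TURN, n_residues * RISE_PER_RESIDUE_A

def helical_wheel_angles(n_residues: int) -> list[float]:
    """Angular position (degrees, 0-360) of each residue around the helix axis."""
    return [round((i * DEGREES_PER_RESIDUE) % 360, 1) for i in range(n_residues)]

turns, length = helix_geometry(18)
print(f"18 residues: {turns:.1f} turns, about {length:.0f} angstroms long")
print(helical_wheel_angles(5))
```

Because 100 degrees does not divide 360 evenly, residues spaced three or four apart end up on roughly the same face of the helix.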
Two questions
• Can you change the 3° (tertiary)
structure without changing the 1°
(primary) sequence?
• Can you change the 1° (primary)
sequence without changing the 3°
(tertiary) structure?
List of Amino Acids and Their Abbreviations

Nonpolar (hydrophobic)
amino acid       3-letter code   1-letter code
glycine          Gly             G
alanine          Ala             A
valine           Val             V
leucine          Leu             L
isoleucine       Ile             I
methionine       Met             M
phenylalanine    Phe             F
tryptophan       Trp             W
proline          Pro             P

Polar (hydrophilic)
serine           Ser             S
threonine        Thr             T
cysteine         Cys             C
tyrosine         Tyr             Y
asparagine       Asn             N
glutamine        Gln             Q

Electrically charged (negative and hydrophilic)
aspartic acid    Asp             D
glutamic acid    Glu             E

Electrically charged (positive and hydrophilic)
lysine           Lys             K
arginine         Arg             R
histidine        His             H

Others
X = unknown
* = STOP
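The amino-acid list above maps directly onto a lookup table. A minimal sketch, with the dictionary contents transcribed from that list; the function names are hypothetical, and unknown (X) or stop (*) symbols are passed through unchanged and counted as "other".

```python
from collections import Counter

# One-letter -> three-letter codes, transcribed from the amino-acid list
THREE_LETTER = {
    "G": "Gly", "A": "Ala", "V": "Val", "L": "Leu", "I": "Ile",
    "M": "Met", "F": "Phe", "W": "Trp", "P": "Pro",
    "S": "Ser", "T": "Thr", "C": "Cys", "Y": "Tyr", "N": "Asn", "Q": "Gln",
    "D": "Asp", "E": "Glu",
    "K": "Lys", "R": "Arg", "H": "His",
}
# Classes as grouped in the same list
CLASSES = {
    "nonpolar": set("GAVLIMFWP"),
    "polar": set("STCYNQ"),
    "negative": set("DE"),
    "positive": set("KRH"),
}

def to_three_letter(seq: str) -> str:
    """Convert a one-letter sequence to hyphen-separated three-letter codes."""
    return "-".join(THREE_LETTER.get(aa, aa) for aa in seq.upper())

def class_of(aa: str) -> str:
    """Return the class name for one residue, or 'other' (e.g. X or *)."""
    return next((name for name, members in CLASSES.items() if aa in members), "other")

def composition(seq: str) -> dict[str, int]:
    """Count how many residues of a sequence fall into each class."""
    return dict(Counter(class_of(aa) for aa in seq.upper()))

print(to_three_letter("MKDE*"))
print(composition("MKDE*"))
```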
The ‘protein-folding problem’.
• Proteins, hundreds of thousands of
different ones, are the biochemical
molecules that make up cells, organs
and organisms. Proteins put themselves
together in a process termed "folding."
How they do it is called "the protein-folding problem," and it may be the
most important unanswered question in
the life sciences.
• The transformation happens quickly and
spontaneously. It takes only a fraction of a
second for a floppy chain of beads to fold
into the shape it will keep for the rest of its
working life.
• How does that happen? How do the linear
(and, in some sense, one-dimensional) structures of proteins carry the information
that tells them to take on permanent three-dimensional shapes? Is it possible to study
a protein chain and predict the folded
shape it will take?
• That is the protein-folding problem.
DNA sequencing information →
predictions of the primary amino
acid sequence.
Needed: software that will convert
the 1° sequence to its corresponding
3° structure.
Needed: software that will specify a
1° sequence that will generate a
particular 3° structure.
• WHY IS PROTEIN FOLDING SO
DIFFICULT TO UNDERSTAND?
• Proteins not only self-assemble (fold), they do so remarkably
quickly: some in as little as a millionth of a second.
That is very fast on a human
timescale, but remarkably long for computers to
simulate. In fact, there is a 1000-fold gap
between the timescales that can be simulated
(nanoseconds) and the times in which the fastest
proteins fold (microseconds).
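The size of that gap is easier to see as a count of simulation steps. A minimal sketch, assuming a molecular dynamics integration timestep of 2 femtoseconds (typical values are 1-2 fs; this figure is not from the slide); the constant and function names are hypothetical.

```python
# Work in integer femtoseconds to avoid floating-point rounding
FS_PER_NS = 10**6
FS_PER_US = 10**9
TIMESTEP_FS = 2   # assumed integration timestep

def steps_needed(duration_fs: int, timestep_fs: int = TIMESTEP_FS) -> int:
    """Number of integration steps needed to simulate duration_fs of molecular time."""
    return duration_fs // timestep_fs

gap = FS_PER_US // FS_PER_NS
print(f"microsecond / nanosecond gap: {gap}-fold")
print(f"steps for 1 ns: {steps_needed(FS_PER_NS):,}; for 1 us: {steps_needed(FS_PER_US):,}")
```

Each step requires recomputing the forces on every atom, so reaching the microsecond range means hundreds of millions of force evaluations.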
A Glimpse of the Holy Grail?
• The prediction of the native conformation of a
protein of known amino acid sequence is one of the
great open questions in molecular biology and one
of the most demanding challenges in the new field
of bioinformatics. Using fast programs and lots of
supercomputer time, Duan and Kollman (1) report
that they have successfully folded a reasonably sized
(36-residue) protein fragment by molecular
dynamics simulation into a structure that resembles
the native state. At last it seems that the folding of a
protein by detailed computer simulation is not as
impossible as most workers in the field believe.
Proteins from Scratch:
• Not long ago, it seemed inconceivable that proteins
could be designed from scratch. Because each protein
sequence has an astronomical number of potential
conformations, it appeared that only an
experimentalist with the evolutionary life span of
Mother Nature could design a sequence capable of
folding into a single, well-defined three-dimensional
structure. But now, on page 82 of this issue, Dahiyat
and Mayo (1) describe a new approach that makes de
novo protein design as easy as running a computer
program. Well almost.
Progress in the ‘protein-folding
problem’?
• When proteins fold, they don’t try
every possible 3D conformation.
Protein folding is an orderly
process (i.e., there are molecular
shortcuts involved).
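A back-of-the-envelope count in the spirit of Levinthal's paradox shows why folding cannot be a random search of every conformation. A minimal sketch, assuming (for illustration only) 3 backbone conformations per residue and a search that visits 10^13 conformations per second; the constant and function names are hypothetical.

```python
CONFORMATIONS_PER_RESIDUE = 3        # illustrative assumption
SAMPLES_PER_SECOND = 10**13          # illustrative assumption
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def exhaustive_search_years(n_residues: int) -> float:
    """Years needed to visit every conformation once, one at a time."""
    total = CONFORMATIONS_PER_RESIDUE ** n_residues   # exact integer count
    return total / SAMPLES_PER_SECOND / SECONDS_PER_YEAR

print(f"{3**100:.2e} conformations for 100 residues")
print(f"{exhaustive_search_years(100):.1e} years to try them all")
```

Even under these generous assumptions the answer is vastly longer than the age of the universe, while real proteins fold in well under a second.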
Success in protein-folding?
Given the primary sequence of a
protein, the success rate in
predicting its correct 3D structure
correlates strongly with the percentage
of the sequence that is similar to
proteins of known structure.
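Similarity to a protein of known structure is often summarized as percent identity over an alignment. A minimal sketch, assuming the two sequences are already aligned to equal length with "-" for gaps, and using one common convention (identical positions divided by columns where neither sequence has a gap); the sequences and function name are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent of gap-free aligned columns at which two aligned sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must already be aligned to equal length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)

# Hypothetical query aligned to a hypothetical protein of known structure
print(percent_identity("MKT-AYIAK", "MKTLAYLAK"))
```

Other conventions divide by the full alignment length or by the shorter sequence's length, so percent-identity figures are only comparable when computed the same way.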
Genomics Research Funding
(selected programs; $ millions)
PROGRAM
NHGRI (U.S.)
WELLCOME TRUST (U.K.)
STA (JAPAN)
ENERGY (U.S.)
GHGP
SWEDEN
1998
211
61
2000
326
121
39
85
115
89
19
5
79
35
• Link to NCBI
How to find a gene?
• The simplest way is to search for an open
reading frame (ORF).
• An ORF is a sequence of codons in DNA
that starts with a Start codon, ends with a
Stop codon, and has no other Stop codons
inside.
• Finding a gene is much more
difficult in eukaryotic genomes
than in prokaryotic genomes.
WHY??
1977: the discovery of ‘split
genes’. Split genes are the norm in
eukaryotic organisms.
Exon = coding (expressed) sequence
Intron = non-essential DNA??
• The mechanism of splicing is
not well understood.
Alternative splice sites generate
various protein isoforms.
Splicing mutants do exist.
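Splicing and alternative splicing can be pictured as choosing which exon intervals of the gene end up joined in the mature message. A minimal sketch on a hypothetical toy gene (exons in uppercase, introns in lowercase, with the introns following the common GT...AG pattern); the function name and coordinates are hypothetical.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon intervals (0-based, end-exclusive) in order; everything else is removed."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))

# Hypothetical toy gene: exon 1, intron 1, exon 2, intron 2, exon 3
gene = "ATGGCC" + "gtaagtttag" + "AAACCC" + "gtgagcatag" + "GGGTAA"
exon1, exon2, exon3 = (0, 6), (16, 22), (32, 38)

full = splice(gene, [exon1, exon2, exon3])   # isoform using all three exons
skipped = splice(gene, [exon1, exon3])       # isoform with exon 2 skipped
print(full, skipped)
```

Because exon 2 here is six bases long, skipping it keeps the reading frame intact, so both isoforms still start with ATG and end with the TAA stop codon but encode different proteins.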
• Most mutations in introns are
(apparently) harmless.
• Consequently, intron sequences
diverge much more quickly than
exon sequences.
• Prokaryotic cells: no splicing
(i.e., no split genes).
• Eukaryotic cells: intronless
genes are rare (the average number of introns
per human gene is 3-7; the highest number is 234).
How to confirm the identification
of a gene?
• Answer: identify the gene by
identifying its promoter.
Promoters are DNA regions that
control when genes are activated.
Exons encode the information
that determines what product will
be produced.
Promoters encode the
information that determines when
the protein will be produced.
• Demonstration of a consensus
sequence.
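A consensus sequence is the most common base at each position across a set of aligned sites. A minimal sketch, using hypothetical, already-aligned promoter-like sites; the function names are hypothetical, and ties are resolved in favor of the base seen first in that column.

```python
from collections import Counter

def consensus(aligned: list[str]) -> str:
    """Most frequent base in each column of equal-length aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("need at least one sequence, all of the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*aligned))

def count_mismatches(site: str, motif: str) -> int:
    """Number of positions at which a site differs from the consensus."""
    return sum(a != b for a, b in zip(site, motif))

# Hypothetical aligned sites
sites = ["TATAAT", "TATAAA", "TAGAAT", "TATATT"]
motif = consensus(sites)
print(motif, [count_mismatches(s, motif) for s in sites])
```

No individual site needs to match the consensus exactly; real promoter searches typically allow a few mismatches, as the mismatch counts here suggest.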
How to find a gene?
• Look for a substantial ORF and
associated ‘features’.
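The ORF definition given earlier (a start codon, a stop codon, and no internal stop codons) translates directly into a scan of the three forward reading frames. A minimal sketch, assuming ATG as the only start codon and TAA/TAG/TGA as stops; it scans the forward strand only, reports each ORF from the first ATG after the previous stop (nested ATGs are not reported separately), and the `min_codons` default of 100 (stop codon included) is a hypothetical cutoff for "substantial".

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"

def find_orfs(dna: str, min_codons: int = 100) -> list[tuple[int, int, str]]:
    """Find ORFs on the forward strand as (start, end, orf) tuples.

    Coordinates are 0-based and end-exclusive; each ORF runs from an ATG
    up to and including the first in-frame stop codon.
    """
    seq = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open ATG in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, seq[start:i + 3]))
                start = None  # look for the next ATG after this stop
    return sorted(orfs)

print(find_orfs("CCATGAAATAGCC", min_codons=3))
```

A fuller gene finder would also scan the reverse complement and combine the ORF with the associated features mentioned above, such as a nearby promoter.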
The End