Why teach a course in bioinformatics?


What is bioinformatics?
Answer:
It depends who you ask.
Various definitions:
• The science of using information to
understand biology.
• The science that uses computational
approaches to answer biological questions.
(Subtle distinction: bioinformatics is a subset of
the larger field of computational biology.)
• Organizing and analyzing complex data
resulting from molecular and biochemical
techniques
What skills should a
bioinformatician have?
• You should have a fairly deep background in
molecular biology.
• You must understand the central dogma of
molecular biology.
• You should have experience with
programming.
• You should be comfortable working in a
command line computing environment.
Why teach a course in
bioinformatics?
Part of answer:
Bioinformaticians are
needed.
From Science's ‘Next Wave’:
• Imagine a job fair with 50 high-tech
companies competing to recruit one of the
handful of properly qualified scientists who
bothered to show up. Sounds like a pie-in-the-sky dream, doesn't it? But according to
Victor Markovitz, vice president of
bioinformatics systems at Gene Logic Inc.,
this actually happened at a recent biotech
fair. And it is more or less typical of the
prevailing global job market in
bioinformatics and computational biology,
where there are many more headhunters than
heads.
Why now??
Easy Answer:
* The astronomical growth of
GenBank and the Protein Data
Bank!
More complex answer:
The way biology is done is
changing.
*One of the goals of this course: to
illustrate the ways biology is
changing
Biology is scaling up.
• Genetics labs don't do things one gene at a time
anymore.
• Genetics labs use a ‘Genomic Approach’.
*These types of large-scale projects require, more
than anything, a change in mindset:
1. Focus isn't everything.
2. Do things smarter and save work.
3. Think big!
Collaborative projects require
centralized databases and
systematic methods for sharing
data.
• A good example is the C.
elegans genome project.
Knowledge is built by
constructing relations between
different kinds of data.
• SMD (Stanford Microarray Database) stores
raw and normalized data from microarray
experiments. The data for a given gene is
linked to a mass of genetic information,
including an expression history for that
entity, a description of the associated protein,
chromosomal location, etc.
Data is a resource that can be
mined.
• Beyond the initial project, data is still a
valuable resource. Results from
numerous research projects, which might
themselves be of minimal significance,
can often be put together to make
generalizations or observations that
could be quite significant.
Why teach a course in
bioinformatics?
Part of answer:
Biologists who understand
bioinformatics are needed.
Introduction to Molecular
Biology
Overview
• The DNA-based Genome
• Biology’s Central Dogma
• Genotype to Phenotype
A & G = Purines
C & T = Pyrimidines
G-C and A-T pairing.
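The pairing rules (G with C, A with T) are enough to derive one strand from the other. A minimal Python sketch, assuming plain uppercase A/C/G/T strings; the function names are illustrative and not from the slides:

```python
# Watson-Crick pairing: A<->T, G<->C
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}
PURINES = {"A", "G"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIR[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction."""
    return complement(strand)[::-1]


def base_class(base: str) -> str:
    """Classify a base as purine (A, G) or pyrimidine (C, T)."""
    return "purine" if base in PURINES else "pyrimidine"


print(complement("GATTACA"))          # CTAATGT
print(reverse_complement("GATTACA"))  # TGTAATC
```

The reverse complement is usually the more useful of the two, because the two strands of a double helix run in opposite directions.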
Double-stranded DNA is peeled
apart to replicate DNA
• The 2 daughter
molecules are
identical to each
other and exact
duplicates of the
original
(assuming error-free replication).
• One chromosome
is one long, twisted,
dramatically
compacted DNA
molecule.
• The average length
of a human
chromosome is 130
million b.p.
Genes are defined segments of
DNA
•The information content
of the DNA molecule
consists of the order of
bases (A, C, G, and T)
along the length of the
molecule.
How Genes are
Expressed: the Central
Dogma.
• RNA is quite similar to DNA, but usually single-stranded.
• In RNA, “U” replaces “T”.
Transcription = RNA synthesis
Translation = Protein synthesis
Eukaryotic transcription operates
‘gene by gene’.
For each gene, only one strand of DNA (the antisense, or
template, strand) is read; the RNA made from it matches the
other (sense) strand, which is not itself transcribed.
Transcription produces an RNA
‘copy’ of a gene (DNA)
• animation
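As a rough sketch of the ‘RNA copy’ idea: the mRNA carries the same sequence as the sense (coding) strand with U in place of T, and it is complementary to the template strand. The function names below are illustrative assumptions, and both strands are assumed to be written 5'->3' as uppercase strings.

```python
def transcribe(coding_strand: str) -> str:
    """mRNA has the coding-strand sequence with U in place of T."""
    return coding_strand.replace("T", "U")


def transcribe_from_template(template_strand: str) -> str:
    """Build the mRNA by pairing RNA bases against the template strand.

    The template is given 5'->3', so it is walked in reverse to
    produce an mRNA that also reads 5'->3'.
    """
    rna_pair = {"A": "U", "T": "A", "G": "C", "C": "G"}
    return "".join(rna_pair[base] for base in reversed(template_strand))


print(transcribe("ATGGCCTAA"))                # AUGGCCUAA
print(transcribe_from_template("TTAGGCCAT"))  # AUGGCCUAA
```

Both calls give the same mRNA because `TTAGGCCAT` is the reverse complement of `ATGGCCTAA`.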
Eukaryotic transcripts
(mRNA) are processed and
leave the nucleus
The mRNAs are translated in the
cytoplasm
Three consecutive bases in the
mRNA form one codon
No exceptions: the genetic
code is a triplet
code.
tRNAs are the ‘bilingual’ molecules
The genetic code is the codon-amino acid conversion table
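Because the genetic code is a lookup table from codons to amino acids, translation can be sketched as reading the mRNA three bases at a time. The sketch below builds the standard genetic code from a 64-letter string in U, C, A, G order; the names `CODON_TABLE` and `translate` are illustrative, and the input is assumed to start in the correct reading frame.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA reading frame, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGGCCUAA"))  # MA
```

Here AUG gives methionine (M), GCC gives alanine (A), and UAA is a stop codon.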
Two amino acids are joined by a
peptide bond.
http://academy.d20.co.edu/kadets/lundberg/DNA_animations/protein.mov
The immediate product of
translation is the primary protein
structure
The primary
sequence
dictates the
secondary
and tertiary
structure of
the protein
Genetic information, stored in DNA, is
conveyed as proteins
A mutation in the DNA may alter the
primary sequence of the corresponding
protein
Alteration of the primary
sequence of the polypeptide
may alter the secondary and
tertiary structure of the
protein. The altered protein
may not function properly.
Sickle-cell anemia is caused by
one amino acid change.
One nucleotide change is
responsible for the one amino
acid change.
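The sickle-cell case can be sketched in code. The well-known change is GAG (glutamic acid) to GTG (valine) in the beta-globin gene. The nine-base fragment and the three-codon lookup table below are illustrative assumptions, not the full gene or the full genetic code.

```python
# Only the codons needed for this toy example (DNA coding-strand codons)
MINI_CODE = {"CCT": "Pro", "GAG": "Glu", "GTG": "Val"}


def translate_fragment(dna: str) -> list:
    """Translate a short in-frame DNA fragment using MINI_CODE."""
    return [MINI_CODE[dna[i:i + 3]] for i in range(0, len(dna), 3)]


def point_mutation(dna: str, position: int, new_base: str) -> str:
    """Return a copy of dna with one base replaced (0-based position)."""
    return dna[:position] + new_base + dna[position + 1:]


normal = "CCTGAGGAG"
mutant = point_mutation(normal, 4, "T")  # first GAG becomes GTG

print(translate_fragment(normal))  # ['Pro', 'Glu', 'Glu']
print(translate_fragment(mutant))  # ['Pro', 'Val', 'Glu']
```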
A single base-pair mutation
is often the cause of a human
genetic disease.
Late 1970s (1977): the discovery of ‘split
genes’. Split genes are the norm in
eukaryotic organisms.
Exon = Genetic code
Intron = Non-essential DNA??
The mechanism of splicing is not
well understood.
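Whatever the underlying mechanism, the outcome of splicing is simple to state: the introns are removed and the exons are joined in order. A minimal sketch, assuming the exon boundaries are already known; the toy transcript and its coordinates are invented for illustration.

```python
def splice(pre_mrna: str, exons: list) -> str:
    """Join exon segments, given as (start, end) half-open index pairs."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Toy transcript: exons in upper case, the intron in lower case
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UUUUAA"
exons = [(0, 6), (18, 24)]

print(splice(pre_mrna, exons))  # AUGGCCUUUUAA
```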
Molecular evolution is the study of
organismal relationships, based in part
on the comparison of conserved exon
sequences. Comparisons of intron
sequences are rare. Why?
• Most mutations in introns are
(apparently) harmless
• Consequently, intron sequences
diverge much more quickly than
exons.
• Prokaryotic cells: no splicing
(i.e., no split genes)
• Eukaryotic cells: intronless
genes are rare.
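Divergence between two aligned sequences, such as the exon and intron comparisons discussed above, is often summarized as percent identity. A minimal sketch follows; the two sequences are made up purely to exercise the function and are not real exon or intron data.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


print(percent_identity("ATGGCCAAGT", "ATGGCTAAGT"))  # 90.0
```

Under the argument in the slides, one would expect this number to be lower for intron pairs than for exon pairs from the same two organisms.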
Promoters are DNA regions that
control when genes are activated.
Exons encode the information that
determines what product will be
produced.
Promoters encode the
information that determines when
the protein will be produced.
• Demonstration of a consensus
sequence.
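One simple way to derive a consensus sequence is to take the most common base in each column of a set of aligned sites. This is a sketch of that idea only; the sites below are hypothetical and invented for illustration.

```python
from collections import Counter


def consensus(aligned: list) -> str:
    """Most common base in each column of equal-length aligned sequences."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned)
    )


# Hypothetical aligned promoter-like sites, invented for illustration
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT", "GATAAT"]

print(consensus(sites))  # TATAAT
```

No single site has to match the consensus exactly; it only records what is most frequent at each position.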
The End
The IUPAC-IUB symbols for nucleotide
nomenclature are shown below:
Symbol  Meaning               Symbol  Meaning
G       Guanine               K       G or T
A       Adenine               S       G or C
C       Cytosine              W       A or T
T       Thymine               H       A or C or T
U       Uracil                B       G or T or C
R       Purine (A or G)       V       G or C or A
Y       Pyrimidine (C or T)   D       G or T or A
M       A or C                N       G, A, T, or C
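The ambiguity symbols map naturally onto character classes, so a pattern written with IUPAC symbols can be turned into a regular expression. A minimal sketch using Python's standard `re` module; the dictionary mirrors the table above, and the example pattern `TATAWR` is chosen only for illustration.

```python
import re

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GTA", "N": "GATC",
}


def iupac_to_regex(pattern: str) -> str:
    """Turn an IUPAC nucleotide pattern into a regular expression."""
    return "".join(
        IUPAC[symbol] if len(IUPAC[symbol]) == 1 else "[" + IUPAC[symbol] + "]"
        for symbol in pattern
    )


regex = iupac_to_regex("TATAWR")
print(regex)                                # TATA[AT][AG]
print(bool(re.fullmatch(regex, "TATAAG")))  # True
print(bool(re.fullmatch(regex, "TATACG")))  # False
```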
List of Amino Acids and Their
Abbreviations

amino acid       3 letter code   1 letter code

Nonpolar (hydrophobic)
glycine          Gly             G
alanine          Ala             A
valine           Val             V
leucine          Leu             L
isoleucine       Ile             I
methionine       Met             M
phenylalanine    Phe             F
tryptophan       Trp             W
proline          Pro             P

Polar (hydrophilic)
serine           Ser             S
threonine        Thr             T
cysteine         Cys             C
tyrosine         Tyr             Y
asparagine       Asn             N
glutamine        Gln             Q

Electrically Charged (negative and hydrophilic)
aspartic acid    Asp             D
glutamic acid    Glu             E

Electrically Charged (positive and hydrophilic)
lysine           Lys             K
arginine         Arg             R
histidine        His             H

Others
X = unknown
* = STOP
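The abbreviation table can be used directly as a lookup. A small sketch that converts a peptide written in three-letter codes to one-letter codes; the hyphen-separated input format and the function name are assumptions made for this example.

```python
THREE_TO_ONE = {
    "Gly": "G", "Ala": "A", "Val": "V", "Leu": "L", "Ile": "I",
    "Met": "M", "Phe": "F", "Trp": "W", "Pro": "P",
    "Ser": "S", "Thr": "T", "Cys": "C", "Tyr": "Y", "Asn": "N", "Gln": "Q",
    "Asp": "D", "Glu": "E",
    "Lys": "K", "Arg": "R", "His": "H",
}


def to_one_letter(peptide: str) -> str:
    """Convert a hyphen-separated three-letter peptide to one-letter codes.

    Residues not found in the table map to 'X' (unknown).
    """
    return "".join(THREE_TO_ONE.get(res, "X") for res in peptide.split("-"))


print(to_one_letter("Met-Val-His-Leu-Thr-Pro-Glu"))  # MVHLTPE
```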