Bioinformatics

Bio 2900
Computer Applications in Biology
Bioinformatics
Presented by
Frank H. Osborne, Ph. D.
© 2005
Bioinformatics
• Bioinformatics is the computational
branch of molecular biology.
• It involves using computers in the
analysis of DNA, RNA and protein
sequences.
• It is part of a larger field of biology called
Computational Biology.
Protein Synthesis
• Generally, we begin with DNA.
• DNA is transcribed to produce RNA.
• RNA is then translated to produce
protein.
• The protein is the result of the expression
of a gene.
Amino Acids
• Proteins are made of amino acids. There
are about 20 that are generally used in
protein molecules.
• A set of three-letter abbreviations is used
for the amino acids in biochemistry.
• The International Union of Pure and
Applied Chemistry (IUPAC) has created
one-letter abbreviations to ease work in
bioinformatics.
Amino Acid Table
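The mapping between the two abbreviation systems can be sketched as a lookup table in Python. The 20 standard three-letter and one-letter codes are listed; the helper name `to_one_letter` and the hyphen-separated input format are assumptions made for illustration.

```python
# Standard three-letter -> one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(peptide):
    """Convert a hyphen-separated three-letter peptide to one-letter codes.

    The input format (e.g. 'Met-Ala-Trp-Lys') is an illustrative assumption.
    """
    return "".join(THREE_TO_ONE[residue] for residue in peptide.split("-"))


print(to_one_letter("Met-Ala-Trp-Lys"))  # MAWK
```

The one-letter form is what sequence files and alignment programs normally use, because each residue occupies exactly one character.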
Additional Amino Acid Codes
• Additionally, IUPAC recognizes other
code letters for special situations.
• There are an additional four codes that
may be used.
Additional Amino Acid Code Table
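As a rough illustration of how such special codes behave in a program, the sketch below handles three widely used ones: B (aspartate or asparagine), Z (glutamate or glutamine) and X (any amino acid). Which four codes the slide's table listed is not stated in the text, so this selection is an assumption, and the function name `possible_residues` is hypothetical.

```python
# The 20 standard one-letter amino acid codes.
STANDARD = "ACDEFGHIKLMNPQRSTVWY"

# Ambiguity codes assumed for this sketch (not taken from the slide's table).
AMBIGUOUS = {
    "B": "DN",  # Asx: aspartate or asparagine
    "Z": "EQ",  # Glx: glutamate or glutamine
}


def possible_residues(code):
    """Return the standard residues that a one-letter code may stand for."""
    if code == "X":  # X: unknown or any amino acid
        return STANDARD
    return AMBIGUOUS.get(code, code)


print(possible_residues("B"))  # DN
print(possible_residues("K"))  # K
```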
DNA
• Deoxyribonucleic acid (DNA) is made up of
purine bases (adenine and guanine) and
pyrimidine bases (cytosine and thymine).
Bases are part of nucleotides, which are
formed using the sugar deoxyribose.
Nucleotides are joined by a condensation
reaction that links the 5’ phosphate of one
nucleotide to the 3’ OH of the next.
DNA
• For DNA sequences, the IUPAC has
established the one-letter codes shown below.
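The IUPAC nucleotide codes include the four bases plus letters that stand for more than one base. A minimal Python sketch of the code set, with a hypothetical helper `matches` that tests a concrete sequence against a pattern written in these codes:

```python
# IUPAC one-letter nucleotide codes for DNA and the bases each one represents.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",   # purine
    "Y": "CT",   # pyrimidine
    "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",  # any base
}


def matches(pattern, sequence):
    """Return True if every base of sequence is allowed by the pattern code."""
    if len(pattern) != len(sequence):
        return False
    return all(base in IUPAC_DNA[code] for code, base in zip(pattern, sequence))


print(matches("GAYTC", "GATTC"))  # True, because Y allows C or T
print(matches("GAYTC", "GAATC"))  # False, because Y does not allow A
```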
RNA
• The IUPAC one-letter codes for RNA are
shown below.
Gene structure
• A gene is a sequence of bases of DNA. It
begins at a location known as a promoter and
ends at another location called the terminator.
Gene expression
• Genes are expressed by transcription and
translation of DNA. DNA is first transcribed
to make messenger RNA. The genetic code of
the messenger RNA is translated into protein.
RNA polymerase
• Transcription uses DNA-dependent RNA
polymerase. RNA polymerase holoenzyme
consists of a core enzyme of four polypeptides
and another factor called σ factor.
• Core enzyme =
– 2 identical α subunits
– β, β’ similar but different proteins
• Holoenzyme = core enzyme + σ factor
• There are different types of promoters that
are recognized by different σ factors.
Transcription
• Transcription consists of three stages called
initiation, elongation and termination. Note
that these are not the same as initiation,
elongation and termination of protein
synthesis, which make up the process of
translation.
Stages of transcription
• Initiation
– RNA polymerase attaches to the promoter. An
open complex forms.
• Elongation
– RNA polymerase moves along the DNA molecule
making a molecule of RNA as it travels.
• Termination
– RNA polymerase reaches the terminator. The
RNA is released.
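The elongation stage can be sketched in Python as base-by-base copying of the template strand into RNA. Promoter recognition and termination are not modeled; the function name `transcribe` and the example template are assumptions made for illustration.

```python
# Base pairing used when RNA is made from a DNA template strand.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3_to_5):
    """Build the mRNA (read 5'->3') from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


template = "TACGGCATT"  # hypothetical template strand, written 3'->5'
print(transcribe(template))  # AUGCCGUAA
```

Writing the template 3'->5' lets the RNA come out in the usual 5'->3' direction without reversing the string.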
Translation
• The mRNA molecule is translated into
protein using the standard genetic code.
There are some exceptions, especially during
protein synthesis in mitochondria.
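A minimal Python sketch of translation with the standard genetic code follows. The 64 codons are generated in U, C, A, G order and paired with the corresponding one-letter amino acids ("*" marks a stop codon). The vertebrate mitochondrial variant is included as one example of the exceptions; the names `translate` and `VERTEBRATE_MITO` are assumptions of this sketch.

```python
from itertools import product

BASES = "UCAG"
# Amino acids for the 64 codons in UUU, UUC, UUA, UUG, UCU, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Vertebrate mitochondrial code: four codons differ from the standard code.
VERTEBRATE_MITO = dict(STANDARD_CODE, UGA="W", AUA="M", AGA="*", AGG="*")


def translate(mrna, table=STANDARD_CODE):
    """Translate an mRNA from its first base until a stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = table[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("AUGUGAAAA"))                   # M   (UGA is a stop codon)
print(translate("AUGUGAAAA", VERTEBRATE_MITO))  # MWK (UGA codes for Trp)
```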
Stages of translation
• Initiation
– Ribosomes bind to the ribosome-binding site on
the mRNA molecule, known as the Shine-Dalgarno
sequence, located just upstream of the AUG
start codon.
• Elongation
– Transfer RNA brings each amino acid to the
aminoacyl (A) site according to the specified codons.
• Termination
– The completed protein is released from the
peptidyl site.
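The initiation step can be sketched in Python as a search for a ribosome-binding site followed closely by AUG. The motif AGGAGG is the commonly cited Shine-Dalgarno consensus; the spacer limit, the function name `find_start_codon` and the example mRNA are assumptions of this sketch, and real sites vary in both sequence and spacing.

```python
def find_start_codon(mrna, sd="AGGAGG", max_spacer=12):
    """Return the index of the first AUG shortly after the ribosome-binding site.

    Returns None if the motif is absent or no AUG begins within max_spacer
    bases after it.
    """
    sd_pos = mrna.find(sd)
    if sd_pos == -1:
        return None
    search_from = sd_pos + len(sd)
    # The AUG may begin at most max_spacer bases after the motif ends.
    start = mrna.find("AUG", search_from, search_from + max_spacer + 3)
    return start if start != -1 else None


mrna = "UUCAGGAGGACUCACAUGGCUAAAUAA"  # hypothetical mRNA
start = find_start_codon(mrna)
print(start)         # 15
print(mrna[start:])  # AUGGCUAAAUAA
```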
Gene organization in Bacteria
• A cistron is a distinct region of DNA that
codes for a particular polypeptide. The term
is used in the context of a protein which is
made up of several subunits, each of which is
coded by a different gene.
• An operon is a common form of gene
organization in bacteria.
Genotypes and phenotypes
• The genotype is an actual gene in the
chromosome. The phenotype is the observed
effect of that gene.
• Genotypes are written in italic letters with
a lowercase first letter. Phenotypes and gene
products are written in ordinary, regular
letters with a capital first letter. Thus, two
of the tryptophan genes in E. coli are trpA
and trpB. When expressed, they produce
polypeptides: the trpA gene produces the TrpA
polypeptide and the trpB gene produces the
TrpB polypeptide.
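In plain-text files, where italics are not available, the capitalization is what distinguishes the two names. A small Python sketch of the convention, with a hypothetical helper name `protein_name`:

```python
def protein_name(gene):
    """Derive the product name from a bacterial gene name, e.g. trpA -> TrpA."""
    return gene[0].upper() + gene[1:]


for gene in ("trpA", "trpB"):
    print(gene, "->", protein_name(gene))
# trpA -> TrpA
# trpB -> TrpB
```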
Regulation of gene expression
The lac operon
• The lac operon contains the genes necessary to
utilize lactose. Lactose is a β-galactoside
sugar containing galactose linked β(1,4) to
glucose, as shown below.
Regulation of gene expression
Products of the lac operon
• The lac operon codes for three proteins, LacZ,
LacY and LacA, which are directly involved in
galactoside (lactose) utilization.
– LacZ - β-D-galactosidase (EC 3.2.1.23)
– LacY - galactoside permease (M protein)
– LacA - galactoside acetyltransferase (EC 2.3.1.18)
• The genes for these proteins lie adjacent to
each other on the E. coli chromosome. They are
preceded by a region of the chromosome
responsible for the regulation of these genes.
Regulation of gene expression
Function of the lac operon
• lacI - gene for the lac repressor protein
• lacPi - promoter for lacI
• lacP - promoter for lac operon
• lacO - operator: binding site for the repressor
LacI is a repressor that binds to the operator
(lacO) and prevents the operon from being
transcribed. This type of control is known as
transcriptional regulation.
Induction and repression
• When lactose is present it induces the operon:
an isomer of lactose, allolactose, binds to the
repressor and changes its shape, causing it to
fall off the operator.
• When lactose is removed, the repressor goes
back to its original shape and can bind to the
operator again.
• Because the repressor binds to the operator
rather than to the promoter, the RNA polymerase
is said to be primed, meaning that it is ready
to transcribe as soon as the block comes off
the operator.
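The induction logic described above can be sketched as a two-step rule in Python. This models only the repressor and operator; other controls on the operon, such as the effect of glucose, are not covered here and are left out. The function name `lac_operon_transcribed` is hypothetical.

```python
def lac_operon_transcribed(lactose_present):
    """Return True if the lac structural genes can be transcribed."""
    # With lactose present, the repressor changes shape and leaves the operator.
    repressor_on_operator = not lactose_present
    # RNA polymerase can proceed only when the operator is free.
    return not repressor_on_operator


for lactose in (False, True):
    print("lactose present:", lactose, "-> transcribed:", lac_operon_transcribed(lactose))
# lactose present: False -> transcribed: False
# lactose present: True -> transcribed: True
```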
Structure of the lac operon
Gene Expression in Eukaryotes
• DNA in eukaryotic organisms is
organized into chromosomes. The
eukaryotic chromosome consists of DNA
interwound with proteins known as
histones.
• Much eukaryotic DNA has either no
function or unknown function. In contrast
to bacterial DNA, only about 10% of
eukaryotic DNA codes for proteins.
Gene Expression in Eukaryotes
• Eukaryotic DNA has numerous repeated
nucleotide sequences. Within a gene, the
protein-coding regions are separated by
non-coding regions.
• The non-coding regions within a gene are
called introns.
• The coding sequences that are expressed
as protein are called exons.
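The relationship between exons and introns can be sketched in Python as joining the exon segments of a primary transcript and discarding what lies between them. The transcript, the exon coordinates and the function name `splice` are all hypothetical; in real genomes the exon positions come from annotation.

```python
def splice(pre_mrna, exons):
    """Join the exon segments given as (start, end) index pairs, end exclusive."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Hypothetical primary transcript: exon 1 + intron + exon 2.
exon_1 = "AUGGCC"
intron = "GUAAGUUUUCAG"  # introns typically begin with GU and end with AG
exon_2 = "AAAUAA"
pre_mrna = exon_1 + intron + exon_2

exons = [(0, 6), (18, 24)]
print(splice(pre_mrna, exons))  # AUGGCCAAAUAA
```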
Transcription in Eukaryotic Cells
The End