Transcript Genomics

Centre for Integrative Bioinformatics VU
Lecture 19: (A) Protein-protein interaction and (B) Nucleic Acid Structure
Introduction to Bioinformatics
Lecture 19A:
Protein-protein interactions

Complexity:
– Multibody interaction

Diversity:
– Various interaction types

Specificity:
– Complementarity in shape and binding properties
PPI Characteristics

Universal
– Cell functionality based on protein-protein interactions
• Cyto-skeleton
• Ribosome
• RNA polymerase

Numerous
– Yeast:
• ~6,000 proteins
• at least 3 interactions each
→ ~18,000 interactions
– Human:
• estimated ~100,000 interactions

Network
– simplest: homodimer (two)
– common: hetero-oligomer (more)
– holistic: protein network (all)

Interface Area
Contact area
– usually >1100 Å²
– each partner >550 Å²

each partner loses ~800 Å² of solvent accessible surface area
– ~20 amino acids lose ~40 Å² each
– ~100-200 J per Å²

Average buried accessible surface area:
– 12% for dimers
– 17% for trimers
– 21% for tetramers
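Interface areas like these are usually obtained by comparing solvent accessible surface areas (SASA): the area buried on complex formation is the SASA of the two free partners minus the SASA of the complex. A minimal Python sketch, assuming the SASA values have already been computed by some external tool; the numbers are hypothetical, chosen only to match the ~800 Å² per partner quoted above, and the per-partner value assumes both partners bury roughly equal areas.

```python
def buried_surface_area(sasa_a: float, sasa_b: float, sasa_complex: float) -> float:
    """Total solvent accessible surface area (Å²) buried when A and B form AB."""
    return sasa_a + sasa_b - sasa_complex


def buried_fraction(sasa_a: float, sasa_b: float, sasa_complex: float) -> float:
    """Fraction of the partners' combined accessible surface that becomes buried."""
    return buried_surface_area(sasa_a, sasa_b, sasa_complex) / (sasa_a + sasa_b)


# Hypothetical SASA values in Å² (not taken from any real structure).
sasa_a, sasa_b, sasa_ab = 10_000.0, 8_000.0, 16_400.0

total = buried_surface_area(sasa_a, sasa_b, sasa_ab)
# Assumes the two partners bury roughly equal areas.
print(f"buried in total: {total:.0f} Å², per partner: {total / 2:.0f} Å²")
print(f"fraction of accessible surface buried: {buried_fraction(sasa_a, sasa_b, sasa_ab):.1%}")

# Cross-check with the slide: ~20 residues each losing ~40 Å² is ~800 Å² per partner.
print(20 * 40)
```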


• 83-84% of all interfaces are flat
• Secondary structure:
– 50% α-helix
– 20% β-sheet
– 20% coil
– 10% mixed
• Less hydrophobic than the core, more hydrophobic than the exterior
Complexation Reaction

• A + B ⇌ AB
– Ka = [AB]/([A]·[B]) (association constant)
– Kd = [A]·[B]/[AB] = 1/Ka (dissociation constant)
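In practice one knows the total amounts of A and B and the Kd, and wants the equilibrium amount of complex. Substituting the mass balances [A] = A_total − [AB] and [B] = B_total − [AB] into the Kd expression gives a quadratic in [AB]. A minimal Python sketch of that calculation; the concentrations and Kd are hypothetical.

```python
import math


def complex_concentration(a_total: float, b_total: float, kd: float) -> float:
    """Equilibrium [AB] for A + B <=> AB from total concentrations and Kd (same units).

    (a_total - x)(b_total - x) = kd * x  is a quadratic in x = [AB];
    the physical root is the smaller one (x cannot exceed either total).
    """
    s = a_total + b_total + kd
    return (s - math.sqrt(s * s - 4.0 * a_total * b_total)) / 2.0


# Hypothetical example: 1 µM of each partner, Kd = 0.1 µM.
a0, b0, kd = 1.0, 1.0, 0.1
ab = complex_concentration(a0, b0, kd)
a_free, b_free = a0 - ab, b0 - ab

print(f"[AB] = {ab:.3f} µM, fraction of A bound = {ab / a0:.1%}")
print(f"Ka = 1/Kd = {1 / kd:.1f} per µM")
# Sanity check: recompute Kd from the equilibrium concentrations.
print(f"Kd check: {a_free * b_free / ab:.3f} µM")
```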
Experimental Methods for determining PPI
• 2D (polyacrylamide) gel electrophoresis → mass spectrometry
• Liquid chromatography
– e.g. gel permeation chromatography
• Binding study with one immobilized partner
– e.g. surface plasmon resonance
• In vivo by two-hybrid systems or FRET
• Binding constants by ultracentrifugation, microcalorimetry or competition experiments with labelled ligand
– e.g. fluorescence, radioactivity
• Role of individual amino acids by site-directed mutagenesis
• Structural studies
– e.g. NMR or X-ray
PPI Network
http://www.phy.auckland.ac.nz/staff/prw/biocomplexity/protein_network.htm
Binding vs. Localization
[Figure: PPI types arranged by binding strength (weak to strong) and localization (co-expressed and at the same place vs. at different places): obligate oligomers (strong); non-obligate weak transient; non-obligate triggered transient (e.g. GTP·PO4⁻); non-obligate permanent (e.g. antibody-antigen); non-obligate co-localised (e.g. in membrane)]
Some terminology

Transient interactions:
– Associate and dissociate in vivo

Weak transient:
– dynamic oligomeric equilibrium

Strong transient:
– require a molecular trigger to shift the equilibrium

Obligate PPI:
– protomers do not form stable structures on their own (i.e. they need to interact in complexes)
– (functionally obligate)
Analysis of 122 Homodimers
• 70 interfaces single-patched
• 35 have two patches
• 17 have three or more
Interfaces
• ~30% polar
• ~70% non-polar

Interface
• Rim is water accessible
[Figure labels: rim, interface]
Interface composition

Composition of the interface is essentially the same as that of the core
= different surface/interface areas
• Some preferences
[Figure labels: prefer, avoid]
Ribosome structure



In the nucleolus, ribosomal RNA is transcribed, processed, and assembled with ribosomal proteins to produce ribosomal subunits.
At least 40 ribosomes must be made every second in a yeast cell with a 90-min generation time (Tollervey et al. 1991). On average, this represents the nuclear import of 3100 ribosomal proteins every second and the export of 80 ribosomal subunits out of the nucleus every second. Thus, a significant fraction of nuclear trafficking is used in the production of ribosomes.
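The figures in this paragraph are linked by simple arithmetic. A short Python check using only the numbers quoted above; the per-ribosome protein count is back-calculated from them and is not an independent value.

```python
RIBOSOMES_PER_SECOND = 40          # from the text above
PROTEIN_IMPORTS_PER_SECOND = 3100  # from the text above
GENERATION_TIME_MIN = 90           # from the text above

# Each ribosome leaves the nucleus as two subunits (one small, one large).
subunits_exported_per_second = 2 * RIBOSOMES_PER_SECOND

# Implied number of ribosomal proteins per ribosome.
proteins_per_ribosome = PROTEIN_IMPORTS_PER_SECOND / RIBOSOMES_PER_SECOND

# Ribosomes made during one generation (90 min = 5400 s).
ribosomes_per_generation = RIBOSOMES_PER_SECOND * GENERATION_TIME_MIN * 60

print(subunits_exported_per_second)  # 80
print(proteins_per_ribosome)         # 77.5
print(ribosomes_per_generation)      # 216000
```

So the 80 exported subunits per second follow directly from 40 ribosomes per second, and 3100 imported proteins per second imply roughly 78 proteins per ribosome.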
Ribosomes are made of a small and a large subunit
Large (1) and small (2) subunit fit together (note: this figure mislabels angstroms as nanometers)
Ribosome structure
• The ribosomal subunits of prokaryotes and eukaryotes are quite similar
but display some important differences.
• Prokaryotes have 70S ribosomes, each consisting of a (small) 30S and a
(large) 50S subunit, whereas eukaryotes have 80S ribosomes, each
consisting of a (small) 40S and a (large) 60S subunit.
• However, the ribosomes found in chloroplasts and mitochondria of
eukaryotes are 70S, this being but one of the observations supporting
the endosymbiotic theory.
• "S" means Svedberg units, a measure of the rate of sedimentation of a
particle in a centrifuge, where the sedimentation rate is associated with
the size of the particle. Note that Svedberg units are not additive.
• Each subunit consists of one or two very large RNA molecules (known as
ribosomal RNA or rRNA) and multiple smaller protein molecules.
Crystallographic work has shown that there are no ribosomal proteins
close to the reaction site for polypeptide synthesis. This suggests that the
protein components of ribosomes act as a scaffold that may enhance the
ability of rRNA to synthesise protein rather than directly participating in
catalysis.
• The differences between the prokaryotic and eukaryotic ribosomes are
exploited by humans since the 70S ribosomes are vulnerable to some
antibiotics that the 80S ribosomes are not. This helps pharmaceutical
companies create drugs that can destroy a bacterial infection without
harming the animal/human host's cells!
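Why Svedberg units are not additive: the sedimentation coefficient scales with mass divided by the frictional coefficient, and friction grows with particle size. A rough Python sketch, assuming (as a deliberate idealisation, not a claim about real ribosomes) that subunits and ribosome are compact spheres of identical density, so that s is proportional to M^(2/3).

```python
def combined_svedberg(s1: float, s2: float) -> float:
    """Estimate the S value of a particle formed from two subunits.

    Idealisation: all three particles are compact spheres of equal density,
    so s ~ M**(2/3). Masses then add as s**1.5, giving
    s_combined = (s1**1.5 + s2**1.5) ** (2/3).
    """
    return (s1 ** 1.5 + s2 ** 1.5) ** (2.0 / 3.0)


for small, large, observed in [(30, 50, 70), (40, 60, 80)]:
    estimate = combined_svedberg(small, large)
    print(f"{small}S + {large}S: naive sum {small + large}S, "
          f"sphere estimate {estimate:.0f}S, observed {observed}S")
```

The sphere model is only an illustration: real subunits are neither spheres nor of identical density, so it does not reproduce the prokaryotic 70S value, but it shows why simply adding the subunit values overshoots.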
70S structure at 5.5 Å
(Noller et al. Science 2001)
70S structure
30S-50S interface

• Overall buried surface area ~8500 Å²
[Figure legend, buried area classes: < 37.5 Å², 37.5–75 Å², > 75 Å²]
Protein-nucleic acid Interactions
Interactions in the Ribosome
Docking - ZDOCK

Protein-protein docking
– 3-dimensional (3D) structure of protein complex
– starting from 3D structures of receptor and ligand

Rigid-body docking algorithm (ZDOCK)
– pairwise shape complementarity function
– all possible binding modes
– using Fast Fourier Transform algorithm

Refinement algorithm (RDOCK)
– Take the top 2000 predicted structures from ZDOCK (RDOCK is too computationally intensive to refine very many possible dockings)
– three-stage energy minimization
– electrostatic and desolvation energies
• molecular mechanical software (CHARMM)
• statistical energy method (Atomic Contact Energy)

49 non-redundant unbound test cases:
– near-native structure (<2.5 Å) ranked first for 37% of test cases
• within the top 4 for 49% of test cases
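Success rates such as "ranked first for 37%, within the top 4 for 49%" come from the rank of the first near-native prediction (here, within 2.5 Å of the native complex) in each test case. A minimal Python sketch of that metric; the ranks are hypothetical and are not the ZDOCK/RDOCK benchmark data.

```python
from typing import Optional, Sequence


def success_rate(first_hit_ranks: Sequence[Optional[int]], top_n: int) -> float:
    """Fraction of test cases with a near-native structure among the top_n predictions.

    first_hit_ranks holds, per test case, the 1-based rank of the best-ranked
    near-native prediction, or None if no near-native prediction was found.
    """
    hits = sum(1 for rank in first_hit_ranks if rank is not None and rank <= top_n)
    return hits / len(first_hit_ranks)


# Hypothetical ranks for 10 test cases.
ranks = [1, 3, None, 1, 12, 2, None, 1, 4, 250]
print(f"top 1: {success_rate(ranks, 1):.0%}")  # top 1: 30%
print(f"top 4: {success_rate(ranks, 4):.0%}")  # top 4: 60%
```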
Protein-protein docking


Finding correct surface match
Systematic search:
– 2 times 3D space (3 rotational + 3 translational degrees of freedom)!

Define functions:
– ‘1’ on surface
– ‘ρ’ or ‘δ’ inside
– ‘0’ outside
[Figure labels: δ, ρ]
Protein-protein docking

Correlation function:
$$C_{\alpha,\beta,\gamma} = \frac{1}{N^3}\sum_{o}\sum_{p}\sum_{q} \exp\left[\frac{2\pi i\,(o\alpha + p\beta + q\gamma)}{N}\right] \cdot C_{o,p,q}$$
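The correlation formula above is the inverse discrete Fourier transform of C_{o,p,q}. A minimal Python sketch of translational FFT docking on a tiny grid, assuming the usual FFT-docking convention that C_{o,p,q} is the complex-conjugated receptor transform times the ligand transform; the shapes and the values of ρ and δ are hypothetical. It uses a naive transform to stay dependency-free (a real program would call an FFT library such as `numpy.fft.fftn`, which is what makes the search fast) and checks the result against a brute-force scan over all translations. Only translations are scanned; a full search repeats this for each sampled ligand rotation.

```python
import cmath
import itertools

N = 6  # grid points per axis (tiny, so the naive transforms below stay fast)


def empty_grid():
    return [[[0.0] * N for _ in range(N)] for _ in range(N)]


def shape_grid(cells, interior_value):
    """Discretise a molecule: 1 on the surface, interior_value inside, 0 outside.

    A cell counts as interior if all six face neighbours are occupied (periodic grid).
    """
    occupied = set(cells)
    grid = empty_grid()
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    for x, y, z in occupied:
        neighbours = {((x + dx) % N, (y + dy) % N, (z + dz) % N) for dx, dy, dz in steps}
        grid[x][y][z] = interior_value if neighbours <= occupied else 1.0
    return grid


def dft3(grid, sign):
    """Naive 3D discrete Fourier transform; sign=-1 forward, sign=+1 inverse (unnormalised)."""
    out = [[[0j] * N for _ in range(N)] for _ in range(N)]
    for o, p, q in itertools.product(range(N), repeat=3):
        total = 0j
        for l, m, n in itertools.product(range(N), repeat=3):
            total += grid[l][m][n] * cmath.exp(sign * 2j * cmath.pi * (o * l + p * m + q * n) / N)
        out[o][p][q] = total
    return out


def fft_correlation(a, b):
    """c[α][β][γ] = Σ a[l][m][n]·b[l+α][m+β][n+γ] via C = conj(A)·B and the inverse transform / N³."""
    A, B = dft3(a, -1), dft3(b, -1)
    C = [[[A[o][p][q].conjugate() * B[o][p][q] for q in range(N)] for p in range(N)] for o in range(N)]
    c = dft3(C, +1)
    return [[[c[x][y][z].real / N ** 3 for z in range(N)] for y in range(N)] for x in range(N)]


def direct_correlation(a, b):
    """The same correlation by brute force over all translations (for checking)."""
    c = empty_grid()
    for al, be, ga in itertools.product(range(N), repeat=3):
        c[al][be][ga] = sum(
            a[l][m][n] * b[(l + al) % N][(m + be) % N][(n + ga) % N]
            for l, m, n in itertools.product(range(N), repeat=3)
        )
    return c


# Hypothetical shapes: a 3x3x3 "receptor" block and a 2x2x1 "ligand" slab.
RHO, DELTA = -15.0, 1.0  # interior values; a large negative RHO penalises core overlap
receptor = shape_grid(itertools.product(range(3), repeat=3), RHO)
ligand = shape_grid(itertools.product(range(2), range(2), range(1)), DELTA)

fast = fft_correlation(receptor, ligand)
slow = direct_correlation(receptor, ligand)
best = max(itertools.product(range(N), repeat=3), key=lambda t: fast[t[0]][t[1]][t[2]])
print("max |FFT - direct| =", max(abs(fast[x][y][z] - slow[x][y][z])
                                for x, y, z in itertools.product(range(N), repeat=3)))
print("best translation (α, β, γ):", best, "score:", round(fast[best[0]][best[1]][best[2]], 3))
```

Every translation that lays the four ligand cells on receptor surface cells without touching the interior reaches the maximum score; overlapping the interior is penalised by ρ.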
Docking Programs
• ZDOCK, RDOCK
• AutoDock
• Bielefeld Protein Docking
• DOCK
• DOT
• FTDock, RPScore and MultiDock
• GRAMM
• Hex 3.0
• ICM Protein-Protein docking (Abagyan group, currently the best)
• KORDO
• MolFit
• MPI Protein Docking
• Nussinov-Wolfson Structural Bioinformatics Group
…
Docking Programs
Issues:
• Rigid structures or made flexible?
– Side-chains
– Main-chains
• Full atomic detail or simplified models?
• Docking energy functions (purpose-built force fields)

Docking example:
antibody HyHEL-63 (cyan) complexed with Hen Egg White Lysozyme
The X-ray structure of the antibody HyHEL-63 (cyan) uncomplexed and complexed with Hen Egg White Lysozyme (yellow) has shown that there are small but significant, local conformational changes in the antibody paratope on binding. The structure also reveals that most of the charged epitope residues face the antibody. Details are in Li YL, Li HM, Smith-Gill SJ and Mariuzza RA (2000) Three-dimensional structures of the free and antigen-bound Fab from monoclonal antilysozyme antibody HyHEL-63. Biochemistry 39: 6296-6309.
Salt links and electrostatic interactions provide much of the free energy of binding. Most of the charged residues face the interface in the X-ray structure. The importance of the salt link between Lys97 of HEL and Asp27 of the antibody heavy chain is revealed by molecular dynamics simulations. After 1 ns of MD simulation at 100 °C the overall conformation of the complex has changed, but the salt link persists. Details are described in Sinha N and Smith-Gill SJ (2002) Electrostatics in protein binding and function. Current Protein & Peptide Science 3: 601-614.
Introduction to Bioinformatics
Lecture 19B:
Nucleic acid structure
Nucleic Acid Basics
• Nucleic Acids Are Polymers
• Each Monomer Consists of Three Moieties:
A Base + A Ribose Sugar + A Phosphate
(base + sugar = nucleoside; nucleoside + phosphate = nucleotide)
• A Base Can be One of Five Rings:
• Pyrimidines
• Purines
• Pyrimidines and Purines can Base-Pair (Watson-Crick Pairs)
Unlike the three-dimensional structures of proteins, DNA molecules assume simple double helical structures independent of their sequences. There are three kinds of double helices that have been observed in DNA: type A, type B, and type Z, which differ in their geometries. The double helical structure is essential to the coding function of DNA. Watson (biologist) and Crick (physicist) first proposed the double helix structure in 1953, based on X-ray diffraction data.
RNA, on the other hand, can have structures as diverse as those of proteins, as well as a simple double helix of type A. The ability to be both informational and structurally diverse suggests that RNA was the prebiotic molecule that could function in both replication and catalysis (the RNA World Hypothesis). In fact, some viruses (e.g. retroviruses) encode their genetic material in RNA.
Forces That Stabilize Nucleic Acid Double Helix
• There are two major forces that contribute to stability of helix formation
– Hydrogen bonding in base-pairing
– Hydrophobic interactions in base stacking
[Figure: two antiparallel strands (5’→3’ and 3’→5’) showing same-strand stacking and cross-strand stacking]
Types of DNA Double Helix
• Type A: major conformation of RNA, minor conformation of DNA;
• Type B: major conformation of DNA;
• Type Z: minor conformation of DNA
[Figure: A (narrow, tight), B (wide, less tight) and Z (left-handed, least tight) helices, strand ends marked 5’/3’]
Three Dimensional Structures of Double Helices
[Figure: A-DNA and A-RNA helices, with major and minor grooves labelled]
Secondary Structures of Nucleic Acids
• DNA is primarily in duplex form.
• RNA is normally single stranded and can form diverse secondary structures other than duplexes.
More Secondary Structures of Nucleic Acids
Pseudoknots:
Source: Cornelis W. A. Pleij in Gesteland, R. F. and Atkins, J. F. (1993) THE RNA WORLD. Cold Spring Harbor Laboratory Press.
3D Structures of RNA: Transfer RNA Structures
[Figure: secondary structure and tertiary structure of tRNA; labels: TΨC loop, variable loop, anticodon stem, D loop, anticodon loop]
Gm, Cm, etc., are modified bases
3D Structures of RNA: Ribosomal RNA Structures
[Figure: secondary structure of large ribosomal RNA; tertiary structure of large ribosome subunit]
Ban et al., Science 289 (905-920), 2000
rRNA Secondary Structure Based on Phylogenetic Data
Central Dogma of Molecular Biology
[Figure: Replication (DNA → DNA), Transcription (DNA → mRNA), Translation (mRNA → Protein)]
Transcription is carried out by RNA polymerase (II)
Translation is performed on ribosomes
Replication is carried out by DNA polymerase
Reverse transcriptase copies RNA into DNA
Transcription + Translation = Expression
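Transcription and translation can be mimicked on strings. A minimal Python sketch, assuming the input is the coding (sense) strand of an open reading frame that starts at its first base; it ignores splicing, UTRs and start-codon search. The codon table is the standard genetic code; the sequence is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """mRNA has the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate codon by codon from the first base until a stop codon."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[pos:pos + 3].replace("U", "T")]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


mrna = transcribe("ATGGCCTGGTAA")
print(mrna)             # AUGGCCUGGUAA
print(translate(mrna))  # MAW
```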
But DNA can also be transcribed into noncoding RNA …
tRNA (transfer): transfer of amino acids to the ribosome during protein synthesis.
rRNA (ribosomal): essential component of the ribosomes (complex with rProteins).
snRNA (small nuclear): mainly involved in RNA splicing (removal of introns). snRNPs.
snoRNA (small nucleolar): involved in chemical modifications of ribosomal RNAs and other RNA genes. snoRNPs.
SRP RNA (signal recognition particle): forms an RNA-protein complex involved in protein secretion (targeting of secretory proteins to the endoplasmic reticulum).
Further: microRNA, eRNA, gRNA, tmRNA etc.
Eukaryotes have spliced genes …
• Promoter: involved in transcription initiation (TF/RNApol-binding sites)
• TSS: transcription start site
• UTRs: untranslated regions (important for translational control)
• Exons will be spliced together by removal of the introns
• Polyadenylation site important for transcription termination (but also: mRNA stability, export of mRNA from the nucleus etc.)
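Splicing, as listed above, amounts to keeping the exon intervals of the primary transcript and discarding the introns between them. A toy Python sketch; the sequence and coordinates are hypothetical, and real splice-site recognition by the spliceosome is far more involved.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exons (0-based, end-exclusive coordinates) and drop everything in between."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


def introns(pre_mrna: str, exons: list[tuple[int, int]]) -> list[str]:
    """Return the sequences between consecutive exons."""
    ordered = sorted(exons)
    return [pre_mrna[end1:start2] for (_, end1), (start2, _) in zip(ordered, ordered[1:])]


# Hypothetical pre-mRNA: exon 1 | intron | exon 2
pre_mrna = "AUGGCC" + "GUAAGUCAG" + "UUUUAA"
exons = [(0, 6), (15, 21)]
print(splice(pre_mrna, exons))  # AUGGCCUUUUAA
for intron in introns(pre_mrna, exons):
    # Most introns start with GU and end with AG (the GT-AG rule in DNA terms).
    print(intron, intron.startswith("GU") and intron.endswith("AG"))
```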
DNA makes mRNA makes Protein
Some facts about human genes
• There are about 20,000–25,000 genes in the human genome (~3% of the genome)
• Average gene length is ~8,000 bp
• Average of 5-6 exons per gene
• Average exon length is ~200 bp
• Average intron length is ~2,000 bp
• 8% of the genes have a single exon
• Some exons can be as small as 1 or 3 bp
DMD: the largest known human gene
• The largest known human gene is DMD, the gene that encodes dystrophin: ~2.4 million bp over 79 exons
• X-linked recessive disease (affects mainly boys)
• Two variants: Duchenne-type (DMD) and Becker-type (BMD)
• Duchenne-type: more severe, frameshift mutations
Becker-type: milder phenotype, “in-frame” mutations
Posture changes during progression
of Duchenne muscular dystrophy
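The Duchenne/Becker distinction follows the reading-frame rule: a deletion whose length is a multiple of 3 keeps the downstream codons in frame (a shorter but partly functional dystrophin), otherwise the frame shifts and a premature stop codon usually follows. A minimal Python sketch; the exon lengths are hypothetical, not real DMD exon sizes, and the rule has known exceptions.

```python
def deletion_effect(deleted_exon_lengths: list[int]) -> str:
    """Classify a deletion of whole exons by the reading-frame rule."""
    removed = sum(deleted_exon_lengths)
    if removed % 3 == 0:
        return "in-frame (Becker-type expected)"
    return "frameshift (Duchenne-type expected)"


# Hypothetical exon lengths in bp.
print(deletion_effect([120, 99]))  # 219 bp removed: in-frame
print(deletion_effect([100]))      # 100 bp removed: frameshift
```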
Nucleic acid basics
• Nucleic acids are polymers
[Figure labels: nucleotide, nucleoside]
• Each monomer consists of 3 moieties
Nucleic acid basics (2)
• A base can be one of 5 rings
• Purines and pyrimidines can base-pair (Watson-Crick pairs)
Watson and Crick, 1953
Nucleic acid as hetero-polymers
• Nucleosides, nucleotides (ribose sugar, RNA precursor)
• DNA and RNA strands (2’-deoxyribose sugar, DNA precursor)
REMEMBER:
• (2’-deoxythymidine triphosphate, nucleotide)
• DNA = deoxyribonucleotides; RNA = ribonucleotides (OH-groups at the 2’ position)
• Note the directionality of DNA (5’-3’ & 3’-5’) or RNA (5’-3’)
• DNA = A, G, C, T ; RNA = A, G, C, U
So …
[Figure labels: DNA, RNA]
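Because the two strands of a duplex are antiparallel, the partner of a strand written 5’→3’ is its complement read in reverse. A minimal Python sketch; the sequences are hypothetical and only the four standard bases of DNA or RNA are handled.

```python
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str, rna: bool = False) -> str:
    """Return the complementary strand, written 5'->3' like the input."""
    pairs = RNA_PAIRS if rna else DNA_PAIRS
    return "".join(pairs[base] for base in reversed(seq.upper()))


print(reverse_complement("ATGCTT"))            # AAGCAT
print(reverse_complement("AUGCUU", rna=True))  # AAGCAU
```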
Stability of base-pairing
• C-G base pairing is more stable than A-T (A-U) base pairing (why?)
• 3rd codon position has freedom to evolve (synonymous mutations)
• Species can therefore optimise their G-C content (e.g. thermophiles are GC rich) (consequences for codon use?)
Thermocrinis ruber, heat-loving bacterium
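One part of the answer to the "why?" above: a C-G pair is held by three hydrogen bonds, an A-T (A-U) pair by two. The count below is only a crude proxy, since base stacking between neighbouring pairs also contributes substantially to duplex stability; the sequences are hypothetical.

```python
HYDROGEN_BONDS = {"G": 3, "C": 3, "A": 2, "T": 2, "U": 2}


def duplex_hydrogen_bonds(strand: str) -> int:
    """Watson-Crick hydrogen bonds in a perfectly paired duplex built on this strand."""
    return sum(HYDROGEN_BONDS[base] for base in strand.upper())


print(duplex_hydrogen_bonds("GCGCGC"))  # 18
print(duplex_hydrogen_bonds("ATATAT"))  # 12
```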
DNA compositional biases
• Base compositions of genomes: G+C (and therefore also A+T) content varies between different genomes
• The GC-content is sometimes used to classify organisms in taxonomy
• High G+C content bacteria: Actinobacteria, e.g. in Streptomyces coelicolor it is 72%
Low G+C content: Plasmodium falciparum (~20%)
• Other examples:
– Saccharomyces cerevisiae (yeast): 38%
– Arabidopsis thaliana (plant): 36%
– Escherichia coli (bacteria): 50%
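GC content, as in the percentages above, is the fraction of G and C among all bases; restricting the count to third codon positions (GC3) shows where synonymous changes let a genome shift its composition. A minimal Python sketch on a hypothetical in-frame coding sequence.

```python
def gc_content(seq: str) -> float:
    """Fraction of G + C among A, C, G, T/U bases (other symbols such as N are ignored)."""
    seq = seq.upper()
    gc = sum(seq.count(b) for b in "GC")
    at = sum(seq.count(b) for b in "ATU")
    return gc / (gc + at)


def gc3(cds: str) -> float:
    """GC content at third codon positions of an in-frame coding sequence."""
    return gc_content(cds.upper()[2::3])


cds = "ATGGCCAAAGGGTAA"  # hypothetical
print(f"GC:  {gc_content(cds):.2f}")  # GC:  0.47
print(f"GC3: {gc3(cds):.2f}")         # GC3: 0.60
```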
Let’s return to DNA and RNA structure …
• Unlike the three-dimensional structures of proteins, DNA molecules assume simple double helical structures independent of their sequences.
• There are three kinds of double helices that have been observed in DNA: type A, type B, and type Z, which differ in their geometries.
• RNA, on the other hand, can have structures as diverse as those of proteins, as well as a simple double helix of type A.
• The ability to be both informational and structurally diverse suggests that RNA was the prebiotic molecule that could function in both replication and catalysis (the RNA World Hypothesis).
• In fact, some viruses (e.g. retroviruses) encode their genetic material in RNA.
Three dimensional structures of double helices
Side view: A-DNA, B-DNA, Z-DNA
Space-filling models of A, B and Z-DNA
Top view: A-DNA, B-DNA, Z-DNA
Major and minor grooves
Forces that stabilize nucleic acid double helix
• There are two major forces that contribute to stability of helix formation:
– Hydrogen bonding in base-pairing
– Hydrophobic interactions in base stacking
[Figure: two antiparallel strands (5’→3’ and 3’→5’) showing same-strand stacking and cross-strand stacking]
Types of DNA double helix
• Type A: major conformation RNA, minor conformation DNA; right-handed helix; short and broad
• Type B: major conformation DNA; right-handed helix; long and thin
• Type Z: minor conformation DNA; left-handed helix; longer and thinner
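The "short and broad" versus "long and thin" contrast can be made quantitative with the rise per base pair. A Python sketch assuming approximate textbook values (about 2.6 Å, 3.4 Å and 3.7 Å rise, and about 11, 10.5 and 12 bp per turn, for A, B and Z); published values differ slightly between sources and conditions.

```python
# Approximate textbook helix parameters: (rise per base pair in Å, base pairs per turn).
HELIX_PARAMETERS = {
    "A": (2.6, 11.0),
    "B": (3.4, 10.5),
    "Z": (3.7, 12.0),
}


def helix_dimensions(n_bp: int, form: str) -> tuple[float, float]:
    """Return (length in Å, number of helical turns) for an ideal duplex of n_bp base pairs."""
    rise, bp_per_turn = HELIX_PARAMETERS[form]
    return n_bp * rise, n_bp / bp_per_turn


for form in "ABZ":
    length, turns = helix_dimensions(100, form)
    print(f"{form}-form, 100 bp: ~{length:.0f} Å long, ~{turns:.1f} turns")
```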
Secondary structures of Nucleic acids
• DNA is primarily in duplex form
• RNA is normally single stranded and can form diverse secondary structures other than duplexes.
Non B-DNA Secondary structures
• Cruciform DNA
• Slipped DNA
• Triple helical DNA (Hoogsteen base pairs)
Source: Van Dongen et al. (1999), Nature Structural Biology 6, 854 - 859
More Secondary structures
• RNA pseudoknots
• Cloverleaf rRNA structure
16S rRNA Secondary Structure Based on Phylogenetic Data
Source: Cornelis W. A. Pleij in Gesteland, R. F. and Atkins, J. F. (1993) THE RNA WORLD. Cold Spring Harbor Laboratory Press.
3D structures of RNA: transfer-RNA structures
• Secondary structure of tRNA (cloverleaf)
• Tertiary structure of tRNA
3D structures of RNA: ribosomal-RNA structures
• Secondary structure of large rRNA (23S)
• Tertiary structure of the large ribosomal subunit
Ban et al., Science 289 (905-920), 2000
3D structures of RNA: Catalytic RNA
• Secondary structure of self-splicing RNA
• Tertiary structure of self-splicing RNA
Some structural rules …
• Base-pairing is stabilizing
• Unpaired sections (loops) destabilize
• 3D conformation with interactions makes up for this
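The first rule (base-pairing is stabilizing) is the idea behind the simplest RNA folding algorithm, Nussinov base-pair maximization, which uses dynamic programming to find the nested (pseudoknot-free) structure with the most base pairs. It is swapped in here purely as an illustration: it ignores loop penalties and stacking, which is why practical tools use energy-based models instead. A minimal Python sketch, assuming Watson-Crick and G-U pairs and a minimum hairpin loop of 3 unpaired bases; the sequence is hypothetical.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> str:
    """Return a dot-bracket structure with the maximum number of nested base pairs."""
    n = len(seq)
    best = [[0] * n for _ in range(n)]  # best[i][j]: max pairs within seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = max(best[i + 1][j], best[i][j - 1])      # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                score = max(score, best[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                        # split into two parts
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score

    # Traceback: recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if best[i][j] == best[i + 1][j]:
            stack.append((i + 1, j))
        elif best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and best[i][j] == best[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if best[i][j] == best[i][k] + best[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return "".join(structure)


print(nussinov("GGGAAACCC"))  # (((...)))
```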
Final notes
• Sense/anti-sense RNA
antisense RNA blocks translation through hybridization with the (sense) mRNA
Example. Tomatoes synthesize ethylene in order to ripen. Transgenic tomatoes have been constructed that carry in their genome an artificial gene (DNA) that is transcribed into an antisense RNA complementary to the mRNA for an enzyme involved in ethylene production → the tomatoes make only 10% of the normal amount of enzyme.
• Sense/anti-sense peptides
Have been used therapeutically, especially in cancer and anti-viral therapy
• Sense/anti-sense proteins
Does it make (anti)sense?
Codons for hydrophilic and hydrophobic amino acids on the sense strand may sometimes be complemented, in frame, by codons for hydrophobic and hydrophilic amino acids on the antisense strand. Furthermore, antisense proteins may sometimes interact with high specificity with the corresponding sense proteins… BUT VERY RARE: HIGHLY CONSERVED CODON BIAS