Structural biology, bioinformatics, systems biology
Biology of the 21st century
Today's presentation will be divided into three parts:
1. Background and general introduction
   NMR
   X-ray crystallography
2. Structural biology: examples from my laboratory
3. A closer look at bioinformatics and systems biology
   as interdisciplinary fields relevant to computer science
Human & other genome sequences
Functionally cloned DNA molecules
coding for proteins of interest.
Bioinformatics
Amino acid sequences
Protein structure
Structural biology
Protein function
Protein chemistry / yeast 2-hybrid screens / proteomics /
enzymology / genetics /
transgenes / knock-outs /
knock-ins / chemical genetics /
etc.
3D structures of molecules allow us to understand
biological processes at the most basic level. We
can ‘see’ which molecules interact, how they
interact, how they function, how drugs act. They
can help us understand disease at an atomic level.
3D structures can be exploited in development of
new drugs.
structure-based drug design
Structural Biology
• A structure can sometimes reveal the function of a protein
directly
• A structure can rationalise experimental observations about
the selectivity and specificity of an enzymatic reaction
• A structure can become the basis for rational
drug/inhibitor discovery.
• A structure can reveal the dynamic aspects of
protein behaviour.
• The three-dimensional topologies of polypeptides provide
data for solving the problem of how protein structure
forms: the ‘protein folding problem’.
• [Sometimes a 3D structure can be rather
uninformative - the ‘structural genomics’ debate.]
Experimental modes of molecular structural biology
X-ray crystallography
- Sample: protein crystals (3-50 mg/ml; ca. 100 µl)
  [Se-methionine labelling]
- Instrument: X-ray source/synchrotron
- Applicable to proteins of any size (in principle)
- High resolution
NMR spectroscopy
- Sample: protein solutions (> 0.5 mM; min. volume 0.3 ml;
  10-100 mg); 15N,13C-isotope labelling
- Instrument: NMR spectrometer
- Applicable to small(ish) proteins (smaller than ca.
  30,000 MW)
- Medium resolution
Cryo-electron microscopy
- Sample: macromolecular assemblies/particles frozen in
  vitreous ice
- Instrument: electron microscope
- Large particles, typically > 500,000 MW
- ‘Low’ resolution
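The sizes of cells and of their component parts
[Figure: logarithmic size scale running from 200 µm down to
0.2 nm in steps of x 10 (200 µm, 20 µm, 2 µm, 200 nm, 20 nm,
2 nm, 0.2 nm), with CELLS, ORGANELLES, MOLECULES and ATOMS
marked along it, together with the ranges covered by the
unaided eye, the light microscope and the electron microscope.]
1 m = 10^3 mm = 10^6 µm = 10^9 nm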
WHAT IS NMR?
Nuclear Magnetic Resonance (NMR) is a powerful
spectroscopic technique that provides information about
the structural and chemical properties of molecules.
NMR is a non-destructive method for analysing the
structure and dynamics of molecules. NMR exploits the
properties of certain atomic nuclei when they are exposed
to a very strong magnetic field. For biochemists these
are mainly 1H, 15N, 13C and 31P. 1H and 31P are
highly abundant isotopes whilst 15N and 13C are
present only at low natural abundance (about 0.4% and
1.1% respectively). Studies using these nuclei generally
require isotopic enrichment, achieved by producing the
molecule in media that have been enriched in these isotopes.
Prof. Kurt Wüthrich
Nobel Prize for Chemistry 2002
Typically the magnets used in NMR
spectroscopy are tens to hundreds of thousands of times
stronger than the earth’s magnetic field. The
NMR experiment generally consists of applying
short bursts or pulses of energy in the radio
frequency (RF) range, typically 40-800 MHz, to
the sample. These pulses of RF cause the
nuclei to rotate away from their equilibrium
position and they start to precess (rotate)
around the axis of the magnetic field. The
exact frequency at which the nuclei precess is
related to both the chemical and physical
environment of the atom in the molecule. By
using different combinations of RF pulses and
delays it is possible to determine how each
atom in the molecule interacts with other
atoms in the molecule.
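The link between magnet strength and the radio-frequency range quoted above can be sketched with the Larmor relation, frequency = (gamma / 2 pi) x B. This is a minimal illustration, not part of the original slides: the gyromagnetic ratios are rounded literature values, and the function names are hypothetical.

```python
# Gyromagnetic ratios divided by 2*pi, in MHz per tesla (rounded literature
# values; magnitudes only, so the negative sign of 15N is ignored here).
GAMMA_BAR_MHZ_PER_T = {"1H": 42.577, "13C": 10.708, "15N": 4.316, "31P": 17.235}


def larmor_mhz(nucleus: str, field_tesla: float) -> float:
    """Precession (Larmor) frequency in MHz of a nucleus in a given field."""
    return GAMMA_BAR_MHZ_PER_T[nucleus] * field_tesla


def field_for_proton_mhz(proton_mhz: float) -> float:
    """Field in tesla of a magnet described by its 1H frequency."""
    return proton_mhz / GAMMA_BAR_MHZ_PER_T["1H"]


if __name__ == "__main__":
    b0 = field_for_proton_mhz(500.0)  # a "500 MHz" spectrometer
    print(f"B0 = {b0:.2f} T")
    for nucleus in GAMMA_BAR_MHZ_PER_T:
        print(f"{nucleus:>3}: {larmor_mhz(nucleus, b0):6.1f} MHz")
```

Spectrometers are usually named by their 1H frequency, which is why the second helper converts a proton frequency back into a field strength.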
[Figure: three panels of a 1H-15N correlation spectrum; axes
1H (ppm), 10.0 to 7.0, and 15N (ppm), 110 to 125; peaks
labelled 543, 592, 709 and 732.]
The NMR spectrum is exquisitely sensitive to the conformation of the
polypeptide chain, and to the presence of interacting chemical ligands.
These and other features of the rich ‘spin physics’ that underlies the
NMR phenomenon mean that NMR spectroscopy is a highly versatile
tool for the characterisation of:
Structure
Dynamics
Molecular interactions
[Figure: two views of the protein structure with helices α1 to
α6 and the N and C termini labelled; the second view also
labels the acidic residues E652, E664, E667, E676, E699, E700
and D709.]
Harris et al. (2004) J. Mol. Biol.
X-ray Crystallography
- An experimental technique involving diffraction of X-rays by
crystalline material.
- X-ray wavelength is of the order of 1 Å
- From the diffraction pattern, the electron density of the
molecule can be reconstructed (this needs both intensities
and phases).
- A model is built into the reconstructed electron density.
- The model is the molecular picture: the molecular structure,
from global fold to atomic detail.
- Limited information about the molecule’s dynamics
- Depends on obtaining crystals
1. Why X-rays?
2. Why electron density?
3. Why crystals?
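Why both intensities and phases are needed can be illustrated with a toy one-dimensional "crystal". The sketch below is an illustration only: the density values are made up, and a plain discrete Fourier transform stands in for the real three-dimensional calculation.

```python
import cmath


def structure_factors(density):
    """Discrete Fourier transform of a 1D unit-cell density: F(h)."""
    n = len(density)
    return [
        sum(rho * cmath.exp(2j * cmath.pi * h * x / n)
            for x, rho in enumerate(density))
        for h in range(n)
    ]


def fourier_synthesis(factors):
    """Inverse transform: density rebuilt from complex structure factors."""
    n = len(factors)
    return [
        (sum(f * cmath.exp(-2j * cmath.pi * h * x / n)
             for h, f in enumerate(factors)) / n).real
        for x in range(n)
    ]


# Hypothetical density: two "atoms" of different weight on 8 grid points.
rho = [0.0, 5.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0]
F = structure_factors(rho)

# With amplitudes AND phases the density comes back.
with_phases = fourier_synthesis(F)

# A detector records only intensities |F|^2, so the phases are lost;
# setting every phase to zero gives a different (wrong) map.
amplitudes_only = fourier_synthesis([abs(f) for f in F])

print([round(v, 2) for v in with_phases])
print([round(v, 2) for v in amplitudes_only])
```

The second map piles density at the origin instead of at the two atom positions, which is the one-dimensional version of the phase problem.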
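[Figure: analogy between a light microscope and X-ray
crystallography. Light microscope: VISIBLE LIGHT scattered by
the OBJECT passes through the OBJECTIVE LENS (magnification m)
and the EYEPIECE LENS (magnification n) to give an enlarged
image of the object, magnification mn. X-ray crystallography:
X-RAYS scattered by the OBJECT (crystal) are recorded on a
DETECTOR; the CRYSTALLOGRAPHER supplies the PHASES and a
COMPUTER produces the COMPUTED ELECTRON-DENSITY MAP.]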
Overview of the process of determining a protein
structure by X-ray diffraction
1. Production of an isolated, sufficiently large crystal
of the candidate protein
2. Mounting the crystal, then collecting and evaluating
preliminary diffraction data
3. Complete data collection and phase estimation
4. Building and refinement of the protein chains
5. Validation of the structure
European
Synchrotron
Radiation
Facility
(Grenoble, France)
Structural characterisation of drug targets
from M. tuberculosis
Institute of Structural
Molecular Biology
Snezana Djordjevic
M. tuberculosis
• 2-3 million deaths from
tuberculosis annually
• 1/3 of world population
currently infected with the
disease
• Drug resistance
- multidrug-resistant strains
- 12.6% of M. tuberculosis isolates are
resistant to at least one drug
- 2.2% are resistant to both
isoniazid and rifampin
New Drugs
-agents that exhibit activity
against drug resistant strains
-completely sterilize infection
-shorten the duration of drug
therapy and thus promote drug
compliance
METRO – 06/03/2007
Mechanism of resistance to Isoniazid
-Isoniazid is a prodrug that is oxidized by KatG
-KatG is a catalase-peroxidase
-Mutation of KatG leads to resistance
KatG
Prodrug
activation
Resistance
KatG activity is important for virulence!
-The physiological function of KatG includes protection of the
mycobacterium against H2O2 and other ROS produced by
the microbe and its host.
?
KatG
AhpC
AhpD
AhpD
Alkylhydroperoxidase
from M. tuberculosis
Paul Ortiz de Montellano
Dept. of Pharmaceutical
Chemistry, UCSF
C2; a=186.38 Å, b=117.28 Å, c=88.99 Å, β=113.97°
177 residues/monomer
Structure solution: SeMet/MAD
4 wavelengths data collected in Grenoble
1.9 (1.7) Å resolution
2Fo-Fc map
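As a small worked example, the unit-cell volume follows from the cell parameters quoted above. For a monoclinic cell (space group C2) the other two angles are 90°, so V = a · b · c · sin β. The sketch assumes that the angle quoted on the slide is β; the function name is hypothetical.

```python
import math


def monoclinic_cell_volume(a: float, b: float, c: float, beta_deg: float) -> float:
    """Volume in cubic angstroms of a monoclinic cell (alpha = gamma = 90 degrees)."""
    return a * b * c * math.sin(math.radians(beta_deg))


# Cell parameters as quoted on the slide (lengths in angstroms, angle in degrees).
volume = monoclinic_cell_volume(186.38, 117.28, 88.99, 113.97)
print(f"V = {volume:.3e} cubic angstroms")
```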
AhpD Monomer Topology
From structure to function and
the catalytic mechanism
[Figure: AhpD monomer with helices α1 to α8 and the N and C
termini labelled. Annotations: CXXC; Thioredoxins;
Peroxiredoxins; solvent exposed; pKa ~ 7.1. Labelled residues:
Cys130, Cys133, His137, Glu118. Putative substrate binding
site marked.]
Novel redox pathway in M. tuberculosis
[Figure: redox relay NADH -> Lpd -> DlaT -> AhpD -> AhpC -> ROOH.
Each carrier cycles between two states: NADH/NAD+,
Lpd(red)/Lpd(ox), DlaT-LpH2/DlaT-Lp, AhpD(red)/AhpD(ox),
AhpC(red)/AhpC(ox); ROOH is reduced to ROH.]
Lpd: Dihydrolipoamide dehydrogenase (E3)
SucB: Dihydrolipoamide acyltransferase (E2)
Components of pyruvate dehydrogenase complexes
Pyruvate -> Acetyl-CoA + CO2 (NAD+ -> NADH)
Molecular Surface
A Prototypical Two-Component Signal
Transduction System
[Figure: an external stimulus in the periplasmic space acts on
the receptor / input / sensor domain of the histidine kinase
(HK) sensory protein; the kinase core is phosphorylated (P),
and the response regulator (RR) produces the response.]
Chemotaxis
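[Figure: chemotaxis pathway around the receptor Tar, with
components labelled W, A, Y, B and R (CheW, CheA, CheY, CheB,
CheR); annotations +ATP and P (phosphorylation), SAM, +CH3
and -CH3 (methylation and demethylation of the receptor).]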
DosS
• Induced by exposure to hypoxia, NO and ethanol.
• Structural studies have been initiated with the aim of
describing the signalling mechanism that leads to
histidine kinase activation.
• The histidine kinase domain (HK) undergoes
autophosphorylation and can carry out a Mg2+-dependent
phosphotransfer reaction onto DosR.
• DosS and DosR are a cognate sensor-regulator pair.
Identification of domain boundaries
Further structural investigation of GAF domains
[Figure: multiple sequence alignment of GAF domains, in three
blocks. The number before block 1 is the first residue of each
sequence and the number after block 3 is the last residue.
Secondary structure elements are those of 1MC0.]

Secondary structure (1MC0): β1, α2, β2
PDE2A_B      196 DVSVLLQEIITEARN-------LSNAEICSVFLLDQ------------NELVAKVFDGGVVDDe----sY
DosS GAF_A     3 DLEATLRAIVHSATS-------LVDARYGAMEVHDRQH---------RVLHFVYEGIDEETVR------R
cGMP PDE_1   154 DVTALCHKIFLHIHG-------LISADRYSLFLVCEdss-------ndKFLISRLFDVAEGSTleeasnN
cGMP PDE_2   336 SLEVILKKIAATIIS-------FMQVQKCTIFIVDEdcsdsf-ssvfhMECEELEKSSDTLTR------E
anfA          46 DLADALSIVLGVMQQ-------HLKMQRGIVTLYDMr----------aETIFIHDSFGLTEEEk-----K
cGMP PDE_3   228 DATSLQLKVLRYLQQ-------ETQATHCCLLLVSEd----------nLQLSCKVIGEKVLG-------E
ADEN_CYCL_1   79 GFENILQEMLQSITLkt---geLLGADRTTIFLLDEe----------kQELWSIVAAGEGDRS------L
ADEN_CYCL_2  271 DLEDTLKRVMDEAKE-------LMNADRSTLWLIDRd----------rHELWTKITQDNGST-------K
yebR          27 DLNRDFNALMAGETS-------FLATLANTSALLYErlt-------diNWAGFYLLEDDTLVLg----pF
Hypoth. Pro.  54 LIKATLQKTMEASIH-------QTGAQLGSLFLLDGd----------gRVTESILARGATDQSqk---kN
Nif-regul_1   68 RLEVTLANVVNVLSS-------MLQMRHGMICILDSe-----------GDPDMVATTGWTPEMa-----G
Nif-regul_2   46 RLEVTLANVLGLLQS-------FVQMRHGLVSLFNDd-----------GVPELTVGAGWSEG-------T
Nif-regul_3   35 NTARALAAILEVLHD-------HAFMQYGMVCLFDKe----------rNALFVESLHGIDGERkk--etR
Nif-regul_4   21 DLSKTLREVLNVLSA-------HLETKRVLLSLMQDs-----------GELQLVSAIGLSYEEf-----Q
consensus      1 DLEELLQTILEELRQ-------LLGADRVSIYLVDEDK---------RGELVLVASDGLTLPE------L

Secondary structure (1MC0): β3, α3, β4, α4, β5
PDE2A_B      EIRIPADQ-----GIAGHVATTGQILNIP-DAYAHPl--fYRGVDDSTGFR-----TRNILCFPIKNEn-
DosS GAF_A   IGHLPKGL-----GVIGLLIEDPKPLRLD-DVSAHP----AS-IGFPPYHPP----MRTFLGVPVRVR--
cGMP PDE_1   CIRLEWNK-----GIVGHVAAFGEPLNIK-DAYEDPr--fNAEVDQITGYK-----TQSILCMPIKNHr-
cGMP PDE_2   RDANRINY-----MYAQYVKNTMEPLNIP-DVSKDKr---FPWTNENMGNInq-qcIRSLLCTPIKNGk-
anfA         RGIYAVGE-----GITGKVVETGKAIVAR-RLQEHP-----DFLGRTRVSRng-kaKAAFFCVPIMRA--
cGMP PDE_3   EVSFPLTM-----GRLGQVVEDKQCIQLK-DLTSDD----VQQLQNMLGCE-----LRAMLCVPVISRa-
ADEN_CYCL_1  EIRIPADK-----GIAGEVATFKQVVNIPfDFYHDPrsifAQKQEKITGYR-----TYTMLALPLLSEq-
ADEN_CYCL_2  ELRVPIGK-----GFAGIVAASGQKLNIPfDLYDHPdsatAKQIDQQNGYR-----TCSLLCMPVFNGd-
yebR         QGKIACVRipvgrGVCGTAVARNQVQRIE-DVHVFD-------GHIACDAA-----SNSEIVLPLVVK--
Hypoth. Pro. IVGQVLDK-----GLAGWVRENKRTGLIN-DTTKDY----RWLKLPDEPYQ-----ALSALGVPIVWG--
Nif-regul_1  QIRAHVPQ-----KAIDQIVATQMPLVVQ-DVTADP-----LFAGHEDLFGppeeaTVSFIGVPIKAD--
Nif-regul_2  DERYRTCVp---qKAIHEIVATGRSLMVE-NVAAEt---aFSAADREVLGAsd-siPVAFIGVPIRVD--
Nif-regul_3  HVRYRMGE-----GVIGAVMSQRQALVLP-RISDDQ-----RFLDRLNIYDy----SLPLIGVPIPGAd-
Nif-regul_4  SGRYRVGE-----GITGKIFQTETPIVVR-DLAQEP-----LFLARTSPRQsqdgeVISFVGVPIKAA--
consensus    GVRFPLDE-----GLVGRVAETGRPLVIP-DVEADP----FFFLDLLQRYQL----IRSFLAVPLVAG--

Secondary structure (1MC0): β6, α5
PDE2A_B      -QEVIGVAELVNK-------------------INGPWFSKFDEDLATAFSIYCGISIAHSLLYKKVN 345
DosS GAF_A   -DESFGTLYLTDK-------------------TNGQPFSDDDEvlvqalaaaagiavanarlyqqak 150
cGMP PDE_1   -EEVVGVAQAINKk-----------------sGNGGTFTEKDEKDFAAYLAFCGIVLHNAQLYETSL 314
cGMP PDE_2   kNKVIGVCQLVNKmee--------------ttGKVKAFNRNDEQFLEAFVIFCGLGIQNTQMYEAVE 503
anfA         -QKVLGTIAAERV-------------------YMNPRLLKQDVELLTMIATMIAPLVELYLIENIER 196
cGMP PDE_3   tDQVVALACAFNK-------------------LGGDFFTDEDERAIQHCFHYTGTVLTSTLAFQKEQ 375
ADEN_CYCL_1  -GRLVAVVQLLNKlkpyspp-----dallaerIDNQGFTSADEQLFQEFAPSIRLILESSRSFYIAT 249
ADEN_CYCL_2  -QELIGVTQLVNKkktgefppynpetwpiapeCFQASFDRNDEEFMEAFNIQAGVALQNAQLFATVK 441
yebR         -NQIIGVLDIDST--------------------VFGRFTDEDEQGLRQLVAQLEKVLATTDYKKFFA 179
Hypoth. Pro. -DELLGILTLMHS--------------------QVNHFTPACATAMEKTAELIALVLNNARIQTKHK 202
Nif-regul_1  -HHVMGTLSIDRIw-----------------dGTARFRFDEDVRFLTMVANLVGQTVRLHKLVASDR 220
Nif-regul_2  -STVVGTLTIDRIp------------------EGSSSLLEYDARLLAMVANVIGQTIKLHRLFAGDR 198
Nif-regul_3  -NQPAGVLVAQPM-------------------ALHEDRLAASTRFLEMVANLISQPLRSATPPESLP 186
Nif-regul_4  -REMLGVLCVFRDg------------------QSPSRSVDHEVRLLTMVANLIGQTVRLYRSVAAER 180
consensus    -GELLGVLALHRK-------------------DSPRPFTEEEEELLQALANQLAIALALAQLYEELR
SAM-T99 was used to detect remote structural homologues of this protein.
Of the 11149 sequence homologues identified, 24 had a known
structure, but none of these produced a significant global alignment.
Local alignments covered either the C- or the N-terminal region. No
alignment was found that covered both putative GAF domains.
One structural homologue was identified for the DosS GAF A domain: 1MC0
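One simple way to put a number on how weak such homologies are is the percent identity over aligned columns. The sketch below is illustrative only and is not the scoring used by SAMt99; the function name and the two short aligned strings are made up for the example.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither row has a gap.

    Case is ignored, because alignment figures often print inserted
    residues in lower case.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        identical += x == y
    return 100.0 * identical / compared if compared else 0.0


# Hypothetical aligned fragments, not rows copied from the alignment figure.
row_1 = "DLEAT-LRAIV"
row_2 = "DVEALQLK-IV"
print(f"{percent_identity(row_1, row_2):.1f}% identity")
```

Remote homologues of the kind sought here often fall well below the identity levels at which a plain pairwise comparison is informative, which is why profile-based methods are used instead.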
UV-Visible Characterisation of GAF A
Absorption spectra of haemoglobin
A. Ferric haemoglobin (solid line)
B. Oxy-ferrous haemoglobin (dashed line)
C. Ferrous haemoglobin (dotted line)
D. Ferrous-CO haemoglobin (solid line)
E. Ferrous-NO haemoglobin (solid line)
Absorption spectra of DosS 63-210 (GAF A)
A. Oxy-ferrous (dashed line)
B. Ferric (solid line)
C. Ferrous (dotted line)
D. Ferrous-CO (solid line)
E. Ferrous-NO (solid line)
[Figure: two panels of absorbance against wavelength (nm),
spanning 500-700 nm and 550-650 nm, with absorbance scale
bars of 0.1 and 0.005; an inset shows the haem Fe2+ with a His
ligand, binding CO / NO / O2.]
Visible/UV spectrum of the DosS GAF A
(63-210) histidine to alanine mutants
[Figure: absorbance (0.00 to 0.12) against wavelength (nm),
about 400 to 600 nm, for DosS 63-210 and the mutants H73A,
H89A, H93A, H97A, H113A, H139A and H149A.]
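The Model of Signalling
[Figure: two states of DosS (domains A and B) with DosR.
O2 bound to the haem Fe2+: OFF, no phosphoryl group shown.
NO bound to the haem Fe2+: ON, with phosphoryl groups (P)
shown on DosS and on DosR.]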
GAF B - NMR
1H,15N HSQC of labelled DosS GAF B
NMR experiments: HNCO, HNCA, HN(CO)CA, HNCACB, CBCA(CO)NH, HA(CA)NH
and HA(CACO)NH spectra were acquired at a 1H frequency of 500 MHz on
0.6 mM [1H, 13C, 15N]-labelled DosS 231-379 at pH 6 in 20 mM
phosphate, 100 mM NaCl.
GAF B - NMR
PROBLEMS:
- 48 residues are still to be assigned
- 21 expected cross-peaks are missing
from the spectrum.
- Some of the cross-peaks do not form
one peak but multiple peaks.
- High content of Val, Leu and Ala
residues in the sequence.
Predicted secondary structure for DosS GAF 2 using PSIPRED.
Sekharan MR, Rajagopal et al. 2005. Backbone 1H, 13C, and
15N resonance assignment of the 46 kDa dimeric GAF A
domain of phosphodiesterase 5. J Biomol NMR 33(1):75
Signalling mechanism
N
C
STRUCTURAL GENOMICS CENTRES IN
NORTH AMERICA, UK, FRANCE, JAPAN
• OXFORD STRUCTURAL GENOMICS
• Announced in 2003, with operations commencing in
July 2004 for an initial three-year period, this initiative
received funding from Canadian, Swedish and British
sponsors from both the public and private sectors. For
the second phase, from July 2007, over £49 million is being
made available by public funding agencies in Canada, Sweden
and Ontario, charitable foundations in the UK and
Sweden, GlaxoSmithKline plc, Novartis and Merck.
• Laboratories at the University of Oxford, University of
Toronto and Karolinska Institutet, Stockholm.
BIOINFORMATICS
Over the last few decades, advances in molecular biology,
together with progress in genetic technology, have led to
an explosion in the amount of information generated by the
scientific community. This mass of information created the
need and demand for computerised data banks (databases) in
which the data can be stored, organised and catalogued. At
the same time it became necessary to develop tools for
browsing, visualising and analysing those data.
Computational biology
(the actual process of analysing and interpreting data)
• Development and application of tools that enable
access to, use of and organisation of various kinds
of information
• Development of new algorithms and statistics with which
relationships among the components of a large data set
can be assessed. Examples include methods for locating
genes within a sequence, predicting protein
structure/function, and clustering protein sequences
into families of related, similar sequences.
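As a toy illustration of locating genes within a sequence, the sketch below scans the three forward reading frames of a DNA string for open reading frames (ATG up to the first in-frame stop codon). Real gene finders are far more elaborate (reverse strand, introns, statistical models); the function name, the minimum-length parameter and the example sequence are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, frame) for forward-strand ORFs.

    start/end are 0-based with end exclusive and including the stop codon;
    min_codons counts the start codon but not the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a reading frame at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # close the frame and keep scanning
    return orfs


# Made-up example sequence.
sequence = "CCATGGCTAAAGGTTGACCATGTAA"
for start, end, frame in find_orfs(sequence):
    print(frame, start, end, sequence[start:end])
```

The length filter is the simplest possible stand-in for the statistics mentioned above: short ATG-to-stop stretches occur by chance, so only longer ones are reported.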
“Organisms function as integrated systems: our senses,
our muscles, our metabolism and our mind work together
as a connected whole. Biologists have traditionally studied
organisms part by part and have relished the modern power
to study them molecule by molecule, gene by gene. ISM is
dedicated to a new science, a critical science of the future
that aims to understand the integration of the parts that
make up a biological system.”
David Baltimore (Nobel Laureate)
President, Cal. Institute of Tech., Pasadena
Systems biology requires:
-Integration of biology, technology, computation and
medicine
-A strong cross-disciplinary team of researchers.
-Institutes include scientists trained in biology, physics,
chemistry, engineering, computing, mathematics,
medicine, immunology, biochemistry, and genetics.
-They all speak the language of biology and are assembled
into a multiplicity of teams that are attacking focused and
important problems of systems biology.
Health Care in the 21st Century:
• Predictive
(genetic makeup, protein markers)
• Preventive
(probability of disease and response to treatment)
• Personalized
(customized therapeutic drugs)
http://csbi.mit.edu:8080/infoglueDeliverWorking/
The MIT CSBI links biologists, computer scientists and engineers
in a multi-disciplinary approach to the systematic analysis of
complex biological phenomena.
http://www.sbml.org/Main_Page
The Systems Biology Markup Language (SBML) is a
computer-readable format for representing models of
biochemical reaction networks in software. It's applicable
to models of metabolism, cell-signaling, and many
others. SBML has been evolving since mid-2000 thanks
to an international community of software developers
and users. This website is the portal for the global SBML
development effort; here you can find information about
all aspects of SBML.
Manchester Centre for Integrative Systems
Biology (MCISB)
• Experimental (Molecular Biology / Biochemistry / Biophysics),
mathematical and computational (Modelling / Data
Integration / Text Mining) approaches
• Development and exploitation of methods for the
quantitative measurement of kinetic and binding
constants on a genome-wide scale
• Combined approaches will lead to computer models
of parts of living cells. Some of these 'silicon cells'
are already available for in silico experimentation,
through the Biomodels and JWS databases.
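What a 'silicon cell' computes can be hinted at with the smallest possible kinetic model. The sketch below integrates a two-step mass-action pathway S -> I -> P with a fixed-step Euler scheme; the species, rate constants and step size are arbitrary assumptions chosen for illustration, and the models held in the databases mentioned above are far richer.

```python
def simulate_pathway(s0=1.0, k1=0.5, k2=0.2, dt=0.01, t_end=20.0):
    """Euler integration of S -> I -> P with mass-action kinetics.

    Returns a list of (time, S, I, P) tuples.
    """
    s, i, p = s0, 0.0, 0.0
    steps = int(round(t_end / dt))
    trajectory = [(0.0, s, i, p)]
    for n in range(1, steps + 1):
        v1 = k1 * s  # rate of S -> I
        v2 = k2 * i  # rate of I -> P
        s += -v1 * dt
        i += (v1 - v2) * dt
        p += v2 * dt
        trajectory.append((n * dt, s, i, p))
    return trajectory


trajectory = simulate_pathway()
t, s, i, p = trajectory[-1]
print(f"t={t:.1f}  S={s:.3f}  I={i:.3f}  P={p:.3f}")
```

Euler integration is used only because it fits in a few lines; a stiff or larger network would call for a proper ODE solver.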
VIRTUAL CELL
Acknowledgements:
My Group:
Sunita Sardiwal
Syeed Hussain
Shreenal Patel
Mark Jeeves
Christine Nunn
NMR:
Paul Driscoll
Richard Harris
Collaborators at RVC:
Neil Stoker
Sharon Kendall
Farahnaz Moahedzadeh
Stuart Rison
UV-VIS Spectra
Peter Rich
Doug Marshall
PHOSPHORYLATION Studies:
Irina Tsaneva
EM:
Helen Saibil
Nadav Elad
ITC/CD:
John Ladbury
Paul Leonard