Bioinformatics A

Summary seminar
(with many hints for exam questions)
1) Introduction
1. The question: Transfer of information
2. The tools: MRS, BLAST, Clustal, Databases, SwissProt
3. Amino acid knowledge: understand secondary structure
4. Secondary structure -> protein structure
5. Protein structure helps make alignments
6. Alignments allow for transfer of information
Bioinformatics
Necessary evil, panacea, or just a useful tool?
With a month in the lab you can easily prevent
having to sit an hour in front of the computer.
Nothing is impossible for a biologist who doesn’t
have to discover it him/her-self.
Bio + informatics
Genome annotation
Bioinformatics
and medicines
One day we will know everything about all
human (and flu) proteins, and then we can
start to ‘calculate’ flu medicines.
Drug Design
Human vs parasite
Parasite
Active site
H1N1 / H5N1
2) Tools
MRS, kind of bioGoogle
BLAST to find homologs
SwissProt: protein sequences
PDB: macromolecular structures
EMBL: nucleotide sequences
OMIM: genetic disorders
ProSite: motifs (e.g. N-{P}-[ST]-{P})
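A ProSite motif can be read as a small regular expression: square brackets list the allowed residues, curly braces list the forbidden ones, and x stands for any residue. The sketch below assumes the standard dash-separated ProSite notation and uses the N-glycosylation pattern N-{P}-[ST]-{P} as its example. The function name prosite_to_regex and the test sequence are invented for illustration, and only these simple pattern elements are handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple ProSite pattern into a Python regular expression.

    Handles single residues, x, [allowed] and {forbidden} elements, each
    optionally followed by a repeat count such as (2) or (2,4).
    """
    parts = []
    for element in pattern.strip(".").split("-"):
        # Split off an optional repeat count, e.g. "x(2,4)" -> "x" and "2,4".
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":
            core = "."                       # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"   # forbidden residues
        # "[ST]" and plain one-letter codes are already valid regex syntax.
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}")

# Invented test sequence; report 1-based start positions of each match.
hits = [m.start() + 1 for m in re.finditer(regex, "MKNGSAANPTLNRTQ")]
print(regex, hits)
```

The asparagine followed directly by a proline is skipped, which is exactly what the {P} element asks for.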
Biological databases (1)
Primary databases
contain biomolecular sequences or structures (experimental data!) and
associated annotation information
Sequences
• Nucleic acid sequences: EMBL, GenBank, DDBJ
• Protein sequences: SwissProt, TrEMBL, UniProt
Structures
• Protein structures: PDB
• Structures of small compounds: CSD
Genomes
• Ensembl
• UCSC
©CMBI 2010
Databases
Data must be in a certain format for software to recognize it.
Every database can have its own format, but some data elements are
essential for every database:
1. Unique identifier, or accession code
2. Name of depositor
3. Literature references
4. Deposition date
5. The real data
Nomenclature:
• Database entry or database record
• Database fields
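To make the nomenclature concrete: a database entry (record) can be pictured as one object, and its fields as the attributes of that object. The sketch below is only an illustration of the five essential elements listed above. The class name, field names, and all values are invented and do not follow the format of any real database.

```python
from dataclasses import dataclass, field, fields
from datetime import date


@dataclass
class DatabaseEntry:
    """One database record; each attribute is a database field."""
    accession: str           # 1. unique identifier, or accession code
    depositor: str           # 2. name of depositor
    deposition_date: date    # 4. deposition date
    data: str                # 5. the real data, e.g. a sequence
    references: list = field(default_factory=list)  # 3. literature references


def missing_fields(entry: DatabaseEntry) -> list:
    """Return the names of essential fields that are still empty."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


# All values below are invented for illustration.
entry = DatabaseEntry(
    accession="EX00001",
    depositor="A. Depositor",
    deposition_date=date(2015, 1, 1),
    data="SFTDALKNMKPYESSFTRIVN",
)
print(missing_fields(entry))
```

The example entry has no literature reference yet, so the check reports that field as missing.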
©CMBI 2015
SwissProt database
• Database of protein sequences
• >500,000 sequence entries
• SwissProt is manually annotated and reviewed, thus of high
quality, but never complete; it contains many feature descriptions
and many hyperlinks to other databases; a bioinformatician
always looks in SwissProt first…
• Obligatory deposit in SwissProt before publication
• SwissProt is part of UniProt
• The other main part of UniProt is TrEMBL (translated EMBL).
TrEMBL is automatically annotated and is not reviewed.
Part III: Sequence Retrieval with MRS
Google: the best generic search and retrieval system
Google searches everywhere for everything
MRS: Maarten’s Retrieval System (http://mrs.cmbi.ru.nl)
MRS searches in selected data environments
MRS is the Google of the biological database world
Search engine (like Google)
• Input/Query = word(s)
• Output = entry/entries from database
Other programs exist: Entrez, SRS, ....
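The input/output idea (words in, entries out) can be sketched with a toy word index. This is not how MRS, Entrez, or SRS are implemented; it only illustrates the principle, and the entry identifiers and descriptions are invented.

```python
from collections import defaultdict


def build_index(entries: dict) -> dict:
    """Map every word to the set of entry identifiers that contain it."""
    index = defaultdict(set)
    for entry_id, text in entries.items():
        for word in text.lower().split():
            index[word.strip(".,;()")].add(entry_id)
    return index


def search(index: dict, query: str) -> set:
    """Return the entries that contain all words of the query."""
    words = query.lower().split()
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))


# Invented entries for illustration.
entries = {
    "ENTRY1": "Aquaporin water channel, human",
    "ENTRY2": "Hemagglutinin, influenza virus",
    "ENTRY3": "Aquaporin water channel, mouse",
}
index = build_index(entries)
print(search(index, "aquaporin human"))
```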
©CMBI 2009
Transfer of information to corresponding residues
BLAST finds two database hits that are annotated to have a
phosphorylated serine.
DRT-GHNIPLMSTRK-TYHIHIENASEERTIKLLMN
DRR-GTTINLMTTKR-TYADELENASEDRTLLLNMN
AEPIYYHL---LTKRETYHIHIENASEEKIIKIVVN
“This serine is phosphorylated in a known protein from the
database, so in my protein the corresponding serine is likely to be
phosphorylated too.”
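The transfer step boils down to two operations: find the alignment column, then translate that column into a residue number in each sequence by skipping the gaps. A minimal sketch using the three aligned rows above. The slide does not say which row is the query, so treating the third row as "my protein" is an assumption, and the function names are invented.

```python
# The first two rows are taken to be the database hits, the third the query
# (an assumption; the slide does not label the rows).
hits = [
    "DRT-GHNIPLMSTRK-TYHIHIENASEERTIKLLMN",
    "DRR-GTTINLMTTKR-TYADELENASEDRTLLLNMN",
]
query = "AEPIYYHL---LTKRETYHIHIENASEEKIIKIVVN"


def residue_number(aligned: str, column: int) -> int:
    """1-based residue number of the residue at a 0-based alignment column."""
    if aligned[column] == "-":
        raise ValueError("this sequence has a gap at that column")
    return column + 1 - aligned[:column].count("-")


def shared_serine_columns(rows: list) -> list:
    """0-based columns in which every aligned row has a serine."""
    return [c for c in range(len(rows[0])) if all(row[c] == "S" for row in rows)]


columns = shared_serine_columns(hits + [query])
for column in columns:
    print("column", column + 1,
          "-> hit residues", [residue_number(hit, column) for hit in hits],
          "-> query residue", residue_number(query, column))
```

Because the rows carry different numbers of gaps, the same column corresponds to different residue numbers in the hits and in the query.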
PAM250 Matrix (Dayhoff Matrix)
Symmetric
Many matrices exist
Question determines method
Amino Acid substitutions, some thoughts
Not all 20x20 possible mutations occur equally often
• Residues mutate more easily to similar ones (e.g.
Leucine and Isoleucine)
• Residues at surface mutate more easily
• Aromatics mutate preferably into aromatics
• Core tends to be hydrophobic
• Cysteines are dangerous at the surface
• Cysteines in sulfur bridges (S-S) seldom mutate
• Some amino acids have similar codons
(for example TTT & TTC for Phe, TTA & TTG for Leu)
• Etc.
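The codon remark can be checked directly on the four codons named above: counting the nucleotide differences shows which Phe and Leu codons are a single point mutation apart. A small sketch restricted to those four codons from the standard genetic code; the function names are invented.

```python
from itertools import combinations

# Only the four codons named in the text (standard genetic code).
CODONS = {"TTT": "Phe", "TTC": "Phe", "TTA": "Leu", "TTG": "Leu"}


def nucleotide_differences(codon_a: str, codon_b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(a != b for a, b in zip(codon_a, codon_b))


def single_step_substitutions(codons: dict) -> list:
    """Codon pairs that encode different residues and differ in one nucleotide."""
    return [
        (a, b)
        for a, b in combinations(sorted(codons), 2)
        if codons[a] != codons[b] and nucleotide_differences(a, b) == 1
    ]


for a, b in single_step_substitutions(CODONS):
    print(f"{a} ({CODONS[a]}) <-> {b} ({CODONS[b]})")
```

Every Phe codon in this set is one nucleotide change away from every Leu codon, which is one reason why Phe/Leu exchanges are seen relatively often.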
BLAST Output
• Click here to go to the corresponding SwissProt entry
• Click here to study the alignment in detail
• Look here first!!
• A high score indicates a likely relationship
• A low E-value indicates that a match is unlikely to have arisen by chance
• Low complexity motifs visible
3) Amino acids
Hydrophobicity
Entropy of water
Amino acids have characteristics that determine their
behaviour, and what they are being used for (Gly,
Cys, His, Ser, Asp, etc).
Amino acids – Hydrophobicity
Hydrophobicity is the most important property
It drives the folding of a protein
The sticky amino acids glue together
The non-sticky amino acids point into the water
The waters must be ‘happy’
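One way to put numbers on "sticky" versus "non-sticky" is a hydropathy scale. The sketch below uses the Kyte-Doolittle scale, which is not mentioned in this lecture and is brought in purely as an example, and averages it over a sliding window. The function name and the window size are arbitrary choices; the sequence is borrowed from the alignment exercise further on.

```python
# Kyte-Doolittle hydropathy values; positive means hydrophobic ("sticky").
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 7) -> list:
    """Average hydropathy of every window of the given size along the sequence."""
    values = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


profile = hydropathy_profile("CWEALALLAELALAAMKGSTPNGS", window=7)
for start, score in enumerate(profile, start=1):
    print(start, round(score, 2))
```

The windows over the Ala/Leu-rich stretch score clearly positive, while the window over the GSTPNGS tail scores negative.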
Amino acids - Hydrophobicity
(Not to scale)
Amino acids – Properties
Amino acids are not easily put into boxes according to their properties
Every amino acid belongs to several categories
Every amino acid is unique
Hydrophobicity
Size
Secondary structure preference
Charge
Special characteristics
4) (Secondary) structure
Structure data often is not available.
Sequences don’t exist; structures exist.
Residues at corresponding positions in structures
have corresponding functions.
Sequence alignment is the poor man’s solution to
structure alignment.
Knowledge of the structure (even if only predicted)
can help improve the alignment.
Secondary structure – α-helix
N-terminus
Three things:
AMELK residues
Phobic-philic...
Helix dipole
C-terminus
Secondary structure – β-strand
A β-sheet consists of at least two β-strands that interact with each other
Two things:
VITWYF residues
Phobic-philic...
Anti-parallel
Parallel
Secondary structure – Turn
Turns connect the secondary structure elements. Turns
are between two things… Beta-turns hold PSDNG.
Secondary structure - Loop
A loop is everything that has no regular secondary structure; none of the above.
Amino acids – Secondary structure preference
• Residues that are good for a helix: Ala, Met, Glu, Leu, Lys (AMELK)
• Residues that are good for strands: Val, Ile, Thr, Trp, Tyr, Phe (VITWYF)
• Residues that are good for turns: Pro, Ser, Asp, Asn, Gly (PSDNG)
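The three mnemonics can be applied mechanically to get a first impression of a sequence. The sketch below simply labels each residue with the class it prefers; it is a crude per-residue lookup, not a real secondary structure prediction method, and the function name and the one-letter labels (h, s, t, ?) are invented.

```python
HELIX = set("AMELK")     # good for a helix
STRAND = set("VITWYF")   # good for strands
TURN = set("PSDNG")      # good for turns


def preference_string(sequence: str) -> str:
    """Label each residue: h (helix), s (strand), t (turn) or ? (no preference listed)."""
    labels = []
    for residue in sequence:
        if residue in HELIX:
            labels.append("h")
        elif residue in STRAND:
            labels.append("s")
        elif residue in TURN:
            labels.append("t")
        else:
            labels.append("?")
    return "".join(labels)


sequence = "CWEALALLAELALAAMKGSTPNGS"
print(sequence)
print(preference_string(sequence))
```

A long uninterrupted run of h labels followed by mostly t labels suggests a helix followed by a turn or loop.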
3) Align CWEALALLAELALAAMKGSTPNGS with CWEALALLLEALMRGTTPNGG
CWEALALLAELALAAMKGSTPNGS
??hhhhhhhhhhhhhhh------
CWEALALLLEALMR---GTTPNGG
??hhhhhhhhhhhh---------
CW obviously on top of CW. Predict and align the two helices. Gap at end of helix.
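Why the gap belongs at the end of the helix can be checked by counting identical residue pairs for two gap placements. A minimal sketch: the first alignment is the one from the exercise, the second (all gaps pushed to the end of the sequence) is an invented alternative for comparison, and the function name is made up.

```python
def identities(aligned_a: str, aligned_b: str) -> int:
    """Count alignment columns in which both rows carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))


longer = "CWEALALLAELALAAMKGSTPNGS"
gap_at_helix_end = "CWEALALLLEALMR---GTTPNGG"      # alignment from the exercise
gap_at_sequence_end = "CWEALALLLEALMRGTTPNGG---"   # invented alternative

print(identities(longer, gap_at_helix_end))
print(identities(longer, gap_at_sequence_end))
```

Placing the gap at the end of the helix keeps both the helix start and the TPNG turn aligned; pushing it to the end of the sequence loses the turn.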
5) Structure and alignment
Aquaporin
[Per-residue listing for aquaporin chain A; the secondary structure codes (H, T) and the all-zero bridge partner columns are not reproduced.]
SeqNo  PDBNo  Chain  AA  ACC  NOCC  VAR
112    143    A      G    48   597   70
113    144    A      P    32   599   91
114    145    A      Y     4   599    4
115    146    A      N    65   599   77
116    147    A      Q   147   600   79
117    148    A      F    72   600   72
118    149    A      G    29   600   13
119    150    A      G     0   599    0
120    151    A      G     5   601    0
121    152    A      A    22   601    9
122    153    A      N     1   601    0
123    154    A      S    29   601   83
124    155    A      V     4   601   25
125    156    A      A    37   602   47
126    157    A      L   180   602   75
127    158    A      G    82   601    3
128    159    A      Y    95   601    2
Aligned residues at each of these positions, one row per position:
AAAGDSHDTSASTGGNGASTTAAAGSSAKTNSSTSSGSAGSgggrKKKGKRKKNSGGSKADDSSGKD
YYYFDYQDYYYPPPPPLYYYIYFYPYYYQYPFYYFYRQFPQQQQVQQQPRPQQQYLLLPFAAEFLQA
FFYYYYFYYYYYYYYYYFYYYYYYYYYYYYYYYYYFYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
NDVATEDNNNVQQQQQQNTNQNVHQIVVQNQNTNNVEQVMEGGGEQQQHQQQQQTAEDEVQQEMAQQ
RRRRKVATRRRTTRRTTRRRVRRRRKRRTRTRRRRRVVRALRRRSAAAAASAAARRTRRRAAMDRAA
YYYYHYFHYYYLLLLLNYYYQYYYLYYYLYLYYYYYLHYLHLLLLLLLNLNLLLYYTYLYNNDHHLN
GGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGKNGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
AAAAAATAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
TFEETEVTVQSTTVTTVTEMEEEMVSSSGMTMEVTFTREFMAAAVTTTLTLTTTGEVEVEMMSAETM
LLLVVLVVLLLVVVVVVLLLVLLLVLLLVLVVLLVLVVLVVVVVVVVVVVVVVVLVVLVLVVVVVVV
AHSSASAAAAAAAQAAAAASASSSNSAAASAAAASQNQSNSAAAAAAAAAAAAAASAASSAAAAGAA
ADDAVPLVDEDPPPHHHADDHEADPDDDDDHHDDDPHNSHAHHHSPPPHPHHPPHAPADSSSHPAPS
GGGGGGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYFYYYYYYYYYYYYYYYYYYYYFFFYYYYF
MSAs contain conserved residues, correlated mutations, and variable residues.
SFTDALKNMKPYESSFTRIVN
SFTASLKNLKPYCSSFTRVIG
SFTDALKLIVPYESSFTDVIH
SWTAVLKLMVPYLSSFTDILR
SYTDALKNVKPYESSFTRVVN
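The three kinds of positions can be picked out of the five sequences above with a few lines of code. The sketch below calls a column conserved when all rows agree, and groups variable columns that split the sequences in exactly the same way, as a crude stand-in for correlated mutations. With only five sequences such agreement can easily be coincidence, so this illustrates the idea rather than providing a statistical test; the function names are invented.

```python
from collections import defaultdict

msa = [
    "SFTDALKNMKPYESSFTRIVN",
    "SFTASLKNLKPYCSSFTRVIG",
    "SFTDALKLIVPYESSFTDVIH",
    "SWTAVLKLMVPYLSSFTDILR",
    "SYTDALKNVKPYESSFTRVVN",
]


def column_pattern(column: tuple) -> tuple:
    """Describe how a column groups the sequences, ignoring which residues occur."""
    seen = {}
    return tuple(seen.setdefault(residue, len(seen)) for residue in column)


def classify_columns(alignment: list):
    """Return (conserved columns, groups of co-varying columns), 1-based."""
    conserved = []
    by_pattern = defaultdict(list)
    for number, column in enumerate(zip(*alignment), start=1):
        if len(set(column)) == 1:
            conserved.append(number)
        else:
            by_pattern[column_pattern(column)].append(number)
    correlated = [columns for columns in by_pattern.values() if len(columns) > 1]
    return conserved, correlated


conserved, correlated = classify_columns(msa)
print("conserved:", conserved)
print("vary together:", correlated)
```

Columns 8, 10 and 18 switch together (N/K/R in three sequences, L/V/D in the other two); all remaining variable columns that are not reported change independently of each other.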
The amino acids in their natural habitat
Topics:
• Hydrogen bonds
• Secondary Structure
• Alpha helix
• Beta strands & beta sheets
• Turns
• Loop
• Tertiary & Quaternary Structure
• Protein Domains
6) Transfer of information
GPNANGPALLEILSLIAEAAQALAGGNQDDEA can be
phosphorylated at exactly one spot by kinase X.
GGLEAAKLASSAASAAELLAGDNKKKW can too.
GPNANGPALLEILSLIAEAAQALAGGNQDDEA
GGLEAAKLASSAASAAELLAGDNKKKW