The importance of proteins - University of Helsinki


52925 Protein analysis practicals
HOW – hands-on workshop on
protein analysis
Liisa Holm
Instructors
• Patrik Koskinen
• Samuli Eldfors
• Xuan Hung Ta
• Martin Heger
• Petri Törönen
• Jussi Nokso-Koivisto
Course web page
http://ekhidna.biocenter.helsinki.fi/how
– Schedule
– Talks
– Exercises
– Course assignments
– Instructions for computer use
Topics
Week I
• Monday
– Introduction
– Pairwise alignments
• Tuesday
– Manual editing of sequence
alignment
• Wednesday
– Secondary structure prediction
• Thursday
– Structure visualisation
• Friday
– Comparative modelling
Week II
• Monday
– Phylogenomics
• Tuesday
– Sequence classifications
• Wednesday
– Protein-protein interactions
• Thursday
– Structure classifications
• Friday
– Review day
Week III
– Work on course assignments
Course organization
• 1st and 2nd week (Structured)
– Demonstrations (12 -…)
– Practical exercises (…-17)
• 3rd week (Self-organized)
– Course assignment
• Instructor available two hours daily
– Discussion on Tuesday 13-15
• Written report due 8th December
Mode of work
• Demonstrations
• Practical exercises
– Structured questions
– Try it yourself first, then ask your teammate,
then ask an instructor
– Discuss results with your teammate
• Course assignments
– Written reports, due 8 December
– Two sequence assignments per team
– Course grade based on report
Objectives
• Infer function and/or structure starting from
the amino acid sequence of a query
protein
– Identify related sequences, place in family
– Identify conserved positions in sequence and
structure
• Learn to use representative web-based
tools
• No programming, no Unix/Linux
Introduction
• Most cellular functions are performed or
facilitated by proteins.
– Primary biocatalyst
– Cofactor transport/storage
– Mechanical motion/support
– Immune protection
– Control of growth/differentiation
Linear DNA
Watson & Crick (1953)
3D structure
Myoglobin
Kendrew & Perutz (1957)
1mbn
Function = Σ interactions
Evolution
Sequence – Structure – Function
DNA sequence
Protein sequence
Natural selection
Protein function
Protein structure
What can sequence analysis do?
• Homology
– Inference of inherited complex features: what is
conserved is important
– Most powerful approach
– Good tertiary structure prediction
• Diagnostic patterns
– E.g. subcellular localization signals
• Physical preferences
– Good secondary structure prediction
– Prediction of transmembrane segments
– Poor ab initio tertiary structure prediction
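To make the "physical preferences" idea concrete, here is a minimal sketch of a sliding-window hydropathy profile, the classic way to flag candidate transmembrane segments. The course itself needs no programming; this is only an illustration. It assumes the Kyte-Doolittle scale with a 19-residue window and a 1.6 cut-off; the function names and the toy sequence are made up for the example, and the web tools used in the course may work differently.

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle, 1982).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence.

    Unknown symbols (e.g. X) are scored as 0.0, which is an assumption
    of this sketch rather than a standard convention.
    """
    scores = [KD.get(aa, 0.0) for aa in seq.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, threshold=1.6):
    """0-based start positions of windows with mean hydropathy above threshold."""
    profile = hydropathy_profile(seq, window)
    return [i for i, h in enumerate(profile) if h > threshold]


if __name__ == "__main__":
    # Toy sequence (not a real protein): a leucine stretch flanked by charged residues.
    toy = "KKKKK" + "L" * 19 + "DDDDD"
    print(candidate_tm_windows(toy))
```

A window that scores high is only a hint. Real predictors combine such preferences with further evidence.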
Application: Finding Homologs
• Find similar sequences in different organisms
• Human vs. mouse vs. yeast
– Experiments are easier to do on the latter!
(Section from NCBI Disease Genes Database Reproduced Below.)
Best Sequence Similarity Matches to Date Between Positionally Cloned Human Genes and S. cerevisiae Proteins

Human Disease                         | MIM #  | Human Gene | GenBank Acc# for Human cDNA | BLASTX P-value | Yeast Gene | GenBank Acc# for Yeast cDNA | Yeast Gene Description
Hereditary Non-polyposis Colon Cancer | 120436 | MSH2       | U03911                      | 9.2e-261       | MSH2       | M84170                      | DNA repair protein
Hereditary Non-polyposis Colon Cancer | 120436 | MLH1       | U07418                      | 6.3e-196       | MLH1       | U07187                      | DNA repair protein
Cystic Fibrosis                       | 219700 | CFTR       | M28668                      | 1.3e-167       | YCF1       | L35237                      | Metal resistance protein
Wilson Disease                        | 277900 | WND        | U11700                      | 5.9e-161       | CCC2       | L36317                      | Probable copper transporter
Glycerol Kinase Deficiency            | 307030 | GK         | L13943                      | 1.8e-129       | GUT1       | X69049                      | Glycerol kinase
Bloom Syndrome                        | 210900 | BLM        | U39817                      | 2.6e-119       | SGS1       | U22341                      | Helicase
Adrenoleukodystrophy, X-linked        | 300100 | ALD        | Z21876                      | 3.4e-107       | PXA1       | U17065                      | Peroxisomal ABC transporter
Ataxia Telangiectasia                 | 208900 | ATM        | U26455                      | 2.8e-90        | TEL1       | U31331                      | PI3 kinase
Amyotrophic Lateral Sclerosis         | 105400 | SOD1       | K00065                      | 2.0e-58        | SOD1       | J03279                      | Superoxide dismutase
Myotonic Dystrophy                    | 160900 | DM         | L19268                      | 5.4e-53        | YPK1       | M21307                      | Serine/threonine protein kinase
Lowe Syndrome                         | 309000 | OCRL       | M88162                      | 1.2e-47        | YIL002C    | Z47047                      | Putative IPP-5-phosphatase
Neurofibromatosis, Type 1             | 162200 | NF1        | M89914                      | 2.0e-46        | IRA2       | M33779                      | Inhibitory regulator protein
Choroideremia                         | 303100 | CHM        | X78121                      | 2.1e-42        | GDI1       | S69371                      | GDP dissociation inhibitor
Diastrophic Dysplasia                 | 222600 | DTD        | U14528                      | 7.2e-38        | SUL1       | X82013                      | Sulfate permease
Lissencephaly                         | 247200 | LIS1       | L13385                      | 1.7e-34        | MET30      | L26505                      | Methionine metabolism
Thomsen Disease                       | 160800 | CLC1       | Z25884                      | 7.9e-31        | GEF1       | Z23117                      | Voltage-gated chloride channel
Wilms Tumor                           | 194070 | WT1        | X51630                      | 1.1e-20        | FZF1       | X67787                      | Sulphite resistance protein
Achondroplasia                        | 100800 | FGFR3      | M58051                      | 2.0e-18        | IPL1       | U07163                      | Serine/threonine protein kinase
Menkes Syndrome                       | 309400 | MNK        | X69208                      | 2.1e-17        | CCC2       | L36317                      | Probable copper transporter
What you will learn
• Multiple alignment
– Used as input to many prediction tools
– Improves sequence-structure alignment
– Identify functional sites
• Protein structure
– Visualisation
– Comparative modelling
• Using phylogeny in function assignment
– Family classifications
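As a small illustration of how a multiple alignment helps to identify functional sites, the sketch below scores each alignment column by the fraction of sequences that share its most common residue. The three aligned sequences are invented toy data, and the function names and the 100% threshold are assumptions made for this example; the course tools report conservation in their own ways.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column fraction of sequences sharing the most common residue.

    Gaps ("-") never count as the conserved residue, and a column made
    only of gaps scores 0.0.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    fractions = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            fractions.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        fractions.append(top_count / len(column))
    return fractions


def conserved_positions(alignment, min_fraction=1.0):
    """0-based column indices whose conservation is at least min_fraction."""
    scores = column_conservation(alignment)
    return [i for i, f in enumerate(scores) if f >= min_fraction]


if __name__ == "__main__":
    # Invented toy alignment, not real proteins.
    toy_alignment = [
        "MKHG-LS",
        "MRHGALS",
        "MKHGTLT",
    ]
    print(conserved_positions(toy_alignment))
```

Fully conserved columns are candidates for functional or structural importance, in line with "what is conserved is important", but they still need checking against known motifs or a 3D structure.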
Flowchart
Query = Protein sequence
Sequence similarity to other proteins?
    Yes: does similarity imply homology?
        Yes: place query in family tree
            Known function(s) in family?
                Yes: Transfer function; verify conservation of functional motifs
                No: Motif search; use other data
            Known structure in family?
                Yes: Comparative modelling; validate motifs against 3D model
                No: Secondary structure prediction
        No: use single sequence methods
    No: use single sequence methods
Single sequence methods
    Motif search
    Secondary structure prediction
    Use other data
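The decision logic of the flowchart above can be sketched as a small function. This is one reading of the slide rather than an official course tool: the function name, the argument names, and the step order are assumptions, and in practice each yes/no answer is a judgement based on the search results, not a ready-made boolean.

```python
def plan_analysis(similar, homologous=False,
                  known_function=False, known_structure=False):
    """Return the analysis steps suggested by the flowchart for one query protein."""
    # No similar sequences, or similarity that does not imply homology:
    # fall back to single sequence methods.
    if not (similar and homologous):
        return ["Motif search", "Secondary structure prediction", "Use other data"]

    steps = ["Place query in family tree"]

    # Function branch.
    if known_function:
        steps += ["Transfer function", "Verify conservation of functional motifs"]
    else:
        steps += ["Motif search", "Use other data"]

    # Structure branch.
    if known_structure:
        steps += ["Comparative modelling", "Validate motifs against 3D model"]
    else:
        steps += ["Secondary structure prediction"]

    return steps


if __name__ == "__main__":
    for step in plan_analysis(similar=True, homologous=True,
                              known_function=True, known_structure=False):
        print(step)
```

Writing the flowchart out this way makes it clear that the function and structure questions are asked independently once the query has been placed in a family.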
Course assignments
• Goal: using the flowchart, what can you
say, with what confidence, about the
structure and function of the protein?
• Max length of report ~10 pages. No need
to show negative results.
• More detailed guidelines given on Day 10.
Teams
Team n, n=1,…,12, works on both sequence_nA
and sequence_nB
• A and B sequences have been selected to
present different challenges, therefore it is
strongly recommended that the team members
work together on both sequences
Sequences are here:
http://ekhidna.biocenter.helsinki.fi/how/proteinlist.fasta
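For readers who want to see what a FASTA file looks like to a program, here is a minimal sketch that parses FASTA text and picks out one team's two sequences. It is optional, since the course requires no programming. It assumes that the record identifiers in the file follow the sequence_nA / sequence_nB naming used on this slide, which is not confirmed here; the helper names and the example records are invented.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of identifier -> sequence.

    The identifier is the first word after ">" on each header line.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def team_sequences(records, team):
    """Select the A and B sequences of one team (identifier format is assumed)."""
    wanted = (f"sequence_{team}A", f"sequence_{team}B")
    return {n: records[n] for n in wanted if n in records}


if __name__ == "__main__":
    # Invented example records, not the real course sequences.
    example = """>sequence_1A toy record
MKTLLV
AGHW
>sequence_1B
GSHMK
>sequence_2A
AAAA
"""
    print(team_sequences(parse_fasta(example), 1))
```

The same file can simply be opened in a text editor and the two sequences pasted into the web tools used during the course.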
Preparing sequence reports
• Week 3 is reserved for preparing the
reports.
– Experience has shown that students progress
at different speeds. Fast students may try the
tools out on their sequence assignments
during weeks 1-2.
– Checkpoint on Tuesday (Day 12)
• It is expected that sequence database searches
and some downstream analyses have been done
by then
• The purpose is to summarize progress and discuss
strategies forward