Title goes here - Center for Biological Sequence Analysis


What is a Project
• Purpose
– Use a method introduced in the course to describe some biological problem
• How
– Construct a data set describing the problem
– Define which method to use
– (Develop method)
– Train and evaluate method
– (Compare performance to other methods)
• Documentation
– Write report in form of a research article (10-15 pages)
• Abstract
• Introduction
• Materials and method
• Results
• Discussion
• References
Project list
1. Peptide MHC binding predictions using position specific scoring matrices including pseudo counts and sequence weighting/clustering (Hobohm) techniques
2. Peptide MHC binding predictions using artificial neural networks with different sequence encoding schemes
3. Comparative study of PSSM and ANN for peptide MHC binding
   1. Analysis on how data size defines predictive performance for PSSM and ANN
4. NN-align, a neural network-based method for motif recognition and peptide-binding prediction
5. Improved protein template identification using hidden Markov models (HMMER)
6. Implementation of HMM Baum-Welch algorithm
7. Gibbs sampler for MHC class II binding
PSSM
• Peptide MHC binding predictions using position specific scoring matrices including pseudo counts and sequence weighting techniques
– Compare methods for sequence weighting
• Clustering vs heuristics
– Benchmark (Peters et al. 2006) covering some 20 MHC molecules, compare to the best other methods
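As a rough illustration of the PSSM project, the sketch below clusters peptides greedily in the spirit of Hobohm 1, weights each peptide by 1/(cluster size), and builds a log-odds matrix. Several things here are assumptions and not taken from the slides: the 62% identity threshold, the flat pseudo count against a uniform background (a simplification swapped in where a Blosum-based pseudo count would normally go), and the toy peptides, which are made up.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def identity(a, b):
    """Fraction of identical positions between two equal-length peptides."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def hobohm1_weights(peptides, threshold=0.62):
    """Greedy clustering; every peptide gets weight 1 / (size of its cluster)."""
    reps = []        # representative peptide of each cluster
    sizes = []       # number of members in each cluster
    assignment = []  # cluster index for each input peptide
    for pep in peptides:
        for idx, rep in enumerate(reps):
            if identity(pep, rep) >= threshold:
                sizes[idx] += 1
                assignment.append(idx)
                break
        else:
            # No similar representative found: start a new cluster
            reps.append(pep)
            sizes.append(1)
            assignment.append(len(reps) - 1)
    return [1.0 / sizes[idx] for idx in assignment]

def build_pssm(peptides, weights, pseudo=1.0):
    """Log-odds matrix from weighted counts plus a flat pseudo count."""
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (assumption)
    pssm = []
    for pos in range(len(peptides[0])):
        counts = {aa: 0.0 for aa in AMINO_ACIDS}
        for pep, w in zip(peptides, weights):
            counts[pep[pos]] += w
        total = sum(counts.values())
        column = {}
        for aa in AMINO_ACIDS:
            prob = (counts[aa] + pseudo * background) / (total + pseudo)
            column[aa] = math.log2(prob / background)
        pssm.append(column)
    return pssm

def score(pssm, peptide):
    """Sum of per-position log-odds scores for one peptide."""
    return sum(column[aa] for column, aa in zip(pssm, peptide))

# Made-up 9-mers, for illustration only
toy_peptides = ["ALAKAAAAM", "ALAKAAAAN", "ALAKAAAAR", "GMNERPILT", "GILGFVFTM"]
toy_weights = hobohm1_weights(toy_peptides)
toy_pssm = build_pssm(toy_peptides, toy_weights)
```

Comparing clustering against a heuristic weighting scheme would then amount to swapping `hobohm1_weights` for another function that returns one weight per peptide.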
NN sequence encoding
• Peptide MHC binding predictions using artificial neural networks with different sequence encoding schemes
– Benchmark (Peters et al. 2006) covering some 20 MHC molecules, compare to the best other methods
– Compare sequence encoding schemes
• Sparse, Blosum, composition, charge, amino acid size, ...
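To make two of the encoding schemes concrete, here is a minimal sketch of sparse (one-hot) encoding and a simple charge encoding. The 0/1 values, the residue ordering, and the choice to treat histidine as neutral are assumptions made for this example; a Blosum encoding would replace each one-hot row with the matching row of a substitution matrix, which is not included here.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Side-chain charge at neutral pH; histidine is treated as neutral (assumption)
CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0}

def sparse_encode(peptide):
    """One-hot encoding: 20 inputs per residue, exactly one of them set to 1."""
    vector = []
    for aa in peptide:
        vector.extend(1.0 if aa == ref else 0.0 for ref in AMINO_ACIDS)
    return vector

def charge_encode(peptide):
    """One input per residue: -1, 0 or +1."""
    return [CHARGE.get(aa, 0.0) for aa in peptide]

def combined_encode(peptide):
    """Concatenate both encodings into a single network input vector."""
    return sparse_encode(peptide) + charge_encode(peptide)
```

A 9-mer gives 180 sparse inputs and 9 charge inputs, so the encoding choice directly sets the size of the network input layer.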
Comparative study
• Compare methods for MHC peptide binding
– PSSM
– ANN
• Does data size matter?
• Data: Benchmark by Peters et al. 2006 covering some 20 MHC molecules
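One way to study the effect of data size is a learning curve: train on nested subsets of increasing size and evaluate each model on the same held-out set. The harness below is only a sketch. The `fit` and `predict` callables are hypothetical placeholders for whichever PSSM or ANN training code is used, and the Pearson correlation is an assumed choice of performance measure.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def learning_curve(train, test, sizes, fit, predict, seed=0):
    """Return (size, correlation) pairs for models trained on nested subsets.

    train, test: lists of (input, target) pairs
    fit(data) -> model and predict(model, input) -> float are caller-supplied.
    """
    rng = random.Random(seed)
    shuffled = list(train)
    rng.shuffle(shuffled)  # shuffle once so that smaller subsets nest in larger ones
    targets = [y for _, y in test]
    curve = []
    for size in sizes:
        model = fit(shuffled[:size])
        predictions = [predict(model, x) for x, _ in test]
        curve.append((size, pearson(predictions, targets)))
    return curve
```

Running the same harness with a PSSM-based `fit` and an ANN-based `fit` on identical subsets gives two curves that can be compared directly.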
Hidden Markov models
• Improved protein template identification using hidden Markov models (HMMER)
– Train profile HMMs for remote protein fold recognition
• Use the HMMER program to construct profile HMMs for a selected set of proteins from the CASP8 competition
• Use the HMMER models to identify PDB templates for homology modeling of CASP8 targets
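The two HMMER steps above could be scripted roughly as follows. This sketch assumes HMMER 3 command-line tools (`hmmbuild`, and `hmmsearch` with its `--tblout` and `-E` options); all file names are hypothetical, and the E-value cutoff is an arbitrary example value.

```python
import shutil
import subprocess

def hmmer_commands(alignment, hmm_file, seq_db, table_out, evalue=1e-3):
    """Build argument lists for the two steps: model construction and search."""
    # Step 1: build a profile HMM from a multiple sequence alignment
    build = ["hmmbuild", hmm_file, alignment]
    # Step 2: search a sequence database (e.g. PDB sequences) with the profile
    search = ["hmmsearch", "--tblout", table_out, "-E", str(evalue), hmm_file, seq_db]
    return build, search

def run_if_available(commands):
    """Run the commands in order; return False if HMMER is not on the PATH."""
    if shutil.which(commands[0][0]) is None:
        return False
    for command in commands:
        subprocess.run(command, check=True)
    return True

# Hypothetical file names, for illustration only
build_cmd, search_cmd = hmmer_commands(
    "target_family.sto", "target_family.hmm", "pdb_seqres.fasta", "hits.tbl"
)
```

Calling `run_if_available([build_cmd, search_cmd])` would execute both steps, leaving the tabular hit list in `hits.tbl` for template selection.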
HMM
• Implement Baum-Welch HMM training
– Based on code from Tapas Kanungo's HMM toolkit
• A tar file can be found at:
– http://www.kanungo.com/software/umdhmm-v1.02.tar.
• A zip file:
– http://www.kanungo.com/software/umdhmm-v1.02.zip.
• README file
– http://www.kanungo.com/software/umdhmm-v1.02.
• Tutorial talks
– http://www.kanungo.com/software/hmmtut.ps
– http://www.kanungo.com/software/hmmtut.pdf
• Test code on the unfair casino example
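For orientation, a compact Baum-Welch iteration with scaled forward and backward passes is sketched below, together with an unfair casino test. This is an independent Python illustration, not the code from the toolkit referenced above, and the casino parameters, sequence length, and starting guesses are assumed example values.

```python
import math
import random

def forward(obs, pi, A, B):
    """Scaled forward pass; returns normalised alpha rows and scaling factors."""
    n = len(pi)
    alpha, scale = [], []
    for t, o in enumerate(obs):
        if t == 0:
            row = [pi[i] * B[i][o] for i in range(n)]
        else:
            prev = alpha[-1]
            row = [sum(prev[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
        c = sum(row)
        alpha.append([v / c for v in row])
        scale.append(c)
    return alpha, scale

def backward(obs, A, B, scale):
    """Backward pass using the scaling factors from the forward pass."""
    n, T = len(A), len(obs)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n)) / scale[t + 1]
    return beta

def baum_welch_step(obs, pi, A, B):
    """One EM update; also returns the log-likelihood under the input parameters."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, scale = forward(obs, pi, A, B)
    beta = backward(obs, A, B, scale)
    # gamma[t][i]: posterior probability of being in state i at time t
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    new_A = [[0.0] * n for _ in range(n)]
    for t in range(T - 1):
        for i in range(n):
            for j in range(n):
                # Expected number of i -> j transitions at time t
                new_A[i][j] += (alpha[t][i] * A[i][j] * B[j][obs[t + 1]]
                                * beta[t + 1][j] / scale[t + 1])
    for i in range(n):
        denom = sum(gamma[t][i] for t in range(T - 1))
        new_A[i] = [v / denom for v in new_A[i]]
    new_B = [[0.0] * m for _ in range(n)]
    for t in range(T):
        for i in range(n):
            new_B[i][obs[t]] += gamma[t][i]
    for i in range(n):
        denom = sum(gamma[t][i] for t in range(T))
        new_B[i] = [v / denom for v in new_B[i]]
    loglik = sum(math.log(c) for c in scale)
    return gamma[0][:], new_A, new_B, loglik

def sample(pi, A, B, length, rng):
    """Generate an observation sequence from an HMM."""
    state = rng.choices(range(len(pi)), weights=pi)[0]
    obs = []
    for _ in range(length):
        obs.append(rng.choices(range(len(B[state])), weights=B[state])[0])
        state = rng.choices(range(len(A)), weights=A[state])[0]
    return obs

# Unfair casino: state 0 is a fair die, state 1 a die loaded towards six
# (all numbers below are assumed example values)
rng = random.Random(1)
rolls = sample([0.5, 0.5], [[0.95, 0.05], [0.10, 0.90]],
               [[1 / 6] * 6, [0.1] * 5 + [0.5]], 300, rng)

pi, A, B = [0.5, 0.5], [[0.8, 0.2], [0.2, 0.8]], [[1 / 6] * 6, [0.15] * 5 + [0.25]]
logliks = []
for _ in range(10):
    pi, A, B, loglik = baum_welch_step(rolls, pi, A, B)
    logliks.append(loglik)
```

A useful sanity check for any implementation is that the log-likelihood never decreases from one iteration to the next, and that every row of the transition and emission matrices still sums to one.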
Gibbs sampler
• Gibbs sampler approach to the prediction of MHC class II binding motifs
– Develop a Gibbs sampler for the prediction of MHC class II binding motifs
– Benchmark: Nielsen et al. 2007, covering 14 HLA-DR alleles
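The core idea can be sketched as a basic site sampler: hold the binding-core offset of all peptides but one fixed, build a weight matrix from those cores, and resample the offset of the remaining peptide in proportion to how well each window fits. This is a generic textbook-style Gibbs sampler and not necessarily the variant intended for the project; the 9-residue core length, the flat pseudo count, the uniform background, and the toy peptides are all assumptions.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_probs(cores, pseudo=1.0):
    """Per-position residue probabilities with a flat pseudo count."""
    probs = []
    for pos in range(len(cores[0])):
        counts = {aa: pseudo / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
        for core in cores:
            counts[core[pos]] += 1.0
        total = sum(counts.values())
        probs.append({aa: c / total for aa, c in counts.items()})
    return probs

def gibbs_sample(peptides, core_len=9, sweeps=200, seed=0):
    """Return one sampled core start offset per peptide.

    Assumes every peptide has at least core_len residues.
    """
    rng = random.Random(seed)
    offsets = [rng.randrange(len(p) - core_len + 1) for p in peptides]
    for _ in range(sweeps):
        for i, pep in enumerate(peptides):
            # Weight matrix from the current cores of all other peptides
            others = [p[o:o + core_len]
                      for j, (p, o) in enumerate(zip(peptides, offsets)) if j != i]
            probs = column_probs(others)
            weights = []
            for start in range(len(pep) - core_len + 1):
                # Odds of this window under the motif vs a uniform background
                odds = 1.0
                for k, aa in enumerate(pep[start:start + core_len]):
                    odds *= probs[k][aa] * len(AMINO_ACIDS)
                weights.append(odds)
            # Sample the new offset in proportion to the odds
            offsets[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return offsets

# Made-up peptides of varying length, for illustration only
toy_peptides = ["AAKFVAAWTLKAAAG", "PKYVKQNTLKLATGM", "GELIGILNAAKVPAD",
                "SQEFVRALSLKNDT", "TTKYVSAMTLKDGR"]
toy_offsets = gibbs_sample(toy_peptides)
```

Because the sampler is stochastic, different seeds can end in different alignments; running several seeds and keeping the alignment with the highest information content is one common way to handle this.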