Lecture 13_summary

Intro to Bioinformatics
Summary
What did we learn
• Pairwise alignment
Local and global alignments
When? How?
Tools: for local alignment, blast2seq;
for global alignment, best to use MSA tools such as Clustal X or MUSCLE
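To make the local/global distinction concrete, here is a minimal sketch of the two dynamic-programming scores (Needleman-Wunsch style for global, Smith-Waterman style for local). The function name `align_score`, the simple match/mismatch scores, and the linear gap penalty are illustrative assumptions, not the scoring used by blast2seq, Clustal X or MUSCLE.

```python
def align_score(a, b, match=1, mismatch=-1, gap=-2, local=False):
    """Best alignment score of a and b with a linear gap penalty.

    local=False: global alignment (both sequences aligned end to end).
    local=True:  local alignment (best-scoring pair of substrings).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    if not local:
        # Global alignment: leading gaps are penalised.
        for i in range(1, rows):
            score[i][0] = i * gap
        for j in range(1, cols):
            score[0][j] = j * gap
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if local:
                # Local alignment: a negative prefix is dropped (restart at 0).
                cell = max(cell, 0)
                best = max(best, cell)
            score[i][j] = cell
    return best if local else score[-1][-1]


print(align_score("ACGT", "AGT"))                           # global score
print(align_score("TTTACGTTTT", "GGACGGG", local=True))     # local score
```

The only differences between the two modes are the initialisation of the first row and column, the floor at zero, and where the final score is read from.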
What did we learn
• Multiple alignments (MSA)
When? How?
MSAs are needed as input for many different
purposes: motif searching, phylogenetic analysis,
protein and RNA structure prediction, and assessing conservation of
specific nucleotides/residues
Tools: Clustal X (for DNA and RNA), MUSCLE (for proteins)
Tools for phylogenetic trees: PHYLIP …
What did we learn
• Search a sequence against a database
When? How?
- BLAST: remember the different BLAST programs (blastp, blastn, …), and make sure to search
the right database!
Do not forget: you can change the scoring matrix, gap penalties, etc.
- PSI-BLAST
Searching for remote homologs
- PHI-BLAST
Searching for a short pattern within a protein
What did we learn
• Motif search
When? How?
- Searching for known motifs in a given promoter (JASPAR)
- Searching for over-represented unknown regulatory
motifs in a set of sequences, e.g. promoters of genes that
have similar expression patterns (MEME)
Tools: MEME, sequence logos
Databases of motifs: JASPAR (transcription factor binding sites)
PRATT in PROSITE (searching for motifs in protein sequences)
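PROSITE writes many protein motifs as patterns, so a quick way to see what "searching for a motif in a protein sequence" means is to translate such a pattern into a regular expression. The sketch below is an assumption-laden illustration, not the PROSITE or PRATT implementation: the helper names `prosite_to_regex` and `find_motif` are hypothetical, only the common syntax elements are handled (x, [..], {..}, repeat counts, < and > anchors outside brackets), and the N-glycosylation pattern and toy sequence are example inputs that do not come from the lecture.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # {P} means "any residue except P"  ->  [^P]
        element = element.replace("{", "[^").replace("}", "]")
        # x(2,4) means "2 to 4 repeats"     ->  .{2,4}
        element = element.replace("(", "{").replace(")", "}")
        # x is any residue; < and > anchor the sequence termini
        element = element.replace("x", ".").replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return (0-based start, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]


# Example pattern: the N-glycosylation site, N-{P}-[ST]-{P}
print(prosite_to_regex("N-{P}-[ST]-{P}."))
print(find_motif("MKNVSAANPTG", "N-{P}-[ST]-{P}"))
```

Short patterns like this one match by chance very often, so a hit is only a hint that should be checked against the protein's annotation.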
What did we learn
• Protein Function Prediction
When? How?
- Pfam (database to search for protein motifs/domains;
Pfam-A/Pfam-B)
- PROSITE
- Protein annotations in UniProt
(Swiss-Prot/TrEMBL)
What did we learn
• Protein Secondary Structure Prediction
When? How?
– Helix/Beta/Coil (PHDsec, PSIPRED).
– Transmembrane helices (PHDhtm, TMHMM).
– Solvent accessibility: important for the prediction of
ligand binding sites (PHDacc).
What did we learn
• Protein Tertiary Structure Prediction
When? How?
– First we must look at sequence identity to a sequence with a
known structure!
– Homology modeling/Threading
– ModBase: database of models
Remember: low-quality models can be misleading!
Tools: SWISS-MODEL, GenTHREADER, ModBase
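The first step above, checking sequence identity to a protein of known structure, can be sketched as a percent-identity calculation over an existing pairwise alignment. The function name `percent_identity` is hypothetical, and the denominator (aligned columns with no gap in either sequence) is one of several conventions in use, so the number may differ slightly from what a given alignment tool reports.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only the columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
             if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)


# Toy alignment (made-up sequences): 4 identical residues in 5 ungapped columns.
print(percent_identity("ACDE-F", "ACDQGF"))
```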
What did we learn
• RNA Structure and Function Prediction
When? How?
– RNAfold: good for local interactions; gives several predictions of
low-energy structures
– Alifold: adds information from an MSA
– Rfam
– Specific databases and search tools: tRNA, microRNA …
What did we learn
• Gene expression
When? How?
– Many databases of gene expression
GEO …
– Clustering analysis
EPClust (different clustering methods: K-means, hierarchical
clustering; transformations of rows/columns/both …)
– GO annotation (analysis of gene clusters …)
So how do we start …
• Given a hypothetical sequence, predict its
function …
What should we do?
Example
• Amyloids are proteins which tend to aggregate
in solution. Abnormal accumulation of
amyloid in organs is assumed to play a role in
various neurodegenerative diseases.
Question: can we predict whether a protein X is
an amyloid?
Research Plan
1. Building a database
- Search the protein database (Swiss-Prot) for proteins annotated as amyloids
- Select a set of 10-30 amyloid proteins related to human diseases
2. Analyzing the unique properties of the family
- Use tools learned in class to calculate, for each protein, properties
that may be related to amyloidosis (5-10 different sequence or predicted structural features)
For example:
Fact: amyloids tend to aggregate via beta sheets
Calculate: the percentage of each secondary structure state (H, E, C)
Fact: amyloids tend to aggregate via aromatic residues
Calculate: the percentage of each amino acid in the protein
…
3. Summarize the results and use basic statistical tools to evaluate whether there are features
that are characteristic of this group; suggest a model to predict whether a new protein belongs
to this group of proteins.
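Steps 2 and 3 of the plan can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than the course's prescribed method: the function names are hypothetical, the aromatic set is taken as F, W and Y (histidine is sometimes included), the secondary structure string is assumed to come from a predictor such as PSIPRED, and the permutation test is just one possible "basic statistical tool". It also assumes a comparison set of non-amyloid proteins, which the plan itself does not mention.

```python
import random
from collections import Counter
from statistics import mean

AROMATIC = set("FWY")  # assumption: Phe, Trp, Tyr


def aa_percentages(seq):
    """Percentage of each amino acid present in a protein sequence."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {aa: 100.0 * n / total for aa, n in counts.items()}


def aromatic_percent(seq):
    """Percentage of aromatic residues in a protein sequence."""
    seq = seq.upper()
    return 100.0 * sum(1 for aa in seq if aa in AROMATIC) / len(seq)


def ss_percentages(ss):
    """Percentage of helix (H), strand (E) and coil (C) in a predicted string."""
    ss = ss.upper()
    return {state: 100.0 * ss.count(state) / len(ss) for state in "HEC"}


def permutation_test(group_a, group_b, n_perm=10000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns an estimated p-value for "the two groups differ in this feature".
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the proteins
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Per-protein features for one made-up sequence and predicted structure string.
print(aromatic_percent("MFWYAAGG"))
print(ss_percentages("CCEEEEHHHHCC"))
```

In use, each feature would be computed for every protein in the amyloid set and in the comparison set, and `permutation_test` would be called once per feature on the two lists of values. With 5-10 features tested, some correction for multiple testing is advisable before calling a feature characteristic of the group.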