ppt - University of Illinois Urbana

Review of Course Topics
(Lecture for CS498-CXZ Algorithms in Bioinformatics)
Dec. 8, 2005
ChengXiang Zhai
Department of Computer Science
University of Illinois, Urbana-Champaign
Key Algorithms
• DNA Sequencing:
– Shortest superstring problem & Eulerian graph approach
– Lander-Waterman model
– Overlap-Layout-Consensus
• Gene identification
– Exon chaining (similarity)
– Likelihood ratio (statistical)
• Pairwise alignment
– Dynamic programming
– Scoring (scoring matrix, affine gap)
– Variations (Local, Smith-Waterman algorithm)
• Multiple sequence alignment
– Exact: Multidimensional dynamic programming
– Inexact: Feng-Doolittle progressive alignment
• Hidden Markov models
– Finding most likely path: Viterbi
– Computing sequence probabilities: Forward/Backward
– Supervised learning
– ProfileHMM
• Microarray data analysis
– Agglomerative Hierarchical Clustering: Single-link, complete-link, avg/group link
– K-means clustering
• Phylogenetic tree construction
– Neighbor-joining
– Maximum parsimony
• Regulatory motifs
– Deterministic: Consensus
– Sampling: Gibbs Sampler
• Genome rearrangements
– Sort by reversal (breakpoint elimination)
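As a reminder of how the dynamic programming entries in this list work in practice, here is a minimal sketch of Smith-Waterman local alignment. The scoring values (match=2, mismatch=-1, gap=-2) and the function name are assumptions chosen for illustration, and the sketch uses a simple linear gap penalty rather than the affine gap scoring mentioned above.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for one best local alignment.

    Scoring values are illustrative assumptions; gaps are linear, not affine.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            # The 0 option lets an alignment restart anywhere (local alignment)
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to 0
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("XXACGTYY", "ZZACGTWW"))
```

Global alignment differs mainly in dropping the 0 option, initializing the first row and column with gap penalties, and tracing back from the bottom-right cell.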
Typical Steps to Solve a Bioinformatics Problem
• Problem formulation
– Understand the original biology problem
– Formalize the problem as a computational problem
– Must make assumptions (many are unrealistic)
• Find algorithms to solve the problem
– Brute force is often too slow or consumes too much memory
– Developing efficient algorithms is the main challenge
– When it’s impossible to find an exact solution quickly, think about finding an approximate solution
• Evaluate the algorithms and
– Further improve the algorithms
– Further improve the problem formulation
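To illustrate the step from an exact solution to an approximate one, here is a minimal sketch using the shortest superstring problem from the DNA sequencing topic. It applies the well-known greedy heuristic (repeatedly merge the two fragments with the largest overlap), which is fast but not guaranteed to return the shortest superstring. The function names and the example fragments are assumptions made for illustration.

```python
def overlap(a, b):
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_superstring(fragments):
    """Greedy approximation: merge the pair with the largest overlap until
    one string remains. The result contains every fragment but may be
    longer than the true shortest superstring."""
    unique = set(fragments)
    # Drop fragments that are already contained in another fragment;
    # sort so that tie-breaking is deterministic.
    frags = sorted(f for f in unique
                   if not any(f != g and f in g for g in unique))

    while len(frags) > 1:
        best_k, best_i, best_j = -1, 0, 1
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    k = overlap(a, b)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        # Merge the best pair, writing the shared overlap only once
        merged = frags[best_i] + frags[best_j][best_k:]
        frags = [f for idx, f in enumerate(frags)
                 if idx not in (best_i, best_j)] + [merged]

    return frags[0] if frags else ""


reads = ["ATTAC", "TACGG", "CGGAT"]  # hypothetical fragments
print(greedy_superstring(reads))
```

A brute-force alternative would try every ordering of the fragments, which grows factorially with the number of fragments; the greedy version trades the optimality guarantee for polynomial running time.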
What To Do Next?
• Research Track:
– For undergraduate students: consider graduate schools (many now have Ph.D./MS in Bioinformatics)
– For graduate students: find a research advisor in this direction (UIUC is hiring more faculty in bioinformatics)
• Industry Track:
– Pharmaceutical industry is the main job market
• Further Training:
– Molecular biology
– Advanced/specialized bioinformatics courses
– Machine learning
– Data mining (relational and textual)
– Statistics
– Databases/Web search
Course Evaluation
Thank You!
Good Luck for the Final!