A Risk Minimization Framework for Information Retrieval

Algorithms in Computational Biology
Tanya Berger-Wolf
Compbio.cs.uic.edu/~tanya/teaching/CompBio
January 13, 2006
Outline
• What is computational biology?
• Computational Biology vs Bioinformatics
• Why is computational biology important?
• CompBio and other fields
• Topics in CompBio
What is Computational Biology?
• No standard definition!
• Our definition: computational techniques for biological problems
– Data acquisition, management and representation (bioinformatics)
– Pattern analysis and data mining (bioinformatics)
– Data analysis and optimization
– Using bio data to solve other problems (medicine, public policy, etc.)
• Computational biology touches all parts of computer science
– Databases
– Data streaming
– HPC and systems
– Networking
– Algorithms
– Privacy and security
– Image processing
– Visualization
http://www.colorbasepair.com/what_is_bioinformatics.html
Why is CompBio Important?
• Biology perspective
– More and more biological information is available => need for effectively accessing and using the information
– As more detailed information is available different questions can be asked (models of evolution) => requires new math
• Computer science perspective
– Excellent application domain
– Poses special computational challenges
– Brings computer science closer to scientific discovery
• Currently growing …
The Growing Field of CompBio
• Research: Universities are expanding research programs in bioinformatics/compbio
• Education: New degree programs are being launched
• Industry: Pharmaceutical industry has a great interest in bioinformatics
• Many job and funding opportunities
CompBio and Other Fields
[Diagram: Bioinformatics/CompBio shown in relation to Biology, Computer Science, Applied Mathematics & Statistics, Information Management, Biochemistry, Molecular Biology, Theoretical CS, Biophysics, Numerical Computing, Machine Learning, and Data Mining]
Topics in Bioinformatics
• Biology Literature (Text Mining)
…In this paper, we report the discovery of a new gene that affects DNA reproduction in …
• DNA Sequences (Genomics): Genes
AATTCATGAAAATCGTATACTGGTCTGGTACCGGC
TGAGAAAATGGCAGAGCTCATCGCTAAAGGTA
TCTGGTAAAGACGTCAACACCATCAACGTGTC
ACATCGATGAACTGCTGAACGAAGATATCCTG
TTGCTCTGCCATGGGCGATGAAGTTCTCGAGG
• Microarray data (Transcriptomics): Gene expression & regulation
1.2 2.2 ... 1.5
3.2 2.0 ... 5.6
....
0.5 1.5 ... 4.3
• Protein Sequences (Proteomics): Proteins (Function)
MKIVYWSGTGNTEKMAELIAKGIIESGKDV
DELLNEDILILGCSAMGDEVLEESEFEPFIE
KVALFGSYGWGDGKWMRDFEERMNGYG
PDEAEQDCIEFGKKIANI
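The slide does not say how its DNA and protein examples relate. As an illustration only, the sketch below translates the first DNA line shown above with the standard genetic code, starting at its first ATG, and checks the result against the start of the protein example. The names BASES, AMINO_ACIDS, CODON_TABLE, and translate are hypothetical helpers introduced here, not part of the slides, and only the first DNA line is used.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate(dna):
    """Translate from the first ATG until a stop codon or the end of the string."""
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


first_dna_line = "AATTCATGAAAATCGTATACTGGTCTGGTACCGGC"
protein_example = "MKIVYWSGTGNTEKMAELIAKGIIESGKDV"
print(translate(first_dna_line))
print(protein_example.startswith(translate(first_dna_line)))
```

Going from a DNA sequence to the protein it encodes is one small instance of the genomics-to-proteomics link that the topics above span.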
Sample Topic 1: Sequence Alignment
Multiple sequence alignment of 7 neuroglobins using ClustalX
[Figure: alignment with annotations "?", "Brothers!", "?"]
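To give a feel for what an alignment computes, here is a minimal sketch of pairwise global alignment scoring by dynamic programming (Needleman-Wunsch). It is not the multiple-alignment method ClustalX uses. The function name global_alignment_score and the scoring values (match +1, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of strings a and b (Needleman-Wunsch)."""
    # prev[j] is the best score for aligning the current prefix of a with b[:j].
    # Initially the prefix of a is empty, so b[:j] is aligned against j gaps.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # a[:i] aligned against an empty prefix of b
        for j in range(1, len(b) + 1):
            # Either pair a[i-1] with b[j-1], or place a gap in one sequence.
            pair = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            gap_in_b = prev[j] + gap
            gap_in_a = curr[j - 1] + gap
            curr.append(max(pair, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))
```

With these assumed scores, aligning "ACGT" with "AGT" pairs three matching letters around one gap. Only two rows of the table are kept because the sketch returns the score, not the alignment itself.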
Sample Topic 2: Population Genetics
Take Away Messages
• Computational Biology is a growing field
• Many job/funding opportunities
• Many open problems to be solved
• Can actually do something good for humanity? – Nah!