17 - Digital Biology Laboratory
CS 7010: Computational Methods in Bioinformatics
(course review)
Dong Xu
Computer Science Department
271C Life Sciences Center
1201 East Rollins Road
University of Missouri-Columbia
Columbia, MO 65211-2060
E-mail: [email protected]
573-882-7064 (O)
http://digbio.missouri.edu
Technical Definitions
NIH (http://www.bisti.nih.gov/)
Bioinformatics: “research, development, or
application of computational tools and
approaches for expanding the use of biological,
medical, behavioral or health data, including
those to acquire, represent, describe, store,
analyze, or visualize such data”.
Computational Biology: “the development and
application of data-analytical and theoretical
methods, mathematical modeling and
computational simulation techniques to the
study of biological, behavioral, and social
systems”.
Course Topics
Data interpretation in analytical technologies
Data management and computational
infrastructure
Discovery from data mining
Modeling, prediction and design
Theoretical in silico biology
Cover classical/mainstream bioinformatics problems from a computer science perspective
Discovery from Data Mining (I)
Discovery from Data Mining (II)
Data source
Genomic / protein sequence
Microarray data
Protein interaction
Complicated data
Large-scale, high-dimension
Noisy (false positives and false negatives)
Discovery from Data Mining (III)
Pattern/knowledge discovery from data
much biological data is generated by biological processes that are not well understood
interpreting such data requires discovering convoluted relationships hidden in the data
which segment of a DNA sequence represents a gene, or a regulatory region
which genes are possibly responsible for a particular disease
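The question "which segment of a DNA sequence represents a gene" can be made concrete with its simplest computational formulation: scanning for open reading frames (ORFs). The sketch below is a toy illustration only, assuming that a candidate gene is any forward-strand stretch that starts with ATG and ends at the first in-frame stop codon; the function name `find_orfs` and the `min_codons` threshold are hypothetical. Real gene finders model much more (reverse strand, splicing, codon statistics).

```python
# Toy ORF scan: illustrates turning "where is a gene?" into a computable problem.
# Assumption: an ORF = ATG ... first in-frame stop codon, forward strand only.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return sorted 0-based, half-open (start, end) spans of forward-strand ORFs."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until an in-frame stop codon appears.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j + 3 - i) // 3 >= min_codons:
                            orfs.append((i, j + 3))  # stop codon included
                        i = j  # resume scanning after this ORF
                        break
                else:
                    break  # no stop codon before the end: give up on this frame
            i += 3
    return sorted(orfs)


print(find_orfs("CCATGAAATAGGG"))  # one ORF: ATG AAA TAG
```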
Modeling, Prediction and Design (I)
Modeling and prediction of biological objects/processes
Sequence comparison
Secondary structure prediction
Gene finding
Regulatory sequence identification
Modeling, Prediction and Design (II)
Prediction of outcomes of biological processes
Computing will become an integral part of modern biology through an iterative process of model formulation, computational prediction, and experimental validation
From prediction to engineering design
Drug design
Protein structure prediction to protein engineering
Design genetically modified species
Scope of Bioinformatics
data management; data mining; modeling; prediction; theory formulation
bioinformatics
genes, proteins, protein complexes, pathways, cells, organisms, ecosystem
an indispensable part of biological science
engineering aspect
scientific aspect
computer science, biology, statistics, mathematics, physics, chemistry, engineering, …
Bioinformatics Foundations
Technology
Biology/medicine
Computer Science
Statistics
From an interdisciplinary field to a distinct discipline
Course Coverage
A general introduction to the field of bioinformatics
problem definitions: from a biological problem to a computable problem
key computational techniques
A way of thinking: tackling “biological problems” computationally
how to look at a biological problem from a computational point of view
how to formulate a computational problem to address a biological issue
how to collect statistics from biological data
how to build a computational model
how to design algorithms for the model
how to test and evaluate a computational algorithm
how to assess the confidence of a prediction result
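Testing and evaluating a prediction algorithm usually comes down to counting true/false positives and negatives against a reference set, which also ties back to the noisy, false-positive/false-negative nature of the data mentioned earlier. The minimal sketch below assumes predictions and known answers are given as sets of item IDs drawn from a universe of known size; the function name `evaluate` and its arguments are hypothetical.

```python
# Minimal sketch: standard accuracy measures for a binary prediction task.
# Assumption: items are hashable IDs; universe_size counts every item considered.


def evaluate(predicted: set, actual: set, universe_size: int) -> dict[str, float]:
    tp = len(predicted & actual)        # predicted and truly positive
    fp = len(predicted - actual)        # predicted but not truly positive
    fn = len(actual - predicted)        # missed true positives
    tn = universe_size - tp - fp - fn   # everything else
    return {
        # sensitivity (recall): fraction of real positives that were found
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        # specificity: fraction of real negatives correctly left out
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        # precision: fraction of predictions that are correct
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


print(evaluate({1, 2, 3, 4}, {2, 3, 5}, universe_size=10))
```

Reporting several measures together matters here: a predictor can reach high sensitivity simply by predicting everything, which precision and specificity expose.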
Dong’s top 10 list for computational methods in BI
1. Dynamic programming
2. Neural network
3. Hidden Markov Model
4. Hypothesis test
5. Bayesian statistics
6. Clustering
7. Information theory
8. Support Vector Machine
9. Maximum likelihood
10. Sampling search (Gibbs, Monte Carlo, etc.)
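Dynamic programming, first on the list, underlies pairwise sequence comparison. The sketch below computes a Needleman-Wunsch style global alignment score with a linear gap penalty; the scoring values (match +1, mismatch -1, gap -2) and the function name are illustrative assumptions, not parameters from the course. Practical tools use substitution matrices such as BLOSUM and affine gap penalties.

```python
# Sketch of global alignment scoring by dynamic programming (Needleman-Wunsch).
# Assumption: simple match/mismatch scores and a linear gap penalty.


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning prefix a[:i] with prefix b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[n][m]


print(global_alignment_score("ACGT", "AGT"))  # best alignment: ACGT / A-GT
```

The table has (n+1)(m+1) cells, each filled in constant time, so the score costs O(nm) time; keeping back-pointers would recover the alignment itself.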
Research Areas
1. “Solved” problems
2. “Developed” areas with remaining challenges hard to solve
3. Developing areas
4. Emergent areas
5. Future directions
“Solved” Problems
DNA sequence base calling and assembly
Pairwise sequence comparison
Protein secondary structure prediction
Disordered region in proteins
Transmembrane segment prediction
Subcellular localization
Signal peptide prediction
Protein geometry
Homology modeling
Physical/genetic mapping informatics
“Developed” areas with remaining challenges
Gene finding
Phylogenetic tree construction and evolution
Protein docking
Drug design
Protein design
Linkage analysis and quantitative trait loci (QTL)
Microarray data collection
Gene expression clustering
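Gene expression clustering groups genes whose expression profiles across conditions look alike. The sketch below is a plain k-means on Euclidean distance, assuming each profile is a list of expression values of equal length and that the first k profiles serve as initial centroids (chosen only to keep the example deterministic); the function name `kmeans` is hypothetical. Published analyses often use correlation-based distances or hierarchical clustering instead.

```python
# Minimal k-means sketch for clustering gene expression profiles.
# Assumptions: Euclidean distance; first k profiles used as initial centroids.
import math


def kmeans(profiles: list[list[float]], k: int, max_iter: int = 100) -> list[int]:
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


expr = [[1.0, 1.0], [5.0, 5.0], [1.2, 0.8], [5.1, 4.9], [0.9, 1.1]]
print(kmeans(expr, k=2))
```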
Developing Areas
Multiple sequence comparison and remote homolog search
Repetitive sequence analysis
Protein structure comparison
Protein tertiary structure prediction
RNA secondary structure prediction
Regulatory sequence analysis
Computational proteomics
Protein interaction networks
Gene ontology and function prediction
Computational neural science and applications in various species and systems (e.g., cancer)
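Regulatory sequence analysis is one place where information theory from the top-10 list is applied directly: the conservation of each position in a set of aligned binding sites is commonly summarized as information content in bits, 2 - H for DNA. The sketch assumes equal-length aligned sites, a uniform background over four bases, and no small-sample correction; the function name is hypothetical.

```python
# Sketch: per-position information content of aligned DNA sites (bits).
# Assumptions: uniform background (max 2 bits), no small-sample correction.
import math
from collections import Counter


def column_information(sites: list[str]) -> list[float]:
    n = len(sites)
    info = []
    for pos in range(len(sites[0])):
        counts = Counter(site[pos] for site in sites)
        # Shannon entropy H of the base distribution at this position
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(2.0 - entropy)  # 2 bits = fully conserved position
    return info


print(column_information(["TATA", "TATT", "TACA", "TAGT"]))
```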
Emergent Areas
Pathway (regulatory network) prediction
ChIP-chip analysis
Tiling array analysis
Haplotype/SNP analysis
Computational comparative genomics
Text (literature) mining
Small RNA and anti-sense regulation
Alternative splicing prediction
Computational metabolomics
Possible future directions
Genome semantics
Membrane protein structure prediction
RNA tertiary structure prediction
Post-translational modification
Dynamics of regulatory networks
Virtual cell/organism modeling
Phenotype-genotype relationship
… (nobody knows)
Where is the science going? (1)
Bioinformatics has been a “technology” for biological research: interpretation of data generated by bench biologists
We are starting to see a trend in which computational predictions can guide experimental design
As more high-throughput technologies become available, discovery-driven science will play an increasingly important role in biology research
As computational techniques continue to mature for biological applications, we will see more and more computational applications with powerful predictive capabilities
Where is the science going? (2)
Like physics, where general rules and laws are
taught at the start, biology will surely be presented
to future generations of students as a set of basic
systems ....... duplicated and adapted to a very
wide range of cellular and organismic functions,
following basic evolutionary principles constrained
by Earth’s geological history.
--Temple Smith, Current Topics in Computational Molecular Biology
Major research centers (1)
National Center for Biotechnology Information
(NCBI) of NIH (http://www.ncbi.nlm.nih.gov/)
the home of many important databases including GenBank
the home of many important bioinformatics tools including
BLAST
Major research centers (2)
European Molecular Biology Laboratory (EMBL)
(http://www.embl-heidelberg.de/)
has some of the most powerful research groups in
bioinformatics
has numerous tools and databases
Major research centers (3)
Sanger Institute (http://www.sanger.ac.uk/)
The Institute for Genomic Research (TIGR, http://www.tigr.org/)
Swiss-Prot (http://www.expasy.org/sprot/)
Major Universities in US
University of California at Santa Cruz
University of California at San Diego
Washington University
University of Southern California
Stanford University
Columbia University
Boston University
Harvard University
MIT
Virginia Tech
Major journals
Bioinformatics
Nucleic Acids Research
Genome Research
Journal of Computational Biology
Journal of Bioinformatics and Computational Biology
In silico Biology
Briefings in Bioinformatics
Applied Bioinformatics
IEEE/ACM Transactions on Computational Biology and
Bioinformatics
Proteins: Structure, Function, and Bioinformatics
Journal of Computer Science and Technology
Genomics, Proteomics and Bioinformatics
…
Major conferences
Intelligent Systems for Molecular Biology (ISMB)
Annual International Conference on Research in Computational Molecular Biology (RECOMB)
IEEE Computational Systems Bioinformatics Conference (CSB)
Pacific Symposium on Biocomputing (PSB)
European Conference on Computational Biology (ECCB)
IEEE International Conference on Bioinformatics and Bioengineering (BIBE)
International Workshop on Genome Informatics (GIW)
Asia-Pacific Bioinformatics Conference (APBC)
…
Academicians
Michael Waterman
Phil Green
Gene Myers
Barry Honig
No Nobel Prize winner yet…
Discussions
Scope of the new biology (large-scale)
Technology (tool development) vs. science
(biological application)
Knowledge vs. prediction
Experimental vs. computational/theoretical
First principle vs. empirical / statistical
Automated vs. curated
One machine can do the work of fifty ordinary
men. No machine can do the work of one
extraordinary man.
Choosing Bioinformatics as a Career - 1
Field outlook
Must be a believer in bioinformatics (for its value to science)
Must have strong motivation and be willing to go the extra mile (learn more disciplines)
Technologist vs. technician
Choosing Bioinformatics as a Career - 2
Molecular, cellular, and evolutionary biology: understanding the science
Computational, mathematical, and statistical sciences: mastering the techniques
High-throughput measurement technologies: knowing what biological data are obtainable