17 - Digital Biology Laboratory


CS 7010: Computational Methods in Bioinformatics (course review)
Dong Xu
Computer Science Department
271C Life Sciences Center
1201 East Rollins Road
University of Missouri-Columbia
Columbia, MO 65211-2060
E-mail: [email protected]
573-882-7064 (O)
http://digbio.missouri.edu
Technical Definitions
NIH (http://www.bisti.nih.gov/)
Bioinformatics: “research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, represent, describe, store, analyze, or visualize such data”.
Computational Biology: “the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems”.
Course Topics
 Data interpretation in analytical technologies
 Data management and computational infrastructure
 Discovery from data mining
 Modeling, prediction and design
 Theoretical in silico biology
Covers classical/mainstream bioinformatics problems from a computer science perspective
Discovery from Data Mining (I)
Discovery from Data Mining (II)

Data source
 Genomic / protein sequence
 Microarray data
 Protein interaction

Complicated data
 Large-scale, high-dimension
 Noisy (false positives and false negatives)
Discovery from Data Mining (III)
Pattern/knowledge discovery from data
 Much biological data is generated by biological processes that are not well understood
 Interpreting such data requires discovering convoluted relationships hidden in the data, e.g.:
  which segment of a DNA sequence represents a gene or a regulatory region
  which genes are possibly responsible for a particular disease
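The second question above is often approached with a hypothesis test. A minimal sketch, assuming hypothetical allele counts from disease cases and healthy controls and using Pearson's chi-square test without continuity correction (the function name `allelic_chi_square` is illustrative, not a method prescribed by the course):

```python
import math

def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 allele table.

    Rows are cases and controls; columns are alternate and reference allele counts.
    Returns (statistic, p_value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 df: erfc(sqrt(x / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts: 60 of 100 case alleles vs. 40 of 100 control alleles carry the variant.
stat, p = allelic_chi_square(60, 40, 40, 60)
print(f"chi2 = {stat:.1f}, p = {p:.4f}")  # chi2 = 8.0, p = 0.0047
```

When many genes or variants are scanned this way, the p-values also need a multiple-testing correction.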

Modeling, Prediction and Design (I)
Modeling and prediction of biological objects/processes
 Sequence comparison
 Secondary structure prediction
 Gene finding
 Regulatory sequence identification
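Gene finding and similar segmentation tasks are classic applications of hidden Markov models. A minimal sketch, assuming a toy two-state model with made-up probabilities: state `N` (background) and state `G` (GC-rich region). Real gene finders use many more states (exons, introns, splice sites). The Viterbi algorithm recovers the most probable hidden-state path:

```python
import math

def viterbi(seq, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for seq (log-space Viterbi)."""
    # score[t][s]: best log-probability of any path that ends in state s at position t
    score = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]  # back[t][s]: predecessor of state s on that best path
    for t in range(1, len(seq)):
        score.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda r: score[t - 1][r] + math.log(trans_p[r][s]))
            score[t][s] = (score[t - 1][prev] + math.log(trans_p[prev][s])
                           + math.log(emit_p[s][seq[t]]))
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for t in range(len(seq) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical toy parameters: N = background, G = GC-rich region.
STATES = ("N", "G")
START = {"N": 0.5, "G": 0.5}
TRANS = {"N": {"N": 0.9, "G": 0.1}, "G": {"N": 0.1, "G": 0.9}}
EMIT = {
    "N": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "G": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
}

print("".join(viterbi("ATATATATGCGCGCGCATATATAT", STATES, START, TRANS, EMIT)))
# prints NNNNNNNNGGGGGGGGNNNNNNNN
```

Working in log space avoids numerical underflow when the sequence is long.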
Modeling, Prediction and Design (II)

Prediction of outcomes of biological processes
 Computing will become an integral part of modern biology through an iterative process: model formulation → computational prediction → experimental validation
From prediction to engineering design
 Drug design
 From protein structure prediction to protein engineering
 Design of genetically modified species
Scope of Bioinformatics
(Diagram) Bioinformatics: an indispensable part of biological science, with an engineering aspect and a scientific aspect
 data management; data mining; modeling; prediction; theory formulation
 genes, proteins, protein complexes, pathways, cells, organisms, ecosystems
 computer science, biology, statistics, mathematics, physics, chemistry, engineering, …
Bioinformatics Foundations
 Technology
 Biology/medicine
 Computer Science
 Statistics
 From interdisciplinary field to a distinct discipline
Course Coverage

A general introduction to the field of bioinformatics
 problem definitions: from biological problem to computable problem
 key computational techniques

A way of thinking: tackling biological problems computationally
 how to look at a biological problem from a computational point of view
 how to formulate a computational problem to address a biological issue
 how to collect statistics from biological data
 how to build a computational model
 how to design algorithms for the model
 how to test and evaluate a computational algorithm
 how to assess confidence in a prediction result
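Testing a predictor usually means comparing its calls against a trusted benchmark and summarizing the errors. A minimal sketch, assuming binary predictions (1 = positive, e.g. "is a gene") and hypothetical benchmark labels; `evaluate` is an illustrative name:

```python
def _ratio(num, den):
    # Return NaN instead of raising when a class is empty.
    return num / den if den else float("nan")

def evaluate(predicted, actual):
    """Sensitivity, specificity, and precision of binary predictions vs. known labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # share of true positives that were found
        "specificity": _ratio(tn, tn + fp),  # share of true negatives correctly rejected
        "precision": _ratio(tp, tp + fp),    # share of positive calls that are correct
    }

# Hypothetical predictions against a hypothetical benchmark.
print(evaluate([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0]))
```

Here there are 2 true positives, 1 false positive, 1 false negative, and 4 true negatives, so sensitivity and precision are 2/3 and specificity is 0.8. Reporting several such numbers matters because, with noisy data, a method can trade false negatives for false positives.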
Dong’s top 10 list of computational methods in BI
1. Dynamic programming
2. Neural network
3. Hidden Markov Model
4. Hypothesis test
5. Bayesian statistics
6. Clustering
7. Information theory
8. Support Vector Machine
9. Maximum likelihood
10. Sampling search (Gibbs, Monte Carlo, etc.)
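Dynamic programming is the workhorse behind sequence comparison. A minimal sketch of Needleman-Wunsch global alignment, assuming a simple scoring scheme (match +1, mismatch -1, linear gap penalty -2) chosen only for illustration; real tools use substitution matrices such as BLOSUM and affine gap penalties:

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment by dynamic programming with a linear gap penalty.

    Returns (score, aligned_a, aligned_b) for one optimal alignment.
    """
    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )
    # Traceback from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))

print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

The table has (n+1)(m+1) cells, each filled in constant time, so the run time is O(nm).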
Research Areas
1. “Solved” problems
2. “Developed” areas with remaining challenges hard to solve
3. Developing areas
4. Emergent areas
5. Future directions
“Solved” Problems
DNA sequence base calling and assembly
Pairwise sequence comparison
Protein secondary structure prediction
Disordered regions in proteins
Transmembrane segment prediction
Subcellular localization
Signal peptide prediction
Protein geometry
Homology modeling
Physical/genetic mapping informatics
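Transmembrane segment prediction was first tackled with hydropathy plots. A minimal sketch of the classic Kyte-Doolittle scan, assuming the commonly quoted settings of a 19-residue window and a mean-hydropathy threshold of 1.6; modern predictors (e.g. HMM-based ones) are more accurate, and `hydropathy_windows` is an illustrative name:

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_windows(seq, window=19, threshold=1.6):
    """Return start indices of windows whose mean hydropathy exceeds threshold."""
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(KYTE_DOOLITTLE[aa] for aa in segment) / window
        if mean > threshold:
            hits.append(start)
    return hits

# Hypothetical sequence: a hydrophobic leucine stretch flanked by charged residues.
print(hydropathy_windows("D" * 10 + "L" * 19 + "D" * 10))
```

A run of consecutive hits marks a candidate membrane-spanning segment; here windows starting at positions 5 through 15 pass the threshold.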
“Developed” areas with remaining challenges
Gene finding
Phylogenetic tree construction and evolution
Protein docking
Drug design
Protein design
Linkage analysis and quantitative trait loci (QTL)
Microarray data collection
Gene expression clustering
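Gene expression clustering groups genes whose expression profiles behave alike across conditions. A minimal sketch of plain k-means, assuming Euclidean distance, a user-chosen k, and centroids initialized from the first k profiles for determinism (correlation-based distances and hierarchical clustering are also common for expression data); it needs Python 3.8+ for `math.dist`:

```python
import math

def kmeans(profiles, k, max_iter=100):
    """Cluster expression profiles with plain k-means (Euclidean distance).

    Centroids start at the first k profiles, so the result is deterministic.
    Returns one cluster label per profile.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in profiles]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties out
                centroids[c] = [sum(values) / len(members) for values in zip(*members)]
    return labels

# Hypothetical profiles of five genes over three time points.
genes = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [1.1, 2.1, 3.2], [3.1, 1.9, 0.8], [0.9, 2.2, 2.9]]
print(kmeans(genes, k=2))  # [0, 1, 0, 1, 0]
```

The rising profiles end up in one cluster and the falling ones in the other; with real data the outcome depends on k and on the initialization.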
Developing Areas
Multiple sequence comparison and remote homolog search
Repetitive sequence analysis
Protein structure comparison
Protein tertiary structure prediction
RNA secondary structure prediction
Regulatory sequence analysis
Computational proteomics
Protein interaction networks
Gene ontology and function prediction
Computational neural science and applications in various species and systems (e.g., cancer)
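RNA secondary structure prediction is another dynamic programming problem. A minimal sketch of the Nussinov base-pair maximization algorithm, assuming Watson-Crick plus G-U wobble pairs and a minimum hairpin loop of 3 nucleotides; practical tools instead minimize free energy with nearest-neighbor parameters, so this is a teaching simplification:

```python
def nussinov(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence (Nussinov algorithm)."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within seq[i..j]; short spans stay 0.
    best = [[0] * n for _ in range(n)]
    for length in range(min_loop + 1, n):  # length = j - i
        for i in range(n - length):
            j = i + length
            score = best[i + 1][j]  # case 1: position i is unpaired
            for k in range(i + min_loop + 1, j + 1):  # case 2: i pairs with k
                if (seq[i], seq[k]) in pairs:
                    inside = best[i + 1][k - 1]
                    after = best[k + 1][j] if k + 1 <= j else 0
                    score = max(score, inside + 1 + after)
            best[i][j] = score
    return best[0][n - 1]

print(nussinov("GGGAAAUCC"))  # 3
```

The cubic-time recursion either leaves the first base unpaired or pairs it with some later base k, splitting the rest into an enclosed part and a following part.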
Emergent Areas
Pathway (regulatory network) prediction
ChIP-chip analysis
Tiling array analysis
Haplotype/SNP analysis
Computational comparative genomics
Text (literature) mining
Small RNA and anti-sense regulation
Alternative splicing prediction
Computational metabolomics
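Haplotype/SNP analysis typically begins with quality checks on genotype calls, and a standard check is whether a SNP's genotype counts fit Hardy-Weinberg equilibrium. A minimal sketch, assuming a biallelic SNP with both alleles observed (otherwise an expected count is zero), hypothetical counts, and the 1-df chi-square approximation rather than an exact test:

```python
import math

def hardy_weinberg_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit of genotype counts to Hardy-Weinberg proportions.

    Degrees of freedom: 3 genotype classes - 1 - 1 estimated allele frequency = 1.
    Returns (statistic, p_value).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # estimated frequency of allele A
    q = 1 - p
    expected = [p * p * n, 2 * p * q * n, q * q * n]
    observed = [n_AA, n_Aa, n_aa]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 df.
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical genotype counts with a deficit of heterozygotes.
stat, p = hardy_weinberg_chi_square(30, 40, 30)
print(f"chi2 = {stat:.1f}, p = {p:.4f}")  # chi2 = 4.0, p = 0.0455
```

A strong deviation can point to genotyping errors or population structure, so such SNPs are often flagged before downstream haplotype or association analysis.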
Possible future directions

Genome semantics

Membrane protein structure prediction

RNA tertiary structure prediction

Post-translational modification

Dynamics of regulatory networks

Virtual cell/organism modeling

Phenotype-genotype relationship

… (nobody knows)
Where is the science going? (1)

Bioinformatics has served as a “technology” for biological research: interpreting data generated by bench biologists
We are starting to see a trend of computational predictions guiding experimental design
As more high-throughput technologies become available, discovery-driven science will play an increasingly important role in biological research
As computational techniques continue to mature for biological applications, we will see more and more computational applications with powerful predictive capabilities
Where is the science going? (2)

Like physics, where general rules and laws are taught at the start, biology will surely be presented to future generations of students as a set of basic systems ... duplicated and adapted to a very wide range of cellular and organismic functions, following basic evolutionary principles constrained by Earth’s geological history.
--Temple Smith, Current Topics in Computational Molecular Biology
Major research centers (1)

National Center for Biotechnology Information (NCBI) of NIH (http://www.ncbi.nlm.nih.gov/)
 the home of many important databases, including GenBank
 the home of many important bioinformatics tools, including BLAST
Major research centers (2)

European Molecular Biology Laboratory (EMBL) (http://www.embl-heidelberg.de/)
 has some of the most powerful research groups in bioinformatics
 has numerous tools and databases
Major research centers (3)

Sanger Institute (http://www.sanger.ac.uk/)

The Institute for Genomic Research (TIGR, http://www.tigr.org/)

Swiss-Prot (http://www.expasy.org/sprot/)
Major Universities in US










University of California at Santa Cruz
University of California at San Diego
Washington University
University of Southern California
Stanford University
Columbia University
Boston University
Harvard University
MIT
Virginia Tech
Major journals
 Bioinformatics
 Nucleic Acids Research
 Genome Research
 Journal of Computational Biology
 Journal of Bioinformatics and Computational Biology
 In Silico Biology
 Briefings in Bioinformatics
 Applied Bioinformatics
 IEEE/ACM Transactions on Computational Biology and Bioinformatics
 Proteins: Structure, Function, and Bioinformatics
 Journal of Computer Science and Technology
 Genomics, Proteomics and Bioinformatics
…
Major conferences
 Intelligent Systems for Molecular Biology (ISMB)
 Annual International Conference on Research in Computational Molecular Biology (RECOMB)
 IEEE Computational Systems Bioinformatics Conference (CSB)
 Pacific Symposium on Biocomputing (PSB)
 European Conference on Computational Biology (ECCB)
 IEEE International Conference on Bioinformatics and Bioengineering (BIBE)
 International Workshop on Genome Informatics (GIW)
 Asia-Pacific Bioinformatics Conference (APBC)
…
Academicians
 Michael Waterman
 Phil Green
 Gene Myers
 Barry Honig
 No Nobel Prize winner yet…
Discussions






Scope of the new biology (large-scale)
Technology (tool development) vs. science
(biological application)
Knowledge vs. prediction
Experimental vs. computational/theoretical
First principle vs. empirical / statistical
Automated vs. curated
One machine can do the work of fifty ordinary men. No machine can do the work of one extraordinary man.
Choosing Bioinformatics as a Career - 1
 Field outlook
 Must believe in bioinformatics (for its value to science)
 Must be strongly motivated and willing to go the extra mile (learn more disciplines)
 Technologist vs. technician
Choosing Bioinformatics as a Career - 2

Molecular, cellular, and evolutionary biology
 understanding the science

Computational, mathematical, and statistical sciences
 mastering the techniques

High-throughput measurement technologies
 knowing what biological data are obtainable