
PSB 2006
Maui Jan 2-7
• PSB 2006 post-meeting website:
http://psb.stanford.edu/psb06/
• PSB Proceedings online:
http://helix-web.stanford.edu/psb06/
• Current & future focus:
Applications of computational tools to
clinically important problems - also,
integration of computation & experiment
Keynote: Michael Ashburner (Cambridge)
Famous Drosophila geneticist - founder of GO
Ontologies for Biologists: A community model for
annotation of genomic data
http://www.geneontology.org/
• GO began at ISMB 1998: consortium of 3 founding databases
FlyBase
Saccharomyces
Mouse
• Co-founders of GO & OBO: S Lewis, J Blake, M Cherry
GO today? Includes all of major model organism
databases & significant multi-organism databases, e.g.
GeneDB, UniProt, TIGR
STILL: lacking human analog of model organism
databases...
The most serious annotation of human genes is done by
UniProt at EBI
Goal: to provide structured controlled vocabularies for
representation of biological knowledge in biological
databases
Content of GO today? 19,461 terms total
In future:
Plan to add specified relationships between concepts in different
ontologies (when?)
Will be fairly substantial change in architecture of GO
Current Architecture?
Tree vs directed acyclic graph – tree not rich enough for GO
Currently, DAG (any concept can have more than one parent)
Parent-child relationships can be only:
ISA (hypernymy/hyponymy)
PARTOF (meronymy/holonymy) – not the same
across the hierarchy
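The tree vs DAG point above can be illustrated with a small sketch. This is only a toy model: the term names and edges below are placeholders chosen for illustration, not actual GO entries, and the helper `ancestors` is a hypothetical name.

```python
# Toy ontology stored as a DAG: each term maps to a list of
# (relation, parent) edges. Term names are illustrative placeholders,
# not actual GO content.
PARENTS = {
    "cell": [],
    "organelle": [("part_of", "cell")],
    "membrane": [("part_of", "cell")],
    "nucleus": [("is_a", "organelle")],
    # Two parents with different relation types: impossible in a tree.
    "nuclear_membrane": [("is_a", "membrane"), ("part_of", "nucleus")],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following parent edges upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for _relation, parent in parents[current]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen
```

Because a term may have several parents, the upward walk has to track visited terms so that shared ancestors are collected only once.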
Sanity checks? Use annotations of orthologous genes to verify GO
annotations
Updates: Monthly, or so – a table of triples (actually ~ 11 attributes)
GO Gene Association Tables – now available for many organisms
(but not E. coli, hope to change this soon)
GO database now at Stanford with Mike Cherry
Curated GO Annotations: (from core of about 25 organisms)
2006? about 0.5 million human expert curated gene products
plus 1.5 million products automatically annotated by UniProt
GO provides database for other browsers – GO browser AmiGO
Entire GO can be downloaded from FTP site
GO as community project: Anyone can suggest changes to GO & GO
content: geneontology.sourceforge.net
OBO (formerly EGO=extended GO)
Open Biology Ontologies
http://obo.sourceforge.net/
CBS site at sourceforge – set up as umbrella for collecting data
OBO will change radically in next few months: to be taken over by cBio
(National Center for Biomedical Ontology),
funded by NIH as a National Center (Berkeley & Stanford)
http://bioontology.org/
cBio will have:
OBO
OBD (Open Biomedical Data)
BioPortal
Aims of Sequence Ontology (SO)
Develop shared set of terms & concepts to annotate biological
sequence
Apply this to separate projects to provide consistent query
capabilities between them
Provide software environment resource to assist in application &
distribution of SO
Wanted to enrich GenBank - something that would allow computation:
• What fraction of genes in Drosophila have alternatively spliced
products?
• What fraction of genes in worm are?
This can't be answered from GenBank feature table
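Once annotations are structured, a question like the ones above reduces to a simple count. A minimal sketch, assuming gene-to-transcript annotations are already available as a dictionary and treating "more than one annotated transcript" as a stand-in for alternative splicing (a simplification); the function name and input layout are hypothetical, not part of SO:

```python
def fraction_alternatively_spliced(transcripts_by_gene):
    """Fraction of genes annotated with more than one transcript.

    transcripts_by_gene maps a gene identifier to a list of its
    annotated transcript identifiers (an assumed input layout).
    """
    if not transcripts_by_gene:
        return 0.0
    multi = sum(1 for t in transcripts_by_gene.values() if len(t) > 1)
    return multi / len(transcripts_by_gene)


# Made-up example data, for illustration only.
example = {
    "gene1": ["t1", "t2"],
    "gene2": ["t1"],
    "gene3": ["t1", "t2", "t3"],
    "gene4": ["t1"],
}
fraction = fraction_alternatively_spliced(example)
```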
SO has two parts:
1- Features that can be located on a sequence with coordinates
2- Properties of these features
sequence attributes
consequences of mutations
chromosome variation
Summary – Recent developments:
Make maintenance of GO in future more manageable
(& scalable)
Make GO more computable
Integrates ontologies
Explores new paradigms:
OBO-edit allows one to edit & instantiate crossproducts by hand or computationally
Includes visualization of hierarchical relationships
Important for future: SO & Phenotype annotation
Very hard, classically done in free text
Subproject in cBio = attempt to annotate rich set of
genes in Fly, Worm, Human, using attribute valued triplet
(by M Westerfield & Ashburner)
Entity, attribute, value (showed example of comparing human vs zebrafish) - has great potential for cross-species learning
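The entity, attribute, value idea can be sketched as follows. All records below are invented placeholders (no real gene or phenotype data), and `shared_phenotypes` is a hypothetical helper, shown only to suggest why a structured triplet makes cross-species comparison computable where free text does not.

```python
from collections import namedtuple

# One phenotype statement: which gene, in which species, plus the
# entity-attribute-value description of what was observed.
Annotation = namedtuple("Annotation", "species gene entity attribute value")

# Hypothetical records for illustration only, not curated data.
ANNOTATIONS = [
    Annotation("human", "geneA", "eye", "size", "small"),
    Annotation("zebrafish", "geneA_ortholog", "eye", "size", "small"),
    Annotation("zebrafish", "geneB_ortholog", "fin", "shape", "curved"),
]


def shared_phenotypes(annotations, species_a, species_b):
    """Return the EAV triples that are annotated in both species."""
    def triples(species):
        return {(a.entity, a.attribute, a.value)
                for a in annotations if a.species == species}
    return triples(species_a) & triples(species_b)
```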
Ashburner: "must view much of current 'comparative
genomics' with suspicion until SO and phenotype
annotation improved - must be done before truly
meaningful comparative genomics can be done!"
Linking Biomedical Information Through Text Mining
K. Bretonnel Cohen, Olivier Bodenreider, and Lynette Hirschman; Pacific Symposium on Biocomputing 11:1-3(2006)
Extraction of Gene-Disease Relations from Medline Using Domain Dictionaries and Machine Learning
Hong-Woo Chun, Yoshimasa Tsuruoka, Jin-Dong Kim, Rie Shiba, Naoki Nagata, Teruyoshi Hishiki, and Jun'ichi
Pacific Symposium on Biocomputing 11:4-15(2006)
Significantly Improved Prediction of Subcellular Localization by Integrating Text and Protein Sequence Data
Annette Hoglund, Torsten Blum, Scott Brady, Pierre Donnes, John San Miguel, Matthew Rocheford, Oliver
Kohlbacher, and Hagit Shatkay; Pacific Symposium on Biocomputing 11:16-27(2006)
Evaluation of Lexical Methods for Detecting Relationships Between Concepts from Multiple Ontologies
Helen L. Johnson, K. Bretonnel Cohen, William A. Baumgartner Jr., Zhiyong Lu, Michael Bada, Todd Kester, Hy
Kim, and Lawrence Hunter; Pacific Symposium on Biocomputing 11:28-39(2006)
Automatically Generating Gene Summaries from Biomedical Literature
Xu Ling, Jing Jiang, Xin He, Qiaozhu Mei, Chengxiang Zhai, and Bruce Schatz; Pacific Symposium on Biocomputing
11:40-51(2006)
Finding GeneRIFs via Gene Ontology Annotations
Zhiyong Lu, K. Bretonnel Cohen, and Lawrence Hunter; Pacific Symposium on Biocomputing 11:52-63(2006)
PhenoGO: Assigning Phenotypic Context to Gene Ontology Annotations with Natural Language Processing
Yves Lussier, Tara Borlawsky, Daniel Rappaport, Yang Liu, and Carol Friedman; Pacific Symposium on Biocomputing
11:64-75(2006)
Large-Scale Testing of Bibliome Informatics Using Pfam Protein Families
Ana G. Maguitman, Andreas Rechtsteiner, Karin Verspoor, Charlie E. Strauss, and Luis M. Rocha; Pacific Symposium on
Biocomputing 11:76-87(2006)
Predicting Gene Functions from Text Using a Cross-Species Approach
Emilia Stoica and Marti Hearst; Pacific
Symposium on Biocomputing 11:88-99(2006)
Bootstrapping the Recognition and Anaphoric Linking of Named Entities in Drosophila Articles
Andreas Vlachos, Caroline Gasperin, Ian Lewin, Ted Briscoe; Pacific Symposium on Biocomputing 11:100-111(2006)
Significantly Improved Prediction of Subcellular Localization by
Integrating Text and Protein Sequence Data
Annette Hoglund, Torsten Blum, Scott Brady, Pierre Donnes, John San
Miguel, Matthew Rocheford, Oliver Kohlbacher, & Hagit Shatkay PSB
11:16-27(2006)
• Nice talk
• Combine 5 separate classifiers: 4 sequence-based (3 SVMs & 1
motif-search) & one text-based (represent protein as vector of
weighted text features)
– Text-based - assign set of PubMed abstracts, based on
Swiss-Prot (so this requires prev. annotation in Swiss-Prot)
• Table should be useful for comparison with Carson's results
– Report Acc, Sens, Spec & MCC for animal & plant datasets
• Cf Acc? Target P 85%, MultiLoc 75%, PLOC 78%
• Web Server: MultiLoc/TargetLoc
• http://www-bs.informatik.uni-tuebingen.de/Services/MultiLoc
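For reference when comparing tables, the four measures mentioned above (Acc, Sens, Spec, MCC) can be computed from a confusion matrix as in this sketch. It uses the standard binary definitions; the paper may compute them per localization class, so treat this as an assumption rather than the authors' exact procedure. The function name and the counts in the example are made up.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion counts."""
    total = tp + fp + tn + fn
    acc = (tp + tn) / total if total else 0.0
    sens = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    spec = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"acc": acc, "sens": sens, "spec": spec, "mcc": mcc}


# Made-up counts, for illustration only.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

MCC is often preferred over accuracy when classes are unbalanced, since it uses all four cells of the confusion matrix.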
Predicting Gene Functions from Text Using a Cross-Species Approach
Emilia Stoica and Marti Hearst - UC Berkeley
PSB 11:88-99(2006)
Use orthologous gene information in 2 ways:
• CSM Cross species match algorithm - using GO codes of
orthologous genes to generate a functional annotation
• CSC Cross species correlation algorithm - uses all GO codes &
then eliminates "illogical" ones
Final Annotation is computed as union (?) of two sets CSM & CSC
Test algorithm on dataset of Task 2.2 of BioCreAtIvE competition, on
EBI human & MGI - claim better than other solutions
Report F measure - harmonic mean of precision & recall
http://biotext.berkeley.edu/
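A small sketch of the evaluation described above: combine two predicted sets of GO codes and score the result with the F measure. The union step follows the notes' own tentative reading ("union (?)"), the code labels are placeholders rather than real GO identifiers, and `f_measure` is a hypothetical helper.

```python
def f_measure(predicted, gold):
    """Harmonic mean of precision and recall over two sets of codes."""
    predicted, gold = set(predicted), set(gold)
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Placeholder labels standing in for GO codes (illustration only).
csm = {"term_a", "term_b"}          # output of the match step
csc = {"term_b", "term_c"}          # output of the correlation step
final_annotation = csm | csc        # union, as tentatively noted above
gold = {"term_a", "term_b", "term_d"}

score = f_measure(final_annotation, gold)
```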
Semantic Webs for Life Sciences
Robert Stevens, Olivier Bodenreider, and Yves A. Lussier; Pacific Symposium on Biocomputing 11:112-115(2006)
Selecting Biological Data Sources and Tools with XPR, a Path Language for RDF
Sarah Cohen-Boulakia, Christine Froidevaux, and Emmanuel Pietriga; Pacific Symposium on
Biocomputing 11:116-127(2006)
Fast, Cheap and Out of Control: A Zero Curation Model for Ontology Development
Benjamin M. Good, Erin M. Tranfield, Poh C. Tan, Marlene Shehata, Gurpreet K. Singhera, John
Gosselink, Elena B. Okon, and Mark D. Wilkinson; Pacific Symposium on Biocomputing 11:128-139(2006)
Putting Semantics into the Semantic Web: How Well Can It Capture Biology?
Toni Kazic; Pacific Symposium on Biocomputing 11:140-151(2006)
Event Ontology: A Pathway-Centric Ontology for Biological Processes
Tatsuya Kushida, Toshihisa Takagi, and Ken Ichiro Fukuda; Pacific Symposium on Biocomputing
11:152-163(2006)
Discovering Biomedical Relations Utilizing the World-Wide Web
Sougata Mukherjea and Saurav Sahay; Pacific Symposium on Biocomputing 11:164-175(2006)
Biodash: A Semantic Web Dashboard for Drug Development
Eric K. Neumann and Dennis Quan; Pacific Symposium on Biocomputing 11:176-187(2006)
SemBiosphere: A Semantic Web Approach to Recommending Microarray Clustering Services
Kevin Y. Yip, Peishen Qi, Martin Schultz, David W. Cheung, and Kei-Hoi Cheung; Pacific Symposium
on Biocomputing 11:188-199(2006)
Experience in Reasoning with the Foundational Model of Anatomy in OWL DL
Songmao Zhang, Olivier Bodenreider, and Christine Golbreich; Pacific Symposium on Biocomputing
11:200-211(2006)
Computational Proteomics
Session Introduction
Bobbie-Jo Webb-Robertson, William Cannon, Joshua Adkins, and Deborah Gracio; Pacific
Symposium on Biocomputing 11:212-218(2006)
A Machine Learning Approach to Predicting Peptide Fragmentation Spectra
Randy J. Arnold, Narmada Jayasankar, Divya Aggarwal, Haixu Tang, and Predrag
Radivojac; Pacific Symposium on Biocomputing 11:219-230(2006)
Identifying Protein Complexes in High-Throughput Protein Interaction Screens Using an
Infinite Latent Feature Model
Wei Chu, Zoubin Ghahramani, Roland Krause, and David L. Wild; Pacific Symposium on
Biocomputing 11:231-242(2006)
High-Accuracy Peak Picking of Proteomics Data Using Wavelet Techniques
Eva Lange, Clemens Gröpl, Knut Reinert, Oliver Kohlbacher, and Andreas Hildebrandt;
Pacific Symposium on Biocomputing 11:243-254(2006)
Fast De novo Peptide Sequencing and Spectral Alignment via Tree Decomposition
Chunmei Liu, Yinglei Song, Bo Yan, Ying Xu, and Liming Cai; Pacific Symposium on
Biocomputing 11:255-266(2006)
Experimental Design of Time Series Data for Learning from Dynamic Bayesian Networks
David Page and Irene M. Ong; Pacific Symposium on Biocomputing 11:267-278(2006)
Finding Diagnostic Biomarkers in Proteomic Spectra
Pallavi N. Pratapa, Edward F. Patz, Jr., Alexander J. Hartemink; Pacific Symposium
on Biocomputing 11:279-290(2006)
Gaussian Mixture Modeling of Helix Subclasses: Structure and Sequence Variations
Ashish V. Tendulkar, Babatunde Ogunnaike, and Pramod P. Wangikar; Pacific
Symposium on Biocomputing 11:291-302(2006)
An SVM Scorer for More Sensitive and Reliable Peptide Identification via Tandem
Mass Spectrometry
Haipeng Wang, Yan Fu, Ruixiang Sun, Simin He, Rong Zeng, and Wen Gao; Pacific
Symposium on Biocomputing 11:303-314(2006)
Normalization Regarding Non-Random Missing Values in High-Throughput Mass
Spectrometry Data
Pei Wang, Hua Tang, Heidi Zhang, Jeffrey Whiteaker, Amanda G. Paulovich, and
Martin McIntosh; Pacific Symposium on Biocomputing 11:315-326(2006)
A Point-Process Model for Rapid Identification of Post-Translational
Modification
Bo Yan, Tong Zhou, Peng Wang, Zhijie Liu, Vincent A. Emanuele II, Victor Olman,
and Ying Xu; Pacific Symposium on Biocomputing 11:327-338(2006)
A New Approach for Alignment of Multiple Proteins
Xu Zhang and Tamer Kahveci; Pacific Symposium on Biocomputing 11:339-350(2006)
A Point-Process Model for Rapid Identification of Post-Translational Modifications
Bo Yan, Tong Zhou, Peng Wang, Zhijie Liu, Vincent A.
Emanuele II, Victor Olman, and Ying Xu - UGa & Ga Tech
PSB 11:327-338(2006)
• Seems to be a good approach for estimating & identifying PTMs
• Usual approach is exhaustive search
• Point-process model that finds optimal mass shifts to maximize
alignment between experimental MS/MS spectrum & candidate
theoretical spectrum, through cross-correlation calculation
• Gives rapid search in "blind" mode - without giving types of PTMs in
advance
• Comparable to other blind approaches but more efficient & simpler
• http://csbl.bmb.uga.edu
• http://csbl.bmb.uga.edu/resources.html
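The cross-correlation idea in the bullets above can be illustrated with a heavily simplified sketch. This is not the paper's point-process model: it assumes peaks are already binned to integer masses, ignores intensities, and shifts the whole theoretical spectrum, whereas a real modification shifts only the fragments that contain the modified residue. The function name and peak lists are hypothetical.

```python
def best_mass_shift(experimental, theoretical, max_shift=100):
    """Return (shift, score) maximizing the number of aligned peaks.

    Both spectra are lists of integer-binned peak masses. The score for
    a shift is the count of theoretical peaks that land on an
    experimental peak after adding that shift (a discrete
    cross-correlation of two 0/1 peak indicators).
    """
    experimental = set(experimental)
    best = (0, -1)
    for shift in range(-max_shift, max_shift + 1):
        score = sum(1 for mass in theoretical if mass + shift in experimental)
        if score > best[1]:
            best = (shift, score)
    return best


# Made-up peak lists: three of four peaks are offset by 80 mass units
# (roughly the mass added by a phosphorylation).
theoretical_peaks = [100, 200, 300, 400]
experimental_peaks = [100, 280, 380, 480]
shift, score = best_mass_shift(experimental_peaks, theoretical_peaks)
```

The shift is found without naming the modification in advance, which is what "blind" mode refers to in the notes.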
Special tutorial:
Phil Bourne
New PDB