IBM Watson Research
BioHaystack: Gateway to the
Biological Semantic Web
Dennis Quan
[email protected]
© 2004 IBM Corporation
Problems in bioinformatics
A myriad of public databases each hold specific facets of
information about biological objects of interest (e.g.,
proteins and genes)
Databases have their own access protocols, data formats,
naming conventions, and means of describing relationships
between objects in different databases
Different software required to view information from different
databases
– User must be keenly aware of which tool or site to use
– Relevant information comes in fragments
– Exploration process is discontinuous
A common naming convention: LSID URNs
Life Sciences Identifiers (LSIDs) are URNs for biological
objects that are backed by RDF metadata:
– E.g., urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:nm_001240
LSID and LSID protocol (SOAP-based) specification
sponsored by I3C and undergoing standardization by OMG
Most publicly available bioinformatics databases are
accessible via LSID today
– PDB LSID authority is online; “proxy” LSID authorities for other
databases (e.g., the NIH databases and SwissProt) are hosted by I3C
Really easy to set up LSID clients and servers
– IBM Internet Technology group provides Open Source LSID
client and server software for a variety of languages and
platforms
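The structure of the example identifier above can be made concrete with a small parser. This is a minimal sketch, assuming the usual LSID layout of `urn:lsid:authority:namespace:object` with an optional trailing revision; the `Lsid` type and `parse_lsid` function are hypothetical names for illustration and are not part of the IBM LSID client software.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    """Components of an LSID URN (hypothetical helper type)."""
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(urn: str) -> Lsid:
    """Split an LSID URN into its colon-separated components."""
    parts = urn.split(":")
    # Expected pieces: "urn", "lsid", authority, namespace, object,
    # and optionally a revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {urn!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {urn!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"LSID has an empty component: {urn!r}")
    revision = parts[5] if len(parts) == 6 else None
    return Lsid(parts[2], parts[3], parts[4], revision)


# The GenBank example from the slide.
example = parse_lsid("urn:lsid:ncbi.nlm.nih.gov.lsid.i3c.org:genbank:nm_001240")
print(example.authority, example.namespace, example.object_id)
```

The authority component names the server responsible for resolving the identifier, which is what lets a "proxy" authority answer on behalf of a database that does not run its own.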
RDF/XML: on demand data integration
[Figure: RDF statements from separate databases about the same LSID combine into a unified view]
– GenBank: human hemoglobin LSID derives from sequence atagccgta cctgcgagt ctagaagct
– Gene Ontology: human hemoglobin LSID is an oxygen transport protein
– PDB: human hemoglobin LSID has 3D structure
– Unified view: all three statements attached to the one human hemoglobin LSID
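The integration pictured on this slide can be sketched in a few lines: because every source describes the same LSID, merging is a set union of subject/predicate/object statements. This is an illustrative sketch only. The LSID, the predicate names, and the `merge` and `describe` helpers are hypothetical, and plain Python tuples stand in for a real RDF library.

```python
from collections import defaultdict

# Hypothetical identifier and predicate names, for illustration only.
HEMOGLOBIN = "urn:lsid:example.org:protein:human_hemoglobin"

# Each source contributes its own statements about the same subject.
genbank = {(HEMOGLOBIN, "derivesFrom", "atagccgta cctgcgagt ctagaagct")}
gene_ontology = {(HEMOGLOBIN, "isA", "oxygen transport protein")}
pdb = {(HEMOGLOBIN, "has3DStructure", "3D structure record")}


def merge(*sources):
    """Union the statement sets; identical statements collapse to one."""
    merged = set()
    for statements in sources:
        merged |= statements
    return merged


def describe(statements, subject):
    """Group every predicate/object pair known about one subject."""
    view = defaultdict(set)
    for s, p, o in statements:
        if s == subject:
            view[p].add(o)
    return dict(view)


unified = describe(merge(genbank, gene_ontology, pdb), HEMOGLOBIN)
for predicate, objects in sorted(unified.items()):
    print(predicate, sorted(objects))
```

No schema alignment step is needed in this sketch because the shared identifier does the joining; that is the sense in which the integration is "on demand".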
Haystack: letting users interact with their data
Haystack is a tool for creating, exploring, and organizing
information:
– Personal information: e-mails, contacts, documents, etc.
– Bioinformatics: proteins, publications, genes, etc.
Research project originating from MIT CSAIL
Uses RDF as an underlying data model
Built on Java and Eclipse, IBM’s Open Source rich client
platform
http://haystack.lcs.mit.edu/
Browsing highly interconnected information
A single screen presents multiple facets of a single object
originating from separate databases
Users navigate the information space as they would in a Web
browser: hyperlinking, drag and drop, etc.
Personalization
People keep track of their information by
personalizing their workspaces:
– Grouping paperwork into folders
– Highlighting important text in documents
– Attaching sticky notes as reminders
– Jotting down lists of related items
Haystack has pervasive support for
annotation and allows users to group
related objects together arbitrarily for their
own purposes
BioHaystack
BioHaystack: application of Haystack technologies to
bioinformatics problems
– Integrated environment for working with biological data
– Intended for end users, i.e., non-programmers
– Builds on LSID, RDF, and Haystack
Integration offers the promise of lowering barriers to accessing
different backend systems (e.g., LSID servers, Grids, Web
Services, relational databases, annotation servers)
Just as the Web browser acts as a client for Web content,
BioHaystack can act as a client for biological Semantic Web
content and services
Real world collaboration: myGrid
UK-funded joint project with the University of
Manchester and other UK research institutions
RDF-based platform for supporting e-Science
experiments
Real use cases; developed in collaboration with
bioinformaticians
myGrid creates LSIDs and RDF metadata in the
process of enacting experiments for scientists
Using BioHaystack as a browser for metadata
myGrid Architecture
[Figure: myGrid architecture diagram. Labeled elements: Bioinformaticians and Scientists (users); Registry (query & register); Annotation/description; Discovery View (query & retrieve); Annotation providers; Pedro annotation tool; Interface description; Vocabulary; Ontology Store; Service providers (WSDL, Soaplab, others); Taverna workflow builder; FreeFluo workflow enactor (workflow execution, invoking services); mIR (store data/knowledge); Haystack provenance browser; Data descriptions]
Courtesy of Professor Carole Goble, University of Manchester
BioHaystack + myGrid
Courtesy of Professor Carole Goble, University of Manchester
Thank you for your attention
Dennis Quan, [email protected] (IBM Watson Research)
Haystack project home page (download available May 24)
– http://haystack.lcs.mit.edu/
IBM LSID home page
– http://www.ibm.com/developerworks/oss/lsid/
myGrid home page
– http://www.mygrid.org.uk/
See also our session on constructing Haystack applications:
– Developer’s Day, Saturday, 4:30pm