Slide - Gerstein Lab

Bioinformatics 2.0/3.0
Kei Cheung
Yale Center for Medical Informatics
Outline
• Introduction
• Web 2.0
• Web 3.0
– Semantic Web
– Topic Map
• Merging Web 2.0 and Web 3.0
Introduction
• The Human Genome Project (HGP) has transformed
genome sciences from being experimental to being
increasingly computational
• HGP has intensified the growth of bioinformatics
• The Web has become a popular medium for accessing
information over the Internet
• Numerous bioinformatics databases and tools are Web
accessible
• These databases and tools as well as the Web have
become indispensable for modern-day genomic research
• Web 1.0 -> Web 2.0 -> Web 3.0
Web 1.0
• It is read-only
• It is about a single person, organization, …
• It is document-centric
• It is based on HTML
• It is for humans to read
Web 2.0
• Social networking (wiki, blog, tagging,
bookmarking, rating, etc)
• Multimedia content (photo, audio, video,
etc)
• Interactive, responsive, and dynamic web
interface (Facebook, Flickr, YouTube, etc)
• Mashup (assembly tools and visualization
tools)
Folksonomy (Social Tagging)
• Folksonomy is the practice and method
of collaboratively creating and managing
tags to annotate and categorize content
• In contrast to traditional subject indexing,
metadata is generated not only by experts
but also by creators and consumers of the
content
• Freely chosen keywords are used instead
of a controlled vocabulary
Tag Cloud
• A tag cloud (or weighted list in visual
design) is a visual depiction of user-generated tags, typically used to describe
the content of websites.
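A minimal sketch of how the weights behind a tag cloud might be computed, assuming font size scales linearly with how often a tag was used. The function name `tag_cloud_sizes`, the sample tags, and the 10 to 32 point size range are illustrative assumptions, not taken from the slides.

```python
from collections import Counter

def tag_cloud_sizes(tags, min_size=10, max_size=32):
    """Map each tag to a font size proportional to how often it was used."""
    counts = Counter(tags)
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {
        tag: round(min_size + (n - lo) * (max_size - min_size) / span)
        for tag, n in counts.items()
    }

# Hypothetical tags that users attached to a set of bookmarks
tags = ["genome", "protein", "genome", "pathway", "genome", "protein"]
sizes = tag_cloud_sizes(tags)
```

The most frequent tag receives `max_size` and the least frequent receives `min_size`; real tag clouds often use logarithmic scaling instead, which is a design choice this sketch does not cover.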
Web 2.0 (cont’d)
• It is decentralized
• It is a community/collaborator model
instead of an authority/consumer model
• It is fun
• It can also be put to serious use: sharing and
integrating scientific datasets and algorithms
Bioinformatics Applications of Web 2.0
Wiki Proteins
Nature Precedings (pre-publication
research and preliminary findings)
Scientific Podcasts
Multimedia (cont’d)
Journal of Visualized Experiments
myExperiment
Mashup (1): Assembly Tools
• Dapper (scrape web content and convert it
into machine readable format)
• Yahoo! Pipes (fetch, filter, and integrate
data)
Yahoo! Pipes Demo
Yahoo! Pipes Use Case
GeoCommons: Mashup of Maps
Mashup (2): Visualization Tools
• E.g., Google Earth
Geo-Mashup: Google Earth
(tracking H5N1 virus over time)
Bioinformatics Mashups
• Mashup of biological entities of the same
type
– Protein network mashup
– Sequence annotation mashup
• Mashup of biological entities of different
types
Mashup of pathway data and gene
expression data
Calvin cycle pathway associated with gene expressions
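A mashup of entities of different types comes down to joining two datasets on a shared identifier. The sketch below assumes the gene symbol is that shared key; the pathway steps, gene symbols, expression values, and the function name `mash_up` are hypothetical and only illustrate the idea.

```python
# Hypothetical pathway data: each step names the gene encoding its enzyme
pathway_steps = [
    {"step": "carbon fixation", "gene": "rbcL"},
    {"step": "regeneration", "gene": "PRK"},
    {"step": "reduction", "gene": "GAPA"},
]

# Hypothetical expression measurements keyed by gene symbol (made-up values)
expression = {"rbcL": 8.2, "PRK": 5.1}

def mash_up(steps, expr):
    """Attach an expression value to each pathway step (None if unmeasured)."""
    return [{**step, "expression": expr.get(step["gene"])} for step in steps]

merged = mash_up(pathway_steps, expression)
```

The join only works when both sources use the same identifier for the same gene, so the naming and ID problems listed under the data mashup challenges apply directly.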
Challenges to Data Mashup
• Lack of annotation
• Lack of links
• Lack of link semantics
• Lack of data semantics
• Lack of standards or use of standards
Lack of Semantic Annotation
Kei Tsi Daniel Cheng
(this is not me!!)
Kei Cheung
(16 years ago)
Kei Cheung
(6 months ago)
Lack of Links
collaborators
Lack of Link Semantics
(?)
prototyped
Lack of Data Semantics
<html>
<body>
…
<table>
<tr>
<td>Alcohol Dehydrogenase 1B (class I), beta polypeptide</td><td>ADH1B</td>
</tr>
…
</table>
…
</body>
</html>
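To see why such a table lacks data semantics, consider what a program can recover from it. The sketch below uses Python's standard `html.parser` to pull out the cell text; the class name `CellCollector` and the condensed `page` string (the "…" placeholders are omitted) are assumptions made for illustration.

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])       # start a new row
        elif tag == "td":
            self._in_td = True
            self.rows[-1].append("")   # start a new cell in the current row

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.rows[-1][-1] += data  # accumulate text inside the cell

page = ("<html><body><table><tr>"
        "<td>Alcohol Dehydrogenase 1B (class I), beta polypeptide</td>"
        "<td>ADH1B</td></tr></table></body></html>")

collector = CellCollector()
collector.feed(page)
rows = collector.rows
```

The result is a pair of bare strings. Nothing in the markup tells the program that the first is a gene name and the second is its symbol; that knowledge exists only in the reader's head.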
Lack of Standards (Use of
Standards)
• Different naming rules (based on phenotype, sequence,
function, organisms, etc)
– Armadillo (fruit flies) vs. β-catenin (mice)
– PSM1 (human) = PSM2 (yeast); PSM1 (yeast) = PSM2 (human)
– Sonic Hedgehog
• ID proliferation
– Different ID schemes: 1OF1 (PDB ID) and P06478 (SwissProt
ID) correspond to Herpes Thymidine Kinase
– Lexical variation: GO1234, GO:1234, GO-1234
• Synonyms vs. homonyms
– Dopamine receptor D2: DRD2, DRD-2, D2
– PSA: prostate specific antigen, puromycin-sensitive
aminopeptidase, psoriatic arthritis, pig serum albumin
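Lexical variation is the easiest of these problems to handle in code. The sketch below normalizes the three spellings above to one canonical form. It assumes the short numbers on the slide abbreviate the seven-digit, zero-padded accessions that Gene Ontology uses (GO:nnnnnnn); the function name `normalize_go_id` and the set of accepted separators are illustrative choices.

```python
import re

# Accept "GO" followed by an optional separator and a run of digits
GO_PATTERN = re.compile(r"^GO[:\-_ ]?(\d+)$", re.IGNORECASE)

def normalize_go_id(raw):
    """Return the identifier in canonical 'GO:nnnnnnn' form."""
    match = GO_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"not a recognizable GO identifier: {raw!r}")
    # Assumption: short numbers are zero-padded to seven digits
    return f"GO:{int(match.group(1)):07d}"

variants = ["GO1234", "GO:1234", "GO-1234"]
canonical = {normalize_go_id(v) for v in variants}
```

Synonyms and homonyms such as PSA cannot be resolved this way, because the string alone does not say which meaning is intended; that requires context or a curated mapping.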
Web 3.0
• It refers to a third generation of Internet-based
services that emphasize machine-facilitated
understanding of information in
order to provide a more productive and
intuitive user experience.
– Semantic Web
– Topic Map
Semantic Web
• "The Semantic Web is an extension of the current web in
which information is given well-defined meaning, better
enabling computers and people to work in cooperation." - Tim Berners-Lee, James Hendler, Ora Lassila, The
Semantic Web, Scientific American, May 2001
• It provides a common framework that allows data to be
shared and reused across application, enterprise, and
community boundaries
• It is based on the Resource Description Framework (RDF)
– URIs for naming/identifying web objects
– Graph structure (directed, labeled graph) for connecting
web objects
Resource Description Framework
(RDF)
• It is a standard data model (directed, labeled graph)
for representing information (metadata) about
resources in the World Wide Web
• In general, it can be used to represent information
about “things” or “resources” that can be identified
(using URI’s) on the Web
• It is intended to provide a simple way to make
statements (descriptions) about Web resources
RDF Statement
An RDF statement consists of:
• Subject: resource identified by a URI
• Predicate: property (as defined in a namespace identified by a
URI)
• Object: property value (literal) or a resource
A resource can be described by multiple statements.
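The subject/predicate/object structure can be sketched as a plain tuple type. The example reuses the gene URI and property URIs that appear on the following slide; the names `Statement` and `describe` are assumptions made for illustration, not part of any RDF library.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    subject: str    # URI of the resource being described
    predicate: str  # URI of the property
    obj: str        # literal value, or the URI of another resource

GENE = ("http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
        "?db=gene&cmd=Retrieve&list_uids=125")
WIKI = "http://en.wikipedia.org/wiki/"

# Two statements about the same resource
statements = [
    Statement(GENE, WIKI + "Name",
              "Alcohol Dehydrogenase 1B (class I), beta polypeptide"),
    Statement(GENE, WIKI + "Synonym", "ADH1B"),
]

def describe(resource, stmts):
    """Gather every property/value pair stated about one resource."""
    return {s.predicate: s.obj for s in stmts if s.subject == resource}

description = describe(GENE, statements)
```

Because every statement carries the full subject URI, statements from different sources about the same resource can be pooled without any prior agreement on table layout.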
Graphical & XML Representation
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&list_uids=125
http://en.wikipedia.org/wiki/Name
“Alcohol Dehydrogenase 1B (class I), beta polypeptide”
http://en.wikipedia.org/wiki/Synonym
“ADH1B”
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:en="http://en.wikipedia.org/wiki/">
<rdf:Description rdf:about="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&amp;cmd=Retrieve&amp;list_uids=125">
<en:name>Alcohol Dehydrogenase 1B (class I), beta polypeptide</en:name>
<en:synonym>ADH1B</en:synonym>
</rdf:Description>
</rdf:RDF>
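As a rough illustration of how the XML form maps back to statements, the sketch below reads a well-formed version of the RDF/XML with Python's standard `xml.etree.ElementTree`. It handles only this flat shape (one `rdf:Description` with literal-valued child properties) and is not a general RDF/XML parser; the function name `triples` is an assumption.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

rdf_xml = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:en="http://en.wikipedia.org/wiki/">
  <rdf:Description rdf:about="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&amp;cmd=Retrieve&amp;list_uids=125">
    <en:name>Alcohol Dehydrogenase 1B (class I), beta polypeptide</en:name>
    <en:synonym>ADH1B</en:synonym>
  </rdf:Description>
</rdf:RDF>"""

def triples(xml_text):
    """Extract (subject, predicate, object) tuples from flat RDF/XML."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    result = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree spells qualified names as "{namespace}local"
            namespace, local = prop.tag[1:].split("}")
            result.append((subject, namespace + local, prop.text))
    return result

extracted = triples(rdf_xml)
```

Note that the ampersands in the subject URI must be escaped as `&amp;` inside the XML attribute; the parser returns them unescaped.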
RDF Schema (RDFS)
• RDF Schema terms:
– Class
– Property
– type
– subClassOf
– range
– domain
• Example:
<DNASequence, type, Class>
<Promoter, subClassOf, DNASequence>
<Protein, type, Class>
<TranscriptionFactor, subClassOf, Protein>
<Bind, type, Property>
<Bind, domain, TranscriptionFactor>
<Bind, range, Promoter>
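The value of such a schema is that a machine can derive facts that were never stated. The sketch below applies three simplified rules (domain, range, and subClassOf propagation) to the example schema. The instance statement about `TF_X` and `promoter_Y` is hypothetical, and the function `infer` is a toy fixed-point loop, not a complete RDFS reasoner.

```python
schema = {
    ("DNASequence", "type", "Class"),
    ("Promoter", "subClassOf", "DNASequence"),
    ("Protein", "type", "Class"),
    ("TranscriptionFactor", "subClassOf", "Protein"),
    ("Bind", "type", "Property"),
    ("Bind", "domain", "TranscriptionFactor"),
    ("Bind", "range", "Promoter"),
    # Hypothetical instance data, not in the slides:
    ("TF_X", "Bind", "promoter_Y"),
}

def infer(graph):
    """Apply domain, range, and subClassOf rules until nothing new appears."""
    graph = set(graph)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                if p2 == "domain" and s2 == p:
                    new.add((s, "type", o2))  # subject belongs to the domain
                if p2 == "range" and s2 == p:
                    new.add((o, "type", o2))  # object belongs to the range
                if p == "type" and p2 == "subClassOf" and s2 == o:
                    new.add((s, "type", o2))  # membership passes to superclass
        if new <= graph:
            return graph
        graph |= new

inferred = infer(schema)
```

From the single statement that `TF_X` binds `promoter_Y`, the loop derives that `TF_X` is a TranscriptionFactor and therefore a Protein, and that `promoter_Y` is a Promoter and therefore a DNASequence.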
Ontologies
• In both computer science and information
science, an ontology is a representation of a
set of concepts within a domain and the
relationships between those concepts.
• It is a shared conceptualization of a domain
• Ontologies are commonly encoded using
ontology languages.
Web Ontology Language (OWL)
• Latest standard in ontology languages from the W3C
• Built on top of RDF
• OWL semantically extends RDF while it is
syntactically the same as RDF
• Three species of OWL
– OWL-Lite
– OWL-DL
– OWL-Full
OWL > RDF/RDFS
• Cardinality restrictions (e.g., a gene may have more
than one transcription factor binding site)
• Disjointness of classes (e.g., a transcript segment is
either an intron or an exon, not both)
• Other OWL constructs
– uniqueness (e.g., a GO term can have only one GO identifier)
– unionOf (e.g., a gene may be the unionOf its introns and exons)
– sameAs: specifying a synonymous relationship between classes
(e.g., “Cerebellar Purkinje Cell” sameAs “Purkinje Neuron”)
Topic Map
• A topic map (an ISO standard) is used to represent
information using topics (concepts), associations, and
occurrences
• It is used to organize information in a way that can be
optimized for navigation.
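The three building blocks can be sketched as a small in-memory data structure. This is not XTM or any standard topic map API; the class names, the neuroscience example, and the `example.org` occurrence URL are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    name: str
    # Occurrences: links to resources that are about this topic
    occurrences: list = field(default_factory=list)

@dataclass
class TopicMap:
    topics: dict = field(default_factory=dict)
    # Associations: (topic name, relation, topic name)
    associations: list = field(default_factory=list)

    def add_topic(self, name, *occurrences):
        self.topics[name] = Topic(name, list(occurrences))

    def associate(self, a, relation, b):
        self.associations.append((a, relation, b))

    def neighbors(self, name):
        """Topics one association away from `name`, in either direction."""
        found = []
        for a, relation, b in self.associations:
            if a == name:
                found.append((relation, b))
            elif b == name:
                found.append((relation, a))
        return found

tm = TopicMap()
tm.add_topic("Purkinje cell", "http://example.org/purkinje-cell")  # hypothetical URL
tm.add_topic("Cerebellum")
tm.associate("Purkinje cell", "located in", "Cerebellum")
nearby = tm.neighbors("Cerebellum")
```

Navigation then amounts to following associations from topic to topic and opening the occurrences attached to each topic along the way.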
association
occurrence
Neuroscience Topic Map
Topic Map Encoding/Querying
• XML Topic Map (XTM)
• Topic Map Query Language (TMQL)
Visual Topic Maps
• A Visual Topic Map can be defined as a
topic map including visual topics. A visual
topic is defined by a topic name which
refers to a visual content.
NCBI Site Map
Mosaic of Chinese Characters in
Stories about the Meaning of Ideograms
Visualization of the del.icio.us Tags
in an Interactive Graph
Combining Semantic Web and
Topic Map
Visualization
Machine reasoning
Topic Map
Semantic Web
Knowledge organization & representation
(mapping between XTM and RDF/OWL)
Web 2.0 Meets Web 3.0
• Folksonomy meets ontology
– Tags can evolve into standard heavy-weight
ontologies, while light-weight ontologies can be
applied to tagging
• Human readability meets machine readability
– Visual network vs. semantic network
• Social network meets semantic network
– FOAF, semantic wiki
• Syntactic mashup meets semantic mashup
– Dapper and Yahoo! Pipes may become ontologically
aware
Conclusions
• Web 2.0 and 3.0 provide a platform for
data/tool sharing and integration (mashup)
and scientific collaboration
• More use cases are needed
• Question?
– While Web 1.0 has played an important role in
organizing/disseminating information
produced by HGP, can Web 2.0/3.0 offer more
to present-day “big science” projects like
ENCODE?
The End