Ontologies - Bioinformatics


e-science is… Legos
17/07/2015
BioAID
“Science is built up of facts, as a house is built of
stones; but an accumulation of facts is no more a
science than a heap of stones is a house.”
– Henri Poincaré,
Science and Hypothesis, 1905
http://adaptivedisclosure.org
Who will annotate the
annotators themselves?
facilitating resource management
with (semantic) web services
M. Scott Marshall
Examples on web
• Less accessible example: the WSDL list for the AIDA
services at http://ws.adaptivedisclosure.org/ (these
services “annotate”)
• Human-readable service info:
http://xml.ddbj.nig.ac.jp/wsdl/index.jsp
• But not machine-readable.
Outline
• Vision – an e-science virtual laboratory
• Some definitions
• Some requirements
• Essential concepts of semantic web
• facets for interfaces
• Conclusions
The Vision:
Scientist as knowledge worker
• For Knowledge Workers:
– Knowledge is the data (i.e. rules, relations, properties,
hypotheses, etc.)
• For Today's Biologist:
– Numbers, sequences, organisms(!), and images are the data
• Manipulate knowledge instead of data
– Find support for relations between concepts instead of
discovering table and column names and numbers.
• In the virtual laboratory, everything is a resource that can
be described and manipulated with semantics
User
• ....?
– End users – scientists using our applications
– API users – programmers extending and using
our code
– System administrators – setting up services,
grids etc.
– Other classes...
• If you’re not sure which one someone
means please shout and ask them!
Slide courtesy of Tom Oinn, OMII-EBI Workshop
Service Oriented Architecture (SOA)
• A way of doing computing where services are
somehow combined to perform some overall
function
• Implies a communication framework between the
services
• Used because it’s easier to reconfigure the
arrangement of a set of services than to rewrite a
script
– Services as LEGO bricks 
Slide courtesy of Tom Oinn, OMII-EBI Workshop
Grid
• Not just Globus, or EGEE, or Naregi...
• No such thing as ‘the grid’
– Unlike ‘the internet’ which does exist!
• We mean :
– A computational facility, normally comprising multiple
computers, which provides some combination of
compute and data storage capacity and which can
abstract over its inner workings in some fashion
– Very loose definition!
• Can be part of a Service Oriented Architecture
Slide courtesy of Tom Oinn, OMII-EBI Workshop
Knowledge
“data”, “information”, “facts”,
“knowledge”
Knowledge is a statement that can be
tested for truth.
(by a machine)
RDF : a web format for knowledge
RDF is a W3C language to express
statements.
RDF Triple:
Subject Predicate Object
Graph of Knowledge:
Node
Edge
Node
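A minimal sketch of the triple idea, assuming plain Python tuples instead of a real RDF library. The `ex:` identifiers and the helper names `match` and `holds` are made up for illustration; they are not part of RDF or of the AIDA services.

```python
# Each statement is a (subject, predicate, object) triple.
# Subjects and objects are the nodes of the graph, predicates are the edges.
# All identifiers below are invented examples, not real URIs.
triples = {
    ("ex:p53", "rdf:type", "ex:Gene"),
    ("ex:p53", "ex:involvedIn", "ex:apoptosis"),
    ("ex:apoptosis", "rdf:type", "ex:BiologicalProcess"),
}

def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

def holds(triples, s, p, o):
    """Test a statement for truth against the graph."""
    return (s, p, o) in triples

about_p53 = match(triples, s="ex:p53")
```

Here `holds` is the machine test for truth mentioned on the Knowledge slide, and `match` is a very reduced stand-in for a graph query.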
OWL : The Web Ontology Language
A W3C standard for ontology
representation based on description
logic.
Resources are shared on the web
• Shared:
– CPU time
– network bandwidth
– memory
– storage space
• But also:
– Data
– Knowledge
– Services
Computational experiment:
what we want to do with the resources
[Figure: Database, Database, ..., Database; Computational experiment in workflow environment]
What are the tasks?
• Search – discovering resources that
match our needs
• Workflow composition
• Data integration
• Enactment/Deployment
• Access control
• Registry of a resource
Issues raised by computational
experimentation
• How will we find relevant data?
• How will we automatically integrate such data
into our experiment?
• How will we find appropriate services?
• How will we integrate our results as usable data
for a new (computational) experiment?
• -> annotation
Finding the stone…
Where is the piece that is red, has a triangular top,
and was previously used to build a roof?
Computational Experiments
Anticipated needs of the data consumer
• Data integration - combining different types of data
– Data annotation: beyond formats
• Not only:
– Data types (integer, string, etc.)
• But also:
– Data semantics: What do the data represent?
» Determined by the experimental design
– Provenance: What has been done to the data?
» Description of the procedure(s) that produced/transformed the data
• Discover and enact appropriate (web) services with appropriate
data
• Reuse results from a computational experiment as data in another
computational experiment
– derived data is “tagged” and put into the repository
Anticipated needs of the data supplier
(and consumer)
• Data in:
– Simple submission/registration of data to e-science repository
• Semi-automatic annotation
• Data out:
– Easy search and retrieval of previous datasets (my personal
and my group’s data)
– Easy search and retrieval of relevant datasets from public
repository
• Combining data:
– Different types and different sources
• Example: Intersecting views of data
– data mapped to physical or semantic space (examples follow)
The Semantic Gap
User
Application
Middleware
Resources
The Model in the middle
My Model
User
Model
Application
Model
Middleware
Resources
Why semantic annotation?
We want annotation to be “machine-readable”:
• Free text – arbitrary text tags generated by users won’t always
match up
– Simplest problem: Finding a “named” object
• Synonyms - Different names exist for the same object in different contexts
and roles.
• Homonyms - The same name is used for different objects.
• Which name should I use?
• Standardized vocabulary list
– can only find literal matches
• Example: Using data types to search for services will find too many!
• Semantic tags
– allow searching for similar items:
• “Find items like this one.”
– allow searching with a description:
• “Find items with these properties.”
– semantic description of service (SA-WSDL) as well as data (OWL)
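The two kinds of semantic search named above can be sketched as follows. This is an illustration under stated assumptions: the resources, their tags, and the functions `find_with_properties` and `find_similar` are hypothetical, and Jaccard similarity is just one simple way to define "like this one".

```python
# Hypothetical resources annotated with semantic tags as (property, value) pairs.
resources = {
    "service1": {("input", "DNA sequence"), ("output", "alignment"), ("kind", "service")},
    "service2": {("input", "protein sequence"), ("output", "alignment"), ("kind", "service")},
    "dataset1": {("organism", "mouse"), ("kind", "dataset")},
}

def find_with_properties(resources, wanted):
    """'Find items with these properties': every wanted tag must be present."""
    wanted = set(wanted)
    return sorted(name for name, tags in resources.items() if wanted <= tags)

def find_similar(resources, name, threshold=0.3):
    """'Find items like this one': rank others by Jaccard similarity of tag sets."""
    ref = resources[name]
    scored = []
    for other, tags in resources.items():
        if other == name:
            continue
        score = len(ref & tags) / len(ref | tags)
        if score >= threshold:
            scored.append((other, score))
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

Unlike a literal match on a data type, a query such as `find_with_properties(resources, [("output", "alignment")])` narrows the result by what the resource means, not only by its format.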
What is an ontology?
Definitions:
– A collection of things that are defined in terms of their properties and relations to
other things.
– A specification of a conceptualization that is designed for reuse across multiple
applications and implementations (Gruber ’93, ’95; Guarino ’96; Guarino and
Giaretta ’95)
General applications:
– Searching for objects that are resources, documents, concepts, experimental data,
or collections of these things.
– Knowledge capture
• Example: Biological model with hypothetical knowledge
Common applications in bioinformatics:
– Annotation of database entries (e.g. gene products)
– Categorization of clustered elements (e.g. genes)
Inheritance in ontologies
Animal
  Bird
    Robin
    Heron
    Penguin
  Mammal
• Often represented as DAGs (Directed Acyclic Graphs) or
hierarchies (trees)
• Power of inheritance
– Subsumption relations (ISA) apply transitivity to create
inheritance of class and properties downward along chains in
the hierarchy.
• Use an element as a metadata tag for semantic annotation
(ontotag)
– An ontotag serves as a pointer into a “semantic space”
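A small sketch of how transitive ISA links produce inheritance, using the class names from this slide. The placement of Penguin under Bird, the example properties, and the function names are assumptions of this sketch. It models a tree with one parent per class; a DAG would map each class to a set of parents instead.

```python
# Direct ISA links (child -> parent). Penguin under Bird is an assumption.
isa = {
    "Bird": "Animal",
    "Mammal": "Animal",
    "Robin": "Bird",
    "Heron": "Bird",
    "Penguin": "Bird",
}

# Properties asserted directly on a class (illustrative only).
properties = {
    "Animal": {"is_alive"},
    "Bird": {"has_feathers"},
}

def ancestors(cls):
    """Follow ISA links upward; transitivity yields every superclass."""
    chain = []
    while cls in isa:
        cls = isa[cls]
        chain.append(cls)
    return chain

def is_a(cls, other):
    """True if cls equals other or is subsumed by it."""
    return cls == other or other in ancestors(cls)

def inherited_properties(cls):
    """Collect properties asserted on the class and on all its superclasses."""
    props = set(properties.get(cls, set()))
    for parent in ancestors(cls):
        props |= properties.get(parent, set())
    return props
```

An item tagged with the ontotag `Robin` is therefore also found by a query for `Bird` or `Animal`, which is what makes the tag a pointer into a semantic space instead of a plain label.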
Gene Ontology
Mouse p53: {List of GO identifiers}
Process: apoptosis, DNA damage response, signal transduction by p53 class mediator...
Component: cytoplasm, cytosol...
Function: DNA binding, protein binding...
Cluster of genes X from microarray analysis
Collection of {List of GO identifiers} per gene in cluster
=> Most prevalent GO identifiers:
=> Apoptosis, Cytosol, Protein Binding
=> Significant relationships between GO classes
(e.g. cell death and DNA damage response)
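The "most prevalent GO identifiers" step can be sketched as a simple count over the annotations of the genes in a cluster. The gene names and annotations below are invented for illustration, and `most_prevalent_terms` is a hypothetical helper. A real analysis would also test whether a term is over-represented relative to a background set, which this sketch does not do.

```python
from collections import Counter

# Hypothetical cluster: gene name -> set of GO term labels (invented data).
cluster = {
    "geneA": {"apoptosis", "cytosol", "protein binding"},
    "geneB": {"apoptosis", "cytosol", "DNA binding"},
    "geneC": {"apoptosis", "protein binding", "cytoplasm"},
}

def most_prevalent_terms(cluster, top=3):
    """Count how many genes carry each term and return the most frequent ones."""
    counts = Counter(term for terms in cluster.values() for term in terms)
    # Sort by descending count, then alphabetically for a stable result.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top]
```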
Semantic annotation - ontotags
Provenance
Evidence
Ontology
Metadata
Author
Workflow provenance
Author
Evidence
Scientific Model
Data type
Data value(s)
Gene
Ontology
Resource management use case: data integration
Finding a basis for relation
Epigenetic
Mechanisms
Hypothesis
“There is a relation”
Chromatin
Transcription
Factors
Histone
Modification
Transcription Factor
Binding Sites
Classes
Instances
Transcription
Common Domain
position
KSinBIT’06
Scenario: A Use Case is born
• E-scientist explains benefits of semantic web to
(wet lab) biologist
• Biologist wants to see a demonstration with
actual data
• => Use Case: Find evidence of a relation
between transcription and histone modifications
• Our approach: Annotate data with our own
semantic types so that we can issue a query
using our own terms
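One way to picture this approach, as a sketch only: data items carry our own semantic type plus a genomic position, and the query relates two types through that shared position. The records, the type names, and the functions `overlaps` and `related` are all hypothetical; the slides do not specify how the actual query was implemented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotated:
    semantic_type: str  # our own term, e.g. "HistoneModification"
    chromosome: str
    start: int
    end: int

# Invented records for illustration; not real measurements.
data = [
    Annotated("HistoneModification", "chr1", 100, 200),
    Annotated("TranscriptionFactorBindingSite", "chr1", 150, 180),
    Annotated("TranscriptionFactorBindingSite", "chr2", 150, 180),
]

def overlaps(a, b):
    """Two items share a position if they lie on the same chromosome and intersect."""
    return a.chromosome == b.chromosome and a.start <= b.end and b.start <= a.end

def related(data, type_a, type_b):
    """Query in our own terms: pairs of the two semantic types at a common position."""
    return [
        (a, b)
        for a in data if a.semantic_type == type_a
        for b in data if b.semantic_type == type_b
        if overlaps(a, b)
    ]

pairs = related(data, "HistoneModification", "TranscriptionFactorBindingSite")
```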
E-science perspective on data integration:
From cartoon to model to semantic data integration
Computer
readable
model
Biological
concepts
(‘myModel’)
Biologist
readable
model
Data
Some of the pieces we need
• knowledge representation – triples
• pointing at things: EPRs and URIs, not just the things
but the statements about the things
• unification and reasoning
• annotation: linking knowledge to resources
Provenance – example in Taverna
[Figure: Computational experiment; Database, Database, ..., Database]
Some provenance should be added by the module/service itself
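A hedged sketch of a service that adds its own provenance: a decorator records what was called, with which inputs, and when. The names `records_provenance`, `provenance_log`, and the example service are assumptions made for illustration; this is not how Taverna or the AIDA services record provenance.

```python
import functools
from datetime import datetime, timezone

# In a real system this would be a provenance store, not an in-memory list.
provenance_log = []

def records_provenance(func):
    """Wrap a service so that each call appends its own provenance record."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        provenance_log.append({
            "service": func.__name__,
            "inputs": {"args": args, "kwargs": kwargs},
            "output": result,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        return result
    return wrapper

@records_provenance
def reverse_complement(sequence):
    """Hypothetical service: reverse complement of a DNA sequence."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(sequence))

result = reverse_complement("ATGC")
```

Because the record is written inside the wrapper, the workflow author does not have to remember to annotate the derived data by hand.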
The AIDA toolbox
for knowledge extraction and
knowledge management
reusable components to enhance science
Living examples:
dynamic interfaces
• http://aida.science.uva.nl:9999/search/AID
• Yahoo Pipes interface to AIDA medline search:
http://pipes.yahoo.com/pipes/pipe.info?_id=cv7nIBpw3BGw4NOLJphxuA
• MeSH facet interface from Exhibit:
http://aida.science.uva.nl:9999/search/json_test.html
• W3C Health Care and Life Sciences KB (unofficial URL):
http://www.w3.org/2001/sw/hcls/notes/kb/
http://esw.w3.org/topic/HCLS/Banff2007Demo
Conclusions
• The Web is a collection of resources: resource sharing
• Disclosure of semantic models can greatly enhance
resource sharing and resource management
• Semantic annotation can be applied to any type of
resource: data and (web) services.
• Semantic annotation and provenance can be added by the
(web) services themselves.
• Need text mining for web services (to support semantic
annotation)
• Need web services for text mining
The End
“Science is built up of facts, as a house is built of
stones; but an accumulation of facts is no more a
science than a heap of stones is a house.”
– Henri Poincaré,
Science and Hypothesis, 1905
http://adaptivedisclosure.org