Grid, Ontologies and the Semantic Web


Information Grids, the Semantic Web & Why Ontologies Matter
Professor Carole Goble
University of Manchester
UK
Comparative Functional Genomics
• Vast amounts of data & escalating
• Highly heterogeneous
  • Data types
  • Data forms
  • Community
• Highly complex and inter-related
• Volatile
Take home message
• Content with extensive metadata
• Services that exploit this enriched content
• Knowledge
• Fundamentally involves the construction and deployment of ontologies
• Ontologies on a Grid scale need reasoning support
[Figure: labels "Grid Computing", "?", "eScience", "Agents", "e-Business", "Web Services"]
[Figure: interoperation layer stack, top to bottom: process, views/queries, ontologies, metamodels, objects, transport, packet, data link, physical. The stack is annotated with "Semantic interoperation" and "Data exchange".]
Web Service Descriptions => Automated
• Discovery & Search
• Selection
• Matching
• Composition & Interoperation
• Invocation
• Execution monitoring
What to describe?
[Figure: Resource provides Service. Service presents a Service profile (what it does), is describedby a Service model (how it works), and supports a Service grounding (how to access it). Further labels: description, functionalities, functional attributes.]
The Tower of Babel
Interoperating resources, be it by people or systems, requires a consistent shared understanding of what the information contained means
“... people [and machines] can’t share knowledge if they don’t speak a common language” (Davenport)

Metadata
• Data describing the content and meaning of resources
• But everyone must speak the same language…

Terminologies
• Shared and common vocabularies
• For search engines, agents, curators, authors and users
• But everyone must mean the same thing…

Ontologies
• Shared and common understanding of a domain
• Essential for search, exchange and discovery

Machine processable Knowledge on the Web


Annotating services requires a shared vocabulary
Ontologies:
• a vocabulary of terms,
• a precise and principled specification of their meaning
• structure on the domain of the terms
• constrain the possible interpretations of terms
Inference applies the knowledge in the metadata and the ontology to create new metadata and new knowledge
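The inference step described above can be sketched in a few lines of Python. This is an illustration only: the toy is-a hierarchy, the annotation on PRIO_HUMAN and the function names `ancestors` and `infer_annotations` are assumptions made for the example, not part of any tool named in the talk.

```python
# Toy ontology: each term maps to its direct parent (is-a).
# The terms and links are illustrative assumptions.
IS_A = {
    "prion protein": "glycoprotein",
    "glycoprotein": "protein",
    "protein": "macromolecule",
}

def ancestors(term, is_a=IS_A):
    """Return every term reachable from `term` by following is-a links."""
    found = []
    while term in is_a:
        term = is_a[term]
        found.append(term)
    return found

def infer_annotations(metadata, is_a=IS_A):
    """Add the implied, more general terms to each resource's annotations."""
    inferred = {}
    for resource, terms in metadata.items():
        closed = set(terms)
        for t in terms:
            closed.update(ancestors(t, is_a))
        inferred[resource] = closed
    return inferred

# Stored metadata says only "prion protein"; the ontology supplies the rest.
metadata = {"PRIO_HUMAN": {"prion protein"}}
new_metadata = infer_annotations(metadata)
print(sorted(new_metadata["PRIO_HUMAN"]))
```

A search for "protein" would now find the resource even though nobody annotated it with that term explicitly.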
Three Layer Orthodoxy (Schreiber et al. 1998)
• Knowledge Layer
  Mining: inference, prediction & discovery
• Information Layer
  Middleware & Metadata: discovery, description, interoperation, association, sharing, composition, personalisation
• Data / Computation Layer
What is an Ontology?
[Figure: spectrum of ontology kinds, from lightweight to formal: Catalog/ID; Terms/glossary; Thesauri ("narrower term" relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restrs.; Disjointness, Inverse, part-of…; General Logical constraints. From Debbie McGuinness]
Ontology desiderata
• Precision
• Formal, unambiguous
• High fidelity
• Explicitness
• Clarity
• Commitment
• Reuse
• Systematic
• Quality
• Clarity
• Flexibility
• Expressivity
• Evolution
• machine computable
Ontology Description Space
[Figure: labels Coverage; Knowledge representational languages; Inference mechanisms; Expressivity (Taxonomy, Relationships, Axioms)]
What do Ontologies offer?
Controlled description and organisational framework
• Controlled vocabularies
• Accurate data collection or retrieval
• Classification
• Finding, sharing, discovering, navigation, indexing
Control
Controlled Vocabularies
ID PRIO_HUMAN STANDARD; PRT; 253 AA.
DE MAJOR PRION PROTEIN PRECURSOR (PRP) (PRP27-30) (PRP33-35C) (ASCR).
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
CC -!- FUNCTION: THE FUNCTION OF PRP IS NOT KNOWN. PRP IS ENCODED IN THE HOST GENOME AND IS
CC EXPRESSED BOTH IN NORMAL AND INFECTED CELLS.
CC -!- SUBUNIT: PRP HAS A TENDENCY TO AGGREGATE YIELDING POLYMERS CALLED "RODS".
CC -!- SUBCELLULAR LOCATION: ATTACHED TO THE MEMBRANE BY A GPI-ANCHOR.
CC -!- DISEASE: PRP IS FOUND IN HIGH QUANTITY IN THE BRAIN OF HUMANS AND ANIMALS INFECTED WITH
CC NEURODEGENERATIVE DISEASES KNOWN AS TRANSMISSIBLE SPONGIFORM ENCEPHALOPATHIES OR PRION
CC DISEASES, LIKE: CREUTZFELDT-JAKOB DISEASE (CJD), GERSTMANN-STRAUSSLER SYNDROME (GSS),
CC FATAL FAMILIAL INSOMNIA (FFI) AND KURU IN HUMANS; SCRAPIE IN SHEEP AND GOAT; BOVINE
CC SPONGIFORM ENCEPHALOPATHY (BSE) IN CATTLE; TRANSMISSIBLE MINK ENCEPHALOPATHY (TME);
CC CHRONIC WASTING DISEASE (CWD) OF MULE DEER AND ELK; FELINE SPONGIFORM ENCEPHALOPATHY
CC (FSE) IN CATS AND EXOTIC UNGULATE ENCEPHALOPATHY(EUE) IN NYALA AND GREATER KUDU. THE
CC PRION DISEASES ILLUSTRATE THREE MANIFESTATIONS OF CNS DEGENERATION: (1) INFECTIOUS (2)
CC SPORADIC AND (3) DOMINANTLY INHERITED FORMS. TME, CWD, BSE, FSE, EUE ARE ALL THOUGHT TO
CC OCCUR AFTER CONSUMPTION OF PRION-INFECTED FOODSTUFFS.
CC -!- SIMILARITY: BELONGS TO THE PRION FAMILY.
KW Prion; Brain; Glycoprotein; GPI-anchor; Repeat; Signal; Polymorphism; Disease mutation.
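As a rough illustration of why the KW line is useful to software while the free-text CC lines are not, the sketch below pulls the controlled-vocabulary keywords out of a record like the one above. The helper names `parse_flat_record` and `keywords` are hypothetical, and the parser assumes only that each line begins with a two-letter line code; it is not a full flat-file parser.

```python
def parse_flat_record(text):
    """Group the lines of a flat-file record by their two-letter line code."""
    fields = {}
    for line in text.splitlines():
        if len(line) < 2:
            continue  # skip blank or truncated lines
        code, value = line[:2], line[2:].strip()
        fields.setdefault(code, []).append(value)
    return fields

def keywords(fields):
    """Split the KW lines into individual controlled-vocabulary terms."""
    joined = " ".join(fields.get("KW", []))
    return [kw.strip() for kw in joined.rstrip(".").split(";") if kw.strip()]

# A shortened copy of the record shown above.
record = """ID PRIO_HUMAN STANDARD; PRT; 253 AA.
OS Homo sapiens (Human).
CC -!- SIMILARITY: BELONGS TO THE PRION FAMILY.
KW Prion; Brain; Glycoprotein; GPI-anchor; Repeat; Signal; Polymorphism; Disease mutation.
"""

fields = parse_flat_record(record)
print(keywords(fields))
```

The keywords come out as clean terms that can be matched against a vocabulary; the CC comment would need text mining to yield anything comparable.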
[Figure: data resources: Functional genomics, Tissue, Structural genomics, Disease, Population genetics, Genome sequence, Clinical data, Clinical trial]



• Data resources have been built introspectively for human researchers
• Information is machine readable not machine understandable
• Sharing vocabulary is a step towards unification
What do Ontologies offer?
Community reference model
• Common framework for integration: OpenMMS, TAMBIS
• Search support: querying and matching
• Information extraction: PASTA
• Information checking: Irbane
• Intelligent interfaces for queries and accurate data capture
Control + Semantics
Quality: reap what you sow
“The problem is: the databases are God-awful. … If the data is still fundamentally flawed, then better algorithms add little.”
Temple Smith, Director, Molecular Engineering Research Center, Boston University
The Web Services Stack
• Workflow: WFDL
• Adverts (description and discovery): UDDI
• Service connection: WSDL
• Ontology
• Message protocol: SOAP
• Message syntax: XML
• Transport: HTTP, TCP/IP
What do Ontologies offer?
Knowledge discovery
• Knowledge-acquisition tools
• Decision Support
• Hypothesis generation: RiboWeb, Ingenuity
Control + Semantics + Inference

“The technical advantages of knowledge
modeling are obvious. Knowledge bases can be
automatically checked for consistency; they
support inference mechanisms which derive
data which have not been explicitly stored;
they also offer extensive request and
navigation facilities. However, the most
immediate benefit of knowledge base design
lies in the modeling process itself, through
the effort of explication, organization and
structuration [sic] of the knowledge it
requires.”
Editorial, Bioinformatics, July 2000
Scale => Reasoning & Inference
1. Keeping the classification together
2. Expressing constraints and sticking to ‘em
• Ontology design
  • Creation, extension, maintenance
  • Large, multiply authored evolving ontologies
• Ontology integration
  • Merging
• Ontology deployment
  • Determining consistency of description & instances
  • Query validation/refinement/containment & Service matching
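One narrow reading of "determining consistency of description & instances" is checking instance data against disjointness constraints. The sketch below does only that. The class names, the sample individuals and the function `find_violations` are illustrative assumptions, and a real reasoner would also work over the class definitions themselves rather than just asserted memberships.

```python
# Disjointness axioms: no individual may belong to both classes of a pair.
DISJOINT = [("herbivore", "carnivore")]

def find_violations(instances, disjoint=DISJOINT):
    """Return (individual, class_a, class_b) for every broken disjointness axiom."""
    violations = []
    for name in sorted(instances):
        classes = instances[name]
        for a, b in disjoint:
            if a in classes and b in classes:
                violations.append((name, a, b))
    return violations

# Two asserted individuals; the second contradicts the axiom above.
instances = {
    "rabbit": {"animal", "herbivore"},
    "oddity": {"animal", "herbivore", "carnivore"},
}
print(find_violations(instances))
```

With many authors editing a large ontology, a check of this kind is what lets constraints be stated once and then enforced automatically.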

Ontologies are the cornerstone of encoding understanding, BUT to be shared they need a standard representation and exchange language
Requirements for an Ontology-language (1)
Well designed
• Useful and proven modelling primitives
• Intuitive to human users
• Can say simple things simply but as complex as necessary
• Expressive enough to capture many ontologies
• Efficient, sound and complete reasoning support
Requirements for an Ontology-language (2)
Well defined
• Clear syntax: read ontologies
• Formal semantics: understand (process) ontologies, to facilitate machine interpretation of that semantics
• Expressive enough to capture many ontologies

Requirements for an Ontology-language (3)
Compatible
• Easy mapping to/from other ontology languages
• Maximum compatibility with XML and RDF(S)

The Ontology Language Stack
[Figure: layered stack of Web and ontology languages, labels in extracted order: DAML-S, DAML-R, DAML+OIL, DAML-Ont, OIL, DC, RDF(S), XOL, Topic Maps, SMIL, RDF, HTML, XML + Name Space + XML Schema, Unicode, URI, PICS]
DAML+OIL
• Ontology language
• Web
• Logic with model theoretic semantics
• Classes, properties & axioms
• OIL -> frame syntax mapped to description logic
• Mapping to RDF(S)
• Decidable and empirically tractable
• Tools: editors (OilEd), reasoners (FaCT)
• Extensions: DAML-R, DAML-S
DAML-S Upper Ontology
[Figure: Resource provides Service. Service presents a Service profile (what it does), is describedby a Service model (how it works), and supports a Service grounding (how to access it). Further labels: description, functionalities, functional attributes.]
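The three-part description in the upper ontology (profile, model, grounding) maps naturally onto a small data structure. The sketch below is a loose Python analogy, not DAML-S syntax: the field names, the example service, the `example.org` endpoint and the `matches` function are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceProfile:
    """What the service does."""
    description: str
    inputs: set = field(default_factory=set)
    outputs: set = field(default_factory=set)

@dataclass
class ServiceModel:
    """How the service works."""
    steps: list = field(default_factory=list)

@dataclass
class ServiceGrounding:
    """How to access the service."""
    endpoint: str
    protocol: str = "SOAP"

@dataclass
class Service:
    name: str
    presents: ServiceProfile
    described_by: ServiceModel
    supports: ServiceGrounding

def matches(service, have, want):
    """True if we can supply every input and the service yields everything we want."""
    profile = service.presents
    return profile.inputs <= have and want <= profile.outputs

# A hypothetical sequence-search service.
search = Service(
    name="sequence-search",
    presents=ServiceProfile(
        description="Find sequences similar to a query",
        inputs={"protein sequence"},
        outputs={"similar sequences"},
    ),
    described_by=ServiceModel(steps=["parse query", "search database", "rank hits"]),
    supports=ServiceGrounding(endpoint="http://example.org/search"),
)

print(matches(search, have={"protein sequence"}, want={"similar sequences"}))
```

Discovery and matching need only the profile; the grounding is consulted once a service has been selected and is about to be invoked.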
The Semantic Web
Knowledge Technologies for the Grid
• Ability to store and retrieve huge volumes of data
• Ability to effectively process large volumes of data
• Ability to capture, enrich, classify and structure knowledge about
  • Services
  • Domains
  • Organisations
  • Individuals
  • Research Collaborations
  • Experiments
  • Results
Places to go
• www.semanticweb.org
• www.daml.org
• www.ontoweb.org
• www.bioontologies.org
Spares
Three Layer Orthodoxy (Schreiber et al. 1998)
• Knowledge Layer: “Knowledge is the whole body of data and information that people bring to bear to practical use in action, in order to carry out tasks and to create & infer new information.”
• Information Layer: Information is data equipped with meaning…
• Data / Computation Layer: Data is the uninterpreted signals that reach our senses every minute in time by the zillions…
Where I’m coming from
MyGrid: Personalised extensible environments for data-intensive “in silico” experiments in biology
OIL: Ontology Inference Layer
A knowledge representation language and inference mechanism for the web
• Frames: modelling primitives, OKBC-Lite
• Description Logics: formal semantics & automated reasoning support
• Web languages: XML & RDF based syntax, RDFS mapping
Originally based on XOL from the BioOntology Working Group
OIL example

Slot-def part-of
  subslot-of structural-relation
  inverse has-part
  properties transitive
Class-def defined herbivore
  subclass-of animal
  slot-constraint eats
    value-type plant OR
      slot-constraint part-of
        has-value plant
    min-cardinality 2 vegetable
Disjoint herbivore carnivore

• part-of is a slot
  • sub-slot of structural-relation
  • inverse is has-part
  • it is transitive
• herbivore exactly defined as:
  • sub-class of animal
  • that eats only plants or parts of plants
  • and >= 2 types of vegetable
• herbivore and carnivore disjoint
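To make the reading of the OIL example concrete, here is a small Python sketch that evaluates the same conditions over a handful of made-up individuals. It is a closed-world check over listed facts, which is not how a description logic reasoner such as FaCT operates; the individuals and the function names are assumptions for illustration, and the diet passed in is assumed to belong to an animal.

```python
def transitive_closure(pairs):
    """Repeatedly chain (a, b) and (b, d) into (a, d) until nothing new appears."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# part-of is transitive; has-part is its inverse.
PART_OF = transitive_closure({("leaf", "branch"), ("branch", "tree")})
HAS_PART = {(whole, part) for part, whole in PART_OF}

IS_PLANT = {"tree", "carrot", "cabbage"}
IS_VEGETABLE = {"carrot", "cabbage"}

def plant_or_part_of_plant(x):
    """value-type: plant OR (part-of has-value plant)."""
    return x in IS_PLANT or any(part == x and whole in IS_PLANT
                                for part, whole in PART_OF)

def is_herbivore(eats):
    """Eats only plants or parts of plants, and at least 2 types of vegetable."""
    only_plants = all(plant_or_part_of_plant(x) for x in eats)
    enough_vegetables = len(set(eats) & IS_VEGETABLE) >= 2
    return only_plants and enough_vegetables

print(is_herbivore({"leaf", "carrot", "cabbage"}))
print(is_herbivore({"carrot", "cabbage", "mouse"}))
```

The leaf counts as part of a plant only because part-of is transitive: it is asserted to be part of a branch, and the closure supplies the link to the tree.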






Reasoning when querying?
• Classification-based retrieval
  • Query generalisation
  • Query refinement
• Reasoning about query descriptors
  • Query validation
  • Query organisation
  • Query inclusion/containment
• Intensional query processing
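Several of the operations listed above reduce to walking a class hierarchy. The sketch below shows generalisation, refinement and containment for single-term queries. The taxonomy and the function names are illustrative assumptions, and real query descriptors would be richer expressions than one class name.

```python
# Toy taxonomy: child -> direct parent.
SUBCLASS_OF = {
    "enzyme": "protein",
    "kinase": "enzyme",
    "receptor": "protein",
}

def superclasses(term):
    """All classes above `term`, nearest first."""
    out = []
    while term in SUBCLASS_OF:
        term = SUBCLASS_OF[term]
        out.append(term)
    return out

def generalise(term):
    """Widen a query by one step: the direct parent, or None at the top."""
    return SUBCLASS_OF.get(term)

def refine(term):
    """Narrow a query by one step: the direct children."""
    return sorted(child for child, parent in SUBCLASS_OF.items() if parent == term)

def contained_in(q1, q2):
    """q1 is contained in q2 if every answer to q1 is also an answer to q2."""
    return q1 == q2 or q2 in superclasses(q1)

print(generalise("kinase"))
print(refine("protein"))
print(contained_in("kinase", "protein"))
```

A query that returns nothing can be generalised, one that returns too much can be refined, and containment lets a cached answer for "protein" be filtered to serve a later query for "kinase".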

[Figure: Knowledge, Information and Data layers]
Ontologies can be used not just for retrieval but for discovery
Middleware
Metadata
• To describe the information and computational resources
• Essential for navigation, integration, analysis, use
Tools
• Ontology development environments
• Ontology application servers
• Metadata extractors & annotators
• E-Science agents
• Personalisation agents
• Ontology learning tools
• Change management tools
• Semantic Retrieval tools
• Semantic Web Portal builders
• Semantic Authoring tools
• Ontology and metadata visualisation
• Intelligent browsers
Why Reasoning support?
Using the Tbox as a big Index
• In interfaces
• In mediation
  • Most specific wrapper function of a resource
  • Most specific codes in an external system for some concept
• In retrieval
  • Most specific reasonable assertions
  • Most specific data entry forms for some condition for some kind of patient
  • Most specific interactions for two drugs
  • Most specific web pages for some topic
  • Most specific bibliographic references for some problem
• In decision support…
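The recurring "most specific" in the list above can be read as: of all the indexed concepts that apply, keep only those with no applicable concept beneath them. A minimal sketch under that reading follows; the taxonomy and the function names `ancestors` and `most_specific` are assumptions for illustration.

```python
# Toy index taxonomy: child -> direct parent.
SUBCLASS_OF = {
    "drug interaction page": "drug page",
    "drug page": "web page",
    "dosage page": "drug page",
}

def ancestors(term, subclass_of):
    """All concepts above `term` in the taxonomy."""
    out = set()
    while term in subclass_of:
        term = subclass_of[term]
        out.add(term)
    return out

def most_specific(candidates, subclass_of=SUBCLASS_OF):
    """Drop every candidate that is an ancestor of another candidate."""
    cands = set(candidates)
    covered = set()
    for c in cands:
        covered |= ancestors(c, subclass_of)
    return sorted(cands - covered)

# All three concepts apply to the topic; only the deepest one is returned.
print(most_specific({"web page", "drug page", "drug interaction page"}))
```

When two applicable concepts are siblings, neither covers the other, so both are kept.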

Mark up Web Services to make them
• Computer interpretable
• Use-apparent
• Agent-ready
Declarative API
• Capturing data & metadata associated with a source
• Specification of its properties & capabilities
• Interface for its execution
• Pre-requisites and consequences of its use
Grid = Infrastructure
• Knowledge: Mining, visualisation, knowledge management, reasoning & prediction…
• Information: Metadata, middleware, fusion, intelligent retrieval, information modelling, curation management, semi-automatic annotation, data warehousing, workflow, information/content distribution, active content management (distribution, security…), consistency management (versioning, quality…)
• Data: Warehousing, distributed databases, streaming, near-line storage, large objects, efficient access mechanisms, data staging, query optimisation…
What’s special about Bioinformatics?
• Complexity
• Diversity
• Size isn’t everything
[Figure: network of inter-related data types: Gene sequence, Genome sequence, Gene expression, Phenotype, Disease, Clinical trial, Drug, Proteome, Protein Structure (a+b), Protein Sequence, P-P interactions]