Semantics and the Grid - Pegasus Workflow Management System


Semantics and the Grid
or
“What does it all mean?”
Slides taken from
http://www.iceage-eu.org/issgc07/sessionDescription.cfm?id=78
http://www.semanticgrid.org/presentations/DeRoureSemanticGridWeb2Stockholm.ppt
Overview
• The Semantic Web
– Introduction
– What is the Semantic Web
• Annotation, Integration, Inference
– Semantic Web Technologies
• RDF, RDF Schema and OWL
– Summary
• Semantic Grid
– Motivation
– Examples
• Service Descriptions, Data Management, Provenance
– Putting it together
– The Future
What is the Semantic Web?
• An extension of the current Web…
– … where information and services are given well-defined and
explicitly represented meaning, …
– … so that it can be shared and used by humans and machines,
...
– ... better enabling them to work in cooperation
• How?
– Promoting information exchange by tagging web content with
machine processable descriptions of its meaning.
– And technologies and infrastructure to do this
The Semantic Web Vision
• The Web was made possible through established standards
– TCP/IP for transporting bits down a wire
– HTTP & HTML for transporting and rendering hyperlinked text
• Applications able to exploit this common infrastructure
– Result is the WWW as we know it
• Generations
– 1st generation web mostly handwritten HTML pages
– 2nd generation (current) web often machine generated/active
• Both intended for direct human processing/interaction
– In the next generation web, resources should be more accessible to automated processes
• To be achieved via semantic markup
• Metadata annotations that describe content/function
The Syntactic Web
The Semantic Web
Where we are Today: the Syntactic Web
[Figure: a web of Resource nodes connected to one another by href links]
• A place where computers do
the presentation (easy) and
people do the linking and
interpreting (hard).
• Why not get computers to do
more of the hard work?
What’s the Problem?
• Typical web page markup
consists of:
• Rendering information
(e.g., font size and
colour)
• Hyper-links to related
content
• Semantic content is
accessible to humans but
not (easily) to computers…
Information We Can See
CS599 Introduction to Grid Computing
Contents
1 CS599 Introduction to Grid Computing, Fall 2007
1.1 Prerequisites
…
CS599 Introduction to Grid Computing, Fall 2007
….
Prerequisites
Graduate courses in Operating Systems (CS 555) and/or Networks
(CS 551)
Course Description
This course provides a graduate-level introduction to wide area
distributed computing research, focusing on a wide…
…
What the computer sees…
CS599 Introduction to Grid Computing
Contents
1 CS599 Introduction to Grid Computing, Fall 2007
1.1 Prerequisites
…
CS599 Introduction to Grid Computing, Fall 2007
….
Prerequisites
Graduate courses in Operating Systems (CS 555) and/or Networks (CS 551)
Course Description
This course provides a graduate-level introduction to wide area distributed computing research, focusing on a wide…
…
Solution: XML markup with “meaningful” tags?
<title>CS599 Introduction to Grid Computing</title>
<contents>Contents
1 CS599 Introduction to Grid Computing, Fall 2007
1.1 Prerequisites
</contents>
<intro>
CS599 Introduction to Grid Computing, Fall 2007
</intro>
<prereqs>
Prerequisites
Graduate courses in Operating Systems (CS 555) and/or Networks (CS 551)
</prereqs>
<description>
Course Description
This course provides a graduate-level introduction to wide area distributed
</description>
Still the Machine only sees…
<title>CS599 Introduction to Grid Computing</title>
<contents>Contents
1 CS599 Introduction to Grid Computing, Fall 2007
1.1 Prerequisites
</contents>
<intro>
CS599 Introduction to Grid Computing, Fall 2007
</intro>
<prereqs>
Prerequisites
Graduate courses in Operating Systems (CS 555) and/or Networks (CS 551)
</prereqs>
<description>
Course Description
This course provides a graduate-level introduction to wide area distributed
</description>
Need to Add “Semantics”
• External agreement on meaning of annotations
– E.g., Dublin Core for annotation of library/bibliographic
information
• Agree on the meaning of a set of annotation tags
– Problems with this approach
• Inflexible
• Limited number of things can be expressed
• Use Ontologies to specify meaning of annotations
– Ontologies provide a vocabulary of terms
– New terms can be formed by combining existing ones
• “Conceptual Lego”
– Meaning (semantics) of such terms is formally specified
– Can also specify relationships between terms in multiple
ontologies
Ontology in Computer Science
• An ontology is an engineering artifact:
– It is constituted by a specific vocabulary used to describe a
certain reality, plus
– a set of explicit assumptions regarding the intended meaning of
the vocabulary.
• Almost always including concepts and their classification
• Almost always including properties between concepts
• Similar to an object oriented model
• Thus, an ontology describes a formal specification of a certain
domain:
– Shared understanding of a domain of interest
– Formal and machine manipulable model of a domain of interest
Ontology Languages
• Work on Semantic Web has concentrated on the definition of
a collection or “stack” of languages.
– Used to support the representation and use of metadata
– Basic machinery that we can use to represent the extra semantic
information needed for the Semantic Web
[Figure: the language stack XML, RDF, RDF(S), OWL, annotated with what each level supports]
• Annotation: associating metadata to resources (bindings)
• Integration: integrating information sources
• Inference: reasoning over the information we have
– Could be light-weight (taxonomy)
– Could be heavy-weight (logic-style)
RDF
• RDF stands for Resource Description Framework
• It is a W3C Recommendation
– http://www.w3.org/RDF
• RDF is a graphical formalism ( + XML syntax + semantics)
– for representing metadata
– for describing the semantics of information in a machine-accessible way
• Provides a simple data model based on triples.
The RDF Data Model
• Statements are <subject, predicate, object> triples:
– <Paul, presents, SemClass>
• Can be represented as a graph:
– Paul --presents--> SemClass
• Statements describe properties of resources
• A resource is any object that can be pointed to by a URI
– The generic set of all names/addresses that are short strings that
refer to resources
– a document, a picture, a paragraph on the Web,
http://users.ecs.soton.ac.uk/pg03r, a book in the library, a real
person, isbn://0141184280
• Properties themselves are also resources (URIs)
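The triple model above can be sketched in a few lines of Python. This is only an illustration of the data model, not an RDF library: the `http://example.org/` URIs and the helpers `describe` and `resources` are hypothetical names chosen for the example.

```python
# A statement is a (subject, predicate, object) triple.
# All URIs below are made-up placeholders, not real identifiers.
EX = "http://example.org/"

statements = {
    (EX + "Paul", EX + "presents", EX + "SemClass"),
    (EX + "SemClass", EX + "preparedBy", EX + "Paul"),
}


def describe(resource, triples):
    """Return the (predicate, object) pairs stated about one resource."""
    return sorted((p, o) for s, p, o in triples if s == resource)


def resources(triples):
    """Every URI mentioned: subjects, objects, and the properties themselves."""
    return {term for triple in triples for term in triple}
```

Because properties are URIs too, `resources(statements)` returns the predicates alongside the subjects and objects.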
Linking Statements
• The subject of one statement can be the object of another
• Such collections of statements form a directed, labeled graph
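[Figure: a directed, labeled graph with the following edges]
– Paul hasName “Paul Groth”
– Paul presents SemClass
– SemClass preparedBy Paul
– SemClass preparedBy Oscar
– SemClass hasHomePage http://vtcpc.isi.edu/CS599_GridComputing/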
• The object of a triple can also be a “literal” (a string)
RDF Syntax
• RDF has an XML syntax that has a specific meaning:
• Every Description element describes a resource
• Every attribute or nested element inside a Description is a
property of that Resource
• We can refer to resources by URIs
<rdf:Description rdf:about="some.uri/person#paul">
<o:presents rdf:resource="some.uri/class#SemClass"/>
<o:hasName rdf:datatype="&xsd;string">Paul Groth</o:hasName>
</rdf:Description>
<rdf:Description rdf:about="some.uri/session#SemClass">
<o:hasHomePage>http://vtcpc.isi.edu/CS599_GridComputing/</o:hasHomePage>
<o:preparedBy rdf:resource="some.uri/person#oscar"/>
<o:preparedBy rdf:resource="some.uri/person#paul"/>
</rdf:Description>
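To make the mapping from the XML syntax to triples concrete, here is a minimal sketch that reads Description elements with Python's standard `xml.etree.ElementTree`. It handles only the simple pattern shown on the slide. The `rdf:RDF` wrapper, the `http://example.org/onto#` namespace bound to `o`, and the function name `triples` are assumptions added so that the fragment parses; the `rdf:datatype` attribute is left out because the `&xsd;` entity would need a declaration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Slide fragment wrapped in an rdf:RDF root; the "o" namespace URI is hypothetical.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:o="http://example.org/onto#">
  <rdf:Description rdf:about="some.uri/person#paul">
    <o:presents rdf:resource="some.uri/class#SemClass"/>
    <o:hasName>Paul Groth</o:hasName>
  </rdf:Description>
</rdf:RDF>"""


def triples(xml_text):
    """Turn each property of each Description into a (subject, predicate, object)."""
    root = ET.fromstring(xml_text)
    out = []
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            predicate = prop.tag.split("}", 1)[1]  # drop the namespace part
            target = prop.get(f"{{{RDF_NS}}}resource")
            # rdf:resource means the object is a resource; otherwise it is a literal
            obj = target if target is not None else (prop.text or "").strip()
            out.append((subject, predicate, obj))
    return out
```

A real application would use an RDF parser, since RDF/XML allows many more forms than this one.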
What does RDF give us?
• Single (simple) data model.
• Syntactic consistency between names (URIs).
• A mechanism for annotating data and resources.
• Low level integration of data.
What doesn’t RDF give us?
• RDF does not give any special meaning to vocabulary
– Such as subClassOf or type (supporting OO-style modelling)
• So, what’s the difference between this graph...
– Paul hasName “Paul Groth”
– Paul presents SemClass
– SemClass preparedBy Paul
• ... and this one?
– Paul isAlsoKnownAs “Paul Groth”
– Paul talksIn SemClass
– SemClass presentedBy Paul
RDFS: RDF Schema
• RDF Schema is another W3C Recommendation
– http://www.w3.org/TR/rdf-schema/
• It extends RDF with a schema vocabulary that allows you to
define basic vocabulary terms and the relations between
those terms
– Class, type, subClassOf,
– Property, subPropertyOf, range, domain
– it gives “extra meaning” to particular RDF predicates and
resources
– this “extra meaning”, or semantics, specifies how a term should
be interpreted
• The combination of RDF and RDF Schema is normally known as
RDF(S)
Example
[Figure: example RDFS schema]
– Event has property eventDate with range xsd:date
– Personal_Event, Local_Event and Regional_Event are each subClassOf Event
– Event involves Person
– Professor and Researcher are each subClassOf Person
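RDF(S) Inference
[Figure: Person, Academic and Professor are each rdf:type rdfs:Class; Academic rdfs:subClassOf Person and Professor rdfs:subClassOf Academic; inferred: Professor rdfs:subClassOf Person]
RDF(S) Inference
[Figure: Academic and Professor are each rdf:type rdfs:Class; Professor rdfs:subClassOf Academic; Ewa rdf:type Professor; inferred: Ewa rdf:type Academic]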
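The two inferences illustrated with Person, Academic, Professor and Ewa can be sketched as a small fixed-point computation. This is a toy illustration, assuming only two RDFS rules (subclass transitivity and type propagation); the function name `rdfs_closure` is hypothetical, and a real RDFS reasoner implements more rules.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Apply two RDFS rules repeatedly until nothing new can be derived."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:
                    new.add((s, SUBCLASS, o2))  # A sub B, B sub C => A sub C
                elif p == TYPE:
                    new.add((s, TYPE, o2))      # x type A, A sub B => x type B
        if new <= inferred:
            return inferred
        inferred |= new


facts = {
    ("Academic", SUBCLASS, "Person"),
    ("Professor", SUBCLASS, "Academic"),
    ("Ewa", TYPE, "Professor"),
}
closure = rdfs_closure(facts)
```

The closure contains the asserted facts plus the derived ones, such as Ewa being an Academic and a Person.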
What does RDFS provide?
• Ability to use simple schema/vocabularies to describe our
resources
• Consistent vocabulary use and sharing
• Simple inference
• Query mechanisms: SPARQL, SeRQL, RDQL, …
– SELECT N FROM {N} rdf:type {sti:Event}
USING NAMESPACE sti=<http://www.ontogrid.net/StickyNote#>
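What such a query does can be approximated by pattern matching over triples. The sketch below is not one of the query languages listed above; it is a plain-Python stand-in for the query shown, with `None` playing the role of a variable. The helper `match` and the sample data (`ex:issgc07`, `ex:paul`, `ex:hasName`) are made up for illustration.

```python
STI = "http://www.ontogrid.net/StickyNote#"  # namespace taken from the query


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


# Hypothetical data set with one instance of sti:Event
data = [
    ("ex:issgc07", "rdf:type", STI + "Event"),
    ("ex:issgc07", "ex:hasName", "Summer school"),
    ("ex:paul", "rdf:type", "ex:Person"),
]

# "SELECT N FROM {N} rdf:type {sti:Event}" as a pattern match
events = [s for s, _, _ in match(data, predicate="rdf:type", obj=STI + "Event")]
```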
What is RDFS lacking?
• RDFS is too weak to describe resources in sufficient detail
– No localised range and domain constraints
• Can’t say that the range of hasEducationalMaterial is Slides
when applied to TheoreticalSession and Code when applied
to HandsonSession
– TheoreticalSession hasEducationalMaterial Slides
– HandsonSession hasEducationalMaterial Code
– No existence/cardinality constraints
• Can’t say:
– Sessions must have some EducationalMaterial
– Sessions have at least one Presenter
– No transitive, inverse or symmetrical properties
• Can’t say that presents is the inverse property of isPresentedBy
The OWL Family Tree
[Figure: OWL family tree. DAML-ONT (DAML) and OIL (OntoKnowledge + others) build on RDF/RDF(S), Frames and Description Logics; the Joint EU/US Committee combined them into DAML+OIL, which led to OWL (W3C).]
OWL
• W3C Recommendation (February 2004)
• A family of Languages
– OWL Full
– OWL DL
– OWL Lite
• Formal semantics
– Description Logics (DL/Lite)
– Relationship with RDF
OWL Basics (on top of RDF and RDFS)
• Set of constructors for concept expressions
– Booleans: and/or/not
• A Session is a TheoreticalSession or a HandsonSession
• Slides are not the same as Code
– Quantification: some/all
• Sessions must have some EducationalMaterial
• Sessions can only have Presenters that have developed Grid
applications or Grid middleware
• Axioms for expressing constraints
– Necessary and Sufficient conditions on classes
• A Session that hasEducationalMaterial Code is a HandsonSession.
– Disjointness
• TheoreticalSessions are disjoint with HandsonSessions
– Property characteristics: transitivity, inverse
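The effect of property characteristics can be illustrated with a small rule loop. This is a sketch under stated assumptions, not how an OWL reasoner is built: `apply_property_axioms` is a hypothetical helper, the inverse pair presents / isPresentedBy comes from the earlier slide, and the transitive property `partOf` with the resources `CS599` and `GradProgram` is invented for the example.

```python
def apply_property_axioms(triples, inverses, transitive):
    """Derive new triples from inverse and transitive property declarations."""
    inverse_of = dict(inverses)
    inverse_of.update({b: a for a, b in inverses})  # inverses work both ways
    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in inverse_of:
                new.add((o, inverse_of[p], s))
            if p in transitive:
                new.update(
                    (s, p, o2) for s2, p2, o2 in facts if p2 == p and s2 == o
                )
        if new <= facts:
            return facts
        facts |= new


# Hypothetical data: partOf, CS599 and GradProgram are not from the slides
asserted = {
    ("Paul", "presents", "SemClass"),
    ("SemClass", "partOf", "CS599"),
    ("CS599", "partOf", "GradProgram"),
}
derived = apply_property_axioms(
    asserted, inverses=[("presents", "isPresentedBy")], transitive={"partOf"}
)
```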
Reasoning
• OWL DL based on a well understood Description Logic (SHOIN(Dn))
– Formal properties well understood (complexity, decidability)
– Known reasoning algorithms
– Implemented systems (highly optimised)
• Because of this, we can reason about OWL ontologies
– Subsumption reasoning
• Allows us to infer when one class is a subclass of another
• Can then build concept hierarchies representing the taxonomy.
• This is classification of classes.
– Satisfiability reasoning
• Tells us when a concept is unsatisfiable
– i.e. when it is impossible to have instances of the class.
• Allows us to check whether our model is consistent.
– Instance Retrieval/Instantiation
• What are the instances of a particular class C?
• What are the classes that x is an instance of?
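As a rough illustration of satisfiability reasoning, the sketch below covers only one very simple case: a named class is unsatisfiable if it is a subclass of two classes declared disjoint. Description Logic reasoners handle far more than this. The class `HybridSession` and the function names are hypothetical; the disjointness of TheoreticalSession and HandsonSession comes from the OWL Basics slide.

```python
def superclasses(cls, subclass_of):
    """All classes reachable from cls via subclass links, including cls itself."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def is_unsatisfiable(cls, subclass_of, disjoint_pairs):
    """True if cls inherits from both members of some disjoint pair."""
    ancestors = superclasses(cls, subclass_of)
    return any(a in ancestors and b in ancestors for a, b in disjoint_pairs)


# HybridSession is an invented class used to trigger the clash
hierarchy = {
    "TheoreticalSession": ["Session"],
    "HandsonSession": ["Session"],
    "HybridSession": ["TheoreticalSession", "HandsonSession"],
}
disjoint = [("TheoreticalSession", "HandsonSession")]
```

Here `is_unsatisfiable("HybridSession", hierarchy, disjoint)` is true, because no individual can belong to both disjoint parents.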
Reasoning Tasks. Classification
What does OWL provide?
• Ability to use complex schema/vocabularies to describe our
resources.
• Consistent vocabulary use and sharing.
• Robust data integration techniques
• Complex inference and several reasoning functions
• Query mechanisms: OWL QL
Summary
• Good Things about RDF(S) + OWL
– They let us describe resources in a machine
understandable way
– Resources can be anything addressable by URIs
– These languages are standards
– They allow for different levels of reasoning
Lots of Resources on the Grid
• Web Services
• Computational facilities
• Apparatus
• Disk and networking infrastructure
• Policies
• Workflows
• Programs
The Semantic Grid
“The Semantic Grid is an extension of the
current Grid in which information and services
are given well-defined and explicitly
represented meaning, so that it can be shared
and used by humans and machines, better
enabling computers and people to work in
cooperation” D. De Roure et al.
http://www.semanticgrid.org
Where things meet…
Motivation: Metadata Matters
• Particularly for the following activities:
– Information provision and resource discovery
– Data integration
– Provenance
– Systems Configuration
– Policy representation and reconciliation
• Using:
– Open, flexible and extensible self describing schemas that don’t have to
be nailed down
• “Let’s describe my data set, or the output format of this tool”
• Lightweight schemas
• Decoupled, interoperable systems that are resilient to syntactic changes
– Open world
• “This metadata is no longer valid because...”
– Data integration across different data models (e.g. RDF)
• Like policy or resource models
– Formalization & Reasoning support
Example: Web Services
[Figure: Photo Lookup Service, provided by USC campus. Input: list of strings. Output: jpegs.]
Example: Web Services
[Figure: the same Photo Lookup Service with semantic annotations]
• Input (list of strings): keywords, describe pictures, describe things at USC
• Output (jpegs): photos, taken at USC, photo taxonomy
• Service: availability, it uses Google, it runs on an old 486, it logs your data
OWL-S
• OWL-S is a language based on OWL for the description of
Web Services
– http://www.w3.org/Submission/OWL-S/
• Motivating Tasks
– Automatic Web service discovery
– Automatic Web service invocation
– Automatic Web service composition and interoperation
• Bringing Semantics to Web Services with OWL-S, Martin, D. and
Burstein, M. and McDermott, D. and McIlraith, S. and Paolucci,
M. and Sycara, K. and McGuinness, D. and Sirin, E. and
Srinivasan, N., 2007
OWL-S - Upper Ontology
OWL-S Service Profile
OWL-S and WSDL
Semantics make better registries
• Grimoires
– http://twiki.grimoires.org/bin/view/Grimoires/
– A UDDI Service Registry
– Allows for the addition of arbitrary metadata (in RDF) to describe
services.
– Allows for the registration of workflows and associated metadata
descriptions
– Exposes service descriptions through a WSRF interface
• Benefits
– More advanced search taking advantage of RDFS style
reasoning
• e.g. find a photo service, finds the USC photo service
– Descriptions can be enhanced over time
Taverna Workflow Workbench
From Carole Goble
Composing Services
• Taverna (http://taverna.sourceforge.net)
– Quite a few users
– Not very “gridy”
– Stores all of its workflows and related information in RDF
– Mainly uses semantics for lookup
• Wings (http://www.isi.edu/ikcap/wings/)
– Takes advantage of service descriptions, data descriptions, and
workflow templates in OWL to automatically generate workflows
that can be run with Pegasus.
• McIlraith et al.
– Using Golog to automatically compose services based on OWL-S
descriptions.
Data Management
• Data from multiple institutions
• Data discovery and search
• Integration through semantics
• End-to-end data understanding
– Data from apparatus
– Data from lab notebooks
• CombeChem
• Scientific Application Middleware
Provenance of data
Bioinformatics: verification and
auditing of “experiments” (e.g.
for drug approval)
High Energy Physics:
tracking, analysing, verifying
data sets in the ATLAS
Experiment of the Large
Hadron Collider (CERN)
Why not use log files?
INFO: Starting Coyote HTTP/1.1 on http-8443
Jun 25, 2007 4:26:54 PM org.apache.jk.common.ChannelSocket init
INFO: JK2: ajp13 listening on /0.0.0.0:8009
Jun 25, 2007 4:26:54 PM org.apache.jk.server.JkMain start
INFO: Jk running ID=0 time=4/38 config=/Users/pgroth/Develop/jakarta-tomcat-5/conf/jk2.properties
Jun 25, 2007 4:26:54 PM org.apache.catalina.startup.Catalina start
INFO: Server startup in 17476 ms
Jun 25, 2007 4:36:49 PM org.apache.catalina.core.ContainerBase log
INFO: Removing web application at context path /preserv
Jun 26, 2007 8:59:59 AM org.apache.catalina.core.StandardHostDeployer install
INFO: Installing web application at context path /preserv-1.0 from URL jar:file:/Users/pgroth/Develop/jakarta-tomcat-5/webapps/preserv-1.0.war!/
Jun 26, 2007 9:00:03 AM org.apache.catalina.loader.WebappClassLoader validateJarFile
INFO: validateJarFile(/Users/pgroth/Develop/jakarta-tomcat-5/webapps/preserv-1.0/WEB-INF/lib/servlet-api-2.4.jar) - jar not
loaded. See Servlet Spec 2.3, section 9.7.2. Offending class: javax/servlet/Servlet.class
Jun 26, 2007 9:00:08 AM org.apache.catalina.startup.HostConfig deployWARs
SEVERE: Exception while expanding web application archive preserv-1.0.war
java.lang.IllegalStateException: Context path /preserv-1.0 is already in use
at org.apache.catalina.core.StandardHostDeployer.install(StandardHostDeployer.java:190)
at org.apache.catalina.core.StandardHost.install(StandardHost.java:832)
at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:617)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:431)
at org.apache.catalina.startup.HostConfig.check(HostConfig.java:1068)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:327)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at org.apache.catalina.core.StandardHost.backgroundProcess(StandardHost.java:800)
at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1619)
at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.processChildren(ContainerBase.java:1628)
at org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1608)
at java.lang.Thread.run(Thread.java:613)
Jun 26, 2007 9:03:18 AM org.apache.catalina.core.ContainerBase log
INFO: Removing web application at context path /preserv-1.0
Provenance
• Provides the causal
connections between data
items
• A meaningful representation of
the provenance of data is
needed so that it can be
reasoned about
• Several groups have worked on
provenance in the context of
workflows
• Many use Semantic Web
Technologies
• The Provenance Challenge
– http://twiki.ipaw.info/bin/view/Challenge/
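In contrast to a flat log file, provenance recorded as explicit causal links can be traversed directly. The following is a minimal sketch, assuming provenance is stored as a mapping from each data item to the items it was derived from; the file names and the function `provenance_of` are invented for illustration and do not reflect any particular provenance system.

```python
from collections import deque


def provenance_of(item, derived_from):
    """Breadth-first walk over 'was derived from' links; returns upstream items."""
    seen, queue = [], deque([item])
    while queue:
        current = queue.popleft()
        for parent in derived_from.get(current, []):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


# Hypothetical workflow: a plot made from results computed from raw data
derived_from = {
    "plot.png": ["results.csv"],
    "results.csv": ["raw.dat", "params.txt"],
}
upstream = provenance_of("plot.png", derived_from)
```

The same structure maps naturally onto RDF triples, with the derivation link as the predicate.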
Adding Semantics to the Grid
– Semantic-OGSA
– Semantic Grid Reference
Architecture
– A low-impact extension of OGSA
– Mixed ecosystem of Grid and
Semantic Grid services
• Services ignorant of bindings
• Services binding aware but
unable to process them
• Services binding aware and
capable of processing (part of)
them
– Everything is OGSA compliant
[Figure: S-OGSA Model, Capabilities and Mechanisms, connected by the relations provide/consume, expose and use. From Carole Goble]
OGSA
[Figure: component diagram organised into five columns: Security, Data Mgmt, Execution Mgmt, Info Services and Common Runtime. The legend distinguishes Core, Contrib/Preview and Deprecated components, and Web Services components from Non-WS components.]
[Components shown: Authentication Authorization, Pre-WS Authentication Authorization, Delegation, Community Authorization, Credential Mgmt, GridFTP, Reliable File Transfer, Replica Location, Data Replication, Data Access & Integration, Grid Resource Allocation & Management, Pre-WS Grid Resource Alloc. & Mgmt, Community Scheduling Framework, Workspace Management, Grid Telecontrol Protocol, Index, Trigger, WebMDS, Pre-WS Monitoring & Discovery, Java WS Core, C WS Core, Python WS Core, C Common Libraries, eXtensible IO (XIO)]
S-OGSA (OntoKit implementation)
[Figure: the OGSA component diagram extended with semantic elements labelled Annotation, Metadata, Reasoning, Ontology, Semantic, Ontology Role-based AuthZ and Semantically Aware Info Services]
The Future
• Increasing amounts of RDF data
– May just be because it’s a nice way to store
graphs
• Semantics are becoming easier to integrate with
the Grid because of the move towards Web Service
technology
• Difficult to mark up services and data
• Concerns about the “friendliness” of tools that use
semantics
• Everybody loves Web 2.0, Semantic Grid people do
too (now)