DataforACESApril6-06 - Digital Science Center


ACES Scholars’ Grid
5th ACES International Workshop
Maui Prince Hotel, Island of Maui, Hawaii
April 6 2006
Geoffrey Fox
Marlon Pierce
Community Grids Laboratory
Computer Science, Informatics, Physics
Indiana University Bloomington IN 47401
http://grids.ucs.indiana.edu/ptliupages/presentations/
http://www.infomall.org
1
Semantically Rich Services with a Semantically
Rich Distributed Operating Environment
[Figure: Grids of Grids architecture. Sensor services (SS), filter services (FS), other services (OS) and metadata stores (MD) are linked by SOAP message streams. Successive filter services refine Raw Data into Data, Information, Knowledge, Wisdom and Decisions. The Grid also connects to another Grid, another service and a database. Note: a Sensor Service is the same as an outward facing application service.]
2
ACES Grid and Services







Services receive data in SOAP messages, manipulate it and
produce transformed data as further messages
Meta-data is carried in SOAP messages but stored in databases
with XML-defined interfaces
Meta-data controls processing and transport of SOAP Messages
Meta-data describes what Quake information there is and how it
was created (provenance)
Knowledge is created from data by services
The Grid enhances Web services with semantically rich system-
and application-specific management
One must exploit and work around the different approaches to
meta-data and their manipulation in Web Services
• Just as we work around choices in job submission, security, etc. that 5 years
from now will clearly be irrelevant in the big Service Architecture picture

Grids of Grids: Compose Grids from smaller Grids; a service
is “just” a special case of a Grid
• Sub-Grids (GEON, Globus, SCEC, SERVOGrid) could each make idiosyncratic
choices
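The message flow described on this slide (data arrives in a SOAP message, a service transforms it, and meta-data travels with it) can be illustrated with a minimal sketch. It assumes the SOAP 1.1 envelope namespace; the element names `value`, `source` and `provenance` and the functions `make_envelope`, `read_envelope` and `scale_filter` are hypothetical and are not taken from any ACES or SERVOGrid schema.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def make_envelope(metadata, payload):
    """Build a SOAP envelope: meta-data in the Header, data values in the Body."""
    env = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(env, f"{{{SOAP_NS}}}Header")
    for key, value in metadata.items():
        ET.SubElement(header, key).text = str(value)  # hypothetical header fields
    body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
    for value in payload:
        ET.SubElement(body, "value").text = repr(value)  # hypothetical payload element
    return ET.tostring(env, encoding="unicode")


def read_envelope(message):
    """Return (metadata dict, list of floats) from an envelope built above."""
    env = ET.fromstring(message)
    header = env.find(f"{{{SOAP_NS}}}Header")
    body = env.find(f"{{{SOAP_NS}}}Body")
    metadata = {child.tag: child.text for child in header}
    payload = [float(v.text) for v in body.findall("value")]
    return metadata, payload


def scale_filter(message, factor):
    """A toy filter service: scale the data and record how it was produced."""
    metadata, payload = read_envelope(message)
    metadata["provenance"] = metadata.get("provenance", "raw") + f" -> scale({factor})"
    return make_envelope(metadata, [v * factor for v in payload])


raw = make_envelope({"source": "sensor-1", "provenance": "raw"}, [1.0, 2.5])
transformed = scale_filter(raw, 2.0)
```

The filter never modifies its input message; it emits a new message whose header records the transformation, which is one simple way to carry provenance alongside the data.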
3
What Type of Services are there?


There is a horde of support services supplying security,
collaboration, database access, user interfaces
The support services WS-* and GS-* are associated with either
the system or the application
• Globus, Apache, OMII, EGEE, and many Grid projects produce these
• Microsoft, IBM, Amazon, Google will be major players



There are generalized filter services, which are applications that
accept messages and produce new messages with some data
derived from that in the input
• Simulations (such as PDEs)
• Data-mining
• Transformations
• Agents
• Reasoning
are all termed filters here
Note databases, sensors and simulations are essentially the same thing:
they are services that produce (Web Feature Service WFS
formatted) Earth Science relevant messages. We call them
ACESNodes
All services and their interactions are bathed in a sea of meta-data
and so implicitly need the Semantic Grid
4
WMS uses WFS that uses data sources
<gml:featureMember>
  <fault>
    <name> Northridge2 </name>
    <segment> Northridge2 </segment>
    <author> Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates>
          -118.72,34.243 118.591,34.176
        </gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>
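A client consuming such a feature might read it as in the sketch below. The slide's fragment does not declare the `gml` prefix, so this example assumes the usual GML namespace URI and adds the declaration to make the fragment parseable; the function name `parse_fault` is hypothetical. The coordinates are copied exactly as they appear on the slide.

```python
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"  # assumed namespace; not declared on the slide

FEATURE = f"""
<gml:featureMember xmlns:gml="{GML_NS}">
  <fault>
    <name> Northridge2 </name>
    <segment> Northridge2 </segment>
    <author> Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates>
          -118.72,34.243 118.591,34.176
        </gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>
"""


def parse_fault(xml_text):
    """Extract name, author and (x, y) points from one fault feature."""
    root = ET.fromstring(xml_text.strip())
    fault = root.find("fault")
    coords_text = fault.find(f".//{{{GML_NS}}}coordinates").text
    # gml:coordinates separates tuples by whitespace and components by commas
    points = [tuple(float(c) for c in pair.split(",")) for pair in coords_text.split()]
    return {
        "name": fault.findtext("name").strip(),
        "author": fault.findtext("author").strip(),
        "points": points,
    }


fault = parse_fault(FEATURE)
```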
[Figure: Client, WMS and WFS Server. GetFeature requests are answered with FeatureCollection responses, and the WFS Server issues SQL queries against data sources for Railroads, Interstate Highways, Rivers and Bridges. Example feature ranges: Railroads [a-b], River [a-d], Bridge [1-5], Highway [12-18].]
5
Integrating Archived Web Feature Services and Google Maps
Google Maps can be integrated with Web Feature Service archives to filter and browse seismic records.
6
Typical use of Grid Messaging in NASA
Sensor Grid
Grid Eventing
Datamining Grid
WFS is Universal Interface
GIS Grid
7
Real Time GPS and Google Maps
Subscribe to live GPS station. Position data from SOPAC is combined with Google map clients.
Select and zoom to GPS station location, click icons for more information.
8
ACESNodes: ACESSensors,
ACESRepositories, ACESFilters




Sensors are real-time and typically get their data from
the “edge of the Grid”
Repositories are typically databases storing ACESData
Filters are simulations and transformations
ACESNodes (Skynodes in Astronomy Virtual
Observatory) accept and produce messages in the same
ACESFS syntax: an enhanced Web Feature Service
(WFS) that knows about faults, plates, etc. (ADQL, SIA,
SSA in astronomy)
• Copy VOTable use from Astronomy for all output

ACES should agree on ACESFS, and the partners
should agree that all Sensors, Repositories and Filters
will be presented to the world as ACESNodes
• Astronomy has IVOA masterminding this
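The idea that sensors, repositories and filters all present the same face to the world can be sketched as a single abstract interface. This is only an illustration under stated assumptions: ACESFS is not specified in these slides, so the `ACESMessage` type, the `handle` method and the three classes below are hypothetical stand-ins, not an agreed ACES design.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ACESMessage:
    """Hypothetical stand-in for an ACESFS message: features plus meta-data."""
    features: list
    metadata: dict = field(default_factory=dict)


class ACESNode(ABC):
    """Every node accepts and produces messages in the same syntax."""

    @abstractmethod
    def handle(self, query: ACESMessage) -> ACESMessage:
        ...


class Repository(ACESNode):
    """A database-like node holding stored features."""

    def __init__(self, stored):
        self.stored = list(stored)

    def handle(self, query):
        wanted = query.metadata.get("kind")
        hits = [f for f in self.stored if wanted is None or f.get("kind") == wanted]
        return ACESMessage(hits, {"source": "repository"})


class Sensor(ACESNode):
    """A real-time node that hands out one reading per request."""

    def __init__(self, readings):
        self._readings = iter(readings)

    def handle(self, query):
        reading = next(self._readings, None)
        return ACESMessage([] if reading is None else [reading], {"source": "sensor"})


class Filter(ACESNode):
    """A simulation or transformation applied to another node's output."""

    def __init__(self, upstream, transform):
        self.upstream = upstream
        self.transform = transform

    def handle(self, query):
        reply = self.upstream.handle(query)
        return ACESMessage(
            [self.transform(f) for f in reply.features],
            {"source": "filter", "derived_from": reply.metadata["source"]},
        )
```

Because a `Filter` only depends on the `ACESNode` interface, it can sit on top of a repository, a sensor or another filter without change, which is the practical payoff of agreeing on one message syntax.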
9





Coupled Simulations
From Earthquake Occurrence with aftershocks to
Wave Motion to
Directly damaged infrastructure to
Behavior of people, traffic, telephony, energy (14 critical infrastructures) ….
• These use “activity data” of where people are at a given time to model transportation, energy and phone use etc.
Package as a training game either on Xbox or TeraGrid
• Get FEMA officials to play it!
[Figure: Electric Power and Natural Gas systems from LANL Interdependent Critical Infrastructure Simulations using SERVOGrid GIS sub-Grid]
10
ACESNodes Integration
Country
Data
Australia
Earthquake
Forecast/Model
Wave
Motion
Critical
Infrastructure
Finley, LSM
PANDAS
Canada
Polaris
Radarsat
P.I.
P.R. China
Seismic
LURR
Japan
GPS
GeoFEM
Seismic
Daichi (InSAR)
Matsu’ura
Talk
Taiwan
Chen talk
Chen talk
U.S.A.
QuakeTables
Seismic
InSAR
PBO (GPS)
International
IMS
P.I.
ALLCAL
GeoFEST, PARK,
VirtualCalifornia
Tsinghua (CNG)
Shanghai Grid
TeraShake
DoE NISAC
D Division
LANL
11
The Core Service Areas I
Service or Feature | WS-* | GS-* | NCES (DoD) | Comments
A: Broad Principles
FS1: Use SOA: Service Oriented Arch. | WS1 | | | Core Service Model, Build Grids on Web Services. Industry best practice
FS2: Grid of Grids | | | | Strategy for legacy subsystems and modular architecture
B: Core Services
FS3: Service Internet, Messaging | WS2 | | NCES3 | Streams/Sensors
FS4: Notification | WS3 | | NCES3 | JMS, MQSeries
FS5: Workflow | WS4 | | NCES5 | Grid Programming
FS6: Security | WS5 | GS7 | NCES2 | Grid-Shib, Permis Liberty Alliance ...
FS7: Discovery | WS6 | | NCES4 |
FS8: System Metadata & State | WS7 | | | Globus MDS, Semantic Grid
FS9: Management | WS8 | GS6 | NCES1 | CIM
FS10: Policy | WS9 | | | ECS
12
The Core Service Areas II
Service or Feature | WS-* | GS-* | NCES | Comments
B: Core Services (Continued)
FS11: Portals and User assistance | WS10 | | NCES7 | Portlets JSR168, NCES Capability Interfaces
FS12: Computing | | GS3 | |
FS13: Data and Storage | | GS4 | NCES8 | NCOW Data Strategy
FS14: Information | | GS4 | | JBI for DoD, WFS for OGC
FS15: Applications and User Services | | GS2 | NCES9 | Standalone Services, Proxies for jobs
FS16: Resources and Infrastructure | | GS5 | | Ad-hoc networks
FS17: Collaboration and Virtual Organizations | | GS7 | NCES6 | XGSP, Shared Web Service ports
FS18: Scheduling and matching of Services and Resources | | GS3 | |
13
SERVOGrid http://www.servogrid.org Services I
Area | Service Name | Description
FS3 | Messaging Service | This is used to stream data in workflow fed by real-time sources. It is based on NaradaBrokering, which can also be used in cases just involving archival data.
FS3 | Sensor Grid Services | We are developing infrastructure to support streaming GPS signals and their successive filtering into different formats. This is built over NaradaBrokering (see Messaging Service). This does not use Web Services as such at present, but the filters can be controlled by HPSearch services.
FS4 | Notification Service | This supplies alerts to users when filters (data-mining) detect features of interest.
FS5, FS9 | Workflow/Monitoring/Management Services | The HPSearch project uses HPSearch Web Services to execute JavaScript workflow descriptions. It has more recently been revised to support WS-Management and to support both workflow (where there are many alternatives) and system management (where there is less work). Management functions include life cycle of services and QoS for inter-service links.
FS6 | Authentication and Authorization | This uses capabilities built into the portal. Note that simulations are typically performed on machines where the user has accounts, while data services are shared for read access.
FS7 | Information Service | We have built data model extensions to UDDI to support XPath queries over Geographical Information System capability.xml files. This is designed to replace the OGC (Open Geospatial Consortium) Web registry service.
14
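The streaming pattern behind the Messaging, Sensor Grid and Notification services (publish raw data on a topic, let filters republish derived data, alert subscribers) can be illustrated with a tiny in-memory broker. This is a generic publish/subscribe sketch and does not use or imitate the NaradaBrokering or HPSearch APIs; the `Broker` class, the topic names and the comma-separated reading format are all assumptions made for the example.

```python
from collections import defaultdict


class Broker:
    """Minimal in-memory topic-based publish/subscribe broker."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Copy the list so callbacks may subscribe during delivery
        for callback in list(self._subscribers[topic]):
            callback(message)


def attach_filter(broker, in_topic, out_topic, transform):
    """Run transform on every message from in_topic and republish on out_topic."""
    broker.subscribe(in_topic, lambda msg: broker.publish(out_topic, transform(msg)))


def parse_reading(line):
    """Hypothetical raw format: 'station,lat,lon'."""
    station, lat, lon = line.split(",")
    return {"station": station, "lat": float(lat), "lon": float(lon)}


broker = Broker()
received = []
attach_filter(broker, "gps/raw", "gps/parsed", parse_reading)
broker.subscribe("gps/parsed", received.append)
broker.publish("gps/raw", "STN1,33.5,-117.2")
```

Filters can be chained by attaching a second one to `gps/parsed`, which mirrors the "successive filtering into different formats" described for the Sensor Grid services.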
SERVOGrid http://www.servogrid.org Services II
Area | Service Name | Description
FS8 | Context Data Service | We store information gathered from users’ interactions with the portal interface in a generic, recursively defined XML data structure. Typically we store input parameters and choices made by the user so that we can recover and reload these later. We also use this for monitoring remote workflows. We have devoted considerable effort to developing WS-Context to support the generalization of this initial simple service.
FS11 | Portal | We use an OGCE-based portal built on a portlet architecture.
FS11 Appl. | Web Map Service | We built a Web Service version of this Open Geospatial Consortium specification. The WMS constructs images out of abstract feature descriptions.
FS11 Appl. | Scientific Plotting Services | We are developing Dislin-based scientific plotting services as a variation of our Web Map Service: for a given input service, we can generate a raster image (like a contour plot) which can be integrated with other scientific and GIS map plot images.
FS12 | File Services | We built a file web service that could do uploads, downloads, and crossloads between different services. This also supports specific operations such as file browsing, creation, deletion and copying.
FS13 Appl. | QuakeTables Database Services | The USC QuakeTables fault database project includes a web service that allows you to search for earthquake faults.
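The "generic, recursively defined XML data structure" used by the Context Data Service can be pictured as one element type that nests inside itself. The sketch below is an illustration only: the element name `context`, the `name` attribute and the two helper functions are hypothetical and do not reproduce the SERVOGrid schema or WS-Context.

```python
import xml.etree.ElementTree as ET


def to_context(name, value):
    """Encode a nested dict of user choices as recursively nested <context> nodes."""
    node = ET.Element("context", {"name": name})
    if isinstance(value, dict):
        for child_name, child_value in value.items():
            node.append(to_context(child_name, child_value))
    else:
        node.text = str(value)  # leaves are stored as text
    return node


def from_context(node):
    """Recover the nested structure; leaf values come back as strings."""
    children = list(node)
    if children:
        return {child.get("name"): from_context(child) for child in children}
    return node.text


session = {"project": "demo", "inputs": {"fault": "Northridge2", "steps": 100}}
stored = ET.tostring(to_context("session", session), encoding="unicode")
restored = from_context(ET.fromstring(stored))
```

A single recursive element keeps the store schema-free, at the cost of losing type information: the integer `100` is reloaded as the string `"100"`.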
15
SERVOGrid http://www.servogrid.org Services III
Area | Service Name | Description
FS13 | Data Tables Web Service | We are developing a Web Service based on the National Virtual Observatory’s VOTables XML format for tabular data. We see this as a useful general format for ASCII data produced by various application codes in SERVO and other projects.
FS14 Appl. | Application and Host Metadata Service | We have an Application and a Host Descriptor service based on XML schema descriptors. Portlet interfaces allow code administrators to make applications available through the browser.
FS14 Appl. | Web Feature Service | We’ve built a Web Service version of this OGC standard. We’ve extended it to support data streaming for increased performance.
FS15 | Specific Applications: Virtual California, Geofest, Park, RDAHMM .. | These can all be launched by a single Job Management service or by custom instances of this with metadata preset to a particular application.
Key interfaces/standards/software used: GML, WFS, WMS; WSDL, XML Schema with pull parser XPP, SOAP with Axis 1.x; UDDI, WS-Context, JSR-168, JDBC, Servlets; WS-Management and VOTables in research
Key interfaces/standards/software NOT used (often just for historical reasons as project predated standard): WS-Security, JSDL, WSRF, BPEL, OGSA-DAI
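To show why a VOTable-style layout suits ASCII output from application codes, here is a minimal sketch that writes and reads a table using the core VOTable element names (VOTABLE, RESOURCE, TABLE, FIELD, DATA, TABLEDATA, TR, TD). It is deliberately simplified: it omits the namespace and version attributes a conforming VOTable document carries, and the functions `to_votable` and `read_rows` and the sample column names are hypothetical, not part of the SERVOGrid service.

```python
import xml.etree.ElementTree as ET


def to_votable(fields, rows):
    """Serialize rows into a simplified VOTable-style document.

    fields: list of (name, datatype) pairs, e.g. ("station", "char").
    rows:   list of tuples with one cell per field.
    """
    votable = ET.Element("VOTABLE")
    resource = ET.SubElement(votable, "RESOURCE")
    table = ET.SubElement(resource, "TABLE")
    for name, datatype in fields:
        attrs = {"name": name, "datatype": datatype}
        if datatype == "char":
            attrs["arraysize"] = "*"  # variable-length string
        ET.SubElement(table, "FIELD", attrs)
    tabledata = ET.SubElement(ET.SubElement(table, "DATA"), "TABLEDATA")
    for row in rows:
        tr = ET.SubElement(tabledata, "TR")
        for cell in row:
            ET.SubElement(tr, "TD").text = str(cell)
    return ET.tostring(votable, encoding="unicode")


def read_rows(xml_text):
    """Return each TR as a dict keyed by FIELD name (values stay as strings)."""
    root = ET.fromstring(xml_text)
    names = [f.get("name") for f in root.iter("FIELD")]
    return [dict(zip(names, (td.text for td in tr.iter("TD")))) for tr in root.iter("TR")]


doc = to_votable(
    [("station", "char"), ("displacement_mm", "double")],
    [("STN1", 1.25), ("STN2", -0.5)],
)
rows = read_rows(doc)
```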
16
Delicious ACES




http://del.icio.us purchased by Yahoo for ~$30M
http://www.CiteULike.org
http://www.connotea.org (Nature)
http://www.bibsonomy.org/
• Associate metadata with Bookmarks specified by URL’s,
DOI’s (Digital Object Identifiers)
• Users add comments and keywords (called tags)
• Users are linked together into groups (communities)
• Information such as title and authors extracted automatically
from some sites (PubMed, ACM, IEEE, Wiley etc.)
• BibTeX-like additional information

This is de facto Semantic Web – remarkable for its
simplicity
17
Connotea
18
Connotea queried by SERVOGrid
19
Provenance and Delicious ACES

All ACESData should be associated with provenance
that describes its lineage
• How and when it was created
• Compiler options used in simulation
• ACESFS query used on what ACESNodes



Provenance produced by computer automatically
and/or by user
All ACESData can and should be labeled by a URI
aces://acesnodenumber.xx.yy.whathaveyou
We can use del.icio.us style interface to annotate
ACESData with missing provenance and user
comments of any type (describing quality of data or a
keyword relating different data etc.)
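A provenance record plus del.icio.us-style annotation could look like the following sketch. The field names follow the bullets on this slide, but the `ProvenanceRecord` and `AnnotationStore` classes are hypothetical, and the example URIs are invented placeholders that merely follow the `aces://` pattern above.

```python
from dataclasses import dataclass, field

PROVENANCE_FIELDS = ("created", "how", "compiler_options", "query")


@dataclass
class ProvenanceRecord:
    uri: str                      # e.g. an aces:// URI labelling the data
    created: str = ""             # when it was created
    how: str = ""                 # how it was created
    compiler_options: str = ""    # options used in a simulation
    query: str = ""               # ACESFS query used
    nodes: list = field(default_factory=list)     # ACESNodes the query ran on
    tags: set = field(default_factory=set)        # user keywords
    comments: list = field(default_factory=list)  # (user, comment) pairs

    def missing(self):
        """List provenance fields that nobody has filled in yet."""
        return [name for name in PROVENANCE_FIELDS if not getattr(self, name)]


class AnnotationStore:
    def __init__(self):
        self._records = {}

    def register(self, record):
        self._records[record.uri] = record

    def annotate(self, uri, user, tags=(), comment=None, **provenance):
        """Add tags, a comment, or missing provenance to an existing record."""
        record = self._records[uri]
        record.tags.update(tags)
        if comment:
            record.comments.append((user, comment))
        for name, value in provenance.items():
            if name not in PROVENANCE_FIELDS:
                raise KeyError(name)
            setattr(record, name, value)
        return record

    def find_by_tag(self, tag):
        return sorted(uri for uri, rec in self._records.items() if tag in rec.tags)
```

Tags double as the "keyword relating different data": any two data sets that share a tag are returned together by `find_by_tag`.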
20
Semantic Scholar Grid




Citeseer and Google Scholar scour the Internet and
analyze documents for incidental metadata
Title, author and institution of documents
Citations with their own metadata allowing one to
match to other documents
These capabilities are sure to become more powerful
and to be extended
• Give “Citation Index” in real time
• Tell you all authors of all papers that cite a paper that cites
you etc. (Note it’s a small world so don’t go too far in link
analysis)
• Tell you all citations of all papers in a workshop

Such high-value tools will appear on “publisher” sites
of the future (or else publishers will disappear)
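The link analysis in the bullets above (a real-time citation count, and "all authors of all papers that cite a paper that cites you") reduces to walking a citation graph backwards. The sketch below assumes the citation data is already available as plain dictionaries; the function names and the toy graph are hypothetical.

```python
def invert(cites):
    """Turn 'paper -> papers it cites' into 'paper -> papers citing it'."""
    cited_by = {}
    for source, targets in cites.items():
        for target in targets:
            cited_by.setdefault(target, set()).add(source)
    return cited_by


def citation_count(paper, cites):
    """Number of papers that cite the given paper directly."""
    return len(invert(cites).get(paper, set()))


def authors_two_hops_away(paper, cites, authors):
    """Authors of all papers that cite a paper that cites the given paper."""
    cited_by = invert(cites)
    first_hop = cited_by.get(paper, set())
    second_hop = set()
    for citing in first_hop:
        second_hop |= cited_by.get(citing, set())
    found = set()
    for p in second_hop:
        found |= set(authors.get(p, ()))
    return found


# Toy graph: B and D cite A; C and D cite B.
cites = {"B": {"A"}, "C": {"B"}, "D": {"A", "B"}}
authors = {"A": {"you"}, "B": {"bo"}, "C": {"cy", "di"}, "D": {"di"}}
```

Each extra hop multiplies the result set, which is the practical meaning of the slide's small-world caution about not going too far.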
21
OSCAR2 Chemistry
Document analysis

It detects “magic” chemical strings in text and then
• Stores them as metadata associated with document
• Queries ChemInformatics repositories to tell you lots of information about identified compounds
• Tells you which other documents have this compound
22
ACES Version of OSCAR

Some of the ACESNodes will store metadata associated
with ACESData – including documents
• Note documents could be anywhere on the Internet – the
ACESNode may choose to store (a copy of) document or just
its metadata
• Note all ACESNodes are federated i.e. there is no “one
central” store of any type of data


Metadata will be user annotations including tags,
Citeseer style citation information for all scientific
fields
Then each scientific field has its own version of OSCAR
tuned to extract natural metadata for science – for
ACES this is GML (Chemistry is CML …) and
ACESFS extensions
23
Semantic Scholars’ Grid I
Local MD Store
Local Harvest Store
Gatherer: Fetch MD and Documents (PubMed, Science.gov, Google Scholar, e-Prints, Dspace etc.)
Indexer: Index all Local MD; Query and Get list
Analyzer: Run filter such as OSCAR2 on harvested MD and documents; Store new MD
24
Semantic Scholars’ Grid II
Local MD Store
Foreign sources: ACM, CiteULike, IEEE, Connotea, Del.icio.us, Google Scholar, Wiley etc.
Plug-in
Updater: Synchronize SSG and foreign MD etc.
Community Tools: Instant Citation Index etc.
SSG Viewer: Update local MD; Control foreign interactions; View all MD; Access Community Tools
Foreign User Interface: Update and view foreign MD
25