SERVOGrid_ChinaAug26-04 - Digital Science Center
SERVO Grid:
Solid Earth Research Virtual Observatory
Grid/Web Services and Portals
Supporting Earthquake Science
Current SERVOGrid is USA Project led by JPL (Jet
Propulsion Laboratory) but next is iSERVO with
International Collaboration between Australia, China,
Japan and USA
August 26 2004
Beijing China
Geoffrey Fox, Marlon Pierce
Community Grids Lab,
Pervasive Technologies Laboratories
Indiana University
Solid Earth Science Questions
1. What is the nature of deformation at plate boundaries and what are the
implications for earthquake hazards?
2. How do tectonics and climate interact to shape the Earth’s surface and
create natural hazards?
3. What are the interactions among ice masses, oceans, and the solid Earth
and their implications for sea level change?
4. How do magmatic systems evolve and under what conditions do volcanoes
erupt?
5. What are the dynamics of the mantle and crust and how does the Earth’s
surface respond?
6. What are the dynamics of the Earth’s magnetic field and its interactions
with the Earth system?
From NASA’s Solid Earth Science Working Group
Report, Living on a Restless Planet, Nov. 2002
The Solid Earth is:
Complex, Nonlinear, and Self-Organizing
Relevant questions that Computational
technologies can help answer:
1. How can the study of strongly correlated solid earth
systems be enabled by space-based data sets?
2. What can numerical simulations reveal about the
physical processes that characterize these systems?
3. How do interactions in these systems lead to space-time
correlations and patterns?
4. What are the important feedback loops that mode-lock
the system behavior?
5. How do processes on a multiplicity of different scales
interact to produce the emergent structures that are
observed?
6. Do the strong correlations allow the capability to
forecast the system behavior in any sense?
Characteristics of Computing for
Solid Earth Science
Widely distributed datasets in various formats
• GPS, Fault data, Seismic data sets, InSAR satellite data
• Many available in state of art tar files that can be FTP’d
• Provenance problems: faults have controversial parameters
like slip rates which have to be estimated.
Distributed models and expertise
• Lots of codes with different regions of validity, ranging from
cellular automata to finite element to data mining applications
(HMM)
• Simplest challenges are just making these codes useable for
other researchers.
• And hooking these codes to data sources
• Some codes also have export or IP restrictions
• Other codes are highly specialized to their deployment
environments.
Decomposable problems requiring interoperability for
linking full models
• The fidelity of your fault modeling can vary considerably
• Link codes (through data) to support multiple scales
SERVOGrid Requirements
Seamless Access to data repositories and computing
resources
Integration of multiple data sources including
databases, file systems, sensors, …, with simulation
codes.
Core web services for common tasks like command
execution and file management.
Meta-data generation, archiving, and access, extending
openGIS (Geography as a Web service)
standards.
Portals with component model (portlets) for user
interfaces and web control of all capabilities
Basic Grid tools: complex job management and
notification
Collaboration to support world-wide work
• “Collaboration” can range from data sharing to Audio-video
conferencing
SERVOGrid Applications
Codes range from simple “rough estimate” codes to
parallel, high performance applications.
• Disloc: handles multiple arbitrarily dipping dislocations
(faults) in an elastic half-space.
• Simplex: inverts surface geodetic displacements for fault
parameters using simulated annealing downhill residual
minimization.
• GeoFEST: Three-dimensional viscoelastic finite element
model for calculating nodal displacements and tractions.
Allows for realistic fault geometry and characteristics,
material properties, and body forces.
• Virtual California: Program to simulate interactions
between vertical strike-slip faults using an elastic layer over
a viscoelastic half-space
• RDAHMM: Time series analysis program based on Hidden
Markov Modeling. Produces feature vectors and probabilities
for transitioning from one class to another.
Preprocessors, mesh generators: AKIRA suite
Visualization tools: RIVA, GMT, IDL
SERVOGrid Codes, Relationships
Elastic Dislocation Inversion
Viscoelastic FEM
Viscoelastic Layered BEM
Elastic Dislocation
Pattern Recognizers
Fault Model BEM
This linkage called Workflow in Grid/Web Service parlance
Role of Workflow
Service-1
Service-3
Service-2
Programming the Grid: Workflow describes linkage
between services
As distributed, linkage must be by messages
Linkage is two-way and has both control and data
Apply to multi-scale (complexity) linkage, multi-program
linkage, link visualization to simulation, GIS
to simulations and viz filters to each other
Microsoft-IBM specification BPEL is current preferred
Web Service XML specification of workflow
SERVOGrid uses Ant (a well-known XML build tool) to
perform workflow, and this works well in our relatively
simple cases
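The idea that workflow links distributed services through messages carrying both control and data can be sketched in a few lines. This is only an illustration under stated assumptions: the `Message` class, the three stand-in service functions and `run_workflow` are hypothetical names, not part of SERVOGrid, Ant or BPEL, and the "services" are local functions rather than remote Web Services.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Message:
    """A message between services: control fields plus a data payload."""
    control: Dict[str, str]
    data: Dict[str, object] = field(default_factory=dict)


# Hypothetical: a "service" is modeled as a function from message to message.
Service = Callable[[Message], Message]


def mesh_service(msg: Message) -> Message:
    # Stand-in for a mesh generation step.
    return Message({"status": "ok", "from": "mesh"},
                   {"mesh": f"mesh({msg.data['fault']})"})


def simulation_service(msg: Message) -> Message:
    # Stand-in for a simulation step that consumes the mesh.
    return Message({"status": "ok", "from": "simulation"},
                   {"displacements": f"solve({msg.data['mesh']})"})


def visualization_service(msg: Message) -> Message:
    # Stand-in for a visualization step that consumes simulation output.
    return Message({"status": "ok", "from": "viz"},
                   {"image": f"plot({msg.data['displacements']})"})


def run_workflow(steps: List[Service], first: Message) -> Message:
    """Run services in order, feeding each reply to the next step."""
    msg = first
    for step in steps:
        msg = step(msg)
        # The control part of the reply decides whether the chain continues.
        if msg.control.get("status") != "ok":
            raise RuntimeError(f"step failed: {msg.control}")
    return msg


result = run_workflow(
    [mesh_service, simulation_service, visualization_service],
    Message({"status": "ok", "from": "client"}, {"fault": "fault-A"}),
)
print(result.data["image"])
```

The workflow description here is just the ordered list passed to `run_workflow`; a build-tool or BPEL document plays the same role in XML.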
(i)SERVO Web (Grid) Services
Programs: All applications wrapped as Services using proxy strategy
Job Submission: support remote batch and shell invocations
• Used to execute simulation codes (VC suite, GeoFEST, etc.), mesh
generation (Akira/Apollo) and visualization packages (RIVA,
GMT).
File management:
• Uploading, downloading, backend crossloading (i.e. move files
between remote machines)
• Remote copies, renames, etc.
Job monitoring
Workflow: Apache Ant-based remote service orchestration
• For coupling related sequences of remote actions, such as RIVA
movie generation.
Data services: support remote data bases and query construction
• XML data model being adopted for common formats with
translation services to “legacy” formats.
• Migrating to Geography Markup Language (GML) descriptions.
Metadata Services: for archiving user session information.
Security: Authentication and Authorization
• Authentication describes who the user is
• Authorization describes what a given user can do
– What data and computers can be accessed
– Basically a database
• Current portal uses password accounts and provides services for free for
demonstration.
– iSERVO should decide on “charging for” services
• We have (through Community portal effort OGCE) support for GSI and
Kerberos authentication services.
– These just plug in and replace the default login service.
• Authorization is currently simple: you can only reach your files.
– iSERVO should develop an authorization policy
• Simultaneous Cross Administrative Domain access is a very hard Grid
problem and there is no consensus as to a good solution
• Systematic use of Services helps security/privacy/IP issues as “danger
of misuse” is lower for services (which have limited privileges) than for
direct computer access
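The split between authentication (who the user is) and authorization (what the user can do) can be sketched as two separate checks. This is a minimal sketch, not the portal's code: the account table, the `/portal/users` directory layout and both function names are assumptions made for illustration, and the unsalted SHA-256 digest stands in for whatever password storage a real deployment would use.

```python
import hashlib
import hmac
from pathlib import PurePosixPath

# Hypothetical account table: user -> SHA-256 digest of the password.
# Illustration only; a real deployment would use salted, slow hashing.
ACCOUNTS = {"alice": hashlib.sha256(b"secret").hexdigest()}

# Hypothetical layout: each user owns one directory under this root.
USER_ROOT = PurePosixPath("/portal/users")


def authenticate(user: str, password: str) -> bool:
    """Authentication: is this really the named user?"""
    expected = ACCOUNTS.get(user)
    if expected is None:
        return False
    actual = hashlib.sha256(password.encode()).hexdigest()
    return hmac.compare_digest(expected, actual)


def authorize(user: str, path: str) -> bool:
    """Authorization: the simple policy 'you can only reach your files'."""
    home = USER_ROOT / user
    target = PurePosixPath(path)
    if ".." in target.parts:
        # Reject paths that try to climb out of the user's directory.
        return False
    return target == home or home in target.parents


print(authenticate("alice", "secret"))
print(authorize("alice", "/portal/users/alice/run1/out.dat"))
print(authorize("alice", "/portal/users/bob/out.dat"))
```

Keeping the two checks separate is what lets a different login service replace `authenticate` without touching the authorization policy.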
SERVO Data Sources
Fault Data
• Developed as part of the project
• QuakeTables: http://infogroup.usc.edu:8080
Seismic data formats
• Available from www.scec.org
• SCSN, SCEDC, Dinger-Shearer, Hauksson
GPS data formats
• Available from www.scign.org
• JPL, SOPAC, USGS
Applications and Observational Data
Several SERVO codes work directly with
observational data.
Scenarios include
• GeoFEST, VirtualCalifornia, Simplex, and Disloc all
depend upon fault models.
• RDAHMM and Pattern Informatics codes use seismic
catalogs.
• RDAHMM primarily used with GPS data
Problem: We need to provide a way to integrate
these codes with the online data repositories.
• QuakeTables Fault Database was developed
• What about GPS and Earthquake Catalogs?
• Many formats, data available in tars or files, not
searchable, not easy to integrate with applications
Solution: use databases to store catalog data; use
XML (GML) as exchange data format; use Web
Services for data exchanges, invoking queries, and
filtering data.
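The proposed solution (catalog data in a database, XML/GML as the exchange format, queries and filtering behind a service call) can be sketched as follows. Everything specific here is an assumption for illustration: the table layout, the `EarthquakeCatalog`, `Earthquake`, `time` and `magnitude` element names, and the three catalog rows are made up and are not the project's GML schemas. Only the GML namespace URI and the `gml:Point` / `gml:coordinates` elements come from GML itself. An in-memory SQLite database stands in for the real database server.

```python
import sqlite3
import xml.etree.ElementTree as ET

GML_NS = "http://www.opengis.net/gml"
ET.register_namespace("gml", GML_NS)

# In-memory stand-in for the catalog database, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quakes (id TEXT, time TEXT, lat REAL, lon REAL, mag REAL)"
)
conn.executemany(
    "INSERT INTO quakes VALUES (?, ?, ?, ?, ?)",
    [
        ("ev1", "2004-01-01T00:00:00Z", 34.0, -118.0, 3.2),
        ("ev2", "2004-02-01T00:00:00Z", 35.5, -120.5, 5.1),
        ("ev3", "2004-03-01T00:00:00Z", 33.2, -116.1, 4.4),
    ],
)


def query_as_gml(db: sqlite3.Connection, min_mag: float) -> str:
    """Filter the catalog by magnitude and return the result as XML."""
    root = ET.Element("EarthquakeCatalog")  # hypothetical element name
    rows = db.execute(
        "SELECT id, time, lat, lon, mag FROM quakes "
        "WHERE mag >= ? ORDER BY time",
        (min_mag,),
    )
    for ev_id, when, lat, lon, mag in rows:
        ev = ET.SubElement(root, "Earthquake", {"id": ev_id})
        ET.SubElement(ev, "time").text = when
        ET.SubElement(ev, "magnitude").text = str(mag)
        # Location encoded as a GML point, "x,y" = "lon,lat".
        point = ET.SubElement(ev, f"{{{GML_NS}}}Point")
        ET.SubElement(point, f"{{{GML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(root, encoding="unicode")


gml = query_as_gml(conn, 4.0)
print(gml)
```

A translation service to a "legacy" text format would read the same query result and write fixed-column lines instead of XML elements.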
Geographical Information Service
(GIS) Data Formats and Services
OpenGIS Consortium (OGC) is an international group
for defining GIS data formats and services.
Main data format language is the XML-based GML.
• Subdivided into schemas for drawing maps, representing
features, observations, …
First Step: design GML schemas and build specialized
Web Services for GPS and Earthquake data.
OGC also defines services.
• Services include Web Features Services, Web Map Services,
and similar.
• These are currently pre-Web Service, based on HTTP Post, but
they are being revised to comply with WS standards.
Next Step: Implement OGC compatible Web Services
for this problem i.e. build a GIS Grid
• Also build services to interact with QuakeTables Fault DB.
GML and Existing Data Formats
GPS or seismic data used in this project
are retrieved from different URLs and
have different text formats.
Seismic data formats
• SCSN, SCEDC, Dinger-Shearer, Hauksson
GPS data formats
• JPL, SOPAC, USGS
We defined 2 GML Schemas to unify
these
• http://grids.ucs.indiana.edu/~gaydin/servo
A summary of all supported formats and
data sources can also be found there.
Prototype GML Service
First version of the system available
• Tried XML databases but performance was awful
• Currently database uses MySQL
Download results are in GML, but we can convert to appropriate
text formats.
[Screenshots: Search DB For Earthquake Catalogs; Search XML DB For GPS Catalogs]
openGIS Grid Semantics
• Note GIS (Geographical Information System) Grid at heart of all these
Grids
• Geography Markup Language (GML) is an XML encoding for the
specification of the geometry and properties of geographic features.
GML utilizes the OpenGIS Abstract Specification geometry model
which has been harmonized with the ISO geospatial geometry model.
– We are building CI specific ontologies in terms of GML to define
faults, satellites etc.
– http://ripvanwinkle.ucs.indiana.edu:4780/examples/download/schema/
• Styled Layer Descriptor (SLD) specifies the format of a map-styling
language for portraying the output of Web Map Servers, Web Feature
Servers and Web Coverage Servers etc. SLD will enable different
communities in the Emergency Response area to develop a set of
customized portrayal rules that best fit their mission requirements.
– This becomes the specification of portals to different composite Grids
• Sensor Markup Language (SensorML) defines the information model
for discovering, querying and controlling Web-resident sensors.
• Observations & Measurements (O&M) defines the information model
for observations that are returned from the CrisisGrid sensors.
GIS Grid Services I
• Web Feature Service (WFS) supports the query and discovery of
geographic features delivering GML representations of simple
geospatial features in response to queries from HTTP clients. WFS
can access geographic features including critical infrastructure
features, incident locations, and flood-related geographic features
including inundation areas, watershed boundaries, and demographic
features.
• Web Coverage Service (WCS) supports the query and discovery of
digital geospatial information such as digital elevation models,
imagery, orthophotography, weather coverages (such as predicted
rainfall, air pressure, wind speed and direction), and any other space-varying flood-related phenomena.
• Web Map Service (WMS) uses a SLD portrayal to generate "pictures"
of georeferenced feature or coverage data. WMS will provide a means
to portray geographic information independent of the underlying data
model (WFS or WCS).
• Coverage Portrayal Service (CPS) defines a standard interface for
producing visual pictures from coverage data typically accessed via
WCS with a SLD portrayal.
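To make the WFS and WMS descriptions concrete, the sketch below builds the key-value-pair request URLs that OGC services of this generation accept over HTTP (WFS 1.0.0 `GetFeature`, WMS 1.1.1 `GetMap`). The endpoints under `example.org`, the feature type `servo:Fault` and the layer name `faults` are hypothetical placeholders, not the project's servers, and nothing is sent over the network.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wfs_get_feature_url(base_url: str, type_name: str, bbox) -> str:
    """Build a WFS 1.0.0 GetFeature request; the reply would be GML."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.0.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
        "BBOX": ",".join(str(v) for v in bbox),  # minx,miny,maxx,maxy
    }
    return f"{base_url}?{urlencode(params)}"


def wms_get_map_url(base_url: str, layer: str, bbox, width: int, height: int) -> str:
    """Build a WMS 1.1.1 GetMap request; the reply would be an image."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty means the server's default style
        "SRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": str(width),
        "HEIGHT": str(height),
        "FORMAT": "image/png",
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoints and names, for illustration only.
california = (-121.0, 32.0, -114.0, 36.0)
wfs_url = wfs_get_feature_url("http://example.org/wfs", "servo:Fault", california)
wms_url = wms_get_map_url("http://example.org/wms", "faults", california, 800, 600)
print(wfs_url)
print(wms_url)
```

The same bounding box drives both requests: the WFS call returns the features as data, and the WMS call returns a picture of them.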
GIS Grid Services II
• Web Terrain Service (WTS) augments WMS with advanced
visualization including 3D terrains.
• Catalog Service - Web Profile (CS-W) is a catalog service
that will be built on a general Grid metadata service
• Sensor Collection Service (SCS) fetches observations from
a sensor or group of sensors and will be integrated with
research on Grid sensor services
• Sensor Planning Service (SPS) assists in developing 'collection
feasibility plans' and in processing collection requests for a
sensor or group of sensors.
• Web Notification Service (WNS) will be replaced by
standard Grid notification service
QuakeTables+OGC Web Map Service
Demo
http://rio.ucs.indiana.edu:8080/wmsClient/
Streams and Workflow
NaradaBrokering can manage streams from
• Audio/Video conferences
• Sensors
• Inter-service communication in workflow
http://www.hpsearch.org/demo/ describes scripting
management
interface to
NaradaBrokering
Grids involve streams
as well as compute and
data nodes
Workflow and dataflow
like BPEL imply
streams
SERVOGrid Ontology
SERVOGrid has many types of metadata
We are designing RDFS descriptions for the
following components:
• Simulation codes, mesh generators, etc.
• Visualization tools
• Data types
• Computing resources
• …
These are easily expressed as RDFS (actually
DAML) “nuggets” of information.
• Create instances of these
• Use properties to link instances.
Some Sample Relationships
[Diagram. Nodes: Danube (Computer), GMT (Viz Appl), Disloc (Application),
USC Fault DB (Data Storage), Fault (DataType), Stress Map (DataFormat).
Edge labels: installedOn (twice), visualizedBy, createsOutput, usesInput,
storedIn.]
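Relationships of this kind are naturally held as subject-predicate-object triples, which is the model behind RDFS and DAML. The sketch below uses plain Python tuples rather than an RDF library. The node and property names are the labels on this slide, but which node each property connects is not recoverable from the transcript, so the specific edges below are an assumed reading and should not be taken as the authors' ontology.

```python
# Each triple is (subject, predicate, object).
# The pairing of nodes and properties is an ASSUMPTION for illustration.
TRIPLES = [
    ("Disloc", "type", "Application"),
    ("GMT", "type", "Viz Appl"),
    ("Danube", "type", "Computer"),
    ("USC Fault DB", "type", "Data Storage"),
    ("Disloc", "installedOn", "Danube"),
    ("GMT", "installedOn", "Danube"),
    ("Disloc", "visualizedBy", "GMT"),
    ("Disloc", "usesInput", "Fault"),
    ("Disloc", "createsOutput", "Stress Map"),
    ("Fault", "storedIn", "USC Fault DB"),
]


def objects(subject: str, predicate: str) -> list:
    """All objects linked from `subject` by `predicate`."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate: str, obj: str) -> list:
    """All subjects linked to `obj` by `predicate`."""
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


# Which codes are installed on Danube, and where does Disloc's input live?
print(subjects("installedOn", "Danube"))
for data_type in objects("Disloc", "usesInput"):
    print(data_type, "is stored in", objects(data_type, "storedIn"))
```

Following properties from instance to instance, as the last loop does, is the kind of query that lets a portal find the data source for a chosen application.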
More Information
SERVOGrid/QuakeSim:
• http://quakesim.jpl.nasa.gov/
Full Portal Demo:
• http://complexity.ucs.indiana.edu:8080
• Request an account
• Downloads available in November
Fault database
• http://infogroup.usc.edu:8080
GPS and Seismic Database Demo:
• http://gf3.ucs.indiana.edu:6060/cce/sql/
Setting up your own GPS or Seismic database
• http://complexity.ucs.indiana.edu/~gaydin/cce/install/install.html
Publications:
• http://grids.ucs.indiana.edu/ptliupages/publications/
• http://grids.ucs.indiana.edu/ptliupages/presentations/
Some Grid Controversies
• 1) There are several proposals for the Web Service
extensions needed for Grids – why do we ignore them?
– OGSI (GT3)
– WSRF (GT4)
– WS-GAF (Newcastle)
– WS-I+ (Pure Web Services)
• We use WS-I+ approach – can later add extensions when
consensus clear
– This approach adopted by next phase of UK e-Science Program
• 2) Web Services are too slow as they use HTTP with clumsy
ASCII XML data (SOAP)?
– Currently no problem, but can use a separate control channel from
the data channel if high performance is needed
iSERVO Strategy
• Agree on what (type of) resources and capabilities need to be put on the
iSERVO Grid
– Computers, instruments, databases, visualization, maps, job
submittal ….
• Agree on interfaces to resources from OGSA-DAI (databases) to
particular data structures (GML/OpenGIS) – specify in XML
• Implement Resources and Capabilities as Services
– User Interface should be a portlet that can be integrated by the
portal into web interface
• Make certain overarching Grid capabilities such as workflow,
federation and metadata are sufficient
• SERVO Grid is a prototype of this strategy using several US sites
rather than several countries
– Can be naturally extended to iSERVO, education, emergency
response by extending resources
• Web Service Architecture ensures continued interoperability and
extensibility
Lessons Learned
Web service performance is not an issue when
used to invoke services that take hours to
complete.
• Later real-time sensors will probe performance
Reliability is a larger problem.
• Need monitoring/heartbeat services.
Information systems still have a long way to go.
• UDDI is part of WS-I but has/had some well known
limitations.
• WS-Discovery has some interesting concepts but is too
specialized to ad-hoc networks.
• Peer-to-peer systems provide many useful concepts like
discovery and caching.
• Semantic Web provides powerful resource descriptions
that could be exploited.
• XML Databases slow
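The monitoring/heartbeat idea mentioned under reliability can be sketched as a small registry that records when each service last reported in and flags those that have gone quiet. The class name, the timeout value and the service names are assumptions for illustration; in a deployment the `beat` call would be triggered by a message from the remote service.

```python
import time


class HeartbeatMonitor:
    """Track the last heartbeat per service and report silent ones."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self._timeout = timeout_s
        self._clock = clock  # injectable so the example can control time
        self._last = {}

    def beat(self, service: str) -> None:
        """Record that `service` has just reported in."""
        self._last[service] = self._clock()

    def silent(self) -> list:
        """Services whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(
            name for name, seen in self._last.items()
            if now - seen > self._timeout
        )


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_s=30.0, clock=lambda: fake_now[0])
monitor.beat("job-submission")   # reported at t = 0 s
fake_now[0] = 20.0
monitor.beat("file-management")  # reported at t = 20 s
fake_now[0] = 40.0
print(monitor.silent())          # only the first service has been quiet > 30 s
```

For jobs that run for hours, a check like this is what distinguishes a service that is still working from one that has failed without notice.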
Further iSERVO Challenges
• Make everything a Service
• Think about Data Curation
– Set up policies for observational data and criteria for inclusion
in iSERVO data repositories
• Think about Data Provenance
– Generate and maintain metadata describing ownership, origins
and transformations
– Applies to both “experimental data” and results from
simulations (visualizations)
• Curation and Provenance are a change in research
methodologies and require funding!
• Education and Emergency Response/Planning interesting
offshoots of iSERVO
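Provenance metadata describing ownership, origins and transformations can be sketched as a small record that travels with each data product. The field names, the use of a SHA-256 checksum to identify the data, and the sample values are assumptions for illustration, not an iSERVO format.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ProvenanceRecord:
    owner: str        # who is responsible for the data
    origin: str       # where the original data came from
    sha256: str       # checksum identifying this exact version
    transformations: List[str] = field(default_factory=list)


def record_for(data: bytes, owner: str, origin: str) -> ProvenanceRecord:
    """Create the provenance record for an original data set."""
    return ProvenanceRecord(owner, origin, hashlib.sha256(data).hexdigest())


def transformed(record: ProvenanceRecord, new_data: bytes,
                description: str) -> ProvenanceRecord:
    """Return a new record for derived data, extending the history."""
    return ProvenanceRecord(
        record.owner,
        record.origin,
        hashlib.sha256(new_data).hexdigest(),
        record.transformations + [description],
    )


# Made-up values for illustration.
raw = record_for(b"1.0 2.0 3.0", "alice", "hypothetical GPS archive")
derived = transformed(raw, b"1.0 2.0", "dropped last sample")
print(derived.transformations)
```

The same record type applies to simulation results: the origin names the input data and the transformation list names the codes that were run.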
QuakeSim Portal for SERVOGrid
The services need user interfaces
• WSDL descriptions are all you need to create
client stubs (if not client applications).
The QuakeSim portal effort aggregates
these service interfaces into a portal.
• Customizable displays, access controls to
services, etc.
QuakeSim is just one of many, many such
projects.
Challenge is to develop reusable portal
components
SERVOGrid Portal Screen Shots
[Screenshot annotations:]
• Each Service has its own portlet
• Individual portlet for the Proxy Manager
• Use tabs or choose different portlets to navigate through interfaces to
different services
• 2 Other Portlets
• OGCE Consortium
Computational Web Portal Stack
Web service dream is that core services, service aggregation, and
user interface development are decoupled.
How do I manage all those user interfaces? Use portlets.
[Stack diagram, top to bottom:]
• Aggregate Portals
• Portlet User Interface Components
• Application Web Services and Workflow
• Core Web Services
Portal Architecture
[Diagram labels: Clients (Pure HTML, Java Applet ..); Portal: Aggregation and
Rendering, Portlet Class: WebForm, Portlet Classes, Portal Internal Services;
Portlets: Local Portlets, Remote or Proxy Portlets, Gateway (IU);
Libraries: GridPort etc., (Java) COG Kit; Services: Web/Grid services;
Resources: Computing, Data Stores, Instruments; Hierarchical arrangement]
Why Are Portlets a Good Idea?
You don’t have to reinvent
everything
• Makes it easy (but not effortless) to
share portal components between
projects.
• So you can pull in portlets from all the
other earthquake grid projects.
You can easily combine a wide range
of capabilities
• Add document managers, collaboration
tools, RSS news lists, etc for your portal
users.
Lessons Learned: Portals
Developing good user interfaces is a lot of work.
• Effort doesn’t scale: how do you simplify this for
computational scientists to do it themselves without
lots of background in XML, Java, portlets, etc?
Portal interfaces have advantages and disadvantages.
• Everyone has a browser.
• But it has a limited widget set, a limited event
model, limited interactivity.
• You can of course overcome a lot of this with
applets.
Following the service model, you can in principle use
any number of GUIs
• Browsers are not the only possible clients.
• Web service interoperability means that Java Swing
apps, Python, Perl GUIs are all possible, but this has
not been fully exploited.
Important Lessons/Principles
Use OGCE Portal Architecture and portal services
Can expect GGF activities like OGSA to define/refine interfaces and
projects around the world to produce more powerful services
• Obsolescing of implementations is a consequence of interoperability
Use Grids of Grids of Services Architecture
• Interoperable Component Grids Built from interoperable services
• Collaboration, Compute, Database, GIS, Sensor, Visualization Grids
Build a GIS (Geographical Information Systems) Grid spanning
simulation/crisis management and different fields with openGIS
compliance
• openGIS has defined Web Service Interfaces
• Visualization should build on these
Geoscience Education Grid by transformations on research grid
Emergency Response and Planning Grids by adding real-time
control/collaboration and GIS tools
• These additions common to all crises
Collaboration between Beihang University and Indiana University to
produce Web Service based audio/video conferencing