QuakeSim Project: Portals and Web Services for Geo-Sciences
Marlon Pierce
Indiana University
[email protected]
QuakeSim Project Summary
• Goal: provide a distributed environment for connecting scientific computing and data resources with Web-based user interfaces.
• QuakeSim's IT development includes:
   ◦ Portals for user interfaces
   ◦ Web Services for running remote applications and accessing databases
   ◦ Databases for semantic fault models and InSAR data (USC)
Some QuakeSim Applications and Their Data
• Disloc, Simplex
   ◦ Disloc uses fault models to calculate surface displacements with the Okada method.
   ◦ Simplex is the inverse: it refines fault models from observed displacements.
   ◦ Together they help researchers refine their fault models from observed displacements, or model the displacements associated with faults.
• GeoFEST
   ◦ Finite element code for detailed modeling of fault stresses and seismic displacements; uses fault models as input.
   ◦ Coupled to mesh generation tools.
   ◦ Can, for example, calculate post- and co-seismic displacements.
• RDAHMM
   ◦ Time series analysis code; can be applied to GPS and seismic archives.
   ◦ Identifies signal components (possibly associated with underlying physical causes) with no fixed parameters.
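The forward/inverse relationship between Disloc and Simplex can be sketched in a few lines. This is a toy, not QuakeSim code: instead of Disloc's Okada solution, the stand-in forward model is the two-dimensional screw-dislocation profile for an infinitely long strike-slip fault locked to depth D and slipping at rate s below it, v(x) = (s/π)·arctan(x/D), and a brute-force grid search stands in for Simplex's downhill simplex search. The function names, station distances and parameter ranges are hypothetical.

```python
import math

def forward(x_km, slip_mm, depth_km):
    """Fault-parallel surface velocity (mm/yr) at distance x_km from the fault.
    Stand-in for Disloc: 2-D screw dislocation, NOT the Okada solution."""
    return (slip_mm / math.pi) * math.atan2(x_km, depth_km)

def misfit(xs, observed, slip_mm, depth_km):
    """Sum of squared residuals between model and observations."""
    return sum((forward(x, slip_mm, depth_km) - o) ** 2 for x, o in zip(xs, observed))

def invert(xs, observed, slips, depths):
    """Grid-search inversion (a stand-in for Simplex's downhill simplex method).
    Returns the (slip, depth) pair with the smallest misfit."""
    best = None
    for s in slips:
        for d in depths:
            m = misfit(xs, observed, s, d)
            if best is None or m < best[0]:
                best = (m, s, d)
    return best[1], best[2]

if __name__ == "__main__":
    xs = [-50, -20, -10, -5, 0, 5, 10, 20, 50]      # hypothetical station distances (km)
    observed = [forward(x, 35.0, 15.0) for x in xs]  # noise-free synthetic "GPS" data
    slips = [float(s) for s in range(20, 51)]         # candidate slip rates (mm/yr)
    depths = [float(d) for d in range(5, 31)]         # candidate locking depths (km)
    print(invert(xs, observed, slips, depths))        # (35.0, 15.0)
```

With noisy real data the best misfit is not zero, and a local search such as the simplex method scales far better than a full grid as the number of fault parameters grows.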
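[Figure: architecture diagram. Portlets + client stubs, reached over HTTP(S), call WSDL-described Web services over SOAP/HTTP: job submission/monitoring and file services running on the operating and queuing systems, a visualization or map service, and a DB service that reaches the databases through JDBC. Hosts are labeled Host 1 (QT or GRWS), Host 2 (Comp Grid) and Host 3 (GIS).]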
Daily RDAHMM Updates
Daily analysis and event classification of GPS data from REASoN's GRWS.
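Daily event classification comes down to labeling each observation with a hidden state and flagging where the label changes. RDAHMM (Regularized Deterministic Annealing Hidden Markov Model) learns its model parameters with a regularized, annealed EM procedure that is not reproduced here. The sketch below assumes the parameters are already known (hypothetical values), uses scalar observations rather than 3-D GPS positions, and shows only the decoding step with the standard Viterbi algorithm.

```python
import math

def gaussian_logpdf(x, mean, sd):
    """Log density of a 1-D Gaussian."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def viterbi(obs, start, trans, means, sds):
    """Most likely hidden-state path for 1-D observations under a Gaussian HMM.

    start[s]    : initial probability of state s
    trans[p][s] : probability of moving from state p to state s
    means, sds  : per-state Gaussian emission parameters
    """
    states = range(len(start))
    # score[s] = best log-probability of any path that ends in state s
    score = [math.log(start[s]) + gaussian_logpdf(obs[0], means[s], sds[s]) for s in states]
    backpointers = []
    for x in obs[1:]:
        pointers, new_score = [], []
        for s in states:
            prev = max(states, key=lambda p: score[p] + math.log(trans[p][s]))
            pointers.append(prev)
            new_score.append(score[prev] + math.log(trans[prev][s])
                             + gaussian_logpdf(x, means[s], sds[s]))
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

def state_changes(path):
    """Indices where the decoded state differs from the previous observation's state."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]
```

For example, `viterbi([0.0, 0.1, -0.1, 5.0, 5.2, 4.9], [0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [0.0, 5.0], [1.0, 1.0])` decodes to `[0, 0, 0, 1, 1, 1]`, and `state_changes` flags index 3 as the state change.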
April 21, 2006: What Happened?
We can also analyze real-time GPS data from the CRTN (California Real Time Network). Real-time state changes?
Disloc model of the Northridge fault. Disloc is used in Gerry Simila's geophysics classes (CSUN).
Integrating QuakeSim and UAVSAR
• July 29, 2008: M 5.4 Chino Hills earthquake
• Used QuakeSim to model the expected surface displacements from the event
• Passed a KML file on to the UAVSAR program/project
• Overlaid the displacements on the UAVSAR image
• Will continue to merge the projects, using the Los Angeles ShakeOut in mid-November as a testbed
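The slide does not show the KML that was handed to UAVSAR. As a minimal sketch, assuming each modeled displacement is reported at a point, one Placemark per point can be generated with Python's standard `xml.etree.ElementTree`. The function name, tuple layout and example coordinates are hypothetical; note that KML writes coordinates as longitude,latitude[,altitude].

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"

def q(tag):
    """Qualify a tag name with the KML 2.2 namespace."""
    return f"{{{KML_NS}}}{tag}"

def displacements_to_kml(points):
    """points: iterable of (name, lon, lat, east_mm, north_mm, up_mm) tuples (hypothetical layout)."""
    kml = ET.Element(q("kml"))
    doc = ET.SubElement(kml, q("Document"))
    for name, lon, lat, east, north, up in points:
        placemark = ET.SubElement(doc, q("Placemark"))
        ET.SubElement(placemark, q("name")).text = name
        ET.SubElement(placemark, q("description")).text = (
            f"Modeled displacement: E {east:.1f} mm, N {north:.1f} mm, U {up:.1f} mm")
        point = ET.SubElement(placemark, q("Point"))
        # KML orders coordinates as longitude,latitude[,altitude]
        ET.SubElement(point, q("coordinates")).text = f"{lon},{lat},0"
    # default_namespace (Python 3.8+) writes xmlns="..." instead of an ns0: prefix
    return ET.tostring(kml, encoding="unicode", default_namespace=KML_NS)

print(displacements_to_kml([("Station A (hypothetical)", -117.5, 34.0, 1.2, -0.4, 0.0)]))
```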
TeraGrid Supercomputing Resources (GPIR)
GeoFEST Finite Element Modeling portlet and plotting tools
QuakeSim and Web 2.0
• Export all observations and computational results as KML and GeoRSS.
• Use social networks to share projects, results, papers, proposals, etc.
   ◦ Facebook and OpenSocial have open APIs.
• Use social (Google) gadgets to deliver your Web components to everyone.
• Use Google's APIs to integrate your services with Calendar, Blogspot, YouTube, etc.
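For the GeoRSS half of "export everything as KML and GeoRSS", here is a sketch assuming an RSS 2.0 feed with the GeoRSS-Simple `georss:point` element, which lists latitude then longitude separated by a space (the reverse of KML's order). The feed title, link and item values are hypothetical.

```python
import xml.etree.ElementTree as ET

GEORSS_NS = "http://www.georss.org/georss"

def results_to_georss(title, link, description, items):
    """items: iterable of (item_title, lat, lon) tuples (hypothetical layout)."""
    ET.register_namespace("georss", GEORSS_NS)  # serialize as georss:point
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, lat, lon in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        # GeoRSS-Simple point: "latitude longitude"
        ET.SubElement(item, f"{{{GEORSS_NS}}}point").text = f"{lat} {lon}"
    return ET.tostring(rss, encoding="unicode")

print(results_to_georss("QuakeSim results (hypothetical)", "http://example.org/quakesim",
                        "Demo feed", [("Disloc run 1", 34.0, -117.5)]))
```

A feed like this can be read by Google Reader or overlaid on Google Maps without any QuakeSim-specific client code.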
More Information
• Email: [email protected]
• QuakeSim Web Site:
   ◦ www.quakesim.org
• Portal URL:
   ◦ http://gf7.ucs.indiana.edu:8080/gridsphere
• Portal SourceForge Page:
   ◦ https://sourceforge.net/projects/crisisgrid
• Code SVN:
   ◦ http://crisisgrid.svn.sourceforge.net/viewvc/crisisgrid/
Acknowledgments
• QuakeSim work is funded by the NASA AIST (A. Donnellan, PI) and ACCESS (Y. Bock, PI) programs.
• Indiana University developers: Galip Aydin, Xiaoming Gao, Zhigang Qi
• Robert Granat (JPL), Jay Parker (JPL), Maggi Glasscoe (JPL), John Rundle (UC-Davis), Harout Nazerian (JPL), Rami Al-Ghanmi (USC), Dennis McLeod (USC), Paul Jamason (Scripps), Ruey-Juin Chang (Scripps), Gerry Simila (CSUN)
Stop Talking Now, Champ
| Enterprise Approach | Web 2.0 Approach |
| JSR 168 Portlets | Gadgets, Widgets |
| Server-side integration and processing | AJAX, client-side integration and processing, JavaScript |
| SOAP | RSS, Atom, JSON |
| WSDL | REST (GET, PUT, DELETE, POST) |
| Portlet Containers | OpenSocial Containers (Orkut, LinkedIn, Shindig); Facebook; StartPages |
| User-Centric Gateways | Social Networking Portals |
| Workflow managers (Taverna, Kepler, etc.) | Mash-ups |
| Grid computing: Globus, Condor, etc. | Cloud computing: Amazon WS Suite, Xen Virtualization |
Q: What Is Web 2.0?
• A: IT for everyone.
• Too much "Enterprise" computing requires specialized knowledge and specialized tools.
   ◦ Result: specialization of tasks within teams like QuakeSim.
   ◦ Waste of talent: scientists can write code; they just don't have time to waste on difficult operating environments.
• What, then, is Web 2.0 in detail?
What Is a Gadget?
Simple gadgets for getting a Grid proxy credential and running remote commands. Both run on my own Web server.
Google Reader and GeoRSS
Google Maps and GeoRSS
Google Earth and KML
Cloud Computing and Gateways
• Cloud computing is the combination of virtualization (Xen, VMware, OpenVZ, …) with Web Services.
   ◦ Web Services control the life cycle of the virtual machines.
   ◦ The virtual machines are under the control of the application developer.
   ◦ UC-Davis can distribute the VC Service VM, for example.
   ◦ Examples include Amazon EC2, Eucalyptus (UCSB), and Virtual Workspace/Nimbus (UChicago).
• Data clouds focus on data virtualization.
   ◦ Google's BigTable, Facebook's Cassandra
   ◦ Apache's Hadoop and related projects (HBase, HDFS)
• Challenges
   ◦ MPI on clouds
   ◦ Mounting high-performance file systems
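"Web Services control the life cycle of the virtual machines" can be pictured as a small state machine that each service operation drives. The sketch below is hypothetical: its states are a simplification loosely modeled on Amazon EC2's instance states, and the method names stand in for service operations rather than any real cloud API.

```python
# Simplified, hypothetical VM states loosely modeled on EC2 instance states.
ALLOWED_TRANSITIONS = {
    "pending": {"running"},
    "running": {"stopping", "shutting-down"},
    "stopping": {"stopped"},
    "stopped": {"pending", "shutting-down"},
    "shutting-down": {"terminated"},
    "terminated": set(),
}

class VirtualMachine:
    """Tracks one VM's life cycle as a cloud service's operations would drive it."""

    def __init__(self, image_id):
        self.image_id = image_id
        self.state = "pending"
        self.history = ["pending"]

    def _move(self, new_state):
        # Reject any transition the life cycle does not allow
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)

    # Each public method stands for one hypothetical Web service operation.
    def boot(self):
        self._move("running")

    def stop(self):
        self._move("stopping")
        self._move("stopped")

    def start(self):
        self._move("pending")
        self._move("running")

    def terminate(self):
        self._move("shutting-down")
        self._move("terminated")
```

Because the developer controls the image and the service controls only these transitions, the same VM can be stopped, restarted and redistributed with its operating environment intact.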
Why Would You Want a Cloud?
• Application developers: reproducible operating environments
   ◦ Develop your application and be sure it will be deployed under the same conditions.
   ◦ Distribute reproducible results.
   ◦ Have control of your operating environment.
• Move applications closer to data.
   ◦ Data replication built in
   ◦ Assume vast amounts of cheap disk space
Simplex refines fault models from GPS displacements
UCSB’s Queue Prediction Service (QBETS)
Forecasts the time you will wait in the queue on various TeraGrid supercomputers. Inherited from the OGCE project.
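QBETS's forecasts are, as we understand it, built on non-parametric binomial bounds on quantiles of observed wait times. A minimal sketch of that idea, not QBETS code: given n past waits, the k-th smallest is at least the true q-quantile with probability P(Binomial(n, q) ≤ k − 1), so the bound is the smallest order statistic whose probability reaches the requested confidence. Function names and parameters are hypothetical.

```python
import math

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def quantile_upper_bound(waits, q=0.95, confidence=0.95):
    """Upper confidence bound on the q-quantile of queue wait times.

    Returns None when there are too few samples to reach the confidence level.
    """
    xs = sorted(waits)
    n = len(xs)
    for k in range(1, n + 1):
        # The k-th order statistic is >= the true q-quantile exactly when
        # at most k-1 samples fall below it; that count is Binomial(n, q).
        if binomial_cdf(k - 1, n, q) >= confidence:
            return xs[k - 1]
    return None
```

With 100 observed waits of 1 to 100 minutes, `quantile_upper_bound(range(1, 101))` returns 99: with 95% confidence, 95% of jobs wait no longer than that. With only 10 samples the same request returns None, which is why such predictors need a history of queue data.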
OGCE's XBaya Workflow Composer
Some Design Choices
• Build portals out of portlets (Java standard).
   ◦ Reuse capabilities from our Open Grid Computing Environments (OGCE) project, the REASoN GPS Explorer project, and many TeraGrid Science Gateways.
   ◦ Decorate with Google Maps, Yahoo UI gadgets, etc.
   ◦ Use JavaServer Faces to build individual component portlets.
   ◦ Build standalone tools, then convert them to portlets at the very end.
• Use simple Web Services for accessing codes and data.
   ◦ Keep it stateless …
• Use Condor-G and Globus job and file management services for interacting with high-performance computers.
   ◦ TeraGrid
• Favor Google Maps and Google Earth for their simplicity, interactivity, and open APIs.
   ◦ Generate KML and GeoRSS.
• Use an Apache Maven-based build and compile system, and SVN on SourceForge.
QuakeSim, Version 1 / Reason to Revise / QuakeSim, Version 2
• Version 1: Application Web Service for wrapping a.out executables; execution management service built with Apache Ant.
   ◦ Reason to revise: services too coupled to the portal; no simple WSDL programming interface; could not be used in workflow engines; not self-contained.
   ◦ Version 2: give each code a proper service interface; retain the Apache Ant core but extend it; keep the WSDL message structure simple (Strings, ints, doubles, URLs), wrapped as JavaBeans.
• Version 1: File Management Service
   ◦ Reason to revise: unnecessary; too coupled to Apache Axis 1.0.
   ◦ Version 2: HTTP GET, URLs.
• Version 1: Context Management Service, which manages persistent portal sessions using a recursive XML structure.
   ◦ Reason to revise: too slow (file system); didn't scale; XML databases didn't mature; Object-Relational Mappings (ORM) not efficient.
   ◦ Version 2: using db4o; all services communicate with easily XML-serializable JavaBeans.
• Version 1: OGC-compatible map and data services
   ◦ Reason to revise: too complicated; ORM is a big overhead.
   ◦ Version 2: Google Maps and KML-generating services.
• Version 1: serial job submission
   ◦ Reason to revise: NSF TeraGrid and Open Science Grid run full-time production Grids for HPC.
   ◦ Version 2: Condor-G/Birdbath-based job management extensions to the GeoFEST service.
Grid Job Submission
• Globus provides a universal queuing system interface.
   ◦ PBS, LoadLeveler, Sun Grid Engine, LSF
• We chose Condor-G as our job management software for submitting jobs to HPC queuing systems.
   ◦ University of Wisconsin
   ◦ Works with Globus, Matlab DCE, Unicore, etc.
• We co-locate Condor-G with our GeoFEST Web Service.
   ◦ Communication is through Birdbath, Condor's Web Service interface.
   ◦ So the GeoFEST service API is more or less the same, just now Grid-enabled.
   ◦ We also plan to release a general version of this service.
• The Condor command line and Birdbath use different names for job description parameters.
   ◦ A big Easter egg hunt to find this, but now we know.
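The slides do not list the mismatched names. As an illustration only: our understanding is that submit-file keywords such as `executable`, `arguments`, `output`, `error`, `log`, `initialdir` and `universe` correspond to the job ClassAd attributes `Cmd`, `Args`, `Out`, `Err`, `UserLog`, `Iwd` and `JobUniverse`, with the universe stored as a number. Treat this table as an assumption to check against the Condor manual for the version in use; the translation function and example paths are hypothetical.

```python
# Submit-description keyword -> job ClassAd attribute.
# ASSUMED mapping for illustration; verify against the Condor manual.
SUBMIT_TO_CLASSAD = {
    "executable": "Cmd",
    "arguments": "Args",
    "output": "Out",
    "error": "Err",
    "log": "UserLog",
    "initialdir": "Iwd",
    "universe": "JobUniverse",
}

# JobUniverse is an integer in the job ClassAd (assumed codes: 5 = vanilla, 9 = grid).
UNIVERSE_CODES = {"vanilla": 5, "grid": 9}

def to_classad_attributes(submit):
    """Translate command-line style submit keywords into ClassAd attribute names."""
    attrs = {}
    for key, value in submit.items():
        attr = SUBMIT_TO_CLASSAD.get(key.lower())
        if attr is None:
            raise KeyError(f"no mapping for submit keyword {key!r}")
        if attr == "JobUniverse":
            value = UNIVERSE_CODES[value.lower()]
        attrs[attr] = value
    return attrs

print(to_classad_attributes({
    "universe": "grid",
    "executable": "/opt/geofest/bin/GeoFEST",   # hypothetical path
    "output": "geofest.out",
}))
```

Keeping the translation in one table means a GeoFEST-style service can accept familiar command-line names from its clients while speaking ClassAd names to Birdbath.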
Portlet Summary
• RDAHMM: set up and run RDAHMM, query the Scripps GRWS GPS service, and maintain persistent user sessions.
• ST_Filter: similar to the RDAHMM portlet; ST_Filter takes much more input.
• Station Monitor: shows GPS stations on a Google Map and displays the last 10 minutes of data.
• Real Time RDAHMM: displays RDAHMM results for the last 10 minutes of GPS data on a Google Map.
• Daily RDAHMM: calculates and updates RDAHMM event classifications with daily updated GPS data from SOPAC's GRWS service (14-day delay, but uses all the data).
• GeoFEST: create input geometries, generate FE meshes, and run parallel FEM solvers.
• Disloc, Simplex: calculate surface displacements from fault models.
Security Concerns
They’ll see the Big Board!
QuakeSimDistributed Environment for Modeling Observations
Managing Real Time GPS Data
Slides from Galip Aydin
California Real Time Network
Continuous GPS (CGPS) stations are depicted as triangles, while the real-time stations are represented as circles. Image obtained from SOPAC GPS Explorer at http://sopac.ucsd.edu/projects/realtime
Message Format
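Network Data Rates

| Network | Time | RYO | ASCII | GML |
| CRTN GPS site positions (9 stations) | 1 second | 1.5 KB | 4.03 KB | 48.7 KB |
| CRTN GPS site positions (9 stations) | 1 hour | 5.31 MB | 14.18 MB | 171.31 MB |
| CRTN GPS site positions (9 stations) | 1 day | 127.44 MB | 340.38 MB | 4.01 GB |
| CRTN GPS site positions (9 stations) | 1 month | 3.8 GB | 9.97 GB | 123.3 GB |
| CRTN GPS site positions (9 stations) | 1 year | 45.8 GB | 119.67 GB | 1.41 TB |
| Entire SCIGN network (250 stations) | 1 year | 1.23 TB | 16.18 TB | 160 TB |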
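The CRTN rows scale roughly linearly with time: the daily figures are about 24 times the hourly ones (5.31 MB × 24 = 127.44 MB for RYO; 171.31 MB × 24 ≈ 4,111 MB, about 4.0 GB taking 1 GB = 1024 MB, for GML). The short sketch below redoes that arithmetic from the table's hourly values and also shows the cost of the encoding choice: GML messages are roughly 32 times the size of the binary RYO format. The helper names are ours.

```python
# Hourly volumes (MB) for the 9 CRTN stations, copied from the data-rate table.
HOURLY_MB = {"RYO": 5.31, "ASCII": 14.18, "GML": 171.31}

def daily_mb(hourly_mb):
    """Scale an hourly volume to one day (24 hours)."""
    return hourly_mb * 24

def size_ratio(fmt, baseline="RYO"):
    """How many times larger one message format is than the baseline format."""
    return HOURLY_MB[fmt] / HOURLY_MB[baseline]

for fmt, mb in HOURLY_MB.items():
    print(f"{fmt}: {daily_mb(mb):.2f} MB/day, {size_ratio(fmt):.1f}x RYO")
```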
How does one manage all the data generated by the 85 stations? How can you get just the data you want? Note this is fundamentally different from traditional request/response style Web Services.
Processing Real-Time GPS Streams
[Figure: raw RYO data from the Scripps RTD Server (RYO ports 7010, 7011, 7012, one per GPS network) is published by ryo2nb to the NB Server. Filters then chain through NaradaBrokering topics: ryo2ascii turns /SOPAC/GPS/CRTN01/RYO into /SOPAC/GPS/CRTN01/ASCII, ascii2pos produces /SOPAC/GPS/CRTN01/POS, ascii2gml produces GML, and the Single Station Displacement Filter publishes /SOPAC/GPS/CRTN01/DSME. Further filters include the Station Health Filter and the Single Station RDAHMM Filter.]
A Complete Sensor Message Processing Path, including a data analysis application.
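The filter chain is a publish/subscribe pipeline: each filter subscribes to one topic, transforms each message, and republishes it on the next topic. A minimal sketch, assuming a toy in-memory broker in place of NaradaBrokering (this is not its API) and hypothetical converter functions in place of ryo2ascii and ascii2pos; the topic names follow the figure.

```python
from collections import defaultdict

class Broker:
    """Toy in-memory topic broker standing in for NaradaBrokering (not its API)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in list(self.subscribers[topic]):
            callback(message)

def make_filter(broker, in_topic, out_topic, transform):
    """Subscribe to in_topic, transform each message, republish on out_topic."""
    broker.subscribe(in_topic, lambda msg: broker.publish(out_topic, transform(msg)))

BASE = "/SOPAC/GPS/CRTN01"

def ryo_to_ascii(record):
    """Hypothetical stand-in for ryo2ascii: tuple -> space-separated text line."""
    return " ".join(str(v) for v in record)

def ascii_to_pos(line):
    """Hypothetical stand-in for ascii2pos: text line -> (station, x, y, z)."""
    station, x, y, z = line.split()
    return station, float(x), float(y), float(z)

broker = Broker()
make_filter(broker, BASE + "/RYO", BASE + "/ASCII", ryo_to_ascii)
make_filter(broker, BASE + "/ASCII", BASE + "/POS", ascii_to_pos)

received = []
broker.subscribe(BASE + "/POS", received.append)   # e.g. a displacement filter
broker.publish(BASE + "/RYO", ("SITE1", 1.0, 2.0, 3.0))
print(received)  # [('SITE1', 1.0, 2.0, 3.0)]
```

Because each stage only knows its input and output topics, a new analysis filter can be attached to any topic without touching the upstream converters.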
Application Integration with Real-Time Filters
[Figure: two filter-to-application pipelines.]
• RDAHMM Filter: records real-time positions for 10 minutes and invokes the RDAHMM application, which determines state changes in the signal. A Graph Plotter application creates a visual representation of the RDAHMM output.
• Station Monitor Filter: records real-time positions for 10 minutes and calculates position changes. An XYZ Graph Plotter application creates a visual representation of the positions.
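Both filters hold a rolling 10-minute window of positions. A minimal sketch of such a window, assuming samples arrive as (t, x, y, z) with t in seconds; the class and method names are hypothetical. The Station Monitor case uses `position_change`, while an RDAHMM-style filter would hand `values()` to the analysis code.

```python
from collections import deque

WINDOW_SECONDS = 600  # 10 minutes

class PositionWindow:
    """Keeps the most recent window_seconds of (t, x, y, z) position samples."""

    def __init__(self, window_seconds=WINDOW_SECONDS):
        self.window_seconds = window_seconds
        self.samples = deque()

    def add(self, t, x, y, z):
        """Append a sample and evict samples older than the window (relative to t)."""
        self.samples.append((t, x, y, z))
        while t - self.samples[0][0] > self.window_seconds:
            self.samples.popleft()

    def position_change(self):
        """(dx, dy, dz) between the oldest and newest samples in the window."""
        if not self.samples:
            return (0.0, 0.0, 0.0)
        _, x0, y0, z0 = self.samples[0]
        _, x1, y1, z1 = self.samples[-1]
        return (x1 - x0, y1 - y0, z1 - z0)

    def values(self):
        """Window contents, e.g. to pass to an analysis code."""
        return list(self.samples)
```

Feeding one sample per second for 20 minutes leaves the last 601 samples (t = 600 through 1200) in the window, and a deque keeps both appends and evictions O(1).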
2. Multiple Publishers Test
[Figure: RYO Publishers 1 through n publish to an NB Server on Topics 1A, 1B, 2 and n; a RYO-to-ASCII converter and a simple filter consume the stream. Plot "Multiple Publishers Test": transfer time and standard deviation, Time (ms) from 0 to 6, versus time of the day from 0:00 to 22:30.]
• We add more GPS networks by running more publishers.
• The results show that 1000 publishers can be supported with no performance loss. This is an operating system
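The plot summarizes transfer time and its standard deviation across the day. As a sketch of how such summaries could be computed, assuming hypothetical (sent_ms, received_ms) timestamp pairs on a shared clock and hourly buckets (the actual test's bucket size and clock handling are not stated):

```python
import statistics
from collections import defaultdict

MS_PER_HOUR = 3_600_000

def hourly_summary(samples):
    """Group transfer times by hour of day of the send time.

    samples: iterable of (sent_ms, received_ms) pairs, milliseconds since midnight
    of day 0 on a common clock (an assumption).
    Returns {hour: (mean_ms, stdev_ms)}; stdev is 0.0 for single-sample hours.
    """
    buckets = defaultdict(list)
    for sent, received in samples:
        hour = (sent // MS_PER_HOUR) % 24
        buckets[hour].append(received - sent)
    summary = {}
    for hour, times in sorted(buckets.items()):
        spread = statistics.stdev(times) if len(times) > 1 else 0.0
        summary[hour] = (statistics.mean(times), spread)
    return summary

print(hourly_summary([(0, 2), (10, 13), (20, 24), (MS_PER_HOUR, MS_PER_HOUR + 5)]))
```

A flat mean with a small standard deviation over all 24 hours is what "no performance loss" looks like in this kind of plot.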
4. Multiple Brokers Test
[Figure: an RYO Publisher feeds a RYO-to-ASCII converter and NB Server 1 (Topics 1A, 1B); NB Server 1 is linked to NB Server 2; Simple Filters 1 to 750 subscribe at NB Server 1 and Simple Filters 751 to 1500 at NB Server 2.]
• NaradaBrokering allows creation of broker networks.
• We create a two-broker network.
   ◦ Messages published to the first broker can be received from the second broker.
   ◦ We take timings on each broker.
• We connect 750 clients to each broker and run for 24 hours. We chose 750 clients to stay well below the saturation limit.
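The two-broker setup can be sketched with a toy broker that forwards each published message to its peers. This is not the NaradaBrokering API: the class is hypothetical, and it tags messages with an id so that a message is delivered once per broker and forwarding stops when it comes back around a link.

```python
import itertools
from collections import defaultdict

class LinkedBroker:
    """Toy broker that forwards published messages to peer brokers."""

    _ids = itertools.count()  # shared message-id counter

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.subscribers = defaultdict(list)
        self.seen = set()

    def link(self, other):
        """Connect two brokers in both directions."""
        self.peers.append(other)
        other.peers.append(self)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message, msg_id=None):
        if msg_id is None:
            msg_id = next(LinkedBroker._ids)
        if msg_id in self.seen:
            return  # already delivered here; prevents forwarding loops
        self.seen.add(msg_id)
        for callback in self.subscribers[topic]:
            callback(message)
        for peer in self.peers:
            peer.publish(topic, message, msg_id)

b1, b2 = LinkedBroker("NB Server 1"), LinkedBroker("NB Server 2")
b1.link(b2)
got1, got2 = [], []
for _ in range(750):                      # 750 clients per broker, as in the test
    b1.subscribe("Topic 1B", got1.append)
    b2.subscribe("Topic 1B", got2.append)
b1.publish("Topic 1B", "position record")
print(len(got1), len(got2))  # 750 750
```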