PolicyInformaticsSept13-07 - Indiana University Bloomington

Computational Infrastructure
for Policy Informatics
Policy Informatics in an Interdependent World
Workshop
Washington DC September 13 2007
Geoffrey Fox
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
http://grids.ucs.indiana.edu/ptliupages/presentations/
[email protected]
http://www.infomall.org
1
e-moreorlessanything







‘e-Science is about global collaboration in key areas of science,
and the next generation of infrastructure that will enable it.’ from
its inventor John Taylor Director General of Research Councils
UK, Office of Science and Technology
e-Science is about developing tools and technologies that allow
scientists to do ‘faster, better or different’ research
Similarly e-Business captures an emerging view of corporations as
dynamic virtual organizations linking employees, customers and
stakeholders across the world.
This generalizes to e-moreorlessanything including presumably e-Policyinformatics
A deluge of data of unprecedented and inevitable size must be
managed and understood.
People (see Web 2.0), computers, data and instruments must be
linked.
On demand assignment of experts, computers, networks and
storage resources must be supported
2
Role of Cyberinfrastructure







Cyberinfrastructure is infrastructure that supports
distributed science (e-Science)– data, people, computers
Exploits Internet technology (Web2.0) adding (via Grid
technology) management, security, supercomputers etc.
It has two aspects: parallel – low latency (microseconds)
between nodes and distributed – highish latency (milliseconds)
between nodes
Parallel needed to get high performance on individual large
simulations, data analysis etc.; must decompose problem
Distributed aspect integrates already distinct components –
especially natural for data
Cyberinfrastructure is in general a distributed collection of
parallel systems
Cyberinfrastructure is made of services (originally Web
services) that are “just” programs or data sources packaged
for distributed access
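The latency distinction on this slide (microseconds between parallel nodes, milliseconds between distributed nodes) can be made concrete with a small back-of-envelope calculation. This is only a sketch: the step count and the per-step compute time are hypothetical numbers chosen for illustration, and the model assumes every step waits for exactly one message exchange.

```python
def run_time(steps, compute_per_step_s, latency_s):
    """Total time when every step waits for one message exchange."""
    return steps * (compute_per_step_s + latency_s)

STEPS = 10_000      # hypothetical number of tightly coupled steps
COMPUTE = 100e-6    # hypothetical 100 microseconds of work per step

parallel = run_time(STEPS, COMPUTE, 1e-6)      # about 1 microsecond latency
distributed = run_time(STEPS, COMPUTE, 1e-3)   # about 1 millisecond latency

print(f"parallel:    {parallel:.2f} s")
print(f"distributed: {distributed:.2f} s")
print(f"slowdown:    {distributed / parallel:.1f}x")
```

Under these assumptions the millisecond-latency run is dominated by waiting rather than computing, which is why a single decomposed simulation wants the parallel aspect while already distinct components tolerate the distributed one.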
3
Structure of Cyberinfrastructure




Distributed software systems are being “revolutionized” by
developments from e-commerce, e-Science and the consumer
Internet. There is rapid progress in technology families termed
“Web services”, “Grids” and “Web 2.0”
The emerging distributed system picture is of distributed services
with advertised interfaces but opaque implementations
communicating by streams of messages over a variety of protocols
• Complete systems are built by combining either services or
predefined/pre-existing collections of services together to
achieve new capabilities
As well as Internet/Communication revolutions (distributed
systems), multicore chips will likely be hugely important (parallel
systems)
Industry not academia is leading innovation in these technologies
4
Policy Informatics Infrastructure


The Party Line approach is clear – one creates a
Cyberinfrastructure consisting of distributed services accessed
by portals/gadgets/gateways/RSS feeds
Services include:
• “original data”
• Transformations or filters implementing DIKW (Data Information
Knowledge Wisdom) pipeline
• Final “Decision Support” step converting wisdom into action
• Generic services such as security, profiles etc.


Some filters could correspond to large simulations
Infrastructure will be set up as a System of Systems (Grids of
Grids)
• Services and/or Grids just accept some form of DIKW and produce
another form of DIKW
• “Original data” has no explicit input; just output
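The service picture on this slide can be sketched in a few lines of Python. This is a minimal illustration, not an implementation from the talk: the function names (`original_data`, `to_information` and so on), the sample readings and the outlier threshold are all hypothetical. Each filter accepts one form of DIKW and produces another, and the "original data" source has no input.

```python
from statistics import mean

def original_data():
    """'Original data' service: no explicit input, just output (hypothetical readings)."""
    return [12.0, 15.0, 14.0, 30.0, 13.0]

def to_information(data):
    """Filter: data -> information (summary statistics)."""
    return {"mean": mean(data), "max": max(data)}

def to_knowledge(info):
    """Filter: information -> knowledge (is there an outlier?)."""
    return {"outlier": info["max"] > 1.5 * info["mean"], **info}

def to_wisdom(knowledge):
    """Filter: knowledge -> wisdom (a recommendation)."""
    return "investigate" if knowledge["outlier"] else "no action"

def decision_support(wisdom):
    """Final step: convert wisdom into an action record."""
    return {"action": wisdom}

def run_pipeline(source, filters):
    """Feed the source output through each filter in turn."""
    result = source()
    for f in filters:
        result = f(result)
    return result

decision = run_pipeline(
    original_data, [to_information, to_knowledge, to_wisdom, decision_support]
)
print(decision)
```

In a real deployment each function would be a remote service behind an advertised interface; plain functions are used here only to keep the sketch self-contained.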
5
[Figure: Grid of Grids. Raw Data → Data → Information → Knowledge → Wisdom → Decisions, flowing through Sensor Services (SS), Filter Services (FS), Other Services (OS) and MetaData (MD) services, together with a Database, other Services and other Grids]
6
Information Management/Processing




Diagram describes e-Science, Military Command and Control
and perhaps Policy Informatics
Data → Information → Knowledge → Wisdom transformation
(SOAP or just RSS) messages transport information expressed in
a semantically rich fashion between sources and services that
enhance and transform information so that complete system
provides
• Semantic Web technologies like RDF and OWL might help us
to have rich expressivity but they might be too complicated
We are meant to build application specific information
management/transformation systems for each domain
• Each domain has specific services/standards (for API’s and Information)
and will use generic services (like R for datamining) and standards (RDF,
WSDL)
• What is PIML Policy Informatics Markup Language?
• Standards made before consensus or not observant of technology progress
are dubious (cf. HLA in simulation or many grid standards)
7
Too much Computing?

Historically one has tried to increase computing capabilities by
• Optimizing performance of codes
• Exploiting all possible CPU’s such as Graphics co-processors and “idle
cycles”
• Making central computers available such as NSF/DoE/DoD
supercomputer networks

Next Crisis in technology area will be the opposite problem –
commodity chips will be 32-128 way parallel in 5 years' time and
we currently have no idea how to use them – especially on clients
• Only 2 releases of standard software (e.g. Office) in this time span

Gaming and Generalized decision support (data mining) are two
obvious ways of using these cycles
• Intel RMS analysis
• Note even cell phones will be multicore

“Too much data” is matched to “Too much computing” but the
implications involved are rather different
8
Intel’s Projection
9
RMS: Recognition Mining Synthesis
Recognition (“What is …?”): Model
Mining (“Is it …?”): Find a model instance
Synthesis (“What if …?”): Create a model instance
Today
• Recognition: Model-less
• Mining: Real-time streaming and transactions on static, structured datasets
• Synthesis: Very limited realism
Tomorrow
• Recognition: Model-based multimodal recognition
• Mining: Real-time analytics on dynamic, unstructured, multimodal datasets
• Synthesis: Photo-realism and physics-based animation
Pradeep K. Dubey, [email protected]
10
Recognition: What is a tumor?
Mining: Is there a tumor here?
Synthesis: What if the tumor progresses?
It is all about dealing efficiently with complex multimodal datasets
Images courtesy: http://splweb.bwh.harvard.edu:8000/pages/images_movies.html
Pradeep K. Dubey, [email protected]
11
Intel’s Application Stack
12
What should we do?

There will be high quality parallel data mining algorithms
• Speech Recognition, Text and multimedia search and browsers
• New generation of desktop aides
• What are the synergies between “Personal aides in an information rich world” (future of
PC?) and Policy Informatics?


What filters (data mining) does policy informatics need?
As computing is free, focus on identifying information/knowledge/wisdom
needed (there is probably too much data but not so much wisdom in DIKW
pipeline)
• We should use supercomputer/computer services but Information services more
important and less “controversial”



Identify standards for data and data-mining API’s
Set up distributed Policy Informatics Services
Use Web 2.0 (as it makes things easier) not current Grids (which make things
harder)
• Build a “Programmable Policy Informatics Web”
• Emphasize Simplicity
• Is “Secrecy” important and in fact viable?

Should we care just about “original data” or also about the whole pipeline
DIKW?
13
Web 2.0 Mashups
and APIs


http://www.programmableweb.com/apis has (Sept 12
2007) 2312 Mashups and 511 Web 2.0 APIs, with
GoogleMaps the most often used in Mashups
Mashups are called
workflow in Grid arena
14
The List of Web
2.0 API’s





Each site has API and
its features
Divided into broad
categories
Only a few used a lot
(49 API’s used in 10
or more mashups)
RSS feed of new APIs
Amazon S3 growing
in popularity
15
Spare Slides
16
Grid Service Philosophy I


Services receive data in SOAP messages, manipulate it
and produce transformed data as further messages
Knowledge is created from information by services
• Information is created from data by services




Semantic Grid comes from building metadata rich
systems of services
Meta-data is carried in SOAP messages
The Grid enhances Web services with semantically rich
system and application specific management
One must exploit and work around the different
approaches to meta-data (state) and their manipulation
in Web Services
17
Grid Service Philosophy II





There is a horde of support services supplying security,
collaboration, database access, user interfaces
The support services are either associated with system or
application where the former are WS-* and GS-* which
implicitly or explicitly define many support services
There are generalized filter services which are applications that
accept messages and produce new messages with some data
derived from that in input
• Simulations (including PDE’s and reactive systems)
• Data-mining
• Transformations
• Agents
• Reasoning
• Decision making Tools
are all termed filters here
Agent Systems are a special case of Grids
Peer-to-peer systems can be built as a Grid with particular
discovery and messaging strategies
18
Grid Service Philosophy III





Filters can be a workflow which means they are
“just collections of other simpler services”
Grids are distributed systems that accept
distributed messages and produce distributed result
messages
A service or a workflow is a special case of a Grid
A collection of services on a multi-core chip is a
Grid
Sensors or Instruments are “managed” by services;
they may accept non SOAP control messages and
produce data as messages (that are not usually
SOAP)
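The claim that a workflow is itself a filter, "just a collection of other simpler services", can be illustrated with a small composite sketch. The class names `Service` and `Workflow`, the dict-based message format and the three example transforms are assumptions made for this illustration; they are not taken from any Grid toolkit.

```python
class Service:
    """A filter service: accepts a message (a dict) and returns a new message."""
    def __init__(self, name, transform):
        self.name = name
        self.transform = transform

    def process(self, message):
        # Build a new message and record which service produced it.
        out = dict(message)
        out["value"] = self.transform(message["value"])
        out["path"] = message.get("path", []) + [self.name]
        return out

class Workflow:
    """A collection of simpler services exposing the same interface as one service."""
    def __init__(self, name, parts):
        self.name = name
        self.parts = parts

    def process(self, message):
        for part in self.parts:
            message = part.process(message)
        return message

clean = Service("clean", lambda v: [x for x in v if x is not None])
total = Service("total", sum)
scale = Service("scale", lambda v: v / 10)

inner = Workflow("summarize", [clean, total])   # a workflow of services
outer = Workflow("report", [inner, scale])      # a workflow containing a workflow

result = outer.process({"value": [1, None, 2, 3]})
print(result)
```

Because `Workflow` and `Service` share one `process` method, workflows nest inside workflows, which mirrors the "Grids of Grids" idea used earlier in the talk.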
19
Virtual Observatory Astronomy Grid
Integrate Experiments
[Figure panels: Radio, Far-Infrared, Visible, Dust Map, Visible + X-ray, Galaxy Density Map]
20
Service or Web service Approach


One uses GML, CML etc. to define the data in a system and one
uses services to capture “methods” or “programs”
In eScience, important services fall in three classes
• Simulations
• Data access, storage, federation, discovery
• Filters for data mining and manipulation





Services use something like WSDL (Web Services Description
Language) to define interoperable interfaces (see OPAL talk!)
WSDL establishes a “contract” independent of implementation
between two services or a service and a client
Services should be loosely coupled which normally means they
are coarse grain
Services will be composed (linked together) by mashups
(typically scripts) or workflow (often XML – BPEL)
Software Engineering and Interoperability/Standards are closely
related
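The idea of a contract that is independent of implementation can be shown by analogy in Python. This sketch uses `typing.Protocol` as a stand-in for a WSDL interface; it is not WSDL itself, and the names `SimulationService`, `LocalSimulation` and `ClosedFormSimulation` are hypothetical.

```python
from typing import Protocol

class SimulationService(Protocol):
    """The 'contract': what a client may rely on, with no implementation."""
    def run(self, steps: int) -> float: ...

class LocalSimulation:
    """One implementation: computes the answer with a loop."""
    def run(self, steps: int) -> float:
        return float(sum(range(steps)))

class ClosedFormSimulation:
    """Another implementation: same contract, different internals."""
    def run(self, steps: int) -> float:
        return steps * (steps - 1) / 2

def client(service: SimulationService, steps: int) -> float:
    # The client depends only on the contract, never on a concrete class.
    return service.run(steps)

results = [client(s, 100) for s in (LocalSimulation(), ClosedFormSimulation())]
print(results)
```

Either implementation can be swapped in without touching the client, which is the loose coupling the slide asks for.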
21
Philosophy of Web Service Grids





Much of Distributed Computing was built by natural
extensions of computing models developed for sequential
machines
This leads to the distributed object (DO) model represented
by Java and CORBA
• RPC (Remote Procedure Call) or RMI (Remote Method
Invocation) for Java
Key people think this is not a good idea as it scales badly
and ties distributed entities together too tightly
• Distributed Objects Replaced by Services
Note CORBA was considered too complicated in both
organization and proposed infrastructure
• and Java was considered as “tightly coupled to Sun”
• So there were other reasons to discard
Thus replace distributed objects by services connected by
“one-way” messages and not by request-response messages
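The difference between one-way messages and request-response can be sketched with two in-process queues. This is an illustrative assumption, not the mechanism of any particular Web service stack: the queue names, the message fields and the squaring "service" are all made up for the example.

```python
import queue
import threading

inbox = queue.Queue()    # messages sent to the service
outbox = queue.Queue()   # messages the service emits, for whoever listens

def service():
    """Consume messages until a None sentinel arrives; emit new messages."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        outbox.put({"id": msg["id"], "result": msg["x"] ** 2})

worker = threading.Thread(target=service)
worker.start()

# One-way sends: the sender does not block waiting for a reply to each message.
for i, x in enumerate([2, 3, 4]):
    inbox.put({"id": i, "x": x})
inbox.put(None)  # sentinel: tell the service to stop

worker.join()
results = []
while not outbox.empty():
    results.append(outbox.get())
print(results)
```

The sender and the service share only the message format, so the results could just as well be consumed by a third party rather than returned to the caller.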
22
Web services

Web Services build loosely-coupled, distributed applications
(wrapping existing codes and databases) based on the SOA
(service oriented architecture) principles.
Web Services interact by exchanging messages in SOAP format
The contracts for the message exchanges that implement those
interactions are described via WSDL interfaces.
[Figure: a Web Service places service logic (BPEL, Java, .NET) and message processing (SOAP and WSDL) in front of its resources: Humans, Databases, Programs, Computational resources, Devices]
SOAP messages:
<env:Envelope>
<env:Header>
...
</env:Header>
<env:Body>
...
</env:Body>
</env:Envelope>
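The envelope structure on this slide can be produced and checked with Python's standard library. This is a minimal sketch: it assumes the SOAP 1.2 envelope namespace, leaves the header empty, and uses a hypothetical `reading` element as the application payload.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
ET.register_namespace("env", SOAP_ENV)

def build_envelope(payload_tag, payload_text):
    """Return a SOAP envelope string with an empty Header and one Body element."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    ET.SubElement(envelope, f"{{{SOAP_ENV}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    payload = ET.SubElement(body, payload_tag)  # hypothetical application payload
    payload.text = payload_text
    return ET.tostring(envelope, encoding="unicode")

xml_text = build_envelope("reading", "42")
print(xml_text)

# Parse it back to confirm the message is well-formed.
parsed = ET.fromstring(xml_text)
body = parsed.find(f"{{{SOAP_ENV}}}Body")
print(body[0].tag, body[0].text)
```

A production service would normally let a SOAP toolkit generate this from the WSDL; building it by hand here only shows what travels on the wire.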
23
A typical Web Service


In principle, services can be in any language (Fortran .. Java ..
Perl .. Python) and the interfaces can be method calls, Java RMI
Messages, CGI Web invocations, or totally compiled away (inlining)
The simplest implementations involve XML messages (SOAP) and
programs written in net friendly languages like Java and Python
[Figure: a Portal Service and a Security service linked through WSDL interfaces to Web Services labeled Payment (Credit Card), Catalog, and Warehouse (Shipping control)]
24
The Grid and Web Service Institutional Hierarchy
4: Application or Community of Interest (CoI) Specific Services such as “Map Services”, “Run BLAST” or “Simulate a Missile”
Examples: XBML, XTCE, VOTABLE, CML, CellML
3: Generally Useful Services and Features (OGSA and other GGF, W3C) such as “Collaborate”, “Access a Database” or “Submit a Job”
Examples: OGSA GS-* and some WS-* (GGF/W3C/….), XGSP (Collab)
2: System Services and Features (WS-* from OASIS/W3C/Industry)
Handlers like WS-RM, Security, UDDI Registry
1: Container and Run Time (Hosting) Environment (Apache Axis, .NET etc.)
Must set standards to get interoperability
25
The Ten areas covered by the 60 core WS-* Specifications
WS-* Specification Area: Examples
1: Core Service Model: XML, WSDL, SOAP
2: Service Internet: WS-Addressing, WS-MessageDelivery; Reliable Messaging WSRM; Efficient Messaging MTOM
3: Notification: WS-Notification, WS-Eventing (Publish-Subscribe)
4: Workflow and Transactions: BPEL, WS-Choreography, WS-Coordination
5: Security: WS-Security, WS-Trust, WS-Federation, SAML, WS-SecureConversation
6: Service Discovery: UDDI, WS-Discovery
7: System Metadata and State: WSRF, WS-MetadataExchange, WS-Context
8: Management: WSDM, WS-Management, WS-Transfer
9: Policy and Agreements: WS-Policy, WS-Agreement
10: Portals and User Interfaces: WSRP (Remote Portlets)
26
Activities in Global Grid Forum Working Groups
GGF Area: GS-* and OGSA Standards Activities
1: Architecture: High Level Resource/Service Naming (level 2 of slide 6), Integrated Grid Architecture
2: Applications: Software Interfaces to Grid, Grid Remote Procedure Call, Checkpointing and Recovery, Interoperability to Job Submittal services, Information Retrieval
3: Compute: Job Submission, Basic Execution Services, Service Level Agreements for Resource use and reservation, Distributed Scheduling
4: Data: Database and File Grid access, Grid FTP, Storage Management, Data replication, Binary data specification and interface, High-level publish/subscribe, Transaction management
5: Infrastructure: Network measurements, Role of IPv6 and high performance networking, Data transport
6: Management: Resource/Service configuration, deployment and lifetime, Usage records and access, Grid economy model
7: Security: Authorization, P2P and Firewall Issues, Trusted Computing
27
Two-level Programming I
• The Web Service (Grid) paradigm implicitly assumes a
two-level Programming Model
• We make a Service (same as a “distributed object” or
“computer program” running on a remote computer) using
conventional technologies
– C++ Java or Fortran Monte Carlo module
– Data streaming from a sensor or Satellite
– Specialized (JDBC) database access
• Such services accept and produce data from users, files and
databases
[Figure: Service connected to Data]
• The Grid is built by coordinating such services assuming
we have solved the problem of programming the service
28
Two-level Programming II




The Grid is discussing the composition of distributed
services with the runtime interfaces to Grid as
opposed to UNIX pipes/data streams
[Figure: Service1, Service2, Service3 and Service4 composed together]
Familiar from use of UNIX Shell, Perl or Python
scripts to produce real applications from core programs
Such interpretative environments are the single
processor analog of Grid Programming
Some projects like GrADS from Rice University are
looking at integration between service and composition
levels but dominant effort looks at each level separately
29
Grid Workflow Data Assimilation in Earth Science

Grid services triggered by abnormal events and controlled by workflow process real
time data from radar and high resolution simulations for tornado forecasts
[Figure: Typical graphical interface to service composition]
30