Agent Technology for Data Analysis

Tony Johnson - SLAC
21st October 1998
Workshop on Scientific Data Management Problems and Solutions
Motivation and Disclaimer
• Many efforts use supernetworks to link supercomputers and transfer huge datasets
• Few efforts make effective use of existing real-world networks
  • Allow university users to access remote data
• I am not an agent technology expert
  • We do have a prototype application
  • I’m hoping some of you are!
Outline
• Overview of problem
  • Network constraints
• Why agent technology?
• Why Java?
  • For agent technology?
  • For data analysis?
• Analysis Studio application
• More information
What problem are we trying to solve?
• Widely distributed users who need access to petabyte datasets
  • Many university users with mediocre networks
  • Most universities have no way to handle petabyte data samples
• Physicists need unfettered access to data
  • Would like effective use of the desktop machine
  • Canned analysis won’t do
• CPU/data access requirements are infinite
Faster networks?
• Faster networks will not solve our problems anytime soon
• No matter how fast networks are, they are always saturated
• As networks become saturated, latency becomes high
Why Agent Technology?
• By encapsulating the user’s analysis code as a “user agent” we can send it to the data, so wide-area network bandwidth requirements become trivial
  • Analysis modules are typically small (tens of kBytes or less)
  • HEP output is typically histograms (binned) and scatterplots, both of which are small
• Possible to do GUI-based analysis of large datasets over a 28.8 kbit/s modem connection
• Give the user the impression that the analysis is running locally
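The idea on this slide can be illustrated with a minimal sketch. It is not the Java Analysis Studio API: `UserAgent`, `Histogram1D`, `EnergyAgent` and `AgentSketch` are hypothetical names invented for illustration, and the "events" are random numbers standing in for real data. The point is only that the event loop runs next to the data and the object that travels back is a small binned histogram.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Random;

/** Hypothetical fixed-bin histogram: the only object returned to the user. */
class Histogram1D implements Serializable {
    private static final long serialVersionUID = 1L;
    final double min, max;
    final long[] bins;

    Histogram1D(int nBins, double min, double max) {
        this.min = min;
        this.max = max;
        this.bins = new long[nBins];
    }

    void fill(double x) {
        if (x < min || x >= max) return; // ignore out-of-range values
        int i = (int) ((x - min) / (max - min) * bins.length);
        bins[Math.min(i, bins.length - 1)]++; // clamp against rounding at the upper edge
    }

    long entries() {
        long n = 0;
        for (long b : bins) n += b;
        return n;
    }
}

/** Hypothetical agent contract: user code that is sent to the data. */
interface UserAgent extends Serializable {
    void processEvent(double[] event);
    Histogram1D result();
}

/** Example agent: histograms the first quantity of every event. */
class EnergyAgent implements UserAgent {
    private static final long serialVersionUID = 1L;
    private final Histogram1D hist = new Histogram1D(50, 0.0, 100.0);

    public void processEvent(double[] event) { hist.fill(event[0]); }
    public Histogram1D result() { return hist; }
}

class AgentSketch {
    /** Would run on the data server: loops over events that never leave it. */
    static Histogram1D runNearData(UserAgent agent, double[][] events) {
        for (double[] e : events) agent.processEvent(e);
        return agent.result();
    }

    /** Size in bytes of the object when written with Java serialization. */
    static int serializedSize(Serializable o) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size();
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        double[][] events = new double[100_000][1]; // stand-in for a large dataset
        for (double[] e : events) e[0] = rng.nextDouble() * 100.0;

        Histogram1D h = runNearData(new EnergyAgent(), events);
        System.out.println("events processed: " + h.entries());
        System.out.println("bytes sent back:  " + serializedSize(h));
    }
}
```

The size of the returned histogram depends on the number of bins, not on the number of events processed, which is why a slow link can be enough for the result.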

Why Java for Agent Technology?
• Java produces machine-independent bytecode
• Trivial to move from one machine to another
• Network handling and Remote Method Invocation (RMI, cf. CORBA) built in
• (Remote) dynamic loading built in
• Multithreaded servers easy to write
• Built-in Java “sandbox” can be used to restrict agents
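As a rough illustration of the points above, the following sketch shows a serializable task being "moved" and then executed by a server object. The names `AnalysisTask`, `DataServer`, `LocalDataServer`, `MeanTask` and `MobilitySketch` are hypothetical and not taken from the prototype. To stay runnable without a network, the sketch only imitates the transfer by serializing and deserializing the task in memory, which is how RMI passes serializable arguments by value; exporting the server, remote class loading and sandboxing are not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

/** Hypothetical task shipped to the server; Serializable so it can travel. */
interface AnalysisTask extends Serializable {
    double run(double[] data);
}

/** Hypothetical RMI contract that a data server could export. */
interface DataServer extends Remote {
    double execute(AnalysisTask task) throws RemoteException;
}

/** In-process stand-in for the server (a real one would be exported over RMI). */
class LocalDataServer implements DataServer {
    private final double[] data;

    LocalDataServer(double[] data) { this.data = data.clone(); }

    @Override
    public double execute(AnalysisTask task) { return task.run(data); }
}

/** Example task: mean of the values held by the server. */
class MeanTask implements AnalysisTask {
    private static final long serialVersionUID = 1L;

    @Override
    public double run(double[] data) {
        double sum = 0.0;
        for (double d : data) sum += d;
        return data.length == 0 ? Double.NaN : sum / data.length;
    }
}

class MobilitySketch {
    /** Serialize then deserialize the task, imitating a trip over the wire. */
    static AnalysisTask shipOverWire(AnalysisTask task) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(task);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                return (AnalysisTask) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("task could not be transferred", e);
        }
    }

    public static void main(String[] args) {
        LocalDataServer server = new LocalDataServer(new double[] {1.0, 2.0, 3.0, 6.0});
        AnalysisTask arrived = shipOverWire(new MeanTask());
        System.out.println("mean computed at the server: " + server.execute(arrived));
    }
}
```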
Why Java for Data Analysis?
• Easy to learn yet very powerful, fully OO language
• Very wide industry support
• Just-in-time compilation = fast
• Dynamic optimization = faster
• Very fast code, load, test, fix cycle
• Built-in debugger, including remote debugging
• Numerical functionality good
  – Java Grande Forum enhancing numerical support
“Java Analysis Studio”
[Figure: architecture diagram. Panels: Local Data, Remote Data, Distributed Data. Components: Desktop Client, DIM, Network, Data Server (each with a DIM), Data Controller.]
Demo
Network Performance
[Figure labels: View (Histogram), View Adapter, Model (Data Source), Model Adapter]
• Caching
• Prefetching of data
• Data clumping, streaming
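The slide does not say how caching and prefetching are implemented, so the following is only one possible sketch: a small least-recently-used cache of data chunks that also fetches the next chunk whenever one is read. `ChunkCache` and its fields are hypothetical names, the prefetch is done synchronously for simplicity (a real client would likely do it on a background thread), and data clumping and streaming are not covered.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

/** Hypothetical client-side cache of remote data chunks with sequential prefetch. */
class ChunkCache {
    private final int chunkCount;                   // total chunks available remotely
    private final IntFunction<double[]> remoteFetch; // stand-in for a network call
    private final Map<Integer, double[]> lru;

    int remoteCalls = 0; // how many times the "network" was used
    int userMisses = 0;  // how many reads had to wait for a fetch

    ChunkCache(final int maxChunks, int chunkCount, IntFunction<double[]> remoteFetch) {
        this.chunkCount = chunkCount;
        this.remoteFetch = remoteFetch;
        // accessOrder = true makes iteration order least-recently-used first
        this.lru = new LinkedHashMap<Integer, double[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, double[]> eldest) {
                return size() > maxChunks; // evict once the cache is over capacity
            }
        };
    }

    /** Returns the chunk, fetching it if needed, then prefetches its successor. */
    double[] get(int chunk) {
        double[] data = lru.get(chunk);
        if (data == null) {
            userMisses++;
            data = load(chunk);
        }
        int next = chunk + 1;
        if (next < chunkCount && !lru.containsKey(next)) {
            load(next); // prefetch so a sequential reader finds it cached
        }
        return data;
    }

    private double[] load(int chunk) {
        remoteCalls++;
        double[] data = remoteFetch.apply(chunk);
        lru.put(chunk, data);
        return data;
    }
}

class ChunkCacheDemo {
    public static void main(String[] args) {
        // Each fake chunk just carries its own index.
        ChunkCache cache = new ChunkCache(4, 10, i -> new double[] { i });
        for (int i = 0; i < 10; i++) cache.get(i);
        System.out.println("reads that waited: " + cache.userMisses);
        System.out.println("remote fetches:    " + cache.remoteCalls);
    }
}
```

For a sequential scan only the first read waits on the network in this sketch; every later chunk has already been fetched by the time it is requested. The capacity should be at least 2 so that a prefetch does not evict the chunk just read.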
More Information
• Java
  • http://java.sun.com
• Java Analysis Studio
  • http://www-sldnt.slac.stanford.edu/jas
• Java Grande Forum (numeric computing in Java)
  • http://www.javagrande.org/
• Desktop access to remote resources
  • http://www-fp.mcs.anl.gov/~gregor/datorr/