Agent Technology for Data Analysis
Tony Johnson - SLAC
21st October 1998
WORKSHOP ON SCIENTIFIC DATA MANAGEMENT PROBLEMS AND SOLUTIONS
Motivation and Disclaimer
Many efforts use supernetworks to link supercomputers and transfer huge datasets
Few efforts make effective use of existing real-world networks
• Allow university users to access remote data
I am not an agent technology expert
• We do have a prototype application
• I’m hoping some of you are!
Outline
Overview of problem
• Network constraints
Why agent technology?
Why Java?
• For Agent Technology?
• For Data Analysis?
Analysis Studio application
More information
What problem are we trying to solve?
Widely distributed users who need access to petabyte datasets
• Many university users with mediocre networks
• Most universities have no way to handle petabyte data samples
Physicist needs unfettered access to data
• Would like effective use of desktop machine
• Canned analysis won't do
CPU/data access requirements are infinite
Faster networks?
• Faster networks will not solve our problems anytime soon
• No matter how fast networks are, they are always saturated
• As networks become saturated, latency becomes high
Why Agent Technology?
By encapsulating the user's analysis code as a “user agent” we can send it to the data, so wide-area network bandwidth requirements become trivial
• Analysis modules are typically small (tens of kilobytes or less)
• HEP output is typically binned histograms and scatterplots, both of which are small
Possible to do GUI-based analysis of large datasets over a 28.8 kbit/s modem connection
Give the user the impression that the analysis is running locally.
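A minimal sketch of the "send the agent to the data" idea, not the Java Analysis Studio implementation. The names `UserAgent`, `HistogramAgent` and `runOnServer` are hypothetical, and the network hop is simulated in-process with Java object serialization. It assumes the agent class is already available on the server; a real mobile-agent system would also have to ship the bytecode. Only the serialized agent travels one way and only the binned histogram travels back, while the dataset never leaves the server.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

public class UserAgentSketch {

    /** Hypothetical contract for analysis code that travels to the data. */
    interface UserAgent extends Serializable {
        void processEvent(double value);
        int[] result();
    }

    /** A small agent that fills a fixed-width binned histogram. */
    static class HistogramAgent implements UserAgent {
        private static final long serialVersionUID = 1L;
        private final double min;
        private final double max;
        private final int[] bins;

        HistogramAgent(int nBins, double min, double max) {
            this.bins = new int[nBins];
            this.min = min;
            this.max = max;
        }

        @Override
        public void processEvent(double value) {
            if (value < min || value >= max) {
                return; // out-of-range values are ignored
            }
            int i = (int) ((value - min) / (max - min) * bins.length);
            bins[Math.min(bins.length - 1, i)]++; // clamp guards against rounding
        }

        @Override
        public int[] result() {
            return bins.clone();
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(o);
        }
        return buf.toByteArray();
    }

    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    /** Stand-in for the data server: it receives agent bytes and returns result bytes. */
    static byte[] runOnServer(byte[] agentBytes, double[] dataset)
            throws IOException, ClassNotFoundException {
        UserAgent agent = (UserAgent) deserialize(agentBytes);
        for (double value : dataset) {
            agent.processEvent(value); // the data is read where it lives
        }
        return serialize(agent.result());
    }

    public static void main(String[] args) throws Exception {
        double[] dataset = {0.5, 1.5, 1.7, 2.2, 3.9, 7.0}; // toy stand-in for a large dataset
        byte[] request = serialize(new HistogramAgent(4, 0.0, 4.0));
        byte[] reply = runOnServer(request, dataset);
        int[] bins = (int[]) deserialize(reply);
        System.out.println(Arrays.toString(bins)); // prints [1, 2, 1, 1]
        System.out.println("bytes sent: " + request.length + ", bytes returned: " + reply.length);
    }
}
```

For scale, and ignoring protocol overhead, a 10 kB agent is 80,000 bits, which takes a little under 3 seconds to send at 28.8 kbit/s. The traffic in both directions is independent of the size of the dataset.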
Why Java for Agent Technology?
Java produces machine-independent bytecode
• Trivial to move from one machine to another
• Network handling and Remote Method Invocation (RMI, cf. CORBA) built in
• (Remote) dynamic loading built in
• Multithreaded servers easy to write
• Built-in Java “sandbox” can be used to restrict agents
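Two of the points above, dynamic loading and multithreaded servers, can be illustrated with a short sketch. It is an assumption-laden toy rather than the Analysis Studio server: the `Agent` interface and the two agent classes are hypothetical, and the classes are loaded by name from the local classpath with `Class.forName`. A remote variant would obtain the bytecode over the network, for example through a `URLClassLoader`, and would need to restrict what loaded agents may do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class DynamicLoadSketch {

    /** Hypothetical agent contract: consume a dataset, return one number. */
    interface Agent {
        long analyze(int[] dataset);
    }

    static class SumAgent implements Agent {
        @Override
        public long analyze(int[] dataset) {
            long sum = 0;
            for (int v : dataset) {
                sum += v;
            }
            return sum;
        }
    }

    static class CountEvenAgent implements Agent {
        @Override
        public long analyze(int[] dataset) {
            long count = 0;
            for (int v : dataset) {
                if (v % 2 == 0) {
                    count++;
                }
            }
            return count;
        }
    }

    /** Loads an agent class by name at run time and instantiates it. */
    static Agent load(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        return (Agent) cls.getDeclaredConstructor().newInstance();
    }

    /** Runs each named agent on its own worker thread and collects results in order. */
    static List<Long> serve(List<String> agentClassNames, int[] dataset) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Long>> pending = new ArrayList<>();
            for (String name : agentClassNames) {
                Agent agent = load(name);
                Callable<Long> task = () -> agent.analyze(dataset);
                pending.add(pool.submit(task));
            }
            List<Long> results = new ArrayList<>();
            for (Future<Long> future : pending) {
                results.add(future.get()); // blocks until that agent has finished
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> names = Arrays.asList(
                SumAgent.class.getName(), CountEvenAgent.class.getName());
        System.out.println(serve(names, new int[]{1, 2, 3, 4})); // prints [10, 2]
    }
}
```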
Why Java for Data Analysis?
Easy to learn yet very powerful, fully object-oriented language
• Very wide industry support
• Just In Time compilation = Fast
• Dynamic Optimization = Faster
• Very fast code, load, test, fix cycle
• Built-in debugger, including remote debugging
• Numerical functionality good
– Java Grande Forum enhancing numerical support
“Java Analysis Studio”
[Architecture diagram; labels recovered from the slide:]
• Local Data: Desktop Client, DIM
• Remote Data: Network, Data Server, DIM
• Distributed Data: Data Controller, multiple Data Servers, each with a DIM
Demo
Network Performance
[Diagram; labels recovered from the slide: View (Histogram), View Adapter, Model (Data Source), Model Adapter]
Caching
Prefetching of data
Data clumping, streaming
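Caching and prefetching can be sketched as a client-side cache of fixed-size data chunks. This is an illustration only: the class name, the chunk-number interface and the "fetch the next chunk too" policy are all assumptions, not the behaviour of Java Analysis Studio. The cache keeps the most recently used chunks (LRU eviction via `LinkedHashMap` in access order), and each request also loads the following chunk so that a sequential scan finds it already local. Prefetching is done synchronously here for brevity; a real client would do it in the background to hide latency.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

public class PrefetchCacheSketch {
    private final IntFunction<double[]> remote; // stand-in for a slow network fetch
    private final Map<Integer, double[]> cache;
    private int remoteFetches = 0;

    PrefetchCacheSketch(final int capacity, IntFunction<double[]> remote) {
        this.remote = remote;
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<Integer, double[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, double[]> eldest) {
                return size() > capacity; // evict the least recently used chunk
            }
        };
    }

    /** Returns the requested chunk and prefetches the one after it. */
    double[] get(int chunk) {
        double[] data = load(chunk);
        load(chunk + 1); // prefetch: assume the next access is sequential
        return data;
    }

    private double[] load(int chunk) {
        double[] data = cache.get(chunk);
        if (data == null) {
            data = remote.apply(chunk);
            remoteFetches++;
            cache.put(chunk, data);
        }
        return data;
    }

    int remoteFetches() {
        return remoteFetches;
    }

    public static void main(String[] args) {
        // Hypothetical remote source: chunk i holds the single value i.
        PrefetchCacheSketch client = new PrefetchCacheSketch(4, i -> new double[]{i});
        client.get(0); // fetches chunks 0 and 1
        client.get(1); // chunk 1 is cached, fetches chunk 2
        client.get(0); // chunks 0 and 1 are both cached
        System.out.println(client.remoteFetches()); // prints 3
    }
}
```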
More Information
Java
• http://java.sun.com
Java Analysis Studio
• http://www-sldnt.slac.stanford.edu/jas
Java Grande Forum (numeric computing in Java)
• http://www.javagrande.org/
Desktop access to remote resources
• http://www-fp.mcs.anl.gov/~gregor/datorr/