ADaM System Architecture
Rahul Ramachandran, Sara Graves and
Ken Keiser
Mathematical Challenges in Scientific Data Mining
IPAM January 14-18, 2002
Information Technology and Systems Center
University of Alabama in Huntsville
[email protected]
ITSC/University of Alabama in
Huntsville
Talk Overview
Mining System Requirements
ADaM System Architecture
ADaM Plan Builder
Research directions
Mining System Requirements:
When, Where and Who
WHEN
•Real Time
•On-Ingest
•On-Demand
•Repeatedly
WHERE
•User Workstation
•Data Archive Center
•Data Mining Center
WHO
•Casual Users
•Domain Experts
•Mining Experts
Data Mining
Algorithm Development and
Mining (ADaM) System
•ADaM system developed under a NASA research grant
•The system provides knowledge discovery, feature detection and content-based searching for data values, as well as for metadata.
•It contains over 120 different operations that can be performed on the input data stream.
•Operations range from specialized, dataset-specific atmospheric science algorithms to digital image processing techniques, processing modules for automatic pattern recognition, machine perception, neural networks and genetic algorithms.
ADaM Features
•Handles science data set variability
-Multiple resolution/multiple scales
-Variability of formats
-Granularity of data
-Includes spatial/temporal dimensions
•Allows addition of new algorithms
•Allows scientists to select and sequence different operations
ADaM Engine Architecture
[Diagram] Data flow: Data -> Translated Data -> Preprocessed Data -> Patterns/Models -> Results
Processing stages: Input, Preprocessing, Analysis, Output

Input
•HDF, HDF-EOS, GIF, PIP-2
•SSM/I Pathfinder, SSM/I TDR, SSM/I NESDIS Lvl 1B, SSM/I MSFC Brightness Temp
•US Rain, Landsat, ASCII Grass, Vectors (ASCII Text), Intergraph Raster
•Others...

Preprocessing
•Selection and Sampling: Subsetting, Subsampling, Select by Value, Coincidence Search
•Grid Manipulation: Grid Creation, Bin Aggregate, Bin Select, Grid Aggregate, Grid Select, Find Holes
•Image Processing: Cropping, Inversion, Thresholding
•Others...

Analysis
•Clustering: K Means, Isodata, Maximum
•Pattern Recognition: Bayes Classifier, Min. Dist. Classifier
•Image Analysis: Boundary Detection, Cooccurrence Matrix, Dilation and Erosion, Histogram Operations, Polygon Circumscript, Spatial Filtering, Texture Operations
•Genetic Algorithms
•Neural Networks
•Others...

Output
•GIF Images, TIFF Images
•HDF-EOS, HDF Raster Images, HDF SDS
•Polygons (ASCII, DXF)
•SSM/I MSFC Brightness Temp
•Others...
ADaM Architecture
[Diagram] ADaM Mining Environment
•Distributed Clients: Web-based, Workstation-based, Other Systems, Analysis/Vis Tools, Event/Relationship Search System
•Data Mining Server: Common Client API, Knowledge Base, Mining Engine (ADaM)
•Mining Engine (ADaM): Input Modules, Analysis Modules, Output Modules
•Mining Results
•Data Stores
ADaM Miner Engine
•Manages the processing of data through a series of specified operations
•Loads input, processing and output modules dynamically as needed at execution time
•Allows for the addition of newly developed modules without the need to rebuild the engine
•Interprets a mining plan script that provides the details about the specified operations and the order in which they should be executed
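
As a rough illustration of this design, the sketch below resolves each operation by name only when the plan reaches it, so a new module can be added without changing the engine code. The plan format (a list of `module:function` names with optional parameters), the helpers `load_operation` and `run_plan`, and the use of Python standard-library functions as stand-ins for ADaM modules are all assumptions made for illustration. The actual engine loads shared libraries and interprets its own mining plan script, whose format is not shown in these slides.

```python
import importlib


def load_operation(name):
    """Resolve 'module:function' at execution time.

    Stand-in for loading a shared library on demand; not ADaM's mechanism.
    """
    module_name, func_name = name.split(":")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


def run_plan(plan, data):
    """Apply each step of the plan, in order, to the running data."""
    for step in plan:
        operation = load_operation(step["op"])
        data = operation(data, **step.get("params", {}))
    return data


# Hypothetical plan: sort the values in descending order, then take the median.
plan = [
    {"op": "builtins:sorted", "params": {"reverse": True}},
    {"op": "statistics:median"},
]

result = run_plan(plan, [3.0, 1.0, 4.0, 1.0, 5.0])
print(result)
```

Because operations are looked up by name, adding a step to the plan is enough to bring a new module into the pipeline. The engine function itself stays unchanged.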
ADaM Miner Database
•Stores the names, locations and related metadata of the input data sets available on the server
•Includes information about users, jobs, mining results, and other related information
•Simple relational database
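
To make the idea of a simple relational database concrete, here is a minimal sketch using SQLite. The table and column names, the sample rows, and the helper `open_miner_db` are hypothetical. The slides do not describe the actual Miner Database schema or which database product it uses.

```python
import sqlite3

# Hypothetical schema: one table each for data sets, users and jobs.
SCHEMA = """
CREATE TABLE datasets (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    location TEXT NOT NULL,
    input_filter TEXT
);
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE jobs (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    dataset_id INTEGER NOT NULL REFERENCES datasets(id),
    status TEXT NOT NULL DEFAULT 'queued',
    result_location TEXT
);
"""


def open_miner_db(path=":memory:"):
    """Create the illustrative tables in a new SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = open_miner_db()
conn.execute(
    "INSERT INTO datasets (name, location, input_filter) VALUES (?, ?, ?)",
    ("example.hdf", "/data/example", "HDF"),
)
conn.execute("INSERT INTO users (name) VALUES (?)", ("analyst",))
conn.execute("INSERT INTO jobs (user_id, dataset_id) VALUES (?, ?)", (1, 1))

# List each job with its user, data set and current status.
rows = conn.execute(
    """
    SELECT users.name, datasets.name, jobs.status
    FROM jobs
    JOIN users ON users.id = jobs.user_id
    JOIN datasets ON datasets.id = jobs.dataset_id
    """
).fetchall()
print(rows)
```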
ADaM Daemon and Scheduler
•Scheduler
-Examines the list of jobs to be executed on the server and determines which job or jobs to execute at any given time
-Queues the requests and executes them sequentially.
•Daemon
-Handles all network communications with the mining system
-Is configured to listen on a specific port for any socket communications
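
The scheduler's queue-then-execute behavior can be sketched as a first-in, first-out queue. The `Scheduler` class, its method names and the job identifiers below are hypothetical. The sketch covers only sequential execution and leaves out the daemon's socket handling.

```python
from collections import deque


class Scheduler:
    """Queues job requests and runs them one at a time, in arrival order."""

    def __init__(self):
        self._queue = deque()
        self.completed = []

    def submit(self, job_id, task):
        """Add a job (an identifier plus a callable) to the end of the queue."""
        self._queue.append((job_id, task))

    def run_pending(self):
        """Execute queued jobs sequentially until the queue is empty."""
        while self._queue:
            job_id, task = self._queue.popleft()
            self.completed.append((job_id, task()))


scheduler = Scheduler()
scheduler.submit("job-1", lambda: "mined A")
scheduler.submit("job-2", lambda: "mined B")
scheduler.run_pending()
print(scheduler.completed)
```

In a fuller system, the daemon would call something like `submit` each time a request arrives on its port, while the scheduler decides when to run the pending work.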
ADaM Input/Operation Filters
•Input/Output Filters are data readers and writers
•Operations are the algorithms
•Each of the operations and (input/output) filters is implemented as a shared library
•New modules may be added to the system without recompiling or relinking.
•All operations/filters either produce or operate on a data collection, which provides a common format for representing scientific data.
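
The role of the data collection as a common format can be illustrated with a small sketch: an input filter produces it, an operation transforms it, and an output filter writes it. The `DataCollection` class, its fields and the three functions are hypothetical stand-ins, not ADaM's actual data structures or module interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class DataCollection:
    """Hypothetical common container passed between filters and operations."""
    values: list
    metadata: dict = field(default_factory=dict)


def ascii_input_filter(text):
    """Input filter: parse whitespace-separated numbers into a DataCollection."""
    values = [float(token) for token in text.split()]
    return DataCollection(values, {"source_format": "ascii"})


def threshold_operation(collection, cutoff):
    """Operation: mark each value 1 if it is at or above the cutoff, else 0."""
    flags = [1 if v >= cutoff else 0 for v in collection.values]
    return DataCollection(flags, {**collection.metadata, "cutoff": cutoff})


def ascii_output_filter(collection):
    """Output filter: write the values back out as one line of text."""
    return " ".join(str(v) for v in collection.values)


collection = ascii_input_filter("0.2 0.9 0.5 0.7")
output = ascii_output_filter(threshold_operation(collection, 0.5))
print(output)
```

Since every stage accepts or returns the same container, any input filter can be paired with any operation. This is what lets the slides treat readers, algorithms and writers as interchangeable modules.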
General Mining Steps
•Select data files to be mined
•“Check-In” the data files into the Miner Database
•Write a “Mining Plan” consisting of a sequence of input filters and operations
•Execute the Mining Plan using the engine
•Check and save results
•Iterate
What is Check-In?
•The process of encoding information such as the names, locations and related metadata for input data sets available on the server
•Creates a complex data hierarchy in the database
ADaM Plan Builder: Check-In
Two Modes of Operation
-General: requires only minimal information
-Advanced: requires more detailed information and allows the user to set up a structured database
•Path to the data file
•Data file name
•Input Filter associated with the data file
•Load an XML file containing existing Check-In specifications
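
The slides do not show the Check-In XML format, so the sketch below assumes a hypothetical layout built from the three fields named on this slide: path, data file name and input filter. The element names, sample values and the helper `load_checkin` are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical Check-In specification; the real ADaM format is not shown here.
CHECKIN_XML = """
<checkin>
  <dataset>
    <path>/data/example</path>
    <filename>example_granule.hdf</filename>
    <inputfilter>HDF</inputfilter>
  </dataset>
</checkin>
"""


def load_checkin(xml_text):
    """Return one dict per <dataset> element in the specification."""
    root = ET.fromstring(xml_text.strip())
    return [
        {
            "path": ds.findtext("path"),
            "filename": ds.findtext("filename"),
            "input_filter": ds.findtext("inputfilter"),
        }
        for ds in root.findall("dataset")
    ]


entries = load_checkin(CHECKIN_XML)
print(entries)
```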
ADaM Plan Builder – Layout
Operation Menu contains the list
of operations one can select
Input Menu contains the list
of Input Filters one can select
Plan Menu allows one to:
•Select a new plan
•Load existing plan
•Check-In data
ADaM Plan Builder – Layout
Panel where Mining Plan can be
viewed either as text or a tree
ADaM Plan Builder – Layout
Description about the Operation/Input Filter
can be viewed in this panel
ADaM Plan Builder – Layout
All the parameters needed for
the Operation are described here
ADaM Plan Builder – Layout
Sample values for Operation’s
parameters are shown in this panel
ADaM Plan Builder – Layout
Allows user to select the operation
and add it to the Mining Plan
Go Mine the data
using the Mining
Plan
Research Directions
•Generic Data Reader for ADaM
-ESML – Earth Science Markup Language
•Programmer's Guide for ADaM
•Distributed Mining
•Grid Mining
-Successful implementation and testing of the ADaM system on the NASA Information Power Grid
•Mining Onboard the Spacecraft
-The EnVironmEnt for On-Board Processing (EVE) system
ADaM Information
•Web site: datamining.itsc.uah.edu
•ADaM Lite beta version download
•Contact: [email protected]