GSD Realtime Systems Product Strategy or “What the TReK is going


UAH GRIDS Center
Middleware Testing
Sandra Redman
Information Technology and Systems Center
and
Information Technology Research Center
National Space Science and Technology Center
256-961-7806
[email protected]
[email protected]
www.itsc.uah.edu
“…drowning in data but starving for knowledge”
Data glut affects
business, medicine,
military, science
How do we leverage
data to make
BETTER decisions???
Information
User
Community
Data Mining
• Automated discovery of patterns and anomalies from vast
observational data sets
• Derived knowledge for decision making, predictions
and disaster response
http://datamining.itsc.uah.edu
Mining Environment:
When, Where, Who and Why?
WHEN
•Real Time
•On-Ingest
•On-Demand
•Repeatedly
WHERE
•User Workstation
•Data Mining Center
•GRID
WHO
•End Users
•Domain Experts
•Mining Experts
Data Mining
WHY
•Event
•Relationship
•Association
•Corroboration
•Collaboration
Algorithm Development and Mining
(ADaM)
ADaM consists of:
• a data mining engine
• an extensible set of
core functional
applications to aid
researchers in defining
and performing data
mining operations on
spatial data sets
• data mining modules
as Open Grid Services
Architecture (OGSA)
services
ADaM Engine Architecture
[Diagram labels: Data; Translated Data; Preprocessed Data; Patterns/Models; Results; Processing]
Input
HDF
HDF-EOS
GIF PIP-2
SSM/I Pathfinder
SSM/I TDR
SSM/I NESDIS Lvl 1B
SSM/I MSFC Brightness Temp
US Rain
Landsat
ASCII Grass
Vectors (ASCII Text)
Intergraph Raster
Others...
Preprocessing
Selection and Sampling
Subsetting
Subsampling
Select by Value
Coincidence Search
Grid Manipulation
Grid Creation
Bin Aggregate
Bin Select
Grid Aggregate
Grid Select
Find Holes
Image Processing
Cropping
Inversion
Thresholding
Others...
Analysis
Clustering
K Means
Isodata
Maximum
Pattern Recognition
Bayes Classifier
Min. Dist. Classifier
Image Analysis
Boundary Detection
Concurrence Matrix
Dilation and Erosion
Histogram Operations
Polygon Circumscript
Spatial Filtering
Texture Operations
Genetic Algorithms
Neural Networks
Others...
Output
GIF Images
HDF-EOS
HDF Raster Images
HDF SDS
Polygons (ASCII, DXF)
SSM/I MSFC Brightness Temp
TIFF Images
Others...
NMI Testing
ADaM Feature Subset Selection application
chosen for testing
• Supervised pattern classification is a technique
important in many domains
• Feature subset selection is the process of choosing a
subset of the features from the original data set in order
to maximize classifier accuracy
• It is used to improve both the runtime and accuracy of a
supervised pattern classifier by eliminating noisy,
irrelevant or redundant attributes or features from the
data set
• Both processor- and data-intensive
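A minimal sketch of the feature subset selection idea described above, not the ADaM implementation. It assumes an exhaustive search over all non-empty subsets, scored by the leave-one-out accuracy of a minimum-distance (nearest-centroid) classifier. The toy data and all function names are hypothetical. With n features the search visits 2^n - 1 subsets, which illustrates why the task is processor-intensive.

```python
from itertools import combinations


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def loo_accuracy(samples, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier
    that only looks at the feature indices in `features`."""
    correct = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Group every other sample by class, keeping only the chosen features.
        by_class = {}
        for j, (xj, yj) in enumerate(zip(samples, labels)):
            if j != i:
                by_class.setdefault(yj, []).append([xj[f] for f in features])
        xi = [x[f] for f in features]
        predicted = min(
            by_class,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(xi, centroid(by_class[c]))),
        )
        correct += predicted == y
    return correct / len(samples)


def best_subset(samples, labels):
    """Try every non-empty feature subset; return (accuracy, subset)."""
    n_features = len(samples[0])
    best = (-1.0, ())
    for k in range(1, n_features + 1):
        for subset in combinations(range(n_features), k):
            acc = loo_accuracy(samples, labels, subset)
            if acc > best[0]:  # strict '>' keeps the smallest subset on ties
                best = (acc, subset)
    return best


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
SAMPLES = [
    [0.0, 5.0], [0.2, -4.0], [0.1, 9.0], [0.3, -7.0],
    [1.0, -6.0], [1.2, 8.0], [0.9, -5.0], [1.1, 4.0],
]
LABELS = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    print(best_subset(SAMPLES, LABELS))
```

On this toy data the noisy feature is dropped and only feature 0 is kept, which is the effect the slide describes: fewer features, no loss of accuracy.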
Parallel Version of Cloud Extraction
• GOES images can be
used to recognize
cumulus cloud fields
• Cumulus clouds are
small and do not show
up well in 4km
resolution IR channels
• Detection of cumulus
cloud fields in GOES
can be accomplished
by using texture
features or edge
detectors
[Diagram: GOES Image -> Laplacian Filter, Sobel Horizontal Filter, Sobel Vertical Filter -> Energy Computation (one per branch) -> Classifier -> Cloud Image]
[Image panels: GOES Image; Cumulus Cloud Mask]
• Three edge detection filters are used together to detect cumulus clouds,
an approach that lends itself to implementation on a parallel cluster
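The filter-then-energy pipeline above can be sketched as follows. This is an illustration under stated assumptions, not the presented application: "energy" is taken here to mean the mean squared filter response over interior pixels, the threshold rule standing in for the classifier is hypothetical, and a thread pool only demonstrates that the three branches are independent. On a cluster, each branch could run on its own node.

```python
from concurrent.futures import ThreadPoolExecutor

# Standard 3x3 edge-detection kernels.
LAPLACIAN = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
SOBEL_HORIZONTAL = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
SOBEL_VERTICAL = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]


def filter_energy(image, kernel):
    """Apply a 3x3 kernel at every interior pixel and return the
    mean squared response (the 'energy' assumed in this sketch)."""
    rows, cols = len(image), len(image[0])
    total = 0.0
    count = 0
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            response = sum(
                kernel[i][j] * image[r + i - 1][c + j - 1]
                for i in range(3)
                for j in range(3)
            )
            total += response * response
            count += 1
    return total / count


def texture_features(image):
    """Compute the three filter energies as independent tasks."""
    kernels = (LAPLACIAN, SOBEL_HORIZONTAL, SOBEL_VERTICAL)
    with ThreadPoolExecutor(max_workers=3) as pool:
        return list(pool.map(lambda k: filter_energy(image, k), kernels))


def looks_like_cumulus(image, threshold=1.0):
    """Hypothetical stand-in for the classifier: flag the image
    when any filter energy exceeds the threshold."""
    return max(texture_features(image)) > threshold


if __name__ == "__main__":
    flat = [[5] * 5 for _ in range(5)]                                # no texture
    checker = [[(r + c) % 2 for c in range(5)] for r in range(5)]     # fine texture
    print(texture_features(flat), texture_features(checker))
```

Because each branch reads the same input and writes its own result, no coordination is needed until the classifier step, which is what makes the approach a natural fit for a parallel cluster.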
Feature Subset Selection Application
• Application ported to Linux
• Support Vector Machine downloaded and tested
• Developed application scripts
• Modified for Globus environment by writing
simple Globus RSL file
• Ran each combination of tools on a different node
on the grid
• Globus used to execute jobs on different machines
• Experimented with both real and synthetic data
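As a rough illustration of running each combination of tools on a different node, the sketch below generates one RSL job description per combination and assigns the jobs to nodes round-robin. The tool names, node names and executable path are hypothetical placeholders, not the ones used in the tests. Only basic RSL attributes (executable, arguments, stdout, count) are used, and submitting the jobs through a Globus client is left out.

```python
from itertools import product


def rsl_for(executable, arguments, stdout):
    """Build a simple one-process RSL job description string."""
    args = " ".join(f'"{a}"' for a in arguments)
    return (
        f'&(executable="{executable}")'
        f"(arguments={args})"
        f'(stdout="{stdout}")'
        "(count=1)"
    )


def plan_jobs(searches, classifiers, nodes, executable):
    """Pair every (search, classifier) combination with a node,
    cycling through the node list if there are more jobs than nodes."""
    jobs = []
    for i, (search, clf) in enumerate(product(searches, classifiers)):
        node = nodes[i % len(nodes)]
        rsl = rsl_for(executable, [search, clf], f"{search}_{clf}.out")
        jobs.append((node, rsl))
    return jobs


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    for node, rsl in plan_jobs(
        ["forward", "backward"],
        ["svm", "bayes"],
        ["nodeA", "nodeB", "nodeC", "nodeD"],
        "/path/to/fss",
    ):
        print(node, rsl)
```

Keeping each combination in its own job description means a failed run can be resubmitted on its own, and adding a tool only adds rows to the plan.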
[Diagram labels: Satellite Data (x2); Archive X; Archive Y; Grid Mining Agent (x3); Grid Processor (x3)]
Components used in testing
• Globus Toolkit - the “de facto standard,” an open source software
toolkit and libraries for building grid applications; resource management,
scheduling, information services, file transfer
• GSI-OpenSSH - a modified version of OpenSSH that adds support for GSI
authentication, providing a single sign-on remote login capability for the
Grid
• Condor-G - workload management system for compute-intensive jobs;
job queueing mechanism, scheduling policy, priority scheme, resource
monitoring, and resource management
• Network Weather Service - monitors and dynamically forecasts the
performance that various network and computational resources can deliver
over a given time interval
Some Lessons Learned
• Component testing went well
  - Globus documentation improved, installation
    trouble-free, application port straightforward
  - No problems encountered during Condor-G
    installation, but found a problem with Condor-G under
    Red Hat Linux 7.3 when using nss_ldap. Developer
    provided workaround: start the name service caching
    daemon (nscd)
  - GSI-OpenSSH installed, but Kerberos authentication
    did not work since Linux was not compiled with the PAM
    option (undocumented)
  - Network Weather Service installed, but learned we
    are more interested in MDS
Some Lessons Learned
• NMI Testbed Process working well
  - Answers found through NMI discussion lists
    from developers and other users
• Have to “sell” the grid concept to developers,
administrators, users
• NMI work proven helpful in other grid work
  - TeraGrid
  - ISS Space-based Science Operations Grid
  - CEOS Grid
• Need more components!