Geosensor06 - Computer Science and Engineering


Ohio State University
Department of Computer
Science and Engineering
Cyberinfrastructure for Coastal
Forecasting and Change Analysis
Gagan Agrawal
Hakan Ferhatosmanoglu
Xutong Niu
Ron Li
Keith Bedford
The Ohio State University
Context
• New Award from Office of Cyberinfrastructure (OCI)
– Under Cyberinfrastructure for Environmental Observatories
Program
– September 2006 – August 2009, total amount $1,400,000
• Involves 2 Computer Scientists and 2 Environmental
Scientists
– G. Agrawal (PI): Grid Middleware
– H. Ferhatosmanoglu: Databases
– K. Bedford: Great Lakes Now/Forecasting
– R. Li: Coastal Erosion Analysis
Coastal Forecasting and Change
Detection (Lake Erie)
3
Ohio State University
Department of Computer
Science and Engineering
Project Premise
• Limitation of Current Environmental
Observation Systems
– Tightly coupled systems
» No reuse of algorithms
» Very hard to experiment with new algorithms
– Closely tied to existing resources
• Our claim
– Emerging trends towards web-services and grid-services can help
Challenges
• Existing Grid Middleware Systems have not
considered
– Processing of Streaming Data
– Data Integration Issues
• The applications involved need techniques for
multi-modal data fusion, query planning, and
data mining
– Need to implement them as grid or web-services
Proposed Infrastructure and
Collaboration
Application Details: Great Lakes
Now/ForeCasting
• GLOS: Great Lakes Observing System
– Co-designer/project manager: K. Bedford, a co-PI on
this project
– Collaboration with NOAA
• Limitations: Hard-wired
– Cannot incorporate new streams or algorithms
• Create an Implementation using our
Middleware for Streaming Data
Application Details: Coastal Erosion
Prediction and Analysis
• Focus: Erosion along Lake Erie Shore
– Serious problem
– Substantial Economic Losses
• Prediction requires data from
– Variety of Satellites
– In-situ sensors
– Historical Records
• Challenges
– Analyzing distributed data
– Data Integration/Fusion
Middleware Developed at Ohio State
• Automatic Data Virtualization Framework
– Enabling processing and integration of data in low-level formats
• GATES (Grid-based AdapTive Execution on
Streams)
– Processing of distributed data streams
• FREERIDE-G (FRamework for Rapid
Implementation of Datamining Engines in
Grid)
– Supporting scalable data analysis on remote data
Automatic Data Virtualization:
Motivation
• Access mechanisms for remote repositories
– Complex low-level formats make accessing and
processing of data difficult
– Main desired functionality
» Ability to select, download, and process a subset of data
• Sensor Data
– Again, low level data
– Need to convert formats
– Need a flexible architecture
Data Virtualization
[Figure: a Data Service, a Data Virtualization (an abstract view of data), and the underlying dataset]
By Global Grid Forum’s DAIS working group:
• A Data Virtualization describes an abstract view of data.
• A Data Service implements the mechanism to access and process data
through the Data Virtualization
Our Approach: Automatic Data
Virtualization
• Automatically create data services
– A new application of compiler technology
• A metadata descriptor describes the layout of
data on a repository
• An abstract view is exposed to the users
• Two implementations:
– Relational/SQL-based
– XML/XQuery-based
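The idea of deriving a data service from a metadata descriptor can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the descriptor format, the field names (time, lat, lon, wave_height), and the helpers make_reader and select are hypothetical and are not the framework's actual descriptor language or generated code. The sketch only shows how a layout description can drive access to low-level binary records while the user works with a relational-style view.

```python
import struct

# Hypothetical metadata descriptor: field names plus struct format codes
# describing fixed-size, little-endian binary records in a repository file.
DESCRIPTOR = {
    "byte_order": "<",
    "fields": [("time", "i"), ("lat", "f"), ("lon", "f"), ("wave_height", "f")],
}

def make_reader(descriptor):
    """Build a record reader from the descriptor (stand-in for a generated data service)."""
    fmt = descriptor["byte_order"] + "".join(code for _, code in descriptor["fields"])
    names = [name for name, _ in descriptor["fields"]]
    record = struct.Struct(fmt)

    def read(raw):
        # Decode every fixed-size record in the buffer into a name -> value row.
        for values in record.iter_unpack(raw):
            yield dict(zip(names, values))

    return read

def select(rows, columns, where):
    """Relational-style view: SELECT columns FROM rows WHERE where(row)."""
    for row in rows:
        if where(row):
            yield tuple(row[c] for c in columns)

# Usage: three synthetic records standing in for a low-level data file.
raw = b"".join(
    struct.pack("<ifff", t, 41.5, -82.0, h) for t, h in [(0, 0.5), (1, 1.5), (2, 2.5)]
)
read = make_reader(DESCRIPTOR)
result = list(select(read(raw), ["time", "wave_height"], lambda r: r["wave_height"] > 1.0))
print(result)
```

In this sketch the user never touches byte offsets; changing the file layout would only require changing the descriptor.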
Streaming Data Model
• Continuous data arrival and processing
• Emerging model for data processing
– Sources that produce data continuously: sensors, long running
simulations
– Critical in Environmental Observatories
• Active topic in many computer science communities
– Databases
– Data Mining
– Networking …
Need for a Grid-Based Stream
Processing Middleware
• Application developers interested in data stream
processing
– Would like to have abstracted
» Grid standards and interfaces
» Adaptation function
– Would like to focus on algorithms only
• GATES is a middleware for grid-based, self-adapting
data stream processing
Adaptation for Real-time Processing
• Analysis on streaming data is approximate
• The trade-off between accuracy and execution rate can be
captured by certain parameters (adaptation
parameters)
– Sampling Rate
– Size of summary structure
• Application developers can expose these
parameters and a range of values
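How an exposed parameter and its range might be used can be sketched as follows. This is an illustrative Python sketch, not the GATES API: the class AdaptationParameter, the function adapt, and the queue-based policy (halve the sampling rate when the input queue is more than half full, otherwise raise it by 25%) are all assumptions chosen to show the accuracy versus execution-rate trade-off.

```python
import random

class AdaptationParameter:
    """A tunable parameter exposed together with its legal range (hypothetical name)."""

    def __init__(self, name, low, high, value):
        self.name = name
        self.low = low
        self.high = high
        self.value = value

    def adjust(self, factor):
        # Scale the value, then clamp it to the range the developer exposed.
        self.value = min(self.high, max(self.low, self.value * factor))

def adapt(param, queue_length, capacity):
    """Assumed policy: lower accuracy when overloaded, raise it when there is slack."""
    if queue_length / capacity > 0.5:
        param.adjust(0.5)    # falling behind: process less data per unit time
    else:
        param.adjust(1.25)   # keeping up: spend spare capacity on accuracy
    return param.value

def sample(stream, rate, rng):
    """Keep each item with probability `rate` (the approximate analysis input)."""
    return [x for x in stream if rng.random() < rate]

# Usage: the middleware, not the application, would call adapt() periodically.
sampling_rate = AdaptationParameter("sampling_rate", low=0.01, high=1.0, value=1.0)
first = adapt(sampling_rate, queue_length=90, capacity=100)
second = adapt(sampling_rate, queue_length=90, capacity=100)
third = adapt(sampling_rate, queue_length=10, capacity=100)
kept = sample(range(1000), sampling_rate.value, random.Random(0))
```

The application developer only declares the parameter and its range; the adjustment policy stays inside the middleware.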
FREERIDE-G: Supporting Distributed Data-Intensive
Science
[Figure: User, Compute Cluster, Data Repository Cluster]
Challenges for Application Development
• Analysis of large amounts of disk resident data
• Incorporating parallel processing into analysis
• Processing needs to be independent of other
elements and easy to specify
• Coordination of storage, network and
computing resources required
• Transparency of data retrieval, staging and
caching is desired
FREERIDE-G Goals
• Support High-End Processing
– Enable efficient processing of large scale data mining
computations
• Ease Use of Parallel Configurations
– Support shared and distributed memory parallelization starting
from a common high-level interface
• Hide Details of Data Movement and Caching
– Data staging and caching (when feasible/appropriate) need to be
transparent to the application developer
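One way a common high-level interface could hide parallelization is a reduction-style API, sketched below in Python. The slides do not specify the interface, so this is an assumption: the functions local_reduce, combine, and run are hypothetical names, and a thread pool stands in for the compute cluster. The developer writes only the per-chunk work and the merge step; how chunks are distributed is the runtime's concern.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

def local_reduce(chunk):
    """Developer-supplied per-chunk work: (count, sum) of the chunk's values."""
    return (len(chunk), sum(chunk))

def combine(a, b):
    """Developer-supplied merge of two partial results."""
    return (a[0] + b[0], a[1] + b[1])

def run(chunks, local_fn, combine_fn, workers=4):
    """Runtime side: apply local_fn to chunks in parallel, then merge the partials.

    A thread pool is used here only as a stand-in for cluster nodes.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_fn, chunks))
    return reduce(combine_fn, partials)

# Usage: chunks stand in for data retrieved from a remote repository.
chunks = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
count, total = run(chunks, local_reduce, combine)
mean = total / count
print(count, total, mean)
```

Because the merge step is associative, the same two functions could in principle be reused unchanged under shared-memory or distributed-memory execution.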
Data Analysis Services
• Multi-modal Multi-Sensor Data Integration
– Built on our Data Virtualization Framework
• Query Planning Service
– Feature Extraction: Integration with Grid Metadata
Catalogs
• Remote Mining of Spatio-Temporal Data
– Built using FREERIDE-G
• Mining algorithms for Data Streams
– Built using GATES
Recap
Looking For
• Feedback on our approach
• Synergy with other efforts
• Lessons learnt by others