DDADS_EMPACT


Distributed Data Analysis &
Dissemination System (D-DADS)
Special Interest Group
on Data Integration
June 2000
Overview
Environmental data are collected by multiple, disparate data
providers, such as individual EMPACT projects.
Each data provider presents its data in its own format,
making the data difficult to find, access, read, and integrate.
Standardized formats and data dissemination systems are
required for data accessibility and integration of distributed
data sets.
This proposal presents a distributed data analysis and
delivery system that provides users with access to data from
multiple sources.
The Data Flow Process:
From Raw Data to Refined Knowledge
• Primary data are gathered from providers of sensory data
• Data are integrated, filtered, aggregated and fused into secondary data
• Reports are prepared for delivering environmental knowledge to the public
EMPACT
Data Flow Resistances
The data flow process is hampered by a number of resistances.
• The user does not know what data are available
• The available data are poorly described (metadata)
• There is a lack of QA/QC information
• The data come in various formats, requiring hand-crafted
code to read and manipulate them
These resistances can be overcome through a distributed
system that catalogs and standardizes the data, allowing easy
access for data manipulation and analysis.
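As an illustration of what standardizing the data could mean in practice, the sketch below converts records from two providers into one common layout. Both provider formats, the field names, and the sample values are hypothetical and are not taken from any EMPACT project.

```python
from datetime import datetime, timezone

def from_provider_a(row: str) -> dict:
    """Provider A (hypothetical): CSV rows 'site,YYYY-MM-DD HH:MM,visual_range_km'."""
    site, stamp, value = row.split(",")
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(tzinfo=timezone.utc)
    return {"site": site, "time": when,
            "variable": "visual_range_km", "value": float(value)}

def from_provider_b(rec: dict) -> dict:
    """Provider B (hypothetical): dicts with epoch seconds and visual range in miles."""
    when = datetime.fromtimestamp(rec["epoch"], tz=timezone.utc)
    return {"site": rec["station"], "time": when,
            "variable": "visual_range_km",
            "value": round(rec["vr_miles"] * 1.609344, 3)}  # miles to km

# Once every record shares one layout, a single reader serves all providers.
catalog = [
    from_provider_a("STL01,2000-06-01 12:00,24.5"),
    from_provider_b({"station": "BOS07", "epoch": 959860800, "vr_miles": 10.0}),
]
```

Each provider needs only one small adapter, and downstream code no longer depends on any provider-specific format.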
Interoperability
One requirement for an effective distributed environmental
data system is interoperability, defined as
“the ability to freely exchange all kinds of spatial
information about the Earth and about objects and
phenomena on, above, and below the Earth’s surface;
and to cooperatively, over networks, run software
capable of manipulating such information.” (Buehler &
McKee, 1996)
Such a system has two key elements:
• Exchange of meaningful information
• Cooperative and distributed data management
Distributed Data Analysis
& Dissemination System:
D-DADS
• Specifications:
  - Uses standardized forms of data, metadata and access protocols
  - Supports distributed data archives, each run by its own provider
  - Provides tools for data exploration, analysis and presentation

• Features:
  - Data are organized as multidimensional data cubes
  - Dimensional data cubes are distributed but shared
  - Analysis is supported by built-in and user functions
  - Supports other data types, such as images, GIS data layers, etc.

D-DADS Architecture
The D-DADS Components
• Data Providers supply primary data to the system through SQL or
other data servers.
• Standardized Description & Format: the data cubes and other data
types are populated and described using a standard metadata format.
• Data Access and Manipulation tools provide a unified
interface to the data cubes and GIS data layers for accessing and
processing (filtering, aggregating, fusing) data and integrating data
into virtual data cubes.
• Users are the analysts who access the D-DADS and produce
knowledge from the data.
The multidimensional data access and manipulation
component of D-DADS can be implemented using OLAP.
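One possible reading of a "virtual data cube" is a thin layer that forwards a single query to every provider archive and merges the answers. The sketch below assumes that reading; the class names, the `select` method, and all data values are hypothetical, and the provider names are used only as labels.

```python
class Provider:
    """Hypothetical provider node holding its own (site, day, value) rows."""
    def __init__(self, name, rows):
        self.name = name
        self._rows = rows

    def select(self, site=None):
        # None means "no filter on this dimension".
        return [r for r in self._rows if site is None or r[0] == site]

class VirtualCube:
    """Unified interface: sends one query to every provider and merges the results."""
    def __init__(self, providers):
        self.providers = providers

    def select(self, site=None):
        merged = []
        for p in self.providers:
            # Tag each row with its source so provenance survives the merge.
            merged.extend(row + (p.name,) for row in p.select(site))
        return sorted(merged)

# Made-up values for illustration only.
empact = Provider("EMPACT", [("STL01", "2000-06-01", 24.5)])
improve = Provider("IMPROVE", [("STL01", "2000-06-01", 26.0),
                               ("GRCA1", "2000-06-01", 180.0)])
cube = VirtualCube([empact, improve])
stl_rows = cube.select(site="STL01")
```

The archives stay with their providers; only the query and the merged result cross the network.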
On-line Analytical Processing:
OLAP
• A multidimensional data model making it easy to select, navigate,
integrate and explore the data.
• An analytical query language providing power to filter, aggregate
and merge data as well as explore complex data relationships.
• Ability to create calculated variables from expressions based on
other variables in the database.
• Pre-calculation of frequently queried aggregated values, e.g.
monthly averages, enabling fast response time to ad hoc queries.
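The last two capabilities can be sketched in a few lines of plain Python. This is an illustration, not an OLAP product API: the record layout and values are made up, and the calculated variable uses the Koschmieder relation (extinction = 3.912 / visual range) purely as an example expression.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (site, "YYYY-MM", visual_range_km). Values are made up.
records = [
    ("STL01", "2000-06", 20.0),
    ("STL01", "2000-06", 40.0),
    ("STL01", "2000-07", 10.0),
]

def extinction(visual_range_km: float) -> float:
    """Calculated variable: extinction coefficient (1/km) derived from visual range."""
    return 3.912 / visual_range_km

def precompute_monthly(rows):
    """Pre-calculate monthly means once so later lookups do not rescan the records."""
    groups = defaultdict(list)
    for site, month, vr in rows:
        groups[(site, month)].append(vr)
    return {key: mean(vals) for key, vals in groups.items()}

monthly_mean = precompute_monthly(records)    # built once, ahead of any query
june_mean = monthly_mean[("STL01", "2000-06")]  # ad hoc query becomes a lookup
june_bext = extinction(june_mean)             # derived on the fly from stored data
```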
Fast Analysis of Shared
Multidimensional Information (FASMI)
(Pendse, N., “The OLAP Report”)
An OLAP system is characterized as:
being Fast – The system is designed to deliver relevant data to
users quickly and efficiently; suitable for ‘real-time’ analysis
facilitating Analysis – The capability to have users extract not only
“raw” data but data that they “calculate” on the fly.
being Shared – The data and its access are distributed.
being Multidimensional – The key feature. The system provides a
multidimensional view of the data.
exchanging Information – The ability to disseminate large
quantities of various forms of data and information.
Multi-Dimensional Data Cubes
• Multi-dimensional data models use inherent relationships in data
to populate multidimensional matrices called data cubes.
• A cube's data can be queried using any combination of dimensions.
• Hierarchical data structures are created by aggregating the data
along successively larger ranges of a given dimension, e.g. the time
dimension can contain the aggregates year, season, month and day.
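A small sketch of these two ideas, querying by any combination of dimensions and rolling up along the time hierarchy, is given below. It uses a flat list of cells rather than a real cube engine; the dimension names and values are hypothetical, and the season level is omitted for brevity.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical cells: (site, variable, day, value). Values are made up.
observations = [
    ("STL01", "visual_range_km", date(2000, 6, 1), 20.0),
    ("STL01", "visual_range_km", date(2000, 6, 2), 30.0),
    ("STL01", "visual_range_km", date(2000, 7, 1), 40.0),
    ("BOS07", "visual_range_km", date(2000, 6, 1), 10.0),
]

def query(cube, site=None, variable=None, year=None, month=None):
    """Select cells matching any combination of dimension values (None = all)."""
    return [
        (s, v, d, x) for (s, v, d, x) in cube
        if (site is None or s == site)
        and (variable is None or v == variable)
        and (year is None or d.year == year)
        and (month is None or d.month == month)
    ]

def roll_up(cells, level):
    """Average values along the time hierarchy at 'year', 'month', or 'day' level."""
    keyfuncs = {
        "year": lambda d: (d.year,),
        "month": lambda d: (d.year, d.month),
        "day": lambda d: (d.year, d.month, d.day),
    }
    groups = defaultdict(list)
    for s, v, d, x in cells:
        groups[(s, v) + keyfuncs[level](d)].append(x)
    return {key: mean(vals) for key, vals in groups.items()}

stl = query(observations, site="STL01")
monthly = roll_up(stl, "month")
yearly = roll_up(stl, "year")
```

The same `query` call serves a map view (fix the time, vary the site) and a time view (fix the site, vary the time), which is the practical benefit of the cube model.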
User Interaction with D-DADS
[Figure labels: Query; XML data; Distributed Database; Data View (Table, Map, Time Chart, etc.)]
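The exchange of XML data between the database and a data view might look like the sketch below, written with Python's standard `xml.etree.ElementTree` module. The `<dataset>` and `<obs>` element names, the attributes, and the sample values are assumptions; the slides do not specify a schema.

```python
import xml.etree.ElementTree as ET

def to_xml(rows):
    """Server side: wrap query results in a hypothetical <dataset> document."""
    root = ET.Element("dataset")
    for site, stamp, value in rows:
        obs = ET.SubElement(root, "obs", site=site, time=stamp)
        obs.text = str(value)
    return ET.tostring(root, encoding="unicode")

def to_table(xml_text):
    """Client side: parse the XML back into rows for a table view."""
    return [(o.get("site"), o.get("time"), float(o.text))
            for o in ET.fromstring(xml_text)]

# Made-up query result for illustration.
rows = [("STL01", "2000-06-01T12:00Z", 24.5), ("BOS07", "2000-06-01T12:00Z", 16.1)]
payload = to_xml(rows)     # what travels over the network
table = to_table(payload)  # what the data view renders
```

Because the payload is self-describing text, a table, map, or time chart can each consume the same response.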
Example Application: Visibility D-DADS
Visibility observations (extinction coefficient) are an indicator
of air quality and serve as an important data set in the public’s
understanding of air quality.
A visibility D-DADS will consist of multiple forms of
visibility data, such as visual range observations and digital
images from web cameras.
Potential visibility data providers include:
- EMPACT projects and their hourly visual range data
- The IMPROVE database
- CAPITA, a warehouse for global surface observation
data available every six hours
Possible Node in Geography Network
National Geographic and ESRI are establishing a geography
network consisting of distributed spatial databases.
Some EMPACT projects are participating as nodes in the initial
start-up phase.
The visibility distributed data and analysis system could link to
and become another node in the geography network, making use
of the geography network’s spatial viewers.
Other views, such as a time view, could be linked with the spatial
viewer to take advantage of the multidimensional visibility data
cubes.
Example Viewer
[Figure: Map View, Time View, Variable View, WebCam View]
The views are linked so that making a change in one view, such as
selecting a different location in the map view, updates the other views.
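One common way to link views like this is a shared selection object that notifies every subscribed view when it changes. The sketch below assumes that design; the `Selection` and `View` classes are hypothetical stand-ins, not part of any existing viewer.

```python
class Selection:
    """Shared selection state; views subscribe and are notified on every change."""
    def __init__(self, **state):
        self.state = dict(state)
        self._listeners = []

    def subscribe(self, callback):
        self._listeners.append(callback)

    def update(self, **changes):
        self.state.update(changes)
        for callback in self._listeners:
            callback(self.state)

class View:
    """Stand-in for a map, time, variable, or webcam view; records what it would draw."""
    def __init__(self, name, selection):
        self.name = name
        self.shown = dict(selection.state)
        selection.subscribe(self.refresh)

    def refresh(self, state):
        self.shown = dict(state)

selection = Selection(site="STL01", variable="visual_range_km")
views = [View(n, selection) for n in ("map", "time", "variable", "webcam")]

# Picking a new location in the map view amounts to one update call,
# after which every view redraws for the new site.
selection.update(site="BOS07")
```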
Summary
In the past, data analysis has been hampered by data flow
resistances. Fortunately, the tools and framework to
overcome these resistances now exist, including:
• World Wide Web
• XML
• OLAP
• ArcIMS
• Metadata standards
It appears timely to consider a distributed environmental
data analysis and dissemination system.