
Distributed Data Analysis & Dissemination System (D-DADS)
Special Interest Group on Data Integration
June 2000
The Data Flow Process:
From Raw Data to Refined Knowledge
• Primary data are gathered from providers of sensory data
• Data are integrated, filtered, aggregated and fused into secondary data
• Reports are prepared for many purposes, using the integrated data
The Researcher’s Challenge
“The researcher cannot get access to the data;
if he can, he cannot read them;
if he can read them,
he does not know how good they are;
and if he finds them good he cannot merge them
with other data.”
Information Technology and the Conduct of Research: The User's View
National Academy Press, 1989
Data Flow Resistances
The data flow process is hampered by a number of resistances.
• The user does not know what data are available
• The available data are poorly described (metadata)
• There is a lack of QA/QC information
• The data come in various formats, requiring hand-crafted
code to read and manipulate them
These resistances can be overcome through a distributed
system that catalogs and standardizes the data, allowing easy
access for data manipulation and analysis.
Distributed Data Analysis & Dissemination System: D-DADS
• Specifications:
- Uses standardized forms of data, metadata and access protocols
- Supports distributed data archives, each run by its own provider
- Provides tools for data exploration, analysis and presentation

• Features:
- Data are organized as multidimensional data cubes
- Dimensional data cubes are distributed but shared
- Analysis is supported by built-in and user functions
- Supports other data types, such as images, GIS data layers, etc.
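As a rough illustration of the data cube idea, the sketch below stores each value under a (site, day, parameter) key and selects cells by fixing any subset of the dimensions. The dimension names, the site labels, the values and the slice_cube helper are all assumptions made for illustration; the slides do not specify a cube layout.

```python
from datetime import date

# Hypothetical cube: each cell is addressed by (site, day, parameter).
# Site names and values are made up for illustration.
cube = {
    ("SiteA", date(2000, 6, 1), "PM2.5"): 12.0,
    ("SiteA", date(2000, 6, 2), "PM2.5"): 18.0,
    ("SiteB", date(2000, 6, 1), "PM2.5"): 9.0,
    ("SiteB", date(2000, 6, 1), "O3"): 41.0,
}


def slice_cube(cube, site=None, day=None, parameter=None):
    """Return the cells that match every dimension value that was given.

    A dimension left as None is unconstrained.
    """
    wanted = (site, day, parameter)
    return {
        key: value
        for key, value in cube.items()
        if all(w is None or w == k for w, k in zip(wanted, key))
    }


pm25_cells = slice_cube(cube, parameter="PM2.5")
site_b_ozone = slice_cube(cube, site="SiteB", parameter="O3")
```

Fixing one dimension (here, parameter) yields a lower-dimensional slice that could be shown as a table or a map.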

D-DADS Architecture
The D-DADS Components
• Data Providers supply primary data to the system through SQL or
other data servers.
• Standardized Description & Format: the data cubes and other data
types are populated and described using a standard metadata scheme.
• Data Access and Manipulation tools provide a unified
interface to the data cubes and GIS data layers for accessing and
processing (filtering, aggregating, fusing) data and for integrating
data into virtual data cubes.
• Users are the analysts who access D-DADS and produce
knowledge from the data.
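One way to picture how these components fit together is sketched below: two providers deliver rows in their own layouts, a small adapter per provider maps them onto a shared key, and the results are fused into a single virtual cube. The row layouts, field names and functions (standardize_a, standardize_b, virtual_cube) are hypothetical and are not part of any D-DADS specification.

```python
# Hypothetical provider outputs, each in its own native layout.
provider_a_rows = [("SiteA", "2000-06-01", "PM2.5", 12.0)]
provider_b_rows = [
    {"station": "SiteB", "when": "2000-06-01", "species": "PM2.5", "conc": 9.0}
]


def standardize_a(rows):
    """Map provider A tuples onto the shared (site, day, parameter) key."""
    return {(site, day, param): value for site, day, param, value in rows}


def standardize_b(rows):
    """Map provider B records onto the same shared key."""
    return {(r["station"], r["when"], r["species"]): r["conc"] for r in rows}


def virtual_cube(*cubes):
    """Fuse standardized cubes into one view.

    If two cubes hold the same cell, the first one seen is kept
    (an arbitrary choice made for this sketch).
    """
    fused = {}
    for cube in cubes:
        for key, value in cube.items():
            fused.setdefault(key, value)
    return fused


fused = virtual_cube(standardize_a(provider_a_rows), standardize_b(provider_b_rows))
```

Each provider keeps its own archive; only the adapters need to know the native format, which is the point of the standardized description layer.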
The multidimensional data access and manipulation
component of D-DADS will be implemented using OLAP.
On-line Analytical Processing:
OLAP
• A multidimensional data model making it easy to select, navigate,
integrate and explore the data.
• An analytical query language providing power to filter, aggregate
and merge data as well as explore complex data relationships.
• Ability to create calculated variables from expressions based on
other variables in the database.
• Pre-calculation of frequently queried aggregated values, e.g.
monthly averages, enables fast response time to ad hoc queries.
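Two of these OLAP ideas, pre-calculated aggregates and calculated variables, can be sketched in a few lines. This is plain Python rather than an OLAP server, and the observations, the key layout and the function names are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical daily observations: (site, "YYYY-MM-DD", parameter) -> value.
daily = {
    ("SiteA", "2000-06-01", "PM2.5"): 10.0,
    ("SiteA", "2000-06-02", "PM2.5"): 20.0,
    ("SiteA", "2000-07-01", "PM2.5"): 30.0,
    ("SiteA", "2000-06-01", "PM10"): 25.0,
}


def precompute_monthly_means(daily):
    """Aggregate daily cells into (site, "YYYY-MM", parameter) means once."""
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for (site, day, param), value in daily.items():
        cell = sums[(site, day[:7], param)]
        cell[0] += value
        cell[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


monthly = precompute_monthly_means(daily)


def monthly_mean(site, month, param):
    """Answer an ad hoc query by lookup instead of rescanning daily data."""
    return monthly[(site, month, param)]


def fine_fraction(site, day):
    """Calculated variable: ratio of PM2.5 to PM10 for one site and day."""
    return daily[(site, day, "PM2.5")] / daily[(site, day, "PM10")]
```

The aggregation cost is paid once when the table is built, so repeated queries for monthly values become dictionary lookups.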
User Interaction with D-DADS
[Diagram; labels: Query, XML data, Distributed Database, XML data, Data View (Table, Map, etc.)]
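The query and response exchange could look roughly like the sketch below, which uses Python's standard xml.etree.ElementTree module. The element and attribute names (query, filter, data, cell) and the sample response are invented for illustration; the slides do not define an XML schema.

```python
import xml.etree.ElementTree as ET


def build_query(site, parameter):
    """Serialize a query as XML (element and attribute names are made up)."""
    query = ET.Element("query")
    ET.SubElement(query, "filter", dimension="site", value=site)
    ET.SubElement(query, "filter", dimension="parameter", value=parameter)
    return ET.tostring(query, encoding="unicode")


def parse_response(xml_text):
    """Turn an XML response into (site, day, value) rows for a table view."""
    root = ET.fromstring(xml_text)
    return [
        (cell.get("site"), cell.get("day"), float(cell.text))
        for cell in root.findall("cell")
    ]


# Hypothetical response that a node might return for the query above.
response = (
    '<data parameter="PM2.5">'
    '<cell site="SiteA" day="2000-06-01">12.0</cell>'
    '<cell site="SiteA" day="2000-06-02">18.0</cell>'
    "</data>"
)

query_xml = build_query("SiteA", "PM2.5")
rows = parse_response(response)
```

Because both directions carry plain XML, the same rows could feed a table, a map or any other data view.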
Metadata Standardization
Metadata standards for describing air quality data are
currently being actively pursued by several
organizations, including:
• The Supersite Data Management Workgroup
• NARSTO
• FGDC
Potential D-DADS Nodes
The following organizations are potential nodes in a
distributed data analysis and dissemination system:
• CAPITA
• NPS-CIRA
• EMPACT
• EPA Supersites
- California
- Texas
- St. Louis
Summary
In the past, data analysis has been hampered by data flow
resistances. Fortunately, the tools and framework to
overcome each of these resistances now exist, including:
• World Wide Web
• XML
• OLAP
• ArcIMS
• Metadata standards
It appears timely to consider a distributed air quality data
analysis and dissemination system.