Requirements - Information Sciences Institute

Discovery Systems Program
Barney Pell, Ph.D.
RIACS / NASA Ames Research Center
Presentation to IJCAI-2003 Workshop on Information
Integration Using the Web
Outline of Talk
• Discovery Systems Program Context
– NASA’s Computing Information and Communications
Technology Program
– NASA Program Funding Philosophy
• Discovery Systems Project
– Project Overview
– Exploratory Environments and Collaboration
– Distributed Data Search, Access, and Analysis
– Machine-Assisted Model Discovery and Refinement
– Demonstrations, Applications, and Infusions
• Schedule and participation
FY02-FY08 CICT Overall Project Structure Phasing
[Chart: phasing of CICT projects across FY02 through FY08. Projects shown: Intelligent Systems; Computing, Networking, and Info. Systems; Space Communications; Information Technology Strategic Research; Collaborative Decision Systems; Discovery Systems; Advanced Networking & Communications; Advanced Computing; Reliable Software; Adaptive Embedded Information Systems.]
CICT Project Definition
- Existing Projects -
• Intelligent Systems
– Smarter, more adaptive systems and tools that work collaboratively with humans in a
goal-directed manner to achieve the mission/science goals
• Computing, Networking and Information Systems
– Seamless access to ground-, air-, and space-based distributed information technology
resources
• Space Communications
– Innovative technology products for space data delivery enabling high data rates, broad
coverage, internet-like data access
• Information Technology Strategic Research
– Fundamental information, biologically-inspired, and nanoscale technologies for infusion
into NASA missions
CICT Project Definition
- Proposed FY05-FY07 New-Start Projects -
• Collaborative Decision Systems (FY05)
– Information technologies enabling improved decision making for science and exploration missions
• Discovery Systems (FY05)
– Knowledge management and discovery technologies accelerating the scientific process and
engineering analysis
• Advanced Networking and Communications (FY05)
– Integrated, intelligent, deeply networked ground and in-space system technologies to enable the
next generation of NASA Enterprise communication architectures
• Advanced Computing (FY05)
– Advanced ground and space-based computing technologies to enable NASA’s science and
engineering activities
• Reliable Software (FY07)
– Software development, verification, and validation technologies to maintain and increase the
reliability of increasingly complex NASA operational and analysis software systems
• Adaptive Embedded Information Systems (FY07)
– Embedded information systems capable of adapting to evolving mission science requirements,
system health, and environmental factors in support of improved science return with reduced
mission risk.
Funding Philosophy
• Cross-cutting Information Technologies
• “As Only NASA Can”
• NASA Relevance
– Future needs of NASA Enterprises
– Would not be filled without funding by NASA
• Research Excellence
– Competitive Evaluation
• Technology Maturity Spectrum
– Breakthrough research
– Demonstrations of capability
– Selective infusions for NASA-relevant efforts
• Milestones and Metrics
– Failable
– “So-what”-able
Discovery Systems
Project Overview
• Objective
– Create and demonstrate new discovery and analysis technologies
– Make them easier to use
– Extend them to complex problems in massive, distributed, diverse data
– Enabling scientists and engineers to solve increasingly complex
interdisciplinary problems in future data-rich environments.
• Subprojects
– Exploratory Environments and Collaboration
– Distributed Data Search, Access, and Analysis
– Machine-Assisted Model Discovery and Refinement
– Demonstrations, Applications, and Infusions
Discovery Systems Project
- WBS Technology Elements -
– Distributed data search, access and analysis
• Grid based computing and services
• Information retrieval
• Databases
• Planning, execution, agent architecture, multi-agent systems
• Knowledge representation and ontologies
– Machine-assisted model discovery and refinement
• Information and data fusion
• Data mining and Machine learning
• Modeling and simulation languages
– Exploratory environments and Collaboration
• Visualization
• Human-computer interaction
• Computer-supported collaborative work
• Cognitive models of science
Discovery Systems Before/After
(Each technical area is listed with its state at the start of the project and after 5 years.)
• Distributed Data Search, Access and Analysis
– Start of Project: Answering queries requires specialized knowledge of content,
location, and configuration of all relevant data and model resources.
Solution construction is manual.
– After 5 years: Search queries are based on high-level requirements. Solution
construction is mostly automated and accessible to users who aren’t specialists
in all elements.
• Machine integration of data / QA
– Start of Project: Publishing a new resource takes 1-3 years. Assembling a
consistent heterogeneous dataset takes 1-3 years. Automated data quality
assessment by limits and rules.
– After 5 years: Publishing a new resource takes 1 week. A consistent
heterogeneous dataset is assembled in real time. Automated data quality
assessment by world models and cross-validation.
• Machine Assisted Model Discovery and Refinement
– Start of Project: Physical models have hidden assumptions and legacy
restrictions. Machine learning algorithms are separate from simulations,
instrument models, and data manipulation codes.
– After 5 years: Prediction and estimation systems integrate models of the data
collection instruments, simulation models, and observational data formatting and
conditioning capabilities. Predictions and estimates come with known certainties.
• Exploratory environments and collaboration
– Start of Project: Co-located interdisciplinary teams jointly visualize
multi-dimensional preprocessed data or ensembles of running simulations on
wall-sized matrixed displays.
– After 5 years: Distributed teams visualize and interact with intelligently
combined and presented data from such sources as distributed archives, pipelines,
simulations, and instruments in networked environments.
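The contrast between data quality assessment "by limits and rules" and "by world models and cross-validation" can be sketched roughly as follows. This is a minimal illustration, not a NASA method: the `LimitRule` class, the function names, the temperature readings, and the tolerance are all hypothetical, and the "world model" is reduced to a list of independent reference estimates.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class LimitRule:
    """Hypothetical range rule: a value passes if low <= value <= high."""
    name: str
    low: float
    high: float

    def check(self, value: float) -> bool:
        return self.low <= value <= self.high


def flag_by_limits(values: Sequence[float], rule: LimitRule) -> List[int]:
    """Return the indices of readings that violate a fixed limit rule."""
    return [i for i, v in enumerate(values) if not rule.check(v)]


def flag_by_cross_validation(values: Sequence[float],
                             reference: Sequence[float],
                             tolerance: float) -> List[int]:
    """Return the indices where a reading disagrees with an independent
    reference estimate (e.g. a model prediction) by more than `tolerance`."""
    return [i for i, (v, r) in enumerate(zip(values, reference))
            if abs(v - r) > tolerance]


# Illustrative (made-up) air temperature readings in degrees Celsius.
readings = [12.0, 13.5, 95.0, 14.1, 30.0]
# Made-up estimates from an independent source for the same times.
model_estimate = [12.2, 13.0, 13.8, 14.0, 14.5]

rule = LimitRule("air_temperature_c", low=-60.0, high=60.0)
limit_flags = flag_by_limits(readings, rule)
cross_flags = flag_by_cross_validation(readings, model_estimate, tolerance=5.0)

print("limit violations:", limit_flags)
print("cross-validation disagreements:", cross_flags)
```

In this toy data the reading of 30.0 lies inside the fixed limits but disagrees with the independent estimate, which is the kind of error a limits-only check would miss.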
Distributed Search, Access and
Analysis
• Objective
– Develop and demonstrate technologies to enable investigating
interdisciplinary science questions by finding, integrating, and
composing models and data from distributed archives, pipelines,
running simulations, and running instruments.
– Support interactive and complex query formulation with
constraints and goals in the queries, and resource-efficient
intelligent execution of these tasks in a resource-constrained
environment.
– Milestone: Enable novel what-if and predictive question
answering
• Across NASA’s complex and heterogeneous data and simulations
• By non-data-specialists
• Using world knowledge and metadata
• Supporting query formulation and resource discovery
• Example query: “Within 20%, what will be the water runoff in the
creeks of the Comanche National Grassland if we seed the clouds
over southern Colorado in July and August next year?”
Terrestrial Biogeoscience Involves Many Complex Processes and Data
[Figure: diagram of coupled terrestrial processes across three time scales (minutes-to-hours, days-to-weeks, years-to-centuries). Labeled components include climate, chemistry, biogeophysics, biogeochemistry, hydrology, microclimate, canopy physiology, phenology, ecosystems, vegetation dynamics, watersheds, and disturbance (fires, hurricanes, ice storms, windthrows).]
(Courtesy Tim Killeen and Gordon Bonan, NCAR)
Solution Construction via Composing Models
[Figure: diagram of models and data sources connected through service interfaces. Labeled models: evaporation model, runoff model, snow melt, climate model. Labeled data sources: rainfall (Nat. Weather Service), topography (USGS), snow coverage (snow and ice DAAC (NASA)). Other labels: modeled phenomenon, parameterized phenomenon, metadata, data preparation, surface water community.]
• Each model typically has a community of experts that deals with the complexity of the
model and its environment
• Service interface: required inputs, provided outputs, data descriptions, events
• Binary data streams
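One way to read the "service interface: required inputs, provided outputs" idea is that a solution can be assembled by chaining services until the requested quantity is available. The sketch below illustrates that with simple forward chaining. The `Service` class, the `plan_composition` function, and the specific input and output names attached to each model are assumptions made for the example, not interfaces described in the talk.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Service:
    """Hypothetical service interface: what a model needs and what it yields."""
    name: str
    requires: FrozenSet[str]
    provides: FrozenSet[str]


def plan_composition(services: Iterable[Service],
                     available: Iterable[str],
                     goal: str) -> List[str]:
    """Forward chaining: repeatedly schedule every service whose required
    inputs are already available, until the goal quantity is produced.
    Returns the service names in execution order. The plan may include
    services that are runnable but not strictly needed for the goal."""
    have = set(available)
    remaining = list(services)
    order: List[str] = []
    while goal not in have:
        runnable = [s for s in remaining if s.requires <= have]
        if not runnable:
            raise ValueError(f"cannot produce {goal!r} from available inputs")
        for s in runnable:
            order.append(s.name)
            have |= s.provides
            remaining.remove(s)
    return order


# Illustrative interfaces; the input/output names are assumptions.
services = [
    Service("runoff_model",
            frozenset({"rainfall", "topography", "snow_melt", "evaporation"}),
            frozenset({"runoff"})),
    Service("evaporation_model",
            frozenset({"temperature", "rainfall"}),
            frozenset({"evaporation"})),
    Service("snow_melt_model",
            frozenset({"snow_coverage", "temperature"}),
            frozenset({"snow_melt"})),
    Service("climate_model",
            frozenset({"climate_scenario"}),
            frozenset({"temperature"})),
]
data_sources = ["rainfall", "topography", "snow_coverage", "climate_scenario"]

plan = plan_composition(services, data_sources, goal="runoff")
print(plan)
```

A real planner would also have to reconcile data descriptions (units, grids, time ranges) between services, which this sketch ignores.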
Virtual Data Grid Example
Application: Three data types of interest: one is primary data, a second is derived from it, and a third is derived from the second.
(Interactions and operations proceed left to right.)
[Figure: request flow for obtaining derived data. Labeled components: Metadata Catalogue; Materialized Data Catalogue; Virtual Data Catalogue (how to generate the derived data); Abstract Planner (for materializing data); Concrete Planner (generates workflow); Grid workflow engine; Grid compute resources; Grid storage resources; Data Grid replica services (store an archival copy, if so requested; record existence of cached copies).]
• LFN = logical file name
• PFN = physical file name
• PERS = prescription for generating unmaterialized data
As illustrated, easy to deadlock w/o QoS and SLAs.
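The core lookup logic of the virtual data example can be sketched as follows: return the physical file name (PFN) if the logical file name (LFN) is already materialized, otherwise follow the prescription (PERS), materializing any missing inputs first. This is a simplified illustration under assumptions: the two catalogues are plain dictionaries, `run_job` is a hypothetical stand-in for the grid workflow engine, and all file names and paths are invented. It does not model planning, QoS, SLAs, or the deadlock risk the slide mentions, and it assumes the prescriptions contain no cycles.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A prescription (PERS) here is a pair: (input LFNs, transform to run).
Prescription = Tuple[Sequence[str], Callable[[str, List[str]], str]]


def materialize(lfn: str,
                materialized: Dict[str, str],
                prescriptions: Dict[str, Prescription]) -> str:
    """Return the PFN for `lfn`, generating it and any missing inputs."""
    if lfn in materialized:                      # already materialized
        return materialized[lfn]
    if lfn not in prescriptions:                 # cannot be generated
        raise KeyError(f"no copy of and no prescription for {lfn!r}")
    inputs, transform = prescriptions[lfn]
    input_pfns = [materialize(i, materialized, prescriptions) for i in inputs]
    pfn = transform(lfn, input_pfns)             # run the generating job
    materialized[lfn] = pfn                      # record the new copy
    return pfn


jobs_run: List[Tuple[str, Tuple[str, ...]]] = []


def run_job(lfn: str, input_pfns: List[str]) -> str:
    """Hypothetical stand-in for a workflow engine: logs the job and
    returns an invented cache path for the generated file."""
    jobs_run.append((lfn, tuple(input_pfns)))
    return f"/cache/{lfn}.dat"


# Materialized data catalogue: only the primary data exists at the start.
materialized_catalogue = {"primary": "/archive/primary.dat"}
# Virtual data catalogue: how to generate each derived data type.
virtual_catalogue = {
    "derived_1": (["primary"], run_job),
    "derived_2": (["derived_1"], run_job),
}

pfn = materialize("derived_2", materialized_catalogue, virtual_catalogue)
print(pfn)
print(jobs_run)
```

Requesting `derived_2` a second time would return the recorded PFN without running any further jobs, which corresponds to the "record existence of cached copies" step.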
Machine assisted model discovery
and refinement
• Develop and demonstrate methods to
– assist discovery of and fit physically descriptive models with
quantifiable uncertainty for estimation and prediction
– improve the use of observational or experimental data for
simulation and assimilation applied to distributed instrument
systems (e.g. sensor web)
– integrate instrument models with physical domain modeling and
with other instruments (fusion) to quantify error, correct for
noise, improve estimates and instrument performance.
• Example metrics
– 50% reduction in scientist time forming models
– 10% reduction in uncertainty in parameter estimates or a 10%
reduction in effort to achieve current accuracies
– 10% reduction in computational costs associated with a forward
model
– ability to process data on the order of 1000s of dimensions
– ability to estimate parameters from tera-scale data.
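As a small illustration of how fusing instruments can reduce and quantify uncertainty, the sketch below combines independent measurements by inverse-variance weighting. The slides do not name a fusion method, so this technique is swapped in purely as an example. It assumes unbiased, independent measurements with known variances, and the function name and numbers are made up.

```python
from typing import Sequence, Tuple


def fuse_inverse_variance(
        estimates: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine (value, variance) pairs from independent instruments.

    Each measurement is weighted by 1 / variance. Returns the fused
    value and the variance of the fused estimate, which is never larger
    than the smallest input variance."""
    if not estimates:
        raise ValueError("need at least one estimate")
    if any(var <= 0 for _, var in estimates):
        raise ValueError("variances must be positive")
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    return value, 1.0 / total


# Two made-up instruments measuring the same quantity.
fused_value, fused_variance = fuse_inverse_variance([(10.0, 4.0), (12.0, 4.0)])
print(fused_value, fused_variance)
```

With two equally uncertain measurements the fused variance is half of either input, which is the sense in which fusion can contribute to a target such as a 10% reduction in parameter uncertainty.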
Prediction of the 97/98 El Niño
[Figure: predicted precipitation for JFM 1998; other labels: 1997, 1999]
A reasonable 15-month prediction of the 97/98 El Niño is
achieved when ocean height, temperature, and surface wind
data are combined to initialize the model.
Observing System of the Future
• Partners
– NASA
– DoD
– Other Govt
– Commercial
– International
• Advanced Sensors
• Sensor Web
• Information Synthesis
• Access to Knowledge
• Information
• User Community
Exploratory Environments and
Collaboration
• Objective
– Develop exploratory environments in which
interdisciplinary and/or distributed teams visualize
and interact with intelligently combined and
presented data from such sources as distributed
archives, pipelines, simulations, and instruments in
networked environments.
– Demonstrate that these environments measurably
improve scientists’ capability to answer questions,
evaluate models, and formulate follow-on questions
and predictions.
Multi-parameter Explorations
Conclusion
• Discovery Systems Program
– Exciting NASA funding program
• Follow-on to CNIS and IS/IDU
• ~$250M total over 5 years
– Information Integration is highly relevant
– Focus on NASA needs, but these are challenging
• Program Funding starts FY 2005
– Targeting funding for the external community in FY05
• So likely a broad call sometime in FY04
• We’d like your help
– Technical workshops in FY04
– Advisors wanted for planning teams
– Submissions to funding calls
– Reviewers