
Towards Personalized and Active Information Management for Meteorological Investigations
Beth Plale
Indiana University
USA
Problem Statement
• Mesoscale meteorology research is highly data-driven.
– A large percentage of the data streams in from observational platforms and is available from OPeNDAP servers.
– Data that is more than 10 minutes old is already too old.
– Researchers are currently working on increasing real-time responsiveness to developing weather conditions.
• Mesoscale meteorology is a vast information space.
– Forecasting models assimilate data from a growing number of sources.
Solution Statement
• The Internet has proven the utility of a user-oriented view of information space management:
– Browsers and bookmarks to organize
– Blogs and web page tools (FrontPage, Dreamweaver) to publish
• We apply the concept of a user-oriented view to the management of the mesoscale meteorology information space.
• myLEAD: a tool to help an investigator make sense of, and operate in, the vast information space that is mesoscale meteorology.
Motivation for LEAD
• Each year, mesoscale weather – floods, tornadoes, hail, strong winds, lightning, and winter storms – causes hundreds of deaths, routinely disrupts transportation and commerce, and results in annual economic losses of more than $13B.
Conventional Numerical Weather Prediction
[Figure: a serial pipeline. Observations (radar data, mobile mesonets, surface observations, upper-air balloons, commercial aircraft, geostationary and polar orbiting satellites, wind profilers, GPS satellites) -> Analysis/Assimilation (quality control, retrieval of unobserved quantities, creation of gridded fields) -> Prediction (PCs to teraflop systems) -> Product Generation, Display, Dissemination -> End Users (NWS, private companies, students).]
The process is entirely serial and pre-scheduled: no response to weather!
The LEAD Vision: No Longer Serial or Static
[Figure: the observation, analysis/assimilation, prediction, product generation/display/dissemination, and end-user (NWS, private companies, students) components of the conventional pipeline, recast as no longer serial or static.]
LEAD data: initial working data set
• ETA model gridded analysis
• METAR surface observations
• Rawinsondes – upper-air balloon observations
• ACARS – commercial aircraft temperature and wind observations
• NEXRAD Level II data
• GOES visible satellite data
Returning to Solution Statement
• We apply the concept of a user-oriented view to the management of the mesoscale meteorology information space.
• myLEAD: a tool to help an investigator make sense of, and operate in, the vast information space that is mesoscale meteorology.
Information space management tool
• At its core is a metadata catalog.
– Why? Observational products are already stored elsewhere.
• Public files can be large, so rather than copying them into the user’s file system, the tool maintains a “bookmark” to each.
• Must scale to support thousands of distributed users, including individual investigators, pre-college classroom investigators, and casual observers.
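The bookmark idea can be illustrated with a minimal sketch, written in Python for brevity (myLEAD itself is written in Java). It assumes a catalog entry holds only descriptive attributes plus a URL pointing at the product in its home archive, such as an OPeNDAP server; the class names, attribute names, and example.org URLs are hypothetical, not myLEAD’s actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Bookmark:
    """A catalog entry: describes a data product without copying its bytes."""
    name: str
    url: str                                   # where the product actually lives (hypothetical URL)
    attributes: dict = field(default_factory=dict)


class MetadataCatalog:
    """Hypothetical per-user catalog that stores bookmarks, never file copies."""

    def __init__(self, owner: str):
        self.owner = owner
        self._entries: dict = {}

    def bookmark(self, name: str, url: str, **attributes) -> Bookmark:
        entry = Bookmark(name, url, dict(attributes))
        self._entries[name] = entry
        return entry

    def find(self, **criteria) -> list:
        """Return bookmarks whose attributes match every key/value in criteria."""
        return [
            e for e in self._entries.values()
            if all(e.attributes.get(k) == v for k, v in criteria.items())
        ]


catalog = MetadataCatalog(owner="abe")
catalog.bookmark("KTLX-radar", "https://opendap.example.org/nexrad/KTLX_26Oct04",
                 platform="WSR-88", quantity="reflectivity")
catalog.bookmark("METAR-0910", "https://opendap.example.org/metar/26Oct04_0910",
                 platform="surface", quantity="temperature")
print([b.name for b in catalog.find(platform="WSR-88")])  # ['KTLX-radar']
```

Queries touch only the small metadata records; fetching the (possibly large) file itself is deferred to whoever follows the URL.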
Technical Challenges
• Querying must be efficient
– Over data products described by rich, domain-specific metadata
– Over data products whose descriptions can be augmented over time
• Obtaining metadata is hard
– Automate as much as possible
• Privacy must be fully enforced
– Any data product that the user designates as private must remain private
• Publishing
– Publish a product to the larger community:
• data file, model output, full experiment
– Must be under user control
– Discovery of information that has been made public
• Build trust
– A user may, for instance, work within the myLEAD space for 5 years of graduate work
– The user must be convinced of privacy, reliability, longevity, etc.
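One way to make privacy and user-controlled publishing hold for every query is to route all reads through a single visibility filter. The sketch below assumes products are private by default and that only the owner may publish; `Catalog`, `publish`, and `visible_to` are illustrative names, not part of MCS or myLEAD.

```python
from dataclasses import dataclass


@dataclass
class Product:
    gid: str
    owner: str
    public: bool = False                       # private unless the owner publishes it


class Catalog:
    """Hypothetical catalog in which every read goes through visible_to()."""

    def __init__(self):
        self._products = {}

    def add(self, gid: str, owner: str) -> None:
        self._products[gid] = Product(gid, owner)

    def publish(self, gid: str, requester: str) -> None:
        """Only the owner may make a product public."""
        product = self._products[gid]
        if product.owner != requester:
            raise PermissionError(f"{requester} does not own {gid}")
        product.public = True

    def visible_to(self, user: str) -> list:
        """The user's own products plus anything that has been published."""
        return sorted(p.gid for p in self._products.values()
                      if p.owner == user or p.public)


catalog = Catalog()
catalog.add("wrf-out-1", owner="abe")
catalog.add("metar-0910", owner="bing")
catalog.publish("metar-0910", requester="bing")
print(catalog.visible_to("caru"))  # ['metar-0910']
```

Keeping the filter in one place means a new query path cannot accidentally bypass the privacy rule.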
Rundown on Implementation Specs
• Building on top of MCS and OGSA-DAI
– MCS for its extensible db schema, general db schema, and security infrastructure already in place
– OGSA-DAI for grid/web service architecture
• Database used is MySQL 5.0
– Supports stored procedures
– OGSA-DAI connects to MySQL via JDBC
• Data product descriptions in and out of the database conform to a LEAD-specific XML schema.
• The myLEAD server and myLEAD agent are written in Java.
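As an illustration of data product descriptions moving in and out of the database as XML, the sketch below uses Python’s standard `xml.etree.ElementTree`. The element names (`dataProduct`, `geophysicalQuantity`, `platform`) and the `gid` attribute are hypothetical; the actual LEAD-specific XML schema is not reproduced here.

```python
import xml.etree.ElementTree as ET


def describe(gid: str, quantity: str, platform: str) -> str:
    """Serialize a data product description (element names are hypothetical)."""
    root = ET.Element("dataProduct", attrib={"gid": gid})
    ET.SubElement(root, "geophysicalQuantity").text = quantity
    ET.SubElement(root, "platform").text = platform
    return ET.tostring(root, encoding="unicode")


def read_platform(xml_text: str) -> str:
    """Parse a description coming back out of the catalog."""
    return ET.fromstring(xml_text).findtext("platform")


doc = describe("gid:radar-1", "reflectivity", "WSR-88")
print(doc)
```

In a real deployment the document would also be validated against the schema before it is stored; that step is omitted here.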
Related Work
• mySpace – AstroGrid, UK
– Similar to myLEAD in reining in the information space
– Creates swatches within a large federation of data archives for the cached and persistent data of a “community”
– Provides common query access over cache space and persistent space
• RDF (Resource Description Framework)
– Basic building block is the subject-predicate-object triple:
– [S] – P -> [O], e.g., [Dickens] – hasWritten -> [Pickwick Papers]
– Good for storing detailed relationship information (i.e., for understanding the relationship between two terms)
• NEESgrid – NCSA
– Uses RDF
– Little available in public literature
• myGrid Information Repository (MIR) – myGrid, Manchester
– Most similar to myLEAD
– Supports text search of scientific papers; uses Life Science Identifiers (LSIDs)
– myLEAD has a stronger personal orientation (guarantees, publishing, automatic metadata generation)
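The subject-predicate-object model behind RDF can be sketched with a tiny in-memory triple store in which `None` acts as a wildcard in queries. This is a plain-Python illustration of the data model only, not an RDF library; the `derivedFrom` predicate and product names are hypothetical.

```python
from typing import Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore:
    def __init__(self):
        self._triples = set()

    def add(self, s: str, p: str, o: str) -> None:
        self._triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list:
        """Return triples matching the pattern; None matches anything."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = TripleStore()
store.add("Dickens", "hasWritten", "Pickwick Papers")
store.add("Dickens", "hasWritten", "Bleak House")
store.add("wrf-out1", "derivedFrom", "NEXRAD-26Oct04")   # hypothetical predicate
print(store.match(s="Dickens", p="hasWritten"))
```

Every fact, including relationships between data products, has the same three-part shape, which is what makes the model good at capturing how two terms are related.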
myLEAD Architecture
[Figure: layered architecture. User interface: portal access to myLEAD. Client-side services: myLEAD agent, MCS client. Server-side services: MCS, myLEAD service, myLEAD OGSA-DAI. Below these, JDBC connects to the myLEAD stored procedures and data model in a relational DB.]
myLEAD use scenario
[Figure: the myLEAD portlet as a component of the LEAD portal; a Factory; the myLEAD service; a myLEAD “agent” instance; a Storage Repository Service (RLS); at IU, a data mining task; at NCSA, a workflow, the WRF model, and scratch space /var/tmp/wrf_tmp.]
The workflow confers with the myLEAD “agent” to determine the location of scratch space.
Metadata Catalog Data Model
• Users
– e.g., Abe, Bing, Caru
• Investigations
– e.g., Tornado, April 20, Chicago, Illinois
• Experiments
– e.g., Ensemble: a run of 100 simultaneous forecast models, each parameterized slightly differently
• Collections
• Logical files
– Input observational files, input parameters, derived files, analysis results, images, model results, workflows, execution status messages
Attributes are stored in “type” tables, i.e., string, float, temporal, and int. This gives great extensibility, but naming must be carefully controlled, and efficient querying could be an issue as well.
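The “type” table layout can be sketched with Python’s built-in `sqlite3` standing in for MySQL 5.0. Only two of the four type tables (string and float) are shown, and all table, column, and attribute names are hypothetical rather than the actual MCS schema. The query shows why naming must be controlled (attribute names are plain strings) and why efficiency is a concern (each attribute in a query adds a join).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logical_file (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
-- One table per attribute type: new attributes need no schema change.
CREATE TABLE attr_string (file_id INTEGER, name TEXT, value TEXT);
CREATE TABLE attr_float  (file_id INTEGER, name TEXT, value REAL);
""")


def add_file(name, strings=None, floats=None):
    """Insert a logical file and its attributes into the matching type tables."""
    file_id = conn.execute("INSERT INTO logical_file (name) VALUES (?)", (name,)).lastrowid
    for k, v in (strings or {}).items():
        conn.execute("INSERT INTO attr_string VALUES (?, ?, ?)", (file_id, k, v))
    for k, v in (floats or {}).items():
        conn.execute("INSERT INTO attr_float VALUES (?, ?, ?)", (file_id, k, v))
    return file_id


add_file("NEXRAD-KTLX", {"platform": "WSR-88"}, {"elevation_deg": 0.5})
add_file("GOES-ir", {"platform": "GOES-10"}, {"elevation_deg": 0.0})
add_file("NEXRAD-KFWS", {"platform": "WSR-88"}, {"elevation_deg": 1.5})

# Two attributes in the predicate means two joins. A misspelled attribute
# name ('Platform' vs 'platform') would silently match nothing.
rows = conn.execute("""
SELECT f.name
FROM logical_file f
JOIN attr_string s ON s.file_id = f.id AND s.name = 'platform'
JOIN attr_float  x ON x.file_id = f.id AND x.name = 'elevation_deg'
WHERE s.value = 'WSR-88' AND x.value > 1.0
ORDER BY f.name
""").fetchall()
print(rows)  # [('NEXRAD-KFWS',)]
```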
Data Model
[Figure: data model relating Investigation, Collection, Logical file, and User (User – Dublin Core).]
Data Model
[Figure: example myWorkspace browser view]
myWorkspace: J. Kowaleski
  preferences
    Workflow template vizEta 03Aug04:13:35:40
    Workflow template WRF 15May04:05:25:59
  Favorite spaces
    Home disk space
    Thor cluster scratch space
  Experiment 1: Norman, OK 21Oct04:23:11:45
    Input observational
      NEXRAD 26Oct04:13:45:40
      GOES-infrared 26Oct04:12:00:00
      METAR 26Oct04:09:10:05
    Input parameters
    WRF-out
      Wrf-out1-26Oct04:13:35:40
      Wrf-out2-26Oct04:13:37:25
      Wrf-out3-26Oct04:13:43:15
    workflow instance
Callouts mark the collection level and the logical file level of the tree.
The browser provides the user a hierarchical view of a space that is essentially flat. Users like hierarchy.
Each logical file has an associated set of attributes that describe the data product.
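The browser’s hierarchical view over an essentially flat store can be sketched as follows, assuming each catalog object is stored as a flat row carrying its parent’s ID. The row layout and IDs are hypothetical; the labels are taken from the example workspace.

```python
from collections import defaultdict

# Flat rows as they might sit in the catalog: (id, parent_id, label).
# The id/parent_id scheme is a hypothetical illustration.
rows = [
    (1, None, "Experiment 1: Norman, OK 21Oct04:23:11:45"),
    (2, 1, "Input observational"),
    (3, 2, "NEXRAD 26Oct04:13:45:40"),
    (4, 2, "METAR 26Oct04:09:10:05"),
    (5, 1, "WRF-out"),
    (6, 5, "Wrf-out1-26Oct04:13:35:40"),
]


def render(rows):
    """Rebuild the hierarchy from parent pointers and indent each level."""
    children = defaultdict(list)
    for node_id, parent_id, label in rows:
        children[parent_id].append((node_id, label))

    lines = []

    def walk(parent_id, depth):
        for node_id, label in children[parent_id]:
            lines.append("  " * depth + label)
            walk(node_id, depth + 1)

    walk(None, 0)
    return lines


print("\n".join(render(rows)))
```

The hierarchy exists only in the rendering; the store itself stays flat, so the same rows could be regrouped into a different view without moving any data.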
myLEAD agent
• Separate transient grid/web service
– Has state about the user and the current investigation and experiment
– Embeds the myLEAD client API
• Purpose:
– Controls naming
– Helps use the database structure in a repeatable, meaningful way
• Maintains a finite state machine (FSM) of the current state of execution; stores new products into a new collection based on that state
– Input -> model run -> analysis -> final results
– Derives metadata attributes for each new data product object created during the course of the workflow by means of:
• Case-based reasoning
• Internal state
• Consulting an ontology
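The agent’s FSM and naming role can be sketched as below. This minimal sketch assumes a strictly forward sequence of stages and a hypothetical `user/experiment/stage` collection-naming convention; metadata derivation (case-based reasoning, ontology lookup) is left out.

```python
from enum import Enum


class Stage(Enum):
    INPUT = "input"
    MODEL_RUN = "model-run"
    ANALYSIS = "analysis"
    FINAL_RESULTS = "final-results"


# Assumed transitions: a strictly forward pipeline.
NEXT = {
    Stage.INPUT: Stage.MODEL_RUN,
    Stage.MODEL_RUN: Stage.ANALYSIS,
    Stage.ANALYSIS: Stage.FINAL_RESULTS,
}


class Agent:
    """Hypothetical agent holding per-user, per-experiment state."""

    def __init__(self, user: str, experiment: str):
        self.user = user
        self.experiment = experiment
        self.stage = Stage.INPUT
        self.collections: dict = {}

    def advance(self) -> None:
        if self.stage not in NEXT:
            raise RuntimeError("experiment already finished")
        self.stage = NEXT[self.stage]

    def store(self, product: str) -> str:
        """File a product in the collection for the current stage; the agent picks the name."""
        collection = f"{self.user}/{self.experiment}/{self.stage.value}"
        self.collections.setdefault(collection, []).append(product)
        return collection


agent = Agent("abe", "exp1")
agent.store("NEXRAD-KTLX")
agent.advance()
print(agent.store("wrf-out1"))  # abe/exp1/model-run
```

Because the agent, not the user, derives collection names from its state, products from repeated runs land in the database in the same predictable places.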
Data Product Metadata
Resources: “things that need describing (i.e., metadata)”
• Geo-data products
– Observational data
– Model-generated data
– Collections
– Derived data
• Data analytics
– Workflow scripts
– Data mining
• Compute resources, storage resources
• Data analytics resources (statistics table)
• Services
• Model input resources
Data Product Metadata (attribute – notes)
• Global ID – “LSID” for geosciences
• Temporal coverage – same as spatial
• Spatial coverage – GML, THREDDS, FGDC, COARDS-CF
• Geophysical quantity – defined by common vocabulary
• Platform – GOES-10, GOES-8; WSR-88, CASA
• Instrument type
• Site – East-west; KXYZ
• Model run info – model-derived data product
• Syntactic description – binary format of data product
• Contact info – Dublin Core
• Physical location of service
• Protocol to access service
• Dataset summary – Dublin Core
• List of predecessors – GID of input data products, workflow instance
• Event – mesocyclone, storm cell, tornado
• Quality – complex
• Completeness
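The list-of-predecessors attribute makes lineage queries possible. The minimal sketch below assumes a frozen record holding a few of the attributes above and walks predecessor GIDs depth-first to list every ancestor of a product; the GID strings and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProductMetadata:
    gid: str                      # global ID (hypothetical format)
    quantity: str                 # geophysical quantity
    platform: str = ""
    event: str = ""
    predecessors: tuple = ()      # GIDs of input data products


def lineage(gid: str, catalog: dict) -> list:
    """Every ancestor of a product, found by following predecessor GIDs depth-first."""
    order, seen = [], set()
    stack = list(catalog[gid].predecessors)
    while stack:
        current = stack.pop()
        if current in seen:       # shared inputs are listed once
            continue
        seen.add(current)
        order.append(current)
        stack.extend(catalog[current].predecessors)
    return order


radar = ProductMetadata("gid:radar-1", "reflectivity", platform="WSR-88")
metar = ProductMetadata("gid:metar-1", "temperature")
analysis = ProductMetadata("gid:analysis-1", "wind",
                           predecessors=("gid:radar-1", "gid:metar-1"))
forecast = ProductMetadata("gid:wrf-out-1", "wind", event="tornado",
                           predecessors=("gid:analysis-1",))
catalog = {p.gid: p for p in (radar, metar, analysis, forecast)}
print(lineage("gid:wrf-out-1", catalog))
```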
Current Research Challenges
• Publishing
– Publishing a data product to the larger community:
• data file, model output, full experiment
– Discovery of information that has been made public
• Guarantees
– Any data product that the user designates as private must remain private
– When a request for a product is issued, the product must exist
• Flexible yet efficient schema
– Inherited from MCS; supports an evolving understanding of a data product over time by means of extended attributes
• Immutable investigations
– Collections, views, and logical files can be reused from earlier investigations without compromising the integrity of the earlier investigation
• Proactive agent
– Infers metadata attributes from the context of the active experiment using case-based reasoning
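Case-based reasoning for metadata inference can be sketched as “retrieve the most similar past case, reuse its attributes”. This is a deliberately simple stand-in, assuming contexts are flat key/value maps and similarity is the fraction of matching keys; the threshold, keys, and attribute values are hypothetical, not myLEAD’s actual casebase.

```python
from dataclasses import dataclass


@dataclass
class Case:
    context: dict      # what the agent observed when a past product was created
    attributes: dict   # metadata attributes recorded for that product


def similarity(a: dict, b: dict) -> float:
    """Fraction of keys (over the union of both contexts) on which they agree."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)


def infer_attributes(context: dict, casebase: list, threshold: float = 0.5) -> dict:
    """Retrieve the closest past case and reuse its attributes, if it is close enough."""
    best = max(casebase, key=lambda c: similarity(context, c.context), default=None)
    if best is None or similarity(context, best.context) < threshold:
        return {}      # no sufficiently similar case: leave attributes to the user
    return dict(best.attributes)


cases = [
    Case({"stage": "model-run", "model": "WRF", "region": "Norman, OK"},
         {"quantity": "forecast fields", "format": "netCDF"}),
    Case({"stage": "input", "source": "NEXRAD"},
         {"quantity": "reflectivity", "platform": "WSR-88"}),
]
new_context = {"stage": "model-run", "model": "WRF", "region": "Chicago, IL"}
print(infer_attributes(new_context, cases))
```

A fuller case-based reasoner would also revise the reused attributes and retain the new case; only the retrieve-and-reuse steps are shown.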
Beth Plale
4 days away from our national elections … wish us well.