Trace Effect Analysis for Software Security


Data Provenance in Remote Environmental Monitoring
Dr. Christian Skalka, University of Vermont, USA
Data Provenance in Remote Environmental Monitoring (REM)
REM = automated collection of data from the natural environment in remote settings.
Central points:
- Data provenance is fundamental to REM.
  - Data source, times, ownership are intrinsic.
- REM hardware and software architectures pose unique challenges for establishing provenance.
  - Heterogeneous, distributed, low-power systems.
Outline
Two REM case studies and problem statements:
1. Snowpack monitoring (SnowMAN)
   - The SnowMAN project summary.
   - Microcosmic provenance issues, challenges.
   - SnowMAN provenance “coping mechanisms”.
2. Sagehen Creek Field Station network
   - Overview of project setting.
   - Macrocosmic provenance issues, challenges.
   - Possible approaches to central challenges.
How Much Snow is Out There?

- Snow/Water Equivalent (SWE): measurement of water content in snowpack
  - Not the same as snow height.
How Much Snow is Out There?



- Regional snowpack profiles are critically important to natural resource planning, public safety.
- Real world measurement is complicated by terrain, forest canopies, wind, exposure.
- Accurate realtime SWE measurement is a “holy grail” of REM.
The UVM SnowMAN Project

A new approach to SWE measurement
- Use modern computer technology for data acquisition and retrieval
- A multi-modal approach to SWE approximation
- Lightweight, low cost, robust, adaptable
- Improved spatial and temporal resolution
Multimodal Sensor Fusion
Algorithms on sensing nodes combine multiple sensing technologies of variable power cost:
1. Snow height via ultrasound (cheap)
2. Snow density via microwave absorption (moderate)
3. Snow density via gamma ray attenuation (expensive)
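A rough sketch of how such a fusion step might look, assuming SWE is computed as snow height times the ratio of snow density to water density, and that a node reads the costlier density sensor only when snow height has changed noticeably. The class name, the callbacks, and the 2 cm threshold are hypothetical and not taken from the SnowMAN implementation.

```python
from typing import Callable, Optional

WATER_DENSITY_KG_M3 = 1000.0  # density of liquid water


def swe_mm(height_m: float, density_kg_m3: float) -> float:
    """Snow water equivalent, in millimetres of water, from height and density."""
    return height_m * density_kg_m3 / WATER_DENSITY_KG_M3 * 1000.0


class FusionNode:
    """Hypothetical node: cheap height reads always, costly density reads rarely."""

    def __init__(self, read_height: Callable[[], float],
                 read_density: Callable[[], float],
                 height_delta_m: float = 0.02) -> None:
        self.read_height = read_height      # e.g. ultrasound (cheap)
        self.read_density = read_density    # e.g. microwave or gamma (costlier)
        self.height_delta_m = height_delta_m  # assumed re-read threshold
        self.density: Optional[float] = None
        self.height_at_density: Optional[float] = None
        self.density_reads = 0

    def sample(self) -> float:
        """Return an SWE estimate, refreshing density only when height has moved."""
        height = self.read_height()
        stale = (self.height_at_density is None
                 or abs(height - self.height_at_density) >= self.height_delta_m)
        if stale:
            self.density = self.read_density()
            self.height_at_density = height
            self.density_reads += 1
        return swe_mm(height, self.density)
```

With heights of 1.00 m, 1.005 m and 1.05 m in sequence, this node would read density on the first and third samples only, which is the kind of power saving the cost ordering above suggests.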
SnowMAN System Architecture



- Multiple data gathering-and-processing nodes connected via a Wireless Sensor Network (WSN)
- Arduino-based on-site gateway provides datalogging via SD card, data processing
- Remote data retrieval via TCP/IP over cellmodem
Provenance Issues in SnowMAN

- Data reported by sensors is meaningless without provenance information:
  - Time of sampling event
  - Location of sample
  - Type and ADC conversion formula of sensor
- Refinement of multimodal fusion algorithm requires history/cause of sampling event.
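To make the list concrete, here is a minimal sketch of a reading that carries its own provenance. The field names, the "cause" labels, and the linear ADC conversion are assumptions for illustration; real sensors may need a different formula.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SensorType:
    """Sensor modality plus an assumed linear ADC conversion."""
    name: str
    unit: str
    adc_bits: int
    v_ref: float    # ADC reference voltage
    scale: float    # physical units per volt (hypothetical calibration)
    offset: float   # physical units at 0 V (hypothetical calibration)

    def convert(self, raw: int) -> float:
        volts = raw / (2 ** self.adc_bits - 1) * self.v_ref
        return volts * self.scale + self.offset


@dataclass(frozen=True)
class Reading:
    """A raw sample together with the provenance needed to interpret it."""
    raw: int
    timestamp_s: int                 # time of sampling event
    location: Tuple[float, float]    # latitude, longitude of the sample
    sensor: SensorType               # type and conversion formula
    cause: str                       # e.g. "scheduled" or "height_change"

    def value(self) -> float:
        return self.sensor.convert(self.raw)
```

Keeping the raw count and the conversion together means a later recalibration can be applied without losing the original measurement.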
Provenance Challenges in SnowMAN

- Low-bandwidth requirements in WSNs
  - Messages must be small, infrequent.
- Volatility of low-cost devices
  - WSN node failures require data reliability solutions
- Heterogeneous network architecture
  - Data formats must be converted in network communications
- Time synchronization
Managing Provenance in SnowMAN

- Reliability ensured by datalogging on gateway, replication within WSN.
  - Requires data source, time to be stored with readings.
- Provenance information reported with data readings.
  - Component of packet format; not onerously large.
- Data converted at “protocol boundaries”.
  - 802.15.4 to RS232 to TCP/IP to SQL.
- Time synchronization handled by simple protocols.
  - Low precision sufficient; cellmodem provides “true” time.
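A sketch of what a compact packet carrying provenance could look like, with a simple offset-based mapping from node time to gateway time. The 9-byte field layout and the helper names are assumptions, not the actual SnowMAN packet format or synchronization protocol.

```python
import struct

# Assumed layout (little-endian, no padding): node id (uint16),
# sensor type code (uint8), node-local sample time in seconds (uint32),
# raw ADC count (uint16).
PACKET_FORMAT = "<HBIH"
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)  # 9 bytes


def encode_packet(node_id: int, sensor_type: int,
                  node_time_s: int, raw: int) -> bytes:
    """Pack a reading and its provenance into a fixed-size message."""
    return struct.pack(PACKET_FORMAT, node_id, sensor_type, node_time_s, raw)


def decode_packet(packet: bytes) -> dict:
    """Unpack a message produced by encode_packet."""
    node_id, sensor_type, node_time_s, raw = struct.unpack(PACKET_FORMAT, packet)
    return {"node_id": node_id, "sensor_type": sensor_type,
            "node_time_s": node_time_s, "raw": raw}


def clock_offset(gateway_now_s: int, node_now_s: int) -> int:
    """Offset observed at a sync exchange (second-level precision assumed)."""
    return gateway_now_s - node_now_s


def to_gateway_time(node_time_s: int, offset_s: int) -> int:
    """Map a node-local timestamp onto the gateway's clock."""
    return node_time_s + offset_s
```

At 9 bytes, source and time ride along with every reading without straining a low-bandwidth link.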
Outstanding Provenance Issues in SnowMAN
- How to verify that data is converted properly at protocol boundaries?
- How to encode history of multi-modal readings, for analysis and refinement of algorithms?
- How to detect errors in data readings, due to sensor, time synchronization, node failure?
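One possible approach to the first question, sketched under assumptions: compute a checksum over a canonical encoding of the reading at the source, carry it across each format conversion, and recompute it at the sink. The canonical layout, the comma-separated serial line, and the function names are all hypothetical; this is not a mechanism described for SnowMAN.

```python
import struct
import zlib


def fingerprint(node_id: int, timestamp_s: int, raw: int) -> int:
    """CRC-32 over a fixed canonical encoding of the reading."""
    return zlib.crc32(struct.pack(">HIH", node_id, timestamp_s, raw))


def to_serial_line(node_id: int, timestamp_s: int, raw: int) -> str:
    """Boundary 1 (assumed): binary fields become a text line for a serial link."""
    crc = fingerprint(node_id, timestamp_s, raw)
    return f"{node_id},{timestamp_s},{raw},{crc:08x}\n"


def from_serial_line(line: str) -> dict:
    """Boundary 2 (assumed): the text line becomes a row for a database."""
    node_id, timestamp_s, raw, crc = line.strip().split(",")
    return {"node_id": int(node_id), "timestamp_s": int(timestamp_s),
            "raw": int(raw), "crc": int(crc, 16)}


def verify(row: dict) -> bool:
    """Recompute the fingerprint at the sink and compare with the carried one."""
    expected = fingerprint(row["node_id"], row["timestamp_s"], row["raw"])
    return expected == row["crc"]
```

A check like this catches conversions that alter a field, but it cannot tell which boundary introduced the error.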
REM in Macrocosm: Sagehen Creek Field Station
Sagehen Creek Field Station and Experimental Forest located near Truckee, CA
- Research and Teaching Facility of UC Berkeley
- 9,000 acres of undisturbed wilderness, extensive REM technology
REM in Macrocosm: Sagehen Creek Field Station
- Literally hundreds of various sensor devices
  - Temperature, wind, humidity
  - Streamflow, stream temperature
  - Snow height, SWE
  - Video
- 9 hubs with (programmable) dataloggers, power, wireless transmission
- Goal: wireless connectivity to field house and internet, off-site data warehousing
- Multiple user, administration groups
Sagehen Creek Field Station
Provenance Issues at Sagehen





- Inherits microcosmic issues (time, location, sensor modality essential to data).
- Video triggering events should be reported.
- Group data ownership now important to report (and maintain through data cycle).
- Sagehen provenance should be credited in myriad end-uses of data.
- Diagnostics of network functionality and services.
Provenance Challenges at Sagehen
Inherits microcosmic challenges, but:
- Increased sampling rates, network traffic
- Time synchronization much more complex
- GPS auto-location for some sensors, manual for others
- Much greater diversity of devices, communication media (wired, wireless)
- More protocol boundaries
- Multimedia
Sagehen Provenance Issues: Scalability
Sagehen network modeled as source-to-sink dataflow, from sensors to end-users.
- Sources extensible by user groups
  - New sensors, sensor networks (e.g. WSNs)
  - New remote datalogging/replication architecture
- Sink usable by end-user groups
  - Arbitrary visualization technologies
  - Diverse research and education applications
Sagehen Network: The Current Reality



- Establishing data communications backbone over IEEE 802.11 wireless LAN.
- Limited data collection over network (one-hop) via canned proprietary software.
- Most data collection being done manually from dataloggers.
- Sensors hardwired to dataloggers, no WSNs in the field.
- Some one-hop connectivity between hubs.
Sagehen Network: The Vision

- Seamless source-to-sink dataflow.
  - From sensors in the field to off-site, permanent data warehouse.
  - Also accessible onsite at remote hubs (reliable).
- Wireless sensor network capabilities in the field.
- Attribution of data to source groups and Sagehen.
- Easy extensibility of network at source end, to allow addition of new sensors (and WSNs).
Some Ideas for Supporting Provenance in the Sagehen Software Architecture
Treating data like messages on a protocol stack.
- Stack defined across device (protocol) boundaries:
  - Sensor data is “raw”, collects more provenance information as it moves towards the sink.
  - Higher layers of provenance (time, ownership) encapsulate lower layers.
  - Allows compositional (principled) treatment of cross-protocol data transformation.
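The encapsulation idea can be sketched as nested envelopes, where each device boundary wraps what it received with its own provenance. The layer names and metadata keys are hypothetical; this only illustrates the layering, not a proposed Sagehen format.

```python
from typing import Any, Dict, List, Tuple


def wrap(layer: str, meta: Dict[str, Any], payload: Any) -> Dict[str, Any]:
    """Encapsulate a lower-layer payload with this layer's provenance."""
    return {"layer": layer, "meta": dict(meta), "payload": payload}


def unwrap_all(envelope: Any) -> Tuple[Any, List[Tuple[str, Dict[str, Any]]]]:
    """Peel every layer; return the raw value and provenance, outermost first."""
    history: List[Tuple[str, Dict[str, Any]]] = []
    node = envelope
    while isinstance(node, dict) and "layer" in node:
        history.append((node["layer"], node["meta"]))
        node = node["payload"]
    return node, history


# Raw sensor value gathers provenance as it moves towards the sink.
msg = wrap("sensor", {"modality": "snow_height"}, 512)
msg = wrap("datalogger", {"hub": 3, "time_s": 1_700_000_000}, msg)
msg = wrap("warehouse", {"owner": "example-group"}, msg)
raw_value, provenance = unwrap_all(msg)
```

Because each layer only adds a wrapper, a transformation at one boundary can be reasoned about without knowing the layers above it.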
Some Ideas for Supporting Provenance in the Sagehen Software Architecture
Watermarking data to establish Sagehen and group ownership.
- Easily done for video media.
  - Video retrieved only from the internet; watermarking performed on traditional platform.
- Watermarking sensor data??
  - Need to preserve data may not tolerate traditional techniques.
  - In-the-field retrieval requires in-the-field watermarking.
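Since sensor values may not tolerate modification, one non-distorting alternative is to attach a short keyed ownership tag to each record rather than embed a mark in the data. This is message authentication, not watermarking in the traditional sense, and it is offered only as a sketch: the key handling, the 8-byte truncation, and the function names are assumptions, and whether weak field devices can afford the hashing is an open question.

```python
import hashlib
import hmac

TAG_BYTES = 8  # assumed truncation to keep messages small


def tag_record(group_key: bytes, owner: str, record: bytes) -> bytes:
    """Compute a short keyed tag binding a record to an owner label."""
    message = owner.encode("utf-8") + b"\x00" + record
    return hmac.new(group_key, message, hashlib.sha256).digest()[:TAG_BYTES]


def check_tag(group_key: bytes, owner: str, record: bytes, tag: bytes) -> bool:
    """Verify a tag in constant time; the record itself is left untouched."""
    return hmac.compare_digest(tag_record(group_key, owner, record), tag)
```

Unlike a watermark, a tag like this can be stripped from the record, so it supports attribution checks by key holders but does not by itself travel with every copy of the data.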
Conclusion


- Remote environmental monitoring requires provenance for correct interpretation of data.
- REM networks heterogeneous, some components computationally “weak”.
  - Power, cost restrictions.
  - Protocol hodgepodge!
- Adapting to REM environment a unique challenge for provenance in software.
Conclusion
Two case studies:
- SnowMAN: lightweight, low cost SWE monitoring.
- Sagehen Creek Field Station: REM in macrocosm.
http://www.cs.uvm.edu/~skalka
http://sagehen.ucnrs.org/