Models and Sensor Networks

Sensor Data Management: Challenges and (some) Solutions
Amol Deshpande, University of Maryland
Motivation
- Unprecedented, and rapidly increasing, instrumentation of our everyday world
- Examples: distributed measurement networks (e.g. GPS), RFID, wireless sensor networks, industrial monitoring
Sensor Data Processing: Now
Sensor Network → Database

Table raw-data
time | id | temp
10am | 1  | 20
10am | 2  | 21
..   | .. | …
10am | 7  | 29

User:
1. Extract all readings into a file
2. Run MATLAB/R/other data processing tools
3. Write output to a file/back to the database
4. Write data processing tools to process/aggregate the output (maybe using DB)
5. Decide new data to acquire
Repeat
Sensor Data Processing: What we want
Sensor Network → Database (data); Database → Sensor Network (tasks)
Models to be applied to data in real time (at least simple ones)

Table raw-data
time | id | temp
10am | 1  | 20
10am | 2  | 21
..   | .. | …
10am | 7  | 29

Table processed-data
time | id | temp
10am | 1  | 20
10am | 2  | 21
..   | .. | …
10am | 7  | 29

User ↔ Database:
- Continuous (standing) queries, e.g. alert monitoring, and the results to those continuous queries
- Ad hoc queries (possibly against processed, modeled data)
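A continuous (standing) query is registered once and then evaluated on every new reading, unlike an ad hoc query that runs once. The Python sketch below illustrates an alert-monitoring standing query over the raw-data stream. The class name `SlidingAverageAlert`, its parameters, and the rule "mean of a sensor's last `window` readings above `threshold`" are illustrative assumptions, not part of any particular system.

```python
from collections import defaultdict, deque


class SlidingAverageAlert:
    """Hypothetical standing query: raise an alert when a sensor's mean
    over its last `window` readings exceeds `threshold`."""

    def __init__(self, window, threshold):
        self.window = window
        self.threshold = threshold
        self.recent = defaultdict(deque)  # sensor id -> most recent readings
        self.sums = defaultdict(float)    # sensor id -> sum of those readings

    def on_reading(self, time, sensor_id, temp):
        """Called once per arriving tuple; returns an alert tuple or None."""
        buf = self.recent[sensor_id]
        buf.append(temp)
        self.sums[sensor_id] += temp
        if len(buf) > self.window:
            # Slide the window: drop the oldest reading in O(1).
            self.sums[sensor_id] -= buf.popleft()
        if len(buf) == self.window:
            mean = self.sums[sensor_id] / self.window
            if mean > self.threshold:
                return (time, sensor_id, mean)
        return None


# Usage: feed (time, id, temp) tuples as they arrive from the network.
query = SlidingAverageAlert(window=3, threshold=25.0)
stream = [("10am", 1, 20), ("10am", 7, 29), ("11am", 7, 27), ("12pm", 7, 26)]
alerts = []
for reading in stream:
    alert = query.on_reading(*reading)
    if alert is not None:
        alerts.append(alert)
```

Maintaining a running sum per sensor keeps the work per reading constant, which matters when data rates are high.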
Data Management Challenges
- Very, very large scale
- Spatio-temporal querying essential
- Need new indexing techniques, data description formats, and techniques for "data ingest" (cleaning the data, etc.)
- Much work in scientific data management
  - E.g. SkyServer
- Data is typically imprecise, unreliable, or incomplete (data quality)
  - Measurement noise, failures in sensor/GPS data
  - High message loss rate in wireless/RFID
Balazinska et al.; Data Management in the Worldwide Sensor Web; IEEE Pervasive Computing, 2007.
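As one simple baseline for spatio-temporal range queries, readings can be bucketed into a uniform grid over space and time so that a query only scans the cells it overlaps. The Python sketch below assumes each reading carries planar coordinates `(x, y)` and a numeric timestamp `t`; the class `GridIndex` and its cell sizes are hypothetical, and data at very large scale would likely call for more adaptive structures.

```python
import math
from collections import defaultdict


class GridIndex:
    """Hypothetical uniform grid index over (x, y, t) for range queries."""

    def __init__(self, cell_size, time_bucket):
        self.cell_size = cell_size      # spatial width of one grid cell
        self.time_bucket = time_bucket  # temporal width of one bucket
        self.cells = defaultdict(list)  # (cx, cy, ct) -> list of readings

    def _key(self, x, y, t):
        return (math.floor(x / self.cell_size),
                math.floor(y / self.cell_size),
                math.floor(t / self.time_bucket))

    def insert(self, x, y, t, value):
        self.cells[self._key(x, y, t)].append((x, y, t, value))

    def range_query(self, x0, x1, y0, y1, t0, t1):
        """Return values with x0 <= x <= x1, y0 <= y <= y1, t0 <= t <= t1."""
        lo = self._key(x0, y0, t0)
        hi = self._key(x1, y1, t1)
        result = []
        # Visit only the cells overlapping the query box...
        for cx in range(lo[0], hi[0] + 1):
            for cy in range(lo[1], hi[1] + 1):
                for ct in range(lo[2], hi[2] + 1):
                    # ...then filter exactly, since cells may extend past the box.
                    for x, y, t, value in self.cells.get((cx, cy, ct), []):
                        if x0 <= x <= x1 and y0 <= y <= y1 and t0 <= t <= t1:
                            result.append(value)
        return result


index = GridIndex(cell_size=10.0, time_bucket=60.0)
index.insert(5.0, 5.0, 0.0, 20)
index.insert(15.0, 5.0, 30.0, 21)
index.insert(95.0, 95.0, 0.0, 29)
nearby = index.range_query(0, 20, 0, 10, 0, 59)
```

A uniform grid is easy to maintain under a continuous insert load, but it degrades when readings are spatially skewed; that trade-off is one reason new indexing techniques are needed.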
Data Management Challenges
- Data is generated continuously and must be processed in real time (distributed data streams)
  - Need different query processing paradigms
  - Typically very high data rates
  - Must be able to handle a large number of continuous queries efficiently
- Much recent work on "Data Streams"
  - Research systems: TelegraphCQ [Berkeley], STREAM [Stanford], Aurora [Brown/MIT/Brandeis], etc.
  - Commercial systems: StreamBase, TruViso, …
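One way to evaluate many continuous queries at once is to index the queries rather than the data, so each arriving reading is matched against all of them with a single lookup. The Python sketch below assumes every standing query has the simple form `temp > c` for some constant `c`; the class `ThresholdQueryIndex` is hypothetical, and real stream systems support far richer predicates.

```python
import bisect


class ThresholdQueryIndex:
    """Hypothetical shared index over many standing queries `temp > c`."""

    def __init__(self):
        self.thresholds = []  # query constants, kept sorted
        self.query_ids = []   # query ids, aligned with self.thresholds

    def add_query(self, query_id, threshold):
        pos = bisect.bisect_right(self.thresholds, threshold)
        self.thresholds.insert(pos, threshold)
        self.query_ids.insert(pos, query_id)

    def matching(self, temp):
        """Ids of all queries satisfied by this reading."""
        # bisect_left counts the thresholds strictly below temp,
        # i.e. exactly the queries for which temp > c holds.
        n = bisect.bisect_left(self.thresholds, temp)
        return self.query_ids[:n]


index = ThresholdQueryIndex()
index.add_query("q1", 20.0)
index.add_query("q2", 25.0)
index.add_query("q3", 28.0)
```

The search itself is logarithmic in the number of registered queries, instead of testing each query separately against every reading.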
Data Management Challenges
- Need for real-time statistical modeling of data
  - Eliminate spatial/temporal biases; handle missing data through interpolation or extrapolation (e.g. regression, interpolation models)
  - Filter measurement noise (e.g. Kalman filters)
  - Infer hidden variables, pattern recognition (e.g. HMMs)
  - Fault or anomaly detection
  - Forecasting/prediction (e.g. ARIMA)
- Examples: temperature monitoring (regression/interpolation models); GPS data (Kalman filters) …
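Two of the models listed above can be sketched in a few lines of Python: linear interpolation to fill in missing readings, and a scalar Kalman filter to smooth measurement noise. The sketch assumes a single sensor, increasing timestamps, missing readings marked `None`, and a random-walk state model; the function names and the noise variances are illustrative choices, not values from the talk.

```python
def interpolate_missing(times, values):
    """Fill None entries by linear interpolation between the nearest known
    readings; at the ends, repeat the nearest known value."""
    known = [(t, v) for t, v in zip(times, values) if v is not None]
    filled = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(v)
            continue
        before = [(tk, vk) for tk, vk in known if tk < t]
        after = [(tk, vk) for tk, vk in known if tk > t]
        if before and after:
            (t0, v0), (t1, v1) = before[-1], after[0]
            filled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        elif before:
            filled.append(before[-1][1])
        elif after:
            filled.append(after[0][1])
        else:
            filled.append(None)  # no readings at all for this sensor
    return filled


def kalman_1d(measurements, process_var, meas_var, x0, p0):
    """Scalar Kalman filter for the random-walk model
    x_t = x_{t-1} + w_t,  z_t = x_t + v_t,
    with w_t ~ N(0, process_var) and v_t ~ N(0, meas_var)."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + process_var      # predict: uncertainty grows
        k = p / (p + meas_var)   # Kalman gain
        x = x + k * (z - x)      # update: move toward the measurement
        p = (1 - k) * p          # updated uncertainty
        estimates.append(x)
    return estimates


filled = interpolate_missing([0, 1, 2, 3], [20.0, None, 22.0, None])
# filled == [20.0, 21.0, 22.0, 22.0]
smoothed = kalman_1d([20.4, 19.7, 20.2, 19.9, 20.1],
                     process_var=0.01, meas_var=0.25, x0=20.0, p0=1.0)
```

The ratio of `process_var` to `meas_var` controls how strongly the filter trusts new measurements over its own prediction.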
Data Management Challenges
- The applications have strong acquisitional aspects
  - Data has to be actively acquired as needed
  - Typically high data acquisition costs (e.g. energy consumption in battery-powered devices)
- Data provenance
  - Being able to trace data back to its origins
- Data exploration and visualization
- Data interoperability
- Data security and privacy
- …
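A minimal sketch of how a model can drive acquisition: keep a running estimate and its variance, and pay for a new reading only when the variance grows past a tolerance. It is written in Python under assumptions not in the slides: a random-walk model with known noise variances, and a caller-supplied `read_sensor(t)` function standing in for a (hypothetical) energy-costly sensor access.

```python
def model_driven_acquisition(read_sensor, steps, process_var, meas_var,
                             max_var, x0, p0):
    """Answer 'current value' at every step, but acquire a reading only when
    the model's uncertainty exceeds max_var.
    Returns (list of (estimate, variance), number of readings taken)."""
    x, p = x0, p0
    readings_taken = 0
    answers = []
    for t in range(steps):
        p += process_var               # uncertainty grows without new data
        if p > max_var:
            z = read_sensor(t)         # costly acquisition (e.g. energy)
            readings_taken += 1
            k = p / (p + meas_var)     # Kalman-style update with the reading
            x += k * (z - x)
            p *= (1 - k)
        answers.append((x, p))
    return answers, readings_taken


answers, taken = model_driven_acquisition(
    read_sensor=lambda t: 20.0, steps=10,
    process_var=0.5, meas_var=0.1, max_var=1.0, x0=20.0, p0=0.0)
```

Every step still gets an answer with a known variance, while the number of acquisitions stays below the number of steps; `max_var` trades answer precision against acquisition cost.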
My Research Interests
- Managing imprecise and incomplete data
  - Support statistical modeling and querying of sensor data in relational databases
    - Clean, declarative abstractions
    - Real-time processing of streaming data
  - Probabilistic databases
    - Store and query data annotated with probabilities
- Energy-efficient algorithms for wireless sensor networks
  - Data acquisition, target monitoring, data compression, …
  - In-network query processing
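In-network query processing can be illustrated with an aggregate computed over a routing tree: instead of shipping every raw reading to the base station, each node merges its children's partial results with its own reading and forwards a single `(sum, count)` pair to its parent. The Python sketch below simulates this for AVG; the tree layout and the function name are hypothetical, and message loss is ignored.

```python
def in_network_avg(children, readings, root):
    """Simulate AVG over a routing tree where every node sends exactly one
    (sum, count) partial aggregate to its parent.
    Returns (average at the root, number of radio messages sent)."""
    messages = 0

    def partial(node):
        nonlocal messages
        total, count = readings[node], 1
        for child in children.get(node, []):
            child_total, child_count = partial(child)
            messages += 1  # one message from child to this node
            total += child_total
            count += child_count
        return total, count

    total, count = partial(root)
    return total / count, messages


# Node 0 is the root; nodes 1 and 2 are its children, node 3 sits below node 1.
tree = {0: [1, 2], 1: [3]}
temps = {0: 20.0, 1: 21.0, 2: 29.0, 3: 22.0}
avg, msgs = in_network_avg(tree, temps, root=0)
```

Each non-root node sends one message regardless of the size of its subtree, whereas forwarding raw readings would cost one message per reading per hop.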
MauveDB
- Written using Apache Derby, an open-source Java DBMS
- Supports an abstraction called model-based views
  - Declarative specification of the models to be applied
  - The output of the models can be queried using SQL
  - Models are kept up to date as new data/measurements arrive
A. Deshpande, S. Madden; MauveDB: Supporting Model-based User Views in Database Systems; SIGMOD 2006
B. Kanagal, A. Deshpande; Online Filtering, Smoothing and Probabilistic Modeling of Streaming data; ICDE 2008
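To make the idea of a model-based view concrete, the Python sketch below fits a per-sensor linear regression `temp = a + b * time` to the raw-data table and materializes the model's predictions on a regular time grid into a table that ordinary SQL can query. It uses Python's built-in `sqlite3` rather than Apache Derby; the table names, the `refresh_regression_view` helper, and the refit-everything refresh strategy are assumptions for illustration, and it does not reproduce MauveDB's actual view syntax or maintenance algorithms.

```python
import sqlite3


def fit_line(points):
    """Least-squares fit of temp = a + b * time; returns (a, b)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:  # a single time point: fall back to a constant model
        return mean_y, 0.0
    cov = sum((t - mean_t) * (y - mean_y) for t, y in points)
    b = cov / var_t
    return mean_y - b * mean_t, b


def refresh_regression_view(conn, grid_times):
    """Refit one regression per sensor from raw_data and rewrite the
    (hypothetical) regression_view table with predictions on grid_times."""
    conn.execute("DELETE FROM regression_view")
    sensor_ids = [row[0] for row in conn.execute("SELECT DISTINCT id FROM raw_data")]
    for sid in sensor_ids:
        points = conn.execute(
            "SELECT time, temp FROM raw_data WHERE id = ?", (sid,)).fetchall()
        a, b = fit_line(points)
        conn.executemany(
            "INSERT INTO regression_view (time, id, temp) VALUES (?, ?, ?)",
            [(t, sid, a + b * t) for t in grid_times])
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_data (time REAL, id INTEGER, temp REAL)")
conn.execute("CREATE TABLE regression_view (time REAL, id INTEGER, temp REAL)")
conn.executemany("INSERT INTO raw_data VALUES (?, ?, ?)",
                 [(10, 1, 20.0), (12, 1, 22.0), (10, 2, 21.0), (14, 2, 25.0)])
refresh_regression_view(conn, grid_times=[10, 11, 12, 13, 14])

# The model's output is queried like any other table, including at times
# (such as 11) for which no raw reading exists.
print(conn.execute(
    "SELECT temp FROM regression_view WHERE id = 1 AND time = 11").fetchone())
# prints (21.0,)
```

Calling `refresh_regression_view` again after new rows arrive in `raw_data` keeps the view consistent with the data, at the cost of a full refit each time.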
MauveDB

Status:
- Support for regression- and interpolation-based views
- Currently building support for views based on dynamic Bayesian networks (Kalman filters, HMMs, etc.)
Ongoing work:
- Query processing and optimization, continuous queries
- APIs for arbitrary models …
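Views based on dynamic Bayesian networks produce a probability distribution per time step rather than a single value. As an illustration, the Python sketch below runs the HMM forward (filtering) recursion over one sensor's observation stream, yielding P(state_t | observations up to t). The two hidden states (`ok`, `faulty`), the discretized observations, and all probabilities are made-up assumptions; the per-step distributions are flattened into hypothetical `(time step, state, probability)` rows such a view might expose.

```python
def hmm_filter(observations, states, start_p, trans_p, emit_p):
    """Forward algorithm with per-step normalization.
    Returns one dict per time step mapping state -> P(state_t | obs_1..t)."""
    beliefs = []
    prior = dict(start_p)
    for obs in observations:
        # Update: weight the prior by how well each state explains obs.
        unnormalized = {s: prior[s] * emit_p[s][obs] for s in states}
        z = sum(unnormalized.values())
        posterior = {s: unnormalized[s] / z for s in states}
        beliefs.append(posterior)
        # Predict: push the posterior through the transition model.
        prior = {s: sum(posterior[r] * trans_p[r][s] for r in states)
                 for s in states}
    return beliefs


states = ["ok", "faulty"]
start_p = {"ok": 0.9, "faulty": 0.1}
trans_p = {"ok": {"ok": 0.95, "faulty": 0.05},
           "faulty": {"ok": 0.2, "faulty": 0.8}}
emit_p = {"ok": {"normal": 0.9, "spike": 0.1},
          "faulty": {"normal": 0.3, "spike": 0.7}}
beliefs = hmm_filter(["normal", "spike", "spike"], states, start_p, trans_p, emit_p)

# Rows of a hypothetical probabilistic view: (time step, state, probability).
view_rows = [(t, s, b[s]) for t, b in enumerate(beliefs) for s in states]
```

Normalizing at every step keeps the numbers in a safe range on long streams and makes each row directly interpretable as a probability.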
Probabilistic Databases
- Motivation: increasing amounts of uncertain data
  - From sensor networks
    - Imprecise data, data with confidence/accuracy bounds
  - Human-observed data
  - Statistical modeling/machine learning
    - Many models provide a distribution over a set of labels (e.g. HMMs)
  - Information extraction from text
  - Social networks
- How to manage and query such data in relational databases?
  - Different types of uncertainties
  - Complex correlation patterns
- Much work in the database community over the last few years
P. Sen, A. Deshpande; Representing and Querying Correlated Tuples in Probabilistic Databases; ICDE 2007
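Under the simplest possible-worlds model, each tuple is present independently with its attached probability, and the probability of a query answer is the total probability of the worlds in which it holds. The Python sketch below computes P(some sensor reports temp > 25) both by brute-force enumeration of worlds and with the closed form 1 - prod(1 - p) that independence allows. The table contents and function names are hypothetical, and the independence assumption is exactly what fails for the correlated tuples addressed in the work cited above.

```python
from itertools import product
from math import prod

# Hypothetical tuple-independent table: (sensor id, temp, probability the tuple is present).
readings = [(1, 20, 0.9), (2, 27, 0.5), (7, 29, 0.4)]


def prob_exists_by_worlds(tuples, predicate):
    """Sum the probabilities of all 2^n possible worlds in which at least
    one present tuple satisfies predicate."""
    total = 0.0
    for present in product([False, True], repeat=len(tuples)):
        world_p = 1.0
        for t, here in zip(tuples, present):
            world_p *= t[-1] if here else 1.0 - t[-1]
        if any(here and predicate(*t[:-1]) for t, here in zip(tuples, present)):
            total += world_p
    return total


def prob_exists(tuples, predicate):
    """Closed form under independence: 1 - prod(1 - p) over matching tuples."""
    return 1.0 - prod(1.0 - t[-1] for t in tuples if predicate(*t[:-1]))


def hot(sensor_id, temp):
    return temp > 25


p_closed = prob_exists(readings, hot)
p_worlds = prob_exists_by_worlds(readings, hot)
```

Enumeration is exponential in the number of tuples, so it only serves to check the closed form here; correlated tuples generally require richer representations and inference instead of a simple product.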
Thanks!

Questions?