Models and Sensor Networks
Sensor Data Management: Challenges and (some) Solutions
Amol Deshpande, University of Maryland
Motivation
Unprecedented, and rapidly increasing, instrumentation of our everyday world
Distributed measurement networks (e.g. GPS)
RFID
Wireless sensor networks
Industrial Monitoring
Sensor Data Processing: Now
Sensor Network
Database
Table raw-data
time   id   temp
10am   1    20
10am   2    21
..     ..   …
10am   7    29
User
1. Extract all readings into a file
2. Run MATLAB/R/other data processing tools
3. Write output to a file/back to the database
4. Write data processing tools to process/aggregate the output (maybe using DB)
5. Decide new data to acquire
Repeat
Sensor Data Processing: What we want
Sensor Network
Database
User
(diagram labels: Data, Tasks)
Models to be applied to data in real-time (at least simple ones)
Table raw-data
time   id   temp
10am   1    20
10am   2    21
..     ..   …
10am   7    29
Continuous (standing) queries, e.g. alert monitoring
Table processed-data
time   id   temp
10am   1    20
10am   2    21
..     ..   …
10am   7    29
Results to continuous queries
Ad hoc queries (possibly against processed, modeled data)
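A minimal sketch of what a continuous (standing) query for alert monitoring could look like, assuming readings arrive as a stream of (time, id, temp) tuples like the rows of Table raw-data. The names `Reading`, `StandingQuery`, `run_standing_queries`, and the 25-degree threshold are hypothetical and chosen only for illustration; they are not taken from any system named in these slides.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class Reading:
    """One row of the raw-data table (hypothetical schema)."""
    time: str
    sensor_id: int
    temp: float


@dataclass
class StandingQuery:
    """A registered query: a name plus a predicate checked on every new reading."""
    name: str
    predicate: Callable[[Reading], bool]


def run_standing_queries(
    stream: Iterable[Reading], queries: List[StandingQuery]
) -> Iterator[Tuple[str, Reading]]:
    """Evaluate every registered query against each reading as it arrives."""
    for reading in stream:
        for query in queries:
            if query.predicate(reading):
                yield (query.name, reading)


# Example rows copied from the slide's table; the threshold is an assumption.
readings = [
    Reading("10am", 1, 20.0),
    Reading("10am", 2, 21.0),
    Reading("10am", 7, 29.0),
]
too_hot = StandingQuery("temp_above_25", lambda r: r.temp > 25.0)
alerts = list(run_standing_queries(readings, [too_hot]))
```

The point of the design is that the query is registered once and the data flows past it, rather than the user re-extracting the table and re-running a script each time.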
Data Management Challenges
Very, very large scale
Spatio-temporal querying essential
Need new indexing techniques, data description formats, techniques for “data ingest” (cleaning the data etc.)
Much work in scientific data management
E.g. SkyServer
Data is typically imprecise, unreliable, or incomplete (data quality)
Measurement noise, failures in sensor/GPS data
High message loss rate in wireless/RFID
Balazinska et al; Data Management in the Worldwide Sensor Web; IEEE Pervasive, 2007.
Data Management Challenges
Data is generated continuously and must be processed in real-time (distributed data streams)
Need different query processing paradigms
Typically very high data rates
Must be able to handle a large number of continuous queries efficiently
Much recent work on “Data Streams”
Research systems: TelegraphCQ [Berkeley], STREAM [Stanford], Aurora [Brown/MIT/Brandeis] etc.
Commercial systems: StreamBase, Truviso, …
Data Management Challenges
Need for real-time statistical modeling of data
Eliminate spatial/temporal biases, handle missing data through extrapolation (e.g. regression, interpolation models)
Filter measurement noise (e.g. Kalman Filters)
Infer hidden variables, pattern recognition (e.g. HMMs)
Fault or anomaly detection
Forecasting/prediction (e.g. ARIMA)
Temperature monitoring: regression/interpolation models
GPS data: Kalman Filters …
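As one illustration of filtering measurement noise, here is a minimal sketch of a scalar Kalman filter, assuming a random-walk state model (the true value drifts slowly) observed directly with Gaussian noise. The function name `kalman_filter_1d`, the parameter names, and the sample readings are all hypothetical; real GPS filtering would use a multi-dimensional state.

```python
from typing import List, Sequence, Tuple


def kalman_filter_1d(
    measurements: Sequence[float],
    process_var: float,
    meas_var: float,
    init_mean: float = 0.0,
    init_var: float = 1.0,
) -> List[Tuple[float, float]]:
    """Return the (mean, variance) estimate after each measurement.

    Assumed model: x_t = x_{t-1} + w_t with w_t ~ N(0, process_var),
    and z_t = x_t + v_t with v_t ~ N(0, meas_var).
    """
    mean, var = init_mean, init_var
    estimates = []
    for z in measurements:
        # Predict: the state is a random walk, so only the uncertainty grows.
        var = var + process_var
        # Update: blend prediction and measurement by their relative certainty.
        gain = var / (var + meas_var)
        mean = mean + gain * (z - mean)
        var = (1.0 - gain) * var
        estimates.append((mean, var))
    return estimates


# Made-up noisy temperature readings, for illustration only.
noisy = [20.3, 19.8, 20.1, 20.4, 19.9]
smoothed = kalman_filter_1d(
    noisy, process_var=0.01, meas_var=0.25, init_mean=noisy[0], init_var=1.0
)
```

Each update is constant-time and needs only the previous estimate, which is what makes this family of models a natural fit for real-time processing of streaming data.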
Data Management Challenges
The applications have strong acquisitional aspects
Data has to be actively acquired as needed
Typically high data acquisition costs (e.g. energy consumption in battery-powered devices)
Data provenance
Being able to trace something back to its origins
Data exploration and visualization
Data interoperability
Data security and privacy
…
My Research Interests
Managing imprecise and incomplete data
Support statistical modeling and querying of sensor data in relational databases
Clean, declarative abstractions
Real-time processing of streaming data
Probabilistic databases
Store and query data annotated with probabilities
Energy-efficient algorithms for wireless sensornets
Data acquisition, target monitoring, data compression ..
In-network query processing
MauveDB
Built on Apache Derby, an open-source Java DBMS
Supports an abstraction called model-based views
Declarative specification of models to be applied
Can query the output of the models using SQL
Models kept updated as new data/measurements arrive
A. Deshpande, S. Madden; MauveDB: Supporting Model-based User Views in Database Systems; SIGMOD 2006
B. Kanagal, A. Deshpande; Online Filtering, Smoothing and Probabilistic Modeling of Streaming data; ICDE 2008
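To make the idea of a model-based view concrete, here is a rough sketch of a regression-based view, assuming a one-dimensional sensor position x and a straight-line model temp = a + b·x. This is not MauveDB's syntax or implementation: the class `RegressionView`, its methods, and the helper `fit_line` are hypothetical names used only to illustrate "insert raw readings, query the model's output on a regular grid".

```python
from typing import Iterable, List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need readings at two or more distinct positions")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


class RegressionView:
    """Hypothetical view: raw readings go in, model predictions on a grid come out."""

    def __init__(self, grid: Iterable[float]) -> None:
        self.grid = list(grid)
        self.xs: List[float] = []
        self.ys: List[float] = []

    def insert(self, x: float, temp: float) -> None:
        """Record a new raw measurement; the model is refit on the next query."""
        self.xs.append(x)
        self.ys.append(temp)

    def rows(self) -> List[Tuple[float, float]]:
        """Return (position, predicted temp) rows, as a query on the view would see them."""
        intercept, slope = fit_line(self.xs, self.ys)
        return [(g, intercept + slope * g) for g in self.grid]


view = RegressionView(grid=[0.0, 1.0, 2.0, 3.0])
view.insert(0.0, 20.0)
view.insert(2.0, 22.0)
predicted = view.rows()  # includes grid points with no sensor at all
```

Refitting on every query is the simplest way to keep the view current; an actual system would more likely maintain the model incrementally as measurements arrive.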
Status:
Support for Regression- and Interpolation-based views
Currently building support for views based on Dynamic Bayesian networks (Kalman Filters, HMMs etc.)
Ongoing work:
Query processing and optimization, continuous queries
APIs for arbitrary models …
Probabilistic Databases
Motivation: Increasing amounts of uncertain data
From sensor networks
Imprecise data, data with confidence/accuracy bounds
Human-observed data
Statistical modeling/machine learning
Many models provide a distribution over a set of labels (e.g. HMMs)
Information extraction from text
Social networks
How to manage and query such data in relational databases?
Different types of uncertainties
Complex correlation patterns
Much work in database community over last few years
P. Sen, A. Deshpande; Representing and Querying Correlated Tuples in Probabilistic Databases; ICDE 2007
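A small sketch of storing and querying tuples annotated with probabilities, under the strong simplifying assumption that tuples are independent of one another. That assumption is exactly what the "complex correlation patterns" bullet says does not hold in general, so this illustrates only the simplest case, not the approach of the cited paper. The names `ProbTuple`, `prob_exists`, and `expected_count`, and the sample data, are hypothetical.

```python
from dataclasses import dataclass
from math import prod
from typing import Callable, Iterable


@dataclass(frozen=True)
class ProbTuple:
    """A tuple annotated with the probability that it is actually true."""
    sensor_id: int
    label: str
    prob: float


def prob_exists(
    tuples: Iterable[ProbTuple], predicate: Callable[[ProbTuple], bool]
) -> float:
    """P(at least one matching tuple is true), ASSUMING tuples are independent."""
    return 1.0 - prod(1.0 - t.prob for t in tuples if predicate(t))


def expected_count(
    tuples: Iterable[ProbTuple], predicate: Callable[[ProbTuple], bool]
) -> float:
    """Expected number of true matching tuples (linearity of expectation)."""
    return sum(t.prob for t in tuples if predicate(t))


# Made-up data: labels a classifier might assign to three sensors.
table = [
    ProbTuple(1, "hot", 0.5),
    ProbTuple(2, "hot", 0.5),
    ProbTuple(3, "cold", 0.9),
]
is_hot = lambda t: t.label == "hot"
p_any_hot = prob_exists(table, is_hot)
n_hot = expected_count(table, is_hot)
```

The expected count stays valid even when tuples are correlated, while the existence probability does not, which is one way to see why correlations need explicit representation.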
Thanks!
Questions?