Models and Sensor Networks


Probabilistic Databases
Amol Deshpande, University of Maryland
Overview

V.S. Subrahmanian: ProbView, PXML, Temporal Probabilistic Databases, Probabilistic Aggregates

Lise Getoor: Statistical Relational Learning, Probabilistic Relational Models, Entity Resolution

Amol: MauveDB: Statistical Modeling in Databases, Correlated tuples in probabilistic databases
Overview of Today’s Presentation

Model-based Views/MauveDB [Amol]

Goal: Making it easy to continuously apply statistical models to
streaming data

Current focus on designing declarative interfaces, and on efficient
maintenance algorithms

Less on the “probabilistic databases” issues

Statistical Relational Learning [Lise]

Representing arbitrarily correlated data and processing
queries over it [Prithviraj]
Motivation

Unprecedented, and rapidly increasing, instrumentation of our every-day world

Huge data volumes generated continuously that must be processed in real-time

Typically imprecise, unreliable and incomplete data

Measurement noises, low success rates, failures etc…

[Figure labels: Wireless sensor networks; Distributed measurement networks (e.g. GPS); RFID; Industrial Monitoring]
Data Processing Step 1

Process data using a statistical/probabilistic model

Regression and interpolation models

To eliminate spatial or temporal biases, handle missing data, prediction

Filtering techniques (e.g. Kalman Filters), Bayesian Networks

To eliminate measurement noise, to infer hidden variables etc

[Figure labels: Temperature monitoring: Regression/interpolation models; GPS Data: Kalman Filters etc]
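As a rough illustration of the filtering step, here is a minimal sketch of a scalar Kalman filter that removes measurement noise and carries an estimate across a missing reading. It assumes a random-walk state model; the function name `kalman_1d`, the variance values, and the sample readings are all hypothetical and are not taken from the talk.

```python
def kalman_1d(readings, process_var=0.01, measurement_var=1.0,
              init_mean=0.0, init_var=1000.0):
    """Filter scalar readings under an assumed random-walk state model.

    Returns a list of (posterior_mean, posterior_variance) pairs.
    A reading of None marks missing data: only the predict step runs.
    """
    mean, var = init_mean, init_var
    estimates = []
    for z in readings:
        # Predict: the state drifts as a random walk, so uncertainty grows.
        var += process_var
        if z is not None:
            # Update: weight prediction and measurement by their variances.
            gain = var / (var + measurement_var)
            mean += gain * (z - mean)
            var *= 1.0 - gain
        estimates.append((mean, var))
    return estimates


# Illustrative temperature readings with one gap (made-up values).
noisy_temps = [20.3, 19.6, None, 20.9, 20.1, 19.8]
for mean, var in kalman_1d(noisy_temps):
    print(f"{mean:.2f} +/- {var ** 0.5:.2f}")
```

At the gap the estimate is carried forward unchanged while its variance grows, which is how a filter of this kind can "handle missing data" without inventing a reading.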
A Motivating Example

Inferring “transportation mode”/“activities” [Henry Kautz et al]

Using easily obtainable sensor data, e.g. GPS, RFID proximity data

Can do much if we can infer these automatically

[Figure labels: home, office]
Have access to noisy “GPS” data
Infer the transportation mode: walking, running, in a car, in a bus
Preferred end result:
Clean path annotated with transportation mode
Dynamic Bayesian Network
Use a “generative model” for describing how the observations were generated
Time = t
Mt: Transportation Mode (Walking, Running, Car, Bus)
Xt: True velocity and location
Ot: Observed location
Need conditional probability distributions, e.g. a distribution on (velocity, location) given the transportation mode
Prior knowledge or learned from data
Dynamic Bayesian Network
Given a sequence of observations (Ot), find the most likely Mt’s that explain it.
Or could provide a probability distribution on the possible Mt’s.
Time = t, Time = t+1
Mt, Mt+1: Transportation Mode (Walking, Running, Car, Bus)
Xt, Xt+1: True velocity and location
Ot, Ot+1: Observed location
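The two inference tasks named on this slide can be sketched on a simplified version of the model. The sketch below assumes the hidden layer Xt is collapsed away, so that each mode Mt directly emits a discretized speed bucket derived from consecutive GPS fixes; this is a plain hidden Markov model, not the full DBN from the talk. Every probability, the `STAY` parameter, and the function names `viterbi` and `filter_modes` are hypothetical.

```python
MODES = ["walking", "running", "car", "bus"]

# Hypothetical parameters, chosen only for illustration.
PRIOR = {"walking": 0.4, "running": 0.1, "car": 0.3, "bus": 0.2}
STAY = 0.9  # assumed probability of keeping the same mode between steps
TRANS = {a: {b: STAY if a == b else (1 - STAY) / (len(MODES) - 1)
             for b in MODES} for a in MODES}
EMIT = {  # assumed P(observed speed bucket | mode)
    "walking": {"slow": 0.85, "medium": 0.14, "fast": 0.01},
    "running": {"slow": 0.20, "medium": 0.75, "fast": 0.05},
    "car":     {"slow": 0.10, "medium": 0.20, "fast": 0.70},
    "bus":     {"slow": 0.25, "medium": 0.35, "fast": 0.40},
}


def viterbi(observations):
    """Most likely sequence of modes given the observations."""
    score = {m: PRIOR[m] * EMIT[m][observations[0]] for m in MODES}
    back = []
    for obs in observations[1:]:
        prev, score, pointers = score, {}, {}
        for m in MODES:
            # Best predecessor mode for ending in mode m at this step.
            best = max(MODES, key=lambda p: prev[p] * TRANS[p][m])
            score[m] = prev[best] * TRANS[best][m] * EMIT[m][obs]
            pointers[m] = best
        back.append(pointers)
    path = [max(MODES, key=lambda m: score[m])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def filter_modes(observations):
    """Forward algorithm: P(M_t | O_1..O_t) for each time step t."""
    belief = dict(PRIOR)
    beliefs = []
    for t, obs in enumerate(observations):
        if t > 0:
            # Predict: push the belief through the transition model.
            belief = {m: sum(belief[p] * TRANS[p][m] for p in MODES)
                      for m in MODES}
        # Update: weight by the observation likelihood, then normalize.
        belief = {m: belief[m] * EMIT[m][obs] for m in MODES}
        total = sum(belief.values())
        belief = {m: v / total for m, v in belief.items()}
        beliefs.append(belief)
    return beliefs


speeds = ["slow", "slow", "fast", "fast", "fast"]
print(viterbi(speeds))
print(filter_modes(speeds)[-1])
```

`viterbi` corresponds to "find the most likely Mt's", while `filter_modes` corresponds to "provide a probability distribution on the possible Mt's"; with continuous Xt a Kalman or particle filter would typically replace the discrete sums.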
Statistical Modeling of Sensor Data


No support in database systems --> Database ends up being used as a backing store

With much replication of functionality

Very inefficient, not declarative…

How can we push statistical modeling inside a database system?
Abstraction: Model-based Views

An abstraction analogous to traditional database views

Present the output of applying a model as a database view

That the user can query as with normal database views
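To make the abstraction concrete, here is a minimal sketch of an interpolation-based view over raw sensor readings. The class `InterpolationView` and its methods are hypothetical names invented for illustration and are not MauveDB's interface; the sketch assumes linear interpolation between neighboring readings and computes values only when queried.

```python
from bisect import bisect_left


class InterpolationView:
    """Read-only view over raw (time, value) readings.

    Hypothetical class for illustration only; not MauveDB's interface.
    """

    def __init__(self, readings):
        # Keep the raw table sorted by time; nothing is precomputed.
        self._readings = sorted(readings)

    def value_at(self, t):
        """Interpolated value at time t, or None outside the data range."""
        times = [r[0] for r in self._readings]
        i = bisect_left(times, t)
        if i < len(times) and times[i] == t:
            return self._readings[i][1]  # exact reading exists
        if i == 0 or i == len(times):
            return None  # no extrapolation in this sketch
        (t0, v0), (t1, v1) = self._readings[i - 1], self._readings[i]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

    def scan(self, start, stop, step):
        """Yield (time, value) rows on a regular grid, as a query would see them."""
        t = start
        while t <= stop:
            yield t, self.value_at(t)
            t += step


# Irregularly timed raw readings (made-up values).
raw = [(0, 20.0), (10, 22.0), (25, 19.0)]
view = InterpolationView(raw)
for row in view.scan(0, 20, 5):
    print(row)
```

The caller sees regularly spaced rows, just as with an ordinary view, while the raw table stays irregular; the model is applied behind the view boundary.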
Example DBN View

Original noisy GPS data:

User    Time      Location
John    5pm       (x1, y1)
John    5:05pm    (x2, y2)

Application of the model/inference is pushed inside the database

User view of the data (smoothed locations, inferred variables):

User    Time      Location      Mode       prob
John    5pm       (x’1, y’1)    Walking    0.9
John    5pm       (x’1, y’1)    Car        0.1
John    5:05pm    (x’2, y’2)    Walking    0
John    5:05pm    (x’2, y’2)    Car        1

e.g.
select count(*)
group by mode
sliding window 5 minutes

Opens up many optimization opportunities, e.g. can do inference lazily when queried etc
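The slide does not say what `count(*)` should return over probabilistic rows. One possible reading, sketched below, is the expected count per mode, obtained by summing the `prob` column within each group. The function name `expected_count_by_mode` is hypothetical, and the sliding-window clause is left out of this sketch.

```python
from collections import defaultdict

# Rows of the example view: (user, time, mode, prob).
VIEW = [
    ("John", "5pm", "Walking", 0.9),
    ("John", "5pm", "Car", 0.1),
    ("John", "5:05pm", "Walking", 0.0),
    ("John", "5:05pm", "Car", 1.0),
]


def expected_count_by_mode(rows):
    """Expected number of tuples per mode (one assumed semantics of count)."""
    totals = defaultdict(float)
    for _user, _time, mode, prob in rows:
        totals[mode] += prob
    return dict(totals)


print(expected_count_by_mode(VIEW))
```

Because expectation is linear, this sum is correct even when the tuples are correlated; other semantics, such as the full distribution over counts, do depend on the correlations.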
Correlations

User    Time      Location      Mode       prob
John    5pm       (x’1, y’1)    Walking    0.9
John    5pm       (x’1, y’1)    Car        0.1
John    5:05pm    (x’2, y’2)    Walking    0
John    5:05pm    (x’2, y’2)    Car        1

Strong and complex correlations across tuples
- Mutual exclusivity
- Temporal correlations
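A small possible-worlds enumeration shows why mutual exclusivity matters. The sketch below compares two assumed semantics for the table above: treating every tuple as independent, and treating the alternatives at each time step as mutually exclusive. For simplicity it still treats different time steps as independent, which ignores the temporal correlations the slide mentions. All function names are hypothetical.

```python
from itertools import product

# Mode alternatives per time step, taken from the example table.
BLOCKS = {
    "5pm": {"Walking": 0.9, "Car": 0.1},
    "5:05pm": {"Walking": 0.0, "Car": 1.0},
}


def worlds_exclusive(blocks):
    """Yield (world, prob): exactly one mode is chosen per time step."""
    times = list(blocks)
    for choice in product(*(blocks[t].items() for t in times)):
        prob, world = 1.0, set()
        for t, (mode, p) in zip(times, choice):
            prob *= p
            world.add((t, mode))
        yield world, prob


def worlds_independent(blocks):
    """Yield (world, prob): each tuple is present or absent independently."""
    tuples = [((t, m), p) for t, modes in blocks.items()
              for m, p in modes.items()]
    for mask in product([True, False], repeat=len(tuples)):
        prob, world = 1.0, set()
        for present, (key, p) in zip(mask, tuples):
            prob *= p if present else 1.0 - p
            if present:
                world.add(key)
        yield world, prob


def prob_of(worlds, predicate):
    """Total probability of the worlds that satisfy the predicate."""
    return sum(p for world, p in worlds if predicate(world))


def both_at_5pm(world):
    return ("5pm", "Walking") in world and ("5pm", "Car") in world


print(prob_of(worlds_independent(BLOCKS), both_at_5pm))
print(prob_of(worlds_exclusive(BLOCKS), both_at_5pm))
```

Under tuple independence the impossible event "walking and in a car at 5pm" receives probability 0.9 × 0.1, whereas the mutually exclusive model assigns it zero, so query answers can differ depending on which correlations the system represents.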
MauveDB: Status

Written in the Apache Derby Java open source
database system

Support for Regression- and Interpolation-based views


Neither produces probabilistic data

SIGMOD 2006 (w/ Sam Madden)
Currently building support for views based on Dynamic
Bayesian networks [Bhargav]

Kalman Filters, HMMs etc

Initial focus on the user interfaces and efficient inference

Will generate probabilistic data; may not be able to do
anything too sophisticated with it
Research Challenges/Future Work


Generalizing to arbitrary models?

Develop APIs for adding arbitrary models

Try to minimize the work of the model developer
Probabilistic databases

Uncertain data with complex correlation patterns

Query processing, query optimization

View maintenance in the presence of high-rate measurement streams
Thanks !!
Mauve == Model-based User Views