Optimizing Query Processing In Sensor Networks

Download Report

Transcript Optimizing Query Processing In Sensor Networks

Optimizing Query Processing
In Sensor Networks
Ross Rosemark
Our Research
• We argue that the sensor network can be
viewed as a database where each node is
a table.
– Under this view.. we argue that a query
plan can now dictate how to abstract data
from the sensor network rather than a
sensor node.
Our Goal
• Given a query Q2
– Define how to process the query
• What metadata (if any) should be collected
• What query plan should a node utilize to abstract data from
it’s local sensors
• What is the routing infrastructure of the query
• Example
– Given Q2
• Define cost of
– Collecting metadata + Execution Cost + Routing cost
• Define cost of
– Not collecting metadata + execution cost + routing cost
• Choose the lowest cost
Idea
• Evaluate multiple different
infrastructures for a query
• Choose the infrastructure
that utilizes the least
energy
• The * operator means
aggregation
– Not database aggregation
(i.e. Sum, Count) but rather
aggregation that is
discussed in networks
Research Issue
• We use metadata to evaluate different
query plans
– Metadata becomes an important research
issue
• Which nodes should send metadata to the AP
• What metadata does the AP require
• We do an on demand approach in terms of
collecting metadata
Metadata Collection
• Algorithm to collect metadata
– Only nodes participating in query send metadata
• Algorithm
– Access Point routes query
to spatial area utilizing
GPSR
– Utilizing GPSR query is
routed around spatial area
– Each node on perimeter of
spatial area floods msg
inside spatial area
– Each node in spatial area
sends metadata to the AP
utilizing GPSR
Metadata
For a given query Q1
•
Initially the access point knows:
–
The number of nodes in the network (N)
–
The spatial area of the network (SA)
–
The query area (QA)…. (we only consider spatial queries)
–
A histogram that represents the selectivity of each attribute
•
–
Bad representation
Using this information
•
Query Plan 1
–
–
–
•
Query Plan 2
–
–
–
Estimate query execution cost if metadata is not collected
Estimate result collection cost if metadata is not collected
If (Query Plan 1 > Query Plan 2)
•
–
Estimate metadata collection cost
Estimate query execution cost if metadata is collected
Estimate result collection cost
Choose Query Plan 1
Else
•
Choose Query Plan 2
Metadata Collection
• When metadata is collected
– nodes participating in a query send
• the selectivity of each of the queries predicates
• It’s longitude and latitude
• Example
– Query 1-> Select * From Sensors Where Light > 10
– Node participating in query send the selectivity of the predicate
Light > 10
– Node participating in query send Longitude/Latitude (i.e.
Longitude = 1002.3 Latitude = 2003.1
Metadata
• If query 2 now comes in and covers the
same/subset of the spatial area of query 1
then we evaluate the following:
– Should we collect more metadata, or just
optimize with our current metadata
• Estimate the metadata collection cost
• Estimate query execution cost
• Estimate routing cost
• This is a repeat of the initial problem
– Our estimates are now better though
Results
• Estimated mathematically the energy associated with
– Metadata collection
– Query Execution
• Ran simulations to get real values for these metrics
• In simulations inserted 1 spatial query into the network
• Ran this experiment varying
– The query (spatial area) (predicates)
– Topology (5 different topologies)
– Metadata (5 different metadata distributions)
Query Execution
.
Query Execution
2.5
2
1.5
Actual
Estimate
1
0.5
0
Collected Metadata
Did not Collect Metadata
Metadata Collection
Metadata Collection
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
Actual
Estimate
Total
2.5
2
1.5
Actual
Estimate
1
0.5
0
Metadata
Collection
No Metadata
Collection
Questions?