Slide transcript

HiFi: Network-centric Query
Processing in the Physical World
Mike Franklin
UC Berkeley
SAP Research Forum
February 2005
Introduction
• Receptors everywhere!
  • Wireless sensor networks, RFID technologies, digital homes, network monitors, ...
• Large-scale deployments will emerge as High Fan-In Systems
Mike Franklin
UC Berkeley EECS
High Fan-in Systems
The “Bowtie”
• Large numbers of receptors = large data volumes
• Hierarchical, successive aggregation
High Fan-in Example (SCM)
[Diagram: a High Fan-In hierarchy for supply-chain management — RFID receptors at dock doors and shelves feed warehouses and stores, which feed regional centers, which feed headquarters.]
Properties
• High Fan-In, globally-distributed architecture.
• Large data volumes generated at the edges.
• Filtering and cleaning must be done at the edges.
• Successive aggregation as data moves inwards.
• Summaries/anomalies delivered continually; details later.
• Strong temporal focus.
• Strong spatial/geographic focus.
• Streaming data and stored data.
• Integration within and across enterprises.
Design Space: Time
[Diagram: processing stages along a time-scale axis, from seconds to years — Filtering/Cleaning/Alerts (seconds; on-the-fly processing), Monitoring/Time-series/Data mining (recent history; stream/disk processing), and Archiving with provenance and schema evolution (years; disk-based processing).]
Design Space: Geography
[Diagram: the same stages along a geographic-scope axis, from local to global — Filtering/Cleaning/Alerts (local; several readers), Monitoring/Time-series/Data mining (regional centers), and Archiving with provenance and schema evolution (global; central office).]
Design Space: Resources
[Diagram: the same stages along an individual-resources axis, from tiny to huge — Filtering/Cleaning/Alerts (tiny devices), Monitoring/Time-series/Data mining (Stargates/desktops), and Archiving with provenance and schema evolution (huge clusters/grids).]
Design Space: Data
[Diagram: the same stages along axes of degree of detail and aggregate data volume — Filtering/Cleaning/Alerts (duplicate elimination; history: hours), Monitoring/Time-series/Data mining (interesting events; history: days), and Archiving (trends/archive; history: years).]
State of the Art
• Current approaches: hand-coded, script-based
  • Expensive, one-off, brittle, hard to deploy and keep running
• Piecemeal/stovepipe systems
  • Each type of receptor (RFID, sensors, etc.) handled separately
• Standards efforts are not addressing this:
  • Protocol-design bent
  • Different "data models" at each level
  • Reinventing "query languages" at each level
⇒ No end-to-end, integrated middleware for managing distributed receptor data
HiFi
• A data management infrastructure for high fan-in environments
• Uniform declarative framework
  • Every node is a data stream processor that speaks SQL-ese
  ⇒ Stream-oriented queries at all levels
• Hierarchical, stream-based views as an organizing principle
Why Declarative?
(database dogma)
• Independence: data, location, platform
  • Allows the system to adapt over time
• Many optimization opportunities
  • In a complex system, automatic optimization is key.
  • Also enables optimization across multiple applications.
• Simplifies programming
• ???
Building HiFi
Integrating RFID & Sensors
(the “loudmouth” query)
A Tale of Two Systems
• TinyDB
  • Declarative query processing for wireless sensor networks
  • In-network aggregation
  • Released as part of the TinyOS open-source distribution
• TelegraphCQ
  • Data stream processor
  • Continuous, adaptive query processing with aggressive sharing
  • Built by modifying PostgreSQL
  • Open-source "beta" release out now; new release soon
TinyDB
• The Network is the Database:
  • Basic idea: treat the sensor net as a "virtual table".
  • The system hides the details/complexities of devices, changing topologies, failures, ...
  • The system is responsible for efficient execution.
• Developed on TinyOS/Motes
http://telegraph.cs.berkeley.edu/tinydb

SELECT MAX(mag)
FROM sensors
WHERE mag > thresh
SAMPLE PERIOD 64ms

[Diagram: an application sends queries and triggers to TinyDB, which returns data from the sensor network.]
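The query on this slide can be read as a filter-then-aggregate over the sensor "virtual table", evaluated once per sample period. A minimal sketch of that semantics (simulated readings only; real TinyDB evaluates this in-network on motes, and the function name here is hypothetical):

```python
# Stand-in for the TinyDB-style query:
#   SELECT MAX(mag) FROM sensors WHERE mag > thresh SAMPLE PERIOD 64ms

def max_over_threshold(readings, thresh):
    """One sample period: keep magnitudes above thresh, return the max (or None)."""
    qualifying = [r for r in readings if r > thresh]
    return max(qualifying) if qualifying else None

# One simulated 64 ms sample period across four sensors:
print(max_over_threshold([0.2, 0.9, 0.5, 1.3], thresh=0.8))  # -> 1.3
print(max_over_threshold([0.1, 0.3], thresh=0.8))            # -> None
```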
TelegraphCQ:
Data Stream Monitoring
• Streaming data
  • Network monitors
  • Sensor networks, RFID
  • News feeds, stock tickers, ...
• B2B and enterprise apps
  • Trade reconciliation, order processing, etc.
  • (Quasi) real-time flow of events and data
  • Manage these flows to drive business processes.
  • Mine flows to create and adjust business rules.
  • "Tap into" flows for on-line analysis.
http://telegraph.cs.berkeley.edu
Data Stream Processing
[Diagram: a traditional database stores data and evaluates incoming queries against it; a data stream processor holds continuous queries and evaluates incoming data against them, emitting result tuples.]
• Data streams are unending
• Continuous, long-running queries
• Real-time processing
Windowed Queries
A typical streaming query:

SELECT S.city, AVG(temp)
FROM SOME_STREAM S
  [range by ‘5 seconds’ slide by ‘5 seconds’]
WHERE S.state = ‘California’
GROUP BY S.city

The bracketed window clause controls the results: the range says "I want to look at 5 seconds' worth of data," and the slide says "I want a result tuple every 5 seconds."
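A minimal sketch of the window semantics above (not TelegraphCQ itself): tumbling 5-second windows, where range equals slide, so each window produces one averaged result. The function name and tuple layout are assumptions for illustration.

```python
def windowed_avg(stream, range_s=5):
    """stream: iterable of (timestamp_seconds, value) in time order.
    Yields (window_start, average) for each non-empty window."""
    window_start, values = None, []
    for ts, val in stream:
        if window_start is None:
            window_start = (ts // range_s) * range_s
        # Emit finished windows before consuming a tuple past the boundary.
        while ts >= window_start + range_s:
            if values:
                yield (window_start, sum(values) / len(values))
            values = []
            window_start += range_s
        values.append(val)
    if values:
        yield (window_start, sum(values) / len(values))

readings = [(0, 10.0), (2, 14.0), (6, 20.0), (7, 22.0)]
print(list(windowed_avg(readings)))  # -> [(0, 12.0), (5, 21.0)]
```

A sliding window with range larger than slide would instead keep overlapping tuples across emissions; the tumbling case is shown because it matches the query on the slide.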
TelegraphCQ Architecture
[Diagram: TelegraphCQ architecture. A Front End (parser, planner, listener, mini-executor, catalog, proxy) hands query plans over shared-memory queues to a Back End, which runs a CQEddy with modules, split operators, and scans under eddy-control queues. Results flow back through shared-memory query-result queues. A Wrapper ClearingHouse hosts wrappers for data ingest, and a shared-memory buffer pool fronts the disk.]
The HiFi System
[Diagram: the HiFi prototype stack — TelegraphCQ on a PC at the top; TinyDB and RFID wrappers on Stargates in the middle; sensor networks and RFID readers at the bottom.]
Basic HiFi Architecture
• Hierarchical federation of nodes
• Each node:
  • Data Stream Query Processor (DSQP)
  • HiFi Glue
    • DSQP management
    • Query planning
    • Archiving
    • Internode coordination and communication
• Views drive system functionality
• Metadata Repository (MDR)
[Diagram: a hierarchy of nodes, each pairing a DSQP with a HiFi Glue layer; the root node also holds the MDR.]
HiFi Processing Pipelines
The CSAVA Framework
[Diagram: the CSAVA processing pipeline, bottom to top — Clean (single tuple), Smooth (window), Arbitrate (multiple receptors), Validate (join with stored data), Analyze (on-line data mining) — a generalization of receptor-data processing.]
CSAVA Processing
Clean
CREATE VIEW cleaned_rfid_stream AS
(SELECT receptor_id, tag_id
FROM rfid_stream rs
WHERE read_strength >= strength_T)
CSAVA: Processing
Smooth
CREATE VIEW smoothed_rfid_stream AS
(SELECT receptor_id, tag_id
FROM cleaned_rfid_stream
[range by ’5 sec’,
slide by ’5 sec’]
GROUP BY receptor_id, tag_id
HAVING count(*) >= count_T)
CSAVA: Processing
Arbitrate
CREATE VIEW arbitrated_rfid_stream AS
(SELECT receptor_id, tag_id
FROM smoothed_rfid_stream rs
[range by ’5 sec’,
slide by ’5 sec’]
GROUP BY receptor_id, tag_id
HAVING count(*) >= ALL
(SELECT count(*)
FROM smoothed_rfid_stream
[range by ’5 sec’,
slide by ’5 sec’]
WHERE tag_id = rs.tag_id
GROUP BY receptor_id))
CSAVA: Processing
Validate
CREATE VIEW validated_tags AS
(SELECT tl.tag_name
FROM arbitrated_rfid_stream rs
[range by ’5 sec’,
slide by ’5 sec’],
known_tag_list tl
WHERE tl.tag_id = rs.tag_id)
CSAVA: Processing
Analyze
CREATE VIEW tag_count AS
(SELECT tag_name, count(*)
FROM validated_tags vt
[range by ‘5 min’,
slide by ‘1 min’]
GROUP BY tag_name)
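The first three CSAVA stages can be sketched end-to-end on synthetic readings. This is a toy illustration only: the thresholds `strength_T` and `count_T`, the data values, and the function names are assumptions, and HiFi itself expresses these stages as the windowed views shown above.

```python
from collections import Counter

# (receptor_id, tag_id, read_strength) tuples from one 5-second window.
window = [
    ("shelf1", "tagA", 0.9), ("shelf1", "tagA", 0.8), ("shelf1", "tagA", 0.7),
    ("shelf2", "tagA", 0.6),                 # weaker echo on the far shelf
    ("shelf1", "tagB", 0.2),                 # noise: below strength_T
    ("shelf2", "tagB", 0.9), ("shelf2", "tagB", 0.85),
]

strength_T, count_T = 0.5, 2

# Clean: per-tuple filter on read strength.
cleaned = [(r, t) for (r, t, s) in window if s >= strength_T]

# Smooth: keep (receptor, tag) pairs read at least count_T times in the window.
counts = Counter(cleaned)
smoothed = {pair for pair, c in counts.items() if c >= count_T}

# Arbitrate: assign each tag to the receptor that read it most often.
def arbitrate(counts, smoothed):
    best = {}
    for (receptor, tag), c in counts.items():
        if (receptor, tag) in smoothed and (tag not in best or c > best[tag][1]):
            best[tag] = (receptor, c)
    return {tag: receptor for tag, (receptor, _) in best.items()}

print(arbitrate(counts, smoothed))  # -> {'tagA': 'shelf1', 'tagB': 'shelf2'}
```

Note how each stage consumes only the output of the previous one, mirroring the view-over-view composition of the CSAVA pipeline.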
Ongoing Work
• Bridging the physical-digital divide
  • VICE – a "Virtual Device" interface
• Hierarchical query processing
  • Automatic query planning & dissemination
• Complex event processing
  • Unifying event and data processing
Virtual Device (VICE) Layer
"Metaphysical* Data Independence"

*The branch of philosophy that deals with the ultimate nature of reality and existence. (Name due to Shawn Jeffery.)

[Diagram: applications see clean virtual devices; the VICE layer sits between them and the raw RFID readers.]
The Virtues of VICE
• A simple RFID experiment:
  • 2 adjacent shelves, 8 ft each
  • 10 EPC-tagged items on each, plus 5 moved between them
  • RFID antenna on each shelf
Ground Truth
Raw RFID Readings
After VICE Processing
Under the covers (in this case):
Cleaning, Smoothing, and Arbitration
Other VICE Uses
• Once you have the right abstractions:
  • "Soft sensors"
  • Quality and lineage streams
  • Pushdown of external validation information
  • Power management and other optimizations
  • Data archiving
  • Model-based sensing
  • "Non-declarative" code
  • ...
Hierarchical Query Processing
• Continuous and streaming
• Hierarchical
  • Temporal granularity vs. geographic scope
  • Sharing of lower-level streams
• Automatic placement and optimization
[Diagram: a hierarchy of nodes — "I provide raw readings for Soda Hall" → "I provide avg daily values for Berkeley" → "I provide avg weekly values for California" → "I provide national monthly values for the US".]
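The hierarchy in the diagram can be sketched as each level re-aggregating its children's streams at a coarser granularity. The function and variable names here are hypothetical, and a single averaging step stands in for the full windowed query at each node:

```python
def node_avg(child_streams):
    """One node's aggregation step: average the values from all child streams."""
    values = [v for stream in child_streams for v in stream]
    return sum(values) / len(values)

# Raw building readings roll up to a city node, city nodes roll up to a state node.
soda_hall = [20.0, 22.0]
cory_hall = [24.0, 26.0]
berkeley_daily = node_avg([soda_hall, cory_hall])
oakland_daily = 21.0
california_weekly = node_avg([[berkeley_daily], [oakland_daily]])
print(berkeley_daily, california_weekly)  # -> 23.0 22.0
```

Because each level exposes its output as a stream, a sibling query (say, for a different application) can share the lower-level streams rather than re-reading the receptors.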
Complex Event Processing
• Needed for monitoring and actuation
• Key to prioritization (e.g., of detail data)
• Exploit duality of data and events
• Shared processing
• "Semantic windows"
• Challenge: a single system that simultaneously handles events spanning seconds to years.
Next Steps
• Archiving and detail data
  • Dealing with transient overloads
  • Rate matching between stored and streaming data
  • Scheduling large archive transfers
• System design & deployment
  • Tools for provisioning and evaluating receptor networks
• System monitoring & management
  • Leverage the monitoring infrastructure for introspection
Conclusions
• Receptors everywhere ⇒ High Fan-In Systems
• Current middleware solutions are complex & brittle
• A uniform declarative framework is the key
• The HiFi project is exploring this approach
• Our initial prototype:
  • Leveraged TelegraphCQ and TinyDB
  • Demonstrated RFID/multiple-sensor integration
  • Validated the HiFi approach
• We have an ambitious ongoing research agenda
• See http://hifi.cs.berkeley.edu for more info.
Acknowledgements
• Team HiFi: Shawn Jeffery, Sailesh Krishnamurthy, Frederick Reiss, Shariq Rizvi, Eugene Wu, Nathan Burkhart, Owen Cooper, Anil Edakkunni
• Experts in VICE: Gustavo Alonso, Wei Hong, Jennifer Widom
• Funding and/or reduced-price gizmos from NSF, Intel, the UC MICRO program, and Alien Technologies