I 1 - RPI ECSE - Rensselaer Polytechnic Institute
Download
Report
Transcript I 1 - RPI ECSE - Rensselaer Polytechnic Institute
Peer-to-Peer (P2P) and Sensor
Networks
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
[email protected]
http://www.ecse.rpi.edu/Homepages/shivkuma
Based in part upon slides of Don Towsley, Ion Stoica, Scott Shenker, Joe Hellerstein, Jim Kurose, Hung-Chang Hsiao, Chung-Ta King
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
1
Overview
P2P networks: Napster, Gnutella, Kazaa
Distributed Hash Tables (DHTs)
Database perspectives: data-centricity, dataindependence
Sensor networks and its connection to P2P
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
2
P2P: Key Idea
Share the content, storage and bandwidth of
individual (home) users
Internet
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
3
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
4
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
5
What is P2P (Peer-to-Peer)?
P2P as a mindset
Slashdot
P2P as a model
Gnutella
P2P as an implementation choice
Application-layer multicast
P2P as an inherent property
Ad-hoc networks
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
6
P2P Application Taxonomy
P2P Systems
Distributed Computing File Sharing
SETI@home
Gnutella
Collaboration
Jabber
Platforms
JXTA
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
7
How to Find an Object in a Network?
Network
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
8
A Straightforward Idea
Use a BIG server
Store the object
How to do it inNetwork
a
distributed way?
Provide a directory
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
9
Why Distributed?
Client-server model:
Client is dumb
Server does most things (compute, store, control)
Centralization makes things simple, but introduces
Single point of failure, performance bottleneck,
tighter control, access fee and manage cost, …
ad hoc participation?
Estimate of net PCs
10 billions of Mhz CPUs
10000 terabytes of storage
Clients are not that dumb after all
Use the resources in the clients (at net edges)
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
10
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
11
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
12
First Idea: Napster
Distributing objects, centralizing directory:
Network
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
13
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
14
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
15
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
16
Today: P2P Video traffic is dominant
Source: cachelogic; Video, bittorrent, edonkey !
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
17
40-60%+
P2P traffic
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
18
2006 p2p Data
Between 50 and 65 percent of all download traffic is P2P related.
Between 75 and 90 percent of all upload traffic is P2P related.
And it seems that more people are using p2p today
In 2004 1 CacheLogic-server registered 3 million IP-addresses in 30 days
In 2006 1 CacheLogic-server registered 3 million IP-addresses in 8 days
So what do people download?
61,4 percent video
11,3 percent audio
27,2 percent is games/software/etc.
The average filesize of shared files is 1 gigabyte!
Source: http://torrentfreak.com/peer-to-peer-traffic-statistics/
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
19
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
20
A More Aggressive Idea
Distributing objects and directory:
How to find
Blind
objects
w/o
flooding
directory?
!
Network
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
21
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
22
Gnutella
Distribute file location
Idea: flood the request
Hot to find a file:
Send request to all neighbors
Neighbors recursively multicast the request
Eventually a machine that has the file receives the
request, and it sends back the answer
Advantages:
Totally decentralized, highly robust
Disadvantages:
Not scalable; the entire network can be swamped
with request (to alleviate this problem, each request
has a TTL)
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
23
Gnutella: Unstructured P2P
Ad-hoc topology
Queries are flooded for bounded number of hops
No guarantees on recall
xyz
xyz
Query: “xyz”
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
24
Now Bittorrent & Edonkey2000
(2006)
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
25
Lessons and Limitations
Client-Server performs well
But not always feasible
Ideal performance is often not the key issue!
Things that flood-based systems do well
Organic scaling
Decentralization of visibility and liability
Finding popular stuff
Fancy local queries
Things that flood-based systems do poorly
Finding unpopular stuff [Loo, et al VLDB 04]
Fancy distributed queries
Vulnerabilities: data poisoning, tracking, etc.
Guarantees about anything (answer quality, privacy, etc.)
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
26
Detour …. Bittorrent
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
27
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
28
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
29
BitTorrent – joining a torrent
new leecher
2
join
peer list
metadata file
1
3
tracker
data
request
4
website
seed/leecher
Peers divided into:
seeds: have the entire file
leechers: still downloading
1. obtain the metadata file
2. contact the tracker
3. obtain a peer list (contains seeds & leechers)
4. contact peers from that list for dataShivkumar Kalyanaraman
Rensselaer Polytechnic Institute
30
BitTorrent – exchanging data
leecher B
leecher A
I have
seed
leecher C
● Verify pieces
using hashes
● Download sub-pieces in parallel
● Advertise received pieces to the entire peer list
● Look for the rarest pieces
Rensselaer Polytechnic Institute
31
Shivkumar Kalyanaraman
!
BitTorrent - unchoking
leecher B
leecher A
seed
leecher C
leecher D
● Periodically calculate data-receiving rates
● Upload to (unchoke) the fastest downloaders
● Optimistic unchoking
▪ periodically select a peer at random and upload to it
Shivkumar Kalyanaraman
▪ continuously
look for the fastest partners
Rensselaer Polytechnic
Institute
32
End of Detour ….
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
33
Back to … P2P Structures
Unstructured P2P architecture
Napster, Gnutella, Freenet
No “logically” deterministic structures to organize the
participating peers
No guarantee objects be found
How to find objects within some no. of hops?
Extend hashing
Structured P2P architecture
CAN, Chord, Pastry, Tapestry, Tornado, …
Viewed as a distributed hash table for directory
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
34
How to Bound Search Quality?
Many ideas …, again
Work on
placement!
Network
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
35
High-Level Idea: Indirection
Indirection in space
Logical (content-based) IDs, routing to those IDs
“Content-addressable” network
Tolerant of churn
nodes joining and leaving the network
to h
h=z
Indirection in time
Want some scheme to temporally decouple send and receive
Persistence required. Typical Internet solution: soft state
Combo of persistence via storage and via retry
“Publisher” requests TTL on storage
Republishes as needed
Metaphor: Distributed Hash Table
z
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
36
Basic Idea
Publish (H(y))
P2P Network
Object “y”
Objects have
hash keys
Join (H(x))
Peer “x”
H(y)
H(x)
y
Hash key
Peer nodes also
x have hash keys
in the same
hash space
Place object to the peer with closest hash keys
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
37
Distributed Hash Tables (DHTs)
Abstraction: a distributed hash-table data structure
insert(id, item);
item = query(id); (or lookup(id);)
Note: item can be anything: a data object, document, file, pointer
to a file…
Proposals
CAN, Chord, Kademlia, Pastry, Tapestry, etc
Goals:
Make sure that an item (file) identified is always found
Scales to hundreds of thousands of nodes
Handles rapid arrival and failure of nodes
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
38
Viewed as a Distributed Hash Table
Hash
table
0
2128-1
Peer
node
Each is responsible for a range of the hash table,
according to the peer hash key
Objects
Note
that are placed in the peer with the closest key
peers are
Internet
edges
Internet
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
39
How to Find an Object?
Hash
table
0
2128-1
Peer
node
Want to keep onlyone hop to
a few entries! find the object
Simplest idea:
Everyone knows everyone else!
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
40
Structured Networks
Distributed Hash Tables (DHTs)
Hash table interface: put(key,item), get(key)
O(log n) hops
Guarantees on recall
K I
(K1,I1)
K I
K I
K I
K I
I1
K I
K I
put(K1,I1)
get (K1)
K I
K I
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
41
Content Addressable Network, CAN
Distributed hash table
Hash table as in a Cartesian coordinate space
A peer only needs to know its logical neighbors
Dimensional-ordered multihop routing
Hash
table
0
2128-1
Peer
node
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
42
Content Addressable Network (CAN)
Associate to each node and
item a unique id in an ddimensional Cartesian space
on a d-torus
Properties
Routing table size O(d)
Guarantees that a file is
found in at most d*n1/d
steps, where n is the total
number of nodes
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
43
CAN Example: Two Dimensional Space
Space divided between
nodes
All nodes cover the entire
space
Each node covers either a
square or a rectangular
area of ratios 1:2 or 2:1
Example:
Node n1:(1, 2) first node
that joins cover the
entire space
7
6
5
4
3
n1
2
1
0
0
1
2
3
4
5
6
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
44
7
CAN Example: Two Dimensional Space
Node n2:(4, 2) joins
space is divided between
n1 and n2
7
6
5
4
3
n2
n1
2
1
0
0
1
2
3
4
5
6
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
45
7
CAN Example: Two Dimensional Space
Node n2:(4, 2) joins
space is divided between
n1 and n2
7
6
n3
5
4
3
n2
n1
2
1
0
0
1
2
3
4
5
6
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
46
7
CAN Example: Two Dimensional Space
Nodes n4:(5, 5) and
n5:(6,6) join
7
6
n5
n4
n3
5
4
3
n2
n1
2
1
0
0
1
2
3
4
5
6
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
47
7
CAN Example: Two Dimensional Space
Nodes: n1:(1, 2);
n2:(4,2); n3:(3, 5);
n4:(5,5);n5:(6,6)
Items: f1:(2,3); f2:(5,1);
f3:(2,1); f4:(7,5);
7
6
n5
n4
n3
5
f4
4
f1
3
n2
n1
2
f3
1
f2
0
0
1
2
3
4
5
6
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
48
7
CAN Example: Two Dimensional Space
Each item is stored by the
node who owns its
mapping in the space
7
6
n5
n4
n3
5
f4
4
f1
3
n2
n1
2
f3
1
f2
0
0
1
2
3
4
5
6
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
49
7
CAN: Query Example
Each node knows its
neighbors in the d-space
Forward query to the
neighbor that is closest to
the query id
Example: assume n1
queries f4
Can route around some
failures
7
6
n5
n4
n3
5
f4
4
f1
3
n2
n1
2
f3
1
f2
0
0
1
2
3
4
5
6
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
50
7
Another Design: Chord
Node and object keys:
random location around a
circle
Neighbors:
2-i
nodes around the circle
found by routing to desired key
Routing: greedy
pick nbr closest to destination
Storage: “own” interval
node owns key range between
her key and previous node’s key
Ownership
range
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
51
OpenDHT
A shared DHT service
The Bamboo DHT
Hosted on PlanetLab
Simple RPC API
You don’t need to deploy
or host to play with a real
DHT!
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
52
Review: DHTs vs Unstructured P2P
DHTs good at:
exact match for “rare” items
DHTs bad at:
keyword search, etc. [can’t construct DHT-based Google]
tolerating extreme churn
Gnutella etc. (unstructured P2P) good at:
general search
finding common objects
very dynamic environments
Gnutella etc. bad at:
finding “rare” items
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
53
Distributed Systems Pre-Internet
Connected by LANs (low loss and delay)
Small scale (10s, maybe 100s per server)
PODC literature focused on algorithms to
achieve strict semantics in the face of failures
Two-phase commits
Synchronization
Byzantine agreement
Etc.
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
54
Distributed Systems Post-Internet
Very different context:
Huge scales (thousands if not millions)
Highly variable connectivity
Failures common
Organic growth
Abandoned distributed strict semantics
Adaptive apps rather than “guaranteed” infrastructure
Adopted pairwise client-server approach
Server is centralized (even if server farm)
Relatively primitive approach (no sophisticated dist.
algms.)
Little support from infrastructure or middleware
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
55
A Database viewpoint on DHTs…:
Towards Data-centricity, Data
Independence
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
56
Host-centric Protocols
Protocols defined in terms of IP addresses:
Unicast: IP address = host
Multicast: IP address = set of hosts
Destination address is given to protocol
Protocol delivers data from one host to another
unicast: conceptually trivial
multicast: address is logical, not physical
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
57
Host-centric Applications
Classic applications: destination is “intrinsic”
telnet: target machine
FTP: location of files
electronic mail: email address turns into mail server
multimedia conferencing: machines of participants
Destination is specified by user (not network)
Usually specified by hostname not address
DNS translates names into addresses
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
58
Domain Name System (DNS)
DNS is built around recursive delegation
Top level domains (TLDs): .com, .net, .edu, etc.
TLDs delegate authority to subdomains
berkeley.edu
Subdomains can further delegate
cs.berkeley.edu
Hierarchy fits host administrative structure
Local decentralized control
Crucial to efficient hostname resolution
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
59
Modern Web Data-Centricity
URLs often function as names of data
users think of www.cnn.com as data, not a host
Fact that www.cnn.com is a hostname is irrelevant
Users want data, not access to particular host
The web is now data-centric
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
60
Data-centric App in Host-centric World
Data still associated with host names (URLs)
administrative structure of data same as hosts
weak point in current web
Key enabler: search engines
Searchable databases map keywords to URLs
Allowed users to find desired data
Networkers focused on technical problems:
HTTP, persistence (URNs), replication (CDNs), ...
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
61
A DNS for Data? DHTs…
Can we map data names into addresses?
a data-centric DNS, distributed and scalable
doesn’t alter net protocols, but aids data location
not just about stolen music, but a general facility
A formidable challenge:
Data does not have a clear administrative hierarchy
Likely need to support a flat namespace
Can one do this scalably?
Data-centrism requires scalable flat lookups => DHTs
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
62
Data Independence In DB Design
Decouple app-level API from data organization
Can make changes to data layout without
modifying applications
Simple
version: location-independent names
Fancier:
declarative queries
“As clear a paradigm shift as we can hope to find in computer science”
- C. Papadimitriou
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
63
The Pillars of Data Independence
DBMS
B-Tree
Indexes
Value-based lookups have
to compete with direct
access
Must adapt to shifting data
distributions
Must guarantee
performance
Join Ordering,
AM Selection,
etc.
Query Optimization
Support declarative queries
beyond lookup/search
Must adapt to shifting data
distributions
Must adapt to changes in
environment
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
64
Generalizing Data Independence
A classic “level of indirection” scheme
Indexes are exactly that
Complex queries are a richer indirection
The key for data independence:
It’s all about rates of change
Hellerstein’s Data Independence Inequality:
Data independence matters when
d(environment)/dt >> d(app)/dt
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
65
Data Independence in Networks
d(environment)/dt >> d(app)/dt
In databases, the RHS is unusually small
This drove the relational database revolution
In extreme networked systems, LHS is unusually high
And the applications increasingly complex and datadriven
Simple indirections (e.g. local lookaside tables)
insufficient
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
66
Hierarchical Networks (& Queries)
IP
Hierarchical name space (www.vldb.org, 141.12.12.51)
Hierarchical routing
Autonomous Systems correlate with name space (though not
perfectly)
DNS
Hierarchical name space (“clients” + hierarchy of servers)
Hierarchical routing w/aggressive caching
13 managed “root servers”
Traditional pros/cons of Hierarchical data mgmt
Works well for things aligned with the hierarchy
Esp. physical locality a la Astrolabe
Inflexible
No data independence!
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
67
The Pillars of Data Independence
DBMS
B-Tree
Indexes
Value-based lookups have
to compete with direct
access
Must adapt to shifting data
distributions
Must guarantee
performance
P2P
ContentAddressable
Overlay
Networks
(DHTs)
Join Ordering, Multiquery
AM Selection, dataflow
etc.
sharing?
Query Optimization
Support declarative queries
beyond lookup/search
Must adapt to shifting data
distributions
Must adapt to changes in
environment
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
68
Sensor Networks:
The Internet Meets the Environment
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
69
Today: Internet meets Mobile Wireless Computing
iPoD: impact of disk size/cost
Samsung Cameraphone
w/ camcorder
SONY PSP: mobile gaming
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
Blackberry: phone + PDA
Computing: smaller, faster
Disks: larger size, small
form
Communications: wireless
voice, data
Multimedia integration:
voice, data, video, games
70
Tomorrow: Embedded Networked Sensing Apps
Seismic Structure
response
Marine
Microorganisms
Micro-sensors, on-board
processing, wireless
interfaces feasible at very
small scale--can monitor
phenomena “up close”
Enables spatially and
temporally dense
environmental monitoring
Contaminant
Transport
Embedded Networked
Sensing will reveal
previously
unobservable
phenomena
Ecosystems,
Biocomplexity
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
71
Embedded Networked Sensing:
Motivation
Imagine:
high-rise buildings self-detect structural faults (e.g., weld cracks)
schools detect airborn toxins at low concentrations, trace
contaminant transport to source
buoys alert swimmers to dangerous bacterial levels
earthquake-rubbled building infiltrated with robots and sensors:
locate survivors, evaluate structural damage
ecosystems infused with chemical, physical, acoustic, image
sensors to track global change parameters
battlefield sprinkled with sensors that identify track friendly/foe
air, ground vehicles, personnel
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
72
Embedded Sensor Nets: Enabling Technologies
Embed numerous distributed
devices to monitor and interact
with physical world
Network devices
to coordinate and perform
higher-level tasks
Embedded
Networked
Exploit
collaborative
Sensing, action
Control system w/
Small form factor
Untethered nodes
Sensing
Tightly coupled to physical world
Shivkumar Kalyanaraman
Exploit spatially/temporally dense, in situ/remote, sensing/actuation
Rensselaer Polytechnic Institute
73
Sensornets
Vision:
Many sensing devices with radio and processor
Enable fine-grained measurements over large areas
Huge potential impact on science, and society
Technical challenges:
untethered: power consumption must be limited
unattended: robust and self-configuring
wireless: ad hoc networking
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
74
Similarity w/ P2P Networks
Sensornets are inherently data-centric
Users know what data they want, not where it is
Estrin, Govindan, Heidemann (2000, etc.)
Centralized database infeasible
vast amount of data, constantly being updated
small fraction of data will ever be queried
sending to single site expends too much energy
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
75
Sensor Nets: New Design Themes
Self configuring systems that adapt to unpredictable
environment
Leverage data processing inside the network
dynamic, messy (hard to model), environments preclude preconfigured behavior
exploit computation near data to reduce communication
collaborative signal processing
achieve desired global behavior with localized algorithms
(distributed control)
Long-lived, unattended, untethered, low duty cycle
systems
energy a central concern
communication primary consumer of scarce energy resource
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
76
From Embedded Sensing to Embedded
Control
embedded in unattended “control systems”
control network, and act in environment
critical app’s extend beyond sensing to control and actuation
transportation, precision agriculture, medical monitoring and
drug delivery, battlefield app’s
concerns extend beyond traditional networked systems and
app’s: usability, reliability, safety
need systems architecture to manage interactions
current system development: one-off, incrementally tuned,
stove-piped
repercussions for piecemeal uncoordinated design: insufficient
longevity, interoperability, safety, robustness, scaling
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
77
Why cant we simply adapt Internet
protocols, “end to end” architecture?
Internet routes data using IP Addresses in Packets and Lookup
tables in routers
humans get data by “naming data” to a search
engine
many levels of indirection between name and IP
address
embedded, energy-constrained (un-tethered,
small-form-factor), unattended systems cant
tolerate communication overhead of indirection
special purpose system function(s): don’t need want Internet
general purpose functionality designed for elastic applications.
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
78
Sample Layered Architecture
User Queries, External Database
Resource
constraints call
for more tightly
integrated layers
In-network: Application processing,
Data aggregation, Query processing
Data dissemination, storage, caching
Open Question:
What are defining
Architectural
Principles?
Adaptive topology, Geo-Routing
MAC, Time, Location
Rensselaer Polytechnic Institute
Phy: comm, sensing, actuation,
SP
Shivkumar
Kalyanaraman
79
Coverage measures
D
x
S
area coverage: fraction of
area covered by sensors
detectability: probability
sensors detect moving
objects
node coverage: fraction
of sensors covered by
other sensors
control:
Given: sensor field (either
known sensor locations, or
spatial density)
where to add new nodes
for max coverage
how to move existing
nodes for max coverage
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
80
In Network Processing
communication expensive
when limited
power
bandwidth
perform (data) processing
in network
close to (at) data
forward
fused/synthesized
results
e.g., find max. of data
distributed data, distributed
computation
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
81
Distributed Representation and Storage
K V
K V
K V
K V
K V
K V
K V
K V
K V
Data Centric Protocols, In-network Processing goal:
K V K V
pattern-triggered data collection
Time
Interpretation of spatially distributed data (Pernode processing alone is not enough)
network does in-network processing based on
distribution of data
Queries automatically directed towards nodes
that maintain relevant/matching data
Multi-resolution data storage and retrieval
Distributed edge/feature detection
Index data for easy temporal and spatial
searching
Finding global statistics (e.g., distribution)
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
82
Directed Diffusion: Data Centric Routing
Basic idea
name data (not nodes) with externally relevant
attributes: data type, time, location of node, SNR,
diffuse requests and responses across network using
application driven routing (e.g., geo sensitive or not)
support in-network aggregation and processing
data sources publish data, data clients subscribe to data
however, all nodes may play both roles
node that aggregates/combines/processes incoming sensor
node data becomes a source of new data
node that only publishes when combination of conditions
arise, is client for triggering event data
true peer to peer system?
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
83
Traditional Approach: Warehousing
data extracted from sensors, stored on server
Query processing takes place on server
Warehouse
Front-end
Sensor Nodes
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
84
Sensor Database System
Sensor Database System supports distributed query
processing over sensor network
Sensor
DB
Sensor
DB
Sensor
DB
Sensor
DB
Front-end
Sensor
DB
Sensor
DB
Sensor
DB
Sensor
DB
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
85
Sensor Nodes
Sensor Database System
•
Characteristics of a Sensor
Network:
Streams of data
Uncertain data
Large number of nodes
Multi-hop network
No global knowledge
about the network
Node failure and
interference is common
Energy is the scarce
resource
Limited memory
No administration, …
Can existing database
techniques be reused?
What are the new problems
and solutions?
Representing sensor data
Representing sensor
queries
Processing query
fragments on sensor
nodes
Distributing query
fragments
Adapting to changing
network conditions
Dealing with site and
communication failures
Deploying and Managing a
sensor database system
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
86
Summary
P2P networks: Napster, Gnutella, Kazaa
Distributed Hash Tables (DHTs)
Database perspectives: data-centricity, dataindependence
Sensor networks and its connection to P2P
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
87