Peer-to-Peer (P2P) and Sensor Networks
Shivkumar Kalyanaraman
Rensselaer Polytechnic Institute
[email protected]
http://www.ecse.rpi.edu/Homepages/shivkuma
Based in part upon slides of Don Towsley, Ion Stoica, Scott Shenker, Joe Hellerstein, Jim Kurose, Hung-Chang Hsiao, Chung-Ta King
Overview
• P2P networks: Napster, Gnutella, Kazaa
• Distributed Hash Tables (DHTs)
• Database perspectives: data-centricity, data independence
• Sensor networks and their connection to P2P
P2P: Key Idea
• Share the content, storage, and bandwidth of individual (home) users connected over the Internet
What is P2P (Peer-to-Peer)?
• P2P as a mindset
  ◦ Slashdot
• P2P as a model
  ◦ Gnutella
• P2P as an implementation choice
  ◦ Application-layer multicast
• P2P as an inherent property
  ◦ Ad-hoc networks
P2P Application Taxonomy
P2P Systems:
• Distributed Computing: SETI@home
• File Sharing: Gnutella
• Collaboration: Jabber
• Platforms: JXTA
How to Find an Object in a Network?
A Straightforward Idea
• Use a BIG server:
  ◦ Store the object
  ◦ Provide a directory
• But how to do it in a distributed way?
Why Distributed?
• Client-server model:
  ◦ Client is dumb
  ◦ Server does most things (compute, store, control)
  ◦ Centralization makes things simple, but introduces a single point of failure, a performance bottleneck, tighter control, access fees and management costs, …
  ◦ What about ad hoc participation?
• Estimate of PCs on the net:
  ◦ ~10 billion MHz of CPU
  ◦ ~10,000 terabytes of storage
• Clients are not that dumb after all
• Use the resources in the clients (at the net edges)
First Idea: Napster
• Distribute the objects, but centralize the directory (a sketch follows below)
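To make this concrete, here is a minimal sketch of a Napster-style central directory (illustrative only, not Napster's actual protocol; the class and all names are invented for this example). Peers publish what they hold, a lookup returns the peers that serve a file, and the transfer itself then happens peer-to-peer.

```python
# Minimal sketch of a Napster-style central directory (illustrative only).
from collections import defaultdict

class CentralDirectory:
    def __init__(self):
        # file name -> set of peer addresses that hold the file
        self.index = defaultdict(set)

    def publish(self, filename, peer_addr):
        """A peer announces that it stores `filename`."""
        self.index[filename].add(peer_addr)

    def unpublish(self, peer_addr):
        """Remove a departing peer from every entry."""
        for peers in self.index.values():
            peers.discard(peer_addr)

    def lookup(self, filename):
        """Return the peers serving `filename`; the transfer itself is peer-to-peer."""
        return set(self.index.get(filename, ()))

directory = CentralDirectory()
directory.publish("song.mp3", ("192.0.2.1", 6699))
print(directory.lookup("song.mp3"))  # {('192.0.2.1', 6699)}
```

The directory is trivially simple but, as the previous slide noted, it is also a single point of failure and control, which is what doomed Napster.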
Today: P2P Video traffic is dominant
• Source: CacheLogic; video, BitTorrent, eDonkey!
[Figure: 40-60%+ of traffic is P2P]
2006 P2P Data
• Between 50 and 65 percent of all download traffic is P2P related.
• Between 75 and 90 percent of all upload traffic is P2P related.
• And it seems that more people are using P2P today:
  ◦ In 2004, one CacheLogic server registered 3 million IP addresses in 30 days
  ◦ In 2006, one CacheLogic server registered 3 million IP addresses in 8 days
• So what do people download?
  ◦ 61.4 percent video
  ◦ 11.3 percent audio
  ◦ 27.2 percent games/software/etc.
• The average size of shared files is 1 gigabyte!
• Source: http://torrentfreak.com/peer-to-peer-traffic-statistics/
A More Aggressive Idea
• Distribute both the objects and the directory
• How to find objects w/o a directory? Blind flooding!
Gnutella
• Distribute the file location
• Idea: flood the request
• How to find a file (a sketch follows below):
  ◦ Send the request to all neighbors
  ◦ Neighbors recursively multicast the request
  ◦ Eventually a machine that has the file receives the request, and it sends back the answer
• Advantages:
  ◦ Totally decentralized, highly robust
• Disadvantages:
  ◦ Not scalable; the entire network can be swamped with requests (to alleviate this problem, each request carries a TTL)
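A minimal sketch of this TTL-limited query flooding (illustrative; it mimics the behavior described above, not the Gnutella wire format; `Node`, `neighbors`, and `files` are invented names):

```python
# Sketch of Gnutella-style TTL-limited query flooding (illustrative only).
class Node:
    def __init__(self, name):
        self.name = name
        self.neighbors = []   # overlay links to other Node objects
        self.files = set()    # file names stored locally
        self.seen = set()     # query IDs already handled (suppress duplicates)

    def query(self, qid, filename, ttl):
        """Return the set of node names that hold `filename`."""
        if qid in self.seen or ttl < 0:
            return set()
        self.seen.add(qid)
        hits = {self.name} if filename in self.files else set()
        for nbr in self.neighbors:          # flood to all neighbors
            hits |= nbr.query(qid, filename, ttl - 1)
        return hits

a, b, c = Node("a"), Node("b"), Node("c")
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
c.files.add("xyz")
print(a.query(qid=1, filename="xyz", ttl=2))  # {'c'}
```

Note how the TTL caps the blast radius: a query for an unpopular file more than `ttl` hops away simply fails, which is exactly the recall problem the next slide names.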
Gnutella: Unstructured P2P
• Ad-hoc topology
• Queries are flooded for a bounded number of hops
• No guarantees on recall
[Figure: a query for “xyz” floods the overlay until it reaches the nodes that hold “xyz”]
Now BitTorrent & eDonkey2000 (2006)
Lessons and Limitations
• Client-server performs well
  ◦ But not always feasible
  ◦ Ideal performance is often not the key issue!
• Things that flood-based systems do well:
  ◦ Organic scaling
  ◦ Decentralization of visibility and liability
  ◦ Finding popular stuff
  ◦ Fancy local queries
• Things that flood-based systems do poorly:
  ◦ Finding unpopular stuff [Loo et al., VLDB 04]
  ◦ Fancy distributed queries
  ◦ Vulnerabilities: data poisoning, tracking, etc.
  ◦ Guarantees about anything (answer quality, privacy, etc.)
Detour … BitTorrent
BitTorrent – joining a torrent
[Figure: a new leecher (1) fetches the metadata file from a website, (2) joins via the tracker, (3) receives a peer list, and (4) requests data from seeds/leechers]
Peers are divided into:
• seeds: have the entire file
• leechers: still downloading
1. Obtain the metadata file
2. Contact the tracker
3. Obtain a peer list (contains seeds & leechers)
4. Contact peers from that list for data (a sketch of steps 2-3 follows below)
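Steps 2-3 amount to one HTTP request to the tracker's announce URL. A rough sketch (the tracker URL and torrent values are placeholders, and a real client would decode the bencoded response):

```python
# Sketch of a BitTorrent tracker announce (schematic; real responses are bencoded).
import requests  # third-party HTTP client

params = {
    "info_hash": b"\x12" * 20,            # SHA-1 of the metadata's info dict (placeholder)
    "peer_id": b"-PY0001-123456789012",   # 20-byte ID this client invents
    "port": 6881,                         # where we accept peer connections
    "uploaded": 0, "downloaded": 0,
    "left": 1_000_000,                    # bytes still needed; 0 would mark us a seed
    "compact": 1,                         # ask for the packed peers format
}
resp = requests.get("http://tracker.example.org/announce", params=params)
# resp.content is a bencoded dict whose "peers" entry lists seeds & leechers;
# the client then opens peer connections and requests pieces (step 4).
```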
BitTorrent – exchanging data
[Figure: leechers A, B, C and a seed advertise pieces (“I have!”) and exchange data]
● Verify pieces using hashes
● Download sub-pieces in parallel
● Advertise received pieces to the entire peer list
● Look for the rarest pieces
BitTorrent – unchoking
[Figure: the seed and leechers A-D decide whom to unchoke]
● Periodically calculate data-receiving rates
● Upload to (unchoke) the fastest downloaders (a sketch follows below)
● Optimistic unchoking:
  ▪ periodically select a peer at random and upload to it
  ▪ continuously look for the fastest partners
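A minimal sketch of one tit-for-tat unchoke round as described above (illustrative; real clients add anti-snubbing, timed optimistic slots, and more):

```python
# Sketch of a BitTorrent-style unchoke round (illustrative only).
import random

def choose_unchoked(download_rate, regular_slots=3):
    """download_rate: dict peer -> bytes/sec received from that peer recently.

    Unchoke the fastest downloaders (tit-for-tat), plus one random
    "optimistic" peer so newcomers get a chance to prove their speed.
    """
    ranked = sorted(download_rate, key=download_rate.get, reverse=True)
    unchoked = set(ranked[:regular_slots])
    choked = [p for p in download_rate if p not in unchoked]
    if choked:
        unchoked.add(random.choice(choked))  # optimistic unchoke
    return unchoked

rates = {"A": 52_000, "B": 14_000, "C": 90_000, "D": 3_000, "E": 0}
print(choose_unchoked(rates))  # e.g. {'C', 'A', 'B', 'D'}
```

The design choice: rewarding uploaders with reciprocal bandwidth gives peers an incentive to contribute, while the one random slot keeps the system from starving new arrivals.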
End of Detour ….
Back to … P2P Structures
• Unstructured P2P architecture
  ◦ Napster, Gnutella, Freenet
  ◦ No “logically” deterministic structure organizes the participating peers
  ◦ No guarantee that objects can be found
• How to find objects within some bounded number of hops?
  ◦ Extend hashing
• Structured P2P architecture
  ◦ CAN, Chord, Pastry, Tapestry, Tornado, …
  ◦ Viewed as a distributed hash table acting as the directory
How to Bound Search Quality?
• Many ideas … again: work on placement!
High-Level Idea: Indirection
• Indirection in space
  ◦ Logical (content-based) IDs, routing to those IDs
  ◦ “Content-addressable” network
  ◦ Tolerant of churn (nodes joining and leaving the network)
• Indirection in time
  ◦ Want some scheme to temporally decouple send and receive
  ◦ Persistence required; typical Internet solution: soft state
  ◦ Combo of persistence via storage and via retry
  ◦ “Publisher” requests a TTL on storage and republishes as needed
• Metaphor: Distributed Hash Table
Basic Idea
[Figure: object “y” is published with hash key H(y); peer “x” joins with hash key H(x)]
• Objects have hash keys; peer nodes also have hash keys in the same hash space
• Place each object at the peer with the closest hash key (a sketch follows below)
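A minimal sketch of this placement rule: peers and objects hash into one key space, and each object is stored at the peer whose key is closest around the ring (illustrative; real DHTs use 128- or 160-bit spaces, and all names here are invented):

```python
# Sketch of hash-based object placement (illustrative only).
import hashlib

SPACE = 2 ** 32  # small key space for readability; real systems use 2**128+

def key(name: str) -> int:
    """Map a peer or object name into the shared key space."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % SPACE

def owner(obj: str, peers: list[str]) -> str:
    """The peer whose key is closest (around the ring) to the object's key."""
    k = key(obj)
    def ring_dist(p):
        d = abs(key(p) - k)
        return min(d, SPACE - d)  # distances wrap around the ring
    return min(peers, key=ring_dist)

peers = ["peer-a", "peer-b", "peer-c", "peer-d"]
print(owner("object-y", peers))   # where to publish H(y)
print(owner("object-z", peers))   # a different object may land elsewhere
```

Because both peers and objects hash into the same space, a joining or leaving peer only disturbs the objects nearest its own key, which is what makes this placement churn-friendly.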
Distributed Hash Tables (DHTs)
• Abstraction: a distributed hash-table data structure
  ◦ insert(id, item);
  ◦ item = query(id); (or lookup(id);)
  ◦ Note: the item can be anything: a data object, document, file, pointer to a file, …
• Proposals
  ◦ CAN, Chord, Kademlia, Pastry, Tapestry, etc.
• Goals:
  ◦ Make sure that an identified item (file) is always found
  ◦ Scale to hundreds of thousands of nodes
  ◦ Handle rapid arrival and failure of nodes
Viewed as a Distributed Hash Table
[Figure: hash-table key space 0 … 2^128−1 partitioned across peer nodes]
• Each peer is responsible for a range of the hash table, according to the peer's hash key
• Objects are placed at the peer with the closest key
• Note that the peers are Internet edges
How to Find an Object?
[Figure: the same partitioned key space, now asking how to locate an object's owner]
• Simplest idea: everyone knows everyone else! Then it is one hop to find the object
• But we want each peer to keep only a few entries!
Structured Networks
• Distributed Hash Tables (DHTs)
  ◦ Hash table interface: put(key, item), get(key)
  ◦ O(log n) hops
  ◦ Guarantees on recall
[Figure: put(K1, I1) routes to the node responsible for K1; a later get(K1) retrieves I1]
Content Addressable Network, CAN
• Distributed hash table
  ◦ Hash table mapped onto a Cartesian coordinate space
  ◦ A peer only needs to know its logical neighbors
  ◦ Dimension-ordered multihop routing
[Figure: the hash-table key space 0 … 2^128−1 mapped across peer nodes]
Content Addressable Network (CAN)
• Associate to each node and item a unique ID in a d-dimensional Cartesian space on a d-torus
• Properties:
  ◦ Routing table size O(d)
  ◦ Guarantees that a file is found in at most d·n^(1/d) steps, where n is the total number of nodes
CAN Example: Two Dimensional Space
• Space is divided between the nodes
• All nodes together cover the entire space
• Each node covers either a square or a rectangular area of ratio 1:2 or 2:1
• Example: node n1:(1, 2) is the first node to join → it covers the entire space
[Figure: 7×7 coordinate grid with n1 at (1, 2) owning the whole space]
CAN Example: Two Dimensional Space
• Node n2:(4, 2) joins → the space is divided between n1 and n2
[Figure: the grid split between n1 at (1, 2) and n2 at (4, 2)]
CAN Example: Two Dimensional Space
• Node n3:(3, 5) joins → the space is divided further
[Figure: the grid now split among n1 at (1, 2), n2 at (4, 2), and n3 at (3, 5)]
CAN Example: Two Dimensional Space
• Nodes n4:(5, 5) and n5:(6, 6) join
[Figure: the grid split among n1, n2, n3, n4, and n5]
CAN Example: Two Dimensional Space
• Nodes: n1:(1, 2); n2:(4, 2); n3:(3, 5); n4:(5, 5); n5:(6, 6)
• Items: f1:(2, 3); f2:(5, 1); f3:(2, 1); f4:(7, 5)
[Figure: the grid with the five node zones and items f1-f4 at their coordinates]
CAN Example: Two Dimensional Space
• Each item is stored by the node that owns its mapping in the space
[Figure: each item fi resides in the zone of the node whose region contains its point]
CAN: Query Example
• Each node knows its neighbors in the d-space
• Forward the query to the neighbor that is closest to the query ID (a sketch follows below)
• Example: assume n1 queries f4
• Can route around some failures
[Figure: the query for f4 hops greedily from n1 through neighboring zones]
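A minimal sketch of this greedy forwarding in a 2-D CAN, reusing the example's node coordinates (illustrative; zone ownership, joins, and the torus wraparound are omitted, and the neighbor lists are assumed):

```python
# Sketch of CAN-style greedy routing in a 2-D coordinate space (illustrative).
import math

neighbors = {  # overlay: node -> neighboring nodes (assumed adjacency)
    "n1": ["n2", "n3"], "n2": ["n1", "n4"], "n3": ["n1", "n4"],
    "n4": ["n2", "n3", "n5"], "n5": ["n4"],
}
center = {"n1": (1, 2), "n2": (4, 2), "n3": (3, 5), "n4": (5, 5), "n5": (6, 6)}

def route(src, target_point):
    """Greedily forward toward the node whose point is closest to the target."""
    path, here = [src], src
    while True:
        # candidate next hops: the current node and its neighbors
        best = min([here] + neighbors[here],
                   key=lambda n: math.dist(center[n], target_point))
        if best == here:        # no neighbor is closer: the query has arrived
            return path
        path.append(best)
        here = best

print(route("n1", (7, 5)))      # n1's query for f4 at (7, 5): ['n1', 'n3', 'n4', 'n5']
```

Each hop strictly reduces the distance to the target point, so the query reaches the zone owning f4 in O(d·n^(1/d)) hops, matching the bound on the earlier slide.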
Another Design: Chord
• Node and object keys: random locations around a circle
• Neighbors: the nodes 2^-i of the way around the circle, found by routing to the desired key
• Routing: greedy → pick the neighbor closest to the destination (a sketch follows below)
• Storage: “own” interval → a node owns the key range between her key and the previous node’s key
[Figure: the Chord ring, with one node’s ownership range highlighted]
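A minimal sketch of a Chord-style ring: fingers at power-of-two distances and greedy clockwise lookup (illustrative; joins, stabilization, and the real 160-bit key space are omitted):

```python
# Sketch of Chord-style greedy lookup over a key ring (illustrative only).
import bisect

M = 16                       # keys live in [0, 2**M); real Chord uses 160 bits
RING = 2 ** M
nodes = sorted([52, 11203, 24000, 33099, 47600, 60111])  # node keys on the ring

def successor(key):
    """First node clockwise from `key`: the owner of `key`."""
    i = bisect.bisect_left(nodes, key % RING)
    return nodes[i % len(nodes)]          # wrap past the top of the ring

def fingers(n):
    """Node n's neighbors: the successors of n + 2**i, i.e. power-of-two hops."""
    return sorted({successor(n + 2 ** i) for i in range(M)})

def lookup(start, key):
    """Greedy routing: hop to the farthest finger that does not pass the key."""
    owner = successor(key)
    path, here = [start], start
    while here != owner and successor(here + 1) != owner:
        here = max(fingers(here),
                   key=lambda f: (f - here) % RING
                   if (f - here) % RING <= (key - here) % RING else -1)
        path.append(here)
    return path, owner

print(lookup(52, 40000))   # ([52, 33099], 47600): distance roughly halves per hop
```

Because the fingers sit at power-of-two distances, each hop roughly halves the remaining clockwise distance, giving the O(log n) lookups claimed for DHTs.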
OpenDHT
• A shared DHT service
  ◦ The Bamboo DHT
  ◦ Hosted on PlanetLab
  ◦ Simple RPC API
  ◦ You don’t need to deploy or host anything to play with a real DHT!
Review: DHTs vs Unstructured P2P
• DHTs are good at:
  ◦ exact-match lookup of “rare” items
• DHTs are bad at:
  ◦ keyword search, etc. [one can’t construct a DHT-based Google]
  ◦ tolerating extreme churn
• Gnutella etc. (unstructured P2P) are good at:
  ◦ general search
  ◦ finding common objects
  ◦ very dynamic environments
• Gnutella etc. are bad at:
  ◦ finding “rare” items
Distributed Systems Pre-Internet
• Connected by LANs (low loss and delay)
• Small scale (10s, maybe 100s of clients per server)
• The PODC literature focused on algorithms to achieve strict semantics in the face of failures:
  ◦ Two-phase commit
  ◦ Synchronization
  ◦ Byzantine agreement
  ◦ Etc.
Distributed Systems Post-Internet
• Very different context:
  ◦ Huge scale (thousands if not millions of nodes)
  ◦ Highly variable connectivity
  ◦ Failures common
  ◦ Organic growth
• Abandoned strict distributed semantics
  ◦ Adaptive apps rather than “guaranteed” infrastructure
• Adopted a pairwise client-server approach
  ◦ Server is centralized (even if it is a server farm)
  ◦ Relatively primitive approach (no sophisticated distributed algorithms)
  ◦ Little support from infrastructure or middleware
A Database Viewpoint on DHTs: Towards Data-Centricity and Data Independence
Host-centric Protocols
• Protocols are defined in terms of IP addresses:
  ◦ Unicast: IP address = a host
  ◦ Multicast: IP address = a set of hosts
• The destination address is given to the protocol
• The protocol delivers data from one host to another
  ◦ unicast: conceptually trivial
  ◦ multicast: the address is logical, not physical
Host-centric Applications
• Classic applications: the destination is “intrinsic”
  ◦ telnet: the target machine
  ◦ FTP: the location of files
  ◦ electronic mail: the email address resolves to a mail server
  ◦ multimedia conferencing: the machines of the participants
• The destination is specified by the user (not the network)
  ◦ Usually specified by hostname, not address
• DNS translates names into addresses
Domain Name System (DNS)
• DNS is built around recursive delegation
  ◦ Top-level domains (TLDs): .com, .net, .edu, etc.
  ◦ TLDs delegate authority to subdomains
    ▪ berkeley.edu
  ◦ Subdomains can further delegate
    ▪ cs.berkeley.edu
• The hierarchy fits the administrative structure of hosts
  ◦ Local decentralized control
  ◦ Crucial to efficient hostname resolution
Modern Web → Data-Centricity
• URLs often function as names of data
  ◦ Users think of www.cnn.com as data, not as a host
  ◦ The fact that www.cnn.com is a hostname is irrelevant
• Users want the data, not access to a particular host
• The web is now data-centric
Data-centric App in Host-centric World
• Data is still associated with host names (URLs)
  ◦ the administrative structure of data is the same as that of hosts
  ◦ a weak point in the current web
• Key enabler: search engines
  ◦ Searchable databases map keywords to URLs
  ◦ Allowed users to find the desired data
• Networkers focused on the technical problems:
  ◦ HTTP, persistence (URNs), replication (CDNs), ...
A DNS for Data? DHTs…
• Can we map data names into addresses?
  ◦ a data-centric DNS, distributed and scalable
  ◦ doesn’t alter net protocols, but aids data location
  ◦ not just about stolen music, but a general facility
• A formidable challenge:
  ◦ data does not have a clear administrative hierarchy
  ◦ likely need to support a flat namespace
  ◦ can one do this scalably?
• Data-centrism requires scalable flat lookups => DHTs
Data Independence In DB Design
• Decouple the app-level API from the data organization
  ◦ Can make changes to the data layout without modifying applications
  ◦ Simple version: location-independent names
  ◦ Fancier: declarative queries
“As clear a paradigm shift as we can hope to find in computer science” - C. Papadimitriou
The Pillars of Data Independence
Two pillars, as realized in a DBMS:
• Indexes (B-trees)
  ◦ Value-based lookups have to compete with direct access
  ◦ Must adapt to shifting data distributions
  ◦ Must guarantee performance
• Query optimization (join ordering, AM selection, etc.)
  ◦ Support declarative queries beyond lookup/search
  ◦ Must adapt to shifting data distributions
  ◦ Must adapt to changes in the environment
Generalizing Data Independence
• A classic “level of indirection” scheme
  ◦ Indexes are exactly that
  ◦ Complex queries are a richer indirection
• The key to data independence: it’s all about rates of change
• Hellerstein’s Data Independence Inequality: data independence matters when
  d(environment)/dt >> d(app)/dt
Data Independence in Networks
d(environment)/dt >> d(app)/dt
• In databases, the RHS is unusually small
  ◦ This drove the relational database revolution
• In extreme networked systems, the LHS is unusually high
  ◦ And the applications are increasingly complex and data-driven
  ◦ Simple indirections (e.g., local lookaside tables) are insufficient
Hierarchical Networks (& Queries)
• IP
  ◦ Hierarchical name space (www.vldb.org, 141.12.12.51)
  ◦ Hierarchical routing
  ◦ Autonomous Systems correlate with the name space (though not perfectly)
• DNS
  ◦ Hierarchical name space (“clients” + a hierarchy of servers)
  ◦ Hierarchical routing with aggressive caching
  ◦ 13 managed “root servers”
• Traditional pros/cons of hierarchical data management
  ◦ Works well for things aligned with the hierarchy
  ◦ Esp. physical locality a la Astrolabe
  ◦ Inflexible
  ◦ No data independence!
The Pillars of Data Independence
The same pillars, with their P2P analogues:
• Indexes (DBMS: B-trees; P2P: content-addressable overlay networks, i.e., DHTs)
  ◦ Value-based lookups have to compete with direct access
  ◦ Must adapt to shifting data distributions
  ◦ Must guarantee performance
• Query optimization (DBMS: join ordering, AM selection, etc.; P2P: multiquery dataflow sharing?)
  ◦ Support declarative queries beyond lookup/search
  ◦ Must adapt to shifting data distributions
  ◦ Must adapt to changes in the environment
Sensor Networks:
The Internet Meets the Environment
Today: Internet meets Mobile Wireless Computing
[Figure: iPod (impact of disk size/cost), Samsung camera phone w/ camcorder, Sony PSP (mobile gaming), BlackBerry (phone + PDA)]
• Computing: smaller, faster
• Disks: larger capacity, smaller form factor
• Communications: wireless voice, data
• Multimedia integration: voice, data, video, games
Tomorrow: Embedded Networked Sensing Apps
[Figure: application collage: seismic structure response, marine microorganisms, contaminant transport, ecosystems/biocomplexity]
• Micro-sensors, on-board processing, and wireless interfaces are feasible at very small scale; they can monitor phenomena “up close”
• Enables spatially and temporally dense environmental monitoring
• Embedded networked sensing will reveal previously unobservable phenomena
Embedded Networked Sensing:
Motivation
Imagine:
• high-rise buildings that self-detect structural faults (e.g., weld cracks)
• schools that detect airborne toxins at low concentrations and trace contaminant transport to its source
• buoys that alert swimmers to dangerous bacterial levels
• an earthquake-rubbled building infiltrated with robots and sensors: locate survivors, evaluate structural damage
• ecosystems infused with chemical, physical, acoustic, and image sensors to track global change parameters
• a battlefield sprinkled with sensors that identify and track friendly/foe air and ground vehicles and personnel
Embedded Sensor Nets: Enabling Technologies
• Embedded: embed numerous distributed devices to monitor and interact with the physical world; control systems w/ small form factor, untethered nodes
• Networked: network the devices to coordinate and perform higher-level tasks; exploit collaborative sensing and action
• Sensing: tightly coupled to the physical world; exploit spatially/temporally dense, in situ/remote sensing and actuation
Sensornets
• Vision:
  ◦ Many sensing devices with a radio and a processor
  ◦ Enable fine-grained measurements over large areas
  ◦ Huge potential impact on science and society
• Technical challenges:
  ◦ untethered: power consumption must be limited
  ◦ unattended: robust and self-configuring
  ◦ wireless: ad hoc networking
Similarity w/ P2P Networks
• Sensornets are inherently data-centric
  ◦ Users know what data they want, not where it is
  ◦ Estrin, Govindan, Heidemann (2000, etc.)
• A centralized database is infeasible
  ◦ vast amount of data, constantly being updated
  ◦ only a small fraction of the data will ever be queried
  ◦ sending everything to a single site expends too much energy
Sensor Nets: New Design Themes
• Self-configuring systems that adapt to an unpredictable environment
  ◦ dynamic, messy (hard-to-model) environments preclude pre-configured behavior
• Leverage data processing inside the network
  ◦ exploit computation near the data to reduce communication
  ◦ collaborative signal processing
  ◦ achieve the desired global behavior with localized algorithms (distributed control)
• Long-lived, unattended, untethered, low-duty-cycle systems
  ◦ energy is a central concern
  ◦ communication is the primary consumer of the scarce energy resource
From Embedded Sensing to Embedded
Control
• embedded in unattended “control systems”
  ◦ control the network, and act in the environment
• critical apps extend beyond sensing to control and actuation
  ◦ transportation, precision agriculture, medical monitoring and drug delivery, battlefield apps
  ◦ concerns extend beyond traditional networked systems and apps: usability, reliability, safety
• need a systems architecture to manage the interactions
  ◦ current system development: one-off, incrementally tuned, stove-piped
  ◦ repercussions of piecemeal, uncoordinated design: insufficient longevity, interoperability, safety, robustness, scaling
Why can’t we simply adapt Internet protocols and the “end-to-end” architecture?
• The Internet routes data using IP addresses in packets and lookup tables in routers
  ◦ humans get data by “naming data” to a search engine
  ◦ many levels of indirection lie between the name and the IP address
  ◦ embedded, energy-constrained (untethered, small-form-factor), unattended systems can’t tolerate the communication overhead of that indirection
• special-purpose system functions: they don’t need or want the Internet’s general-purpose functionality, which was designed for elastic applications
Sample Layered Architecture
Layer stack, top to bottom:
• User queries, external database
• In-network: application processing, data aggregation, query processing
• Data dissemination, storage, caching
• Adaptive topology, geo-routing
• MAC, time, location
• Phy: communication, sensing, actuation, signal processing
Side notes: resource constraints call for more tightly integrated layers. Open question: what are the defining architectural principles?
Coverage measures
[Figure: a sensor field S, a detection region D, and a moving object x]
• area coverage: fraction of the area covered by sensors (a sketch follows below)
• detectability: probability that the sensors detect moving objects
• node coverage: fraction of sensors covered by other sensors
• control: given a sensor field (either known sensor locations or a spatial density),
  ◦ where to add new nodes for maximum coverage
  ◦ how to move existing nodes for maximum coverage
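A minimal sketch of estimating the area-coverage fraction by Monte Carlo sampling, assuming disk-shaped sensing regions (the field size, sensor locations, and radius are invented):

```python
# Sketch: Monte Carlo estimate of area coverage for disk-sensing nodes.
import random

FIELD = (10.0, 10.0)                         # field dimensions (assumed)
SENSORS = [(2, 3), (5, 5), (8, 1), (7, 8)]   # known sensor locations (assumed)
RADIUS = 2.0                                 # disk sensing range (assumed)

def covered(x, y):
    """True if the point lies within sensing range of any sensor."""
    return any((x - sx) ** 2 + (y - sy) ** 2 <= RADIUS ** 2 for sx, sy in SENSORS)

def area_coverage(samples=100_000):
    """Fraction of the field covered by at least one sensor."""
    hits = sum(covered(random.uniform(0, FIELD[0]), random.uniform(0, FIELD[1]))
               for _ in range(samples))
    return hits / samples

print(f"area coverage ~ {area_coverage():.3f}")
```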
In-Network Processing
• Communication is expensive when power and bandwidth are limited
• Perform (data) processing in the network
  ◦ close to (at) the data
  ◦ forward fused/synthesized results, e.g., find the max of the data (a sketch follows below)
• Distributed data, distributed computation
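A minimal sketch of the “find the max” example: each node in the routing tree forwards only the max of its own reading and its children's partial results, so a single value crosses each link (the tree and the readings are invented):

```python
# Sketch of in-network MAX aggregation up a sensor routing tree (illustrative).
children = {          # routing tree: node -> child nodes (assumed topology)
    "sink": ["a", "b"],
    "a": ["c", "d"],
    "b": [],
    "c": [], "d": [],
}
reading = {"sink": 17.0, "a": 21.5, "b": 19.2, "c": 30.1, "d": 12.4}

def in_network_max(node):
    """Each node transmits a single partial aggregate, not raw readings."""
    partial = reading[node]
    for child in children[node]:
        partial = max(partial, in_network_max(child))  # one value per link
    return partial

print(in_network_max("sink"))  # 30.1, computed without shipping all readings
```

The communication savings are the point: n readings collapse into one value per tree edge, trading cheap local computation for expensive radio transmissions.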
Distributed Representation and Storage
[Figure: key-value pairs distributed across the nodes of the network, evolving over time]
• Data-centric protocols and in-network processing; goal: interpretation of spatially distributed data (per-node processing alone is not enough)
  ◦ the network does in-network processing based on the distribution of the data
  ◦ queries are automatically directed towards nodes that maintain relevant/matching data
• Pattern-triggered data collection
  ◦ Multi-resolution data storage and retrieval
  ◦ Distributed edge/feature detection
  ◦ Index data for easy temporal and spatial searching
  ◦ Finding global statistics (e.g., a distribution)
Directed Diffusion: Data Centric Routing
• Basic idea
  ◦ name the data (not the nodes) with externally relevant attributes: data type, time, location of node, SNR, …
  ◦ diffuse requests and responses across the network using application-driven routing (e.g., geo-sensitive or not)
  ◦ support in-network aggregation and processing
• Data sources publish data; data clients subscribe to data (a sketch follows below)
  ◦ however, all nodes may play both roles
  ◦ a node that aggregates/combines/processes incoming sensor-node data becomes a source of new data
  ◦ a node that publishes only when a combination of conditions arises is a client for the triggering event data
• A true peer-to-peer system?
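A minimal sketch of the naming idea: interests are predicates over data attributes, and a publication is delivered to whichever subscribers it matches (illustrative only; real directed diffusion also sets up gradients and reinforces delivery paths):

```python
# Sketch of attribute-named (data-centric) publish/subscribe (illustrative only).
subscriptions = []   # list of (predicate, subscriber-name) pairs

def subscribe(predicate, who):
    """Register an interest: a predicate over data attributes."""
    subscriptions.append((predicate, who))

def publish(data):
    """Deliver data to every subscriber whose interest matches its attributes."""
    return [who for predicate, who in subscriptions if predicate(data)]

# Interest: a four-legged animal seen in region X with decent confidence.
subscribe(lambda d: d["type"] == "four-legged" and d["region"] == "X"
                    and d["confidence"] > 0.8,
          who="tracking-task")

event = {"type": "four-legged", "region": "X", "confidence": 0.93, "time": 1042}
print(publish(event))   # ['tracking-task']: routed by what the data is, not where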
Traditional Approach: Warehousing
• Data is extracted from the sensors and stored on a server
• Query processing takes place on the server
[Figure: sensor nodes feed a warehouse behind a front-end]
Sensor Database System
• A sensor database system supports distributed query processing over the sensor network
[Figure: a sensor DB instance runs on each sensor node, coordinated by a front-end]
Sensor Database System
• Characteristics of a sensor network:
  ◦ Streams of data
  ◦ Uncertain data
  ◦ Large number of nodes
  ◦ Multi-hop network
  ◦ No global knowledge about the network
  ◦ Node failure and interference are common
  ◦ Energy is the scarce resource
  ◦ Limited memory
  ◦ No administration, …
• Can existing database techniques be reused? What are the new problems and solutions?
  ◦ Representing sensor data
  ◦ Representing sensor queries
  ◦ Processing query fragments on sensor nodes
  ◦ Distributing query fragments
  ◦ Adapting to changing network conditions
  ◦ Dealing with site and communication failures
  ◦ Deploying and managing a sensor database system (a query sketch follows below)
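For flavor, this is the kind of declarative, continuous aggregate query such a system aims to support, in the style of TinyDB/TAG, together with a toy single-epoch evaluation (the query, readings, and regions are invented for illustration):

```python
# Sketch of a TinyDB-style continuous aggregate query over sensor readings:
#   SELECT region, AVG(temp) FROM sensors
#   WHERE temp > 30 GROUP BY region SAMPLE PERIOD 60s
# (illustrative semantics only; the readings and regions are made up)
from collections import defaultdict

readings = [  # one epoch's samples: (node, region, temperature)
    ("n1", "north", 33.5), ("n2", "north", 29.0),
    ("n3", "south", 31.2), ("n4", "south", 36.0),
]

def epoch_result(samples, threshold=30.0):
    """Per-region average of readings above the threshold, for one sample period."""
    groups = defaultdict(list)
    for _node, region, temp in samples:
        if temp > threshold:             # WHERE temp > 30
            groups[region].append(temp)  # GROUP BY region
    return {r: sum(ts) / len(ts) for r, ts in groups.items()}  # AVG(temp)

print(epoch_result(readings))  # {'north': 33.5, 'south': 33.6}
```

In a real deployment the query would be pushed into the network and evaluated with in-network aggregation, as in the earlier slides, rather than centrally as here.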
Summary
• P2P networks: Napster, Gnutella, Kazaa
• Distributed Hash Tables (DHTs)
• Database perspectives: data-centricity, data independence
• Sensor networks and their connection to P2P