Slides - CS, Technion

Download Report

Transcript Slides - CS, Technion

Gil Einziger
Roy Friedman
Computer Science Department
Technion
Background: Publish/Subscribe
• Publisher: any entity that wishes to publish
some event
Look at my
new hairstyle
Background: Publish/Subscribe
• Subscriber: any entity that wishes to be
notified about events that match its interests
(also called subscription)
I want to
know
everything
about Ariana
Grande
I only care
about science
fiction
I want to
know
everything
about
hairstyles
Background: Publish/Subscribe
• The system’s goal is to deliver notifications about
events to interested subscribers and only to them
– Decoupling of information producers from consumers
• Applications:
–
–
–
–
–
Social networking (Twitter and the likes)
Stock quotes
Control systems
Data-center management
Etc.
Background: P2P
• Decentralized systems in which (most of) the
communication is performed directly between the
end nodes of the system – the peers
• Often, peers are donated users’ machines
– But can also be set-top boxes, routers, a large datacenter’s
servers, etc.
• Famous example applications:
– Skype, Bittorrent (and other file sharing), IPTV, Bitcoin
It’s a Brave New World Out There
• Most users access online content through their
mobiles
–
–
–
–
Intermittent connectivity
Limited bandwidth
Limited battery life
Limited resources
• We need to decouple the devices used to access the
data and the ones serving the P2P network
A Vision for Future P2P Solutions
• Whether ran as a true P2P network between
donated machines or inside a data-center:
– Decoupling between devices that consume services and
the ones providing the service
• Incentives might be in the form of revenue share with advertisers
or paid subscribers
– P2P machines are used for providing multiple services
• Not feasible to optimize the P2P overlay for a specific service
Problem Statement
• A scalable and efficient pub/sub system for selfsustained P2P networks
– The challenge
• Subscribers might not be present much of the time
• Short client sessions
• Use the existing overlay
– Quality Goals
•
•
•
•
High delivery rate
Efficient publication delivery
Reasonable latency
High churn resilience
Our Solution: Overview
Our Solution: Subscribing to Mailbox
• Clients that are aware of mailboxes serving their topics, simply
notify these mailboxes about it
• A client that is unaware of such mailboxes, initiates multiple
biased random walks
– Each mailbox distributes a hint (Bloom filter) with the topics it
subscribes to its overlay neighbors up to some distance
– The random walks favor visiting nodes whose hints include a match
– The random walks continue for a given TTL trying to find as many
matching mailbox as possible
• If none is found, then the home node becomes a mailbox for
these topics
Our Solution: Mailboxes
• An overloaded mailbox can refuse to accept
new clients and topics
• Mailboxes disappear naturally due to churn
or when they are underutilized
• The important objective is load sharing rather
than load balancing
Our Solution: Dissemination
• Spanning tree among mailboxes that know each other
• Random walks to discover new mailboxes and
disseminate to them
Our Solution: Dissemination
• Spanning tree among mailboxes that know each other
• Random walks to discover new mailboxes and
disseminate to them
Our Solution: Dissemination
• Spanning tree + random walks between mailboxes
• Normally, a mailbox pushes events to corresponding
registered clients
• Additionally, out-of-band gossip between mailboxes
and clients
– Clients poll their set of known mailboxes
– Exchange list of known events with each polled mailbox
– Occurs periodically plus after re-connection
Implementation
– Written in Java
– Open source project
• All code including testing available online
– Can be run on top of real IP networks as well as the
PeerSim simulator
• In the real networks case, executed on top of the
OpenKAD implementation of the Kademlia DHT
– Measurements confirm similarity between results with similar
size networks
• Simulations can be used to explore scalability
• Real networks can be used to validate simulation results
Evaluation: Methodology
• Traces:
– Synthetic traces
• Subscriptions are spread to clients/home nodes uniformly
• Topic publication distribution is Zipf-like with α=0.9
– Twitter traces
• Metrics:
–
–
–
–
Delivery rate
Communication load
Mailbox subscription pattern vs. users’
Effects of churn
Results: Synthetic Workload
Delivery Rate
Delivery rate over time vs. network size
1) Delivery rate approachs 100% after a few minutes
2) For 1500 nodes, simulation ~ real runs
Time (minutes)
Results: Twitter Traces
Delivery Rate
Delivery rate over time
Time (minutes)
Results: Load Distribution
Total Handled Messages
Load Distribution (Twitter)
1) Almost all load goes to mailboxes only
2) Even most loaded need to handle
fewer than 10 messages per second
# node
Results: Subscription Pattern
Subscription Number
Twitter (Feb. 7 2010 19:00-20:20)
Subscription pattern of mailboxes
much more uniform than of clients
=> balanced dissemination trees
# node
Results: Subscription Pattern
Mailbox Subscriptions
#Registered Clients/#Registered Mailboxes
Only a small number of mailboxes register even to
the most popular topics => the dissemination trees
are relatively small
Client Subscriptions
Results: Single 10% Churn Event
Miss Rate
Churn Recovery Time
Time (minutes)
Results: Repetitive 10% Churn
Miss Rate
Churn Recovery Over Time
Time (minutes)
Results: Repetitive 100% Churn
Miss Rate
Churn Recovery Time (Loosing All Mailboxes)
Time (minutes)
Summary
• The concept of elastic mailboxes
– Self electing, self evaporating, self organizing
• Complementing delivery mechanisms
– Spanning tree
– Random walks
– Out-of-band gossip through clients interaction
• End result:
– Mailboxes dramatically reduce the scalability problem
– Highly efficient, highly effective, highly robust to failures
and churn
Open Issues
• Exploit subscription similarity
• Privacy
Q&A
• Thanks for listening…