CSE 291 Presentation on
Porcupine: A Highly Scalable Email Service
Authors: Y. Saito, B. N. Bershad, and H. M. Levy
This presentation by: Pratik Mukhopadhyay
Full citation:
Yasushi Saito, Brian N. Bershad, and Henry M. Levy. "Manageability, Availability
and Performance in Porcupine: A Highly Scalable, Cluster-based Mail Service."
Proceedings of the 17th ACM Symposium on Operating Systems Principles,
December 1999.
2/1/00
Goals of Porcupine
• High performance: must handle billions of messages
• Good scalability: must scale to hundreds of nodes, yet have competitive single-node performance
• High availability: must mask node failures from users
• Easy system administration
System Architecture
• Functionally homogeneous nodes
• Key processes:
  + membership manager
  + mailbox manager, user profile manager
  + replication manager
  + mail delivery proxy (SMTP)
  + mail retrieval proxy (POP and IMAP)
Terminology
• Mailbox fragment
• Mailbox fragment list
• User profile database
• User profile soft state
• User map
• Cluster membership list
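The user map is the piece of terminology that ties the others together: it tells any node which node currently manages a given user's profile soft state and mailbox fragment list. The sketch below is illustrative only. The bucket count of 256, the MD5 hash, the round-robin assignment, and the names `bucket_for`, `build_user_map`, and `manager_for` are all assumptions made for the example, not details taken from the slides.

```python
import hashlib

NUM_BUCKETS = 256  # assumed bucket count, for illustration only


def bucket_for(user: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Hash a user name to a user-map bucket (stable across processes)."""
    digest = hashlib.md5(user.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_buckets


def build_user_map(members: list[str], num_buckets: int = NUM_BUCKETS) -> list[str]:
    """Assign every bucket to a live node, round-robin over the sorted
    cluster membership list (a hypothetical assignment policy)."""
    ordered = sorted(members)
    return [ordered[i % len(ordered)] for i in range(num_buckets)]


def manager_for(user: str, user_map: list[str]) -> str:
    """Look up the node that manages this user's soft state."""
    return user_map[bucket_for(user, len(user_map))]
```

Because every node holds the same map and uses the same hash, any node can route a request for a user without consulting a central directory.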
System Management
Desired features:
• Transparent handling of node addition, deletion, and temporary node failures
• Automatic load balancing across nodes in the face of changing workloads
Membership services
• Uses a variant of the Three Round Membership Protocol
Failure detection methods:
• remote operation timeout
• periodically ping the neighbor in IP address order
• periodically broadcast probe packets
Is broadcasting a good idea? What about allowing any node to be a coordinator?
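The neighbor-ping method can be pictured as a ring over the membership list sorted by IP address. The following is a minimal sketch under assumptions: it picks the next-highest address and wraps around at the end, and the function name `ping_target` is hypothetical. The slides state only that the neighbor is chosen in IP address order.

```python
import ipaddress


def ping_target(self_ip: str, members: list[str]):
    """Return the member this node should ping: the next address in
    numeric IP order, wrapping from the highest back to the lowest.
    Returns None when this node is the only member."""
    ring = sorted(set(members) | {self_ip}, key=ipaddress.ip_address)
    if len(ring) == 1:
        return None
    i = ring.index(self_ip)
    return ring[(i + 1) % len(ring)]
```

Sorting numerically rather than as strings matters here: as text, "10.0.0.10" sorts before "10.0.0.9", which would give nodes an inconsistent view of who their neighbor is.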
Recovery process
• User map reconstruction
• Soft state reconstruction
-- A two-step process:
+ Find changes
+ Notify changes
Do we reconfigure after every failure?
Should soft state information be cached?
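The two-step soft state reconstruction can be sketched as a diff followed by a grouping step. This assumes that changes are found by comparing the old and new user maps bucket by bucket, and that a notification goes to whichever node newly manages a bucket. The function names `find_changes` and `plan_notifications` are hypothetical.

```python
from collections import defaultdict


def find_changes(old_map: list[str], new_map: list[str]) -> list[int]:
    """Step 1: list the buckets whose managing node differs between maps."""
    return [b for b, (old, new) in enumerate(zip(old_map, new_map)) if old != new]


def plan_notifications(changed: list[int], new_map: list[str]) -> dict:
    """Step 2: group changed buckets by their new manager, so that each
    new manager can be told which buckets it must rebuild state for."""
    plan = defaultdict(list)
    for bucket in changed:
        plan[new_map[bucket]].append(bucket)
    return dict(plan)
```

For example, with `old_map = ["A", "B", "C", "A"]` and `new_map = ["A", "C", "C", "B"]`, only buckets 1 and 3 changed hands, so only nodes C and B need to be notified.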
Scaling
• Easy addition of new nodes: just install the software and connect to the network (make the IP address known to users)
• Performance studies show that the system uses the newly available resources
Replication
Basic properties:
• update anywhere
• eventual consistency
• total updates
• no locking
• ordered by loosely synchronized clocks
Relaxed consistency for the user database?
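Taken together, the properties above resemble a last-writer-wins scheme: any replica accepts an update, each update replaces the whole object, and timestamps decide the winner without locks. The sketch below is an assumption-laden illustration rather than Porcupine's replication manager. The `Update` and `Replica` names, and the use of the node ID to break timestamp ties, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Update:
    timestamp: float  # from the sender's loosely synchronized clock
    node_id: str      # assumed tie-breaker for equal timestamps
    value: bytes      # the entire new object ("total update")


class Replica:
    def __init__(self):
        self.objects = {}  # key -> most recent Update seen

    def apply(self, key: str, update: Update) -> bool:
        """Keep the update only if it is newer than what is stored.
        No locking: replicas converge once all have seen all updates."""
        current = self.objects.get(key)
        if current is None or (update.timestamp, update.node_id) > (
            current.timestamp,
            current.node_id,
        ):
            self.objects[key] = update
            return True
        return False
```

Because each update carries the whole object, two replicas that receive the same updates in different orders end up holding the same value, which is the eventual consistency property.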
Load balancing
• Collecting load information:
+ side effect of RPC operations
+ load information packets
• Limit the spread of a user's mail for better performance
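One way to picture the spread limit is as a small, per-user candidate set from which the least-loaded node is chosen. The sketch below assumes that candidates are derived by hashing the user name, that nodes already holding a fragment for the user also count as candidates, and that load is a single number per node. The names `candidates` and `choose_node` and the `spread` parameter are hypothetical.

```python
import hashlib


def candidates(user: str, nodes: list[str], spread: int) -> list[str]:
    """Pick up to `spread` consecutive nodes, starting at a position
    derived from a hash of the user name."""
    ordered = sorted(nodes)
    digest = hashlib.md5(user.encode("utf-8")).digest()
    start = int.from_bytes(digest[:4], "big") % len(ordered)
    return [ordered[(start + i) % len(ordered)] for i in range(min(spread, len(ordered)))]


def choose_node(user, nodes, load, spread, existing=()):
    """Return the least-loaded node among the user's candidates plus any
    live node that already stores one of the user's mailbox fragments."""
    pool = set(candidates(user, nodes, spread)) | (set(existing) & set(nodes))
    # Sort first so that ties on load are broken deterministically.
    return min(sorted(pool), key=lambda n: load.get(n, 0))
```

A small `spread` keeps a user's mail on few nodes, which makes retrieval cheaper. A larger `spread` gives the balancer more freedom to avoid busy nodes.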
Conclusions
• Performance studies show that Porcupine is scalable and highly available, and makes good use of resources under all workloads.
Miscellaneous
Functional homogeneity -- good or bad?
Will it work for services other than email?
For a very large email system, do we want a single geographical presence?
Special support for mailing list mail?