Building Undo for Operators: An Undoable E

Download Report

Transcript Building Undo for Operators: An Undoable E

Building Undo for Operators:
An Undoable E-mail Store
Aaron Brown
ROC Research Group
University of California, Berkeley
Winter 2003 ROC Research Retreat
14 January 2003
Outline
• Recap of “Three-R’s” Undo model
• Implementing Undo for e-mail
• Analysis of e-mail undo prototype
• Conclusions and future directions
Slide 2
Recap: Undo for Operators
• Dependability is achieved at the interface of
the system and its operator
• Undo goal: support the operator
– create a forgiving operations environment
» tolerate human error, allow trial-and-error experiments
– provide last-resort recovery from unknown damage
» retroactive repair of software bugs, broken upgrades,
unknown operator changes, external attacks
– be easily comprehensible by operator and developer
– preserve user data while refreshing system state
Slide 3
Recap: The Three-R’s Undo Model
• Provide Undo as virtual time travel
– Rewind: roll back all system state, physically
– Repair: make changes to past timeline to avert
original disaster
– Replay: roll state forward logically, merging original
timeline with effects of repairs
• Key properties of 3R’s Undo
– recovery from problems at any system layer
– recovery from unanticipated problems
– no assumptions about correct application behavior
Slide 4
Outline
• Recap of “Three-R’s” Undo model
• Implementing Undo for e-mail
• Analysis of e-mail undo prototype
• Conclusions and future directions
Slide 5
Building an E-mail Undo Prototype
• Target: an undoable e-mail store service
– a leaf node in the Internet e-mail network
– accessed via IMAP and SMTP
• Built around existing e-mail store service
– e.g., sendmail+imapd, iPlanet, Exchange
• Extensible to other apps
– architected around a reusable undo core with e-mailspecific extensions
• Written in Java
Slide 6
Undo System Architecture
Control
UI
User
IMAP, SMTP
Verbs
E-mail Proxy
Undo
Manager
Timeline
Log
IMAP, SMTP
E-mail
Service
Includes:
- user state
- application
- OS
Time-travel
Storage
Slide 7
Architecture Design Points
• Proxy-based design targeted at services
• Application-neutral core
– undo manager, timeline log, time-travel storage, UI
– contains all undo-cycle logic
• E-mail-specific semantics encapsulated
– proxy: encodes e-mail interactions into verbs
– verbs: define e-mail semantics
» model of preserved state
» model of acceptable external (observed) consistency
Slide 8
Verbs
• Key construct in undo architecture
– represent end-user state changes, state exposure
– building block of timeline and history
– used to detect and manage inconsistencies
• Essential for reasoning about undo
– verbs define exactly what state is preserved during an
undo cycle
– provide framework for defining application’s model of
acceptable external inconsistency
Slide 9
Verbs and the Undo Cycle
• Verb flow
Normal operation
Replay
Control
UI
User
Verbs
E-mail Proxy
Undo
Manager
E-mail
Service
Includes:
- user state
- application
- OS
Timeline
Log
Time-travel
Storage
Slide 10
Comparison to Optimistic
Replication
• Insight: undo can be recast as replication—
not in space, but in time
–
–
–
–
two virtual replicas diverge at the Rewind point
verb history defines an update log to one replica
repairs define updates to the other replica
Replay involves reconciling the two update streams,
detecting and handling inconsistencies
• Verb-based design is heavily influenced by
replica systems
– verbs map to replica system “actions” or “updates”
Slide 11
Comparison to Optimistic
Replication (2)
• But Undo has key differences:
– reconciliation between a logical log (verb history) and
a physical state (repaired system)
» so log-merging schemes (e.g., IceCube) don’t apply
– not all inconsistencies are bad
» no need to check for inconsistency until externalization
» no complex, error-prone guards on verb execution
– inconsistencies can be handled after-the-fact
» post-conditions on externalized data are sufficient
» fewer assumptions of application correctness
» if necessary, can rewind undesired inconsistencyproducing verb
Slide 12
E-mail example: original timeline
System
boundary
System
state
Hello
olleH
m
m
Inbox
olleH
olleH
Folder1
Deliver
Externalizes: —
Verbs
History
log
+ input “Hello”
Time
Copy
Fetch
Externalizes: —
Externalizes: m
+ Signature(m)=“olleH”
Slide 13
E-mail example: replay timeline
System
boundary
System
state
Hello
Hello
olleH
m
m
X
Inbox
olleH
Hello
olleH
Hello
Folder1
Deliver
Externalizes: —
Verbs
History
log
+ input “Hello”
Time
mismatch!
=> inconsistency
Copy
Fetch
Externalizes: —
Externalizes: m
+ Signature(m)=“olleH”
Slide 14
Verbs and External Inconsistency
• Detecting inconsistency
– verbs that externalize state record a signature of
externalized data and define a comparison function
» comparison fn. defines the external consistency model
• Handling inconsistency
– verbs define compensation routines, invoked when
inconsistencies are found
– later non-commuting verbs are squashed to prevent
data loss
Slide 15
Verbs for E-mail
• SMTP & IMAP protocols mapped into 13 verbs
– SMTP: Deliver
– IMAP: Append, Fetch, Store, Copy, List, Status,
Select, Expunge, Close, Create, Rename, Delete
– set could easily be extended with user and
subscription management functionality
• Each verb is a Java class implementing a
generic, app-neutral Verb interface
Slide 16
Example Verb: STORE (set msg flags)
• Tag
– input: target folder, message IDs, flag value
– externalized output: resultant flags of messages
• Sequencing tests
– commutes with & independent of SMTP verbs, all
IMAP verbs except Fetch/Store/Close on same folder
• Consistency check
– new flags must match original flags for all messages
in common; no original messages should be missing
• Inconsistency handler
– leave user a message explaining and listing changes
Slide 17
An External Consistency Model
for E-mail
• Retrieval (IMAP)
– tracked state includes message bodies, key message
headers (to/from/subject/cc), flags, folder lists, and
execution status of state-altering commands
– inconsistency if objects are missing or altered on
replay or commands fail; order & new objects ignored
– compensation via explanatory messages and creation
of “lost-and-found” containers
• Delivery (SMTP)
– only possible inconsistency is in execution status
– one tricky case: originally-failed delivery succeeds
» delay bounce to provide window for undo-repair
Slide 18
Undo System Architecture
Control
UI
User
IMAP, SMTP
Verbs
E-mail Proxy
Undo
Manager
Timeline
Log
IMAP, SMTP
E-mail
Service
Time-travel
Storage
• Next up:
– proxy
– undo manager
– time-travel
storage layer
Slide 19
Proxying E-mail
• IMAP proxy loop is straightforward
– accept client connection and open server connection
– parse commands and instantiate corresponding verbs
– invoke undo manager to schedule, execute, log verb
» passing a handle to the client and server connections
• SMTP is more complicated
– failed SMTP deliveries must be logged for later retry
– proxy poses as a server that always accepts deliveries
» deliver verb is created after client has been ACK’d
» some finesse to avoid being an open-relay
– if verb fails, proxy sends a bounce after a delay
Slide 20
Undo Manager Implementation
• Timeline log
– BerkeleyDB recno database storing serialized verbs
– also stores control records to make undo manager
state persistent and recoverable
• Verb scheduling
– all verb execution passes through undo manager
– scoreboard-like structure sequences verbs according
to independence, commutativity, ordering properties
• External inconsistency management
– undo manager coordinates verb APIs to ensure that
external inconsistencies are detected and handled
Slide 21
A Time-Travel Storage Layer
• Base: Network Appliance Filer with snapshots
• Java wrapper for Filer’s management CLI
– provides direct control of snapshot create/restore
– periodically takes snapshots, aging out old ones
» hierarchical scheduling allows the 31 snapshots to span
one month, with density inversely proportional to age
• Rewind restores the closest prior snapshot,
then replays up to the exact rewind point
• Challenge: making snapshot restore undoable
– solution: implement rollback by copying old snapshot
forward to present (expensive, but necessary)
Slide 22
Outline
• Recap of “Three-R’s” Undo model
• Implementing Undo for e-mail
• Analysis of e-mail undo prototype
• Conclusions and future directions
Slide 23
Code complexity
• Entire system is about 23K lines of Java
– split evenly between app-neutral and
e-mail-specific code
– bulk of e-mail code is in verbs
• Implementing and debugging the e-mail verbs
and proxy took roughly two man-months
Slide 24
Measuring Overhead for Undo
• Workload
– modified SPECmail2000 e-mail benchmark
» simulates traffic seen by an ISP mail server
» modified to use IMAP instead of POP, all mail local
– configured with 5000 users
» about 56 SMTP cxns/minute, 149 IMAP cxns/minute
• Setup
–
–
–
–
mail server: Linux, sendmail + UW imapd, 5000 users
undo proxy: Win2k, Sun 1.4.1 JDK
workload generator: Win2k, Sun 1.4.1 JDK
storage: all mailboxes and logs on NetApp Filer
» mailboxes accessed via NFS, logs via CIFS
Slide 25
Overhead Results: Space and Time
• Space overhead
– cumulative distribution plot
of IMAP and SMTP session
lengths
100
Percentage of Sessions
– 5GB/day for timeline log
= 1GB/day per 1000 users
– 325KB per 1000 mail folder
name translations
– per 120GB disk:
» 7 weeks of timeline for
1000 ISP users
» or 350 million folder
name translations
• Time overhead
80
60
40
IMAP, no undo
IMAP, undo
SMTP, no undo
SMTP, undo
20
0
0.01
0.1
1
10
100
Session Length (seconds)
Slide 26
Outline
• Recap of “Three-R’s” Undo model
• Implementing Undo for e-mail
• Analysis of e-mail undo prototype
• Conclusions and future directions
Slide 27
Next Steps
• Perform more analysis of prototype
– speed of rewind, replay, entire undo cycle
• Measure efficacy of undo with human expts.
– compare recovery time from pre-defined scenarios
with and without undo
» based on survey of e-mail administrators
• Examine other applications to understand
generality of undo architecture
– auctions, e-commerce, others?
Slide 28
Conclusions
• We can build a real Undo implementation
–
–
–
–
for a real application
with tolerable overhead
with a reusable, application-independent core
based on a verb model that enables reasoning about
state and undo behavior
• Look forward to final analysis at next
retreat!
Slide 29
Building Undo for Operators:
An Undoable E-mail Store
Aaron Brown
ROC Research Group
University of California, Berkeley
[email protected]
http://roc.cs.berkeley.edu/
Backup Slides
Slide 31
Summary: What’s in a Verb?
• Encapsulation of a user-service interaction
– action specification
– procedure to execute verb via proxy
– tag: input parameters, signature of externalized data,
verb execution status
• Commutativity/independence test functions
• Consistency management functions
– routine to check two tags for external inconsistency
– inconsistency handler
– squash routine
Slide 32
Verbs and the Undo Timeline
• Recorded history of verbs must be consistent
with actual verb execution
– but logging at proxy may not match execution order
• Solution: partial serialization of verbs
– undo manager tracks arriving & executing verbs in
scoreboard
– arriving verbs that commute are executed in parallel
– non-commuting verbs stall until dependencies clear
• Guarantees consistent recorded timeline
without requiring hooks into application
Slide 33
Verb Issue: Naming State
• Verbs need to specify names of target state
– folders, messages, users
• Names must be time-invariant and
unaffected by changes to execution timeline
– IMAP protocol names can’t guarantee this
• Solution: allocate “UndoIDs” to objects
– UndoID never changes and is restored on replay
– undo system maintains translations to IMAP names
» messages: embed UndoID in header field
» folders: track UndoIDs in parallel database
Slide 34