A Framework for Highly-Available Cascaded Real

Download Report

Transcript A Framework for Highly-Available Cascaded Real

Wide-Area Service Composition:
Availability, Performance, and
Scalability
Bhaskaran Raman
SAHARA, EECS, U.C.Berkeley
SAHARA Retreat, Jan 2002
Service Composition: Motivation
Provider A
Cellular
Phone
Video-on-demand
server
Provider A
Provider B
Transcoder
Thin
Client
Provider B
Replicated
instances
Text
to
speech
Provider R
Email
repository
Service-Level Path
Provider Q
Other examples:
ICEBERG, IETF OPES’00
In this work: Problem Statement and
Goals
• Problem Statement
– Path could stretch across
– multiple service providers
– multiple network domains
– Inter-domain Internet paths:
– Poor availability [Labovitz’99]
– Poor time-to-recovery
[Labovitz’00]
– Take advantage of service replicas
• Goals
– Performance: Choose set of service instances
– Availability: Detect and handle failures quickly
– Scalability: Internet-scale operation
In this work: Assumptions and Nongoals
• Operational model:
– Service providers deploy different services at various
network locations
– Next generation portals compose services
– Code is NOT mobile (mutually untrusting service
providers)
• We do not address service interface issue
• Assume that service instances have no persistent
state
– Not very restrictive [OPES’00]
Related Work
• Other efforts have addressed:
– Semantics and interface definitions
• OPES (IETF), COTS (Stanford)
– Fault tolerant composition within a single cluster
• TACC (Berkeley)
– Performance constrained choice of service, but not for
composed services
• SPAND (Berkeley), Harvest (Colorado), Tapestry/CAN
(Berkeley), RON (MIT)
• None address wide-area network performance or
failure issues for long-lived composed sessions
Solution: Requirements
• Failure detection/liveness tracking
– Server, Network failures
• Performance information collection
– Load, Network characteristics
• Service location
• Global information is required
– Hop-by-hop approach will not work
Challenges
• Scalability and Global information
– Information about all service
instances, and network paths inbetween should be known
• Quick failure detection and
recovery
– Internet dynamics  intermittent
congestion
Is “quick” failure detection possible?
• What is a “failure” on an Internet path?
– Outage periods happen for varying durations
• Study outage periods using traces
– 12 pairs of hosts
– Periodic UDP heart-beat, every 300 ms
– Study “gaps” between receive-times
• Main results:
– Short outage (1.2-1.8 sec)  Long outage (> 30 sec)
• Sometimes this is true over 50% of the time
– False-positives are rare:
• O(once an hour) at most
– Okay to react to short outage periods, by switching
service-level path
Towards an Architecture
• Service execution platforms
– For providers to deploy services
– First-party, or third-party service platforms
• Overlay network of such execution platforms
– Collect performance information
– Exploit redundancy in Internet paths
Architecture
Destination
Application
plane
Composed services
Logical
platform
Peering relations,
Overlay network
Hardware
platform
Service clusters
Internet
Source
Peering:
exchange
perf. info.
Service cluster: compute
cluster capable of running
services
Key Design Points
• Overlay size:
– Could grow much slower than #services, or #clients
– How many nodes?
• A comparison: Akamai cache servers
• O(10,000) nodes for Internet-wide operation
• Overlay network is virtual-circuit based:
– “Switching-state” at each node
• E.g. Source/Destination of RTP stream, in transcoder
– Failure information need not propagate for recovery
• Problem of service-location separated from that of
performance and liveness
• Cluster  process/machine failures handled within
Location of Service Replicas
Finding Overlay Entry/Exit
Software Architecture
Service-Level Path
Creation, Maintenance, Recovery
Link-State Propagation
At-least
-once UDP
Perf.
Meas.
Liveness
Detection
Functionalities at the Cluster-Manager
Service-Composition
Layer
Link-State Layer
Peer-Peer Layer
Layers of Functionality
• Link-State layer: Why Link-State?
• Service-Composition layer:
– What algorithm for path creation?
– Algorithm for path recovery?
• State management?
Evaluation
• What is the effect of recovery mechanism on
application?
• What is the scaling bottleneck?
• Idea: Use real
implementation, emulate the
wide-area network behavior
(NistNET)
• Opportunity: Millennium
cluster
Node 1
App
Evaluation:
Emulation Testbed
Rule for 12
Emulator
Rule for 13
Lib
Rule for 34
Node 2
Rule for 43
Node 3
Node 4
Evaluation: Recovery of
Application Session
• Text-to-Speech application
Leg-2
End-Client
Leg-1
Text
to
audio
Text Source
Request-response protocol
Data (text, or RTP audio)
Keep-alive soft-state refresh
Application soft-state (for restart on failure)
• Two possible places of failure
• Setup:
– 20-node overlay network
– One service instance for each service
– Deterministic failure for 10sec during session
• Metric: gap between arrival of successive audio packets at
the client
Jump 2: at 2,963 ms
Jump 1: 350-450 ms
Recovery of
Application
Session:
CDF of
gaps>100ms
Jump at 10,000 ms
Discussion
• Jump 1: Due to synchronous text-to-speech processing
• Jump 2: Recovery after failure
–
–
–
–
Breakup: 2,963 = 1,800 + O(700) + O(450)
1,800 ms: timeout to conclude failure
700 ms: signaling to setup alternate path
450 ms: recovery of application soft-state
• Re-processing current sentence
• Without recovery algorithm: takes as long as failure
duration
• O(3 sec) recovery
– Can be completely masked with buffering
– Interactive apps: still much better than without recovery
• Quick recovery possible since failure information does
not have to propagate across network
Evaluation: Scaling
• Scaling bottleneck:
– Simultaneous recovery of all client sessions on a failed
overlay link
• Can recover at least 1,500 paths without hitting
bottlenecks
– Translates to about 700 simultaneous client sessions per
cluster-manager
– In comparison, our text-to-speech implementation can
support O(15) clients per machine
• Other scaling concerns:
– Link-State floods
– Graph computation for service-level path creation
Summary
• Service Composition: flexible service creation
• We address performance, availability, scalability
• Initial analysis: Failure detection -- meaningful
to timeout in O(1.2-1.8 sec)
• Design: Overlay network of service clusters
• Evaluation: results so far
– Good recovery time for real-time applications:
O(3 sec)
– Good scalability -- minimal additional provisioning
for cluster managers
• Ongoing work:
– Overlay topology issues: how many nodes, peering
– Stability issues
Feedback, Questions?
Presentation made using VMWare
References
•
•
•
•
•
•
[OPES’00] A. Beck and et.al., “Example Services for Network Edge
Proxies”, Internet Draft, draft-beck-opes-esfnep-01.txt, Nov 2000
[Labovitz’99] C. Labovitz, A. Ahuja, and F. Jahanian, “Experimental
Study of Internet Stability and Wide-Area Network Failures”, Proc.
Of FTCS’99
[Labovitz’00] C. Labovitz, A. Ahuja, A. Bose, and F. Jahanian, “Delayed
Internet Routing Convergence”, Proc. SIGCOMM’00
[Acharya’96] A. Acharya and J. Saltz, “A Study of Internet RoundTrip Delay”, Technical Report CS-TR-3736, U. of Maryland
[Yajnik’99] M. Yajnik, S. Moon, J. Kurose, and D. Towsley,
“Measurement and Modeling of the Temporal Dependence in Packet
Loss”, Proc. INFOCOM’99
[Balakrishnan’97] H. Balakrishnan, S. Seshan, M. Stemm, and R. H.
Katz, “Analyzing Stability in Wide-Area Network Performance”, Proc.
SIGMETRICS’97