application - Networked Systems Laboratory
Real-time Application Monitoring and
Diagnosis for Service Hosting
Platforms of Black Boxes
Huadong Liu (U. of Tennessee)
Hui Zhang, Rauf Izmailov, Guofei Jiang,
Xiaoqiao Meng (NEC Labs America)
Presented by: Hui Zhang
©NEC Laboratories America
Outline
Motivation
SRAMD architecture
Application component dependency discovery
Evaluation
Conclusions
Motivation
[Figure: App. 1, App. 2, App. 3, App. 4]
Service hosting systems
Web farms, service-oriented utility computing networks, Peer-to-Peer
service composition based computing grids, …
Service management
Fault diagnosis, capacity planning, performance analysis, impact
analysis, etc.
Challenges
Application components are usually delivered as black boxes without
sufficient instrumentation
The huge amount of logging information in large-scale systems makes
real-time monitoring and debugging unrealistic with a centralized
approach
An intuition of the SRAMD Art
Source: www.pictureMOSAICs.com
Scalable Real-time Application Monitoring and Diagnosis
SRAMD: an extensible tool that is
easy to deploy
scalable, and
able to effectively profile the intricate dependency
relationships among interacting application components seen
as black boxes.
Our approach
uses low-level packet traces instead of high-level event traces
to gain insight into application components
has end-system instrumentation for close observation of the
correlation between application performance and local
resource utilization, and for enabling a rich set of queries for
diagnosis
understands the overall system/application behavior and
performance by aggregating and correlating summarizations
from distributed components
SRAMD in Operation
An extensible framework for application topology discovery,
capacity planning, and performance debugging
[Figure: application X, application Y, and application Z on a hosting server]
An application-level passive resource monitor with active
summarization
The SRAMD Controller
Collector
passively collects summarization data from distributed monitors
through UDP.
Aggregator
retrieves and validates information blocks available in the
repository, and organizes them into per-application groups.
Visualizer
constructs in-memory DOT files [DOT] using outputs from the
aggregator and calls Grappa [Grappa] to visualize application
topologies enriched with component traffic statistics and
causal probabilities.
Diagnosis
generates probing requests to related monitors, with operator
interaction, to get detailed information about application
components and to isolate possible bottlenecks for
performance debugging.
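The Visualizer step can be illustrated with a minimal sketch that turns aggregated dependency edges into DOT text. The function name to_dot, the edge tuple layout, and the label format are assumptions made for illustration; the slides do not show the actual SRAMD data structures or how Grappa is invoked.

```python
def to_dot(app_name, edges):
    """Build DOT text for one application's component topology.

    edges is a list of (src, dst, requests, causal_probability) tuples;
    this layout is a hypothetical stand-in for the aggregator's output.
    """
    lines = [f'digraph "{app_name}" {{']
    for src, dst, requests, prob in edges:
        # Annotate each edge with traffic statistics and causal probability.
        label = f"{requests} req, p={prob:.2f}"
        lines.append(f'  "{src}" -> "{dst}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_dot("application X", [("A", "B", 120, 0.95), ("B", "C", 80, 0.66)]))
```

The resulting text can be handed to any DOT-aware viewer; keeping it in memory as a string mirrors the "in-memory DOT files" wording on the slide.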
The SRAMD Controller snapshot
The SRAMD Monitor
Periodically probe the CPU, memory, and disk
usage of every registered application component.
Passively capture network traffic and associate
captured packets with registered application
components.
Actively calculate useful local application
statistics and dependencies from packet traces.
Perform temporary diagnosis tasks on demand
to assist performance diagnosis and debugging.
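The packet-to-component association above can be sketched as a lookup from local port to registered component. This is only an illustration: the packet dictionaries, the port_table mapping, and the function name associate_packets are hypothetical, and a real monitor would read packets from a capture library rather than from a Python list.

```python
from collections import defaultdict


def associate_packets(packets, port_table):
    """Attribute captured packets to registered application components.

    packets: iterable of dicts with "src_port", "dst_port", "length"
             (a hypothetical, simplified packet record).
    port_table: {local_port: component_id} for registered components.
    Returns {component_id: {"packets": n, "bytes": total_length}}.
    """
    stats = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in packets:
        # A packet belongs to a component if either endpoint is its local port.
        if pkt["dst_port"] in port_table:
            comp = port_table[pkt["dst_port"]]
        elif pkt["src_port"] in port_table:
            comp = port_table[pkt["src_port"]]
        else:
            continue  # traffic of an unregistered process is ignored
        stats[comp]["packets"] += 1
        stats[comp]["bytes"] += pkt["length"]
    return dict(stats)
```

Packets that match no registered port are skipped, which keeps the per-component summaries small enough to report periodically.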
Application component dependency discovery
[Figure: time line of requests r1, r2, and r3 passing through components A, B, C, D, and E]
Given two application components A and B in the
system, we want to discover the following real-time
dependency relationship between A and B during a
time interval:
are the input requests of one component caused by the other
(directly or indirectly)? If so, in what percentage?
Dealing with transient connections
Local Dependency Discovery (LDD)
Find the IDs of the peer application components that local ones
talked to in the last report interval. Every SRAMD monitor
sends a list of (LocalPort, AppCompID) pairs to the monitor at
every hosting server that runs a communicating application
component.
Count the number of requests (including nested requests)
between application components and calculate the probability
of their causal dependency.
Requests may appear nested by accident, but if the same
nesting relationship appears with a high probability, it very
likely represents a causal dependency between the
application components.
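The counting step can be sketched as follows. The slides do not give the exact formula, so this assumes one plausible definition: an outgoing request from B to C counts as nested if it starts while an incoming request from A to B is still open, and the causal probability is the fraction of incoming requests that contain at least one nested outgoing request. The function name and input layout are hypothetical.

```python
def causal_probability(incoming, outgoing):
    """Estimate how often requests A->B trigger requests B->C.

    incoming: list of (start, end) times of requests A->B seen at B.
    outgoing: list of start times of requests B->C seen at B.
    Assumed definition (not taken from the slides): the fraction of
    incoming requests with at least one outgoing request nested inside.
    """
    if not incoming:
        return 0.0
    nested = sum(
        1
        for start, end in incoming
        if any(start <= t <= end for t in outgoing)
    )
    return nested / len(incoming)


if __name__ == "__main__":
    a_to_b = [(0, 10), (20, 30), (40, 50), (60, 70)]
    b_to_c = [5, 25, 45, 100]
    print(causal_probability(a_to_b, b_to_c))
```

With many concurrent clients, unrelated requests overlap more often, so accidental nesting inflates this estimate; that is the effect the LDD evaluation measures.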
Dealing with persistent connections and connectionless communications
Traffic Regulation based Component Dependency Discovery (TRCDD)
Divert-socket-based traffic regulation (under investigation).
[Figure: traffic flows A->B and B->C]
Evaluation: SRAMD overhead (1)
Experiment setup
[Figure: a sender (thrulay) sends UDP packets over Gigabit Ethernet to a receiver (thrulayd); the SRAMD controller runs with the sender and the SRAMD monitor with the receiver; hosts are Intel 2.8GHz SMP]
Evaluation: SRAMD overhead (2)
CPU overhead of the SRAMD monitor with bulk UDP traffic
using different packet sending rates and packet sizes
Evaluation: SRAMD overhead (3)
CPU overhead of packet-application matching and sniffing
(data rate 100 Mb/s, packet size 1500 bytes)
Association probability   1/250k   0.01   0.02   0.03   0.04
CPU overhead (%)          4.22     5.53   6.15   7.12   7.86
Evaluation: LDD algorithm (1)
Experiment setup
Logic view: clients C (httperf); W1, W2; A1, A2; D1, D2
Physical view: two Web Server (tinyproxy) hosts and two hosts each labeled
Application Server I1, Application Server I2, and Database Server (derby)
Evaluation: LDD algorithm (2)
Causal probability as observed on application server
A1 with different numbers of concurrent clients
Conclusions and Future Work
An unobtrusive application-level monitoring and
diagnosis tool that does not make any assumptions
about the traced applications.
Two schemes to infer dependency relationships of
application components in different scenarios.
An initial assessment of the quality and overhead of
application-level packet tracing and an evaluation of
the statistical dependency discovery scheme.
Possible extensions
A kernel module to obtain per-application disk read/write
statistics
Application of data mining techniques to packet traces
Thanks!
Questions?
Backup slides
Calculate Response Time from Traces
[Figure: message sequence among WS, AS, and DS with requests a, b, and c and timestamps t1 through t5]