EtE: Passive End-to-End Internet Service
Performance Monitoring
Yun Fu, Lucy Cherkasova, Wenting Tang, and Amin Vahdat
HPLabs and Duke University
EtE Monitor
Service provider problems...
A lot of research is done to
optimize web server performance in
order to improve client experience
BUT
Do we know what the client
experience is?
What are the critical latency
components in the end-to-end
response time?
Do we know whether the
improvements on the web server
side indeed improve end-user
experience?
Do we know who the clients are
and where they are located on the
Internet?
End-to-End Web Service Measurement:
Why Is It Important?
Two main factors impact the response time perceived by the clients:
network latency and server side processing time
Many web sites use complex multi-tiered architecture
A set of new technologies, such as servlets and JavaServer Pages,
extends the web servers to generate information-rich dynamic web
pages and to leverage existing business systems
The combination of these technologies can increase server-side
processing time, especially in a distributed environment
New ad-hoc business metric: web service is considered to be
“unavailable” if its response time exceeds 6 seconds
The service providers need a quantitative analysis of the major latency
components contributing to the response time to achieve given
business and QoS objectives:
Invest in more powerful site infrastructure or
Choose a CDN service?
Why Is It Difficult?
Web pages are complex objects with multiple embedded images
HTTP protocol is stateless: different images are requested by
client browser independently:
• Some of them are issued concurrently
• Some of them use persistent connections
• Some of them are obtained from proxies
• Some of them are obtained from user browser caches
The response time of a web page observed by the client is the
result of downloading all page-related images
What Are Currently Available Solutions?
Active periodic probing of a particular web page from a fixed number
of clients across the Internet
Keynote service
– Keynote “clients” are not the real web site clients
– Allows monitoring of a particular web page
– Always pulls the entire page (with all embedded images) from the server
Page instrumentation technique based on downloadable JavaScript
or Java Applet to a client web browser
HP Open View “Web Transaction Observer”
– The measurement starts after download of the main html page
(significant portion of the response time is missing)
– Does not provide latency breakdown unless the web server is also
instrumented
eBusiness Assurance (eBA, from Candle Corp)
Quality of Service (QoS) Monitor (IBM, Tivoli)
Research paper by Rajamony and Elnozahy from IBM (Austin) uses
JavaScript to instrument the links to particular pages. Somewhat more limited:
cannot measure directly accessed pages, e.g., “index.html”…
What Do We Propose?
EtE monitor
Passive monitoring tool for end-to-end response time measurement
Non-intrusive: requires no changes to site content, server-side
infrastructure, or client browsers
Can be used for sites with static or dynamically generated content
What does it provide?
End-to-end response measurement for all the pages and all the clients accessing the site
Analysis of response components:
• Server processing time portion
• Network transfer time portion
Reports the % of data delivered from the server vs the % of data cached on the client side
Reports the % of aborted page accesses and the related performance reasons
Analysis of the most frequently accessed documents and their response time
Client clustering by ASes (Autonomous Systems)
• Requests (bytes) clustering by ASes and the corresponding response time
And more…
EtE Monitor Architecture
1. The Network Packet Collector module collects network
packets using tcpdump and records them in a Network Trace,
enabling offline analysis.
2. In the Request-Response Reconstruction module, EtE
monitor reconstructs all TCP connections from the Network
Trace and extracts HTTP transactions (a request with the
corresponding response) from the payload. EtE monitor
stores the HTTP header lines and other related information in
the Transaction Log.
3. The Web Page Reconstruction module groups the
request-response pairs into logical web page
accesses and stores them in the Web Page Session Log.
4. The Performance Analysis and Statistics module
summarizes a variety of performance characteristics
integrated across all client accesses.
Request-Response Reconstruction
Module
The TCP connections are rebuilt from the Network Trace using:
The client IP address
The client port number
The request (response) TCP sequence number
Within the payload of the rebuilt TCP connections, HTTP transactions
are delimited as defined by the HTTP protocol
After reconstructing the HTTP transactions, the monitor records the
HTTP header lines and other information of interest in the Transaction
Log and discards the transaction body
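A minimal sketch of the reconstruction step described above, assuming the packets have already been parsed into simple records. The `Packet` type and both function names are hypothetical, not EtE's actual code. Segments are grouped by client IP address and port, ordered by TCP sequence number, and the header lines are separated from the body at the blank line that ends an HTTP header block. Sequence-number wraparound and overlapping segments are ignored for brevity.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical, simplified view of one captured TCP segment.
    client_ip: str
    client_port: int
    seq: int          # TCP sequence number of the first payload byte
    payload: bytes


def rebuild_streams(packets):
    """Group segments by (client IP, client port) and reassemble each byte stream."""
    by_conn = defaultdict(dict)
    for p in packets:
        # Keying by sequence number drops retransmitted duplicates.
        by_conn[(p.client_ip, p.client_port)].setdefault(p.seq, p.payload)
    return {
        conn: b"".join(payload for _, payload in sorted(segs.items()))
        for conn, segs in by_conn.items()
    }


def header_lines(stream):
    """Return the header lines of the first HTTP message, discarding the body."""
    head, _, _body = stream.partition(b"\r\n\r\n")
    return head.decode("latin-1").split("\r\n")


packets = [
    Packet("10.0.0.1", 4321, 17, b"Host: example.com\r\n\r\n"),  # arrives out of order
    Packet("10.0.0.1", 4321, 1, b"GET / HTTP/1.1\r\n"),
    Packet("10.0.0.1", 4321, 1, b"GET / HTTP/1.1\r\n"),          # retransmission
]
streams = rebuild_streams(packets)
lines = header_lines(streams[("10.0.0.1", 4321)])
```

Here `lines` holds the request line and the `Host` header in order, even though the segments arrived out of order and one was resent.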
Request-Response Reconstruction
Module (continued)
Each entry in the Transaction Log includes:
The client IP address
A unique flow ID for TCP connection
The requested URL
The content type
The payload size
The referer field
The via field
Whether the request was aborted
The number of packets resent in the response
The corresponding timestamps
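One way to picture a Transaction Log entry is as a small record type. The sketch below is illustrative only: the field names are assumptions, and because the slide does not enumerate the timestamps, two hypothetical ones are included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TransactionLogEntry:
    # Field names are illustrative; the slide lists the contents, not a schema.
    client_ip: str
    flow_id: int              # unique per TCP connection
    url: str
    content_type: str
    payload_size: int         # bytes
    referer: Optional[str]    # None when the request carried no Referer header
    via: Optional[str]        # None when no Via header was present
    aborted: bool
    resent_packets: int       # packets resent in the response
    request_ts: float         # assumed timestamp: request observed (seconds)
    response_end_ts: float    # assumed timestamp: last response byte observed


entry = TransactionLogEntry(
    client_ip="10.0.0.1", flow_id=1, url="/index.html",
    content_type="text/html", payload_size=2048, referer=None, via=None,
    aborted=False, resent_packets=0, request_ts=0.0, response_end_ts=0.35,
)
```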
Page Reconstruction Module
To measure the client perceived end-to-end response time for
retrieving a web page, we need to group the objects in a web
page access
We use a two-pass heuristic method and a statistical filtering mechanism
to reconstruct the different client page accesses
First pass: EtE monitor uses the HTTP requests with referer field to
build a Knowledge Base of web pages and their embedded objects
Second pass:
• EtE monitor reconstructs the page accesses without referer field
using the Knowledge Base of web pages and some additional
heuristics
• EtE monitor uses statistical analysis to identify valid access patterns
and filter the accesses grouped incorrectly
Example
Example of an initial HTML file request and the following embedded object request
with the corresponding referer field:
First Pass: Client Access Table
EtE monitor stores web page access information in a hash table
keyed by client IP address:
• If the content type is text/html, a new web page entry is created in the
Web Page Table
• For other types, the request URL is inserted according to its referer field
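The first pass can be sketched roughly as follows. This is a simplified illustration, not EtE's implementation: the tuple layout, the dictionary shape of a page access, and the assumption that referer values are already normalized to the same form as request URLs are all hypothetical.

```python
from collections import defaultdict


def first_pass(transactions):
    """Build a Client Access Table: client IP -> list of page accesses.

    Each transaction is a (client_ip, url, content_type, referer) tuple;
    each page access is a dict {"page": url, "objects": [urls]}.
    """
    table = defaultdict(list)
    for client_ip, url, content_type, referer in transactions:
        pages = table[client_ip]
        if content_type == "text/html":
            # A new HTML document starts a new web page entry.
            pages.append({"page": url, "objects": []})
        elif referer is not None:
            # Attach the object to the latest access of the page its referer names.
            for access in reversed(pages):
                if access["page"] == referer:
                    access["objects"].append(url)
                    break
        # Requests without a referer are left for the second pass.
    return dict(table)


log = [
    ("10.0.0.1", "/index.html", "text/html", None),
    ("10.0.0.1", "/img1.jpg", "image/jpeg", "/index.html"),
    ("10.0.0.1", "/img2.jpg", "image/jpeg", None),
]
table = first_pass(log)
```

In this example `/img1.jpg` is attached to the `/index.html` access, while `/img2.jpg` has no referer and is not placed.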
Building a Knowledge Base of Web Pages
From the Client Access Table, EtE monitor determines the content
template of any given web page as a combined set of all objects that
appear in all access patterns for this page
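As a rough illustration, the content template can be computed as a set union over all observed accesses to a page. The table layout below (client IP mapped to a list of page-access dictionaries) is an assumption made for this sketch.

```python
from collections import defaultdict


def build_knowledge_base(client_access_table):
    """Map each page URL to the union of objects seen in any access to it."""
    kb = defaultdict(set)
    for accesses in client_access_table.values():
        for access in accesses:
            kb[access["page"]].update(access["objects"])
    return dict(kb)


table = {
    "10.0.0.1": [{"page": "/index.html", "objects": ["/a.gif", "/b.gif"]}],
    "10.0.0.2": [{"page": "/index.html", "objects": ["/b.gif", "/c.gif"]}],
}
kb = build_knowledge_base(table)
```

Neither client fetched all three objects, yet the template for `/index.html` contains `/a.gif`, `/b.gif`, and `/c.gif`.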
Second Pass: Reconstruction of Web
Page Accesses
With the help of the Knowledge Base, EtE monitor processes the
entire Transaction Log again, and creates a new Client Access
Table
This time it processes the objects without referer field:
EtE monitor consults the Knowledge Base while checking all the page
entries in the Web Page Table to find the page an object might be
embedded in, and appends it at the end of that page
If, according to the Knowledge Base, none of the page entries in the
Web Page Table contains the object, then:
• EtE monitor searches for a page accessed with the same flow ID and
appends the object to it
• Otherwise, it appends the object to the most recently accessed page (a
configurable think-time threshold is additionally used to delimit web pages)
• If the think-time threshold is exceeded, the object is dropped
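The placement rules for an object without a referer field can be sketched as a three-step lookup. Everything here is a hedged illustration: the dictionary layouts, the helper names, and the 4-second default for the think-time threshold are assumptions (the slide says only that the threshold is configurable).

```python
def _latest_match(pages, predicate):
    """Return the most recent page access satisfying predicate, or None."""
    for access in reversed(pages):
        if predicate(access):
            return access
    return None


def place_object(pages, obj, kb, think_time=4.0):
    """Attach an object without a referer to one of a client's page accesses.

    Returns the chosen page access, or None if the object is dropped.
    """
    # 1. Knowledge Base: a page whose content template lists the object.
    target = _latest_match(pages, lambda a: obj["url"] in kb.get(a["page"], ()))
    # 2. A page accessed over the same TCP connection (flow ID).
    if target is None:
        target = _latest_match(pages, lambda a: obj["flow_id"] in a["flow_ids"])
    # 3. The latest accessed page, unless the think time is exceeded.
    if target is None and pages and obj["ts"] - pages[-1]["last_ts"] <= think_time:
        target = pages[-1]
    if target is None:
        return None  # dropped
    target["objects"].append(obj["url"])
    target["flow_ids"].add(obj["flow_id"])
    target["last_ts"] = max(target["last_ts"], obj["ts"])
    return target


kb = {"/index.html": {"/logo.gif"}}
pages = [
    {"page": "/index.html", "objects": [], "flow_ids": {1}, "last_ts": 0.0},
    {"page": "/news.html", "objects": [], "flow_ids": {2}, "last_ts": 1.0},
]
a = place_object(pages, {"url": "/logo.gif", "flow_id": 3, "ts": 1.2}, kb)
b = place_object(pages, {"url": "/x.gif", "flow_id": 2, "ts": 1.5}, kb)
c = place_object(pages, {"url": "/y.gif", "flow_id": 9, "ts": 60.0}, kb)
```

The first object is placed through the Knowledge Base, the second through its flow ID, and the third is dropped because it arrives long after the last activity on the latest page.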
Identifying Valid Accesses Using
Statistical Analysis of Access Patterns
Although the above two-pass process is very efficient, there
could still be some accesses grouped incorrectly
We use a statistical analysis to better approximate the actual
content of web pages and filter out the incorrectly constructed
accesses
Metrics to Measure Web Service
Performance
Response time metrics
End-to-end response time observed by the client for a web page download
Latency breakdown: server related and network related portions
Connection set-up time
Metrics evaluating web service caching efficiency
Server file hit ratio
Server byte hit ratio
Aborted pages and QoS
Why the accesses are aborted:
• Bad performance?
• Client browsing patterns?
Example: 1-object page retrieval
(basic timestamps)
Latency Breakdown for Multiple Concurrent
Connections: Server Processing vs Network
Metrics Evaluating Web Service Caching
Efficiency
Original web page url1 (page template):
• 10 objects,
• 100 Kbytes.
Access to url1: Acc1
• 5 objects,
• 70 Kbytes.
Access to url1: Acc2
• 7 objects,
• 80 Kbytes.
FileHitRatio(Acc1) = 5/10 = 50%,
ByteHitRatio(Acc1) = 70/100 = 70%,
FileHitRatio(Acc2) = 7/10 = 70%,
ByteHitRatio(Acc2) = 80/100 = 80%,
ServerFileHitRatio(url1) = (5/10 + 7/10) / 2 = 60%,
ServerByteHitRatio(url1) = (70/100 + 80/100) / 2 = 75%
The smaller, the better!
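The arithmetic of this example can be checked with a few lines of code. The function name and the (object count, Kbytes) pair layout are hypothetical; the numbers are the ones from the slide.

```python
def hit_ratios(template, accesses):
    """Average per-access file and byte hit ratios for one page.

    template and each access are (object_count, kbytes) pairs.
    """
    t_objs, t_kb = template
    file_ratios = [objs / t_objs for objs, _ in accesses]
    byte_ratios = [kb / t_kb for _, kb in accesses]
    n = len(accesses)
    return sum(file_ratios) / n, sum(byte_ratios) / n


# url1 template: 10 objects, 100 Kbytes; Acc1 and Acc2 as on the slide.
server_file, server_byte = hit_ratios((10, 100), [(5, 70), (7, 80)])
print(f"{server_file:.0%} {server_byte:.0%}")  # prints "60% 75%"
```

A lower ratio means more of the page was served from browser or network caches rather than from the server.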
Case Studies
HPL external site (HPL)
From July 12, 2001 to August 11, 2001
The site has mostly static content
Open View Support site (Support)
From October 11, 2001 to October 25, 2001
The site uses JavaServer Pages technology for dynamic page
generation
Sites Statistics At-A-Glance
HPLabs Site Case Study
HPL site during a month (accesses to index.html page)
• The figure shows the EtE time for index.html on an hourly scale over a month
• In spite of overall good performance, hourly averages reflect significant
variation in response time observed by the clients
• Periods of increased latency correspond to weekends!
What is the problem?
Understanding the Client Population
• Resent packets typically reflect network congestion or network-related bottlenecks
• Periods of increased resent packets correspond to weekends
• The explanation: the client population significantly “changes” during weekends
• On weekends, most of the clients access the web site from home via low-bandwidth connections
It is extremely important to understand the client population!
The active probing approach, which uses artificial clients (typically with “good”
connections to the Internet), lacks this information
Performance Analysis of Accesses to
itanium.html
First Figure:
• Number of accesses to itanium.html page
• It is the most popular page at the beginning of the study but drops to
7th place after 10 days
Second Figure
• Percentage of accesses above 6 sec to itanium.html page
• Question: why is the latency observed by the clients getting higher?
Caching Efficiency of the Page
As the page becomes less popular (“colder”), the number of objects and bytes retrieved
from the original server increases significantly, i.e., fewer network caches store the
page-related objects
It translates into increased response time observed by the client
Active probing technique cannot reflect the caching efficiency of the site
The tools based on instrumentation technique cannot provide insight into this problem either
Client Clustering by ASes
• Clients grouped by ASes show a heavy-tailed distribution
• These figures allow us to see large client clusters and their corresponding
end-to-end response time
• The ability of EtE monitor to measure performance metrics for a certain group of clients
is particularly attractive for Service Providers to validate required SLAs
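A minimal sketch of per-AS aggregation, assuming a prefix-to-AS table is available. The slides do not say how EtE maps client IP addresses to ASes, so the `PREFIX_TO_AS` table, its documentation-range prefixes and AS numbers, and the function names are all hypothetical.

```python
import ipaddress
from collections import defaultdict

# Hypothetical prefix-to-AS table; real mappings would come from routing data.
PREFIX_TO_AS = {
    ipaddress.ip_network("192.0.2.0/24"): 64500,
    ipaddress.ip_network("198.51.100.0/24"): 64501,
}


def as_of(ip):
    """Return the AS number of the longest matching prefix, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [net for net in PREFIX_TO_AS if addr in net]
    if not matches:
        return None
    return PREFIX_TO_AS[max(matches, key=lambda net: net.prefixlen)]


def mean_response_by_as(samples):
    """samples: iterable of (client_ip, response_time_seconds) pairs."""
    times = defaultdict(list)
    for ip, rt in samples:
        times[as_of(ip)].append(rt)
    return {asn: sum(v) / len(v) for asn, v in times.items()}


stats = mean_response_by_as([
    ("192.0.2.10", 1.0), ("192.0.2.20", 3.0), ("198.51.100.7", 8.0),
])
```

Grouping this way makes it possible to compare the end-to-end response time seen by one client cluster against another.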
Validation Experiments
We performed two groups of experiments
To validate the accuracy of EtE measurements
To evaluate the page access reconstruction power of EtE
• How dependent are the reconstruction results on the
existence of referer field information?
The results are encouraging:
EtE provides a very close approximation of the response time
EtE monitor does a good job of page reconstruction even when
the requests do not have any referer field!
However, the two-pass heuristic method and statistical filtering mechanism
we use to reconstruct page accesses increase the number of
reconstructed pages by about 20-30%
Limitations
EtE monitor is not appropriate for sites that encrypt much of their data
(e.g., via SSL)
EtE monitor is not appropriate for sites that “outsource” most of their
content to CDNs
A similar limitation applies to pages with “mixed” content, where a portion of
the page is served from other remote sites. In this case, EtE will
measure the response time only for local site content
For clients behind a proxy, EtE monitor measures the
response time as observed from the proxy
Since the tool is based on heuristics and statistics to reconstruct the
page content, the best results are obtained when the sample size is
large enough
Dynamically generated content creates additional challenges for EtE
monitor (typical for other analysis tools too): a configuration file provided
by a site administrator is needed
Conclusion and Future Work
Understanding performance characteristics of Internet
services is critical to evolving and engineering the web
services to match:
Changing demand levels
Client populations
Global network characteristics
EtE monitor, based on a novel technique, offers a number of
benefits unavailable from other tools and by other means.
EtE monitor can be extended to work in “almost real-time” to
provide timely information about web services and their
performance.
Extended analysis on client clustering will provide an
opportunity to use the information from EtE monitor for
intelligent decision making on service placement and
service optimization
Acknowledgements
The tool and the study would not have been possible without the generous
help of our HP colleagues:
HPLabs team:
• Mike Rodriquez, Annabelle Eseo, and Peter Haddad
HPO, Managed Web Services:
• Guy Mathews
OpenView team:
• Steve Yonkaitis, Bob Husted, Norm Follett, and Don Reab
US support team
• Claude Villermain, Vincent Rabiller, Pierre-Emmanuel Delforge
Their help is highly appreciated!