Web Performance
• Why do we care?
• What is performance?
– User Experience
– Web Server
– Network
• How can we tell how we are doing?
• What good is it?
Why do we care?
“Twenty-eight percent of shoppers who have suffered failed performance attempts
said they stopped shopping at the web site where they had problems, and six
percent said they stopped buying at that particular company’s off-line store.”
(Boston Consulting Group, quoted in Infoworld / Computerworld 3/00)
“It takes only 8 ½ seconds for half of the subjects to [give up]” (Peter Bickford, “Worth
the Wait?” in Netscape/View Source Magazine 10/97)
“Perhaps as much as $4.35 billion in e-commerce sales in the U.S. may be lost each
year due to unacceptable download speeds and resulting user bailout behaviors.”
(Zona Research 4/99)
“Fifty-eight percent of online customers surveyed indicated quick download time as a
key factor in determining whether they would return to a web site.” (Forrester
Research 1/99)
“One of the top three reasons cited by online shoppers for dissatisfaction with a web
site is slow site performance.” (Jupiter Communications / NFO Worldwide 1/99)
“At one site, the abandonment rate fell from 30% to 6-8% because of a one second
improvement in load time.” (Zona Research 4/99)
Effects of Poor Performance
• Lost prospective customer
– If the site didn’t work, or took too long, your prospect may not
return for a long time – if ever.
• Lost sale
– If your competitor’s site was up and responsive, you may have
lost a single sale.
• Lost customer
– If this happens repeatedly, you’ve lost a customer,
– AND the customer may stop going to associated web sites and
physical locations!
• Lost reputation
– People talk about poor performance; word spreads.
– People are looking for a few good sites that they can trust!
What is performance?
• User Experience
– How fast does the page load?
– How available is the site?
• Web Server
– How many requests/second can be served?
• throughput
– What is the effect of web proxies?
• Network
– What is the network performance?
• Latency, bandwidth
Network Performance
• At the network level, performance can be
measured in terms of:
– Latency
• How long it takes a message to travel from one end of
the network to the other
– Bandwidth
• The number of bits that can be transmitted over the
network in a certain period of time
Network Performance Measures
• Overhead: latency of the network interface (the processor is busy) vs. Latency: delay in the network itself
[Figure: Universal Performance Metrics. Timeline from sender to receiver: Sender Overhead (processor busy), then Time of Flight and Transmission time (size ÷ bandwidth), which together form the Transport Latency, then Receiver Overhead (processor busy). The whole span is the Total Latency.]
Total Latency = Sender Overhead + Time of Flight +
Message Size ÷ BW + Receiver Overhead
Message Size in the bandwidth term should include the header/trailer, since those bits are transmitted too.
Total Latency Example
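• Bandwidth 1000 Mbit/s; sending overhead 80 µs; receiving overhead 100 µs
• A 10,000-byte message (including the header); the network allows 10,000 bytes in a single message
• 3 situations: distance 1000 km vs. 0.5 km vs. 0.01 km
• Speed of light ≈ 300,000 km/s (0.3 km/µs); about half that in the transmission medium
• $\text{Latency}_{0.01\,\text{km}} = 80 + \frac{0.01\,\text{km}}{0.5 \times 0.3\,\text{km}/\mu\text{s}} + \frac{10000 \times 8\,\text{bits}}{1000\,\text{bits}/\mu\text{s}} + 100 \approx 260\,\mu\text{s}$
• $\text{Latency}_{0.5\,\text{km}} = 80 + \frac{0.5\,\text{km}}{0.5 \times 0.3\,\text{km}/\mu\text{s}} + \frac{10000 \times 8\,\text{bits}}{1000\,\text{bits}/\mu\text{s}} + 100 \approx 263\,\mu\text{s}$
• $\text{Latency}_{1000\,\text{km}} = 80 + \frac{1000\,\text{km}}{0.5 \times 0.3\,\text{km}/\mu\text{s}} + \frac{10000 \times 8\,\text{bits}}{1000\,\text{bits}/\mu\text{s}} + 100 \approx 6927\,\mu\text{s}$
• Long time of flight => complex WAN protocols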
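The total-latency arithmetic can be checked with a short Python sketch. The code encodes Total Latency = Sender Overhead + Time of Flight + Message Size ÷ BW + Receiver Overhead and uses the example's parameters as defaults. The function name `total_latency_us` and the `media_fraction` parameter are illustrative choices. `media_fraction` is the signal speed in the medium as a fraction of the speed of light.

```python
# Sketch of the total-latency formula, with all times in microseconds.
# Assumption: 1 Mbit/s == 1 bit per microsecond, so a bandwidth given in
# Mbit/s can be used directly as bits/us.

SPEED_OF_LIGHT_KM_PER_US = 0.3  # ~300,000 km/s


def total_latency_us(distance_km: float,
                     size_bytes: int = 10_000,
                     bandwidth_mbit_s: float = 1000.0,
                     send_overhead_us: float = 80.0,
                     recv_overhead_us: float = 100.0,
                     media_fraction: float = 0.5) -> float:
    """Sender overhead + time of flight + size / bandwidth + receiver overhead."""
    time_of_flight = distance_km / (media_fraction * SPEED_OF_LIGHT_KM_PER_US)
    transmission = size_bytes * 8 / bandwidth_mbit_s  # bits / (bits per us)
    return send_overhead_us + time_of_flight + transmission + recv_overhead_us


for d in (0.01, 0.5, 1000):
    print(f"{d:>7} km: {total_latency_us(d):8.1f} us")
```

Running it prints about 260.1, 263.3 and 6926.7 µs. Only the time-of-flight term depends on distance, which is why it dominates at 1000 km.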
So What?
• Long distance = long time of flight
– Servers should be as close as possible to clients
• Low bandwidth = long message transmission time
– Servers should have high bandwidth links
• High overhead = long total latency
– Reduce the communication overhead as much as
possible
– Fast TCP implementation
– More memory
The Internet
[Figure: a browser connects through its access provider (access devices, DNS, cache) and routers to the Internet. ISPs such as PSInet, Digex, BBN, Verio, UUnet, Sprint, GTE, Worldcom, and Mindspring interconnect at peering points on the way to the web server.]
The Internet & Performance
• Routers
– Read packet headers and forward each packet toward its destination
– Each hop adds delay
• ISP Peering
– Congestion may occur at peering points
– The end-to-end route in one direction may differ from the route in the other direction
[Figure: routers in ISP “A” and ISP “B” connected at a peering point]
The Internet & Performance
• Network Connection
– Performance of connection to ISP is
generally a limiting factor
• ISP Services
– Domain Name Service (DNS)
• Each time a request is made, the server
name must be translated into an IP
address
• Name Caching
– DNS server retains addresses until
“time to live” has passed
– Client machine may also cache names
for a short period of time
– Web Proxies
• Cache most frequently accessed pages
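• Zipf’s law: a few popular pages receive most of the requests, so caching them serves a large share of the traffic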
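Name caching with a time to live can be sketched as a small table that stores each address together with its expiry time. This is a hypothetical illustration: `NameCache` is not a real library class. The injected `resolver` stands in for a real DNS query that returns an address and a TTL; the standard `socket` lookup functions do not report TTLs. The clock is also injectable, so expiry can be shown without waiting.

```python
import time
from typing import Callable, Dict, Tuple


class NameCache:
    """Hypothetical name cache: keeps each address until its TTL expires."""

    def __init__(self, resolver: Callable[[str], Tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic):
        self._resolver = resolver  # returns (ip_address, ttl_seconds)
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (ip, expiry)
        self.lookups = 0           # times the resolver was actually called

    def resolve(self, name: str) -> str:
        now = self._clock()
        entry = self._entries.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]        # cache hit: no DNS round trip
        ip, ttl = self._resolver(name)  # miss, or the TTL has passed
        self.lookups += 1
        self._entries[name] = (ip, now + ttl)
        return ip


# Usage with a stand-in resolver and a fake clock (both hypothetical).
fake_now = [0.0]
cache = NameCache(lambda name: ("192.0.2.10", 60.0), clock=lambda: fake_now[0])
cache.resolve("www.example.com")  # miss: resolver called
fake_now[0] = 30.0
cache.resolve("www.example.com")  # hit: still within the 60 s TTL
fake_now[0] = 61.0
cache.resolve("www.example.com")  # TTL expired: resolver called again
print(cache.lookups)              # 2
```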
Web Server Performance
• Throughput: Requests per second
• How do you measure?
– Live
• May be too late…
– Offline
• Replay logs - does the past characterize the future?
• Synthetic Workload - does it characterize reality?
• “...factoring out I/O, the primary determinant to
server performance is the concurrency strategy.”
– JAWS: Understanding High Performance Web Systems
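When logs are replayed, throughput is usually derived from request timestamps. The sketch below is minimal and assumes the timestamps (in seconds) have already been parsed out of the access log. `throughput` is a hypothetical helper. It reports both the mean and the peak rate, because problems may only appear at the peak.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def throughput(timestamps: Iterable[float]) -> Tuple[float, int]:
    """Return (mean, peak) requests per second for request times in seconds."""
    ts = sorted(timestamps)
    if not ts:
        return 0.0, 0
    # Bucket each request into the whole second in which it arrived.
    per_second = Counter(math.floor(t) for t in ts)
    # Count idle seconds too, so the mean covers the whole observation window.
    span = math.floor(ts[-1]) - math.floor(ts[0]) + 1
    return len(ts) / span, max(per_second.values())


print(throughput([0.1, 0.5, 0.9, 1.2, 3.7]))  # (1.25, 3)
```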
Applications of Workload Models
• Identify Performance Problems
– Problems may only occur under high load
• Benchmark Web Components
– Deployment decisions
– Evaluate new features
• Capacity Planning
– Determine network, memory, disk and
clustering needs
Web Workload Characterization
• Based on the results of numerous studies
• Key properties
– HTTP Message Characteristics
• Several request methods and response codes
– Resource Characteristics
• Diverse content type, size, popularity, and modification frequency
– User Behavior
• User browsing habits significantly affect workload
• Workload parameters by category
– Protocol: request method, response code
– Resource: content type, resource size, response size, popularity, modification frequency, temporal locality, # embedded resources
– Users: session interarrival times, # clicks per session, request interarrival times
Parameter Characterization
• Associate each parameter with
quantitative values
• Statistics
– Mean, median, mode
• OK for parameters that don’t vary much
– Probability Distributions
• Capture how a parameter varies over a wide
range of values
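A small sample shows why a single summary statistic can mislead for web parameters that vary widely. The response sizes below are hypothetical. One large response pulls the mean up to about 15.9 KB, while the median and mode stay at 2 KB.

```python
import statistics

# Hypothetical response sizes in KB: mostly small pages plus one large download.
sizes_kb = [1, 1, 2, 2, 2, 3, 100]

mean = statistics.mean(sizes_kb)      # pulled up by the single 100 KB response
median = statistics.median(sizes_kb)  # middle value of the sorted sample
mode = statistics.mode(sizes_kb)      # most frequent value

print(f"mean={mean:.1f} KB, median={median} KB, mode={mode} KB")
# mean=15.9 KB, median=2 KB, mode=2 KB
```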
Probability Distribution
• Every random variable gives rise to a probability distribution
• Probability Density Function (PDF)
– Integrating the PDF over an interval of the real numbers gives the probability that the variable falls in that interval
• Cumulative Distribution Function (CDF)
– Describes the probability distribution of a real-valued random variable X
– F(x) = P(X ≤ x)
– The probability that the random variable is less than or equal to x
• In the following slides, we will show the CDF of commonly used distributions
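When a distribution is estimated from measurements rather than assumed, F(x) = P(X ≤ x) can be computed directly from a sample. It is the fraction of observations at or below x, known as the empirical CDF. The sketch below is minimal and uses the standard `bisect` module. `empirical_cdf` is a hypothetical helper name.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def empirical_cdf(sample: Sequence[float]) -> Callable[[float], float]:
    """Return F where F(x) is the fraction of observations <= x."""
    if not sample:
        raise ValueError("sample must not be empty")
    data = sorted(sample)
    n = len(data)

    def F(x: float) -> float:
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(data, x) / n

    return F


F = empirical_cdf([3, 1, 2, 2])
print(F(1), F(2), F(10))  # 0.25 0.75 1.0
```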
Poisson Distribution
• $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$, where λ is the average number of events per interval
• Used to model the number of independent events that occur in a fixed interval when events happen at a constant average rate
• The number of times a web server is accessed per minute can be modeled by a Poisson distribution
– For instance, the number of edits per hour recorded on
Wikipedia's Recent Changes page follows an approximately
Poisson distribution.
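Poisson probabilities can be evaluated directly from the formula. The sketch below is minimal. The helper names are hypothetical, and so is the example rate of 10 requests per minute. The example shows how one might estimate how often a minute exceeds some load.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) = lam**k * e**(-lam) / k! for a Poisson variable with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k), summing the PMF from 0 to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


# Hypothetical server that averages 10 requests per minute:
# how likely is a minute with more than 15 requests?
print(f"P(X > 15) = {1 - poisson_cdf(15, 10):.4f}")
```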
Exponential Distribution
• $F(x) = 1 - e^{-\lambda x}$ for $x \ge 0$
• Used to model the time until the next occurrence of an event in a Poisson process with rate λ
• Session interarrival times are exponential
– Time between the start of one user session and the start
of the next user session
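Simulation shows how the two distributions are linked. If the gaps between session starts are drawn from an exponential distribution with rate λ, the number of sessions starting in each minute behaves like a Poisson count. Its mean and variance are both close to λ. The sketch uses the standard library's `random.expovariate`. The rate of 5 sessions per minute and the helper name are assumptions for illustration.

```python
import random
import statistics


def session_start_times(rate_per_min: float, minutes: int, seed: int = 1) -> list:
    """Session start times (in minutes) with exponential gaps of mean 1/rate."""
    rng = random.Random(seed)
    t, starts = 0.0, []
    while True:
        t += rng.expovariate(rate_per_min)  # exponential interarrival time
        if t >= minutes:
            return starts
        starts.append(t)


MINUTES = 10_000
starts = session_start_times(rate_per_min=5.0, minutes=MINUTES)

# Count how many sessions began in each whole minute.
counts = [0] * MINUTES
for s in starts:
    counts[int(s)] += 1

# For a Poisson count, the mean and variance should both be close to 5.
print(statistics.mean(counts), statistics.pvariance(counts))
```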
Pareto Distribution
• $F(x) = 1 - (a/x)^k$ for $x \ge a$
• k is the shape parameter; a is the minimum value of x
• Power law
• 80-20 rule
– 20% of the sample is responsible for 80% of the results
• Response sizes, Resource sizes, Number of
embedded images, Request interarrival times
• Often used to model self-similar patterns
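The 80-20 rule holds exactly only for one particular shape. For a Pareto population with k > 1, the largest fraction p of the items accounts for a share p^(1 − 1/k) of the total. This follows from the Pareto distribution's Lorenz curve. The share equals 0.8 at p = 0.2 when k = log₄ 5 ≈ 1.16. A minimal sketch with hypothetical helper names:

```python
import math


def pareto_cdf(x: float, a: float, k: float) -> float:
    """F(x) = 1 - (a / x)**k for x >= a, and 0 below the minimum a."""
    return 0.0 if x < a else 1.0 - (a / x) ** k


def top_share(p: float, k: float) -> float:
    """Share of the total held by the largest fraction p of a Pareto population (k > 1)."""
    return p ** (1.0 - 1.0 / k)


k_80_20 = math.log(5) / math.log(4)       # about 1.16
print(round(top_share(0.2, k_80_20), 3))  # 0.8
print(round(top_share(0.2, 2.0), 3))      # 0.447
```

With k = 2, the top 20% holds only about 45% of the total, so the fitted shape parameter matters when modeling sizes.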
Probability Distributions in Web Workload Models
• Exponential: session interarrival times
• Pareto: response sizes, resource sizes, number of embedded images, request interarrival times
• Lognormal: response sizes, resource sizes, temporal locality
• Zipf-like: resource popularity
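Zipf-like popularity is what makes caching the most frequently accessed pages effective. The sketch assumes request probability proportional to 1 / rank^s. The site size of 1000 resources and s = 1 are hypothetical values. Under these assumptions, the 100 most popular resources (10%) receive about 69% of requests.

```python
def zipf_top_share(n_resources: int, top: int, s: float = 1.0) -> float:
    """Fraction of requests going to the `top` most popular of n resources,
    assuming request probability proportional to 1 / rank**s (Zipf-like)."""
    weights = [1.0 / rank ** s for rank in range(1, n_resources + 1)]
    return sum(weights[:top]) / sum(weights)


print(round(zipf_top_share(1000, 100), 3))  # 0.693
```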
Probability Distribution Conversion
• Most languages have random number library functions
– These produce a uniform distribution
• Must convert from the uniform distribution to the chosen distribution
• Given: the cumulative distribution function (CDF) of the chosen distribution
– 1. Generate a uniform random number between 0 and 1; call this number p
– 2. Compute x such that CDF(x) = p
• This requires the inverse of the CDF
– 3. x is the random number you use
Inverse of the CDF
• For the exponential distribution, solving $p = 1 - e^{-\lambda x}$ for x gives $x = F^{-1}(p) = -\frac{\ln(1 - p)}{\lambda}$
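The three conversion steps can be sketched in Python, with the standard `random` module as the uniform source. Each inverse CDF comes from solving F(x) = p for x. For the exponential this gives x = −ln(1 − p)/λ. For the Pareto CDF F(x) = 1 − (a/x)^k it gives x = a(1 − p)^(−1/k). Function names and parameter values are illustrative. Because `random.random()` returns values in [0, 1), using 1 − p inside the formulas keeps the argument strictly positive.

```python
import math
import random


def exponential_inverse_cdf(p: float, lam: float) -> float:
    """Solve p = 1 - exp(-lam * x) for x (valid for 0 <= p < 1)."""
    return -math.log(1.0 - p) / lam


def pareto_inverse_cdf(p: float, a: float, k: float) -> float:
    """Solve p = 1 - (a / x)**k for x (valid for 0 <= p < 1)."""
    return a * (1.0 - p) ** (-1.0 / k)


rng = random.Random(42)
# Step 1: uniform p in [0, 1); steps 2-3: map it through the inverse CDF.
gaps = [exponential_inverse_cdf(rng.random(), lam=2.0) for _ in range(100_000)]
sizes = [pareto_inverse_cdf(rng.random(), a=1.0, k=1.2) for _ in range(5)]
print(sum(gaps) / len(gaps))  # close to the exponential mean 1 / lam = 0.5
```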
User Experience
• 8-second rule
– Probably 4 seconds today
• Typical page
– Multiple requests
• Example
– Page has 20 elements
– To deliver all 20 within 4 seconds, the server must be capable of 5 requests/second (20 ÷ 4) for each user
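The example's arithmetic generalizes to a rough capacity estimate. The sketch rests on simplifying assumptions: every page element is a separate request, and all users load pages at the same time. It ignores connection reuse, caching, and parallel downloads. `required_request_rate` and `concurrent_users` are hypothetical names.

```python
def required_request_rate(elements_per_page: int, target_seconds: float,
                          concurrent_users: int = 1) -> float:
    """Requests/second the server must sustain so every user's page loads in time."""
    return elements_per_page / target_seconds * concurrent_users


print(required_request_rate(20, 4))       # 5.0
print(required_request_rate(20, 4, 100))  # 500.0
```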
User Experience Performance Tips
• Check for web standards compliance
• Minimize the use of JavaScript and style sheets
• Turn off reverse DNS lookups on the server
• Get more memory
• Index your database tables
• Make fewer database queries
• Decrease the number of page components
• Decrease the size of each component
• Minimize Perceived Delay
– Give the viewer something to look at while the page is
loading
Website Analysis
• Websites quickly become large and
difficult to test and optimize
• Use tools
– Workload generators
• WebStone
• JMeter
– Site analysis - log files
• Webalizer
JMeter