
An Overview of Proxy Caching Algorithms
Haifeng Wang
Web Caching
• Internet traffic
• Load on web servers
• Access delay
Web caching provides an efficient remedy to the latency and network traffic
problems by bringing documents closer to clients.
Web Caching Location
• Client Caching
• Server Caching
• Proxy Caching: the most widely used form
There are many benefits of proxy caching. It
reduces network traffic, the average latency of
fetching Web documents, and the load on
busy Web servers.
Web Caching Replacement Algorithm
• For effective use of the cache, an informed
decision has to be made about which document to
evict when the cache is saturated.
• The replacement policy is the key to the effectiveness
of proxy caches and to achieving a high hit ratio.
• Web cache replacement differs from memory page replacement. Why?
Characteristics of Web Caching
• Web caching is variable-size caching
• The cost of retrieving missed Web
documents from their original servers
depends on many factors.
• Web documents are frequently updated
• Zipf-like popularity of web documents
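The Zipf-like popularity pattern can be illustrated with a tiny simulation. This is only a minimal sketch with assumed parameters (1000 documents, exponent 0.8, 10000 requests, none of which come from the talk); it shows that a small fraction of documents attracts most requests.

```python
import random

# A Zipf-like request stream: the document at popularity rank r is requested
# with probability proportional to 1 / r**alpha. The parameter values below
# are illustrative assumptions, not figures from the talk.
def zipf_like_requests(num_docs=1000, num_requests=10000, alpha=0.8):
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_docs + 1)]
    return random.choices(range(num_docs), weights=weights, k=num_requests)

requests = zipf_like_requests()
top_decile_hits = sum(1 for doc in requests if doc < 100)  # 100 = top 10% of 1000 docs
print(f"Top 10% of documents served {100 * top_decile_hits / len(requests):.1f}% of requests")
```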
Key Parameters
There are four key parameters that most proxy
replacement policies consider in their design:
• Frequency information
• Recency information
• Document size
• Network cost
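As a minimal sketch, the four parameters could be tracked per cached document in a record like the one below; the field names and layout are illustrative assumptions, not part of the talk.

```python
from dataclasses import dataclass, field
import time

# Illustrative per-document metadata that a replacement policy could consult.
@dataclass
class CacheEntry:
    url: str
    size: int                 # document size in bytes
    cost: float               # network cost to re-fetch (e.g. download time in seconds)
    frequency: int = 0        # frequency information: number of requests seen
    last_access: float = field(default_factory=time.time)  # recency information

    def touch(self) -> None:
        # Update frequency and recency on every hit.
        self.frequency += 1
        self.last_access = time.time()
```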
Classification of Caching Policies
(Diagram: caching policies classified according to the traffic information they
consider, i.e. frequency and/or recency information. Policies shown include
LFU, Hyper-G, LRU, CERA, LRV, SLRU, GDS, Log2(SIZE), LRU-MIN, LRU-Threshold,
Hybrid, and SIZE.)
Replacement Algorithms (1)
1) LRU (Least-Recently-Used)
LRU evicts the least recently accessed document first.
2) LRU-Threshold
LRU-Threshold works the same way as LRU, except that documents
larger than a given threshold are never cached.
3) LRU-MIN
LRU-MIN gives preference to keeping small documents in the cache.
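A minimal sketch of LRU eviction follows, with an optional size threshold to illustrate the LRU-Threshold variant; the byte-based capacity, the OrderedDict layout, and the method names are assumptions made for illustration.

```python
from collections import OrderedDict

# Minimal LRU cache sketch keyed by URL; entries store (document, size).
class LRUCache:
    def __init__(self, capacity_bytes, size_threshold=None):
        self.capacity = capacity_bytes
        self.threshold = size_threshold   # LRU-Threshold: refuse oversized documents
        self.used = 0
        self.entries = OrderedDict()

    def get(self, url):
        if url not in self.entries:
            return None
        self.entries.move_to_end(url)     # mark as most recently used
        return self.entries[url][0]

    def put(self, url, document, size):
        if self.threshold is not None and size > self.threshold:
            return                        # never cache documents above the threshold
        if url in self.entries:
            self.used -= self.entries.pop(url)[1]
        while self.entries and self.used + size > self.capacity:
            _, (_, evicted_size) = self.entries.popitem(last=False)  # evict least recent
            self.used -= evicted_size
        self.entries[url] = (document, size)
        self.used += size
```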
Replacement Algorithms (2)
4) LFU (Least-Frequently-Used)
LFU evicts the least frequently accessed document first.
5) Hyper-G
Hyper-G is an extension of the LFU policy in which ties
are broken according to the last access time.
6) LLF (Lowest-Latency-First)
LLF uses the document download time as its primary key;
the document with the lowest download latency is evicted first.
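A sketch of how Hyper-G's victim selection could look: least frequently used first, with ties broken by the least recent access. The per-document tuples and example values are assumptions for illustration.

```python
# Hyper-G victim selection sketch: evict the least frequently used document,
# breaking frequency ties by the least recent access time.
# `cache` maps url -> (frequency, last_access_time).
def hyper_g_victim(cache):
    return min(cache, key=lambda url: (cache[url][0], cache[url][1]))

cache = {
    "/a.html": (3, 100.0),
    "/b.html": (1, 250.0),   # least frequent, accessed recently
    "/c.html": (1, 120.0),   # least frequent, accessed longest ago
}
print(hyper_g_victim(cache))  # /c.html wins the tie on the older last access
```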
Replacement Algorithms (3)
7) Size
Size evicts the largest documents first.
8) Log2-Size
Log2-Size uses the document size, bucketed by ⌊log2(size)⌋, as its primary key,
so larger documents are evicted first, with the last access time as a secondary key.
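A sketch of the Log2-Size ordering: ⌊log2(size)⌋ is the primary key (larger buckets evicted first) and the last access time is the secondary key. The sizes and timestamps below are made-up example values.

```python
import math

# Log2-Size victim selection sketch. Primary key: floor(log2(size)), larger
# buckets evicted first. Secondary key: least recently accessed within a bucket.
# `cache` maps url -> (size_bytes, last_access_time).
def log2_size_victim(cache):
    return max(cache, key=lambda url: (math.floor(math.log2(cache[url][0])),
                                       -cache[url][1]))

cache = {
    "/big.pdf":   (2_500_000, 300.0),   # same log2 bucket (21) as /huge.iso, older access
    "/huge.iso":  (3_000_000, 400.0),
    "/small.css": (2_000, 100.0),
}
print(log2_size_victim(cache))  # /big.pdf: largest bucket, least recently accessed
```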
Replacement Algorithms (4)
9) GDS (GreedyDual-Size)
The GDS algorithm associates a value H with
each cached page p. H is set to cost/size when
a document is accessed. When a replacement
needs to be made, the page with the lowest H
value, minH, is evicted, and all remaining pages
reduce their H values by minH. When a page is
accessed again, its H value is restored to cost/size.
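A sketch of GreedyDual-Size. Rather than subtracting minH from every remaining page on each eviction, it uses the common equivalent trick of keeping a running inflation value L and setting H = L + cost/size; the class interface and names are assumptions for illustration.

```python
# GreedyDual-Size sketch. Keeping a running inflation value L and setting
# H = L + cost/size is equivalent to subtracting minH from all pages on each
# eviction, but avoids touching every entry.
class GreedyDualSize:
    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.inflation = 0.0            # L: the H value of the last victim
        self.entries = {}               # url -> [H, size, cost]

    def access(self, url, size, cost):
        if url in self.entries:
            _, size, cost = self.entries[url]     # hit: reuse stored size/cost
        else:
            while self.entries and self.used + size > self.capacity:
                victim = min(self.entries, key=lambda u: self.entries[u][0])
                self.inflation = self.entries[victim][0]   # L = minH
                self.used -= self.entries[victim][1]
                del self.entries[victim]
            self.used += size
        # On both hit and miss, H is (re)set to L + cost/size.
        self.entries[url] = [self.inflation + cost / size, size, cost]
```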
Replacement Algorithms (5)
10) CERA (Cost-Effective-Replacement-Algorithm)
CERA uses a benefit value (BV) assigned to each object to
represent its importance in the cache. When the cache is full,
the object with the lowest BV is replaced.
BV = (Cost / Size) * Pr + Age
11) Hybrid
Hybrid aims to reduce the total latency. A function is
computed for each document, designed to capture the utility
of retaining that document in the cache. The document with
the smallest function value is then evicted.
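A sketch of CERA's eviction rule using BV = (Cost / Size) * Pr + Age. The talk does not define Pr or Age, so this assumes Pr is an estimated re-access probability and Age a per-object aging term (akin to the GDS inflation value); both are assumptions.

```python
# CERA-style eviction sketch: evict the object with the lowest benefit value
# BV = (cost / size) * pr + age. Here pr is assumed to be an estimated
# re-access probability and age a per-object aging term; both are assumptions.
def cera_victim(cache):
    # cache maps url -> (cost, size, pr, age)
    def benefit(url):
        cost, size, pr, age = cache[url]
        return (cost / size) * pr + age
    return min(cache, key=benefit)

cache = {
    "/a.html": (2.0, 10_000, 0.6, 0.0),     # small, likely to be re-accessed
    "/b.jpg":  (1.5, 200_000, 0.2, 0.0),    # large, cheap to re-fetch, rarely reused
}
print(cera_victim(cache))  # /b.jpg has the lowest BV
```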
Replacement Algorithms (6)
12) LRV (Lowest-Relative-Value)
LRV includes the cost and size in the calculation of a
value that estimates the utility of keeping a document in
the cache, and it evicts the document with the lowest value.
The calculation of the value is based on extensive
empirical analysis of trace data.
13) SLRU (Size-Adjusted LRU)
Documents are ordered by a ratio computed from frequency,
cost, and size; the document with the lowest ratio is
evicted first.
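A sketch of SLRU-style ordering by a value combining frequency, cost, and size; the exact ratio is not given in the talk, so cost * frequency / size is used here purely as an illustrative stand-in.

```python
# SLRU-style sketch: order documents by a value derived from frequency, cost
# and size, and evict the lowest. The formula (cost * frequency / size) is an
# illustrative assumption, not the ratio defined for SLRU in the literature.
def slru_victim(cache):
    # cache maps url -> (frequency, cost, size)
    return min(cache, key=lambda url: cache[url][1] * cache[url][0] / cache[url][2])

cache = {
    "/index.html": (10, 1.0, 5_000),      # popular, small: high value, kept
    "/video.mp4":  (1, 2.0, 8_000_000),   # rarely used, huge: low value, evicted
}
print(slru_victim(cache))  # /video.mp4
```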
Performance Issues
• There is no conclusion on which algorithm a proxy
should use.
• Document size is significant and needs to be
incorporated in the design of the replacement
policy.
• A good algorithm adjusts dynamically to
changes in the workload characteristics.