Web Caching
By
Amisha Thakkar
Alpa Shah
Overview
• What is a Web Cache?
• Caching Terminology
• Why use a cache?
• Disadvantages of Web Cache
• Other Features
• Caching Rules
Overview
• Caching Architectures
• Comparison of Architectures
• Cache Deployment Scheme
• Client Side Cache Cooperation
• Active Caching
What is a Web Cache?
• A cache is a place where temporary copies of
objects are stored
• Cached information is generally closer to
the requester than the permanent
information is
• Objects: HTML pages, images, files
What is a Web Cache?
Caching Terminology
• Client: an application program that
establishes connections for sending requests
• Server: an application program that accepts
connections to service requests by sending
back responses
• Origin server: the server on which the
given resource resides or is to be created
Caching Terminology
• Proxy: an intermediary program that acts
as both a server and a client, making requests
on behalf of other clients
• A proxy is not necessarily a cache
* A proxy does not always cache the replies
passing through it
* It may be used on a firewall to monitor
accesses
Why use a cache?
• To reduce latency
• To reduce network traffic
• To reduce load on origin servers
• To isolate end users from network failures
Disadvantages of Web cache
• With cached data there is always a chance
of receiving stale information
• Content providers lose access counts when
cache hits are served
• Manual configuration is often required
• Operation of cache requires additional
resources
• In some situations the cache can be a single
point of failure
Other Features
• Depending on the perspective, the following
may be good or bad:
* The cache makes requests on behalf of clients;
servers never see the clients' IP addresses
* A cache provides an easy opportunity to
monitor and analyze browsing activities
* A cache can be used to block certain
requests
Types of Web Caches
• Proxy caches
* Serve a large number of users
* Large corporations and ISPs often set
them up on their firewalls
* They are a type of shared cache
• Browser caches
* Use a section of the computer’s hard disk
to store objects that you have seen
Caching Rules
• Rules on which caches work
* Some are set in protocols
* Some are set by the cache administrator
• Most common rules:
* If the object is authenticated or secure, it
won’t be cached
* The object’s headers indicate whether the
object is cacheable or not
Caching Rules
* An object is considered fresh when
- It has an expiry time or other age-controlling
directive set and is still within the fresh period
- The browser cache has already seen
the object and has been set to check
once a session
Caching Rules
- A proxy cache has seen the object
recently and it was modified relatively
long ago
* Fresh documents are served directly from the
cache without checking with the origin server
Caching Rules
* For a stale object, the origin server is
asked to validate the object, that is, to tell the
cache whether the copy is still good
* The most common validator is the time
that the object was last changed
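The freshness and validation rules above can be sketched in Python. This is a minimal illustration, not any particular cache's implementation: the `CachedObject` fields, the 10% heuristic factor for "seen recently and modified long ago", and the `origin_last_modified` argument (standing in for a validation request to the origin server) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedObject:
    body: str
    fetched_at: float                   # when the cache stored the copy (seconds)
    last_modified: float                # validator: time the object last changed
    expires_at: Optional[float] = None  # explicit expiry time, if one was set


HEURISTIC_FACTOR = 0.1  # assumed: fresh for 10% of the object's age at fetch time


def is_fresh(obj: CachedObject, now: float) -> bool:
    """Fresh if within its expiry time, or seen recently and modified long ago."""
    if obj.expires_at is not None:
        return now < obj.expires_at
    age_when_fetched = obj.fetched_at - obj.last_modified
    return (now - obj.fetched_at) < HEURISTIC_FACTOR * age_when_fetched


def serve(obj: CachedObject, now: float, origin_last_modified: float) -> str:
    """Describe how a request is answered: from cache, validated, or refetched."""
    if is_fresh(obj, now):
        return "served from cache"
    # Stale copy: ask the origin whether the object changed since we stored it.
    if origin_last_modified <= obj.last_modified:
        return "validated, served from cache"
    return "refetched from origin"
```

A copy fetched at time 1000 and last modified at time 0 counts as fresh for the next 100 seconds under the assumed factor; after that the cache validates it against the origin's last-changed time.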
Caching Architectures
Hierarchical /Simple Cache
• Browser-cache interaction is the same as
browser-host interaction, i.e. a TCP
connection is made and the item is requested
• If the item is not found, the request is sent to the parent cache
• A hierarchy is built up, with each level indirectly
serving a wider community of users
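The parent-lookup behaviour described above can be sketched as follows. The class name `HierarchicalCache`, its in-memory `store`, and the `fetch_origin` callback are hypothetical names chosen for illustration; a real cache would speak HTTP over TCP rather than call a Python function.

```python
from typing import Callable, Dict, Optional


class HierarchicalCache:
    """One level of a cache hierarchy; misses are passed to the parent."""

    def __init__(self, name: str, parent: Optional["HierarchicalCache"] = None):
        self.name = name
        self.parent = parent
        self.store: Dict[str, str] = {}

    def get(self, url: str, fetch_origin: Callable[[str], str]) -> str:
        if url in self.store:
            return self.store[url]           # hit at this level
        if self.parent is not None:
            body = self.parent.get(url, fetch_origin)  # ask the next level up
        else:
            body = fetch_origin(url)         # top of the hierarchy: go to origin
        self.store[url] = body               # keep a copy on the way back down
        return body
```

With two institutional caches under one regional cache, a document requested through the first institution is already held at the regional level when the second institution asks for it, so the origin server is contacted only once.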
Caching Architectures
Hierarchical /Simple Cache
[Figure: cache hierarchy spanning national, regional, and institutional networks]
Caching Architectures
Distributed /Co-operating Cache
• Decentralized (cache mesh)
• Multiple servers cooperate by sharing
their individual caches to
create a large distributed one
• Simply put, caching proxies communicate
with each other to serve different users
• On a cache miss, a proxy checks with other proxy
caches before contacting the origin server
Caching Architectures
Distributed /Co-operating Cache
• Caches communicate amongst themselves
using a protocol like ICP (Internet Cache
Protocol)
• Caches can be selected on the basis of
* Distance from the end user
* Specialization in particular URLs (location
hint)
Caching Architectures
Distributed /Co-operating Cache
• Why distributed? Limitations of the hierarchy:
* Width of the hierarchy: caches at the
same level are inaccessible to each other
* LRU policy implies sufficient disk space
* Cost of replicating disk storage
* Amount of disk space required depends on the
number of users served and breadth of
reading
Caching Architectures
Distributed /Co-operating Cache
* More users -> more disk space needed higher
in the hierarchy
* Exponential growth in the number of
documents on the WWW
Caching Architectures
Distributed /Co-operating Cache
• Caching close to the user is more effective;
the higher the level, the lower the efficiency
• Can be created for load balancing
• Most effective when serving a community
of interests
Caching Architectures
Distributed /Co-operating Cache
• First, a UDP packet is sent for the cache inquiry
• The cache selection is determined by
RTT
• Potential problem: network congestion
because of UDP
• In favor:
* UDP exchange: 2 IP packets; TCP: at least
8 packets
Caching Architectures
Distributed /Co-operating Cache
* UDP reply from cache can indicate
a. Presence
b. Speed
c. Availability of requested documents
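The inquiry-and-select step can be sketched as a small simulation. It does not implement the real ICP message format or send UDP packets; the `Peer` class, its `rtt_ms` field, and the `choose_source` function are hypothetical stand-ins that model only the decision: among the peers that reply with a hit, pick the one with the lowest round-trip time, otherwise fall back to the origin server.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Peer:
    name: str
    rtt_ms: float                  # assumed: measured round-trip time of the inquiry
    available: bool = True         # a peer that is down never replies
    urls: Set[str] = field(default_factory=set)

    def query(self, url: str) -> Optional[bool]:
        """Simulated inquiry: True for a hit, False for a miss, None if no reply."""
        if not self.available:
            return None
        return url in self.urls


def choose_source(url: str, peers: List[Peer]) -> str:
    """Return the name of the fastest peer holding the URL, or "origin"."""
    hits = [p for p in peers if p.query(url) is True]
    if not hits:
        return "origin"
    return min(hits, key=lambda p: p.rtt_ms).name
```

A peer that holds the document but is unavailable is skipped, which reflects the point that the reply itself signals presence, speed, and availability.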
Caching Architectures
Hybrid Cache
Note: ICP
Comparison of Architectures
• Hierarchical: caches placed at multiple
levels
• Distributed: caches only at the bottom level; no
intermediate caches
Comparison of Architectures
• Performance parameters
* Connection time (Tc) is defined as the
time from when the document is requested until the first
data byte is received
* Transmission time (Tt) is defined as the
time taken to transmit the document
* Total latency = Tc + Tt
* Bandwidth usage
Comparison of Architectures
• Fig 3: Connection time for different
document popularities
Comparison of Architectures
• For unpopular documents, the connection
time is high
• As the number of requests increases, the average
connection time decreases
• For extremely popular documents,
distributed caching has smaller connection times
Comparison of Architectures
• Fig 4 Network traffic generated
Comparison of Architectures
• On lower levels, distributed caching
practically doubles the network bandwidth
usage
• Around the root node in national network,
the network traffic is reduced to half
• Distributed caching uses all possible
network shortcuts between institutional
caches, generating more traffic in the less
congested low network levels
Comparison of Architectures
• Fig 5 a, Not congested national network
Comparison of Architectures
• The only bottleneck on the path from the
client to the origin server is the international
path. Hence transmission times are similar
for both
Comparison of Architectures
• Fig 5 b Congested National Networks
Comparison of Architectures
• Both have higher transmission times
compared to the previous case
• Distributed caching gives shorter
transmission times than hierarchical
because many requests travel through
lower network levels
Comparison of Architectures
• Fig 6 Average total latency
Comparison of Architectures
• For large documents transmission time is
more relevant than connection times
• Hierarchical caching gives lower latencies
for documents smaller than 200 KB due to lower
connection times
• Distributed caching gives lower latencies
for larger documents due to lower
transmission times
Comparison of Architectures
• The size threshold depends on the degree
of congestion in the national network
• The higher the congestion, the lower the size threshold
• Distributed caching has lower latencies
than hierarchical
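The size threshold can be illustrated with a small calculation. The sketch below assumes a simple model that is not taken from the slides: transmission time is document size divided by an effective transfer rate, hierarchical caching has the lower connection time, and distributed caching has the higher transfer rate. All numeric values are hypothetical.

```python
def total_latency(tc: float, size_kb: float, rate_kb_per_s: float) -> float:
    """Total latency = Tc + Tt, with Tt modelled as size / rate (an assumption)."""
    return tc + size_kb / rate_kb_per_s


def size_threshold(tc_h: float, rate_h: float, tc_d: float, rate_d: float) -> float:
    """Document size (KB) at which both schemes give the same total latency.

    Solves tc_h + S / rate_h == tc_d + S / rate_d for S, assuming
    tc_h < tc_d (hierarchical connects faster) and rate_h < rate_d
    (distributed transmits faster).
    """
    return (tc_d - tc_h) / (1.0 / rate_h - 1.0 / rate_d)


# Hypothetical values: Tc in seconds, rates in KB/s.
moderate = size_threshold(tc_h=0.2, rate_h=100.0, tc_d=0.5, rate_d=200.0)
congested = size_threshold(tc_h=0.2, rate_h=50.0, tc_d=0.5, rate_d=200.0)
```

Under these assumed numbers, halving the hierarchical path's effective rate (more congestion in the upper network levels) lowers the threshold, which matches the qualitative claim that higher congestion gives a lower size threshold.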
Comparison of Architectures
With Hybrid Scheme
• Fig 7 connection time
Comparison of Architectures
With Hybrid Scheme
• Fig 8.
Comparison of Architectures
With Hybrid Scheme
• In the hybrid scheme, if the number of
cooperating caches (kc) is very small, the
connection time is high
• As the number of cooperating caches
increases, the connection time decreases
to a minimum
• If the number increases beyond the threshold,
the connection time increases very fast
Comparison of Architectures
With Hybrid Scheme
• Fig 9 Transmission time
Comparison of Architectures
With Hybrid Scheme
• For an uncongested network, the number of cooperating caches
(kt) at every level hardly influences Tt
• If the number of cooperating caches is very small, Tt is high,
and vice versa
• If the number increases above the threshold, Tt
increases
• The optimum number of caches depends on the number of
caches reachable while avoiding congested links
Comparison of Architectures
With Hybrid Scheme
• Fig 10
Comparison of Architectures
With Hybrid Scheme
• Fig 11 total latency
Comparison of Architectures
With Hybrid Scheme
• The number of cooperating caches (kopt) at every level
that minimizes the total latency depends on the
document size
• For small documents, the optimum number is
closer to kc
• For large documents, the optimum number is
closer to kt
Comparison of Architectures
With Hybrid Scheme
• Fig 12
Comparison of Architectures
With Hybrid Scheme
• For any document, the optimum kopt that
minimizes the total latency is such that
kc ≤ kopt ≤ kt
Cache Deployment Schemes
• Proxy caching
Cache Deployment Schemes
• Advantages
* Clients point all web requests directly to the
cache: no effect on non-web traffic
* Cost of upgrading hardware and software is limited
* Administration of caches is limited to
basic configuration
Cache Deployment Schemes
• Disadvantages
* Every browser must be configured to
point to the cache
* Each client can hit only one cache
* Single point of failure
* Unnecessary duplication of data
* Bottleneck in cases where content is
otherwise available in the LAN
Cache Deployment Schemes
• Transparent Proxy caching
Cache Deployment Schemes
• Advantages
* No browser configuration
* Cost of upgrading hardware and software is limited
* No administration of intermediate
systems required
Cache Deployment Schemes
• Disadvantages
* Each client can hit only one cache
* If the cache goes down, internet as well as
intranet access is lost
* Negative impact on non-web traffic
* Cache has to route non-web traffic
* Routing, packet examination and network address
translation steal CPU cycles from the main
cache serving function
Cache Deployment Schemes
• Transparent proxy caching with web cache
redirection.
Cache Deployment Schemes
• Advantages
* The switch/router examines the packets
* Minimal impact on non-web traffic
* Frees up CPU cycles for the web cache
* Allows client load to be dynamically
spread over multiple caches
* Eliminates the single point of failure,
especially if redundant redirectors are used
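How a redirector might spread load and avoid a single point of failure can be sketched as below. The policy shown (hashing the destination host over the set of healthy caches, and bypassing to the origin when none is healthy) is an assumption for illustration; the slides do not say which policy a switch or router uses, and `pick_cache` is a hypothetical name.

```python
import hashlib
from typing import Dict, List, Optional


def pick_cache(host: str, caches: List[str], healthy: Dict[str, bool]) -> Optional[str]:
    """Map a destination host to one healthy cache; None means bypass the caches."""
    alive = [c for c in caches if healthy.get(c, False)]
    if not alive:
        return None  # no cache available: send the request straight to the origin
    # A stable hash keeps requests for the same host on the same cache,
    # which limits duplicate copies across caches.
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(alive)
    return alive[index]
```

If the chosen cache is marked unhealthy, the same call returns one of the remaining caches, so clients keep working without any browser reconfiguration.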
Cache Deployment Schemes
• Disadvantages
* Additional intermediate systems must be
deployed
* Increases expense
Client Side Cache Cooperation.
Active Caching
• Current problem: caches are unable to cache dynamic
documents
• Caching Dynamic contents on the web
using active web
• A cache applet is server-supplied code that is
attached to a URL or a collection of
URLs
• The applet is written in a platform-independent
language
Active Caching
• On a user request the applet is invoked by
the cache
• The applet decides what is to be sent to the
user
• Other functions of the applet
* Logging user accesses
* Checking access permissions
* Rotating advertising banners
Active Caching
• The proxy is free not to invoke the
applet and to send the request to the server instead
• The proxy promises not to send back a cached
copy without invoking the applet
• If the applet is too large, the proxy sends the request to the server
• The proxy is not obligated to cache any applet; in
that case it agrees not to service the request
for that document
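The contract between proxy and applet can be sketched as follows. This is an illustration of the rules listed above, not the actual Active Cache interface: the `ActiveProxy` class, the `Applet` signature, and the `max_applets` limit (standing in for the proxy's resource constraints) are hypothetical.

```python
from typing import Callable, Dict

# Assumed applet signature: (user, cached_document) -> response sent to the user.
Applet = Callable[[str, str], str]


class ActiveProxy:
    def __init__(self, fetch_from_server: Callable[[str, str], str], max_applets: int = 2):
        self.fetch_from_server = fetch_from_server
        self.max_applets = max_applets          # stand-in for limited proxy resources
        self.documents: Dict[str, str] = {}
        self.applets: Dict[str, Applet] = {}

    def store(self, url: str, document: str, applet: Applet) -> bool:
        """Cache a document only together with its applet; refuse if out of room."""
        if url not in self.applets and len(self.applets) >= self.max_applets:
            return False                        # the proxy is not obliged to cache it
        self.documents[url] = document
        self.applets[url] = applet
        return True

    def handle(self, user: str, url: str) -> str:
        applet = self.applets.get(url)
        if applet is None:
            # No applet held for this URL: pass the request to the server.
            return self.fetch_from_server(user, url)
        # A cached copy is never returned without invoking its applet.
        return applet(user, self.documents[url])
```

An applet that rotates banners or logs accesses would do so inside the callable, so those server-side functions still run when the response comes from the proxy.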
Active Caching
• The proxy can devote resources to the applets
associated with the hottest URLs for its users
• The proxy that receives the request is typically
the proxy closest to the user, so the scheme
automatically migrates server
processing to the nodes that are close to
users
• This increases the scalability of web-based
services