Web Caching
By Amisha Thakkar
Overview
• What is a Web Cache?
• Caching Terminology
• Why use a cache?
• Disadvantages of Web Cache
• Other Features
• Caching Rules
Overview
• Caching Architectures
• Cache Deployment Scheme
• Active Caching
• Real World Solution
• Research Areas
What is a Web Cache?
• A cache is a place where temporary copies of objects are stored
• Cached information is generally closer to the requester than the permanent information is
• Objects: HTML pages, images, files
Caching Terminology
• Client: An application program that establishes connections for sending requests
• Server: An application program that accepts connections to service requests by sending back responses
• Origin Server: The server on which the given resource resides or is to be created
Caching Terminology
• Proxy: An intermediary program that acts as both a server and a client, making requests on behalf of other clients
• A proxy is not necessarily a cache
* A proxy does not always cache the replies passing through it
* It may be used on a firewall to monitor accesses
Why use a cache?
• To reduce latency
• To reduce network traffic
• To reduce the load on origin servers
• To isolate end users from network failures
Disadvantages of Web cache
• With cached data there is always a chance
of receiving stale information
• Content providers lose access counts when
cache hits are served
• Manual configuration is often required
• Operation of cache requires additional
resources
• In some situations the cache can be a single point of failure
Other Features
• Depending on the perspective, the following may be good or bad
* The cache makes requests on behalf of clients; the servers never see the clients’ IP addresses
* The cache provides an easy opportunity to monitor and analyze browsing activities
* The cache can be used to block certain requests
Types of Web Caches
• Proxy caches
* Serve a large number of users
* Large corporations and ISPs often set them up on their firewalls
* They are a type of shared cache
• Browser caches
* Use a section of the computer’s hard disk to store objects that you have seen
Caching Rules
• Rules on which caches work
* Some are set in the protocols
* Some are set by the cache administrator
• Most common rules:
* If the object is authenticated or secure, it won’t be cached
* The object’s headers indicate whether the object is cacheable or not
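The two common rules above can be sketched in Python. The function name `is_cacheable` and its parameters are illustrative assumptions. Treating every HTTPS or `Authorization` request as uncacheable mirrors the simplified rule on this slide; real caches follow the more detailed HTTP rules.

```python
def is_cacheable(request_headers, response_headers, scheme="http"):
    """Apply the two simplified rules: secure/authenticated objects and
    objects whose headers forbid storage are not cached."""
    # Rule 1: authenticated or secure objects are not cached.
    request_names = {name.lower() for name in request_headers}
    if scheme == "https" or "authorization" in request_names:
        return False

    # Rule 2: the object's own headers say whether it may be cached.
    cache_control = ""
    for name, value in response_headers.items():
        if name.lower() == "cache-control":
            cache_control = value.lower()
    directives = {d.strip().split("=")[0] for d in cache_control.split(",") if d.strip()}
    return not ({"no-store", "private"} & directives)
```

For example, a response carrying `Cache-Control: no-store` is rejected, while one carrying only `Cache-Control: max-age=60` over plain HTTP is accepted.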
Caching Rules
* An object is considered fresh when
  - It has an expiry time or other age-controlling directive set and is still within the fresh period
  - The browser cache has already seen the object and has been set to check once a session
  - A proxy cache has seen the object recently and it was modified relatively long ago
Fresh documents are served directly from the cache without checking with the origin server
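A minimal sketch of the three freshness conditions, in Python. The `CachedObject` fields, the `is_fresh` name, and the 10% heuristic fraction are assumptions made for illustration; the slides do not say how "recently" and "relatively long ago" are measured.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed heuristic: a copy stays fresh for 10% of the object's age at fetch time.
HEURISTIC_FRACTION = 0.1

@dataclass
class CachedObject:
    fetched_at: float                      # when the cache stored this copy (seconds)
    expires_at: Optional[float] = None     # explicit expiry time, if the object has one
    last_modified: Optional[float] = None  # when the object last changed at the origin
    seen_this_session: bool = False        # browser cache: already checked this session

def is_fresh(obj, now, browser_once_per_session=False):
    # Condition 1: an expiry time is set and has not passed yet.
    if obj.expires_at is not None:
        return now < obj.expires_at
    # Condition 2: browser cache set to check only once per session.
    if browser_once_per_session and obj.seen_this_session:
        return True
    # Condition 3: seen recently, and modified relatively long ago.
    if obj.last_modified is not None:
        age_when_fetched = obj.fetched_at - obj.last_modified
        return (now - obj.fetched_at) < HEURISTIC_FRACTION * age_when_fetched
    return False
```

With these assumptions, an object that was 1000 seconds old when fetched is served from the cache for the next 100 seconds without contacting the origin server.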
Caching Rules
* For a stale object, the origin server is asked to validate the object, that is, to tell the cache whether the copy is still good
* The most common validator is the time that the object was last changed
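In HTTP, this validation is typically done with the `Last-Modified` and `If-Modified-Since` headers, and a `304 Not Modified` status tells the cache its copy is still good. The sketch below shows only the origin server's comparison; the function name `origin_validate` is an assumption.

```python
from email.utils import parsedate_to_datetime

def origin_validate(if_modified_since, last_modified):
    """Answer a conditional request the way an origin server would.

    Both arguments are HTTP date strings. Returns the status code:
    304 if the cached copy is still good, 200 if a new copy must be sent.
    """
    cached = parsedate_to_datetime(if_modified_since)
    current = parsedate_to_datetime(last_modified)
    return 304 if current <= cached else 200
```

A cache that receives 304 keeps serving its stored copy; on 200 it replaces the copy with the new response body.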
Caching Architectures
Hierarchical / Simple Cache
• Browser-cache interaction is the same as browser-host interaction, i.e. a TCP connection is made and the item is requested
• If the item is not found, the request is sent to the parent cache
• A hierarchy is built up, with each level indirectly serving a wider community of users
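The parent lookup described above can be sketched as follows. The class name `HierarchicalCache` and the `origin` callable are assumptions, and network connections are replaced by plain method calls.

```python
class HierarchicalCache:
    """One level of a cache hierarchy; a miss is passed up to the parent."""

    def __init__(self, name, parent=None, origin=None):
        self.name = name
        self.parent = parent
        self.origin = origin  # callable url -> body, needed only at the top level
        self.store = {}

    def get(self, url):
        """Return (body, name of the level that supplied it)."""
        if url in self.store:
            return self.store[url], self.name
        if self.parent is not None:
            body, source = self.parent.get(url)
        else:
            body, source = self.origin(url), "origin"
        self.store[url] = body  # keep a copy on the way back down
        return body, source
```

Because each level stores what passes through it, a second institution asking for the same URL is answered by the shared parent instead of the origin server.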
Caching Architectures
Hierarchical / Simple Cache
[Figure: cache hierarchy spanning national, regional, and institutional networks]
Caching Architectures
Distributed / Co-operating Cache
• Decentralized (cache mesh)
• Multiple servers cooperate by sharing their individual caches to create one large distributed cache
• Simply put, caching proxies communicate with each other to serve different users
• On a cache miss, a proxy checks with other proxy caches before contacting the origin server
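A minimal sketch of the cache-miss behaviour in a mesh, under the assumption that peers can be asked directly whether they hold a URL. The class name `MeshCache` and its `has` method are illustrative stand-ins for a real inter-cache query.

```python
class MeshCache:
    """A caching proxy that asks its peers before going to the origin server."""

    def __init__(self, name, origin):
        self.name = name
        self.origin = origin  # callable url -> body
        self.peers = []       # other MeshCache instances in the mesh
        self.store = {}

    def has(self, url):
        # Stand-in for an inter-cache query message.
        return url in self.store

    def get(self, url):
        """Return (body, name of whoever supplied it)."""
        if url in self.store:
            return self.store[url], self.name
        for peer in self.peers:
            if peer.has(url):
                body = peer.store[url]
                self.store[url] = body
                return body, peer.name
        body = self.origin(url)
        self.store[url] = body
        return body, "origin"
```

Unlike the hierarchy, no cache here is a parent of another; each proxy treats the others as equals.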
Caching Architectures
Distributed / Co-operating Cache
• Caches communicate among themselves using a protocol such as ICP (Internet Cache Protocol)
• Caches can be selected on the basis of
* Distance from the end user
* Specialization in particular URLs (location hint)
Caching Architectures
Distributed / Co-operating Cache
• Why distributed: limitations of the hierarchy
* Width of the hierarchy: caches at the same level are inaccessible to each other
* LRU policy requires sufficient disk space
* Cost of replicating disk storage
* Amount of disk space required depends on the number of users served and the breadth of their reading
* More users means more disk space is needed higher in the hierarchy
* Exponential growth in the number of documents on the WWW
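The disk-space limitation is easier to see with a small LRU (least recently used) cache. This is a generic sketch, not taken from the slides; the class name `LRUCache` and counting capacity in objects rather than bytes are assumptions.

```python
from collections import OrderedDict

class LRUCache:
    """Keeps at most `capacity` objects, evicting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used entry
```

When the capacity is small relative to the number of distinct documents requested, objects are evicted before they are requested again, which is why caches serving more users need more disk space.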
Caching Architectures
Distributed / Co-operating Cache
• Caching close to the user is more effective; the higher the level, the lower the efficiency
• Can be created for load balancing
• Most effective when serving a community with shared interests
Caching Architectures
Distributed / Co-operating Cache
• First, a UDP packet is sent for the cache inquiry
• The cache selection is determined by RTT (round-trip time)
• Potential problem: network congestion because of UDP
• In favor of UDP:
* UDP exchange: 2 IP packets; TCP: at least 8 packets
Caching Architectures
Distributed / Co-operating Cache
* A UDP reply from a cache can indicate
a. Presence
b. Speed
c. Availability of requested documents
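The RTT-based selection can be sketched as below. The `IcpReply` record and the `choose_cache` function are hypothetical names; real ICP replies carry hit or miss opcodes rather than a Python boolean, and the RTT is measured by the querying cache.

```python
from dataclasses import dataclass

@dataclass
class IcpReply:
    cache: str     # which peer answered (presence)
    hit: bool      # whether the peer holds the requested document (availability)
    rtt_ms: float  # measured round-trip time of the query (speed)

def choose_cache(replies):
    """Pick the peer that has the document and answered fastest, or None."""
    hits = [reply for reply in replies if reply.hit]
    if not hits:
        return None  # no peer has it: go to the origin server
    return min(hits, key=lambda reply: reply.rtt_ms).cache
```

A peer that never replies simply does not appear in `replies`, so an unreachable cache is skipped without any extra handling.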
Caching Architectures
Hybrid Cache
Note: ICP
Cache Deployment Schemes
• Proxy caching
Cache Deployment Schemes
• Advantages
* Clients point all web requests directly to the cache: no effect on non-web traffic
* Cost of upgrading hardware and software is limited
* Administration of caches is limited to basic configuration
Cache Deployment Schemes
• Disadvantages
* Every browser must be configured to point to the cache
* Each client can hit only one cache
* Single point of failure
* Unnecessary duplication of data
* Bottleneck in cases where content is otherwise available in the LAN
Cache Deployment Schemes
• Transparent Proxy caching
Cache Deployment Schemes
• Advantages
* No browser configuration
* Cost of upgrading hardware and software is limited
* No administration of intermediate systems required
Cache Deployment Schemes
• Disadvantages
* Each client can hit only one cache
* If the cache goes down, internet as well as intranet access is lost
* Negative impact on non-web traffic
* The cache has to route non-web traffic
* Routing, packet examination, and network address translation steal CPU cycles from the main cache-serving function
Cache Deployment Schemes
• Transparent proxy caching with web cache
redirection.
Cache Deployment Schemes
• Advantages
* Switch/router examines the packets
* Minimal impact on non-web traffic
* Frees up CPU cycles for the web cache
* Allows client load to be dynamically spread over multiple caches
* Eliminates the single point of failure, especially if redundant redirectors are used
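A rough sketch of what the redirecting switch or router decides for each packet. The `redirect` function, the port-80 test, and hashing on the destination address are assumptions chosen for illustration; the slides do not specify how load is spread.

```python
import zlib

def redirect(dst_ip, dst_port, caches, healthy):
    """Return the cache that should receive this packet, or None to forward
    the packet normally (non-web traffic, or no cache available)."""
    if dst_port != 80:
        return None  # non-web traffic never touches a cache
    live = [cache for cache in caches if cache in healthy]
    if not live:
        return None  # all caches down: traffic still reaches the origin
    # Hash the destination so the same site always lands on the same cache.
    return live[zlib.crc32(dst_ip.encode()) % len(live)]
```

Because failed caches are dropped from `live`, losing one cache changes where requests go but does not cut clients off, which is the single-point-of-failure advantage listed above.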
Cache Deployment Schemes
• Disadvantages
* Additional intermediate systems must be deployed
* Increases expense
Active Caching
• Current problem: caches are unable to cache dynamic documents
• A cache applet is server-supplied code that is attached to a URL or a collection of URLs
• The applet is written in a platform-independent language
Active Caching
• On a user request, the applet is invoked by the cache
• The applet decides what is to be sent to the user:
* Giving the proxy a new document to send back to the user
* Allowing the proxy to use the cached copy
* Instructing the proxy to send the request to the web server
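The three possible applet decisions can be sketched as follows. Everything here is hypothetical: the `Action` names, the `example_applet` that fills a `{user}` placeholder, and the `handle_request` helper are illustrations, not the actual active-caching interface.

```python
from enum import Enum

class Action(Enum):
    NEW_DOCUMENT = "give the proxy a new document"
    USE_CACHED = "use the cached copy"
    FORWARD = "send the request to the web server"

def example_applet(user, cached_body):
    """Hypothetical applet that personalises a cached page for a known user."""
    if cached_body is None:
        return Action.FORWARD, None
    if user is None:
        return Action.USE_CACHED, cached_body
    return Action.NEW_DOCUMENT, cached_body.replace("{user}", user)

def handle_request(applet, user, cached_body, fetch_from_server):
    """Proxy side: invoke the applet and carry out its decision."""
    action, body = applet(user, cached_body)
    if action is Action.FORWARD:
        return fetch_from_server()
    return body
```

The proxy never inspects the document itself; it only carries out whichever of the three actions the applet returns.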
Active Caching
• Functions of the applet
* Logging user accesses
* Checking access permissions
* Client-specific information distribution
Active Caching
• The proxy is free not to invoke the applet and to send the request to the server instead
• The proxy promises not to send back a cached copy without invoking the applet
• If the applet is too large, the proxy sends the request to the server
• The proxy is not obligated to cache any applet; in that case it agrees not to service requests for that document
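These obligations amount to one rule: a cached copy leaves the proxy only through its applet, and every other case goes to the server. The sketch below assumes every cached document has a server-supplied applet; the `serve` function, the dictionary layout, and the 64 KB limit are illustrative assumptions.

```python
MAX_APPLET_BYTES = 64 * 1024  # assumed size limit; the slides give no number

def serve(url, documents, applets, fetch_from_server):
    """Serve `url` while honouring the proxy's obligations.

    documents: url -> cached body
    applets:   url -> (applet source bytes, applet callable) held by the proxy
    """
    entry = applets.get(url)
    if url not in documents or entry is None:
        # No cached copy, or the proxy chose not to cache the applet:
        # it must not answer from the cache.
        return fetch_from_server(url)
    source, run = entry
    if len(source) > MAX_APPLET_BYTES:
        return fetch_from_server(url)  # applet too large to run here
    return run(documents[url])  # the cached copy is only used via the applet
```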
Active Caching
• The proxy can devote resources to the applets associated with the URLs that are hottest among its users
• The proxy that receives the request is typically the one closest to the user, so the scheme automatically migrates server processing to the nodes that are close to users
• This increases the scalability of web-based services
Real World Solution
• CacheFlow has successfully implemented caching solutions for e-commerce
• Provides client-side and server-side solutions
• On the client side, the cache is placed between the network and the firewall, i.e. in front of the firewall and the web server
• Requests for dynamic content or secure transactions are passed to the origin servers for processing
Real World Solution
• This offers several advantages
* Offloads work from servers and firewalls
* Scales the network to handle more customer transactions and large traffic spikes
* Reduces capital and operating costs
* Reduces the security risks of users accessing servers that are inside the firewalls
Real World Solution
• They have developed an operating system: CacheOS
• Main features related to caching: Adaptive Asynchronous Refresh (AAR), Object Pipelining
• Variables tracked for AAR:
* Frequency of request (model of use)
* Frequency of change (model of change)
* Time cost to retrieve object
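The slides do not say how these three variables are combined. Purely as an illustration, and not as CacheFlow's algorithm, the sketch below multiplies them into a hypothetical refresh priority; the names `ObjectStats`, `refresh_priority`, and `refresh_order` are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ObjectStats:
    url: str
    requests_per_hour: float  # frequency of request (model of use)
    changes_per_hour: float   # frequency of change (model of change)
    fetch_seconds: float      # time cost to retrieve the object

def refresh_priority(stats):
    """Hypothetical score: objects that are popular, change often, and are
    slow to fetch gain the most from being refreshed ahead of demand."""
    return stats.requests_per_hour * stats.changes_per_hour * stats.fetch_seconds

def refresh_order(all_stats):
    """URLs sorted from highest to lowest refresh priority."""
    return [s.url for s in sorted(all_stats, key=refresh_priority, reverse=True)]
```

Under this assumed score, a static object that never changes gets priority zero and is never refreshed ahead of an object that changes.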
Real World Solution
• CacheOS then automatically determines the refresh pattern
• 90% hit rate
• Some facts:
* As many as 90% or more of web objects can be static
* 8-second threshold
Real World Solution
• Successful Implementations:
* Proflowers.com
* Kbkids.com
* delta-air.com
* Xerox
Research Areas
• How are the cache proxies organized: hierarchically, distributed, or hybrid? - cache architectures
• Where should a cache proxy be placed to achieve optimal performance? - proxy placement
Research Areas
• How do proxies cooperate with each other? - proxy co-operation
• What kind of data/information can be shared among co-operating proxies? - data sharing
• How does a proxy decide what and when to prefetch from the Web server or other proxies to reduce access latency in the future? - prefetching
Research Areas
• How does a proxy manage pages? - cache placement and replacement
• How does a proxy maintain data consistency? - cache coherency
• How is the control information distributed among pages? - control information distribution
• How should data that is not cacheable be handled? - dynamic data caching