Proxies
Herng-Yow Chen
1
Outline





Explain HTTP proxies, contrasting them to web
gateways and illustrating how proxies are
deployed.
Show some of the ways proxies are helpful.
Describe how proxies are deployed in real networks and
how traffic is directed to proxy servers.
Show how to configure your browser to use a proxy.
Demonstrate HTTP proxy requests, how they
differ from server requests, and how proxies can
subtly change the behavior of browsers.
2
Outline (cont.)

Explain how you can record the path of your
messages through chains of proxy servers, using
Via headers and the TRACE method.

Describe proxy-based HTTP access control.

Explain how proxies can interoperate between
clients and servers, each of which may support
different features and versions.
3
Web intermediaries

Web proxy servers are middlemen that
fulfill transactions on the client’s behalf.

Without a web proxy, HTTP clients (e.g., a
browser) talk directly to HTTP servers.

HTTP proxy servers are both web servers
and web clients.
4
A proxy must be both a server and a client
Proxies act like servers to web clients.
Proxies act like clients to web servers.
Figure: client, proxy, server. Requests flow from the client to the proxy and on to the server; responses flow back along the same path.
5
Private and Shared Proxies

Public proxies (Shared proxies)


A proxy server can be shared among numerous clients.
E.g., caching servers.
Private proxies


A proxy server can be dedicated to a single client.
E.g., some browser assistant products, as well as
some ISP services, run small proxies directly on the
user’s PC in order to extend browser features, improve
performance, or host advertising for free ISP services.
6
Proxies Versus Gateways


Proxies connect two or more applications
that speak the same protocol.
A gateway acts as a “protocol converter,”
allowing a client to complete a transaction
with a server, even when the client and
server speak different protocols.
7
Proxies Versus Gateways
(a)HTTP/HTTP Proxy
HTTP
HTTP
Web proxy
Browser
Web server
(b)HTTP/POP gateway
HTTP
Browser
POP
Web/email
gateway
Email server
8
Why Use Proxies?








Child filter
Document access controller
Security firewall
Web cache
Surrogate
Content router
Transcoder
Anonymizer
9
Child-safe Internet filter
ok
server
Internet
Child user
DENY
server
Site contains adult content
Child user
School’s filtering
proxy
10
Document access controller

Proxy servers can be used to implement a
uniform access-control strategy across a large
set of web servers and web resources and to
create an audit trail.

All the access controls can be configured on the
centralized proxy server, without requiring the
access controls to be updated frequently on
numerous web servers.

Maintain “blacklists” in order to identify and
restrict access to objectionable content.
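
A blacklist check of this kind can be sketched as below. This is only an illustration, assuming the proxy matches on hostnames; the blocked hosts and the function name is_allowed are hypothetical and do not come from any real product.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; a real proxy would load this from its configuration.
BLOCKED_HOSTS = {"adult.example.com", "gambling.example.net"}


def is_allowed(request_url, blocked_hosts=BLOCKED_HOSTS):
    """Return False if the URL's host, or any parent domain of it, is blocked."""
    host = (urlsplit(request_url).hostname or "").lower()
    labels = host.split(".")
    # Check "a.b.c", then "b.c", then "c", so subdomains of a blocked host match too.
    candidates = {".".join(labels[i:]) for i in range(len(labels))}
    return not (candidates & set(blocked_hosts))
```

Because the list lives only on the proxy, updating it changes the policy for every client at once.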
11
Centralized document access control
General
news
General
news
Access
control
proxy
Client 1
Server A
To the Internet
Client 2
Client 3
Internet
Local area
network
Intended request
to server B
blocked
Secret
financial
data
Server B
What is the password
for the financial data?
12
Security firewall



Network security engineers often use proxy
servers to enhance security.
Proxy servers restrict which application-level protocols flow in and out of an
organization, at a single secure point in the
network.
They also can provide hooks to examine
the traffic, as used by virus-eliminating web
and email proxies.
13
Security firewall
Internet
Server
Client
Filtering
router
Filtering
router
Server
Client
Virus
Firewall
proxy
Client
Firewall
Firewall
Server
14
Web cache

Proxy caches maintain local copies of
popular documents and serve them on
demand, reducing slow and costly Internet
communication.
15
Web cache
16
Surrogate




Proxies can masquerade as web servers.
These so-called surrogates or reverse proxies
receive real web server requests, but, unlike web
servers, they may initiate communication with
other servers to locate the requested content on
demand.
Surrogates (server accelerators) may be used to
improve the performance of slow web servers for
common content.
Surrogates also can be used in conjunction with
content-routing functionality to create distributed
networks of on-demand replicated content.
17
Surrogate
Internet
client
Surrogate
(also known as a
reverse proxy or a
server accelerator)
server
18
Content router

Proxy servers can act as “content routers,”
directing requests to particular web servers
based on Internet traffic conditions and type of
content.

Content routers also can be used to implement
various service-level offerings.

For example, content routers forward requests to
nearby replica caches (if the user has paid for
higher performance), or route HTTP requests
through filtering proxies (if the user has signed up
for a filtering service).
19
Content routing
20
Transcoder


Proxy servers can modify the body format
of content before delivering it to clients.
This transparent translation between data
representations is called transcoding.
For example, a transcoder can:
convert GIF images into JPEG images,
compress files,
summarize web content into a compact form,
translate text into another language.
21
Content transcoder
Figure: The origin server serves an English page, “Summer Beach Shirts”
(White, Black, Sunrise orange). The transcoding proxy delivers a Spanish
version to a Spanish-speaking client (Blanco, Negro, Naranja amanecer) and
a compact numbered list (1 White, 2 Black, 3 Sunrise orange) to a
web-enabled mobile phone.
22
Anonymizer



Anonymizer proxies provide heightened privacy
and anonymity, by actively removing identifying
information from HTTP messages.
The removed information includes, e.g., the client IP
address, From header, Referer header, cookies, and URI
session IDs.
However, because identifying information is
removed, the quality of the user’s browsing
experience may be diminished, and some web
sites may not function properly.
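
As a rough sketch of the header stripping described above, the function below drops identifying headers from a list of (name, value) pairs. The set of headers and the trimming of the User-Agent comment are assumptions made for illustration; Client-ip in particular is a non-standard header included here only as an example.

```python
# Header names treated as identifying (an assumption for this sketch).
IDENTIFYING_HEADERS = {"from", "referer", "cookie", "client-ip"}


def anonymize_headers(headers):
    """Return a copy of (name, value) header pairs with identifying ones removed."""
    cleaned = []
    for name, value in headers:
        lowered = name.lower()
        if lowered in IDENTIFYING_HEADERS:
            continue  # drop the header entirely
        if lowered == "user-agent":
            # Keep the product token, drop the parenthesized platform comment.
            value = value.split(" (", 1)[0]
        cleaned.append((name, value))
    return cleaned
```

Removing cookies and URI session IDs is exactly what can break sites that depend on them, which is the trade-off noted above.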
23
Anonymizer
Original request (client to anonymizing proxy)
GET /something/file.html HTTP/1.0
Date: Thu, 25 Sep 2003 12:55:23 GMT
User-Agent: Mozilla/4.0 (Windows NT 5.0)
From: [email protected]
Referer: http://www.csie.ncnu.edu.tw/tax-audits.html
Cookie: profile="fotbal,litte beer"
Cookie: income-braket="30k-45k"
Anonymized request (anonymizing proxy to server)
GET /something/file.html HTTP/1.0
Date: Thu, 25 Sep 2003 12:55:23 GMT
User-Agent: Mozilla/4.0
The anonymized message doesn't contain the common identifying
information headers.
24
Proxy server deployment

Egress proxy

Located at the exit points of local networks to control the traffic flow
between the LAN and the greater Internet.
E.g., firewall protection, reducing bandwidth charges, and improving the
performance of Internet traffic.
Access (ingress) proxy

Placed at ISP access points, processing the aggregate requests from the
customers. E.g., ISPs use caching proxies to improve access
performance.
Surrogates

Located at the edge of the network, in front of web servers, where they
can field all of the requests directed at the web server and ask the web
server for resources only when necessary.
Add security features to web servers and improve the performance of
slower web servers.
Network exchange proxy

Placed at the Internet peering exchange points between networks, to
alleviate congestion at Internet junctions through caching and to monitor
traffic flows (e.g., for national security concerns).
25
Private LAN egress proxy
(a) Private LAN egress proxy
Local
network
Internet
client
Proxy
server
client
26
ISP access proxy
(b)ISP access proxy
Internet
client
Proxy
server
client
27
Surrogate
(c)Surrogate
Local
network
Internet
client
Proxy
server
client
28
Network exchange proxy
(d)Network exchange proxy
Network 2
Network 1
Router
client
Router
Proxy
server
29
Proxy Hierarchies (e.g. 3-level)
Proxies can be cascaded in chains called proxy hierarchies.
This hierarchy is static.
Proxy 1
client
(Child of proxy 2)
Proxy 2
(Child of proxy
3 and parent
of proxy 1)
Proxy 3
(parent of proxy 2)
server
30
Dynamic hierarchy, changing for
each request
Dedicated cache server for
specially-subscribed objects
Caching proxy
client
Access proxy
Internet
Compressor proxy
Web servers
around the globe
31
Examples of dynamic parent
selection




Load balancing
Geographic proximity routing
Protocol/type routing
Subscription-based routing
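
Two of these strategies, protocol/type routing and load balancing, can be combined in a few lines. The sketch below is illustrative only: choose_parent, the parent names, and the use of a CRC32 hash to spread hosts across parents are assumptions, not a description of any particular proxy.

```python
import zlib
from urllib.parse import urlsplit


def choose_parent(url, parents_by_scheme, default_parents):
    """Pick a parent proxy for one request.

    Protocol/type routing: the URL scheme selects a pool of parents.
    Load balancing: a stable hash of the host spreads hosts across the pool.
    """
    parts = urlsplit(url)
    pool = parents_by_scheme.get(parts.scheme, default_parents)
    host = parts.hostname or ""
    index = zlib.crc32(host.encode("utf-8")) % len(pool)
    return pool[index]
```

Hashing on the host keeps all requests for one site on the same parent, which helps that parent's cache hit rate.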
32
How Proxies Get Traffic
(a) Client configured to use proxy
Figure: client, proxy, server.
(b) Network intercepts and redirects traffic to proxy
Figure: client, router, proxy, server.
(c) Surrogate stands in for web server
Figure: client, proxy (assuming the web server’s name), server.
(d) Server redirects HTTP requests to proxy
Figure: client, server, proxy.
33
Client Proxy Settings

Manual configuration

Explicitly set a proxy to use.
Browser preconfiguration

The browser vendor manually preconfigures the proxy setting of
the browser before delivering it to customers.
Proxy auto-configuration (PAC)

Provide a URI to a JavaScript proxy auto-configuration (PAC)
file.
The browser fetches the JavaScript file and runs it to decide
which proxy to use.
WPAD proxy discovery

Some browsers support the Web Proxy Autodiscovery Protocol
(WPAD), which automatically detects a “configuration server” from
which the browser can download an auto-configuration file (e.g.,
in Internet Explorer).
34
PAC files





The browser fetches the PAC file by URI, e.g.,
GET http://proxy.ncnu.edu.tw/ncnu.pac
PAC files typically have a .pac suffix and the MIME type
“application/x-ns-proxy-autoconfig.”
Each PAC file must define a function called
FindProxyForURL(url, host) that computes
the proper proxy server to use for accessing
the URI.
The return value of the function can be, e.g.:
DIRECT // connections should be made directly
PROXY host:port // the specified proxy should be used
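
Real PAC files are written in JavaScript. The decision logic of FindProxyForURL can nonetheless be illustrated with a Python analogue. In this sketch the function name find_proxy_for_url, the rule "go direct for campus hosts", and the proxy port 8080 are all assumptions chosen for the example.

```python
def find_proxy_for_url(url, host):
    """Python analogue of a PAC file's FindProxyForURL(url, host).

    Returns a PAC-style string: "DIRECT" or "PROXY host:port".
    """
    # Assumed policy: hosts inside the campus domain are reached directly.
    if host == "localhost" or host.endswith(".ncnu.edu.tw"):
        return "DIRECT"
    # Everything else goes through the (hypothetical) campus proxy.
    return "PROXY proxy.ncnu.edu.tw:8080"
```

For instance, find_proxy_for_url("http://www.w3.org/", "www.w3.org") selects the proxy, while a request for www.csie.ncnu.edu.tw goes direct.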
35
Web Proxy Autodiscovery Protocol
(WPAD)

A client that implements the WPAD will:




Use WPAD to find the PAC URI.
Fetch the PAC file given in the URI.
Execute the PAC file to determine the proxy
server.
Use the proxy server for requests.
36
WPAD (cont.)


WPAD uses a series of resource-discovery
techniques, one by one until it succeeds, to
determine the proper PAC file.
Multiple discovery techniques are used, because
not all organizations can use all techniques.





Dynamic Host Configuration Protocol (DHCP)
Service Location Protocol (SLP)
DNS well-known hostnames
DNS SRV records
DNS service URIs in TXT records.
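
The "one by one until it succeeds" behaviour can be sketched as a loop over lookup functions. This is a simplified illustration: discover_pac_url and the lookup callables are hypothetical stand-ins, and no real DHCP, SLP, or DNS queries are made.

```python
def discover_pac_url(techniques):
    """Try each (name, lookup) pair in order and return the first PAC URI found.

    Each lookup is a zero-argument callable that returns a URI string,
    returns None, or raises OSError when the technique is unavailable.
    """
    for name, lookup in techniques:
        try:
            pac_url = lookup()
        except OSError:
            pac_url = None  # technique not usable on this network; try the next
        if pac_url:
            return name, pac_url
    return None  # no technique succeeded
```

The caller supplies the techniques in priority order, so an organization that cannot use DHCP simply falls through to the next entry.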
37
Proxy URLs Differ from Server URLs
(a)Server request
GET /index.html HTTP/1.0
User-agent: SuperBrowser v1.3
client
Origin server
(b)Explicit proxy request
GET http://www.ncnu.edu.tw/index.html HTTP/1.0
User-agent: SuperBrowser v1.3
client
(Proxy explicitly configured)
Proxy Server
Origin server
38
Proxy URLs Differ from Server URLs
(c)Surrogate(reverse proxy) request
GET /index.html HTTP/1.0
User-agent: SuperBrowser v1.3
client
(Server hostname points to the surrogate proxy)
Surrogate
Origin server
(d) Intercepting proxy request
GET /index.html HTTP/1.0
User-agent: SuperBrowser v1.3
client
Origin server
Intercepting proxy
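
The difference between cases (a) and (b) comes down to what the client puts in the request line. A minimal sketch, assuming a helper named request_line (hypothetical) that receives the full URL:

```python
from urllib.parse import urlsplit


def request_line(method, url, via_explicit_proxy, version="HTTP/1.0"):
    """Build the HTTP request line for a server request or an explicit proxy request."""
    if via_explicit_proxy:
        # Explicit proxy: send the full URI so the proxy knows the destination.
        target = url
    else:
        # Direct (or surrogate/intercepted) request: send only the path and query.
        parts = urlsplit(url)
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    return f"{method} {target} {version}"
```

Surrogate and intercepting proxies see the partial form, because the client does not know it is talking to a proxy.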
39
URL Resolution Without a Proxy
(1) User types “ncnu” into the browser’s URI location window.
(2a) Browser looks up host “ncnu” via DNS.
(2b) Failed; host unknown.
(3a) The browser does auto-expansion, converting “ncnu” into “www.ncnu.edu.tw”.
(3b) Browser looks up host “www.ncnu.edu.tw” via DNS.
(3c) Success! Gets IP addresses back.
(4a) Browser tries to connect to the IP addresses, one by one, until a connection succeeds.
(4b) Success; connection established.
(5a) Browser sends HTTP request.
(5b) Browser gets HTTP response.
Figure: browser, DNS server, www.ncnu.edu.tw.
40
URL Resolution with an Explicit
Proxy
(1) User types “ncnu” into the browser’s URI location window.
(2a) Proxy is explicitly configured, so the browser looks up the address
of the proxy server using DNS.
(2b) Success! Gets proxy server IP addresses.
(3a) Browser tries to connect to the proxy.
(3b) Success; connection established.
(4a) Browser sends HTTP request:
GET http://ncnu/ HTTP/1.0
Proxy-connection: keep-Alive
User-Agent: Mozilla/4.0
Host: ncnu
Accept: */*
Accept-encoding: gzip
Accept-language: en
Accept-charset: iso-8859-1,*,utf-8
(4b) Proxy gets a partial hostname in the request, because the client
did not auto-expand it.
Figure: browser, DNS server, proxy, www.ncnu.edu.tw.
41
URL Resolution with an Intercepting
Proxy
Figure: steps (1) through (5a), numbered as in the case without a proxy,
among the client, DNS server, interceptor, proxy, and www.ncnu.edu.tw.
42
Tracing Messages
Today, it’s not uncommon for web requests to go through a chain of
two or more proxies on their way from the client to the server.
It’s important to trace the flow of messages across proxies and to
detect any problems.
Surrogate cache bank
ISP proxy
client
Internet
Web server
43
The Via Header



The Via header is used to track the forwarding of messages,
diagnose message routing loops, and identify the
protocol capabilities of all senders along the
request/response chain.
Lists information about each intermediate node
(proxy or gateway) through which a message
passes.
Each time a message goes through another node,
the intermediate node must be added to the end
of the Via list.
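
The append rule, and the way Via helps diagnose routing loops, can be sketched as follows. The helper names add_via and has_loop are hypothetical, and headers are modelled as a plain dict for brevity.

```python
def add_via(headers, protocol_version, node_name):
    """Return a new header dict with this node appended to the end of Via."""
    entry = f"{protocol_version} {node_name}"
    updated = dict(headers)
    existing = updated.get("Via")
    updated["Via"] = f"{existing}, {entry}" if existing else entry
    return updated


def has_loop(headers, node_name):
    """Return True if node_name already appears in the Via list."""
    entries = [e.split() for e in headers.get("Via", "").split(",")]
    return any(len(e) > 1 and e[1] == node_name for e in entries)
```

A proxy that finds its own name in an incoming Via list knows the message has already passed through it once.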
44
The Via Header
Request message (as received by server)
GET /index.html HTTP/1.0
Accept: text/html
Host: www.csie.ncnu.edu.tw
Via: 1.1 proxy1.ncnu.edu.tw, 1.0 proxy2.ncnu.edu.tw
client
proxy1.ncnu.edu.tw
(HTTP/1.1)
proxy2.ncnu.edu.tw
(HTTP/1.0)
server
45
The response Via is usually the
reverse of the request Via
Request Via header
Via: 1.1 A, 1.1 B, 1.1 C
A
B
C
client
Response Via header
server
Via: 1.1 C, 1.1 B, 1.1 A
46
Via and gateways

Some proxies provide gateway functionality
to servers that speak non-HTTP protocols.

The Via header records these protocol
conversions, so HTTP applications can be
aware of protocol capabilities and
conversions along the proxy chain.
47
Via and gateways
HTTP request message sent to proxy
GET ftp://www.ncnu.edu.tw/pub/welcome.txt HTTP/1.0
Figure: the client sends the HTTP request to proxy1.ncnu.edu.tw (HTTP/1.1);
the proxy sends an FTP request to www.ncnu.edu.tw and gets an FTP response.
HTTP response message
HTTP/1.0 200 OK
Date: Sun, 12 Dec 2003 21:01:59 GMT
Via: FTP/1.0 proxy.ncnu.edu.tw (Traffic-Server/5.0.1-17882[cMsf])
Last-modified: Sun, 12 Dec 2003 21:05:24 GMT
Content-type: text/plain

Hi there. This is an FTP server.
48
The Server and Via headers

The Server response header field describes the
software used by the origin server.





Server: Apache/1.3.14 (UNIX) PHP/4.0.4
Server: Netscape-Enterprise/4.1
Server: Microsoft-IIS/5.0
If a response message is being forwarded
through a proxy, make sure the proxy does not
modify the Server header.
The Server header is meant for the origin server.
Instead, the proxy should add a Via entry.
49
Privacy and security implications of
Via

There are some cases when we don’t want exact
hostnames in the Via string.

For example, when a proxy server is part of a network firewall it
should not forward the names and ports of hosts behind the
firewall, because knowledge of network architecture behind a
firewall might be of use to a malicious party.

A proxy can disable Via node-name forwarding,
replacing the hostname with an appropriate pseudonym.

For strong privacy requirements, a proxy may combine an
ordered sequence of Via waypoint entries (with the same
protocol version) into a single, joined entry.


Via: 1.0 foo, 1.1 devirus.com, 1.1 access-logger.com
Via: 1.0 foo, 1.1 concealed-stuff
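
The joining step in the example above can be sketched like this. The function name conceal_via is hypothetical, and the sketch assumes every Via entry has the simple form "version name" with no trailing comment.

```python
def conceal_via(via_value, hidden_nodes, pseudonym):
    """Replace hidden node names with a pseudonym and join adjacent duplicates.

    Entries are merged only when they are adjacent and share the same
    protocol version, so the order of protocol capabilities is preserved.
    """
    result = []
    for raw in via_value.split(","):
        version, node = raw.strip().split(None, 1)
        if node in hidden_nodes:
            node = pseudonym
        if node == pseudonym and result and result[-1] == (version, node):
            continue  # same version and already concealed: fold into previous entry
        result.append((version, node))
    return ", ".join(f"{version} {node}" for version, node in result)
```

Entries with different protocol versions are never merged, because that would hide a protocol downgrade along the chain.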
50
The TRACE method





Proxy servers can change messages as the messages
are forwarded. Headers are added, modified, and
removed, and bodies can be converted to different
formats.
As proxies become more sophisticated, and more
vendors deploy proxy products, interoperability problems
increase.
We need a way to watch how messages are changed,
hop by hop.
HTTP/1.1’s TRACE method is for this purpose. It is very
useful for debugging proxy flows.
It can trace a request message through a chain of proxies,
observing what proxies the message passes through and
how each proxy modifies the request message.
51
The TRACE method



When the TRACE request reaches the
destination server, the entire request message is
reflected back to the sender, bundled up in the
body of an HTTP response.
When the TRACE response arrives, the client
can examine the exact message the server
received and the list of proxies through which it
passed (in the Via header).
The TRACE response has


Content-Type: message/http
And a 200 OK status
52
The TRACE Method
TRACE request (sent by client)
TRACE /index.html HTTP/1.1
Host: www.ncnu.edu.tw
Accept: text/html
Figure: client, Proxy1 (proxy.ncnu.edu.tw), Proxy2 (proxy2.ncnu.edu.tw),
Proxy3 (proxy3.ncnu.edu.tw), Server (www.ncnu.edu.tw).
Received request (as seen by server)
TRACE /index.html HTTP/1.1
Host: www.ncnu.edu.tw
Accept: text/html
Via: 1.1 proxy.ncnu.edu.tw, 1.1 proxy2.ncnu.edu.tw, 1.1 proxy3.ncnu.edu.tw
X-Magic-CDN-Thingy: 134-AF-003
Cookie: accept-isp="hinet's ISP, Puli"
Client-ip: 163.22.3.4
TRACE response (the received request is carried in the body)
HTTP/1.1 200 OK
Content-Type: message/http
Content-Length: 269
Via: 1.1 proxy3.ncnu.edu.tw, 1.1 proxy2.ncnu.edu.tw, 1.1 proxy1.ncnu.edu.tw
53
Max-Forwards

Normally, TRACE messages travel all the way to the
destination server, regardless of the number of
intervening proxies.

We can use the Max-Forwards header to limit the number
of proxy hops for TRACE and OPTIONS requests, which
is useful for



Testing whether a chain of proxies is forwarding messages in an infinite loop.
Checking the effects of a particular proxy in the middle of a chain.
If the Max-Forwards value is zero, the receiver must reflect
the TRACE message back toward the client without forwarding it further
(similar to the TTL mechanism in IP datagrams). Otherwise, the
Max-Forwards value should be decremented by one before the message is forwarded.
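
The per-hop rule can be sketched as a small function that each proxy applies to an incoming TRACE or OPTIONS request. The name handle_max_forwards and the "respond"/"forward" labels are assumptions made for this illustration.

```python
def handle_max_forwards(headers):
    """Apply the Max-Forwards rule at one hop.

    Returns ("respond", headers) if this node must answer itself, or
    ("forward", new_headers) with the counter decremented by one.
    """
    value = headers.get("Max-Forwards")
    if value is None:
        return "forward", dict(headers)  # no limit: travel to the server
    remaining = int(value)
    if remaining == 0:
        return "respond", dict(headers)  # must not forward any further
    updated = dict(headers)
    updated["Max-Forwards"] = str(remaining - 1)
    return "forward", updated
```

Starting from Max-Forwards: 2, the first two proxies forward the request (with values 1 and 0) and the third one responds.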
54
Max-Forwards
TRACE request (sent by client)
TRACE /index.html HTTP/1.1
Host: www.ncnu.edu.tw
Max-Forwards: 2
Accept: text/html
Figure: client, Proxy1 (proxy.ncnu.edu.tw), Proxy2 (proxy2.ncnu.edu.tw),
Proxy3 (proxy3.ncnu.edu.tw), Server (www.ncnu.edu.tw).
Proxy1 forwards the request with Max-Forwards=1.
Proxy2 forwards the request with Max-Forwards=0.
Received request
TRACE /index.html HTTP/1.1
Host: www.ncnu.edu.tw
Accept: text/html
Via: 1.1 proxy.ncnu.edu.tw, 1.1 proxy2.ncnu.edu.tw
X-Magic-CDN-Thingy: 134-AF-003
Cookie: accept-isp="hinet's ISP, Puli"
Client-ip: 163.22.3.4
TRACE response (the received request is carried in the body)
HTTP/1.1 200 OK
Content-Type: message/http
Content-Length: 269
Via: 1.1 proxy2.ncnu.edu.tw, 1.1 proxy1.ncnu.edu.tw
55
Proxy Authentication

Proxies can serve as access-control
devices.

HTTP defines a mechanism called proxy
authentication that blocks requests for
content until the user provides valid access
permission credentials to the proxy.

We will talk more about HTTP
authentication in later lectures (Chapter 12).
56
Proxy Authentication
(a) Client sends a request to the access control proxy
GET http://www.ncnu.edu.tw/secret.jpg HTTP/1.0
(b) Access control proxy challenges the client
HTTP/1.0 407 Proxy Authentication Required
Proxy-Authenticate: Basic realm="Secure Stuff"
(c) Client repeats the request with credentials
GET http://server.com/secret.jpg HTTP/1.0
Proxy-Authorization: Basic YadNfddZws==
Figure: client, access control proxy, server.
57
Proxy Authentication
(d) Proxy forwards the request; the server’s response is returned to the client
HTTP/1.0 200 OK
Content-type: image/jpeg
…<image data included>…
Figure: client, access control proxy, server (super secret image).
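
The exchange in steps (a) to (d) can be sketched from both sides. This is a simplified illustration of Basic proxy authentication only; the function names, the realm, and the idea of keeping accepted header values in a set are assumptions for the example.

```python
import base64


def proxy_authorization(username, password):
    """Client side: build the value of a Basic Proxy-Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def check_proxy_request(headers, accepted_values, realm="Secure Stuff"):
    """Proxy side: return None to let the request through, or a 407 challenge."""
    if headers.get("Proxy-Authorization") in accepted_values:
        return None  # credentials accepted: forward the request to the server
    return 407, {"Proxy-Authenticate": f'Basic realm="{realm}"'}
```

Note that Basic credentials are only base64-encoded, not encrypted, so this scheme protects nothing on an untrusted network by itself.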
58
Proxy Interoperation



Clients, servers, and proxies are built by multiple
vendors, to different versions of the HTTP specification.
Proxy servers need to intermediate between
client-side and server-side devices, which may
implement different protocols and have different
bugs (quirks).
Handling unsupported headers and methods


Proxies must forward unrecognized header fields and must
maintain the relative order of header fields with the same
name.
The OPTIONS method is used to discover optional
feature support.
59
OPTIONS:Discovering Optional
Feature Support
OPTIONS * HTTP/1.1
Proxy
client
server
HTTP/1.1 200 OK
Allow: GET,PUT,POST,HEAD,TRACE,OPTIONS
60
OPTIONS

If the URI is a real resource, the OPTIONS
request inquires about the features
available to that particular resource.

OPTIONS http://www.joes-heardware.com/index.html HTTP/1.1
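
Once an OPTIONS response arrives, a client or proxy can check the Allow header before attempting a method. A minimal sketch, with parse_allow as a hypothetical helper name:

```python
def parse_allow(header_value):
    """Turn an Allow header value into a set of upper-case method names."""
    return {m.strip().upper() for m in header_value.split(",") if m.strip()}
```

For example, a debugging tool could send TRACE only if "TRACE" is in the parsed set.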
61
For More Information

http://www.w3.org/Protocols/rfc2616/rfc2616.txt
“Hypertext Transfer Protocol” by R. Fielding, J. Gettys, J. Mogul, H.
Frystyk, L. Masinter, P. Leach, T. Berners-Lee

ftp://ftp.rfc-editor.org/in-notes/rfc3040.txt
Internet Web Replication and Caching Taxonomy

ftp://ftp.rfc-editor.org/in-notes/rfc3143.txt
Known HTTP Proxy/Caching Problems

Web Proxy Servers
Ari Luotonen, Prentice Hall Computer Books.

Web Caching
Duane Wessels, O’Reilly & Associates, Inc.
62