Web servers
Guntis Bārzdiņš
Artūrs Lavrenovs

What web servers do
● Implement the HTTP protocol
● Listen for HTTP requests from browsers
● Try to fulfill them with static content from the file system
● Modern web servers also
  – Forward dynamic content requests to other systems
  – Do lots of useful tasks using modules
What are some of the web servers?
C10K problem
● Dan Kegel, 1999
● Web servers should handle ten thousand clients simultaneously
● Operating system kernel limitations
● Operating system provided functionality
● Web server design flaws
C10K problem solution – OS kernel
● The open source nature of unix kernels allowed all C10K bottlenecks to be quickly identified and fixed
● Networking-related algorithms and data structures in unix kernels were originally implemented with complexities O(n), O(n^2), ...; these were fixed to O(1) or O(n)
● As a result, the networking capabilities of unix kernels are virtually limitless (limited only by hardware resources)
C10K - OS functionality
● Implemented new scalable I/O event notification mechanisms (epoll – Linux, kqueue – *BSD)
  – Better performance than traditional poll/select
  – Can receive all pending events using one system call (see the sketch below)
● AIO – the POSIX asynchronous I/O (AIO) interface allows applications to initiate one or more I/O operations that are performed asynchronously (i.e., in the background). The application can elect to be notified of completion of the I/O operation in a variety of ways: by delivery of a signal, by instantiation of a thread, or no notification at all.
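A minimal sketch of this event-notification pattern, using Python's standard selectors module (which is backed by epoll on Linux and kqueue on *BSD); the echo behaviour and the port number are illustrative assumptions, not something from the slides.

# Event-driven echo server sketch: register sockets once, then one
# select() call per loop iteration returns every socket that is ready.
import selectors
import socket

sel = selectors.DefaultSelector()              # epoll/kqueue under the hood

listener = socket.socket()
listener.bind(("0.0.0.0", 8080))               # illustrative port
listener.listen()
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ)

while True:
    for key, _ in sel.select():                # one system call, all pending events
        sock = key.fileobj
        if sock is listener:
            conn, _ = listener.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:
            data = sock.recv(4096)
            if data:
                sock.send(data)                # echo back; a real server would parse
                                               # HTTP here and buffer partial writes
            else:
                sel.unregister(sock)
                sock.close()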
C10K – web server design
● Non-blocking I/O for networking and disk
  – Don't block waiting for an action to complete; serve other requests and wait for notifications about I/O completion
● Many threads
  – Use all available CPU cores to achieve maximum concurrency; avoid locking data structures
● Each thread serves many requests
  – Don't create a thread per request; reuse threads, and while some non-blocking action completes, process other requests (see the sketch below)
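As a rough illustration of this design, the event loop shown earlier can be combined with one worker per CPU core; the use of SO_REUSEPORT, worker processes rather than threads, and the port number are all illustrative assumptions.

# Sketch: one event-loop worker per CPU core, each serving many
# connections and never blocking on any single client.
import os
import selectors
import socket

def worker():
    listener = socket.socket()
    # SO_REUSEPORT (Linux >= 3.9) gives each worker its own accept queue
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    listener.bind(("0.0.0.0", 8080))           # illustrative port
    listener.listen()
    listener.setblocking(False)

    sel = selectors.DefaultSelector()
    sel.register(listener, selectors.EVENT_READ)
    while True:
        for key, _ in sel.select():
            if key.fileobj is listener:
                conn, _ = listener.accept()
                conn.setblocking(False)
                sel.register(conn, selectors.EVENT_READ)
            else:
                data = key.fileobj.recv(4096)  # non-blocking read
                if data:
                    key.fileobj.send(data)     # placeholder response (echo)
                else:
                    sel.unregister(key.fileobj)
                    key.fileobj.close()

if __name__ == "__main__":
    for _ in range(max(os.cpu_count() - 1, 0)):
        if os.fork() == 0:                     # child: run one worker forever
            worker()
    worker()                                   # parent runs a worker too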
C10M problem – Next decade
● 10 million concurrent connections per server
● Current unix kernels can't handle that
  – Application thread locks in the kernel
  – Hardware drivers (NIC)
  – Memory management
● Solution: a new generation of high-load unix kernels
  – 1 main application per server
  – Minimize the number of system calls
  – Minimize kernel work
Dynamic content
● Web servers can't create dynamic content themselves
● We need an application created in some programming language
● We need some method for the web server to communicate with the application
  – CGI
  – Apache modules
  – FastCGI, SCGI, ...
  – WSGI, PSGI, JSGI, ...
CGI - Common Gateway Interface
● Oldest method of getting dynamic content from web servers
● For each browser request the web server defines a set of environment variables derived from the request and the server configuration
● The web server starts the application in the prepared environment
● Sends POST data as standard input (if any)
● Waits for standard output from the executed file and returns it to the browser
CGI application
● Can be ANY script or binary file executable in UNIX
● No libraries required
● Use request information from environment variables
● Or ignore it completely if not needed
● Process standard input if needed
● Output additional HTTP headers and then the generated document body on standard output
CGI environment variables
● REQUEST_METHOD: name of the HTTP method
● PATH_INFO: path suffix, if appended to the URL after the program name and a slash
● PATH_TRANSLATED: corresponding full path as supposed by the server, if PATH_INFO is present
● SCRIPT_NAME: relative path to the program, like /cgi-bin/script.cgi
● QUERY_STRING: the part of the URL after the ? character. The query string may be composed of name=value pairs separated with ampersands (such as var1=val1&var2=val2...) when used to submit form data transferred via the GET method as defined by HTML application/x-www-form-urlencoded
● REMOTE_HOST: host name of the client, unset if the server did not perform such a lookup
● REMOTE_ADDR: IP address of the client (dot-decimal)
● Variables passed by the user agent (HTTP_ACCEPT, HTTP_ACCEPT_LANGUAGE, HTTP_USER_AGENT, HTTP_COOKIE and possibly others) contain the values of the corresponding HTTP headers
● Only a few more exist
CGI example
#!/bin/bash
echo "Content-type: text/plain"
echo ""
echo "Hello world!"
echo "Today is:" `date`
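As a complement, a sketch of a CGI program in Python that actually uses the environment variables and standard input described above; the variables read are just the standard CGI ones, and the plain-text output format is an illustrative choice.

#!/usr/bin/env python3
# CGI sketch: read request data from environment variables and stdin,
# then write HTTP headers plus a body to stdout.
import os
import sys

method = os.environ.get("REQUEST_METHOD", "GET")
query = os.environ.get("QUERY_STRING", "")
body = sys.stdin.read() if method == "POST" else ""

print("Content-type: text/plain")
print()                                  # empty line ends the headers
print("Method:", method)
print("Query string:", query)
print("POST body:", body)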
CGI issues
● Each request forces the creation of a new process – big overhead for process creation and destruction
● All script files must be interpreted on each request – another big overhead
● Not scalable
● Not suitable for modern web servers
● Still widely used in embedded systems (e.g. wifi router web management consoles) which receive only occasional requests
FastCGI
● Multiple processes are started
● The web server communicates with them over unix sockets or TCP
● Each process serves many requests
● Good performance
● Complete separation of the web server and the dynamic content system
● Great scalability – spread FastCGI processes across a server farm (a minimal backend is sketched below)
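A minimal sketch of what such a long-lived FastCGI backend can look like in Python; it assumes the third-party flup package and a unix socket path of /tmp/app.sock, both illustrative choices rather than anything required by FastCGI.

# FastCGI backend sketch (assumes: pip install flup).
# A web server (Apache mod_fastcgi, nginx fastcgi_pass, ...) forwards
# requests to the socket; this process stays alive and serves many
# requests instead of being created anew for each one as in CGI.
from flup.server.fcgi import WSGIServer

def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello from a long-lived FastCGI process\n"]

if __name__ == "__main__":
    WSGIServer(app, bindAddress="/tmp/app.sock").run()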
Other communication methods
● Integrate the dynamic content generation system with the web server process (Apache modules)
● CGI derivatives (SCGI)
● *SGI interfaces implement a programming-language-specific method of communication between the web server and the selected programming language (WSGI – Python, PSGI – Perl)
● Proxy requests to applications that implement communication via HTTP (see the sketch below)
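A sketch of the last option – the application simply speaks HTTP itself and the front web server (nginx, Varnish, ...) proxies requests to it; the handler body, bind address and port are illustrative assumptions, using only Python's standard library.

# HTTP backend sketch: the front server proxies requests to 127.0.0.1:8000.
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"Hello from an HTTP backend\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), Handler).serve_forever()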
LAMP
● Linux, Apache, MySQL, PHP
● Most common web server stack
● Simple to install and configure
● Simple to develop web applications
● Acceptable performance and security
Apache
● One of the oldest web servers
● Still actively developed
● Most popular web server today and in recorded web server history
● Highly configurable and extensible using modules
● All-in-one solution
● Runs on many OSes, most often on unix servers
PHP
● One of the most popular web application programming languages
● Easy to learn (bad coding practices)
● Interpreted language
● Functions from unix libraries and tools
● Huge amount of ready-made applications, libraries and modules
MySQL
● Unix distributions are moving towards MariaDB (a MySQL fork) after the acquisition by Oracle
● Fast relational DB implementation
● Fairly easy to use
● Different storage engines (faster without transactions, slower with them, memory based, etc.)
● Query caching
● User quotas
Historical installation
● Acquire source files for all required software (Apache, MySQL, PHP)
● Acquire all dependencies and install them
● Configure make files via ./configure
● Compile everything
● Configure each piece of software so it works with the others
● Use it
Modern installation
● Use OS package manager
  – root@server# apt-get install libapache2-mod-php5 apache2 php5 mysql-server
● Use it
Simple web site example
● Create a database user, a database, the table structure and maybe some data
● Using the MySQL command prompt accessed by
  – $ mysql -u root -p
  – > CREATE DATABASE `example` COLLATE 'utf8_general_ci';
  – > CREATE TABLE `posts` (...)
  – > CREATE USER 'example'@'localhost' IDENTIFIED BY PASSWORD '…'
  – > GRANT ... ON `example`.* TO 'example'@'localhost';
  – > INSERT INTO `posts` (`title`, `info`) VALUES ('a', 'a');
Simple web site example II
● Or be lazy and use some web interface like phpMyAdmin or Adminer
  – Download the single file adminer.php
  – Drop it into /var/www
  – Navigate your browser to http://localhost/adminer.php
  – Do all the tasks in the browser without really knowing SQL
Simple web site example III
● Create the file example.php in /var/www
● Write your HTML and PHP code inside (a rough Python counterpart is sketched after this slide)
  – Connect to the database
  – Select data
  – Show data
● Your simple web site is ready
● Navigate your browser to http://localhost/example.php
● Enjoy the result
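For illustration only, a rough Python equivalent of what example.php does (connect to the database, select the data, show it); it assumes the third-party pymysql package and the `example` schema created above, and the password is left as a placeholder.

# Rough Python counterpart of example.php: connect, select, show.
import pymysql  # assumes: pip install pymysql

conn = pymysql.connect(host="localhost", user="example",
                       password="...", database="example")
with conn.cursor() as cur:
    cur.execute("SELECT `title`, `info` FROM `posts`")
    print("<html><body><ul>")
    for title, info in cur.fetchall():
        print("<li>{}: {}</li>".format(title, info))
    print("</ul></body></html>")
conn.close()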
Simple web site example - Source
Simple web site example - Result
● From http://localhost/example.php
nginx
● Contender for 2nd place in the web server ratings
● Event-driven
● High performance (thousands of req/s)
● Small memory footprint per request
● Efficient CPU usage
● Advanced configuration and functionality via modules
● Often used as the FrontEnd to big websites
● CloudFlare is built on top of it
High-load web systems
● A big dynamic web site can't reside on only 1 server
● Need some strategy for splitting the load across multiple web servers
● One possible strategy
  – One entry point, the “FrontEnd”, which receives all requests and can handle the load (e.g., Varnish, nginx)
  – Backends process requests from the FrontEnd (nginx, Apache)
What is Varnish?
● Proxy server
  – Reverse
  – Caching
  – Programmable
● Load balancer
● Dynamic content generator
● Tools – logging, debugging, monitoring
Why Varnish?
● Fantastic performance even on low-end servers – 1,000 to 10,000 requests per second per server is the norm
● After tuning, tens of thousands of requests per second; in testing, 100k/s has been exceeded
● Free software (free open source)
● C + GOOD C programmers
● Uses the advantages of the Unix architecture
● Request-oriented domain-specific configuration/programming language VCL
● Almost everything needed for a high-load web site, in one package
Caching
● Generating any dynamic web page is very slow – depending on the environment, hundreds or thousands of times slower than returning static content
● Any development framework makes dynamic page generation another tens or hundreds of times slower (especially Java EE, Zend Framework)
● Rough math: 100 × 100 = 10,000 times slower than a static page
● A low-end server can generate a couple of hundred such dynamic pages per second
● That is already only a few tens of requests per second
Caching II
● Idea – it would be ideal to return dynamic content with performance similar to static pages
● We can store those pages that are the same for every user and do not change significantly within a given period of time
● Using the hard disk is slow; good practice is to keep all cached content in RAM or on a server SSD only
● A caching strategy has to be designed for each specific case, and it can be very subjective
Varnish caching
● Based on the request address (exact or a regular expression) you can decide which requests to cache, how long to cache a particular element, or whether not to cache it at all – the standard caching approach practically everywhere
● Fast compared with other caching approaches
● Advertised as speeding up page delivery by 300 to thousands of times, i.e. only about 10 times slower than static content
● Users – Facebook, Twitter, WikiLeaks, ThePirateBay
● Developed in Norway
DSL VCL
● Simple syntax (C-like), which is translated to C and then compiled to machine code
● =, ==, !=, ~, !~, !, &&, ||, +, "string"
● if () {} else {}, set, unset, return
● 9 subroutines, which are the different stages of processing each request, in which something can be influenced
● Only predefined objects – client, server, req, bereq, beresp, obj, resp
● sub vcl_recv {
    if (req.request == "GET" && req.url ~ "\.js$") {
      return (lookup);
    }
  }
VCL processing architecture
Integration
● A fixed caching time may not be optimal
● The content may change more often than the configured time – users get stale information
● Or less often – the servers do unnecessary work
● Solution – tell the server that the content must be refreshed

acl purge { "192.168.0.0"/24; }
sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purge) { error 405 "Not allowed."; }
    return (lookup);
  }
}
sub vcl_hit {
  if (req.request == "PURGE") {
    purge;
    error 200 "Purged.";
  }
}
Dynamic content generation with ESI
● Web pages often consist of blocks whose rate of change differs
● Or there is a small block of information specific to each user (for example, “Hi, Jānis Bērziņš | You have [0] new messages”)
● We can load it after the page loads using JSON, or generate the content on Varnish
● <TABLE><TR><esi:include src="sveiks.html"/></TR>
  <TR><TD><esi:include src="index.html"/></TD>
  <TD><esi:include src="article.html"/></TD></TR>
  </TABLE>
● Varnish parses the <esi> tags and assembles the elements together; all the elements are configured and cached independently
Load balancing
● One address can be handled by several backends
● Different URLs can be handled by different backends
● Monitoring
● Disconnecting dead servers (restart, upgrade, repair)
● Reconnecting servers that have come back to life (including new ones)
● In practice this means a pile of CHEAP desktop-grade hardware can be used for dynamic content generation
● If we add one more frontend, we get high but cheap fault tolerance
● If we use NoSQL or otherwise obtain a replicated database, expensive servers are not needed at all
Varnish usage in Latvia
$ curl -I www.tvnet.lv
HTTP/1.1 200 OK
Server: Apache
Content-type: text/html; charset=utf-8
Content-Length: 159097
Date: Wed, 07 Nov 2012 20:20:58 GMT
X-Varnish: 734492112 734450241
Age: 58
Via: 1.1 varnish
Connection: keep-alive

$ curl -I www.delfi.lv
HTTP/1.1 200 OK
X-Fe-Node: nuffy
Server: lighttpd/1.4.31 (PLD Linux)
Last-Modified: Wed, 07 Nov 2012 20:09:08 GMT
Expires: Wed, 07 Nov 2012 20:10:08 GMT
Cache-Control: max-age=60
Vary: Accept-Encoding
Content-Type: text/html; charset=UTF-8
Content-Length: 185924
Date: Wed, 07 Nov 2012 20:10:15 GMT
X-Varnish: 2025605055 2025545136
Age: 67
Via: 1.1 varnish
Connection: keep-alive
Non-standard uses - WAF
● Programmability allows non-standard uses, for example a WAF
● Define as precisely as possible the addresses and methods of the requests that get processed
  – req.url ~ "^/topic/([0-9])$" rather than "^/topic"
  – req.request == "GET"
● At the end use return(error);
● Restrict access to the backend servers (or disconnect them from the internet)
● Attackers now attack the frontend, so we protect it
● Does not help against logic (and many other) vulnerabilities
New trend
● The web application is the central thing
● Develop the application in some framework
● No separate web server – it is now just a part of the application (a library from the framework used; see the sketch below)
● Extremely customizable
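A tiny illustration of this idea in Python, using only the standard library's wsgiref module; real frameworks follow the same pattern, with the HTTP server being just a library call inside the application (the host and port here are illustrative).

# The "web server" is just a library the application calls –
# there is no separately configured server process at all.
from wsgiref.simple_server import make_server

def app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"The application embeds its own HTTP server\n"]

if __name__ == "__main__":
    make_server("127.0.0.1", 8000, app).serve_forever()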
The situation today
● The standard web development solution is an HTTP server plus some classic dynamic content generation system (PHP, ASP, Python, etc.); problems remain:
● Long-running requests and persistent connections
● The number of simultaneously served clients
● Compatibility with other technologies
● Future development possibilities
Event-driven programming frameworks
● The idea and its implementations are not new (Python Twisted, Perl Object Environment, Ruby EventMachine, Node.js)
● Little adoption in web solutions
● Solve the problems of the standard technologies
● Reactor design, the C10K problem
● Let web programmers build network solutions
Node.js
● A set of libraries that lets you build network solutions in the JavaScript programming language; runs on the V8 engine
● JavaScript engine performance evaluation
● New related technologies – Socket.IO, CoffeeScript
● Problematic aspects – package management, application hosting