p2pWeb
The p2pweb Project
Low cost Peer to Peer solutions for high
availability web hosting
19 May 2005
Seminar
« Peer-To-Peer : Concept, Tools and Applications »
Ecole d’ingénieurs de Genève
Agenda
1. The Project goals
2. Web hosting solutions and architecture
3. The p2pweb solution
4. Project constraints and key technologies
5. Related projects
6. The project components
   – Global server load balancing system
   – Distributed set of web servers
   – Monitoring system
   – Node architecture and hardware
7. Conclusion
The Project goals
To explore and implement low cost solutions
for high availability web hosting
“Do More with Less”
Our targets are :
• small or medium structures (associations, NGO, etc …)
• with limited resources (money, IT people)
• with important web hosting needs (bandwidth available)
– rich and complex web site
– medium to high web traffic
– high availability and visibility needs
It may fit the needs of many projects in Least Developed Countries very well :
TeleCentres Networks, Rural Organisations, Universities, Cultural Centres, Public
Libraries, Community Multimedia Centres, Health Networks, etc ...
Example of hosted web site
Afromix.org (personal web site)
A portal of African and Caribbean Cultures since 1993
A complex web site using multiple technologies
• in house Perl Content Management System (CMS)
• an extended discographic database (1600 artists, more
than 50 styles from all Africa and French West Indies)
• multilingual (French, English, Spanish) site running on
a JAVA application server (Tomcat)
• about 25 000 files, 400 000 pages/month, 2 million
hits/month, 60 000 unique visitors/month
Mediaport.net (community web site)
One of the first French web pioneers, first developed at INA
• mostly static content (nearly 10 000 files)
• multilingual (French, English) site running on a PHP CMS (ezpublish)
• it’s the main p2pweb test platform and it will evolve into an open web hosting solution for artistic and cultural web projects (an editorial committee is being formed)
The web hosting market
• Free web hosting
– Very limited
• static html or small PHP site (limited computing resources)
• can’t use your own domain name
• Professional web hosting
– A broad range of services
• private virtual server
• dedicated server
• Colocation
– But price is quite high
• 100-200€/month for one dedicated server
• and maintenance can be complex
Centralized architecture
Server in one location :
The server and its Internet link are single points of failure (SPOF)
Centralized architecture (cont.)
High availability architecture
Datacenter hosting
- BGP routing
- hardware load balancing
- SAN storage
[Diagram, layers from top to bottom: Multi-homing with BGP routing, Load Balancers, Reverse Proxy / Cache / SSL accelerators, Load Balancers, Web servers, Application Servers, Database cluster, SAN Storage]
In theory, no SPOF
• but very complex architecture
• very high cost
CDN Architecture
Content Delivery Network
Service delivered by
companies like Akamai,
Speedera, and others.
Edge servers provide
caching and data
replication for fast delivery
to clients worldwide.
A solution for very high traffic web sites.
Very expensive solution.
Peer-To-Peer : Concept, Tools and Applications
Slide8
p2pWeb
alternative web hosting
• Community based web hosting
– Initiatives from various associations
ouvaton.coop, globenet.net, autre.net, altern.net, ...
– Most of the time, people share their money and knowledge to buy and administer one or two dedicated servers.
• Home server
– We now have sufficient bandwidth (ADSL), computing power (PCs) and good software (apache, linux …)
– We lack reliability !
First idea : big home server
Second idea (better one)
Lots of people (family, friends, co-workers, …) already have :
• An ADSL Internet access or Permanent High Speed Connection
• One or more PCs (with a lot of unused disk space)
So, what about sharing those resources to
build a more powerful and resilient network
of web servers ?
Web Hosting : the p2pweb way
[Diagram: member nodes connected to the Internet through ADSL lines from three different providers (ISP 1, ISP 2, ISP 3)]
Each member of the p2pweb network shares a portion of their Internet bandwidth (most of the
time an ADSL line) and hosts a small server.
The result is a powerful network that is the sum of the bandwidth and computing resources of
all the members.
A peer to peer solution
• In a way, it’s a return to the fundamental
principles of the Internet:
– a cooperative solution (network of servers)
– a distributed solution (no central control)
– a fault tolerant solution (resilience)
• But with all the power of existing internet and
open source technologies
– consumer computers and internet access
– overlay network and services over the Internet
– It is a peer to peer solution !
The project constraints
• Unreliable components
– Node failure is not an exception, it’s the rule.
– Internet link failure, power outage, server crash …
• Automatic function
– Murphy’s law : servers will always crash when there
is nobody to fix the problem (at night, when you are
on vacation …)
• Pragmatic approach
– Build from existing components
– Simple and efficient solutions are priority choices
Key technologies
Mass market products are available at low cost now !
• ADSL lines
– 1 Mb/s Up - 15Mb/s Down for 30€ / month (free.fr)
• ADSL router / firewall / ethernet or wifi
– D-LINK, NetGear, LINKSYS from 75 to 150 €
• Small Servers
– PC barebones (Asus, Biostar, Shuttle …)
• from 300 to 500 €
– Mac mini (Apple)
• 499 €
• Open Source Software
– BSD, Linux, apache, tomcat, etc …
Related projects
YouServ (IBM)
• http://www.almaden.ibm.com/cs/people/bayardo/userv/
• YouServ is software that forms a webserving "grid" by allowing its users to pool their desktop computing resources to create one large, virtual webspace.
• An intranet project, more oriented toward desktop file sharing. Unfortunately not open source
Vergenet (Simon Horman)
• http://www.vergenet.net/
• Vergenet has servers located in Sydney, Amsterdam, London, Tokyo and Indiana. These servers are all running Linux and a variant of Super Sparrow to load balance traffic between them.
Super Sparrow enables users to load balance traffic between geographically separated points of presence by finding the site network-wise closest to clients. This is done by accessing BGP routing information (but it requires direct access to a BGP router)
Related projects (cont.)
Coral (New York University)
• http://www.coralcdn.org/
• Coral is a peer-to-peer content distribution network, comprised of a world-wide network of web proxies and name servers
• Publishing through Coral is as simple as appending a short string to the hostname of objects' URLs; a peer-to-peer DNS layer transparently redirects browsers to participating caching proxies. A URL like www.myserver.com/some/path.html becomes www.myserver.com.nyud.net:8090/some/path.html
• Coral is in fact running on top of the planet-lab network (a grid computing research network : http://www.planet-lab.org/)
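The hostname rewriting described for Coral can be sketched in a few lines of Python. This is only an illustration of the URL transformation quoted on the slide; the function name `coralize` is hypothetical and the sketch assumes plain http URLs without an explicit port.

```python
from urllib.parse import urlsplit, urlunsplit

CORAL_SUFFIX = ".nyud.net:8090"  # suffix quoted on the slide


def coralize(url: str) -> str:
    """Append the Coral suffix to the hostname of an http URL."""
    # Accept scheme-less input such as "www.myserver.com/some/path.html".
    parts = urlsplit(url if "://" in url else "http://" + url)
    if parts.port is not None:
        raise ValueError("this sketch only handles URLs without an explicit port")
    netloc = parts.hostname + CORAL_SUFFIX
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(coralize("www.myserver.com/some/path.html"))
```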
Globule (Vrije Universiteit Amsterdam)
• http://www.globule.org/
• Globule is a module for the Apache Web server that allows a given server to replicate its documents to other Globule servers. Clients are automatically redirected to one of the available replicas. The project provides both content replication and HTTP or DNS based redirection mechanisms
P2PWeb - Project Components
• A global server load balancing system
– Two main functions
• Load balance the traffic on the web servers
• Provide failover = only send traffic to live web servers
• A distributed set of web servers
– And a set of tools to :
• Publish content on the servers
• Keep all servers in sync (replication mechanism)
• Monitoring services
Global server load balancing
• Load balancing
– achieved using Round Robin DNS
• simple system, with well known limits
(http://www.tenereillo.com/GSLBPageOfShame.htm)
• Failover
– achieved by coupling a monitoring system (NAGIOS) with the
DNS
• DNS entries have short TTL (time to live)
• NAGIOS monitors each web server
• When a server changes state (for example to DOWN), a special handler is called that updates the DNS entry and reloads the DNS
• The failed server is no longer announced by the DNS
To have a fully redundant system, we use 3 independent DNS servers (all primary), each running its own NAGIOS instance
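The project's actual handler is not shown in the slides, so the following Python sketch only illustrates the idea: on a HARD CRITICAL state the A record of the failed server is commented out of the zone text, and on a HARD OK state it is restored. The function name `update_zone` and its arguments are assumptions; a real handler would also need to rewrite the zone file, bump the SOA serial and reload the name server.

```python
def update_zone(zone_text: str, ip: str, state: str, state_type: str) -> str:
    """Comment out the A record for `ip` on HARD CRITICAL, restore it on HARD OK."""
    if state_type != "HARD":
        return zone_text  # SOFT states: the monitor is still retrying, do nothing
    out = []
    for line in zone_text.splitlines():
        bare = line.lstrip(";")  # record text without any leading comment marker
        fields = bare.split()
        is_target = len(fields) >= 2 and fields[-2:] == ["A", ip]
        if is_target and state == "CRITICAL":
            line = ";" + bare  # stop announcing the failed server
        elif is_target and state == "OK":
            line = bare        # announce the recovered server again
        out.append(line)
    return "\n".join(out) + "\n"


zone = "www 300 IN A 82.66.103.28\nwww 300 IN A 195.101.152.113\n"
print(update_zone(zone, "195.101.152.113", "CRITICAL", "HARD"))
```

Keeping the zone edit as a pure function on text makes it easy to test without touching a live DNS server.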
GSLB : Failover illustrated
Initial DNS entries : all servers are up
www 300 IN A 82.66.103.28
www 300 IN A 195.101.152.113
www 300 IN A 82.232.203.167
www 300 IN A 66.35.250.210
Server 195.101.152.113 fails
In the syslog trace, we can see :
22:22:46 nagios: SERVICE ALERT: ns1;HTTP-P2PWEB;CRITICAL;SOFT;1;Connection refused by host
22:23:47 nagios: SERVICE ALERT: ns1;HTTP-P2PWEB;CRITICAL;SOFT;2;Connection refused by host
22:24:46 nagios: SERVICE ALERT: ns1;HTTP-P2PWEB;CRITICAL;HARD;3;Connection refused by host
After 3 unsuccessful tries, a notification is sent by email to the admin
22:24:46 nagios: SERVICE NOTIFICATION: nagios;ns1;HTTP-P2PWEB;CRITICAL;notify-by-email;Connection refused by host
The specific handler is called
22:24:47 nagios: SERVICE EVENT HANDLER: ns1;HTTP-P2PWEB;CRITICAL;HARD;3;http_p2pweb_handler
And the DNS is reloaded
22:24:47 named[17379]: master/p2pweb.net.zone:1: no TTL specified; using SOA MINTTL instead
And now we can verify that the DNS entries are
www 300 IN A 82.66.103.28
;www 300 IN A 195.101.152.113
www 300 IN A 82.232.203.167
www 300 IN A 66.35.250.210
Failover time is : 2 or 3 minutes (NAGIOS) + DNS max TTL (here 5 minutes)
= less than 10 minutes
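The failover estimate above can be checked with a small calculation. The sketch assumes a 60 second check interval (inferred from the timestamps in the log trace) and 3 attempts before the HARD state; the function name is hypothetical, and the estimate ignores resolvers or browsers that cache records longer than the TTL.

```python
def worst_case_failover_seconds(check_interval_s: int, max_attempts: int, dns_ttl_s: int) -> int:
    """Upper bound: detection delay by the monitor plus the DNS record TTL."""
    # Up to one full interval before the first failed check, then
    # (max_attempts - 1) retries spaced one interval apart.
    detection = check_interval_s * max_attempts
    return detection + dns_ttl_s


minutes = worst_case_failover_seconds(60, 3, 300) / 60
print(minutes)  # 8.0, consistent with "less than 10 minutes"
```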
GSLB : next steps
Improvements :
– Better service provisioning (manual process for now)
– Better support for “long downtime”
• When a server is down for a long period of time and then recovers, its content may
be outdated
• We must not announce it again until it has re-synchronized itself
– Proximity load balancing
• The goal is to load balance traffic between geographically distributed servers by
finding the site network-wise closest to clients.
• A technology used in the CDN (Content Delivery Network) world
We can use part of the Globule project, as Globule supports DNS redirection
based on an 'AS-path length' policy (used in BGP routing), which tries to redirect
clients to a server close to them.
This BGP information can be collected through routeviews.org (no direct access to a
BGP router needed)
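As a rough illustration of an 'AS-path length' policy, the Python sketch below picks the server whose AS path towards the client is shortest. It is not Globule's implementation: the function name, the data layout and the AS numbers (taken from the range reserved for documentation) are all assumptions, and the paths would in practice be derived from collected BGP data.

```python
from typing import Dict, List


def closest_server(as_paths: Dict[str, List[int]]) -> str:
    """Return the server IP with the shortest AS path (ties broken by IP string)."""
    return min(as_paths, key=lambda ip: (len(as_paths[ip]), ip))


# Hypothetical AS paths from each server to one client network.
paths = {
    "82.66.103.28": [64496, 64497, 64498],
    "66.35.250.210": [64496, 64499],
}
print(closest_server(paths))
```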
Web server content management
[Diagram: web servers connected through ADSL lines from ISP 1, ISP 2 and ISP 3]
We have a set of web servers and we need tools to :
– Publish content on all servers
– Keep them in sync (content replication)
Two main replication strategies
• primary backup : one master server from which the replicas are built
• active replication : when content changes on any replica, that replica propagates the changes to all the others
static content replication
[Diagram: one master server and three replicas, connected through ADSL lines from ISP 1, ISP 2 and ISP 3]
One server plays the master’s role
– Content is published first on the master (for example via FTP)
– Then the content is either pushed to or pulled by the replicas
The easiest way is to use rsync (rsync.samba.org)
Content can be pulled via anonymous rsync from master
Content can be pushed via rsync over ssh (using private/public key pair for
security)
Content replication : rsync
rsync is a file transfer program for Unix systems. rsync provides a very fast method for
bringing remote files into sync. It does this by sending just the differences in the files across
the link, without requiring that both sets of files are present at one of the ends of the link
beforehand.
Anonymous rsync server (pull mode)
• Run as a standalone daemon or can be launched by inetd
• Advanced security options (read-only, chroot, IP access list)
• Use : run from crontab on each mirror
rsync -a master.mydomain.com::www/ /data/www/
Rsync over SSH (push mode)
• Need ssh access on each mirror
• And ssh cryptographic keys exchange for unattended operation
• Use : run on demand or from crontab on master
rsync -a /data/www/ [email protected]:/data/www/
Useful options
--compress        compress file data during the transfer
--bwlimit=KBPS    limit I/O bandwidth; KBytes per second
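For unattended mirrors it can help to build the pull command in a script rather than typing it by hand. The Python sketch below only assembles the command shown above with the two optional flags; the function name and the 64 KB/s limit are hypothetical examples, not values from the project.

```python
import shlex
from typing import List, Optional


def rsync_pull_command(master: str, module: str, dest: str,
                       compress: bool = False,
                       bwlimit_kbps: Optional[int] = None) -> List[str]:
    """Build the argument list for pulling an rsync module from the master."""
    cmd = ["rsync", "-a"]
    if compress:
        cmd.append("--compress")
    if bwlimit_kbps is not None:
        cmd.append(f"--bwlimit={bwlimit_kbps}")
    cmd += [f"{master}::{module}/", dest]
    return cmd


cmd = rsync_pull_command("master.mydomain.com", "www", "/data/www/",
                         compress=True, bwlimit_kbps=64)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from a script started by crontab on each mirror.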
Content distribution : Satellite
For a lot of geographically distributed mirrors, an interesting
solution can be Datacasting over satellite
• Technology used by some CDN vendors
– Skycache, cidera, Skystream.com, panamsat.com
• Now available at lower cost from worldspace.fr (SatPost Solution)
Use of CMS
Nowadays most webmasters use CMS (Content
Management System) tools for publishing
– A lot of open source and commercial tools
• Spip, mambo, typo3, phpnuke, … (php)
• Bricolage, metadot, slashcode, … (perl)
• Cofax, opencms, magnolia, jahia, … (java)
• Plone, cps, zwook, … (python)
• But none of them has direct support for a distributed
architecture
• Most use a database as a back-end store
• Distributed database transactions and replication are a hard problem
CMS : a pragmatic solution
[Diagram: the webmaster works in the CMS back office; an html export produces static html files on the master, which are distributed to three replicas behind ADSL lines from ISP 1, ISP 2 and ISP 3]
The webmaster publishes using the CMS as usual
– The content is exported as static html files
– Then distributed to the replicas using rsync
Constraint : the CMS must support export with “static like URLs”
Either directly or through URL rewriting
/article/sport/2005/4/13/football.html (good)
/article.php?id_category=3&id_article=25 (bad for mirroring)
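A "static like" URL can be derived from an article's metadata rather than from its database ids. The Python sketch below reproduces the good URL above from a hypothetical `Article` record; the field names and the mapping from category and article ids to that record are assumptions, since each CMS stores this differently.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    category: str    # e.g. "sport"
    published: date  # publication date
    slug: str        # URL-friendly title, e.g. "football"


def static_path(article: Article) -> str:
    """Build a mirror-friendly path: /article/<category>/<y>/<m>/<d>/<slug>.html"""
    d = article.published
    return f"/article/{article.category}/{d.year}/{d.month}/{d.day}/{article.slug}.html"


print(static_path(Article("sport", date(2005, 4, 13), "football")))
```

Because the path contains no query string, the exported file can be stored under the same name on every replica and copied with rsync.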
CMS : distributed architecture (1)
[Diagram: four servers (Senegal, Mali, Ivory Coast, Burkina Faso) connected through ADSL lines from ISP 1, ISP 2 and ISP 3 and linked by XML content exchange]
Example : a non-governmental organization is active in 4 countries and wants to provide a global
web presence. The same global web design and tools are used on all servers.
Local publishing
Each local webmaster publishes news about their country using the CMS on the local server
Content exchange using web services
Each local web server “collects” (pulls) new articles from the other servers using RSS (Really Simple Syndication) web services
Global web presence
Global content is (re)constructed on each server (from the data of all the others) and served on the Internet
Such a solution may be constructed by hacking/customizing an existing CMS
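The pull step can be sketched with Python's standard XML parser. The example below merges RSS items from several country feeds into one list, skipping duplicate links; the feed contents, host names and function names are made up for illustration, and fetching the feeds over HTTP is left out.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_items(rss_xml: str, source: str) -> List[Dict[str, str]]:
    """Extract title and link of every <item> in an RSS document."""
    root = ET.fromstring(rss_xml)
    return [
        {
            "source": source,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }
        for item in root.iter("item")
    ]


def merge_feeds(feeds: Dict[str, str]) -> List[Dict[str, str]]:
    """Merge the items of all feeds, keeping the first copy of each link."""
    seen = set()
    merged = []
    for source in sorted(feeds):
        for entry in parse_items(feeds[source], source):
            if entry["link"] not in seen:
                seen.add(entry["link"])
                merged.append(entry)
    return merged


# Hypothetical feeds as they might be served by two local servers.
SENEGAL = ("<rss version='2.0'><channel><item><title>News from Dakar</title>"
           "<link>http://sn.example.org/1.html</link></item></channel></rss>")
MALI = ("<rss version='2.0'><channel><item><title>News from Bamako</title>"
        "<link>http://ml.example.org/1.html</link></item></channel></rss>")

for entry in merge_feeds({"senegal": SENEGAL, "mali": MALI}):
    print(entry["source"], entry["title"], entry["link"])
```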
CMS : distributed architecture (2)
CMS + Message-oriented middleware (MOM)
A MOM is a client/server infrastructure that increases the interoperability, portability and
flexibility of an application by allowing the application to be distributed over multiple
heterogeneous platforms.
Through the use of a queue system, a MOM can provide asynchronous reliable data exchange.
MOM is typically asynchronous and peer-to-peer and supports
– Point to point communication
– Publish and subscribe communication
There is a standardized interface in Java : JMS (Java Message Service) API
Various open source implementations in the java world
ActiveMQ (activemq.codehaus.org)
OpenJMS (openjms.sourceforge.net)
Joram (joram.objectweb.org)
MantaRay (mantamq.org)
No CMS uses it yet (as far as I know), but it may be a very good solution
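To make the publish and subscribe model concrete, here is a toy in-process broker in Python. It is not JMS and not one of the products listed above: there is no network transport and no persistence, and the class name `TinyBroker` is hypothetical. It only shows how queues decouple the publisher from subscribers that read later.

```python
import queue
from collections import defaultdict
from typing import DefaultDict, List


class TinyBroker:
    """Toy publish/subscribe broker: one queue per subscriber and topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[queue.Queue]] = defaultdict(list)

    def subscribe(self, topic: str) -> queue.Queue:
        q: queue.Queue = queue.Queue()
        self._subscribers[topic].append(q)
        return q

    def publish(self, topic: str, message: str) -> None:
        # Each subscriber gets its own copy; messages wait until they are read.
        for q in self._subscribers[topic]:
            q.put(message)


broker = TinyBroker()
mali = broker.subscribe("articles")
senegal = broker.subscribe("articles")
broker.publish("articles", "new article: football")
print(mali.get_nowait(), "|", senegal.get_nowait())
```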
Performance monitoring
We collaborate with the webperf.org project
– WebPerf is a system for measuring response time of specified URLs
from multiple locations on the internet.
– The project is founded on the premise that there are a lot of other
companies who also require such a monitoring service. If the other
companies are willing to monitor our URLs, we will monitor theirs (a
free co-peering arrangement).
Some perl scripts installed on the local node collect data from other web sites, then
the data are pushed to a central repository for further analysis.
A web interface allows members to display various statistics.
A view of one’s web site as seen from all over the world.
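The measurement step itself is simple. The sketch below times a fetch for each URL; it is written in Python rather than the Perl used by the project, it does not reproduce WebPerf's scripts or data format, and the function names are assumptions. The fetch function is passed in so the timing logic can be tried without network access.

```python
import time
import urllib.request
from typing import Callable, Dict, Iterable


def http_fetch(url: str) -> bytes:
    """Download a URL with a 10 second timeout."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def measure(urls: Iterable[str], fetch: Callable[[str], bytes]) -> Dict[str, float]:
    """Return the elapsed time in seconds for fetching each URL."""
    timings = {}
    for url in urls:
        start = time.perf_counter()
        fetch(url)
        timings[url] = time.perf_counter() - start
    return timings


# Offline demonstration with a stand-in fetch function.
print(measure(["http://www.p2pweb.net/"], lambda url: b"ok"))
```

On a real node, `measure(urls, http_fetch)` would be run periodically and the results pushed to the central repository.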
Webperf.org : sample graph (1)
Webperf.org : sample graph (2)
Webperf.org : sample graph (3)
Node architecture and security
Security
Mandatory
• Hardware router/firewall with NAT capabilities
• Internal private network using RFC 1918 IP addresses (192.168.x.y)
No incoming traffic from the outside other than required
Controlled via redirect on the firewall
• http (port 80)
• ssh (port 22, optional)
[Diagram: Internet, ADSL or Cable modem, Ethernet link, Ethernet router/firewall (optional Wifi access point), private Ethernet LAN, web server; the p2pweb traffic follows this path]
Node hardware (example)
Runs on the corner of a desk
•An ethernet and wifi switch
Connect other computers (not shown here)
•A web and application server
Mac mini (apple) running apache2 and tomcat
•A firewall
Embedded PC (www.pcengines.ch) running pf
(packet filter) on OpenBSD from a compact flash
No noise, and low electric power
consumption (near 50W)
Conclusion
• It can be done (at low cost)
• It runs, with good results
(service uptime measured by siteuptime.com)
www.p2pweb.net
hosted by the p2pweb network
monitored Since: 9/23/2004
Outages: 40
Total Uptime: 99.560%
Downtime/year: 38.5 hours
www.afromix.org
hosted on a single node
monitored Since: 9/23/2004
Outages: 37
Total Uptime: 97.634%
Downtime/year: 207.3 hours
• Still a lot of improvements to make
Not yet an easy-to-use solution : node administration still requires good Unix knowledge
• Most important : a new way to design web
applications
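The downtime figures quoted above follow directly from the uptime percentages, assuming a year of 365 days (8760 hours). A short Python check, with a hypothetical function name:

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Convert an uptime percentage into hours of downtime per year."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


print(round(downtime_hours_per_year(99.560), 1))  # 38.5  (p2pweb network)
print(round(downtime_hours_per_year(97.634), 1))  # 207.3 (single node)
```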
The Future
What we can provide right now
P2pweb.net : a global load balancing solution for any distributed web project
• Just provide the servers' IP addresses and a health check URL
Mediaport.net : a community web hosting solution
• We can host various web projects
We are looking for partnerships in the following domains :
Packaging an easy and ready to use solution for deploying web mirrors
(industrializing the solution)
• dedicated LINUX or BSD distro with preinstalled packages
• “all in one” solution : Java CMS + MOM in one webapp
Helping to deploy such a solution in Least Developed Countries
The P2PWeb solution is well suited to Least Developed Countries with weak
bandwidth and low connectivity.
Contacts
P2pweb is a SourceForge project (bsd license)
www.p2pweb.net or mediaport.sourceforge.net
Contacts :
about the project :
[email protected]
you want to be hosted on mediaport.net :
[email protected]
[email protected]
Questions
Thank you
• Questions ?