NUST_BSIT8_DC_Lecture_6


Tuesday, January 27, 2009
“In the confrontation between the stream and the rock, the stream always wins, not through strength but by perseverance.”
H. Jackson Brown
Distributed Computing
Class: BSIT-8
Instructor: Dr. Raihan Ur Rasool
Lecture Objectives

To understand the practical concepts of



P2P
SOA
Distributed Algorithms
Loose Coupling and the degree of loose coupling
Outline

Peer to Peer Systems
Evolution of P2P systems, P2P middleware, routing overlays, case studies: Chord, Pastry, Tapestry

Service Oriented Architecture


Vision of web & Evolution of web
Web Services
Web Services, Web Services Architecture, SOAP, WSDL, UDDI,
Service Description and IDL, Directory Service for use with
Web Services, XML Security, Coordination of web Services.
Intro to P2P Systems


[reliable resource sharing layer over unreliable]
Demand for services -- eliminating separately managed servers
The scope for expanding popular services by adding to the number of computers hosting them is limited when all the hosts must be owned and managed by the service provider

Administration and fault recovery costs

Bandwidth that can be provided to a single server site over the available physical links

Major service providers all face this problem, with varying severity
Intro to P2P Systems

Purpose:

Describe some general techniques for the construction of P2P applications
Scalability, reliability and security

Problem:

Placement of objects, managing workloads
Ensuring scalability without adding overheads

P2P applications exploit resources available at the edges of the internet
*Storage, content, cycles, human presence
Intro to P2P Systems

P2P applications exploit resources available at the edges of the internet
 *Storage, content, cycles, human presence

Traditional client-server systems provide access to these resources, but only on a single machine or on tightly coupled servers
 This centralized design requires few decisions about placement and management of resources
In P2P, algorithms for the placement and subsequent retrieval of information objects are a key aspect of the system design. A P2P system is one that

Is fully decentralized and self-organizing
Can dynamically balance the storage and processing loads between all the participating computers as they join and leave
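
The placement and rebalancing idea above can be sketched with a consistent-hashing ring. This is only an illustration under stated assumptions: the `Ring` class, the `guid` helper, the peer names and the successor rule (an object lives on the first node whose identifier is not smaller than its own) are all hypothetical choices made here, not the algorithm of any particular system named in this lecture.

```python
import bisect
import hashlib


def guid(name: str) -> int:
    """Map a name to a point in a flat 160-bit identifier space (SHA-1)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical placement ring: an object lives on the first node whose
    GUID is >= the object's GUID, wrapping around at the end of the space."""

    def __init__(self):
        self._ids = []     # sorted node GUIDs
        self._names = {}   # node GUID -> node name

    def join(self, node: str) -> None:
        g = guid(node)
        if g not in self._names:
            bisect.insort(self._ids, g)
            self._names[g] = node

    def leave(self, node: str) -> None:
        g = guid(node)
        if g in self._names:
            self._ids.remove(g)
            del self._names[g]

    def locate(self, obj: str) -> str:
        if not self._ids:
            raise LookupError("no nodes in the ring")
        i = bisect.bisect_left(self._ids, guid(obj))
        return self._names[self._ids[i % len(self._ids)]]


ring = Ring()
for n in ("peer-a", "peer-b", "peer-c"):
    ring.join(n)

objects = [f"song-{i}.mp3" for i in range(100)]
before = {o: ring.locate(o) for o in objects}

# A new peer joins: only the objects it takes over change location.
ring.join("peer-d")
after = {o: ring.locate(o) for o in objects}
moved = [o for o in objects if before[o] != after[o]]
```

With this rule, a join or leave only relocates the objects adjacent to the affected node on the ring, which is one way to rebalance load without a central coordinator.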
P2P Design Characteristics





Their design ensures that each user contributes resources to the system
Although they may differ in the resources that they contribute, all the nodes in a peer-to-peer system have the same functional capabilities and responsibilities
Their correct operation does not depend on the existence of any centrally administered systems
They can be designed to offer a limited degree of anonymity to the providers and users of resources
A key issue for their efficient operation is the choice of an algorithm for placing and retrieving data on many hosts

Balance of load
Availability without much overhead
 Participants' availability to the system is unpredictable
Evolution of P2P

Volatile resources -- a strength?

No guaranteed access to individual resources
Probability of failure can be minimized
Can be grouped into three generations

First generation – Napster music exchange service [OpenNap 2001]
Second generation – file-sharing applications with greater
 Scalability, anonymity & fault tolerance
 Gnutella, Kazaa, Freenet
Third generation – developed with the help of middleware layers
 Application-independent management of distributed resources on a global scale
 E.g. Pastry, Tapestry, CAN, Chord, JXTA
 Provide guarantees of delivery for requests in a bounded number of network hops
 Place replicas of resources, keeping in mind volatile availability, trustworthiness and locality
P2P Middleware - GUID







Resources are identified by a Globally Unique Identifier (GUID)
Derived as a secure hash of the resource's state
The hash makes a resource self-certifying
A client receiving the resource can check the hash
This requires that the states of resources are immutable
P2P systems are inherently best suited for the storage of immutable objects – music files, images
Sharing of mutable objects can be managed by a set of trusted servers that manage the sequence of versions, e.g. OceanStore, Ivy – more in section 10.6
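
A minimal sketch of a self-certifying GUID, assuming SHA-256 as the secure hash (the slide does not name a hash function) and using hypothetical helper names `make_guid` and `verify`:

```python
import hashlib


def make_guid(content: bytes) -> str:
    """Derive an identifier from the object's state with a secure hash.
    SHA-256 is an assumption made for this sketch."""
    return hashlib.sha256(content).hexdigest()


def verify(guid: str, content: bytes) -> bool:
    """Self-certification: the receiver recomputes the hash and compares."""
    return make_guid(content) == guid


original = b"immutable music file bytes"
guid = make_guid(original)

ok = verify(guid, original)                # content matches its GUID
tampered = verify(guid, original + b"!")   # any change breaks the match
```

Because the GUID is a function of the content, changing the object would change its identifier, which is why this scheme fits immutable objects.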
Overlay routing vs IP routing
(shared characteristics)

Scale
IP: IPv4 is limited to 2^32 addressable nodes. The IPv6 name space is much more generous (2^128), but addresses in both versions are hierarchically structured and much of the space is pre-allocated according to administrative requirements.
Application-level routing overlay: Peer-to-peer systems can address more objects. The GUID name space is very large and flat (>2^128), allowing it to be much more fully occupied.

Load balancing
IP: Loads on routers are determined by network topology and associated traffic patterns.
Application-level routing overlay: Object locations can be randomized and hence traffic patterns are divorced from the network topology.

Network dynamics (addition/deletion of objects/nodes)
IP: IP routing tables are updated asynchronously on a best-efforts basis with time constants on the order of 1 hour.
Application-level routing overlay: Routing tables can be updated synchronously or asynchronously with fractions of a second delays.

Fault tolerance
IP: Redundancy is designed into the IP network by its managers, ensuring tolerance of a single router or network connectivity failure. n-fold replication is costly.
Application-level routing overlay: Routes and object references can be replicated n-fold, ensuring tolerance of n failures of nodes or connections.

Target identification
IP: Each IP address maps to exactly one target node.
Application-level routing overlay: Messages can be routed to the nearest replica of a target object.

Security and anonymity
IP: Addressing is only secure when all nodes are trusted. Anonymity for the owners of addresses is not achievable.
Application-level routing overlay: Security can be achieved even in environments with limited trust. A limited degree of anonymity can be provided.
Distributed Computation



Only a small portion of the CPU cycles of most
computers is utilized. Most computers are idle for the
greatest portion of the day, and many of the ones in
use spend the majority of their time waiting for input
or a response.
Loosely coupled – data/computation
A number of projects have attempted to use these idle
CPU cycles. The best known is the SETI@home
project, but other projects including code breaking
have used idle CPU cycles on distributed machines.
How many of you did not shut down your computer and are now here in this room?


Assume we are 15 people, each running a screensaver without performing real work. The talk lasts one hour.
Opportunity loss for one hour:
Speed: 15 * 0.8 GFlops = 12 GFlops
Comp: 12 GFlops * 1 h = 43,200 billion floating point operations

Costs for one hour:
Power consumption: 15 * 300 W = 4500 W during one hour = 4.5 kWh
Money: 4.5 kWh at 0.20 CHF per kWh = 0.9 CHF
Oil needed: 0.36 liter (Gasoline: 12.3 kWh/kg)
CO2 emissions: 0.81 kg CO2 (Gasoline: 2.27 kg CO2 / liter)
During one year (15 people)…

Opportunity loss for one year:
Speed: 15 * 0.8 GFlops = 12 GFlops
Comp: 12 GFlops * 1 y = 378,432,000 billion floating point operations

Costs for one year:
Power consumption: 15 * 300 W = 4500 W during one year = 39.42 MWh
Money: 39.42 MWh => 7,884 CHF (525.6 CHF per head)
Oil needed: 3153.6 liter (Gasoline: 12.3 kWh/kg)
CO2 emissions: 7 t CO2 (Gasoline: 2.27 kg CO2 / liter)
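
The hourly and yearly figures can be recomputed with a short script. The constants are the ones on the slides; the function name `idle_cost` and the result keys are hypothetical. Like the slides, the sketch assumes that 1 kg of gasoline is roughly 1 liter. The oil and CO2 values come out slightly higher than the slide values because the slides round down to 0.36 liter per hour before deriving the remaining figures.

```python
PEOPLE = 15
GFLOPS_PER_PC = 0.8          # from the slide
WATTS_PER_PC = 300           # from the slide
CHF_PER_KWH = 0.20           # from the slide
KWH_PER_KG_GASOLINE = 12.3   # from the slide
KG_CO2_PER_LITER = 2.27      # from the slide


def idle_cost(hours: float) -> dict:
    """Opportunity loss and running cost of PEOPLE idle PCs over `hours`."""
    speed_gflops = PEOPLE * GFLOPS_PER_PC
    energy_kwh = PEOPLE * WATTS_PER_PC * hours / 1000
    # Assumption (as on the slide): 1 kg of gasoline is treated as ~1 liter.
    oil_liters = energy_kwh / KWH_PER_KG_GASOLINE
    return {
        "billion_flops": speed_gflops * 3600 * hours,
        "energy_kwh": energy_kwh,
        "money_chf": energy_kwh * CHF_PER_KWH,
        "oil_liters": oil_liters,
        "co2_kg": oil_liters * KG_CO2_PER_LITER,
    }


hour = idle_cost(1)
year = idle_cost(24 * 365)
```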
Distributed computation

Usage & exploitation: best example

SETI@home (Search for Extra-Terrestrial Intelligence)
Partitions a stream of digitized radio telescope data into 107-second work units, each about 350 KB, and distributes them to client computers
Each work unit is redundantly distributed to 3-4 users, to guard against errors & bad nodes
Coordination work is handled by a single server
3.91 million PCs had participated by 2002
In one year they processed 221 million work units, delivering 27.36 teraflops on average
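
The work-unit scheme above can be sketched as follows. The 107-second unit length is from the slide; the function names and the strict-majority acceptance rule are assumptions made for illustration, since the slide only says that units are sent to 3-4 users to guard against errors and bad nodes.

```python
from collections import Counter

WORK_UNIT_SECONDS = 107   # from the slide


def split_into_work_units(total_seconds: int) -> list:
    """Return (start, end) second offsets of consecutive work units."""
    return [
        (start, min(start + WORK_UNIT_SECONDS, total_seconds))
        for start in range(0, total_seconds, WORK_UNIT_SECONDS)
    ]


def accept_result(results: list):
    """Hypothetical validation rule: accept a value only if a strict
    majority of the redundant replicas returned it, otherwise None."""
    value, votes = Counter(results).most_common(1)[0]
    return value if votes > len(results) / 2 else None


units = split_into_work_units(1000)

# Three clients processed the same unit; one of them is faulty.
good = accept_result(["signal", "signal", "noise"])

# No two clients agree, so the unit would have to be reissued.
undecided = accept_result(["a", "b", "c"])
```

Redundant distribution trades extra computation for protection against faulty or malicious clients, which matters when the coordinating server cannot trust individual nodes.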
Need for Grid – Blue Brain
Discussion Question: Computer or Infomachine?



The first computers were used primarily for
computations. One early use was calculating ballistic
tables for the U.S. Navy during World War II.
Today, computers are used more for sharing information
than computations— perhaps infomachine may be a
more accurate name than computer?
Distributed computation may be better suited to Grid
and peer-to-peer systems while information tends to be
hierarchical and may be better suited to client/server.
Current Peer-Peer Concerns

Topics listed in the IEEE 9th annual conference:
Dangers and Attacks on P2P









Poisoning (files with contents different to description)
Polluting (inserting bad packets into the files)
Defection (users use the service without sharing)
Insertion of viruses (attached to other files)
Malware (originally attached to the files)
Denial of Service (slow down or stop the network traffic)
Filtering (some networks don’t allow P2P traffic)
Identity attacks (tracking down users and disturbing them)
Spam (sending unsolicited information)
Where are we?







Introduction
Napster and its legacy –self study
Peer-to-Peer middleware –self study
Routing overlays
Overlay case studies: Pastry, Tapestry
Application case studies: Squirrel, OceanStore, Ivy
Summary
Discussion date: 6th January
First four and last two pages only
Reading Assignment
Discussion date: 10th February
(first 8 pages, conclusion and future work)
Reading
• Napster and its legacy
• Peer-to-Peer middleware