Transcript ppt - LIFL

Toward third Generation
Desktop Grids
(Private Virtual Cluster)
Ala Rezmerita, Franck Cappello
INRIA
Grand-Large.lri.fr
1
Agenda
•
•
•
•
•
•
Basic Concepts of DGrids
First and second generation Desktop Grids
Third generation concept
PVC
Early evaluation
Conclusion
2
Basic Concepts of
Desktop Grids
• Bag of tasks, master-worker, divide and
conquer applications
• Batch schedulers (clusters)
• Virtual machines (Java, .net)
• Standard OS (Windows, Linux, MaxOS X)
• Cycle stealing (Desktop PCs)
• Condor (single administration domain)
3
First Generation DG
• Single application / Single user
– SETI@HOME (1998)
• Research for Extra Terrestrial I
• 33.79 Teraflop/s (12.3 Teraflop/s
for the ASCI White!), 2003
– DECRYPTHON
• Protein Sequence comparison
– RSA-155 (1996?)
• Breaking encryption keys
– COSM
4
First Gen DG Architecture
Centralized architecture
Client application
Params. /results.
+
Monolythique architecture
User + Admin interface
Coordinator/
Resource Disc.
Parameters
Results
PC
Application
Scheduler
Task + Data + Net
OS + Sandbox
Protocols
PC
Firewall/NAT
5
Second Generation of DG
• Multi-applications / “Multi-users” platforms:
– BOINC (2002?)
• SETI@home, Genome@home, XtremLab…
– XTREMWEB (2001)
• XtremWeb-CH, GTRS, XW-HEP, etc..
– Platform (ActiveCluster), United devices,
Entropia, etc.
– Alchemi (.NET based)
6
Second Gen DG Architecture
Centralized architecture
(split tasks/data mgnt.,
Inter node com.)
Client
application
Params. /
results.
Results
Monolythique architecture
User + Admin interface
Application
Scheduler
Coordinator/
Scheduler (Tasks)
Parameters
+
PC
Task + Data + Net
OS + Sandbox
Protocols
Data Manager
Scheduler (Tasks)
Firewall/NAT
7
What we have learned
from the history
• Rigid architecture
–
–
–
–
•
•
•
•
Dedicated
Dedicated
Dedicated
Dedicated
(Open source: Yes, Modularity: No)
job scheduler
data management/file system
connection protocols
transport protocols
Centralized architecture
No direct communication
Almost no security
Restricted application domain (essentially
High Throughput Computing)
8
•
Third Generation
Modular architecture
Concept
– Pluggable / selectable job scheduler
– Pluggable / selectable data
management/file system
– Pluggable / selectable connection
protocols
– Pluggable / selectable transport
protocols
•
•
•
•
Decentralized architecture
Direct communications
Strong security
Unlimited application domain
(restrictions imposed only by the
platform performance)
PC
PC
PC
User + Admin interface
Applications
Schedulers
Task + Data + Net
OS + Sandbox
Protocols
9
3rd Gen Dgrid: design
PC
User + Admin interface
PC
Virtualization
at IP level!
Applications
PC
Apps: Binary codes
Scheduler/runtime
Condor, MPI
Data management
LUSTRE / Bittorrent
OS
XEN / Virtual PC
Connectivity / Security
IP Virtualization
Network
10
PVC (Private Virtual Cluster)
A generic framework turning dynamically a set of resources
belonging to different administration domains into a cluster
•Connectivity/Security
–Dynamically connects firewall protected nodes without
breaking the security rules of the local domains
•Compatibility
–Creates an execution environment for existing cluster
applications and tools (schedulers, file systems, etc.)
11
PVC Architecture
Peer's modules:
 Coordination
 Communication interposition
 Network virtualization
 Security
 Connectivity
Broker:
 Collects connection requests
 Forwards data between peers
 Realizes the virtualization
 Helps in the security negotiation
12
Virtualization
• Virtual network on top of the real one
• Uses a virtual network interface
• Provides virtual to real IP address translation
• Features an IP range and DNS
The proposed solution respects the IP standards
– Virtual IP: class E (240.0.0.1 – 255.255.255.254) of IP
addresses
– Virtual Names: use of a virtual domain name (.pvc)
13
Interposition
• Catch the applications’ communications and
transform them for transparent tunneling
• Interposition techniques:
–
–
–
–
LibC overload (IP masquerading, modified kernel)
Tun / Tap (encapsulation: IP over IP)
Netfilter (IP masquerading, kernel level)
Any other…
14
Interposition techniques
Application packet
ld_preload? (1)
Yes
U. Space
Security check +
Connection +
IP Masquerading
LibC’
No
Socket Interface
LibC
Kernel
Route Selection
240.X.X.X
Security check +
Connection +
Encapsulation
Tun/Tap (2)
Interposition modules
Netfilter
Security challenge +
Connection +
IP Masquerading (3)
Group check
Std network
interface
15
Connectivity
Goal: Direct connections between the peers
Firewall/NAT traversal techniques:
• UPnP - firewall configuration technique
• UDP/TCP hole punching - online gaming and voice over IP
• Traversing TCP – novel technique
• Any other…
16
Security
•
Fulfil the security policy of every local domain
•
Enforce a cross-domain security policy
B
PkM[PKC]
PKM
Master peer in every virtual cluster
•
Implements global security policy
•
Registers new hosts
PVC peer must:
C
M
Master
Put PKC
Get PKM
S
PK: Public Key
Pk: Private Key
•
Check the target peer’s membership to the same virtual cluster
•
After the connection establishment, authenticate target peer identity
17
Security
Security protocol is based on double asymmetric keys mechanism
(1)/(2) Membership to the same
virtual cluster
(3)/(4)/(5) Mutual authentication
The PVC security protocol ensures that :

Only hosts of a same virtual cluster are connected

Only trusted connections become visible for the application
18
Performance Evaluation
•
PVC objectives: Minimal overhead for communications
– Network performance with/without PVC
– Connection establishment
•
Execution of real applications without modification in MPI
– NAS benchmarks
– MPIPOV program
– Scientific application DOT
•
Bag of Tasks typical setting (BLAST on DNA database)
– Condor flocking strategies
– Broadcast protocol
– Spot Checking / Replication + voting
•
Evaluation platforms
– Grid’5000 (Grid eXplorer Cluster)
– DSL-Lab (PC @ home, connected on Public ADSL network)
19
Communication perf.
Connection overhead
Direct Communication overhead
20
Connection overhead
Performed on DSL-Lab platform using a specific test suite
Reasonable overhead in the context of the P2P applications
21
Bandwidth overhead
Technique: LibC overload
Performed on a local PC cluster with three different Ethernet (Netperf)
networks: 1Gbps, 100Mbps and 10Mbps
22
Communication overhead
Technique:
Tun / Tap
Netfilter
QuickTime™ et un
décompresseur TIFF (LZW)
sont requis pour visionner cette image.
QuickTime™ et un
décompresseur TIFF (LZW)
sont requis pour visionner cette image.
23
MPI applications
Applications :
• NAS benchmarks class A
(EP, FT, CG and BT)
• DOT
• MPIPOV
Results for NAS EP:
 Measured acceleration is almost linear
 Overhead lower than 5%
Other experiments: Losses of performances due to ADSL network
24
Typical configuration for Bag
of tasks (Seti@home like)
PC
Application
PC
Result certification
PC
Scheduler/runtime
Condor
Data management
Bittorrent
OS
Using PVC!
BLAST (DNA)
Connectivity / Virtualization
OS
PVC
Network
25
Broadcast protocol?
Question: Bittorent instead of Condor transport protocol?

BLAST application

DNA Database (3.2 GB)

64 nodes
Condor exec. time grows
Protportionnly with #jobs
Condor + Bittorent exec.
Time stays almost constant
26
Distribution of job management?
Question: how many job managers for a
set of nodes?

Condor Flocking

64 sequences of 10 BLAST jobs
(between 40 and 160 seconds, with an
average of 70 seconds)
QuickTime™ et un
décompresseur TIFF (non compressé)
sont requis pour visionner cette image.
QuickTime™ et un
décompresseur TIFF (LZW)
sont requis pour visionner cette image.
27
Result certification?
Question: how to detect bad results?

Spot Checking (black listing)

Replication + voting

Not implemented in Condor

20 lines of script both

Test: 70 machines

10% Saboteurs (randomly
choosen)

QuickTime™ et un
décompresseur TIFF (LZW)
sont requis pour visionner cette image.
How many jobs are required
to detect the 7 saboteurs?
28



Computational/Data Desktop Grids

BOINC/Xtremweb like applications

User selected scheduler (Condor, Torque, OAR, etc.)

Communication between workers (MPI, Distributed file systems, distributed
archive, etc.)
“Instant Grid”

Connecting resources “sans effort”: family Grid, Inter School Grids, etc.

Sharing resources and content. Example: Apple TV synchronized with a
remote Itune
“Passe muraille” runtime


Applications
OpenMPI
Extended Clusters

Run applications and manage resources beyond limits of admin domains
29
Conclusion
Third Generation Desktop Grids (2007)…
• Break the rigidity, again!
• Let users choose and run their favourite
QuickTime™ et un
décompresseur TIFF (non compressé)
sont requis pour visionner cette image.
environments (Engineers may help)
• PVC: connectivity + security + compatibility
– Dynamically establishes virtual clusters
– Modular, extensible architecture
– Features properties required for 3rd Gen. Desktop Grids
• Security model + Use of applications without any modification + With minimal
communication overhead
• On going work (on Grid’5000)
– Test the scalability and fault tolerance of cluster tools in the Dgrid Context
– Test more applications
– Test & improve the scalability of the security system
30
Questions?
31
Condor flocking with PVC
Question: May we use several Schedulers?

Use of a synthetic job that consumes resources (1 sec.)

Sequence of 1000 submissions

Submits the synthetic jobs from the same host to Condor pool
Min
Max
Average
Stdev

1 Master
2 Master
4 Master
00:00:27
00:33:49
00:17:09
00:09:38
00:00:06
00:33:29
00:16:48
00:09:39
00:00:13
00:33:34
00:16:54
00:09:38
16 Master 32 Master
00:00:05
00:31:13
00:16:42
00:09:30
00:00:05
00:33:27
00:16:46
00:09:39
64 Master
00:00:05
00:32:41
00:16:00
00:09:37
Future work: make Condor Flocking fault tolerant.
32