01_Grids_intro - Indico

Introduction to Grid computing and the EGEE project
Robert Lovas
MTA SZTAKI
[email protected]
www.lpds.sztaki.hu
What is a Grid?
● A Grid is a collection of computers, storage systems, special devices and services that can dynamically join and leave the Grid
● They are heterogeneous in every aspect
● They are geographically distributed and connected by a wide-area network
● They can be accessed on demand by a set of users
[Figure: Grid resources connected through the Internet]
Why use a Grid?
• A user has a complex problem that requires many services/resources in order to
  • reduce computation time
  • access large databases
  • access special equipment
  • collaborate with other users
[Figure: diagram labelled "Internet"]
Typical Grid application areas
• High-performance computing (HPC)
  • To achieve higher performance than individual supercomputers/clusters can provide
  • Requirement: parallel computing
• High-throughput computing (HTC)
  • To exploit the spare cycles of various computers connected by wide-area networks
• Collaborative work
  • Several users can jointly and remotely solve complex problems
Two players of the Grid
• Resource donors = D
• Resource users = U
• Relationship between the two characterizes the Grid:
  • if U ~ D => generic Grid model
  • if U >> D => utility Grid model
  • if U << D => desktop Grid model
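The classification by the balance of users and donors can be sketched in a few lines of Python. This is only an illustration: the function name `classify_grid_model` and the threshold `factor` are assumptions, since the slide gives no numeric cut-off for ">>" or "<<".

```python
def classify_grid_model(users: int, donors: int, factor: float = 10.0) -> str:
    """Map the user/donor balance onto the three models named on the slide.

    `factor` is an illustrative threshold for ">>" and "<<";
    the slide itself gives no numeric cut-off.
    """
    if users <= 0 or donors <= 0:
        raise ValueError("users and donors must be positive")
    if users >= factor * donors:
        return "utility Grid model"   # U >> D
    if donors >= factor * users:
        return "desktop Grid model"   # U << D
    return "generic Grid model"       # U ~ D


print(classify_grid_model(users=50, donors=40))
print(classify_grid_model(users=5000, donors=20))
print(classify_grid_model(users=3, donors=100000))
```

With the assumed factor of 10, the three calls fall into the generic, utility and desktop models respectively.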
Generic Grid model
[Figure: institutes Inst1 to Inst4 connected through the Internet, both donating free resources and requiring resources]
Characteristics of the generic Grid model
• A volunteer Grid: Anybody can donate resources
• Heterogeneous resources that dynamically join and leave
• Anybody (belonging to the donating institutes) can use the donated resources for running her/his own applications
• Symmetric relationship between donors and users: U ~ D
• Examples:
  • GT-2 grids
  • 1st version of UK NGS
• Problems:
  • Installing and maintaining client and server grid software is too complicated
  • Volunteer Grids are not robust and reliable
Desktop Grid model
[Figure: a company/university server hosting the application distributes work packages over the Internet to donors (company/university or private PCs); resource donation is dynamic]
Desktop Grid model – Master/slave parallelism
[Figure: the master on the DG server distributes Workunit-1, Workunit-2, Workunit-3, ..., Workunit-N over the Internet]
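The master/slave pattern on this slide can be illustrated with a minimal local sketch. It is not the SZDG or BOINC implementation: threads stand in for donor PCs, and `process` and `run_master` are hypothetical names chosen for the example. The master queues the work units, the workers pull them when they are free, and the master collects the results.

```python
import queue
import threading


def process(workunit: int) -> int:
    """Stand-in for the real computation done on a donor PC."""
    return workunit * workunit


def run_master(workunits, n_workers=4):
    """Master: queue the work units, let workers pull them, collect results."""
    todo = queue.Queue()
    for wu in workunits:
        todo.put(wu)

    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                wu = todo.get_nowait()
            except queue.Empty:
                return  # no work left: this worker stops
            value = process(wu)
            with lock:  # protect the shared result table
                results[wu] = value

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run_master(range(1, 9))
print(sorted(results.items()))
```

Because workers pull work rather than having it pushed to them, a slow or departing worker only delays the work units it has taken, which suits resources that join and leave dynamically.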
Characteristics of the desktop Grid model
• A volunteer Grid: Anybody can donate resources
• Heterogeneous resources that dynamically join and leave
• One or a small number of projects can use the resources
• Asymmetric relationship between donors and users: U << D
• Advantage:
  • Donating a PC is extremely easy
  • Setting up and maintaining a DG server is much easier than installing the server software of utility grids
Types of Desktop Grids
• Global Desktop Grid
  • Aim is to collect resources for grand-challenge scientific problems
  • Examples:
    • BOINC (SETI@home)
    • SZTAKI Desktop Grid (SZDG)
• Local Desktop Grid
  • Aim is to enable the quick and easy creation of a grid for any community (company, university, city, etc.) to run their own applications
  • Example:
    • Local SZDG
SETI: a global desktop grid
● SETI@home
● 3.8M users in 226 countries
● 1200 CPU years/day
● 38 TF sustained (Japanese Earth Simulator is 32 TF sustained)
● Highly heterogeneous: >77 different processor types
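To put the SETI@home figures in perspective, the short calculation below converts "1200 CPU years/day" into the number of CPUs that would have to run around the clock to deliver the same work, and compares the two sustained performance numbers. All inputs come from the slide; the only assumption is a year of 365.25 days.

```python
DAYS_PER_YEAR = 365.25            # assumption: average calendar year

cpu_years_per_day = 1200          # from the slide
sustained_tflops = 38             # from the slide
earth_simulator_tflops = 32       # from the slide

# One CPU running all day delivers 1/365.25 CPU years per day.
equivalent_cpus = cpu_years_per_day * DAYS_PER_YEAR
ratio = sustained_tflops / earth_simulator_tflops

print(f"Equivalent always-on CPUs: {equivalent_cpus:,.0f}")
print(f"SETI@home / Earth Simulator: {ratio:.2f}x")
```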
SZTAKI Desktop Grid global version
TOP 500 entry performance: 1645 GFlops
URLs: http://www.desktopgrid.hu/ and http://szdg.lpds.sztaki.hu/szdg/
SZTAKI Desktop Grid local version
• Main objective:
  • Enable the creation of local DG for any community
  • Demonstrate how to create such a system
• Building production Grids requires huge effort and represents a privilege for those organizations where high Grid expertise is available
Using the local SZDG package
• Any organization can build a local DG in a day with minimal effort and with minimal cost (a strong PC is enough as a server machine)
• The applications of the local community will be executed by the spare PC cycles of the local community
• There is no restriction on the PCs used: all the PCs of the organization can be exploited (heterogeneous Grid)
• You can download the local SZDG package from: http://www.desktopgrid.hu/
DSP application on a local SZDG at the Univ. of Westminster
• Digital Signal Processing application: designing optimal periodic nonuniform sampling sequences
• Currently more than 100 PCs connected from Westminster, with plans to extend to over 1000 PCs
The speedup
DSP size   Sequential    Production   SZDG
20         ~3h 33min     ~35min       ~1h 44min
22         ~41h 53min    ~7h 23min    ~5h 4min
24         ~724h         ~141h        ~46h 46min
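The speedup implied by these timings is the sequential run time divided by the Grid run time. The sketch below computes it for each DSP size. It assumes the size-24 row reads as Sequential ~724h, Production ~141h and SZDG ~46h 46min, which is the most plausible reading of the extracted slide; the helper names `minutes` and `speedup` are illustrative.

```python
def minutes(hours: int, mins: int = 0) -> int:
    """Convert an 'Xh Ymin' timing into minutes."""
    return hours * 60 + mins


# Timings from the slide (size-24 column assignment is an assumption).
timings = {
    20: {"sequential": minutes(3, 33), "production": minutes(0, 35), "szdg": minutes(1, 44)},
    22: {"sequential": minutes(41, 53), "production": minutes(7, 23), "szdg": minutes(5, 4)},
    24: {"sequential": minutes(724), "production": minutes(141), "szdg": minutes(46, 46)},
}


def speedup(row: dict, system: str) -> float:
    """Speedup = sequential time / time on the given system."""
    return row["sequential"] / row[system]


for size, row in timings.items():
    print(
        f"DSP size {size}: "
        f"production {speedup(row, 'production'):.1f}x, "
        f"SZDG {speedup(row, 'szdg'):.1f}x"
    )
```

Read this way, the desktop Grid speedup grows with problem size (roughly 2x, 8x and 15x), while the production Grid speedup stays between about 5x and 6x.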
Usage of local SZDG in industry
• AMRI Hungary Ltd.
  • Drug discovery application
  • Creating enterprise Grid for prediction of ADME/Tox parameters
  • Millions of molecules to test according to potential drug criteria
  • New FP6 EU Grid project: CancerGrid
• Hungarian Telecom
  • Creating enterprise Grid for supporting large data mining applications where single computer performance is not enough
• OMSZ (Hungarian Meteorology Service)
  • Creating enterprise Grid for climate modeling
Utility Grid model
[Figure: institutes Inst1 and Inst2 act as donors and users, donating free resources in static 24/7 mode; User 1 to User N reach them through the Internet with dynamic resource requirements]
Characteristics of the utility Grid model
• Semi-volunteer Grids: Donors must be "professional" resource providers who provide production service (24/7 mode)
• Typically homogeneous resources
• Anybody can use the donated resources for running her/his own applications
• Asymmetric relationship between donors and users: U >> D
• Examples:
  • EGEE -> SEE-Grid, BalticGrid, etc.
  • UK NGS current version, NorduGrid
  • OSG, TeraGrid
The largest production Grid: EGEE
[Figure: map of countries participating in EGEE]
Scale
> 180 sites in 39 countries
~ 20 000 CPUs
> 5 PB storage
> 10 000 concurrent jobs per day
> 60 Virtual Organisations
NorduGrid
Dynamic Grid
~ 33 sites, ~1400 CPUs
Production Grid
Real users, real applications
It is in 24/7 operation, unattended by administrators for most of the time
TeraGrid
[Figure: TeraGrid sites connected by the Extensible Backplane Network through the LA and Chicago hubs (40 Gb/s between the hubs, 30 Gb/s links to the sites)]
• Caltech: Data collection analysis (0.4 TF IA-64, IA32 Datawulf, 80 TB Storage)
• ANL: Visualization (1.25 TF IA-64, 96 Viz nodes, 20 TB Storage)
• SDSC: Data Intensive (4 TF IA-64, DB2 and Oracle Servers, 500 TB Disk Storage, 6 PB Tape Storage, 1.1 TF Power4)
• NCSA: Compute Intensive (10 TF IA-64, 128 large memory nodes, 230 TB Disk Storage, 3 PB Tape Storage, GPFS and data mining)
• PSC: Compute Intensive (6 TF EV68, 71 TB Storage, 0.3 TF EV7 shared-memory, 150 TB Storage Server; PSC integrated Q3 03)
Exploiting parallelism
● Single parallel application
  ● Single-site parallel execution
  ● Multi-site parallel execution
● Workflow branch parallelism
  ● Sequential components
  ● Parallel components
    ● Two-level single-site parallelism
    ● Two-level multi-site parallelism
● Parameter sweep (study) applications:
  ● The same application is executed with many (1000s) different parameter sets
  ● The application itself can be
    ● Sequential
    ● Single parallel
    ● Workflow
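A parameter sweep can be sketched locally as follows. This is an illustration rather than Grid middleware: a thread pool stands in for Grid resources, and `simulate` is a hypothetical sequential application with made-up parameters `alpha` and `steps`. The same function is run once per parameter set, and the runs are independent of each other.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def simulate(alpha: float, steps: int) -> float:
    """Hypothetical sequential application, run once per parameter set."""
    value = 0.0
    for _ in range(steps):
        value = alpha * value + 1.0
    return value


def parameter_sweep(alphas, step_counts, max_workers=4):
    """Run `simulate` for every (alpha, steps) combination."""
    parameter_sets = list(product(alphas, step_counts))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of parameter_sets
        outputs = list(pool.map(lambda p: simulate(*p), parameter_sets))
    return dict(zip(parameter_sets, outputs))


sweep = parameter_sweep(alphas=[0.0, 0.5], step_counts=[1, 2, 3])
for params, out in sweep.items():
    print(params, out)
```

Since no run depends on another, the parameter sets can be spread over as many resources as are available, which is why this class of application maps well onto both desktop and utility Grids.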
How to use a Grid for single-site parallelism?
[Figure: diagram labelled "Internet"]
How to use a Grid for multi-site parallelism?
[Figure: diagram labelled "Internet"]
How to use a Grid for two-level single-site parallelism?
[Figure: diagram labelled "Internet"]
How to use a Grid for two-level multi-site parallelism?
[Figure: diagram labelled "Internet"]
Master/slave parallelism and parametric studies in utility Grids
[Figure: the master distributes Workunit-1, Workunit-2, Workunit-3, ..., Workunit-N over the Internet]
How to use a Grid for HPC parameter study?
[Figure: diagram labelled "Internet"]
Typical Grid Applications
● Computation intensive
  ● Interactive simulation (climate modeling)
  ● Very large-scale simulation and analysis (galaxy formation, gravity waves, battlefield simulation)
  ● Engineering (parameter studies, linked component models)
● Data intensive
  ● Experimental data analysis (high-energy physics)
  ● Image and sensor analysis (astronomy, climate study, ecology)
● Distributed collaboration
  ● Online instrumentation (microscopes, x-ray devices, etc.)
  ● Remote visualization (climate studies, biology)
  ● Engineering (large-scale structural testing, chemical engineering)
● In all cases, the problems were big enough that they required people in several organizations to collaborate and share computing resources, data and instruments.
EGEE Applications
● >20 applications from 7 domains
  ● High Energy Physics
  ● Biomedicine
  ● Earth Sciences
  ● Computational Chemistry
  ● Astronomy
  ● Geo-Physics
  ● Financial Simulation
● Further applications in evaluation
Applications now moving from testing to routine and daily usage
An Example Problem tackled by EGEE
● The Large Hadron Collider (LHC) located at CERN, Geneva, Switzerland
● Scheduled to go into production in 2007
● Will generate 10 Petabytes (10^7 Gigabytes) of information per year
● This information must be processed and stored somewhere
● It is beyond the scope of a single institution to manage this problem -> VO is needed
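The scale of the LHC figure is easier to judge as a data rate. The calculation below checks that 10 Petabytes equals 10^7 Gigabytes and converts the yearly volume into an average rate. It assumes decimal units (1 PB = 10^15 bytes) and spreads the data evenly over a 365-day year, so the result is an average, not a peak rate.

```python
BYTES_PER_GB = 10**9              # decimal units (assumption)
BYTES_PER_PB = 10**15
SECONDS_PER_YEAR = 365 * 24 * 3600

yearly_bytes = 10 * BYTES_PER_PB  # 10 PB per year, from the slide

yearly_gigabytes = yearly_bytes // BYTES_PER_GB
average_mb_per_s = yearly_bytes / SECONDS_PER_YEAR / 10**6

print(f"10 PB = {yearly_gigabytes:.0e} GB")
print(f"Average rate over a full year: {average_mb_per_s:.0f} MB/s")
```

Under these assumptions the average comes to a little over 300 MB/s, sustained for the whole year, before any reprocessing or replication is counted.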
Virtual Organizations
• Distributed resources and people
• Linked by networks, crossing admin domains
• Sharing resources, common goals
• Dynamic
[Figure: resources (R) grouped into two virtual organizations, VO-A and VO-B]
Other EU Grid projects
Training and Education: ICEAGE
International Collaboration to Extend and Advance Grid Education
www.iceage-eu.org