Introduction to High Performance and Grid Computing
Enabling Grids for E-sciencE
Faculty of Sciences, University of Novi Sad
Feb. 06, 2009
Antun Balaz, [email protected]
Scientific Computing Laboratory
Institute of Physics Belgrade
Serbia
AEGIS (Academic and Educational Grid Initiative of Serbia)
EGEE-III INFSO-RI-222667
www.euegee.org

Overview
• Introduction to clusters
• High performance computing
• Grid computing paradigm
• Ingredients for Grid development
• Introduction to Grid middleware

Parallel computing
• Splitting a problem into smaller tasks that are executed concurrently
• Why?
– Absolute physical limits of hardware components (speed of light, electron speed, …)
– Economic reasons: more complex = more expensive
– Performance limits: doubling the clock frequency does not double the performance
– Large applications: demand too much memory & time
• Advantages: increased speed & optimized resource utilization
• Disadvantages: complex programming models, difficult development
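The idea of splitting a problem into smaller, concurrently executed tasks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`split_range`, `partial_sum`, `parallel_sum_of_squares`) and the sum-of-squares problem are hypothetical, and threads are used only to keep the sketch self-contained. CPU-bound production code would more typically use separate processes or MPI.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) chunks."""
    step, extra = divmod(n, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks get one additional element.
        stop = start + step + (1 if i < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks


def partial_sum(chunk):
    """One small task: sum of squares over its own chunk only."""
    start, stop = chunk
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    """Run the small tasks concurrently and combine the partial results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, split_range(n, workers)))


if __name__ == "__main__":
    print(parallel_sum_of_squares(1_000_000))
```

The extra code for splitting the work and combining the results is a small instance of the added development complexity listed among the disadvantages.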

Parallelism levels
• CPU
– Multiple CPUs
– Multiple CPU cores
– Threads (time sharing)
• Memory
– Shared
– Distributed
– Hybrid (virtual shared memory)
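The difference between the shared and distributed memory models can be hinted at with a small Python sketch. It is an analogy, not a real distributed system: both functions run threads inside one process, and the names `shared_memory_sum` and `message_passing_sum` are hypothetical. The first lets all workers update one shared variable under a lock; the second gives each worker private data and allows communication only through explicit messages, which is the style imposed by distributed memory.

```python
import queue
import threading


def shared_memory_sum(values, workers=2):
    """Shared-memory style: all workers update one variable, guarded by a lock."""
    total = {"value": 0}
    lock = threading.Lock()

    def work(chunk):
        local = sum(chunk)       # compute on the worker's own slice first
        with lock:               # then update the shared state safely
            total["value"] += local

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum(values, workers=2):
    """Distributed-memory style: workers share nothing and send results as messages."""
    inbox = queue.Queue()

    def work(chunk):
        inbox.put(sum(chunk))    # the only communication is an explicit message

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Collect exactly one message per worker and combine them.
    return sum(inbox.get() for _ in range(workers))
```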

Parallel architectures (1)
• Vector machines
– CPU processes multiple data sets
– shared memory
– advantages: performance, ease of programming
– issues: scalability, price
– examples: Cray SV, NEC SX, Athlon/3DNow!, Pentium IV/SSE/SSE2
• Massively parallel processors (MPP)
– large number of CPUs
– distributed memory
– advantages: scalability, price
– issues: performance, programming difficulties
– examples: Connection Machine CM-1 and CM-2, GAAP (Geometric Array Parallel Processor)

Parallel architectures (2)
• Symmetric Multiprocessing (SMP)
– two or more processors
– shared memory
– advantages: price, performance, ease of programming
– issues: scalability
– examples: UltraSPARC II, Alpha ES, Generic Itanium, Opteron, Xeon, …
• Non-Uniform Memory Access (NUMA)
– solves SMP’s scalability issue
– hybrid memory model
– advantages: scalability
– issues: price, performance, programming difficulties
– examples: SGI Origin/Altix, Alpha GS, HP Superdome

Clusters
• Poor man’s supercomputer: “…Collection of interconnected stand-alone computers working together as a single, integrated computing resource” (R. Buyya)
• Cluster consists of:
– Nodes
– Network
– OS
– Cluster middleware
• Standard components
– Avoiding expensive proprietary components

Cluster classification
• High performance clusters (HPC)
– Parallel, tightly coupled applications
• High throughput clusters (HTC)
– Large number of independent tasks
• High availability clusters (HA)
– Mission critical applications
• Load balancing clusters
– Web servers, mail servers, …
• Hybrid clusters
– Example: HPC+HA

Beowulf clusters
• 1994
– T. Sterling & D. Becker
– NASA Goddard Space Flight Center
• Frontend
– Access machine
– Job management system (JMS) & monitoring server
– Shared storage: NFS (directory /home)
• Nodes
– Multiple private networks
– Local storage (/scratch)
• Private networks
– High speed / low latency

From clusters to Grids
• Many distributed computing resources (clusters) exist, even in Serbia
• Problem 1: end users cannot use them transparently
• Problem 2: even when users are granted access to several clusters, they tend to neglect the smaller ones
• Problem 3: distribution of input/output data, sharing of data between clusters
• To overcome such problems, the Grid paradigm was introduced

Unifying concept: Grid
Resource sharing and coordinated problem
solving in dynamic, multi-institutional virtual
organizations.

Effective policy governing access within a collaboration

What problems Grid addresses
• Too hard to keep track of authentication data
(ID/password) across institutions
• Too hard to monitor system and application status
across institutions
• Too many ways to submit jobs
• Too many ways to store & access files/data
• Too many ways to keep track of data
• Too easy to leave “dangling” resources lying around
(robustness)

Requirements
• Security
• Monitoring/Discovery
• Computing/Processing Power
• Moving and Managing Data
• Managing Systems
• System Packaging/Distribution
• Secure, reliable, on-demand access to data, software, people, and other resources (ideally all via a Web Browser!)

Why Grid security is hard (1)
• Resources being used may be valuable & the
problems being solved sensitive
– Both users and resources need to be careful
• Dynamic formation and management of user
groups
– Large, dynamic, unpredictable…
• Resources and users are often located in distinct
administrative domains
– Cannot assume cross-organizational trust agreements
– Different mechanisms & credentials

Why Grid security is hard (2)
• Interactions are not just client/server,
but service-to-service on behalf of user
– Requires delegation of rights from user to service
– Services may be dynamically instantiated
• Standardization of interfaces to allow for discovery,
negotiation and use
• Implementation must be broadly available & applicable
– Standard, well-tested, well-understood protocols; integrated with
wide variety of tools
• Policies from sites, user communities and users need to be combined
– Varying formats
• Want to hide as much as possible from applications!

Grids and VOs (1)
• Virtual organizations (VOs) are groups of Grid users (authenticated through digital certificates)
• The VO Management Service (VOMS) serves as a central repository for user authorization information, providing support for sorting users into a general group hierarchy, keeping track of their roles, etc.
• The VO Manager, according to VO policies and rules, authorizes authenticated users to become VO members

Grids and VOs (2)
• Resource centers (RCs) may support one or more VOs,
and this is how users are authorized to use computing,
storage and other Grid resources
• VOMS allows a flexible approach to authentication and authorization (A&A) on the Grid
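The relationship between VO membership and resource centers can be illustrated with a toy data model. This is a sketch under stated assumptions, not the VOMS API: the classes `VOMembership` and `ResourceCenter`, the function `is_authorized`, and the VO name `vo.example` are all hypothetical, and real authorization relies on certificates and signed attributes that are omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class VOMembership:
    """Hypothetical record of what a VO knows about one authenticated member."""
    vo: str
    groups: set = field(default_factory=set)
    roles: set = field(default_factory=set)


@dataclass
class ResourceCenter:
    """Hypothetical resource center (RC) that supports a set of VOs."""
    name: str
    supported_vos: set


def is_authorized(membership, rc, required_role=None):
    """A user may use the RC if it supports the user's VO (and role, if one is required)."""
    if membership.vo not in rc.supported_vos:
        return False
    return required_role is None or required_role in membership.roles


if __name__ == "__main__":
    member = VOMembership("vo.example", groups={"/vo.example/physics"}, roles={"user"})
    rc = ResourceCenter("RC-A", supported_vos={"vo.example"})
    print(is_authorized(member, rc))
```

The point of the sketch is that a resource center decides per VO rather than per individual user, which is what keeps authorization manageable as the number of users grows.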

User view of the Grid
[Figure: two User Interface machines connected to Grid services]

Ingredients for Grid development
• Right balance of push and pull factors is needed
• Supply side
– Technology: inexpensive HPC resources (Linux clusters)
– Technology: network infrastructure
– Financing: domestic, regional, EU, donations from industry
• Demand side
– Need for novel eScience applications
– Hunger for number crunching power and storage capacity

Supply side - clusters
• The cheapest supercomputers: massively parallel PC clusters
• This is possible due to:
– Increase in PC processor speed (> Gflop/s)
– Increase in networking performance (1 Gb/s)
– Availability of stable OS (e.g. Linux)
– Availability of standard parallel libraries (e.g. MPI)
• Advantages:
– Widespread choice of components/vendors, low price (by factor ~5-10)
– Long warranty periods, easy servicing
– Simple upgrade path
• Disadvantages:
– Good knowledge of parallel programming is required
– Hardware needs to be adjusted to the specific application (network topology)
– More complex administration
• Tradeoff: brain power vs. purchasing power
• The next step is GRID:
– Distributed computing, computing on demand
– Should “do for computing the same as the Internet did for information” (UK PM, 2002)

Supply side - network
• Needed at all scales:
– World-wide
– Pan-European (GÉANT2)
– Regional (SEEREN2, …)
– National (NREN)
– Campus-wide (WAN)
– Building-wide (LAN)
• Remember: it is the end-user-to-end-user connection that matters

GÉANT2 Pan-European IP R&E network

GÉANT2 Global Connectivity

Future development: regional network
[Figure: map of the planned regional network, with nodes in cities from Budapest and Bucharest in the north to Athens, Rhodos and Iraklio in the south, including Novi Sad, Belgrade, Nis, Sarajevo, Sofia, Skopje, Tirana and Thessaloniki]

Supply side - financing
• National funding (Ministries responsible for research)
– Lobby government to commit to Lisbon targets
– Level of financing should follow an increasing trend (as a % of GDP)
– Seek financing for clusters and network costs
• Bilateral projects and donations
• Regional initiatives
– Networking (HIPERB)
– Action Plan for R&D in SEE
• EU funding
– FP6: IST priority, eInfrastructures & GRIDs
– FP7
– CARDS
• Other international sources (NATO, …)
• Donations from industry (HP, SUN, …)

Demand side - eScience
• Usage of computers in science:
– Trivial: text editing, elementary visualization, elementary quadrature, special functions, ...
– Nontrivial: differential equations, large linear systems, searching combinatorial spaces, symbolic algebraic manipulations, statistical data analysis, visualization, ...
– Advanced: stochastic simulations, risk assessment in complex systems, dynamics of systems with many degrees of freedom, PDE solving, calculation of partition functions/functional integrals, ...
• Why is the use of computation in science growing?
– Computational resources are more and more powerful and available (Moore’s law)
– Standard approaches are having problems: experiments are more costly, theory more difficult
– Emergence of new fields/consumers: finance, economy, biology, sociology
• Emergence of new problems with unprecedented storage and/or processor requirements

Demand side - consumers
• Those who study:
– Complex discrete time phenomena
– Nontrivial combinatorial spaces
– Classical many-body systems
– Stress/strain analysis, crack propagation
– Schrödinger eq.; diffusion eq.
– Navier-Stokes eq. and its derivatives
– Functional integrals
– Decision making processes with incomplete information
– …
• Who can deliver? Those with:
– Adequate training in mathematics/informatics
– Stamina needed for solving complex problems
• Answer: rocket scientists (natural sciences and engineering)

Scenario
[Figure: job submission scenario. Labels: “User interface”, Submit, Input “sandbox”, Job Submit Event, Logging and bookkeeping, Status / log query, Job status update, publish state, Computing Element, Get output, Output “sandbox”]
• A worker node is allocated by the local jobmanager; the example job is /bin/hostname
• The standard input stream is read from a file
• The standard output and error streams are redirected into files (stdout.txt, stderr.txt)
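What happens on the worker node in this scenario can be approximated locally with a short Python sketch. It is an illustration only and does not use any Grid middleware: the function `run_job` and the temporary directory standing in for the sandbox are assumptions, while the file names stdout.txt and stderr.txt and the /bin/hostname executable come from the slide.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_job(command, workdir, stdin_file=None):
    """Run `command` in `workdir`, writing its streams to stdout.txt and stderr.txt."""
    workdir = Path(workdir)
    out_path = workdir / "stdout.txt"
    err_path = workdir / "stderr.txt"
    # Standard input is read from a file if one is given, otherwise left empty.
    stdin = open(stdin_file, "rb") if stdin_file else subprocess.DEVNULL
    try:
        with open(out_path, "wb") as out, open(err_path, "wb") as err:
            result = subprocess.run(command, cwd=workdir, stdin=stdin,
                                    stdout=out, stderr=err)
    finally:
        if stdin_file:
            stdin.close()
    # The two files play the role of the output "sandbox" returned to the user.
    return result.returncode, [out_path, err_path]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as sandbox:
        code, files = run_job(["/bin/hostname"], sandbox)
        print(files[0].read_text().strip())
        sys.exit(code)
```

On a real Grid the same redirection is requested declaratively in the job description, and the resulting files travel back to the user interface in the output sandbox.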