Transparency - Agenda Catania

Enabling Grids for E-sciencE
Introduction to Grid Computing
and EGEE
Fabio Scibilia
INFN Catania
Catania, 08.02.2006
www.eu-egee.org
INFSO-RI-508833
Fundamentals of Grid Computing
EGEE Tutorial, Roma, 02.11.2005
Grid Idea By A Simple Analogy
• Some power stations dispersed everywhere produce the electrical power
• The produced power is distributed over a power network
• One consumer wants access to that power
• He/she comes to an agreement with the electricity company
• The electricity company provides a new socket into which the user can plug
• Now the user is able to access the power grid
• The user:
– Does not need to know anything about what lies beyond the socket
– Can draw all the power he/she wants, according to the agreement
• The electricity company:
– Can modify production technologies at any moment
– Manages the power network as it wants
– Defines the terms and conditions of the agreement
In the same way . . .
• Some computing farms produce the computing power
• Computing power is made available over the Internet
• One user wants access to intensive computational power
• He/she comes to an agreement with some organization that offers grid services
• The organization provides grid facilities, allowing the user to access its grid resources, and provides the proper tools
• Now the user accesses grid facilities as a grid user
• The user:
– Does not need to know what lies beyond his/her user interface
– Can access a massive amount of computational power through a simple terminal
• The organization:
– Can extend grid facilities at any moment
– Manages the architecture of the grid
– Defines policies and rules for accessing grid resources
What about Grid Computing
The Grid Computing paradigm is an emerging way of thinking about distributed environments as a global-scale infrastructure to:
• Share data
• Distribute computation
• Coordinate work
• Access remote instrumentation
[Figure: a user connected through the Grid infrastructure to storage systems, computational power and instruments]
Why Computing Grids now?
• Because the amount of computational power needed by many applications is becoming huge
– Thousands of CPUs working at the same time on the same task
• Because the amount of data requires massive and complex distributed storage systems
– From hundreds of gigabytes to petabytes (10^15 bytes) produced by the same application
• To make the cooperation of people and resources belonging to different organizations easier
– People of several organizations working together to achieve a common goal
• To access particular instrumentation that is not easily reachable in a different way
– Because it cannot be moved or replicated, or it is too expensive
• Because it is the next step in the evolution of distributed computation
– To create a marketplace of computational power and storage over the Internet
Who is interested in Grids?
• The research community, to obtain important results from experiments that involve many people and massive amounts of resources
• Enterprises, which can run huge computations without the need to extend their current IT infrastructure
• Businesses, which can provide computational power and data storage under contract or for rental
Properties of Grids
• Transparency
– The complexity of the Grid architecture is hidden from the final user
– The user must be able to use a Grid as if it were a single virtual supercomputer
– Resources must be accessible regardless of their location
• Openness
– Each subcomponent of the Grid is accessible independently of the other components
• Heterogeneity
– Grids are composed of many different resources
• Scalability
– Resources can be added to and removed from the Grid dynamically
• Fault Tolerance
– Grids must be able to work even if a component fails or a system crashes
• Concurrency
– Different processes on different nodes must be able to work at the same time
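The fault tolerance property can be illustrated with a minimal failover sketch: if the node running a task fails, the task is simply tried on the next node. The function name `run_with_failover`, the node names and the use of `RuntimeError` to signal a node failure are all assumptions made for illustration, not part of any Grid middleware.

```python
from typing import Callable, Iterable, Optional


def run_with_failover(task: Callable[[str], str], nodes: Iterable[str]) -> Optional[str]:
    """Try the task on each node in turn; return the first successful result."""
    for node in nodes:
        try:
            return task(node)
        except RuntimeError:
            # This node failed: move on to the next one instead of giving up.
            continue
    # Every node failed.
    return None


if __name__ == "__main__":
    broken = {"node-1"}  # hypothetical node that is currently down

    def task(node: str) -> str:
        if node in broken:
            raise RuntimeError(f"{node} is down")
        return f"done on {node}"

    print(run_with_failover(task, ["node-1", "node-2", "node-3"]))
```

From the caller's point of view the failure of `node-1` is invisible, which is also a small instance of the transparency property.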
Challenging Issues in Grids (i)
• Security
– Authentication and authorization of users
– Confidentiality and non-repudiation
• Information Services
– To discover and monitor Grid resources
– To check the health status of resources
– As a basis for decision-making processes
• File Management
– Creation, modification and deletion of files
– Replication of files to improve access performance
– Ability to access files without the need to move them locally to the code
• Administration
– Systems to administer Grid resources while respecting local administration policies
Challenging Issues in Grids (ii)
• Resource Brokering
– To schedule tasks across different resources
– To make optimal or suboptimal decisions
– To reserve (in the future) resources and network bandwidth
• Naming Services
– To name resources in an unambiguous way within the Grid scope
• Friendly User Interfaces
– Because most Grid users are not computer scientists (physicians, chemists . . .)
– Graphical User Interfaces (GUIs)
– Grid Portals (very similar to classical Web Portals)
– Command Line Interfaces (CLIs) for experts
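The resource brokering bullets can be illustrated with a minimal matchmaking sketch. It assumes that each resource advertises its number of free CPUs and an estimated queue wait. The `Resource` and `Job` classes, their fields and the greedy ranking rule are hypothetical, chosen only for illustration. This is not the algorithm of any particular broker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    # Hypothetical description of a computing resource as seen by a broker.
    name: str
    free_cpus: int
    queue_wait_s: float  # estimated waiting time before a job starts


@dataclass
class Job:
    # Hypothetical job request: only the number of CPUs it needs.
    name: str
    cpus: int


def match(job: Job, resources: list[Resource]) -> Optional[Resource]:
    """Return the eligible resource with the shortest estimated wait, or None."""
    eligible = [r for r in resources if r.free_cpus >= job.cpus]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r.queue_wait_s)


def schedule(jobs: list[Job], resources: list[Resource]) -> dict[str, Optional[str]]:
    """Greedily assign jobs in order, reserving CPUs on the chosen resource."""
    plan: dict[str, Optional[str]] = {}
    for job in jobs:
        chosen = match(job, resources)
        if chosen is not None:
            chosen.free_cpus -= job.cpus  # reserve the CPUs for this job
        plan[job.name] = chosen.name if chosen else None
    return plan


if __name__ == "__main__":
    farm = [Resource("site-a", 4, 120.0), Resource("site-b", 16, 600.0)]
    jobs = [Job("j1", 2), Job("j2", 8), Job("j3", 4)]
    print(schedule(jobs, farm))
```

A greedy rule like this one is a "suboptimal decision" in the sense of the slide: it is cheap to compute but does not look ahead at the jobs still waiting.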
Virtual Organizations (VOs)
• A Virtual Organization is a collection of people and resources that work in a coordinated way to achieve a common goal
• To use Grid facilities, any user MUST subscribe to a Virtual Organization as a member
• Each person or resource can be a member of several VOs at the same time
• Each VO can contain people or resources belonging to different administrative domains
[Figure: VOs spanning the University of Catania, the Italian institute of Particle Physics and the Italian CNR, connected over Garr-B]
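The membership rules for Virtual Organizations can be modelled with a small sketch. The `VirtualOrganization` class, the `can_use` helper and all VO, user and host names are hypothetical. Real VO management services are considerably richer.

```python
class VirtualOrganization:
    """Hypothetical model of a VO: a named set of members and resources."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.members: set[str] = set()
        self.resources: set[str] = set()


def can_use(user: str, resource: str, vos: list[VirtualOrganization]) -> bool:
    """A user may use a resource only through a VO that contains both."""
    return any(user in vo.members and resource in vo.resources for vo in vos)


if __name__ == "__main__":
    physics = VirtualOrganization("physics-vo")
    biomed = VirtualOrganization("biomed-vo")

    # One person can be a member of several VOs at the same time.
    physics.members.update({"alice", "bob"})
    biomed.members.add("alice")

    # Resources from different administrative domains can join the same VO.
    physics.resources.update({"cluster.unict.example", "cluster.infn.example"})
    biomed.resources.add("storage.cnr.example")

    vos = [physics, biomed]
    for user in ("alice", "bob", "carol"):
        print(user, can_use(user, "storage.cnr.example", vos))
```

A user who belongs to no VO (here `carol`) cannot reach any resource, which mirrors the rule that subscribing to a VO is mandatory.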
Virtual Laboratory
• A new way of cooperating in experiments
• A platform that allows scientists to work together in the same “Virtual” Laboratory
• Closely related to Grids and Virtual Organizations
[Figure: devices, people, instruments, data and computing resources connected through the Grid infrastructure]
Globus Alliance
• The Globus Alliance
– Is a community of people and organizations involved in the design and development of Grid technologies
– University of Illinois, Argonne National Laboratory, University of Edinburgh, EPCC, etc.
• The Globus Toolkit (GT)
– It is a de facto standard
– It is a bag of services
– At its fourth release (GT4)
– Now adopts Web Services interfaces
• The Global Grid Forum
– It is a forum of grid researchers
– Works to define standards and protocols for grid technologies
– It is divided into Working Groups (WGs)
– http://www.ggf.org
Globus Services
[Figure: Globus Toolkit components, divided into WS Components and Non-WS Components and grouped by area]
• Security
– Delegation
– Authentication Authorization
– Pre-WS Authentication Authorization
– Credential Management
• Data Management
– Data Replication
– OGSA-DAI
– Reliable File Transfer
– GridFTP
– Replica Location
• Execution Management
– Community Scheduler Framework
– Grid Telecontrol Protocol
– Workspace Management
– Grid Resource Allocation & Management
– Pre-WS Grid Resource Allocation & Management
• Information Services
– WebMDS
– Index
– Trigger
– Monitoring & Discovery (MDS2)
• Common Runtime
– Python WS Core
– C WS Core
– Java WS Core
– C Common Libraries
– eXtensible IO (XIO)
Legend:
– Core GT Component: public interfaces frozen between incremental releases; best effort support
– Contribution/Tech Preview: public interfaces may change between incremental releases
– Deprecated Component: not supported; will be dropped in a future release
Hourglass Reference Model
• Fabric
– Manages resources locally
• Connectivity
– Network communications (IP, DNS etc.)
– Security: authentication, authorization, certification
– Single Sign On
• Resource
– Allocation, reservation and monitoring of resources
– Data access and transport
– Gathering of information on resources
• Collective
– View of services as collections
– Discovery and allocation
– Replica and catalogue of data
– Management of workflow
• Application
– User applications
– Tools and interfaces
[Figure: layer stack, from bottom to top: Fabric, Connectivity, Resource, Collective, Application]
An Example: SETI@Home
• The SETI@Home project
– Searches for Extra Terrestrial Intelligence (SETI)
▪ Collecting samples of microwaves coming from the Universe through a telescope
▪ Scheduling tasks spread over Grid nodes to analyse these samples
– Uses desktop computers as Grid nodes
– Working nodes are dynamically added to and removed from the grid
– The owner of the desktop machine decides how to contribute to the project by offering its computational power
• To contribute to the project
– http://setiathome.berkeley.edu/
– Download and install the client
– Your machine will work as a Grid node when it is idle (in place of your screensaver)
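The idea of handing work to whichever desktop machines happen to be idle can be sketched as follows. This is a toy model under stated assumptions, not how the actual SETI@Home client or server works. The `distribute` function, the work-unit identifiers and the node names are hypothetical.

```python
from collections import deque


def distribute(work_units, idle_by_round):
    """Hand out work units to whichever nodes are idle in each round.

    work_units: iterable of work-unit identifiers.
    idle_by_round: list of lists; each inner list names the nodes that are
    idle (and therefore available) during that round.
    Returns (assignments, leftover), where assignments maps each processed
    work unit to the node that took it.
    """
    pending = deque(work_units)
    assignments = {}
    for idle_nodes in idle_by_round:
        for node in idle_nodes:
            if not pending:
                return assignments, []
            assignments[pending.popleft()] = node
    return assignments, list(pending)


if __name__ == "__main__":
    units = ["wu-1", "wu-2", "wu-3", "wu-4"]
    # Nodes come and go: pc-b is busy in round 2, pc-c joins in round 2.
    rounds = [["pc-a", "pc-b"], ["pc-a", "pc-c"]]
    done, left = distribute(units, rounds)
    print(done)
    print(left)
```

Because the scheduler only looks at the nodes that are idle in each round, machines can join or leave between rounds without any change to the scheduling logic.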
Application Areas (i)
• Physical Science Applications
– GriPhyN, http://www.griphyn.org/
– Particle Physics DataGrid (PPDG), http://grid.fnal.gov/ppdg/
– GridPP, http://www.gridpp.ac.uk/
– AstroGrid, http://www.astrogrid.org/
• Life Science Applications
– Protein Data Bank (PDB), http://www.rcsb.org/pdb/Welcome.do
– Biomedical Informatics Research Network (BIRN), http://www.nbirn.net/
– Telemicroscopy, http://ncmir.ucsd.edu/
– myGrid, http://www.mygrid.org.uk/
Application Areas (ii)
• Engineering Oriented Applications
– NASA Information Power Grid (IPG), http://www.ipg.nasa.gov/
– Grid Enabled Optimization and Design Search for Engineering
(GEODISE), http://www.geodise.org/
• Commercial Applications
– Butterfly Grid, http://www.butterfly.net/
– Everquest, http://www.everquest.com/
• E-Utility
– ClimatePrediction experiment, http://www.climateprediction.net/
EGEE Project
EGEE Partners
• CERN
• Central Europe, including Austria, Czech Republic, Hungary, Poland, Slovakia and Slovenia
• France
• Germany and Switzerland
• Ireland and the United Kingdom
• Italy
• Northern Europe, including Belgium, Denmark, Finland, The Netherlands, Norway and Sweden
• Russia
• South-East Europe, including Bulgaria, Cyprus, Greece, Israel and Romania
• South-West Europe, including Portugal and Spain
• NRENs (National Research and Education Networks)
• United States
The largest e-Infrastructure: EGEE
• Objectives
– consistent, robust and secure service grid infrastructure
– improving and maintaining the middleware
– attracting new resources and users from industry as well as science
• Structure
– 71 leading institutions in 27 countries, federated in regional Grids
– leveraging national and regional grid activities worldwide
– funded by the EU with ~32 M Euros for the first 2 years, starting 1st April 2004
EGEE Activities
• 48 % service activities (Grid Operations, Support and Management, Network Resource Provision)
• 24 % middleware re-engineering (Quality Assurance, Security, Network Services Development)
• 28 % networking (Management, Dissemination and Outreach, User Training and Education, Application Identification and Support, Policy and International Cooperation)
Emphasis in EGEE is on operating a production grid and supporting the end-users
EGEE
• Enabling Grids for E-sciencE (EGEE) in Europe
– Funded by the European Union (EU)
– Involves 26 countries and more than 70 institutions
• EGEE infrastructure
– Runs over the GEANT European communication network
– LHC Computing Grid (LCG) middleware
– Moving towards the complete adoption of the new gLite middleware
[Figure: middleware evolution: LCG-1 and LCG-2 (Globus 2 based), then gLite-1 and gLite-2 (Web services based)]
Large Hadron Collider
• It is a particle accelerator built in Geneva
• The biggest instrument ever built
• Data is collected in a few places of the LHC and distributed across many computing sites
[Figure: the LHC site, with labels for Mont Blanc (4810 m) and downtown Geneva]
The LHC Experiments
• Large Hadron Collider (LHC):
– four experiments:
▪ ALICE
▪ ATLAS
▪ CMS
▪ LHCb
– 27 km tunnel
– Start-up in 2007
The LHC Experiments
[Figure: the ATLAS, CMS and LHCb detectors]
• ~10-15 petabytes/year
• ~10^8 events/year
• ~10^3 batch and interactive users
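To get a feeling for what 10 to 15 petabytes per year means, the yearly volume can be converted into an average sustained data rate. The short calculation below assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes), a 365-day year and a perfectly uniform data flow. The function name is hypothetical, and real data taking is far from uniform, so this is only an order-of-magnitude estimate of a few hundred MB/s.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # non-leap year
PETABYTE = 10**15  # bytes, decimal definition (assumption)
MEGABYTE = 10**6   # bytes, decimal definition (assumption)


def average_rate_mb_per_s(petabytes_per_year: float) -> float:
    """Average sustained data rate in MB/s for a given yearly volume."""
    return petabytes_per_year * PETABYTE / SECONDS_PER_YEAR / MEGABYTE


if __name__ == "__main__":
    for pb in (10, 15):
        rate = average_rate_mb_per_s(pb)
        print(f"{pb} PB/year is about {rate:.0f} MB/s on average")
```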
Grid monitoring
• Operation of Production Service: real-time display of grid operations
• Accounting Information
• Selection of Monitoring tools:
– GIIS Monitor + Monitor Graphs
– Sites Functional Tests
– GOC Data Base
– Scheduled Downtimes
– Live Job Monitor
– GridIce – VO + Fabric View
– Certificate Lifetime Monitor
BioMed Overview
• Infrastructure
– ~3000 CPUs
– ~12 TB of disk
– in 9 countries
• >50 users in 7 countries working with 12 applications
• 18 research labs
[Figure: BIOMED resources: 15 resource centres, 17 CEs, 16 SEs; PADOVA and BARI labelled]
[Figure: BIOMED number of jobs per month, 2004-09 to 2005-03; vertical axis from 0 to 25000]
Biomed Virtual Organisation
• ~70 users, 9 countries
• >12 applications (medical image processing, bioinformatics)
• ~3000 CPUs, ~12 TB disk space
• ~100 CPU years, ~500K jobs in the last 6 months
[Figure: BIOMED jobs distribution per month, 2005-01 to 2005-08: number of jobs (registered, successful, cancelled, aborted; axis from 0 to 120000) and run duration estimate in years (axis from 0 to 60)]
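The figures of roughly 100 CPU years and 500K jobs in 6 months imply a typical job length and an average number of busy CPUs. The sketch below works this out under simple assumptions: a 365-day year, every job counted equally, and a uniform load over the 6 months. The function names are hypothetical. With these inputs the average job comes out at somewhat under two CPU hours, consistent with a workload made of many short jobs.

```python
HOURS_PER_YEAR = 365 * 24  # non-leap year (assumption)


def average_job_hours(cpu_years: float, jobs: int) -> float:
    """Average CPU time per job, in hours."""
    return cpu_years * HOURS_PER_YEAR / jobs


def average_busy_cpus(cpu_years: float, period_years: float) -> float:
    """Average number of CPUs kept busy over the period, assuming uniform load."""
    return cpu_years / period_years


if __name__ == "__main__":
    hours = average_job_hours(cpu_years=100, jobs=500_000)
    busy = average_busy_cpus(cpu_years=100, period_years=0.5)
    print(f"average job length: about {hours:.2f} CPU hours")
    print(f"average busy CPUs: about {busy:.0f}")
```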
Bioinformatics
• GPS@: Grid Protein Sequence Analysis
– Gridified version of the NPSA web portal
▪ Offering protein databases and sequence analysis algorithms to bioinformaticians (3000 hits per day)
▪ Need for large databases and a large number of short jobs
– Objective: increased computing power
– Status: 9 bioinformatics software packages gridified
– Grid added value: open to a wider community with larger bioinformatic computations
• xmipp_MLrefine
– 3D structure analysis of macromolecules
▪ From (very noisy) electron microscopy images
▪ Maximum likelihood approach to find the optimal model
– Objective: study molecule interaction and chemical properties
– Status: algorithm being optimised and ported to 3D
– Grid added value: parallel computation of independent jobs on different resources
Contacts
• EGEE Website
http://www.eu-egee.org
• How to join
http://public.eu-egee.org/join/
• How to test
https://gilda.ct.infn.it
• EGEE Project Office
[email protected]