Enabling Grids for E-sciencE

European Grid Initiative
A tool for international collaboration
Guy Wormser
Director of CNRS Institut des Grilles (CNRS, France)
www.eu-egee.org
EGEE-II INFSO-RI-031688
EGEE and gLite are registered trademarks
Analogy with the Electricity Power Grid
(Diagram: electricity grid with power stations, a distribution infrastructure and a 'standard interface'.)
EGEE: Enabling Grids for E-sciencE
• Goal
– Create a general European production-quality Grid infrastructure on top of present and future EU RN infrastructure
• Build on
– Major investment in Grid technology by the EU and the EU member states
– Several pioneering prototype results
– The largest Grid development team in the world
• The goal can be achieved for about €100m over 4 years, on top of the national and regional initiatives
• Approach
– Leverage current and planned national and regional Grid programmes (e.g. LCG)
– Work closely with relevant industrial Grid developers, NRNs and the US
(Diagram: applications running on EGEE over the Geant network.)
Grid: Resource Sharing
• Share more than information
– Data, computing power, applications
• Middleware handles everything
(Diagram comparing a single computer with the Grid. On a single computer, programs (Word/Excel, games, email/web, your program) run on the operating system, which manages disks, CPU etc. On the Grid, your program runs through the middleware (user interface machine, resource broker), which manages disk servers and CPU clusters.)
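The resource broker's job of matching work to resources can be pictured with a minimal Python sketch. It assumes a hypothetical broker that sends each job to the CPU cluster with the most free CPUs among those that accept the job's virtual organisation (VO); the class and function names are illustrative only and are not the interface of the gLite Resource Broker.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingElement:
    """A CPU cluster as seen by the (hypothetical) broker."""
    name: str
    free_cpus: int
    supported_vos: set = field(default_factory=set)


@dataclass
class Job:
    name: str
    vo: str       # virtual organisation submitting the job
    cpus: int = 1


def broker_match(job, elements):
    """Pick the eligible cluster with the most free CPUs and reserve them.

    Returns the chosen ComputingElement, or None if no cluster fits.
    """
    eligible = [
        ce for ce in elements
        if job.vo in ce.supported_vos and ce.free_cpus >= job.cpus
    ]
    if not eligible:
        return None
    best = max(eligible, key=lambda ce: ce.free_cpus)
    best.free_cpus -= job.cpus  # reserve the CPUs for this job
    return best


# Usage: two hypothetical clusters serving different VOs.
clusters = [
    ComputingElement("cluster-a", free_cpus=8, supported_vos={"atlas", "biomed"}),
    ComputingElement("cluster-b", free_cpus=4, supported_vos={"biomed"}),
]
first = broker_match(Job("dock-1", vo="biomed", cpus=2), clusters)
second = broker_match(Job("sim-1", vo="cms"), clusters)  # no cluster accepts "cms"
```

The "most free CPUs" rule is just one simple ranking choice; a real broker can weigh many more attributes of the job and of each site.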
The Large Hadron Collider Project
• 4 detectors: CMS, ATLAS, LHCb and ALICE
Building 40
New solutions are necessary!
LHC Computing Model
(Diagram: the LHC computing model. CERN hosts the LHC Computing Centre; Tier 1 centres in the USA (Brookhaven, FermiLab), UK, France, Italy, NL and Germany; Tier 2 centres at labs and universities (Lab a, Lab b, Lab c, Lab m, Uni a, Uni b, Uni n, Uni x, Uni y, …); physics departments and desktops.)
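The tiered layout in the diagram can be modelled as a tree in which each site hangs off a parent centre, with CERN at the root. The Python sketch below is a hypothetical illustration: the tier numbers (0 for CERN, 1 for national centres, 2 for labs and universities, 3 for desktops) and the particular parent links chosen for the example are assumptions, not a description of the actual LHC data flows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Site:
    name: str
    tier: int                        # 0 = CERN, higher = further from CERN (assumed numbering)
    parent: Optional["Site"] = None  # the centre this site receives data from


def route_to_cern(site):
    """Return the chain of site names from `site` up to the root of the tree."""
    chain = []
    current = site
    while current is not None:
        chain.append(current.name)
        current = current.parent
    return chain


def sites_in_tier(sites, tier):
    """List the names of all sites at a given tier."""
    return [s.name for s in sites if s.tier == tier]


# Hypothetical topology using names from the diagram; the links are illustrative.
cern = Site("CERN", 0)
fermilab = Site("FermiLab", 1, cern)
brookhaven = Site("Brookhaven", 1, cern)
uni_x = Site("Uni x", 2, fermilab)
desktop = Site("Desktop", 3, uni_x)
all_sites = [cern, fermilab, brookhaven, uni_x, desktop]

print(route_to_cern(desktop))       # ['Desktop', 'Uni x', 'FermiLab', 'CERN']
print(sites_in_tier(all_sites, 1))  # ['FermiLab', 'Brookhaven']
```

A tree keeps the sketch simple; in practice sites may exchange data along more than one path.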
How e-Infrastructures help e-Science
• e-Infrastructures provide easier access for
– Small research groups
– Scientists from many different fields
– Remote and still developing countries
• To new technologies
– Produce and store massive amounts of data
– Transparent access to millions of files across different administrative domains
– Low cost access to resources
– Mobilise large amounts of CPU & storage on short notice (PC clusters)
– High-end facilities (supercomputers)
• And help to find new ways to collaborate
– Develop applications using distributed complex workflows
– Ease distributed collaborations
– Provide new ways of community building
– Give easier access to higher education
• 240 sites
• 45 countries
• 41,000 CPUs
• 5 PetaBytes
• >5000 users
• >100 VOs
• >100,000 jobs/day
(Charts: number of CPUs and number of sites over time.)
Application domains: Archeology, Astronomy, Astrophysics, Civil Protection, Computational Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences
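The headline figures can be turned into rough averages with a few lines of arithmetic. This Python sketch treats the lower bound ">100,000 jobs/day" as an exact value and assumes an even spread over sites and over the day, so the results are only order-of-magnitude indicators.

```python
SECONDS_PER_DAY = 24 * 60 * 60

sites = 240
cpus = 41_000
jobs_per_day = 100_000  # lower bound quoted on the slide, used as-is

cpus_per_site = cpus / sites                      # average cluster size
jobs_per_second = jobs_per_day / SECONDS_PER_DAY  # average submission rate
jobs_per_cpu_per_day = jobs_per_day / cpus

print(f"{cpus_per_site:.0f} CPUs per site on average")     # 171
print(f"{jobs_per_second:.2f} jobs per second")            # 1.16
print(f"{jobs_per_cpu_per_day:.2f} jobs per CPU per day")  # 2.44
```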
EGEE – What do we deliver?
• Infrastructure operation
– Currently includes ~250 sites across 45 countries
– Continuous monitoring of grid services & automated site configuration/management
– Supports many Virtual Organisations from diverse research disciplines
• Middleware
– Production quality middleware distributed under a business-friendly open source licence
– Implements a service-oriented architecture that virtualises resources
– Adheres to recommendations on web service interoperability and is evolving towards emerging standards
• User Support: a managed process from first contact through to production usage
– Training
– Expertise in grid-enabling applications
– Online helpdesk
– Networking events (User Forum, Conferences etc.)
EGEE
Flagship grid infrastructure project co-funded by the European Commission
Now in its 2nd phase, with 91 partners in 32 countries
Main Objectives
• Operate a large-scale, production quality grid infrastructure for e-Science
• Attract new resources and users from industry as well as sciences
Types of applications
• Simulation
– LHC Monte Carlo simulations; Fusion; WISDOM
– Jobs needing significant processing power; large number of independent jobs; limited input data; significant output data
• Bulk Processing
– HEP; processing of satellite data
– Distributed input data; large amount of input and output data; job management (WMS); metadata services; complex data structures
• Parallel Jobs
– Climate models, computational chemistry
– Large number of independent but communicating jobs; need for simultaneous access to a large number of CPUs; MPI libraries
• Short-response delays
– Prototyping new applications; grid monitoring
• Interactivity
– Limited input & output data; processing needs but fast response and quality of service
• Workflow
– Medical imaging; flood analysis
– Complex analysis algorithms; complex dependencies between jobs
• Commercial Applications
– Non-open source software; Geocluster (seismic platform); FlexX (molecular docking); Matlab, Mathematica; IDL, …
– License server associated with an application deployment model
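Workflow applications are characterised above by complex dependencies between jobs. One common way to schedule such a workflow is to treat it as a directed acyclic graph and release each job only once all its predecessors have finished. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+); the medical-imaging job names and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical imaging workflow: each job maps to the set of jobs it depends on.
workflow = {
    "register": {"fetch_scans"},
    "segment_left": {"register"},
    "segment_right": {"register"},
    "merge_masks": {"segment_left", "segment_right"},
}

sorter = TopologicalSorter(workflow)
sorter.prepare()

batches = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # jobs whose dependencies are all done
    batches.append(ready)               # these could be submitted in parallel
    sorter.done(*ready)                 # mark them finished (here: instantly)

print(batches)
# [['fetch_scans'], ['register'], ['segment_left', 'segment_right'], ['merge_masks']]
```

Each batch groups jobs that have no unmet dependencies, which is where a grid can run work in parallel.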
SEISMOLOGY[1]
Fast determination of the mechanisms of important earthquakes (IPGP: E. Clévédé, G. Patau)
Challenge
• Provide results 24-48 h after the earthquake occurs
• 5 seisms already ported: Peru, Guadeloupe, Indonesia (Dec.), Japan, Indonesia (Feb.)
Application to run on alert
• Collect data from 30 seismic stations of the GEOSCOPE worldwide network
• Select stations and data
• Define a spatial 3D grid + time
• Run, for example, 50-100 jobs
Example: Peru earthquake, 23/6/2001, Mw = 8.3; data used: 15 Geoscope stations
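Running 50-100 jobs over a spatial 3D grid plus time is a parameter sweep: each candidate position and time can be evaluated independently, so the candidates can be split into job-sized chunks. This Python sketch assumes a hypothetical grid of 5 × 5 × 4 positions and 6 time offsets, split round-robin into 60 jobs; the real grid dimensions and the per-candidate computation of the IPGP application are not given here.

```python
from itertools import product


def build_candidates(nx, ny, nz, nt):
    """All (x, y, z, t) index tuples of a hypothetical 3D spatial grid plus time."""
    return list(product(range(nx), range(ny), range(nz), range(nt)))


def split_into_jobs(candidates, n_jobs):
    """Round-robin split so that job sizes differ by at most one candidate."""
    return [candidates[i::n_jobs] for i in range(n_jobs)]


candidates = build_candidates(nx=5, ny=5, nz=4, nt=6)  # 600 candidates
jobs = split_into_jobs(candidates, n_jobs=60)

print(len(candidates), len(jobs), len(jobs[0]))  # 600 60 10
```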
Management of water resources in the Mediterranean area (SWIMED)
G. Lecca (CRS4 Italy), P. Renard (Unine, CH), J. Kerrou (INAT, Tunisia), R. Ababou (IMFT, Fr)
(Map: Korba coastal aquifer, Cape Bon Peninsula, Tunisia, 70 km south-east of Tunis; a 45 km distance is marked.)
GEOSCIENCES
• Generic seismic platform software, based on the Geocluster commercial software developed by CGG
• Includes 400 geophysical modules, implemented on EGEE
• Used by both academics and private companies
• Free of charge for academics, with a charge for R&D use
GATE
GEANT4 Application for Tomographic Emission
• Scientific objectives
– Radiotherapy planning to improve the treatment of cancer by ionizing radiation of tumours
– Therapy planning is computed from pre-treatment MR scans by accurately locating tumours in 3D and computing the radiation doses applied to the patients
• Method
– GEANT4-based software to model the physics of nuclear medicine
– Monte Carlo simulation to improve the accuracy of computations (compared with the classical deterministic approach)
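The contrast between Monte Carlo and a deterministic calculation can be shown on a deliberately tiny problem: the fraction of photons that cross a uniform slab without interacting. The deterministic answer is the attenuation law exp(-μx); a Monte Carlo estimate samples each photon's free path from an exponential distribution and counts how many exceed the slab thickness. This Python sketch is a toy with a hypothetical attenuation coefficient and thickness; GATE/GEANT4 simulate full particle interactions in realistic 3D geometries, which this sketch does not attempt.

```python
import math
import random


def transmitted_fraction_mc(mu, thickness, n_photons, seed=0):
    """Monte Carlo estimate of the fraction of photons crossing the slab unscattered."""
    rng = random.Random(seed)
    crossed = 0
    for _ in range(n_photons):
        free_path = rng.expovariate(mu)  # distance to first interaction (mean 1/mu)
        if free_path > thickness:
            crossed += 1
    return crossed / n_photons


mu = 0.2         # hypothetical attenuation coefficient, 1/cm
thickness = 5.0  # hypothetical slab thickness, cm

exact = math.exp(-mu * thickness)  # deterministic attenuation law
estimate = transmitted_fraction_mc(mu, thickness, n_photons=100_000)
print(f"exact={exact:.4f}  monte_carlo={estimate:.4f}")
```

The Monte Carlo error shrinks roughly as 1/sqrt(n_photons), which is why such simulations need many CPU hours and suit a grid.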
Drug Discovery
• WISDOM focuses on in silico drug discovery for neglected and emerging diseases
• Malaria: Summer 2005
– 46 million ligands docked
– 1 million selected
– 1 TB of data produced; 80 CPU-years used in 6 weeks
• Avian Flu: Spring 2006
– H5N1 neuraminidase
– Impact of selected point mutations on the effectiveness of existing drugs
– Identification of new potential drugs acting on mutated N1
• Fall 2006
– Extension to other neglected diseases
High Throughput Virtual Docking
Pipeline: millions of chemical compounds available in laboratories → high-throughput screening (1-10 $/compound, nearly impossible) → molecular docking (FlexX, Autodock): ~80 CPU years and 1 TB of data, a computational data challenge of ~6 weeks on ~1000/1600 computers → hits screening using assays performed on living cells → leads → clinical testing → drug
• Chemical compounds: ZINC
• Chemical compound sets: Chembridge (500,000), drug-like (500,000)
• Molecular docking: FlexX, Autodock
• Target structures: PDB
– Plasmepsin II (1lee, 1lf2, 1lf3)
– Plasmepsin IV (1ls5)
• Grid infrastructure: EGEE
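The docking figures can be cross-checked with simple arithmetic: 80 CPU-years delivered in about 6 weeks implies roughly 700 CPUs busy on average. This Python sketch assumes a 365.25-day year and reads "~1000/1600 computers" as two alternative pool sizes; that reading is an assumption, since the slide does not say what the two numbers denote.

```python
DAYS_PER_YEAR = 365.25

cpu_years = 80   # total docking effort quoted on the slide
wall_weeks = 6   # duration of the data challenge

wall_days = wall_weeks * 7
avg_busy_cpus = cpu_years * DAYS_PER_YEAR / wall_days

# Assumed reading: "~1000/1600 computers" as two possible pool sizes.
for pool in (1000, 1600):
    print(f"pool of {pool}: ~{avg_busy_cpus / pool:.0%} average occupancy")

print(f"average CPUs busy: {avg_busy_cpus:.0f}")  # 696
```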
Grid workflow
(Diagram components: user interface; compounds database split into compounds sublists; software, parameter settings and target structures; computing elements and storage elements at Site1 and Site2; results, statistics and the compounds list.)
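The workflow in the diagram follows a split/merge pattern: the compounds database is cut into sublists, each sublist is docked by a separate job at some site, and the results are gathered and ranked. The Python sketch below shows only that pattern; the `dock_score` argument is a hypothetical stand-in for a FlexX or Autodock run, and treating lower scores as better is an assumption made for illustration.

```python
import heapq


def split_into_sublists(compounds, n_sublists):
    """Cut the compound list into n roughly equal sublists (round-robin)."""
    return [compounds[i::n_sublists] for i in range(n_sublists)]


def run_job(sublist, dock_score):
    """Stand-in for one grid job: score every compound in its sublist."""
    return [(dock_score(c), c) for c in sublist]


def merge_results(job_outputs, k):
    """Gather all job outputs and keep the k best (lowest) scores."""
    return heapq.nsmallest(k, (r for out in job_outputs for r in out))


# Hypothetical data: 10 compounds with made-up scores (lower = better here).
compounds = [f"cpd{i}" for i in range(10)]
fake_scores = {c: -i for i, c in enumerate(compounds)}

sublists = split_into_sublists(compounds, n_sublists=3)
outputs = [run_job(s, fake_scores.get) for s in sublists]
best = merge_results(outputs, k=3)
print(best)  # [(-9, 'cpd9'), (-8, 'cpd8'), (-7, 'cpd7')]
```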
• FlexX license server:
– 3000 floating licenses provided by BioSolveIT to SCAI
– Maximum number of licenses in use at once: 1008
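A floating-license server hands out a fixed number of licenses on demand and takes them back when jobs finish, so the figure worth tracking is the peak number in use (1008 of 3000 here). The single-threaded Python sketch below models that bookkeeping with a hypothetical `FloatingLicensePool` class; it is not the FlexX license server's actual protocol.

```python
class FloatingLicensePool:
    """Hypothetical bookkeeping for a pool of floating licenses."""

    def __init__(self, total):
        self.total = total
        self.in_use = 0
        self.peak = 0  # highest number of licenses held at the same time

    def checkout(self):
        """Grant a license if one is free; return False when the pool is exhausted."""
        if self.in_use >= self.total:
            return False
        self.in_use += 1
        self.peak = max(self.peak, self.in_use)
        return True

    def release(self):
        """Return a license to the pool when a job finishes."""
        if self.in_use == 0:
            raise RuntimeError("release() called with no license checked out")
        self.in_use -= 1


pool = FloatingLicensePool(total=3000)
granted = sum(pool.checkout() for _ in range(1008))  # 1008 concurrent jobs start
for _ in range(8):                                   # 8 of them finish
    pool.release()
print(granted, pool.in_use, pool.peak)  # 1008 1000 1008
```

A shared server would also need locking or an atomic counter; the sketch leaves concurrency out to keep the counting logic visible.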
Grids' key competitive advantages
• Transparent access to distributed data
– Examples: Earth sciences, Life sciences
• Handling of huge datasets
– Particle physics, astrophysics, human sciences
• Great flexibility in computing resources
– Disaster management
– Avian flu and malaria challenges
• Synergy between the grid network and the human network
European Grid Initiative
• Need to prepare a permanent, common Grid infrastructure
• Ensure the long-term sustainability of the European e-Infrastructure, independent of short project funding cycles
• Coordinate the integration and interaction between National Grid Infrastructures (NGIs)
• Operate the production Grid infrastructure on a European level for a wide range of scientific disciplines
There must be no gap in the support of the production grid
37 NGIs in Europe
+ Asia, US, Latin America
+ PRACE
+ OGF-Europe
+ …
EGI Draft Blueprint
• EGI_DS has just released the EGI draft Blueprint document
http://www.eu-egi.eu/blueprint.pdf
• Main concept: EGI is based on National Grid Initiatives (NGIs), in a way very similar to the NRENs and DANTE/GEANT
• EGI scope: production grids (limited neither to EGEE nor to EGEE middleware); partnership with the DEISA/PRACE supercomputer initiatives
• EGI is formed by the NGIs and a (small) central organisation, EGI.org
• EGI.org is in charge of grid operation and other central functions (user support and training management, middleware certification and distribution, ...)
– EGI.org is not responsible for middleware development
• Total manpower required: 50 FTEs (many can be outsourced to the NGIs)
Central effort within EGI
Operational model within EGI
The EGI.org structure
• The EGI council is formed by the participating NGIs
EGI financing model
• NGI membership fees cover the core EGI.org personnel (10-20 FTEs)
• Each NGI finances its national resources and the operation of its national grid, with some help from the EU
• The European contribution (co-)finances
– EGI.org technical personnel (partly)
– Some fraction of NGI operations
– R&D and innovation
EGI: Next steps
• EGI Blueprint accepted by the Policy Board (grouping more than 30 NGIs) as a basis for further discussions
– Final EGI Blueprint delivered at the end of 2008
– Policy Board meeting scheduled for January 20 to endorse the EGI Blueprint
• Call for bids to host EGI launched on October 1st (deadline January 7th)
– Final EGI site selection in March 2009
• Reminder: EGI must be in place before the end of 2009 for a smooth transition before the end of EGEE-III
Registered Collaborating Projects
25 projects have registered as of September 2007 (see web page)
• Infrastructures: geographical or thematic coverage
• Applications: improved services for academia, industry and the public
• Support Actions: key complementary functions
Collaborating infrastructures
Nothing there yet!
The Montpellier workshop
• Held in France on December 10-12, 2007
• Grid workshop to develop France-Africa collaboration
• Sponsored by CNRS and the “Share the knowledge” foundation
• Focus on African development through science and the excellence of African scientists
• Promote Internet connectivity and Grid nodes in Africa
• First actions selected: set up two grid nodes in Africa
• South Africa and Senegal selected as the best places to start
• Prepare the launch of a “EuroAfrica” FP7 programme
The first EGEE grid node in sub-Saharan Africa
Grid node in Dakar established in July 2008 with the help of HP/Unesco and CNRS!
Conclusion
• The European production grid is in full swing!
• Thousands of users from many scientific fields and disciplines
• Strong ongoing effort to establish a sustainable European Grid Initiative based on NGIs
• Many international collaborations
• New initiative towards Africa!
– EGEE Grid node in Senegal
– Grid users workshop and training at CHPC08, Dec 11-13, sponsored by France and Italy
– Connection of the South African Grid to EGEE in early 2009
– 2009 action plan being consolidated (another users workshop in West Africa, joint African Union-European Union programs, etc.)