Grid - Indico LAL


The EGEE project
Applications
Guy Wormser
LAL Orsay
15 March 2007
Web: information sharing
• Invented at CERN by Tim Berners-Lee
• Quickly crossed over into public use
• Agreed protocols: HTTP, HTML, URLs
• Anyone can access information and post their own
[Figure: number of Internet hosts (millions) by year; photo of Tim Berners-Lee]
Guy Wormser, interview with Arnold Migus, 23 February 2006
Grid: Resource Sharing
• Share more than information
• Data, computing power, applications
• Middleware handles everything
[Diagram comparing a single computer with the Grid. On a single computer, your program and other programs (Word/Excel, Email/Web, Games) run on the operating system, which manages disks, CPU, etc. On the Grid, your program is submitted from a User Interface machine, and the middleware (Resource Broker) manages disk servers and CPU clusters.]
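The Resource Broker decides where a job runs. It filters out computing elements that cannot satisfy the job, then ranks the ones that remain. The following is a toy sketch of that requirement-then-rank matchmaking idea only; it is not the gLite broker itself. The hostnames and software tags are hypothetical, and ranking by free CPUs is an assumed policy.

```python
from dataclasses import dataclass, field

@dataclass
class ComputingElement:
    name: str
    free_cpus: int
    software: set = field(default_factory=set)  # installed software tags (hypothetical)

@dataclass
class Job:
    name: str
    cpus: int
    needs: set  # software tags the job requires

def match(job, ces):
    """Toy matchmaking: keep CEs that satisfy the job's requirements,
    then rank the survivors by free CPUs (most free first)."""
    candidates = [ce for ce in ces
                  if ce.free_cpus >= job.cpus and job.needs <= ce.software]
    if not candidates:
        return None  # no CE can run this job
    return max(candidates, key=lambda ce: ce.free_cpus)

ces = [
    ComputingElement("ce-a.example.org", 12, {"geant4"}),
    ComputingElement("ce-b.example.org", 40, {"geant4", "autodock"}),
    ComputingElement("ce-c.example.org", 64, {"blast"}),
]
job = Job("dock-001", cpus=1, needs={"autodock"})
print(match(job, ces).name)  # ce-b.example.org
```

The user never names a machine; the middleware chooses one. This is the sense in which it "handles everything".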
Electricity Grid
• Analogy with the electricity power grid
[Diagram: power stations, distribution infrastructure, 'standard interface']
Grid definitions
• Grids: a distributed set of computing resources linked by fast networks and accessible transparently by the user
• EGEE: a European project funded under the 6th Framework Programme (FP6), with a multidisciplinary vocation; it builds on high-energy physics but is open to many other sciences: biology/medicine, Earth sciences, astrophysics, chemistry
• LCG: LHC Computing Grid, an international grid set up around CERN; it is the only tool chosen to meet the computing needs of the LHC.
• EGEE and LCG share the same infrastructure, the same software team, etc.
• More than 200 nodes and 15,000 processors are operational 24 hours a day, 7 days a week. The LHC experiments' Data Challenges are based entirely on this infrastructure
• Tier1/2: resource centres of the LCG/EGEE grid.
– Tier1: very strict service obligations, large storage volume, dedicated to centralized production
– Tier2: high responsiveness during working hours, dedicated to analysis and simulation
EGEE: Enabling Grids for E-sciencE
Goal
• Create a general European Grid production-quality infrastructure on top of present and future EU research network (RN) infrastructure
Build on
• EU and EU member states' major investment in Grid technology
• Several pioneering prototype results
• Largest Grid development team in the world
Goal can be achieved for about €100m over 4 years, on top of the national and regional initiatives
Approach
• Leverage current and planned national and regional Grid programmes (e.g. LCG)
• Work closely with relevant industrial Grid developers, NRNs and the US
[Diagram: applications on top of EGEE, on top of the Geant network]
The Large Hadron Collider Project
[Figure: the LHC and its 4 detectors; labels visible: CMS, ATLAS, LHCb]
[Photo: Bat 40 (Building 40)]
New solutions are necessary!
LHC Computing Model
[Diagram: the tiered LHC Computing model. CERN at the centre; Tier1 centres in France, Germany, Italy, NL, UK and the USA (Brookhaven, FermiLab); Tier2 centres at labs and universities (Lab a, b, c, m; Uni a, b, n, x, y); physics departments and desktops at the edge.]
HEP commitment to Grids
• 2000-2003: Exploratory phase. Several R&D projects worldwide to develop the middleware, build mid-size test beds and port some applications
– US: PPDG, GriPhyN
– Europe: DATAGRID
– Several initiatives in Asia
• 2002: Decision to build a grid infrastructure as THE TOOL for LHC computing. Creation of the LCG project (LHC Computing Grid). Point of no return!
• 2007: Large-scale deployment of the world's largest distributed computing infrastructure(s):
– LCG, EGEE, OSG (US), NAREGI (Japan), etc.
– >200 sites, 35,000 processors, 12 PB of storage
Grids are a reality
Deployment of applications
• Pilot applications
– High Energy Physics (LHC + D0, BaBar, CDF)
– Biomed applications (12)
• Generic applications: deployment under way
– Computational Chemistry
– Earth science research
– EGEODE: first industrial application
– Astrophysics
• With interest from
– Hydrology
– Seismology
– Grid search engines
– Stock market simulators
– Digital video, etc.
– Industry (provider, user, supplier)
[Legend: Pilot / New]
Status
[Figures: number of sites and number of CPUs, Apr-04 to Dec-06]
~17.5 million jobs run (6450 CPU-years) in 2006
Workloads of the "not HEP VOs" start to be significant: approaching 8-10K jobs per day and 1000 CPU-months/month
• One year ago, this was the overall scale of work for all VOs
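The 2006 totals quoted above can be turned into a daily job rate and an average job length. This is a back-of-envelope sketch only. It assumes a 365-day year and that the CPU-years are spread evenly over all jobs.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: ignore leap days

jobs_2006 = 17_500_000  # "~17.5 million jobs run in 2006"
cpu_years_2006 = 6450   # "(6450 cpu-years)"

jobs_per_day = jobs_2006 / 365
avg_job_hours = cpu_years_2006 * HOURS_PER_YEAR / jobs_2006  # mean CPU time per job

print(f"{jobs_per_day:,.0f} jobs/day on average")
print(f"{avg_job_hours:.1f} CPU-hours per job on average")
```

The yearly average comes out at roughly 48,000 jobs per day. That is in line with the ">50k jobs/day" reached later in the period. A mean of a few CPU-hours per job also shows that the grid mostly runs many independent, medium-length batch jobs.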
Grid Virtual Organizations
• Routine and large-scale use of the EGEE infrastructure.
• Virtual Organizations:
– 200+ visible on the grid
– 100+ registered with EGEE
http://www3.egee.cesga.es/gridsite/accounting/CESGA/tree_vo.php
CPU Usage
[Figure: CPU usage by Virtual Organization, Jan. '06 and Sep. '06]
Production service
[Figures: number of sites (Apr-04 to Dec-05) and number of CPUs (Apr-04 to Feb-06), by date]
Size of the infrastructure today:
• 192 sites in 40 countries
• ~25,000 CPUs
• ~3 PB disk, + tape mass storage systems (MSS)
EGEE Resources
Region            #countries  #sites   #CPU  #CPU (DoW)  Disk (TB)
CERN                       0       1   4400        1800       770*
UK/I                       2      23   4306        2010       310
Italy                      1      27   2800        2280       373
France                     1      10   2316        1252       300*
De/CH                      2      13   2895        1852       280*
Northern Europe            6      16   2379        1860        64
SW Europe                  2      13    956         898        16*
SE Europe                  8      26   1101        1189        30
Central Europe             7      21   1584        1163        70
Russia                     1      15    515         445        38
Asia-Pacific               8      19    840         751        72
North America              2       8   4069           -       229
Totals                    40     192  28161       20265      2552
* Estimates taken from reporting, as the IS publishes total MSS space
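As a quick consistency check, the Totals row can be recomputed from the per-region rows for the #countries, #sites, #CPU and disk columns, along with the largest single CPU contributor. This is a sketch that uses the values as printed. Starred disk figures are estimates and are taken at face value.

```python
# Per-region rows from the EGEE Resources table: (region, #countries, #sites, #CPU, disk TB).
# Disk values marked * in the table are estimates; they are used as printed.
ROWS = [
    ("CERN", 0, 1, 4400, 770),
    ("UK/I", 2, 23, 4306, 310),
    ("Italy", 1, 27, 2800, 373),
    ("France", 1, 10, 2316, 300),
    ("De/CH", 2, 13, 2895, 280),
    ("Northern Europe", 6, 16, 2379, 64),
    ("SW Europe", 2, 13, 956, 16),
    ("SE Europe", 8, 26, 1101, 30),
    ("Central Europe", 7, 21, 1584, 70),
    ("Russia", 1, 15, 515, 38),
    ("Asia-Pacific", 8, 19, 840, 72),
    ("North America", 2, 8, 4069, 229),
]
TOTALS = (40, 192, 28161, 2552)  # the table's "Totals" row, same columns

# Sum each numeric column (indices 1..4) over all regions.
column_sums = tuple(sum(row[k] for row in ROWS) for k in range(1, 5))
largest = max(ROWS, key=lambda row: row[3])  # region with the most CPUs
share = largest[3] / column_sums[2]

print(column_sums == TOTALS)                     # True
print(f"{largest[0]}: {share:.1%} of all CPUs")  # CERN: 15.6% of all CPUs
```

No single region dominates. Even CERN contributes only about a sixth of the CPUs, which illustrates how distributed the infrastructure is.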
Usage of the infrastructure
[Figure: EGEE workload in jobs/month, Jan-05 to Aug-06, by VO (alice, atlas, biomed, cms, compchem, dteam, egeode, egrid, esr, fusion, geant4, lhcb, magic, ops, planck, other VOs); annotation: >50k jobs/day]
[Figure: normalized CPU time (kSI2k hours) per month, Jan-05 to Aug-06, by VO; annotation: ~7000 CPU-months/month]
Non-LHC VOs
[Figure: EGEE workload in jobs/month for non-LHC VOs, Jan-05 to Aug-06 (biomed, compchem, dteam, egeode, egrid, esr, fusion, geant4, magic, ops, planck, other VOs)]
[Figure: normalized CPU time (kSI2k hours) per month for non-LHC VOs, Jan-05 to Aug-06]
Workloads of the "other VOs" start to be significant: approaching 8-10K jobs per day and 1000 CPU-months/month
• One year ago, this was the overall scale of work for all VOs
Grids: a great opportunity for interdisciplinary contacts
• Grids have become a great vehicle for promoting interdisciplinary contacts between HEP, many other application fields and computer scientists
– Bioinformatics
– Medicine
– Chemistry
– Fusion science
– Earth sciences
– Astrophysics/astronomy
– Neuroinformatics
– Climate
– Finance
– …..
• HEP can be proud to have been a key player in these endeavours
D0 MC efficiency on LCG2 since Christmas (but small statistics)
CE                                 Success  Failed
bohr0001.tier2.hep.man.ac.uk           237       3
cclcgceli01.in2p3.fr                     -       -
grid-ce.physik.uni-wuppertal.de          -      14
gridkap01.fzk.de                      2564      19
golias25.farm.particle.cz              198      15
heplnx131.pp.rl.ac.uk                  246       4
lcgce02.gridpp.rl.ac.uk                293      10
mu6.matrix.sara.nl                     397       7
tbn18.nikhef.nl                        154       2
Total                                 4089      74
• Efficiency 98%
• System running monitored very closely by the run manager, in close contact with the sites
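The 98% efficiency follows directly from the table: successes divided by all completed jobs. The sketch below recomputes it from the per-CE rows. It treats "-" entries as 0 and assumes the 14 failures belong to the Wuppertal CE. That row assignment is inferred from the column sums, and it does not change the total.

```python
# (CE, successes, failures) from the D0 MC table; "-" entries are treated as 0.
# Assumption: the 14 failures are attributed to the Wuppertal CE.
RESULTS = [
    ("bohr0001.tier2.hep.man.ac.uk", 237, 3),
    ("cclcgceli01.in2p3.fr", 0, 0),
    ("grid-ce.physik.uni-wuppertal.de", 0, 14),
    ("gridkap01.fzk.de", 2564, 19),
    ("golias25.farm.particle.cz", 198, 15),
    ("heplnx131.pp.rl.ac.uk", 246, 4),
    ("lcgce02.gridpp.rl.ac.uk", 293, 10),
    ("mu6.matrix.sara.nl", 397, 7),
    ("tbn18.nikhef.nl", 154, 2),
]

ok = sum(success for _, success, _ in RESULTS)
failed = sum(failure for _, _, failure in RESULTS)
efficiency = ok / (ok + failed)  # fraction of finished jobs that succeeded

print(ok, failed, f"{efficiency:.1%}")  # 4089 74 98.2%
```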
APPLICATIONS ported to EGEE
• Earth Observation by Satellite
• Hydrology
• Solid Earth Physics
• Meteorology
• Climate
• Geosciences
• Chemistry of the Mars Upper Atmosphere
SEISMOLOGY [1]
Fast determination of the mechanisms of major earthquakes (IPGP: E. Clévédé, G. Patau)
Challenge
• Provide results 24-48 h after an earthquake occurs
• 5 earthquakes already ported: Peru, Guadeloupe, Indonesia (Dec.), Japan, Indonesia (Feb.)
Application to run on alert
• Collect data from 30 seismic stations of the GEOSCOPE worldwide network
• Select stations and data
• Define a spatial 3D grid + time
• Run, for example, 50-100 jobs
Example: Peru earthquake, 23/6/2001, Mw = 8.3; data from 15 Geoscope stations used
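The "spatial 3D grid + time" step is a parameter sweep. Every candidate source point is independent, so the points can be dealt out across tens of grid jobs. The following is a minimal sketch of that decomposition only. The grid spacing, depths and time shifts are hypothetical values, not the IPGP setup, and the per-point inversion is left out.

```python
from itertools import product

def build_search_grid(lats, lons, depths, times):
    """All candidate (lat, lon, depth_km, t_s) source points: a 3D spatial grid + time."""
    return list(product(lats, lons, depths, times))

def split_into_jobs(points, n_jobs):
    """Deal the points round-robin into n_jobs chunks whose sizes differ by at most 1."""
    return [points[i::n_jobs] for i in range(n_jobs)]

# Hypothetical search grid: 5 x 5 spatial nodes, 4 depths, 4 time shifts.
lats = [-16.5 + 0.25 * i for i in range(5)]
lons = [-74.0 + 0.25 * j for j in range(5)]
depths = [10.0, 20.0, 30.0, 40.0]
times = [0.0, 5.0, 10.0, 15.0]

points = build_search_grid(lats, lons, depths, times)
jobs = split_into_jobs(points, 50)  # "run for example 50-100 jobs"
print(len(points), len(jobs), min(map(len, jobs)), max(map(len, jobs)))  # 400 50 8 8
```

Each chunk would become one grid job. The 24-48 h target then depends mainly on how quickly those jobs are scheduled, not on any single long computation.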
Management of water resources in the Mediterranean area (SWIMED)
G. Lecca (CRS4 Italy), P. Renard (Unine, CH),
J. Kerrou (INAT, Tunisia), R. Ababou (IMFT, Fr)
[Map: Korba coastal aquifer, Cape Bon Peninsula, Tunisia, 70 km south-east of Tunis; scale 45 km]
GEOSCIENCES
• Generic seismic platform software, based on the Geocluster commercial software developed by CGG
• Includes 400 geophysical modules, implemented on EGEE
• Used by both academics and private companies
• Free of charge for academics; charged for R&D use
Status of Biomedical VO
RLS, VO LDAP Server: CC-IN2P3
PADOVA, BARI
4 RBs: CNAF, IFAE, LAPP, UPV
• 15 resource centres
• 17 CEs (>750 CPUs)
• 16 SEs
• 4 RBs
• 1 RLS
• 1 LDAP Server
GATE
GEANT4 Application for Tomographic Emission
• Scientific objectives
Radiotherapy planning to improve cancer treatment by ionizing radiation of tumours.
Therapy planning is computed from pre-treatment MR scans by accurately locating tumours in 3D and computing the radiation doses applied to the patients.
• Method
GEANT4-based software to model the physics of nuclear medicine.
Monte Carlo simulation is used to improve the accuracy of computations (compared with the classical deterministic approach)
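Monte Carlo suits the grid because each particle history is independent. Histories can be split across jobs with different random seeds and the partial results averaged. The following is a toy illustration of that idea only; it is not GEANT4 or GATE physics. It estimates narrow-beam photon transmission through a uniform slab and compares the result with the closed-form Beer-Lambert value. The attenuation coefficient and thickness are hypothetical.

```python
import math
import random

def transmitted_fraction_mc(mu_per_cm, thickness_cm, n_photons, seed):
    """Monte Carlo estimate of the fraction of photons crossing a slab without interacting.

    Each photon's free path is drawn from an exponential distribution with mean 1/mu;
    the photon is transmitted if that path is longer than the slab.
    """
    rng = random.Random(seed)  # independent stream per "job"
    transmitted = sum(
        1 for _ in range(n_photons) if rng.expovariate(mu_per_cm) > thickness_cm
    )
    return transmitted / n_photons

mu, x = 0.2, 5.0           # hypothetical attenuation coefficient (1/cm) and thickness (cm)
exact = math.exp(-mu * x)  # Beer-Lambert law, the deterministic reference

# Split 400,000 histories into 8 independent "jobs" with distinct seeds,
# as a grid submission would, then combine the equal-sized partial results.
n_jobs, per_job = 8, 50_000
partials = [transmitted_fraction_mc(mu, x, per_job, seed) for seed in range(n_jobs)]
estimate = sum(partials) / n_jobs
print(f"MC: {estimate:.4f}  exact: {exact:.4f}")
```

In this trivial geometry, a closed-form answer exists. In realistic patient geometries it does not, which is where the Monte Carlo approach and the grid's CPU capacity pay off.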
Drug Discovery
• WISDOM focuses on in silico drug discovery for neglected and emerging diseases.
• Malaria (summer 2005)
– 46 million ligands docked
– 1 million selected
– 1 TB of data produced; 80 CPU-years used in 6 weeks
• Avian flu (spring 2006)
– H5N1 neuraminidase
– Impact of selected point mutations on the efficacy of existing drugs
– Identification of new potential drugs acting on mutated N1
• Fall 2006
– Extension to other neglected diseases
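The malaria figures imply how many CPUs were busy at once and how much CPU time each docking took. This is a back-of-envelope sketch. It assumes a 365-day year, counts each of the 46 million ligands docked as one docking run, and ignores failed or resubmitted jobs.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: ignore leap days

cpu_years = 80          # CPU time used by WISDOM, summer 2005
wall_weeks = 6          # duration of the data challenge
dockings = 46_000_000   # ligands docked (assumed: one docking run each)

cpu_hours = cpu_years * HOURS_PER_YEAR
wall_hours = wall_weeks * 7 * 24

avg_busy_cpus = cpu_hours / wall_hours             # CPUs kept busy on average
seconds_per_docking = cpu_hours * 3600 / dockings  # mean CPU time per docking

print(f"{avg_busy_cpus:.0f} CPUs busy on average")
print(f"{seconds_per_docking:.0f} s of CPU per docking")
```

About 700 CPUs running continuously for six weeks is consistent with the "~1000/1600 computers" quoted for the data challenge. A single 2005-era lab cluster would have needed years for the same work.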
WISDOM: Wide In Silico Docking On Malaria
• Goals of the first biomedical "data challenge" (July-August 2005)
– Biological goal: propose new inhibitors for a family of proteins produced by Plasmodium falciparum
– Biomedical informatics goal: deploy in silico virtual docking on the grid
– Grid goal: deploy a CPU-intensive application generating large data flows, to test the grid infrastructure and services
• Partners
– Fraunhofer SCAI (Project PI: Martin Hofmann)
– LPC Clermont-Ferrand (CNRS/IN2P3)
– CMBA (Center for Bio-Active Molecules screening)
• Representing different projects:
– EGEE (EU FP6)
– Simdat (EU FP6)
– AuverGrid and Campus Grid (French and German regional grids)
– Accamba project (French ACI project)
High Throughput Virtual Docking
[Diagram: the drug discovery pipeline]
• Millions of chemical compounds available in laboratories
• High Throughput Screening: $1-10/compound, nearly impossible
• Molecular docking (FlexX, Autodock): ~80 CPU years, 1 TB data
• Computational data challenge: ~6 weeks on ~1000/1600 computers
• Hits screening using assays performed on living cells
• Leads → clinical testing → drug
Resources:
• Chemical compounds: ZINC (Chembridge: 500,000; drug-like: 500,000)
• Molecular docking: FlexX, Autodock
• Target structures: PDB
• Grid infrastructure: EGEE
Targets:
• Plasmepsin II (1lee, 1lf2, 1lf3)
• Plasmepsin IV (1ls5)
Modeling for the molecular docking
• Target scenarios
– Number of water molecules in the active site
• Software scenarios
– Docking methods (Autodock)
– Water molecule placement and maximum overlapping volume (FlexX)
• Target preparation
– X-ray crystal structures of 5 plasmepsins (PDB)
– All proteins superimposed onto 1lee (PDB Kabsch and PDB transform)
– Native ligand converted to mol2 and hydrogens added (Babel and Corina)
– Active site created from the native crystal ligand
• Compounds preparation
– Yet drug like
– Conversion to pdbqs format for Autodock
[Figure: active site and ligand; loop variation between structures]
Grid workflow
[Diagram: from the User interface, the compounds database is split into compound sublists; each sublist, with parameter settings, target structures and software, is sent to a Computing Element at a site (Site1, Site2); results and statistics are written to Storage Elements]
• FlexX license server:
– 3000 floating licenses given by BioSolveIT to SCAI
– Maximum number of licenses used was 1008
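The workflow has two parts: the compound list is cut into sublists, and one job is created per (target, sublist) pair. Concurrent dockings are capped by a floating-license pool. The sketch below is a local simulation under stated assumptions. Threads stand in for grid jobs, compound IDs and the pool size are hypothetical, and round-robin site assignment stands in for the Resource Broker. The docking call itself is a placeholder.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

def make_sublists(compounds, chunk_size):
    """Split the compound list into consecutive sublists of at most chunk_size."""
    return [compounds[i:i + chunk_size] for i in range(0, len(compounds), chunk_size)]

def plan_jobs(compounds, targets, sites, chunk_size):
    """One job per (target, sublist) pair, assigned to sites round-robin."""
    site_cycle = cycle(sites)
    return [{"site": next(site_cycle), "target": t, "compounds": sub}
            for t in targets for sub in make_sublists(compounds, chunk_size)]

class LicensePool:
    """Floating-license pool: at most `size` dockings may hold a license at once."""
    def __init__(self, size):
        self._sem = threading.BoundedSemaphore(size)
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak = 0  # highest number of licenses held simultaneously

    def __enter__(self):
        self._sem.acquire()
        with self._lock:
            self.in_use += 1
            self.peak = max(self.peak, self.in_use)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.in_use -= 1
        self._sem.release()
        return False

def run_job(job, licenses):
    with licenses:  # a docking run holds a license while it executes
        # Placeholder for the real docking call: one dummy score per compound.
        return [(job["target"], c, 0.0) for c in job["compounds"]]

compounds = [f"ZINC{i:08d}" for i in range(1000)]  # hypothetical compound IDs
jobs = plan_jobs(compounds, ["1lee", "1ls5"], ["Site1", "Site2"], chunk_size=100)
licenses = LicensePool(size=4)  # hypothetical pool size for the demo
with ThreadPoolExecutor(max_workers=16) as pool:
    results = [r for rs in pool.map(lambda j: run_job(j, licenses), jobs) for r in rs]
print(len(jobs), len(results), licenses.peak)
```

The pool's `peak` plays the role of the "maximum number of licenses used" reported on the slide. In the real challenge it stayed at 1008, well below the 3000 available, so licensing did not throttle the run.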
Score results in different scenarios
with VS Explorer (SCAI)
gPTM3D
3D Medical Image Analysis Software
• Scientific objectives
Interactive volume reconstruction on large radiological data.
PTM3D is an interactive tool for computer-assisted 3D segmentation, volume reconstruction and measurement (RSNA 2004).
Reconstruction of complex organs (e.g. lung) or the entire body from modern CT scans is needed in augmented-reality use cases, e.g. therapy planning.
• Method
Starting from a rough manual initialization, a snake-based algorithm segments each slice of a medical volume.
3D reconstruction is achieved in parallel by triangulating contours from consecutive slices.
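Triangulating contours from consecutive slices means stitching two closed polylines into a band of triangles. Each pair of slices is independent, so the pairs can be processed in parallel. The slide does not describe PTM3D's own stitching rule. The sketch below uses a common greedy heuristic instead: at each step, advance along whichever contour gives the shorter new diagonal. The circular test contours are hypothetical.

```python
import math

def triangulate_contours(a, b):
    """Stitch two closed contours from consecutive slices into a triangle band.

    a, b: lists of (x, y, z) points describing closed contours (any lengths).
    Returns len(a) + len(b) triangles, each a tuple of three points.
    """
    n, m = len(a), len(b)
    # Start contour b at the point closest to a[0] so the band is not twisted.
    offset = min(range(m), key=lambda k: math.dist(a[0], b[k]))
    b = b[offset:] + b[:offset]
    i = j = 0
    triangles = []
    while i < n or j < m:
        ai, ai1 = a[i % n], a[(i + 1) % n]
        bj, bj1 = b[j % m], b[(j + 1) % m]
        # Advance on a if b is exhausted, or if a's new diagonal is not longer.
        advance_a = j == m or (i < n and math.dist(ai1, bj) <= math.dist(ai, bj1))
        if advance_a:
            triangles.append((ai, ai1, bj))
            i += 1
        else:
            triangles.append((ai, bj, bj1))
            j += 1
    return triangles

def circle(radius, z, k):
    """Hypothetical segmented contour: k points on a circle in the plane of slice z."""
    return [(radius * math.cos(2 * math.pi * t / k),
             radius * math.sin(2 * math.pi * t / k), z) for t in range(k)]

lower = circle(10.0, 0.0, 12)  # contour on slice 0
upper = circle(8.0, 1.0, 9)    # contour on the next slice, different point count
mesh = triangulate_contours(lower, upper)
print(len(mesh))  # 21: one triangle per contour point
```

Running this function over every consecutive slice pair, for example one pair per worker, and concatenating the bands gives the surface of the reconstructed organ.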
GPS@
• Grid added value
The NPS@ portal records 3000 hits a day, but its local resources limit the size of the databanks and the kind of computations it can perform.
The grid version, GPS@, can:
- for biological data: provide biologists with a convenient way to distribute and access international databanks, and to store more and larger databanks
- for bioinformatics algorithms: allow each portal user to process larger datasets with the available algorithms
- be open to a wider user community.
• Results and perspectives
9 widely used bioinformatics programs have been gridified so far, such as BLAST, CLUSTALW, PattInProt, …
GPS@ is stressing the grid infrastructure with a large number of rather short jobs (a few minutes each).
Optimizations under way aim to:
- speed up access to databases
- lower short-job latencies
- process jobs that depend on data or software (workflows)
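Short jobs stress the grid because every job pays a fixed submission and scheduling overhead, however little it computes. The model below makes that concrete. It also shows one common mitigation, bundling several short tasks into one grid job; the slide does not say GPS@ uses bundling. The task length follows the "few minutes" on the slide, but the 10-minute overhead is a hypothetical figure.

```python
import math

def efficiency(task_minutes, overhead_minutes, tasks_per_job):
    """Fraction of a job's wall time spent computing when `tasks_per_job`
    short tasks share one grid job and pay the per-job overhead once."""
    compute = task_minutes * tasks_per_job
    return compute / (compute + overhead_minutes)

def jobs_needed(n_tasks, tasks_per_job):
    """Number of grid jobs needed to cover n_tasks."""
    return math.ceil(n_tasks / tasks_per_job)

TASK_MIN = 3.0       # "a few minutes" per task (from the slide)
OVERHEAD_MIN = 10.0  # hypothetical per-job submission + scheduling latency

for k in (1, 5, 20):
    print(k, jobs_needed(1000, k), round(efficiency(TASK_MIN, OVERHEAD_MIN, k), 2))
```

With one task per job, most of the wall time goes to overhead. Lowering the latency (`OVERHEAD_MIN`), as the slide proposes, or amortizing it over more work per job both raise the useful fraction.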
Conclusion
• Grids: a great success, going in a few years from concept to operational reality!
• "Huge infrastructure in place, other sciences embarked, very high political visibility, HEP role clearly recognised"
• France and CNRS play a very important role in EGEE, notably at the heart of the crucial applications sector: our mission of broadening the range of EGEE applications has succeeded beyond all expectations
• We must build on the success of EGEE to set up a sustainable successor after EGEE-II: a European Grid Initiative based on "National Grid Initiatives"
A French National Grid Initiative is being built with the ministry