Open Science Grid
Project DASH:
Securing Direct MySQL
Database Access for the Grid
D. Malon, E. May, D. Ratnikov, A. Vaniachine
Argonne National Laboratory
M. Vranicar, J. Weicher
PIOCON Technologies
XV International Conference on Computing in High Energy and Nuclear Physics
T.I.F.R., Mumbai, India
February 13-17, 2006
Databases and Grids
• In addition to petabytes of file-based event data, high energy
physics applications require access to non-event data (detector
conditions, calibrations, etc.) stored in relational databases
• Databases also play a critical role in grid middleware:
file catalogues, monitoring, etc.
• Crosscutting the computational grid infrastructure,
a database hyperinfrastructure emerges
[Figure: World-Wide Federation of Computational Grids. Labels in the
diagram: OSG, WLCG, NorduGrid; ATLAS Sites, CMS Sites, Non-LHC Sites;
Large Scale Distributed Computations Management, Workload Orchestration
System, File Transport; RFT Database, RLS Database, PanDA DB,
Production DB, Meta-data DB, Monitoring DB, Conditions DB; Cluster with
Head Node, Worker Nodes and Edge Services]
CHEP06 Mumbai India
Alexandre Vaniachine (ANL)
Project DASH
• As grid computing technologies mature, development must focus on
database and grid integration
• New technologies are required to bridge the gap between data
accessibility and the increasing power of grid computing used for
distributed event production and processing
• The Database Access for Secure Hyperinfrastructure
(DASH) project is funded by the DOE Small Business
Innovation Research Program to build and test secure
high-performance database access technology for
distributed computing
www.piocon.com/DASH.php
A project of PIOCON Technologies, Inc. and Argonne National Laboratory
Database Access on the Grid
Two different architectures:
• A separate middleware server does the grid authorization:
• OGSA-DAI: SOAP/XML + XML binary extensions
• Spitfire (EDG WP2): SOAP/XML text-only data transport
• Perl DBI database proxy (ALICE): SQL data transport
• Oracle 10g (separate authorization layer)
• Grid middleware is integrated into the database server process:
• Instead of surrounding the database with external secure middleware layers,
the safety features are embedded inside the code
• By pushing secure authorization into the database engine, the inefficient
data transfer bottlenecks are eliminated
Embedded Security Approach
• The embedded security approach is listed
among the top ten innovations in security by the
panel of experts convened by Battelle:
– “The Global Cyber Net: Communications and
information are the lifeblood of security. Today we
enjoy a worldwide web, which is open but unsecured.
In the future, we will have a global cyber net that is
faster and better protected than today… Software will
contain embedded safety features inside of the code
rather than just surrounding it.”
http://www.battelle.org/forecasts/defense.stm
End-to-End Secure Transport
• DASH technology bridges the gap between data accessibility and
the increasing power of grid computing
• To overcome database access inefficiencies inherent in a traditional
middleware approach, the DASH project implements secure
authorization on the transport level
• Pushing the grid authorization into the database engine eliminates
the middleware message-level security layer and delivers the
transport-level efficiency of SSL/TLS protocols for grid applications
• The DASH proof-of-concept prototype provides Globus grid proxy
certificate authorization technologies for MySQL database access
control
• DASH technology brings database access efficiencies similar to the
HTTPS advantages introduced in the Globus Toolkit 4.0
• The database architecture with embedded grid authorization
provides a foundation for secure end-to-end data processing
solutions for the grids
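The authorization decision described above can be illustrated with a minimal sketch. The slides do not show how DASH maps a proxy certificate to a database account, so everything here is an assumption: the mapping table imitates a Globus grid-mapfile, the distinguished name and the account name are made up, and the function names are hypothetical. A real server would inspect the certificate chain and its proxy extension rather than parse the subject string.

```python
"""Sketch: authorize a database login from a grid proxy certificate subject."""

# Hypothetical mapping in the style of a Globus grid-mapfile:
# certificate subject DN -> database account name (both made up).
GRID_MAP = {
    "/DC=org/DC=example/OU=People/CN=Jane Doe": "atlas_reader",
}


def identity_dn(proxy_subject: str) -> str:
    """Strip trailing proxy components to recover the end-entity DN.

    Legacy Globus proxies append "/CN=proxy" or "/CN=limited proxy";
    RFC 3820 proxies append "/CN=<digits>". A proxy of a proxy appends
    one more component, so keep stripping until none is left.
    Simplification: a user whose own CN is all digits would be stripped too.
    """
    parts = proxy_subject.split("/")
    while parts:
        last = parts[-1]
        is_legacy = last in ("CN=proxy", "CN=limited proxy")
        is_rfc3820 = last.startswith("CN=") and last[3:].isdigit()
        if is_legacy or is_rfc3820:
            parts.pop()
        else:
            break
    return "/".join(parts)


def authorize(proxy_subject: str, grid_map=GRID_MAP):
    """Return the mapped database account, or None to refuse the connection."""
    return grid_map.get(identity_dn(proxy_subject))
```

Because the check happens once, when the TLS connection is established, no message-level security layer is needed for the queries that follow.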
Aspect-Oriented Programming
• To avoid a brittle, monolithic system, DASH uses
an aspect-oriented programming approach
• By localizing Globus security concerns in a
software aspect, DASH achieves a clean
separation of Globus Grid Security Infrastructure
dependencies from the MySQL server code
• During the database server build, the AspectC++
tool automatically generates the transport-level
code to support a grid security infrastructure
• www.aspectc.org
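The idea of keeping a security concern out of the core code can be shown with a loose analogy. AspectC++ weaves advice into C++ sources at build time; the Python decorator below only imitates that separation at run time and is not the DASH aspect code. All names (`requires_grid_auth`, `read_packet`, the `peer_authorized` flag) are hypothetical.

```python
import functools


def requires_grid_auth(func):
    """'Advice' that runs a security check before the wrapped transport call."""

    @functools.wraps(func)
    def wrapper(connection, *args, **kwargs):
        # The security concern lives here, not in the transport code.
        if not connection.get("peer_authorized", False):
            raise PermissionError("grid authorization failed")
        return func(connection, *args, **kwargs)

    return wrapper


# Core transport code stays free of any security logic ...
def read_packet(connection):
    return connection["buffer"]


# ... and the concern is attached in one place, outside the core code.
read_packet = requires_grid_auth(read_packet)
```

Removing the last line restores the plain transport function, which is the property that keeps the grid dependencies separable from the server code.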
Automatic Code Generation
[Diagram: inputs to the automatic code generation are the DASH grid
security aspects code, the Globus GSI code, the OpenSSL Transport Level
Security code and the MySQL database server code; the output is the
auto-generated grid-enabled MySQL database server code. File labels in
the diagram: grid.ah, cbk.c, tls.c, vio.c]
AOP is the Next ‘Big Thing’
A 2001 paper on Aspect-Oriented Programming is among the
Top 10 Downloads from ACM’s Digital Library
• Paper by our collaborators from Illinois Institute of Technology
ATLAS experience with AOP was first reported at the previous conference, CHEP04
Testing New Functionalities
• Prototype servers built with DASH technology
are being tested at ANL, BNL, CERN and U Geneva
• We thank
– Jason Smith (BNL)
– Yuri Smirnov (BNL)
– Frederik Orellana (U Geneva)
Among the new functionalities are:
• Check for the proxy expiration time
• Host name checking (to reject impersonation)
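A minimal sketch of the two checks listed above, assuming the expiration time and the certificate common name have already been extracted from the peer certificate. The function names are hypothetical, and the handling of a leading "host/" in the common name follows the usual Globus host certificate convention; the DASH server code may implement both checks differently.

```python
from datetime import datetime, timezone


def proxy_is_valid(not_after: datetime, now: datetime = None) -> bool:
    """True while the proxy certificate has not yet expired."""
    now = now or datetime.now(timezone.utc)
    return now < not_after


def host_matches(expected_host: str, cert_common_name: str) -> bool:
    """Compare the host we meant to reach with the certificate's CN.

    Globus host certificates conventionally use CN=host/<fqdn>,
    so an optional "host/" prefix is dropped before comparing.
    """
    name = cert_common_name
    if name.lower().startswith("host/"):
        name = name[len("host/"):]
    return name.lower() == expected_host.lower()
```

A connection would be accepted only when both functions return True; a mismatch in the second check is what rejects an impersonating host.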
Packaging Challenge
• Initial response from our beta-testers suggested that,
because of the Globus GSI library dependencies, the
preferred distribution would be the static build
• However, tests showed that static builds work best on
platforms (Linux distributions) very close to that of
the build machine
• We experienced unexpected sensitivities to minor
variations in the glibc library version
• We are now addressing that issue by developing a
dynamic build that will have the static Globus GSI and
OpenSSL libraries built in
Scalability Challenge
• Large-scale world-wide distributed simulations performed by the
ATLAS Collaboration show steady progress in grid computing
• The chaotic nature of opportunistic grid computations
results in variations in daily production rates
• Database services capacities should be adequate for
peak demand
[Plot: Jobs/day (0 to 14000) by month, Jul through May, for
LCG/CondorG, LCG/Original, NorduGrid and Grid3. Labelled periods:
Data Challenge 2 (short jobs period), Data Challenge 2 (long jobs
period), Rome Production (mix of jobs)]
Why Dynamic Deployment?
• The high level of sharing of computational resources
achieved on grids results in increased fluctuations in
demand for database services, because of the chaotic
nature of shared resource availability
• Static service deployment requires over-capacity
• Opportunistic production on non-LCG sites requires
database services deployment on-demand
• To provide on-demand database services capability for
Open Science Grid, the Edge Services Framework
activity builds the DASH mysql-gsi database server into
the virtual machine image, which is dynamically
deployed via Globus Virtual Workspaces
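The over-capacity argument can be made concrete with a small worked example. The daily job counts below are made up for illustration and are not read from the ATLAS production plot; the function name is hypothetical. The sketch assumes database load scales with the number of jobs per day.

```python
def idle_fraction(daily_jobs):
    """Average unused fraction of a static capacity sized for the peak day."""
    peak = max(daily_jobs)
    mean = sum(daily_jobs) / len(daily_jobs)
    return 1.0 - mean / peak


# Made-up daily job counts, NOT values from the ATLAS production plot.
ILLUSTRATIVE_JOBS_PER_DAY = [2000, 4000, 12000, 3000]

# A static deployment sized for the 12000-job day sits idle
# 56.25% of the time on average for this made-up series.
print(idle_fraction(ILLUSTRATIVE_JOBS_PER_DAY))
```

On-demand deployment aims to recover that idle share by starting database services only where and when the jobs arrive.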
Edge Services
• Services executing on the edge of the public and
private network
[Diagram: edge services for CMS, ATLAS, CDF and a Guest VO, shown with
the CE and SE at the boundary of the Site's compute nodes and storage
nodes]
• See CHEP06 contribution id # 214
http://indico.cern.ch/contributionDisplay.py?contribId=214&sessionId=7&confId=048
Synergistic Collaboration
CMS & ATLAS collaborate in OSG ESF Activity
http://www.opensciencegrid.org/esf
To achieve the ESF proof-of-concept milestone:
• The first ESF VM was deployed by CMS
• The first ESF service on that VM was by ATLAS:
– Grid-enabled MySQL database built by the DASH project
• To access the server, the grid job used a proxy certificate
(instead of the clear-text passwords hardwired in the scripts that are distributed world-wide)
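A sketch of what the job-side configuration could look like when a proxy certificate replaces the password. The slides do not show the client settings, so this is an assumption throughout: the dictionary keys mirror the standard MySQL client options `--ssl-cert`, `--ssl-key` and `--ssl-capath`, the proxy location follows the Globus convention (`X509_USER_PROXY`, else `/tmp/x509up_u<uid>`), and the function names are hypothetical.

```python
import os


def default_proxy_path(environ=None, uid=None):
    """Locate the grid proxy file using the usual Globus convention."""
    environ = os.environ if environ is None else environ
    path = environ.get("X509_USER_PROXY")
    if path:
        return path
    uid = os.getuid() if uid is None else uid
    return f"/tmp/x509up_u{uid}"


def mysql_ssl_options(proxy_path, ca_dir="/etc/grid-security/certificates"):
    """Build TLS options for a database client; note there is no password.

    A proxy file holds both the certificate and its private key,
    so the same path is used for both options.
    """
    return {
        "ssl_cert": proxy_path,
        "ssl_key": proxy_path,
        "ssl_capath": ca_dir,
    }
```

The proxy is short-lived and created per user, so nothing secret needs to be hardwired into scripts that are distributed world-wide.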
Collaboration Benefits
Celebrating ESF proof-of-concept milestone at Supercomputing 2005
Globus Folder at SC05
http://www.globus.org/alliance/events/sc05
Complementary Project
• A new collaborative project with the
Globus team has just started at Argonne
– to grid-enable the PostgreSQL database
• Both DASH and the new project target
technology integration with OGSA-DAI
• Please contact us if you are interested in
contributing to these projects
OGSA-DAI Complementarity
Why you might NOT want to use OGSA-DAI
You want very fast data access
– OGSA-DAI is slower than direct connection methods e.g., JDBC
– But remember OGSA-DAI provides functionality “over and
above” these methods
• e.g. data delivery and transformation
You need scalability
– Depends on your intended usage of e.g., delivery mechanisms,
number of clients etc.
You don’t care about interoperability with other
Grid software or are only using one type of data resource
– OGSA-DAI may be overkill
(Source: Neil P Chue Hong, OGSA-DAI Status Summary,
Third OGSA-DAI Users Group Meeting, 6/1/2005)
• Through our continued interactions with the OGSA-DAI team we
have established working relationships to achieve technological
compatibility
Additional Benefits
• Direct access to database servers unleashes a broad
range of vendor-specific server capabilities for data
processing applications: distributed XA transactions,
binary data transport, etc.
• Grid proxy certificate technology opens technical
opportunities to enable fine-grained delegation of rights
for access control (attribute certificates)
• Grid-enabled relational database server technology has
the potential for application beyond the domain of high
energy physics, and is of interest to bioinformatics and
other data-intensive sciences
DASH Outreach
DASH Presentations at the Conferences and Workshops
Supercomputing 2005, November 12-18, 2005
Washington State Convention and Trade Center, Seattle, Washington, USA
http://osg-docdb.opensciencegrid.org/cgi-bin/ShowDocument?docid=307
First DIALOGUE Workshop: Applications-Driven Issues in Data Grids
August 1-2, 2005, The Ohio State University, Columbus, Ohio, USA
http://www.datagrids.org/ws/docs/High-performanceDatabaseAccess.ppt