What is a Grid? - Community Grids Lab

Grids for Chemical Informatics
Randall Bramley, Geoffrey Fox, Dennis Gannon, Beth Plale
Computer Science, Informatics, Physics
Pervasive Technology Laboratories
Indiana University Bloomington IN 47401
What is a Grid?

Name borrowed from the power grid.
• The concept:
 A ubiquitous information & computation resource

A definition
• a network of compute and data resources that has been
supplemented with a layer of services that provide uniform
and secure access to a set of applications of interest to a
distributed community of users.

Grids may be wide-area or enterprise
Scientific Challenges

The current and future generations of scientific problems are:
• Data Oriented
 Increasingly stream based.
 Often need petabyte archives
• In need of on-demand computing resources
• Conducted by geographically distributed teams of specialists
 Who don’t want to become experts in grid computing.
[Figure labels: Storms Forming; Streaming Observations; Data Mining; Forecast Model; On-Demand Storm predictions]

Genetics and Disease Susceptibility

[Figure labels: Phenotype 1, Phenotype 2, Phenotype 3, Phenotype 4; Ethnicity, Environment, Age, Gender; Identify Genes; Pharmacokinetics, Metabolism, Endocrine, Physiology, Proteome, Transcriptome, Immune, Morphometrics; Biomarker Signatures; Predictive Disease Susceptibility]
Source: Terry Magnuson, UNC

Science Communities and Outreach

• Communities
  • CERN’s Large Hadron Collider experiments
  • Physicists working in HEP and similarly data intensive scientific disciplines
  • National collaborators and those across the digital divide in disadvantaged countries
• Scope
  • Interoperation between LHC Data Grid Hierarchy and ETF
  • Create and Deploy Scientific Data and Services Grid Portals
  • Bring the Power of ETF to bear on LHC Physics Analysis: Help discover the Higgs Boson!
• Partners
  • Caltech
  • University of Florida
  • Open Science Grid and Grid3
  • Fermilab
  • DOE PPDG
  • CERN
  • NSF GriPhyN and iVDGL
  • EU LCG and EGEE
  • Brazil (UERJ, …)
  • Pakistan (NUST, …)
  • Korea (KAIST, …)
[Figure: LHC Data Distribution Model]
Information/Knowledge Grids

 Distributed data sources, 10’s to 1000’s of them (instruments, file systems, curated databases …)
 Data Deluge: 1 petabyte/year (now) to 100’s of petabytes/year (2012)
• Moore’s law for Sensors
 Possible filters assigned dynamically (on-demand)
• Run image processing algorithm on telescope image
• Run Gene sequencing algorithm on compiled data
 Needs decision support front end with “what-if” simulations
 Metadata (provenance) critical to annotate data
 Integrate across experiments as in multi-wavelength astronomy
Data Deluge comes from pixels/year available
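
The idea of filters assigned on demand, with provenance metadata attached to the data, can be illustrated with a minimal sketch. Everything named here is hypothetical: the `FILTERS` registry, the `threshold` and `count` filters (stand-ins for real image-processing or sequencing services), and the record layout are assumptions made for illustration, not part of any Grid toolkit.

```python
from typing import Callable, Dict, Iterable, Iterator, List

# A filter maps one data record to a new record.
Filter = Callable[[dict], dict]

# Hypothetical registry of available filters, looked up by name at request time.
FILTERS: Dict[str, Filter] = {}


def register(name: str) -> Callable[[Filter], Filter]:
    """Decorator that adds a filter to the registry under the given name."""
    def wrap(fn: Filter) -> Filter:
        FILTERS[name] = fn
        return fn
    return wrap


@register("threshold")
def threshold(record: dict) -> dict:
    """Toy stand-in for an image-processing step: keep pixel values >= 10."""
    return {**record, "pixels": [p for p in record["pixels"] if p >= 10]}


@register("count")
def count(record: dict) -> dict:
    """Toy analysis step: record how many pixels remain."""
    return {**record, "n_pixels": len(record["pixels"])}


def apply_filters(stream: Iterable[dict], names: List[str]) -> Iterator[dict]:
    """Resolve the requested filters by name and apply them to each record."""
    chain = [FILTERS[n] for n in names]  # raises KeyError for an unknown name
    for record in stream:
        for name, fn in zip(names, chain):
            record = fn(record)
            # Provenance metadata: the ordered list of filters applied so far.
            record = {**record, "provenance": record.get("provenance", []) + [name]}
        yield record


# Two made-up records standing in for streamed instrument data.
stream = [
    {"source": "telescope-1", "pixels": [3, 12, 40]},
    {"source": "telescope-2", "pixels": [9, 10]},
]
results = list(apply_filters(stream, ["threshold", "count"]))
```

Because the filter chain is chosen per request by name, a different caller could ask for `["count"]` alone over the same stream without any change to the data sources.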
Internet Scale Distributed Services

 Grids use Internet technology to manage sets of network-connected resources
• Classic Web: independent one-to-one access to individual resources
• Grids integrate together and manage multiple Internet-connected resources: people, sensors, computers, data systems
 Grids are built on top of commodity web service technology with broad industry support
 Organization can be explicit as in
• TeraGrid, which federates many supercomputers;
• CrisisGrid, which federates first responders, commanders, sensors, GIS, (Tsunami) simulations, science/public data
 Organization can be implicit, such as curated databases and simulation resources that “harmonize a community”
The Architecture of Gateway Grids

[Diagram: layered gateway architecture]
The User’s Desktop
Grid Portal Server
Gateway Services: Proxy Certificate Server / vault; Application Workflow; Application Deployment; Application Events; Resource Broker; App. Resource catalogs; User Metadata Catalog; Replica Mgmt
Core Grid Services: Security Services; Information Services; Self Management; Resource Management; Execution Management; Data Services
OGSA-like Layer
Physical Resource Layer
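
To make the role of a Resource Broker working against an application resource catalog more concrete, here is a minimal sketch. The `Resource` fields, the selection rule (most free CPUs among resources where the application is deployed), and the catalog entries are all assumptions invented for illustration; the slides do not specify how the broker decides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resource:
    """One hypothetical catalog entry describing a compute resource."""
    name: str
    free_cpus: int
    has_app: bool  # True if the requested application is deployed there


def broker(catalog: List[Resource], cpus_needed: int) -> Optional[Resource]:
    """Return the eligible resource with the most free CPUs, or None.

    A resource is eligible if the application is deployed on it and it
    has at least `cpus_needed` free CPUs (an assumed policy).
    """
    eligible = [r for r in catalog if r.has_app and r.free_cpus >= cpus_needed]
    return max(eligible, key=lambda r: r.free_cpus, default=None)


# Made-up catalog contents.
catalog = [
    Resource("cluster-a", free_cpus=16, has_app=True),
    Resource("cluster-b", free_cpus=64, has_app=False),
    Resource("cluster-c", free_cpus=32, has_app=True),
]
choice = broker(catalog, cpus_needed=24)
no_choice = broker(catalog, cpus_needed=128)
```

In this toy catalog, `cluster-b` has the most free CPUs but is skipped because the application is not deployed there, which is the kind of decision the catalog lookup exists to support.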
Let’s look at a few real examples
(about a dozen … many more exist!)
BIRN – Biomedical Information
Mesoscale Meteorology
NSF LEAD project - making the tools that are needed to make accurate predictions of tornados and hurricanes.
- Data exploration and Grid workflow
Workflow in the LEAD Grid
[Figure: Katrina output]
Renci Bio Gateway
Providing access to biotechnology tools running on a back-end Grid.
- leverage state-wide investment in bioinformatics
- undergraduate & graduate education, faculty research
- another portal soon: national evolutionary synthesis center
X-Ray Crystallography
SERVOGrid
SERVOGrid Requirements

 Seamless Access to Data repositories and large scale computers
 Integration of multiple data sources including sensors, databases, file systems with analysis system
• Including filtered OGSA-DAI (Grid database access)
 Rich meta-data generation and access with SERVOGrid specific Schema extending openGIS (Geography as a Web service) standards and using Semantic Grid
 Portals with component model for user interfaces and web control of all capabilities
 Collaboration to support world-wide work
 Basic Grid tools: workflow and notification
 NOT metacomputing
[Diagram labels: Repositories, Federated Databases; Sensors, Streaming Data, Field Trip Data; Database Grid, Sensor Grid, Compute Grid, GIS Grid; Discovery Services, Data Filter Services, Customization Services; Research Simulations; Analysis and Visualization Portal; SERVOGrid (Research, Education); From Research to Education; Education Grid Computer Farm]
Grid of Grids: Research Grid and Education Grid
Google Maps can be integrated with Web Feature Service Archives to filter and browse seismic records.
Integrating Archived Web Feature Services and Google Maps
MyGrid - Bioinformatics
The Williams Workflows
A: Identification of overlapping sequence
B: Characterisation of nucleotide sequence
C: Characterisation of protein sequence
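
A workflow made of stages A, B and C can be sketched as a small dependency graph executed in topological order. The slide does not state how the stages depend on each other, so the chain A → B → C below is an assumption, and the stage bodies (a toy sequence, a GC-content calculation, a codon count) are placeholders rather than the actual myGrid services.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Any, Callable, Dict, List, Set, Tuple

# ASSUMED dependencies: each stage consumes the previous stage's output.
DEPENDS_ON: Dict[str, Set[str]] = {"A": set(), "B": {"A"}, "C": {"B"}}


def identify_overlap(inputs: Dict[str, Any]) -> str:
    """Stage A placeholder: pretend we found this overlapping sequence."""
    return "ATGGCCTAA"


def characterise_nucleotide(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Stage B placeholder: basic statistics on the nucleotide sequence."""
    seq = inputs["A"]
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return {"sequence": seq, "length": len(seq), "gc_fraction": gc}


def characterise_protein(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Stage C placeholder: number of complete codons in the sequence."""
    return {"codons": inputs["B"]["length"] // 3}


STEPS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "A": identify_overlap,
    "B": characterise_nucleotide,
    "C": characterise_protein,
}


def run_workflow(
    steps: Dict[str, Callable[[Dict[str, Any]], Any]],
    depends_on: Dict[str, Set[str]],
) -> Tuple[List[str], Dict[str, Any]]:
    """Run each step after its dependencies, passing their results as inputs."""
    results: Dict[str, Any] = {}
    order = list(TopologicalSorter(depends_on).static_order())
    for name in order:
        inputs = {dep: results[dep] for dep in depends_on[name]}
        results[name] = steps[name](inputs)
    return order, results


order, results = run_workflow(STEPS, DEPENDS_ON)
```

Keeping the dependency table separate from the step functions means a different assumed structure (for example, B and C both depending only on A) needs a change to `DEPENDS_ON` and the inputs each step reads, not to the runner.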
[Diagram: layered Grid architecture]
Layers: Domain Specific Grids/Services; Application Services; Core Low Level Grid Services; Physical Network
Domain Specific Grids: BioInformatics Grid (Sequencing Tools, Biocomplexity Simulations, BIS, … Services); Chemical Informatics Grid (HTS Tools, Quantum Calculations, CIS, … Services)
Component labels: Information/Knowledge, Collaboration, Portals, Compute/Supercomputer, MIS, Instrument/Sensor, Policy, Data Access/Storage, Discovery, Security, Messaging, Workflow, Metadata, Management
M(B,C)IS is Molecular (Bio, Chem) Information System supporting specific metadata (CML, CellML, SBML) and physical representations
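
As an illustration of the kind of metadata a Molecular Information System would handle, the sketch below reads a small hand-written fragment in the style of CML (Chemical Markup Language) using only the Python standard library. The water molecule, the `summarize` helper, and the shape of the returned summary are assumptions made for this example; a real service would need to handle far more of the CML schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Any, Dict

# Namespace used by CML documents.
CML_NS = {"cml": "http://www.xml-cml.org/schema"}

# Hand-written example fragment: water as atoms plus two bonds.
WATER_CML = """<molecule xmlns="http://www.xml-cml.org/schema" id="water">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a1 a3" order="1"/>
  </bondArray>
</molecule>"""


def summarize(cml_text: str) -> Dict[str, Any]:
    """Return the molecule id, element counts, and bond count of a CML molecule."""
    root = ET.fromstring(cml_text)
    atoms = root.findall("cml:atomArray/cml:atom", CML_NS)
    bonds = root.findall("cml:bondArray/cml:bond", CML_NS)
    elements = Counter(atom.get("elementType") for atom in atoms)
    return {"id": root.get("id"), "elements": dict(elements), "bonds": len(bonds)}


summary = summarize(WATER_CML)
```

A summary like this is the sort of record that could be indexed for discovery, while the full CML document remains the physical representation.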
Comments on Grid Components

 Support GT4 and WS-I+(+); Support Java and .NET
 Portals – all services will have a portlet interface
 Compute Grid -- This is some sort of Condor Grid (as used by Cambridge)
 Supercomputer Grid -- (extended) TeraGrid
 Workflow, Metadata, Information Management – learn from Taverna, link with BPEL style workflow, link with other Semantic Grid/metadata services
 Instruments – learn from CIMA/Reciprocal Net, compare with Sensors in LEAD/SERVOGrid
 MIS/CIS – See if idea sensible – in any case need CML, LSID, Molecular visualization
 Application Services – Need a wizard. Support “filters” (Wild) and loosely coupled simulations (Baik)
 Data – Link to PubChem and Bioinformatics – link to Baik database
 Discovery – Extended UDDI
 Security – review any special requirements and status of PubChem, caBIG, myGrid, etc.
 Collaboration, Management, Messaging, Policy -- nothing special needed