Grid - Faculty of Computer Science & Engineering


Grid computing and e-Science
Grid computing
Lecturer: Phạm Trần Vũ, PhD
Presenters: Phan Quang Thiện, Trần Phước Hiệp, Nguyễn Minh Nhật
e-Science
Outline






What’s e-Science
New modes of scientific inquiry
Fault diagnosis and prognostic system
Grid service for diagnostic problem
Distributed Aircraft Maintenance Environment (DAME) project
Conclusion





“e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.”
John Taylor, Director General of Research Councils, Office of Science and Technology
The purpose of the UK e-Science initiative is to allow scientists to do ‘faster, better or different’ research.

“At the heart of the cyberinfrastructure vision is the development of a cultural community that supports peer-to-peer collaboration and new modes of education based upon broad and open access to leadership computing; data and information resources; online instruments and observatories; and visualization and collaboration services.”
Dr. Arden L. Bement, Jr., Director of the National Science Foundation

This infrastructure includes not only computers but also data storage resources and specialized facilities.

The long-term goal is to develop the middleware services that allow scientists to routinely build the infrastructure for their ‘Virtual Organisations’.



Data-intensive science
Simulation-Based Science
Remote Access to Experimental Apparatus


Worldwide, scientists and engineers are producing, accessing, analyzing, integrating and storing terabytes of digital data daily through experimentation, observation and simulation.
This vast amount of data needs to be preprocessed and distributed for further analysis.
Each of the four LHC experiments will generate several petabytes of experimental data per year.
Annual data storage: 12-14 petabytes/year
[Figure: a CD stack holding 1 year of LHC data (~20 km high, where 50 CD-ROMs = 35 GB = 6 cm), compared with a balloon (30 km), Concorde (15 km), and Mt. Blanc (4.8 km)]
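The ~20 km estimate can be checked from the slide's own numbers. The following is a minimal sketch, assuming decimal units (1 PB = 1,000,000 GB) and that the 35 GB per 6 cm ratio holds for the whole stack; the function name is only illustrative.

```python
# Back-of-the-envelope check of the CD-stack comparison.
# Assumption: decimal units, 1 PB = 1_000_000 GB.
GB_PER_PB = 1_000_000
GB_PER_50_CDS = 35      # from the slide: 50 CD-ROMs = 35 GB
CM_PER_50_CDS = 6       # from the slide: 50 CD-ROMs stack to 6 cm
CM_PER_KM = 100_000


def cd_stack_height_km(petabytes: float) -> float:
    """Height in km of a CD stack holding the given number of petabytes."""
    bundles = petabytes * GB_PER_PB / GB_PER_50_CDS   # number of 50-CD bundles
    return bundles * CM_PER_50_CDS / CM_PER_KM


for pb in (12, 14):
    print(f"{pb} PB/year -> about {cd_stack_height_km(pb):.1f} km of CDs")
```

Under these assumptions, 12 PB gives roughly 20.6 km and 14 PB gives 24 km, which is consistent with the figure's "~20 km".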



In 2003, the Japanese Earth Simulator was running numerical simulations of Earth’s climate at a sustained rate of 40 teraflops.
The U.S. Encyclopedia of Life (EOL) project: http://www.eol.org/
The UK Comb-e-Chem project: the goal of this project is to “synthesize” large numbers of new compounds by high-throughput combinatorial methods and then map their structure and properties.
Structure + Properties
Knowledge + Prediction


The advance of technology is also producing revolutionary new experimental apparatus.
Remote access allows remote participants to design, execute, and monitor experiments.
Sharing engineering research equipment, data resources, and leading-edge computing resources.
Remote access to perform teleobservation and teleoperation of experiments.



The convergence of information, grid, and networking technologies with contemporary communications now enables science and engineering communities to pursue their research and learning goals in real time and without regard to geography.
The size and/or complexity of the problem requires that people in several organizations collaborate and share computing resources, data, and instruments.
Virtual organization (VO): a set of individuals and/or institutions defined by such sharing rules.
In other words, VOs are dynamic federations of heterogeneous organizational entities sharing data, metadata, processing and security infrastructure.
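One way to picture a VO as "members plus sharing rules" is the small data structure below. It is only a sketch: the class names, fields, and institution names are hypothetical and do not correspond to any Grid middleware API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SharingRule:
    resource: str        # e.g. "storage", "compute", "instrument"
    provider: str        # institution that owns the resource
    allowed: frozenset   # institutions permitted to use it


@dataclass
class VirtualOrganization:
    name: str
    members: set = field(default_factory=set)
    rules: list = field(default_factory=list)

    def join(self, institution: str) -> None:
        self.members.add(institution)

    def leave(self, institution: str) -> None:
        # Membership is dynamic: leaving revokes access immediately.
        self.members.discard(institution)

    def can_use(self, institution: str, resource: str, provider: str) -> bool:
        """True if a current member is covered by a matching sharing rule."""
        if institution not in self.members:
            return False
        return any(
            r.resource == resource
            and r.provider == provider
            and institution in r.allowed
            for r in self.rules
        )


# Hypothetical institutions and resources.
vo = VirtualOrganization("climate-vo")
vo.join("uni_a")
vo.join("lab_b")
vo.rules.append(SharingRule("storage", "lab_b", frozenset({"uni_a"})))

print(vo.can_use("uni_a", "storage", "lab_b"))
vo.leave("uni_a")
print(vo.can_use("uni_a", "storage", "lab_b"))
```

The second check fails because access depends on current membership as well as on the rule, which mirrors the "dynamic federation" wording above.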
If you need huge computing power and/or data storage
If you do not have a supercomputer in your institution
If you have access to a “reasonable” network connection
then Grid (distributed computing) could be a good solution
[Figure labels: Scientist; Experiment; Analysis; Computing; Storage; HPC]
[Figure labels: Scientist; MIDDLEWARE; Experiment; Analysis; Storage; Computing]
[Figure labels: Scientists; Infrastructure; “use Web 2.0 here”; Grid]
The social process of science
[Figure labels: scientists; Graduate Students; Undergraduate Students; Digital Libraries; Virtual Learning Environment; experimentation; Reprints; Peer-Reviewed Journal & Conference Papers; Technical Reports; Preprints & Metadata; Repositories; Local Web; Certified Experimental Results & Analyses; Data, Metadata, Provenance, Workflows]
An e-Science Grid Framework
[Figure: four-layer e-Science Grid framework]
Layer 1: Existing Server/Network based Infrastructure
Layer 2: Core Grid Engine
Layer 3: Application Toolkit
Layer 4: Portal & Application
Other labels in the figure: Super Computer; PC Cluster; Fast-Ethernet based PC Cluster; Data Storage; Special Instrument; Globus; Security services; Authentication & authorization; Resource Monitoring; Resource Management; Brokering; Co-scheduling; Uniform Resource Access; Uniform Data Access; Grid Information Service; Security; Collaborative Data Management; Visualization & Collaboration; Scientific Informatics; Data-Intensive; Compute-Intensive; Iterative Solver; Mathematical & Theoretical Simulations; Simulations Of Materials; Parallel Molecular Modeling; Short-Area

Capture individual data transformation and analysis steps

Large monolithic applications broken down to smaller jobs

Smaller jobs can be independent or connected by some
control flow/ data flow dependencies

Usually expressed as a Directed Acyclic Graph of tasks

Allows the scientists to modularize their application

Scaled up execution over several computational resources
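The decomposition described above can be sketched as a small task graph. This is a minimal illustration using Python's standard `graphlib` module (Python 3.9 or later); the task names are hypothetical and not taken from the slides.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "fetch_raw_data": set(),
    "calibrate": {"fetch_raw_data"},
    "filter_noise": {"fetch_raw_data"},
    "analyze": {"calibrate", "filter_noise"},
    "publish": {"analyze"},
}


def execution_order(graph):
    """Return one valid order of tasks, or raise CycleError if the
    graph is not acyclic."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(workflow)
print(order)

# A graph with a cycle is not a DAG and is rejected.
try:
    execution_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    print("cycle detected: not a valid workflow")
```

`calibrate` and `filter_noise` do not depend on each other, so they are the kind of independent jobs that could run on different computational resources at the same time.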



Workflows orchestrate processes on the Grid.
Workflows are a processing model that incorporates tasks, data, and rules.
Workflow management systems execute tasks on the Grid using data once each task’s dependencies are satisfied, based on rules.
[Figure: workflow graph of five tasks, Task 1 to Task 5]
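The rule "start a task once its dependencies are satisfied" can be sketched with a thread pool standing in for Grid resources. The edges between the five tasks are assumptions made for illustration, since the figure's actual edges are not recoverable, and `run_workflow` is a hypothetical helper rather than part of any workflow management system.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Assumed dependencies for five tasks (task -> tasks it depends on).
dependencies = {1: set(), 2: {1}, 3: {1}, 4: {2, 3}, 5: {4}}


def run_workflow(deps, task_fn, max_workers=4):
    """Run task_fn(task) for every task, starting each task only after
    all of its dependencies have finished. Returns the completion order."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    finished = []
    running = {}  # future -> task
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies are now satisfied.
            for task in sorter.get_ready():
                running[pool.submit(task_fn, task)] = task
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                future.result()           # re-raise any task failure
                task = running.pop(future)
                finished.append(task)
                sorter.done(task)         # may unlock dependent tasks
    return finished


completed = run_workflow(dependencies, lambda t: f"task {t} ok")
print(completed)
```

With these assumed edges, Task 1 always finishes first, Tasks 2 and 3 may finish in either order, and Tasks 4 and 5 follow.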




A decision system that develops strategies for reliable and efficient execution in a variety of environments.
Reliable and scalable execution of dependent tasks
Reliable, scalable execution of independent tasks (locally, across the network), priorities, scheduling
Cyberinfrastructure: local machine, cluster, PBS (Condor) pool, Grid
Execution Environment
Globus and Condor services for job scheduling
Globus services for data transfer and cataloging
Information services:
- information about data location
- information about the execution sites
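As a rough sketch of how a scheduler might combine the two kinds of information listed above, the following picks an execution site that already holds the input data. The catalog contents, site names, and the `choose_site` helper are hypothetical; this does not use or represent the Globus or Condor APIs.

```python
# Hypothetical information-service contents (not real Globus data structures).
data_locations = {            # logical file name -> sites holding a replica
    "run42.dat": {"site_a", "site_c"},
    "calib.db": {"site_b"},
}
execution_sites = {           # site -> number of free CPU slots
    "site_a": 0,
    "site_b": 16,
    "site_c": 8,
}


def choose_site(filename):
    """Prefer a site with free slots that already stores the file;
    otherwise fall back to the site with the most free slots
    (the file would then have to be transferred there)."""
    holders = data_locations.get(filename, set())
    local = [s for s in holders if execution_sites.get(s, 0) > 0]
    candidates = local or [s for s, free in execution_sites.items() if free > 0]
    if not candidates:
        raise RuntimeError("no execution site has free slots")
    return max(candidates, key=lambda s: execution_sites[s])


print(choose_site("run42.dat"))    # site_c: holds the data and has free slots
print(choose_site("unknown.dat"))  # site_b: no replica known, most free slots
```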
The Grid Problem
1. Everyday researchers doing everyday research, BUT heroic Grid infrastructure is not being adopted
2. A data-centric perspective, like researchers, BUT Grid gives APIs to computation, not data
3. Collaborative and participatory, BUT Grid has a deeply rooted service-provider mindset
6. Better not perfect, BUT Grid aims to provide a well-engineered perfect solution
7. Giving autonomy to researchers, BUT Grid imposes institutional control (at this time)
8. About pervasive computing, BUT Grid is about portals, not the next generation of users
Summary





e-Science is about doing new science
Grid is just one part of the solution
Users are not just consumers of infrastructure. Empower them.
Think Web 2.0 on top of Grid and other services
Workflows make e-Science easier, and Web 2.0 makes workflows easier.
Diagnosis and prognostic system


Computer-based fault diagnosis and prognosis (DP)
DP problems arise in many domains: medicine, engineering, transport, and aerospace
Operational Scenario
[Figure labels: Engine flight data; London Airport; New York Airport; Airline office; Grid; Diagnostics Centre; Maintenance Centre; American data center; European data center]
Diagnosis and prognostic (DP) System





Data-centric
Requires complex interactions among agents
Distributed
Needs to provide supporting and qualifying evidence for the DP offered
Safety-critical and business-critical, with high dependability requirements
Data Centricity





Integrating data from several different systems for root-cause determination
Requires vast data repositories
The types of data can also be highly diverse
Not only sensor data but also non-declarative knowledge
The interpretation of the knowledge can vary among the entities
Data Centricity
Grid computing:
- Knowledge and semantics (chapter 23)
- Solutions for the management and archiving of large data repositories
- Remote collection and distribution of data
- Coherent integration of information from diverse databases (chapter 22)
Multiple stakeholders
Involves a number of stakeholders:
- The system owner
- Experts
- The commercial service provider
- …
Grid computing:
- Interaction of diverse parts is inherent within the Grid computing model
Distribution



Data storage, data mining, and fault diagnosis may take place at different locations
Across diverse IT systems
The system can also be highly dynamic, involving a number of disparate entities (virtual, changing often)
Distribution
Grid computing:
- The standardization of communication and application protocols in the Grid paradigm
Grid portal: supports effective interactions with users
Data Provenance


Transparency and trust in results
Steps taken to arrive at a decision
Grid computing:
- Develop open data communication protocols
- Meta-labeling schemes
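One way to picture meta-labeling for provenance is a trail in which every processing step is labeled with digests of its inputs and output and linked to the previous step. This is only a sketch under that assumption; the record fields and step names are hypothetical and are not a scheme defined by the slides or by any Grid standard.

```python
import hashlib
import json


def digest(obj):
    """Stable SHA-256 digest of a JSON-serializable object."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


def record_step(trail, step, inputs, output):
    """Append a provenance record that is chained to the previous one."""
    entry = {
        "step": step,
        "inputs": digest(inputs),
        "output": digest(output),
        "previous": trail[-1]["id"] if trail else None,
    }
    entry["id"] = digest(entry)   # computed before "id" is added to the dict
    trail.append(entry)
    return entry


def verify(trail):
    """Check that no record was altered, removed, or reordered."""
    previous = None
    for entry in trail:
        body = {k: v for k, v in entry.items() if k != "id"}
        if entry["previous"] != previous or entry["id"] != digest(body):
            return False
        previous = entry["id"]
    return True


# Hypothetical two-step decision trail.
trail = []
record_step(trail, "extract_features", {"sensor": "vibration"}, [0.1, 0.4])
record_step(trail, "diagnose", [0.1, 0.4], {"fault": "none"})
print(verify(trail))

trail[0]["step"] = "tampered"
print(verify(trail))
```

The first check succeeds and the second fails, because changing any recorded step invalidates its identifier. A trail like this lets a reviewer retrace the steps that led to a decision.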
Dependability



Guaranteed service availability
Data security
System security
Dependability
Grid computing:
- Offers a security model to secure distributed computing (chapter 21)
- Addresses data access and data confidentiality
- The concept of guaranteed service and quality-of-service (chapter 18)
The aero-engine DP problem




Modern aero-engines must operate with extremely high reliability
They combine advanced mechanical engineering systems with electronic control systems
Monitoring using engine sensors
Prognostic applications
DAME project
[Figure labels: Engine flight data; London Airport; New York Airport; Airline office; Grid; Diagnostics Centre; Maintenance Centre; American data center; European data center]
DAME project
Principal challenges:
- Vast data repositories
- Advanced pattern-matching and data-mining methods with suitable response times
- Collaboration among a number of diverse actors
DAME service
[Figure labels: DAME Diagnostics Portal; Grid Services Management; Modelling/Simulation; Case Based Reasoning; Decision Support; ...; QUOTE; Novel Data; AURA-G; Data-Mining; Parts Data; Operational Data; Service Data; Raw Engine Data; The Grid; Vibration; Shaft Speed; Fuel Flow]
Core services and tools





Engine data service
Data storage and mining service
Engine modeling service
Case-based reasoning support
Maintenance interface service
Engine data service



Controls interaction with the QUOTE system and its communication with the ground station
Establishes the link to the Grid data repositories
Many replicas of this service exist; they are highly transient
Data storage and mining service

Consists of the AURA pattern-matching engine system
Uses specialized methods to rapidly search both raw and archived engine data
Resembles a data-mining service
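To illustrate what searching engine data for a pattern involves, here is a naive sliding-window search. It is a stand-in for explanation only: it is not the AURA method, and the readings and signature are made up.

```python
def best_match(signal, pattern):
    """Return (offset, distance) of the window in `signal` closest to
    `pattern`, using the sum of squared differences."""
    if not pattern or len(pattern) > len(signal):
        raise ValueError("pattern must be non-empty and no longer than signal")
    best_offset, best_distance = 0, float("inf")
    for offset in range(len(signal) - len(pattern) + 1):
        window = signal[offset:offset + len(pattern)]
        distance = sum((w - p) ** 2 for w, p in zip(window, pattern))
        if distance < best_distance:
            best_offset, best_distance = offset, distance
    return best_offset, best_distance


# Made-up vibration readings and a made-up fault signature.
vibration = [0.1, 0.1, 0.2, 0.9, 1.4, 0.8, 0.2, 0.1]
signature = [1.0, 1.5, 1.0]
offset, distance = best_match(vibration, signature)
print(offset, round(distance, 2))
```

The closest window here starts at offset 3. This brute-force scan costs time proportional to the signal length times the pattern length, which suggests why a service that must search vast archives with suitable response times needs specialized methods.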
Engine modeling service

Infers the current state of the engine
Performs model-based data fusion
Case-based reasoning support




Uses case-based reasoning to improve the knowledge base
Captures fault DP methods in a procedural way
Manages workflows associated with DP operations
Builds and maintains the DAME knowledge base
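The retrieve-and-retain cycle at the core of case-based reasoning can be sketched as a nearest-case lookup. The feature names echo signals mentioned elsewhere in the slides, but the values, diagnoses, and helper functions are made up; this is not the DAME implementation.

```python
import math

# Hypothetical case base: past fault cases described by feature values.
case_base = [
    {"features": {"vibration": 0.9, "fuel_flow": 0.3}, "diagnosis": "bearing wear"},
    {"features": {"vibration": 0.2, "fuel_flow": 0.9}, "diagnosis": "fuel nozzle fault"},
]


def distance(a, b):
    """Euclidean distance over the union of feature names (missing = 0)."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))


def retrieve(query):
    """Return the stored case most similar to the new problem."""
    return min(case_base, key=lambda c: distance(c["features"], query))


def retain(features, diagnosis):
    """Add a confirmed case so the knowledge base improves over time."""
    case_base.append({"features": dict(features), "diagnosis": diagnosis})


new_problem = {"vibration": 0.8, "fuel_flow": 0.4}
print(retrieve(new_problem)["diagnosis"])   # bearing wear
retain(new_problem, "bearing wear")
```

Retaining each confirmed case is what allows the knowledge base to grow with use, as the slide describes.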
Maintenance interface service


Organizes all interaction with stakeholders involved in taking remedial actions
Captures information that helps validate or refine the output from the preceding DP processes
Conclusion




An ambitious vision for the future of science and engineering
The realization of this vision will require long-term investments of financial resources
We should not underestimate the difficulty of the technical challenges to be overcome before the vision is realized
The realization of these goals is extremely important for the future of science and engineering
Q&A
Thank you!
References
I. Foster and C. Kesselman, The Grid 2: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, 2004.
Cyberinfrastructure Vision for 21st Century Discovery (NSF)
National e-Science Centre: http://www.nesc.ac.uk/action/esi/
DAME homepage: http://www.cs.york.ac.uk/dame/