lecture10_eScience - Homepages | The University of Aberdeen
CS5038 The Electronic Society
Lecture 10: e-Science
Lecture Outline
• Background: “Big Science”
• Grid Computing
• Standards for Grid Computing
• e-Science: what is it?
• e-Science Examples:
  - Social Simulations – modelling land-use change
  - Particle Physics (LHC)
  - Astronomy (Virtual Observatory)
  - Environmental Sciences – Climate Change
  - Engineering – Aircraft Maintenance
  - Economics – Predicting Markets
  - Bio-informatics – Simulated Biology
  - Healthcare – Cancer Diagnosis
e-Science - Background
“Big Science”
During the early part of the 20th century, science became crucial in warfare
World War II: scientists developed new weapons and tools
proximity fuse, radar, atomic bomb, cryptography
Led to a new form of research facility: the government-sponsored laboratory
thousands of technicians and scientists, managed by universities
Enabled hitherto impossible scientific projects
heavy investment by government and industrial interests:
blurred line between public and private research
Criticisms:
Undermines basic principles of the scientific method: results are difficult to verify.
Access to facilities is limited to established researchers -> elitism.
Increased government funding often implies a military agenda.
Subverts the Enlightenment-era ideal of science as a quest for knowledge.
Increased administrative overhead – e.g. filling out grant requests
Connections between academic, governmental, and industrial interests
Concern about scientists’ objectivity (e.g. pharmaceutical industry)
The Internet was born from “Big Science”
August 1991, CERN (Switzerland): new World Wide Web project
Grid Computing
Grid computing evolved from the computational needs of “Big Science”
“Grid computing uses the resources of many separate computers connected by
a network (usually the internet) to solve large-scale computation problems.”
A conceptual framework rather than a physical resource:
flexible computational provisioning beyond the local administrative domain.
Involves sharing computing power:
heterogeneous resources (based on different platforms, hardware/software
architectures, and computer languages),
located in different places
belonging to different administrative domains
using open standards.
Requires security: remote users must be authenticated and authorised before they can control computing resources.
Special Purpose Grid – Example: SETI@home project
General Purpose Grid - Example: Parabon Computation (Commercial)
In terms of function, there are three types of grid:
Computational grids: computationally intensive operations.
Data grids: sharing and management of large amounts of distributed data.
Equipment grids: control equipment remotely and analyse the data it produces.
e.g. controlling a telescope
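The core idea of a computational grid, one large problem solved by many separate computers, can be sketched in a few lines. This is a minimal sketch, assuming a job that splits into independent work units (as in SETI@home); the function names are hypothetical, and a local thread pool stands in for remote grid nodes, so it shows the split/compute/combine pattern rather than real grid middleware.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_work_units(data, unit_size):
    """Split one large job into independent chunks ("work units")."""
    return [data[i:i + unit_size] for i in range(0, len(data), unit_size)]


def analyse(unit):
    """Stand-in for the real per-unit computation (here: a sum of squares)."""
    return sum(x * x for x in unit)


def run_on_grid(data, unit_size=1000, nodes=4):
    """Farm work units out to 'nodes' and combine their partial results."""
    units = split_into_work_units(data, unit_size)
    # Each worker thread plays the role of one remote grid node (an assumption
    # for illustration; a real grid ships units over the network via middleware).
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        partial_results = list(pool.map(analyse, units))
    # A central server merges the partial results into the final answer.
    return sum(partial_results)


print(run_on_grid(list(range(10_000))))  # same answer as the serial computation
```

Because each unit is independent, any node can process any unit, which is what lets heterogeneous machines in different administrative domains cooperate on the same problem.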
Grid Standards - Globus
The Globus Alliance is an association, mainly of universities
(e.g. Chicago, Edinburgh, Southern California)
Developing fundamental technologies needed to build grid computing
infrastructures
Most grids in Europe and North America use the Globus Toolkit as their
core middleware.
Globus software provides (e.g.):
Resource management: Grid Resource Allocation & Management Protocol (GRAM)
Information Services: Monitoring and Discovery Service (MDS)
Security Services: Grid Security Infrastructure (GSI)
Data Movement and Management: Global Access to Secondary Storage (GASS) and GridFTP
XML-based web services allow access to services/applications
grid computing and web services converge: Grid Service
Open Grid Services Architecture (OGSA): vision is to describe and build a
well-defined suite of standard interfaces and behaviours that serve as a
common framework for all Grid-enabled systems and applications.
e-Science
What is e-Science? Science enabled by electronic infrastructure
Computationally intensive
Uses highly distributed network environments
Requires access to immense data sets
May require Grid Computing
Delivers high-performance visualisation back to individual user scientists
Examples:
Social Simulations – modelling land-use change
Particle Physics (LHC), Astronomy (Virtual Observatory)
Environmental Sciences – Climate Change
Engineering - Aircraft Maintenance
Economics – Predicting Markets
Bio-informatics – Simulated Biology
Healthcare - Cancer Diagnosis
Middleware: Data communication, data integration
Organisations:
Requires large and complex infrastructure
Research Labs, Large Universities, Governments (e.g. UK)
e-Science Examples: Particle Physics
Large Hadron Collider (LHC) at CERN
Currently the most developed e-Science infrastructure
LHC due to start generating data in 2007, 2008 or 2009 (start date still uncertain)
Massive amount of data generated
Estimated at 10 petabytes each year (peta = 10^15)
Thousands of researchers across the world will be
involved in the LHC experiments and in analysing
results.
GridPP
UK’s contribution to analysing this data deluge.
Six-year, £33m project
Collaboration of around 100 researchers in 19 UK
University particle physics groups, CCLRC and CERN.
More than 100,000 PCs, spread across one hundred
institutions around the world.
Three main areas of work:
• Applications to allow physicists to submit data to
the Grid for analysis
• Middleware to manage the distribution of
computing jobs around the grid and deal with
security
• Deploying computing infrastructure at sites
across the UK, to build a prototype Grid.
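To put the estimate of 10 petabytes per year in perspective, a back-of-the-envelope calculation gives the average data rate. It assumes data are produced uniformly over a whole year; real accelerator running is not continuous, so rates during data taking would be higher.

```latex
\frac{10 \times 10^{15}\ \text{bytes}}{365 \times 24 \times 3600\ \text{s}}
  = \frac{10^{16}\ \text{bytes}}{3.1536 \times 10^{7}\ \text{s}}
  \approx 3.17 \times 10^{8}\ \text{bytes/s} \approx 317\ \text{MB/s}
```

Sustaining hundreds of megabytes per second, every second of the year, is far beyond what a single site can store and analyse, which is why the data are distributed across the grid.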
e-Science Examples: Astronomy
Astrogrid
£10M project to build a data-grid for UK astronomy
Forms the UK’s contribution to a global
Virtual Observatory (VO)
Three main strands to the Virtual Observatory:
1. International standards for astronomical data,
metadata, and software interoperability
2. New software infrastructure using emerging
technology: web services and the Grid.
3. Science user tools that exploit the new infrastructure
and bring the VO to the astronomer’s desktop.
Goals of Astrogrid (mainly strand 2):
Datagrid for key UK databases
Data-mining facilities for interrogating those databases, e.g. searching for ‘cloaked’ objects
A uniform archive query and data-mining interface
A facility for users to upload code to run their own
algorithms on the data-mining machines
An exploration of techniques for open-ended resource discovery
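One goal above is a uniform archive query interface. The following is a minimal sketch of the idea, assuming two hypothetical archives that store positions differently (one gives right ascension in degrees, the other in hours). Small adapter functions convert each archive into a common record, so a single cone search (all sources within a given angle of a sky position) runs over both. The archive contents and function names are made up for illustration and are not Astrogrid's actual interfaces.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Common record format exposed by the uniform query interface."""
    name: str
    ra_deg: float   # right ascension, degrees
    dec_deg: float  # declination, degrees


# Hypothetical archive A: tuples with RA already in degrees.
ARCHIVE_A = [("A-1", 10.68, 41.27), ("A-2", 83.82, -5.39)]
# Hypothetical archive B: dicts with RA in hours.
ARCHIVE_B = [{"id": "B-1", "ra_h": 0.7123, "dec": 41.30},
             {"id": "B-2", "ra_h": 12.0, "dec": 0.0}]


def from_archive_a(rows):
    return [Source(name, ra, dec) for name, ra, dec in rows]


def from_archive_b(rows):
    # 1 hour of right ascension = 15 degrees
    return [Source(r["id"], r["ra_h"] * 15.0, r["dec"]) for r in rows]


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle distance between two sky positions (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cone_search(ra, dec, radius_deg, catalogues):
    """One query, many archives: sources within radius_deg of (ra, dec)."""
    return [s for cat in catalogues for s in cat
            if angular_separation_deg(ra, dec, s.ra_deg, s.dec_deg) <= radius_deg]


hits = cone_search(10.68, 41.27, 0.1,
                   [from_archive_a(ARCHIVE_A), from_archive_b(ARCHIVE_B)])
print([s.name for s in hits])
```

Adding a further archive only needs a new adapter function; the query code itself is unchanged.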
e-Science Examples: Climate Change
Climateprediction.net
To address the enormous variation in current climate
predictions
Existing climate models have to include the effects of small-scale physical processes (such as clouds) through
simplifications (parameterisations)
Results can be out by an order of magnitude
Experimental Objective: Ensemble Forecasting
Run thousands of climate models with slightly different
physics in order to represent the whole range of
uncertainties in all the parameterisations.
(parameters are varied within their current range of
uncertainty)
The project has already recruited 37,000 users
Project Goal:
to make the first fully probability-based
fifty-year forecast of human-induced
climate change using a full-scale 3-D
atmosphere-ocean climate simulation
model.
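Ensemble forecasting can be illustrated with a deliberately tiny model. This is a minimal sketch, assuming a hypothetical zero-dimensional energy-balance "model" in which equilibrium warming is ΔT = F/λ, with the forcing F fixed and the uncertain feedback parameter λ drawn from an illustrative range; these numbers are not Climateprediction.net's. Each ensemble member is one run with a different parameter value, and the ensemble is summarised as a range of outcomes rather than a single forecast.

```python
import random
import statistics

FORCING = 3.7  # W/m^2, roughly the forcing from doubling CO2 (illustrative)


def toy_climate_model(feedback):
    """Hypothetical zero-dimensional model: equilibrium warming dT = F / lambda (K)."""
    return FORCING / feedback


def run_ensemble(n_members, feedback_range=(0.8, 2.0), seed=42):
    """Run the same model many times, each member with its parameter drawn
    from within its uncertainty range, and summarise the spread of results."""
    rng = random.Random(seed)  # fixed seed so the ensemble is reproducible
    low, high = feedback_range
    results = [toy_climate_model(rng.uniform(low, high)) for _ in range(n_members)]
    cuts = statistics.quantiles(results, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return {"p5": cuts[0], "median": statistics.median(results), "p95": cuts[-1]}


print(run_ensemble(10_000))
```

A symmetric spread in λ produces an asymmetric spread in warming, with a longer tail towards high values, because ΔT depends on 1/λ. Effects like this only show up when parameters are varied across their whole range of uncertainty.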
e-Science Example: Aircraft Maintenance
DAME project
£3.2 million, 3 years, commenced Jan 2002.
4 Universities:
York, Sheffield, Oxford, Leeds
Industrial Partners:
Rolls-Royce, Data Systems, Cybula Ltd
Aim: aerospace diagnostics
Remote, secure access to flight data and other operational data and
resources
Rapid data mining and analysis of fault data
Distributed search on massive data collections using scalable, neural-network-type
methods for comparing data with archived fleet engine data.
Each flight could produce up to 1 GB of vibration data
The DAME workbench (portal)
Analysis tools for the engine diagnosis process
Central control point for automated workflows
Manages distributed diagnosis team and virtual organisations
Manages issues of security and user roles.
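The slide does not specify which neural-network-type method DAME uses to compare new data with archived fleet data. As a stand-in, the sketch below uses plain nearest-neighbour matching, a simpler technique swapped in for illustration: a new vibration signature is compared with every archived signature and takes the diagnosis of the closest one, or is flagged as novel if nothing is close enough. The four-number signatures, labels and threshold are all hypothetical.

```python
import math

# Hypothetical archive of fleet engine vibration signatures (e.g. energy in
# four frequency bands), each labelled with the diagnosis recorded at the time.
FLEET_ARCHIVE = [
    ([0.90, 0.10, 0.05, 0.02], "normal"),
    ([0.85, 0.15, 0.04, 0.03], "normal"),
    ([0.40, 0.20, 0.70, 0.10], "bearing wear"),
    ([0.30, 0.80, 0.20, 0.60], "blade damage"),
]


def diagnose(signature, archive=FLEET_ARCHIVE, novelty_threshold=0.3):
    """Label a new signature with the diagnosis of its nearest archived case
    (Euclidean distance), or flag it as novel if nothing is close enough."""
    reference, label = min(archive, key=lambda case: math.dist(signature, case[0]))
    if math.dist(signature, reference) > novelty_threshold:
        return "novel pattern: refer to diagnostics centre"
    return label


print(diagnose([0.88, 0.12, 0.05, 0.02]))
print(diagnose([0.42, 0.22, 0.68, 0.12]))
print(diagnose([0.0, 0.0, 0.0, 1.0]))
```

In the real system the archive is massive and distributed, so the search itself is spread across grid nodes; the linear scan here only shows the comparison step.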
e-Science Example: Aircraft Maintenance
[Figure: DAME deployment diagram linking engine flight data at London and New York airports, an airline office, a diagnostics centre and a maintenance centre through the Grid, with European and American data centers.]
e-Science Example: Predicting Markets
The INWA Grid project (Innovation Node: Western Australia):
Investigating the suitability of existing Grid technologies for secure, commercial
data mining.
The three-continent Grid:
Edinburgh Parallel Computing Centre (EPCC)
Curtin University in Western Australia (WA)
Chinese Academy of Sciences in Beijing.
Data mining to predict customer trends, develop new products and better meet
customer needs.
Samples drawn from a region + publicly available data
-> build a clearer picture of regional behaviour within the economy
But: need a distributed-aggregated approach to preserve anonymity
Resources
UK mortgage data + UK property data
Australian telco data + Australian property data
Compute power at EPCC + Curtin
Scenario
A bank wants to predict whether homeowners are likely to move house within 5 years of
taking out a mortgage to buy the house
The bank wants to use its own data and publicly available data to help improve the
prediction
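The "distributed-aggregated approach to preserve anonymity" can be sketched as follows. This assumes each data owner runs a local step that reduces its customer records to per-region counts, and only those counts travel across the grid to be merged centrally. The record layout, region names and data are hypothetical; real deployments would add further safeguards (for example, suppressing very small groups) that are omitted here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One customer record; it never leaves the site that holds it."""
    region: str
    moved_within_5_years: bool


def local_aggregate(records):
    """Runs at each data site: reduce raw records to per-region (moved, total) counts."""
    counts = {}
    for r in records:
        moved, total = counts.get(r.region, (0, 0))
        counts[r.region] = (moved + int(r.moved_within_5_years), total + 1)
    return counts


def combine(site_summaries):
    """Runs centrally: merge counts from all sites into per-region move rates."""
    merged = {}
    for summary in site_summaries:
        for region, (moved, total) in summary.items():
            m, t = merged.get(region, (0, 0))
            merged[region] = (m + moved, t + total)
    return {region: moved / total for region, (moved, total) in merged.items()}


# Hypothetical data held by two separate banks.
bank_a = [Record("Aberdeen", True), Record("Aberdeen", False), Record("Leeds", True)]
bank_b = [Record("Leeds", False), Record("Leeds", True), Record("Perth", False)]

rates = combine([local_aggregate(bank_a), local_aggregate(bank_b)])
print(rates)
```

The central site sees only counts per region, never individual customers, yet the combined rates are the same as if all records had been pooled.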
e-Science Example: Simulated Biology
BioSimGrid project
Aim: to make the results of large-scale
computer simulations of biomolecules more
accessible to the biological community.
Simulations of the motions of proteins are a
key component in understanding how the
structure of a protein is related to its dynamic
function.
Data distributed between the University of California, San Diego,
and Oxford.
Simulations were run using different programs and protocols
Data in very different formats.
• Software tools for interrogation and data-mining
• Generic analysis tools (Python), visualisation (VMD)
• Annotation of simulation data
• Readily modifiable simple example scripts
• Underlying data storage structure hidden
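The bullets above describe generic Python analysis tools that work on simulation data held in different formats, with the storage structure hidden from the user. A minimal sketch of that design, assuming two hypothetical site formats (plain text at one site, JSON at the other): each site gets a reader that converts to one common in-memory trajectory, and an analysis written once runs on either. The example analysis is the root-mean-square deviation (RMSD) of each frame from the first, a standard trajectory measure; the formats, data and function names are made up, and real tools would normally superimpose structures before computing RMSD.

```python
import json
import math
from typing import List, Tuple

Frame = List[Tuple[float, float, float]]  # one (x, y, z) position per atom

# Hypothetical raw data. Site 1: one frame per line, coordinates flattened.
SITE1_TEXT = "0 0 0 1 0 0\n0 0 0 1 1 0\n"
# Site 2: JSON with nested [frame][atom][xyz] lists.
SITE2_JSON = '{"frames": [[[0, 0, 0], [1, 0, 0]], [[0, 0, 1], [1, 0, 1]]]}'


def read_site1(text: str) -> List[Frame]:
    frames = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split()]
        frames.append([tuple(values[i:i + 3]) for i in range(0, len(values), 3)])
    return frames


def read_site2(text: str) -> List[Frame]:
    return [[tuple(float(c) for c in atom) for atom in frame]
            for frame in json.loads(text)["frames"]]


READERS = {"site1": read_site1, "site2": read_site2}


def load_trajectory(site: str, raw: str) -> List[Frame]:
    """Single entry point: callers never see how a site stores its data."""
    return READERS[site](raw)


def rmsd(a: Frame, b: Frame) -> float:
    """Root-mean-square deviation between two frames (no superposition)."""
    squared = sum((p - q) ** 2 for atom_a, atom_b in zip(a, b)
                  for p, q in zip(atom_a, atom_b))
    return math.sqrt(squared / len(a))


def rmsd_from_first(frames: List[Frame]) -> List[float]:
    """Generic analysis: how far each frame has moved from the starting structure."""
    return [rmsd(frames[0], f) for f in frames]


print(rmsd_from_first(load_trajectory("site1", SITE1_TEXT)))
print(rmsd_from_first(load_trajectory("site2", SITE2_JSON)))
```

Supporting a new simulation program's output only means registering another reader; every analysis script keeps working unchanged.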
e-Science Examples: Cancer Diagnosis
Telemedicine on the Grid
Multi-site videoconferencing
Real-time delivery of microscope imagery
Communication and archiving of radiological
images
Supports multi-disciplinary meetings for the
review of cancer diagnoses and treatment.
Remote access to computational medical
simulations of tumours and other cancer-related
problems
Data-mining of patient record databases
Improved clinical decision making.
Currently, clinicians travel long distances
Grid technology can provide access to
appropriate clinical information and images
across the network.