lecture10_eScience - Homepages | The University of Aberdeen

CS5038 The Electronic Society
Lecture 10: e-Science
Lecture Outline
• Background: “Big Science”
• Grid Computing
• Standards for Grid Computing
• e-Science – what is it
• e-Science Examples:
 Social Simulations – modelling land-use change
 Particle Physics (LHC)
 Astronomy (VirtualObservatory)
 Environmental Sciences – Climate Change
 Engineering - Aircraft Maintenance
 Economics – Predicting Markets
 Bio-informatics – Simulated Biology
 Healthcare - Cancer Diagnosis
e-Science - Background
“Big Science”
 During the early part of the 20th century, science became crucial in warfare
 World War II : Scientists developed new weapons and tools
 proximity fuse, radar, atomic bomb, cryptography
 Led to a new form of research facility: the government-sponsored laboratory
 thousands of technicians and scientists, managed by universities
 Enabled hitherto impossible scientific projects
 heavy investment by government and industrial interests:
blurred line between public and private research
Criticisms:
 Undermines basic principles of scientific method: Results difficult to verify.
 Access to facilities limited to those who are accomplished -> elitism.
 Increased government funding often implies military agenda
 Subverts the Enlightenment-era ideal of science as quest for knowledge.
 Increased administrative overhead – e.g. filling out grant requests
 Connections between academic, governmental, and industrial interests
 Concern about Scientists’ objectivity (e.g. pharmaceutical industry)
The World Wide Web was born from "Big Science"
 August 1991, CERN (Switzerland): the World Wide Web project was publicly announced
Grid Computing
Grid computing evolved from the computational needs of “Big Science”
“Grid computing uses the resources of many separate computers connected by
a network (usually the internet) to solve large-scale computation problems.”
A conceptual framework rather than a physical resource:
 flexible computational provisioning beyond the local administrative domain.
 Involves sharing computing power:
 heterogeneous resources (based on different platforms, hardware/software
architectures, and computer languages),
 located in different places
 belonging to different administrative domains
 using open standards.
 Requires security : to allow remote users to control computing resources.
Special Purpose Grid – Example: SETI@home project
General Purpose Grid - Example: Parabon Computation (Commercial)
In terms of function: Three types of grid:
 Computational Grids : computationally-intensive operations.
 Data grids: sharing and management of large amounts of distributed data.
 Equipment Grids: control equipment remotely and analyse data produced.
e.g. controlling a telescope
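As an illustration of the computational-grid idea above, the following minimal sketch splits one large computation into independent work units and farms them out to a pool of workers. The local thread pool merely stands in for remote machines, and the names `make_work_units`, `process_unit` and `run_on_grid` are hypothetical; they do not correspond to SETI@home, Parabon or any grid middleware.

```python
from concurrent.futures import ThreadPoolExecutor


def make_work_units(n_items, unit_size):
    """Split the range 0..n_items-1 into independent (start, stop) work units."""
    return [(start, min(start + unit_size, n_items))
            for start in range(0, n_items, unit_size)]


def process_unit(unit):
    """Work done by one node: here, a sum of squares over its slice."""
    start, stop = unit
    return sum(i * i for i in range(start, stop))


def run_on_grid(n_items, unit_size, n_workers=4):
    """Dispatch work units to workers and combine the partial results.

    Assumption: a local thread pool stands in for remote grid nodes.
    """
    units = make_work_units(n_items, unit_size)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_results = list(pool.map(process_unit, units))
    return sum(partial_results)


if __name__ == "__main__":
    print(run_on_grid(n_items=1_000_000, unit_size=50_000))
```

The key property is that each work unit can be processed without knowledge of the others, so it does not matter which machine handles it or in what order the results return.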
Grid Standards - Globus
Globus Alliance is an association – mainly Universities
(e.g. Chicago, Edinburgh, Southern California)
 Developing fundamental technologies needed to build grid computing
infrastructures
 Most grids in Europe and North America use the Globus Toolkit as their
core middleware.
 Globus software provides (e.g.):
 Resource management: Grid Resource Allocation & Management Protocol
(GRAM)
 Information Services: Monitoring and Discovery Service (MDS)
 Security Services: Grid Security Infrastructure (GSI)
 Data Movement and Management: Global Access to Secondary Storage
(GASS) and GridFTP
 XML-based web services allow access to services/applications
 grid computing and web services converge: Grid Service
 Open Grid Services Architecture (OGSA): vision is to describe and build a
well-defined suite of standard interfaces and behaviours that serve as a
common framework for all Grid-enabled systems and applications.
e-Science
What is e-Science? - science enabled by electronic infrastructure
 Computationally intensive
 Uses highly distributed network environments
 Requires access to immense data sets
 May require Grid Computing
 High performance visualisation back to the individual user scientists
Examples:
 Social Simulations – modelling land-use change
 Particle Physics (LHC), Astronomy (VirtualObservatory)
 Environmental Sciences – Climate Change
 Engineering - Aircraft Maintenance
 Economics – Predicting Markets
 Bio-informatics – Simulated Biology
 Healthcare - Cancer Diagnosis
 Middleware: Data communication, data integration
Organisations:
 Requires large and complex infrastructure
 Research Labs, Large Universities, Governments (e.g. UK)
e-Science Examples: Particle Physics
 Large Hadron Collider (LHC) at CERN
 Currently the most developed e-Science infrastructure
 LHC due to start generating data in 2007/8/9??
 Massive amount of data generated
Estimated at 10 petabytes each year (peta = 10^15)
 Thousands of researchers across the world will be
involved in the LHC experiments and in analysing
results.
 GridPP
 UK’s contribution to analysing this data deluge.
 Six-year, £33m project
 Collaboration of around 100 researchers in 19 UK
University particle physics groups, CCLRC and CERN.
 More than 100,000 PCs, spread over one hundred
institutions across the world.
 Three main areas of work:
• Applications to allow physicists to submit data to
the Grid for analysis
• Middleware to manage the distribution of
computing jobs around the grid and deal with
security
• Deploying computing infrastructure at sites
across the UK, to build a prototype Grid.
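To give a sense of scale for the estimate of 10 petabytes per year quoted above, this short calculation converts it into an average sustained data rate. It assumes decimal units (1 petabyte = 10^15 bytes) and a 365-day year with the data spread evenly, which is a simplification.

```python
PETABYTE = 10**15                      # bytes, decimal definition (peta = 10^15)
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year


def average_rate_bytes_per_second(petabytes_per_year):
    """Average rate if the yearly volume were produced evenly over the year."""
    return petabytes_per_year * PETABYTE / SECONDS_PER_YEAR


if __name__ == "__main__":
    rate = average_rate_bytes_per_second(10)
    print(f"{rate / 10**6:.0f} MB/s on average")
```

Under these assumptions the average comes to roughly 317 MB per second, every second of the year, which helps explain why the analysis has to be spread over many sites.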
e-Science Examples: Astronomy
Astrogrid
 £10M project to build a data-grid for UK astronomy
 Forms the UK’s contribution to a global
VirtualObservatory
 Three main strands to VirtualObservatory
1. International standards for astronomical data,
metadata, and software interoperability
2. New software infrastructure using emerging
technology: web services and the Grid.
3. Science user tools to exploit the new infrastructure,
bringing the VO to the astronomer’s desktop.
Goals of Astrogrid (mainly strand 2):
 Datagrid for key UK databases
 Datamining facilities for interrogating those databases e.g. search for ‘cloaked’ objects
 A uniform archive query and data-mining interface
 A facility for users to upload code to run their own
algorithms on the datamining machines
 An exploration of techniques for open-ended resource
discovery
e-Science Examples: Climate Change
Climateprediction.net
 To address the enormous variation in current climate
predictions
 Existing climate models have to include the effects of
small-scale physical processes (such as clouds) through
simplifications (parameterisations)
 Results can be out by an order of magnitude
 Experimental Objective: Ensemble Forecasting
 Run thousands of climate models with slightly different
physics in order to represent the whole range of
uncertainties in all the parameterisations.
(parameters are varied within their current range of
uncertainty)
 The project has already recruited 37,000 users
Project Goal:
 to make the first fully probability-based
fifty-year forecast of human-induced
climate change using a full-scale 3-D
atmosphere-ocean climate simulation
model.
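The ensemble idea above can be sketched with a toy model. This is not the Climateprediction.net model: `toy_climate_model` is a hypothetical one-line stand-in (warming = forcing / feedback), and the parameter range, forcing value and function names are assumptions chosen only to show how varying a parameter within its uncertainty range turns a single forecast into a distribution.

```python
import random
import statistics


def toy_climate_model(feedback, forcing=3.7):
    """Toy stand-in for a climate model: warming = forcing / feedback.

    forcing is in W/m^2 and feedback in W/m^2 per kelvin; the default
    value is an illustrative assumption.
    """
    return forcing / feedback


def run_ensemble(n_members, feedback_range=(0.8, 1.8), seed=0):
    """Run the toy model once per ensemble member with a perturbed parameter."""
    rng = random.Random(seed)
    low, high = feedback_range
    return [toy_climate_model(rng.uniform(low, high)) for _ in range(n_members)]


def summarise(results):
    """Summarise the ensemble as 5th, 50th and 95th percentiles."""
    cuts = statistics.quantiles(results, n=20)  # 19 cut points at 5% steps
    return {"p5": cuts[0], "median": statistics.median(results), "p95": cuts[-1]}


if __name__ == "__main__":
    print(summarise(run_ensemble(n_members=5000)))
```

Each ensemble member is independent of the others, which is what makes this kind of experiment suitable for distribution across many volunteers' machines.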
e-Science Example: Aircraft Maintenance
DAME project
 £3.2 Million, 3 years, commenced Jan 2002.
4 Universities:
 York, Sheffield, Oxford, Leeds
Industrial Partners:
 Rolls-Royce, Data Systems, Cybula Ltd
Aim: aerospace diagnostics
 Remote, secure access to flight data and other operational data and
resources
 Rapid data mining and analysis of fault data
 Distributed search on massive data collections using scalable, neural
network type methods for comparing data with archived fleet engine data.
 Each flight could produce up to 1GB of vibration data
The DAME workbench (portal)
 Analysis tools for the engine diagnosis process
 Central control point for automated workflows
 Manages distributed diagnosis team and virtual organisations
 Manages issues of security and user roles.
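The distributed search described above can be pictured with a much simpler stand-in. The sketch below uses a plain Euclidean nearest-neighbour comparison rather than the neural network type methods the project mentions, and every name and data value in it is hypothetical. It only shows the general shape: each data centre searches its own archive, and the per-site results are merged into one ranked list.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vibration signatures."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search_archive(archive, query, top_k):
    """Search performed locally at one data centre: return its best matches."""
    scored = [(distance(signature, query), engine_id)
              for engine_id, signature in archive.items()]
    return sorted(scored)[:top_k]


def distributed_search(archives, query, top_k=2):
    """Query every archive, then merge per-site results into a global top-k."""
    merged = []
    for site, archive in archives.items():
        for dist, engine_id in search_archive(archive, query, top_k):
            merged.append((dist, site, engine_id))
    return sorted(merged)[:top_k]


if __name__ == "__main__":
    # Hypothetical archived signatures; values are made up for illustration.
    archives = {
        "europe": {"engine-A": [0.1, 0.2, 0.3], "engine-B": [0.9, 0.8, 0.7]},
        "america": {"engine-C": [0.1, 0.25, 0.3], "engine-D": [0.5, 0.5, 0.5]},
    }
    print(distributed_search(archives, query=[0.1, 0.2, 0.31]))
```

Only the short list of matches travels between sites, not the archives themselves, which matters when each flight can add up to 1GB of data.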
e-Science Example: Aircraft Maintenance
[Figure: diagram with the labels engine flight data, London Airport, New York Airport, airline office, Grid, Diagnostics Centre, Maintenance Centre, American data center, European data center]
e-Science Example: Predicting Markets
 The INWA Grid project (Innovation Node: Western Australia) :
 Investigating suitability of existing Grid technologies for secure, commercial
data mining.
 The three-continent Grid:
 Edinburgh Parallel Computing Centre (EPCC)
 Curtin University in Western Australia (WA)
 Chinese Academy of Sciences in Beijing.
 Data mining to predict customer trends, develop new products and better meet
customer needs.
 Samples drawn from a region + publicly available data
-> build a clearer picture of regional behaviour within the economy
 But: need a distributed-aggregated approach to preserve anonymity
Resources
 UK mortgage data + UK property data
 Australian telco data + Australian property data
 Compute power at EPCC + Curtin
Scenario
 A bank wants to predict if home owners are likely to move house within 5 years of
taking out a mortgage to buy the house
 Bank wants to use its own data and publicly available data to help improve the
prediction
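One way to picture the distributed-aggregated approach mentioned above is for each site to compute summary counts on its own records and share only those counts. The sketch below is an assumption about the general idea, not the INWA implementation; `local_summary`, `combine` and the record layout are hypothetical. Aggregation on its own does not guarantee anonymity, since very small groups can still identify individuals.

```python
def local_summary(records):
    """Run at each site on its private records; returns only aggregate counts.

    Each record is a (region, moved_within_5_years) pair.
    """
    summary = {}
    for region, moved in records:
        movers, total = summary.get(region, (0, 0))
        summary[region] = (movers + int(moved), total + 1)
    return summary


def combine(summaries):
    """Merge per-site aggregates into an overall move rate per region."""
    totals = {}
    for summary in summaries:
        for region, (movers, total) in summary.items():
            m, t = totals.get(region, (0, 0))
            totals[region] = (m + movers, t + total)
    return {region: m / t for region, (m, t) in totals.items()}


if __name__ == "__main__":
    # Made-up records held separately at two hypothetical sites.
    site_a = [("north", True), ("north", False)]
    site_b = [("north", True), ("south", False)]
    print(combine([local_summary(site_a), local_summary(site_b)]))
```

Individual records never leave the site that holds them; only the (movers, total) pairs per region are exchanged.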
e-Science Example: Simulated Biology
BioSimGrid project
 Aim: to make the results of large-scale
computer simulations of biomolecules more
accessible to the biological community.
 Simulations of the motions of proteins are a
key component in understanding how the
structure of a protein is related to its dynamic
function.
 Data distributed between University of California, San Diego
and Oxford.
 Simulations were run using different programs and protocols
 Data in very different formats.
• Software tools for interrogation and data-mining
• Generic analysis tools (Python), visualisation (VMD)
• Annotation of simulation data
• Readily modifiable simple example scripts
• Underlying data storage structure hidden
e-Science Examples: Cancer Diagnosis
Telemedicine on the Grid
 Multi-site videoconferencing
 Real-time delivery of microscope imagery
 Communication and archiving of radiological
images
 Supports multi-disciplinary meetings for the
review of cancer diagnoses and treatment.
 Remote access to computational medical
simulations of tumours and other cancer-related
problems
 Data-mining of patient record databases
 Improved clinical decision making.
 Currently clinicians travel large distances
 Grid technology can provide access to
appropriate clinical information and images
across the network.