Water: Data collection, management, and analysis in the cloud


Potential use of cloud computing for streamlining the processing of MT data
Prof J Craig Mudge FTSE
Collaborative Cloud Computing Lab (C3L)
New eScience Lab enabled by cloud computing
Seed funding from
-- minerals and geothermal research at www.pir.sa.gov.au
-- Microsoft Research USA Jim Gray Seed Grant
Prof Graham Heinson
Prof J Craig Mudge FTSE
Stephan Thiel
Pinaki Chan
Jared Peacock
Wei Wang
Andrew Wendelborn
Acknowledgements:
David Giles, Richard Lane, Tim Baker, Tristan Wurst
JCM 30 Sept 2010
Magnetotelluric (MT) imaging
1. Using the magnetic and electric fields of the earth, MT imaging determines the resistivity structure of a sub-surface area of interest.
2. It goes deeper (a hundred or so km) than seismic (<2 km) but does not have the same resolution.
3. Applications
   1. mineral exploration,
   2. water management in mining,
   3. geothermal exploration,
   4. carbon storage,
   5. aquifer research and management,
   6. earthquake and volcano studies.
(Heinson and Mudge, 2010)
Figure: CO2 in depleted gas field
27 sep 2010
Overview of cloud computing
Ahead for research in minerals and energy
1. Data deluge
   - Gene sequencers: 25 terabytes per day
   - Large Hadron Collider: 700 MB of data per second, 60 TB/day, 20 PB/year
   - Square Kilometre Array: petabytes per day
2. Computation, e.g., rapid inversion
3. Data and experiments: curation, provenance, sharing, reuse
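As a quick check on the Large Hadron Collider figures above, the per-second rate can be converted to daily and yearly volumes. This is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes) and a constant rate; the function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_volume_tb(rate_mb_per_s: float) -> float:
    """Convert a sustained data rate in MB/s to TB/day (decimal units)."""
    return rate_mb_per_s * 1e6 * SECONDS_PER_DAY / 1e12


def yearly_volume_pb(rate_mb_per_s: float, days_running: int = 365) -> float:
    """Convert a sustained data rate in MB/s to PB/year for a given number of running days."""
    return daily_volume_tb(rate_mb_per_s) * days_running / 1e3


if __name__ == "__main__":
    print(f"{daily_volume_tb(700):.2f} TB/day")
    print(f"{yearly_volume_pb(700):.2f} PB/year if running all 365 days")
```

At 700 MB/s the daily volume comes to about 60.5 TB, in line with the slide. Running every day of the year would give about 22 PB, so the quoted 20 PB/year would correspond to roughly 330 days of data taking.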
Approx 100,000 PCs in Google data centres
Google Goose-Creek
Google Dalles Oregon
From www.cloudinnovation.com.au
Essence of cloud
1. software as a service – applications are delivered
over the Internet with a common-or-garden browser
2. significant cost savings, factors of 5x – 7x
3. presented as a utility
with a matching business model, namely pay-per-use
4. a new data-parallel programming framework
Cost savings in warehouse-sized data centres
1. resources in massive warehouse-sized data centres are pooled at scale,
2. built from low-cost commodity chips and disks (the run-time environment of MapReduce or Dryad takes care of fault tolerance, scheduling, and load balancing),
3. share the overhead of cooling, refrigeration, physical security, and backup power.
Execution of MapReduce
The Map step is shown as M in the following slides.
(Dean and Ghemawat, 2004)
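To make the Map, sort-by-key, and Reduce stages concrete, here is a minimal single-machine sketch of the MapReduce pattern. It is not Google's implementation and leaves out distribution, fault tolerance, and scheduling; `map_reduce`, `map_reading`, and `mean_reduce` are hypothetical names, and the station readings are made-up values.

```python
from collections import defaultdict


def map_reduce(records, map_fn, reduce_fn):
    """Run map_fn on every record, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):  # Map: emit (key, value) pairs
            groups[key].append(value)
    # Sort by key, then Reduce: one call per key with all of its values
    return {key: reduce_fn(key, groups[key]) for key in sorted(groups)}


def map_reading(record):
    """Map step: emit the reading keyed by its station name."""
    station, value = record
    yield station, value


def mean_reduce(station, values):
    """Reduce step: average all readings for one station."""
    return sum(values) / len(values)


readings = [("station1", 2.0), ("station2", 4.0), ("station1", 4.0)]
print(map_reduce(readings, map_reading, mean_reduce))
```

In a real deployment the Map calls would run on many machines at once; the grouping step here stands in for the shuffle and sort that the run-time environment provides.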
Decomposition
• Task decomposition
– How can a problem be decomposed into tasks
that can execute concurrently?
• Data decomposition
– How can a problem's data be decomposed into
units that can be operated on relatively
independently?
• Then, dependencies among the tasks
– Group tasks, order tasks, and data sharing
Parallel execution of MT data: one Map (M) per station
Station 1 ... Station n -> M (one per station) -> Sort by key -> R
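The one-Map-per-station layout above can be sketched with Python's standard `concurrent.futures` module. This is an illustration under stated assumptions, not the lab's code: `map_station` is a hypothetical stand-in for the real per-station processing (here a crude median-based outlier rejection followed by a mean), and threads are used only to keep the example self-contained. CPU-heavy work would more likely use separate processes or separate cloud instances.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, median


def map_station(item):
    """Map step for one station: drop outliers, return (station, mean of kept samples)."""
    station, samples = item
    centre = median(samples)
    # Median absolute deviation; a hypothetical, simple robustness measure
    mad = median([abs(x - centre) for x in samples])
    kept = [x for x in samples if mad == 0 or abs(x - centre) <= 3 * mad]
    return station, mean(kept)


def reduce_stations(results):
    """Sort by key (station name) and collect the per-station results."""
    return dict(sorted(results))


def process_survey(stations, max_workers=4):
    """Run one Map per station concurrently, then reduce."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(map_station, stations.items()))
    return reduce_stations(results)


survey = {
    "station2": [2.0, 2.0, 2.0],
    "station1": [1.0, 1.2, 0.9, 50.0, 1.1],  # 50.0 is an obvious outlier
}
print(process_survey(survey))
```

Because each station's time series is independent, no coordination is needed between the Map calls; only the final sort and collection sees all stations.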
Parallel execution of gridded exploration data
by using sub-grids when the original is too big to do as one grid
Form sub-grids -> M (one per sub-grid) -> R -> Re-combine
Concrete example: the Map step is an existing MatLab program running on Amazon EC2
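The form-sub-grids, Map, and re-combine steps above can be sketched in a few lines. This is a minimal illustration with hypothetical function names; `scale_tile` merely stands in for the existing MatLab program, and the sketch assumes the operation on each sub-grid does not need data from neighbouring sub-grids (operations that do would need overlapping tiles).

```python
def split_grid(grid, tile_rows, tile_cols):
    """Form sub-grids: return {(row_offset, col_offset): tile}."""
    tiles = {}
    for r0 in range(0, len(grid), tile_rows):
        for c0 in range(0, len(grid[0]), tile_cols):
            tiles[(r0, c0)] = [row[c0:c0 + tile_cols] for row in grid[r0:r0 + tile_rows]]
    return tiles


def recombine(tiles, n_rows, n_cols):
    """Re-combine processed tiles into one grid of the original shape."""
    grid = [[None] * n_cols for _ in range(n_rows)]
    for (r0, c0), tile in tiles.items():
        for dr, row in enumerate(tile):
            grid[r0 + dr][c0:c0 + len(row)] = row
    return grid


def scale_tile(tile):
    """Hypothetical Map step: stands in for the per-sub-grid program."""
    return [[2 * value for value in row] for row in tile]


def process_in_tiles(grid, tile_rows, tile_cols, map_fn):
    """Split, apply map_fn to each sub-grid independently, and re-combine."""
    tiles = split_grid(grid, tile_rows, tile_cols)
    processed = {offset: map_fn(tile) for offset, tile in tiles.items()}
    return recombine(processed, len(grid), len(grid[0]))


grid = [[r * 10 + c for c in range(5)] for r in range(3)]
print(process_in_tiles(grid, 2, 2, scale_tile))
```

Each call to `map_fn` touches only its own tile, so the dictionary comprehension could be replaced by a parallel map over cloud instances without changing the result.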
Water: Data collection, management, and analysis in the cloud
Data collection, aggregation
- high volumes of complex heterogeneous data
Data integration / data fusion
Data use
Metadata and databases of interest
Gateway
Existing databases
Wireless ad-hoc networks
- mesh-networked motes with sensors (40 mm)
- sensors: 10 years on 2 AA batteries; potentially energy scavenging, too
Satellite
River data: from sensors (both mobile and moored)
Weather, aquifer, river, irrigation
Organisations (water, government, regulators, market operators, and researchers) will mine this data.
Remote sensing (satellite)
Data clean, data analysis, data repurpose, historical photos, visualisation, etc.
www.pacific-challenge.com
Academy Working Group
Cloud computing at peta-scale
1. Alex Zelinsky, CSIRO Group Executive, 17 May 2010: “The Academy project has been a real catalyst for getting the cloud computing agenda moving forward in Australia.”
2. Summer internships in cloud computing: $1,000 prize won by Jinhui Yao for his security project in an internship hosted by CSIRO
3. Report to be launched October 14 in Canberra
www.cloudinnovation.com.au
NBN: fiber/wireless net connecting mobile and fixed clients to a cloud computing infrastructure for applications & content
Diagram: Mobile Clients; Fixed Clients & Client Nets; Television Content <-> NBN <-> Cloud Computing: Services & Content
Computer person’s view of NBN: “Continuous Services i.e. apps & Client Connected Devices”
Diagram: Mobile Clients; Connected Devices; Fixed Clients & Client Nets; Television Content <-> NBN <-> Cloud Computing: Services & Content
Our cloud service providers
Google
Document
sharing
Calendar
Search
Amazon
Microsoft
Dropbox
Azure
Computation Computation Document
sharing
Storage
Code reuse
Search
Our first application domain,
magnetotellurics
MT station data from logging in the field
Stations 1, 2, ..., n are each processed through the same steps:
- Clean (time series: inspect with GMT plots)
- Broadband processing: E field conversion to standard units
- BIRRP. Outputs from BIRRP are
  (a) impedance Z, where E = ZB
  (b) coherence data
  (c) apparent resistivity and phase
- Convert to EDI (one conversion per station)
- Forward modelling and inversion
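The link between the impedance Z (where E = ZB) and the apparent resistivity and phase outputs can be illustrated with the standard textbook relation rho_a = (mu_0 / omega) |Z|^2. This is a sketch under stated assumptions: SI units throughout (E in V/m, B in tesla), a scalar impedance rather than the full tensor, and a time convention in which a uniform half-space gives a phase of +45 degrees. It is not BIRRP's own code, and the function names are hypothetical.

```python
import cmath
import math

MU_0 = 4e-7 * math.pi  # magnetic permeability of free space, H/m


def apparent_resistivity_and_phase(z_b: complex, period_s: float):
    """Given Z = E/B (SI units) at one period, return (ohm.m, phase in degrees)."""
    omega = 2 * math.pi / period_s
    rho_a = MU_0 / omega * abs(z_b) ** 2
    phase_deg = math.degrees(cmath.phase(z_b))
    return rho_a, phase_deg


def half_space_impedance(resistivity_ohm_m: float, period_s: float) -> complex:
    """Z = E/B over a uniform half-space, used here only as a self-check."""
    omega = 2 * math.pi / period_s
    return cmath.sqrt(1j * omega * MU_0 * resistivity_ohm_m) / MU_0


z = half_space_impedance(100.0, period_s=10.0)
print(apparent_resistivity_and_phase(z, period_s=10.0))
```

Over a uniform half-space the apparent resistivity equals the true resistivity at every period, which makes this a convenient sanity check before looking at field data.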
Forward model and inversion
1. Start
2. Compute MT response of new model
3. Compare model response and MT observed data
4. Required misfit? Y: finish. N: go to step 5
5. Exceeded max # of iterations? Y: finish. N: update model and return to step 2
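The loop in this flowchart can be written down generically. The sketch below assumes an RMS misfit and takes the forward model and the model update as caller-supplied functions; `invert`, `toy_forward`, and `toy_update` are hypothetical names, and the toy problem (fitting a single scale factor) only exercises the control flow. It is not an MT forward solver.

```python
import math


def rms_misfit(response, observed):
    """Root-mean-square difference between model response and observed data."""
    return math.sqrt(sum((r - o) ** 2 for r, o in zip(response, observed)) / len(observed))


def invert(observed, forward, update, model, target_misfit, max_iterations):
    """Iterate: compute response, compare, stop on misfit or iteration limit, else update."""
    iterations = 0
    while True:
        response = forward(model)                 # compute response of current model
        misfit = rms_misfit(response, observed)   # compare with observed data
        if misfit <= target_misfit:               # required misfit reached
            return model, misfit, iterations, True
        if iterations >= max_iterations:          # exceeded max number of iterations
            return model, misfit, iterations, False
        model = update(model, response, observed)
        iterations += 1


XS = [1.0, 2.0, 3.0]  # toy "sensitivities"; made-up values


def toy_forward(model):
    return [model * x for x in XS]


def toy_update(model, response, observed):
    """Damped least-squares step for the one-parameter toy problem."""
    step = sum(x * (o - r) for x, r, o in zip(XS, response, observed)) / sum(x * x for x in XS)
    return model + 0.5 * step


observed = toy_forward(4.0)
print(invert(observed, toy_forward, toy_update, 0.0, target_misfit=1e-3, max_iterations=50))
```

The two exit tests appear in the same order as in the flowchart, so a run that never reaches the required misfit still terminates once the iteration limit is hit.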
MT Processing
1. Time series data from stations
   - Remove outliers
   - To frequency domain
   - Apply BIRRP (Chave and Thomson 1989), a robust method
   - Produces resistivity and phase, by frequency
   Currently: ~24 hours
2. Inversion to produce subsurface image
   Currently: ~3 to 4 weeks for 3D (Siripunvaraporn et al. 2005)
Chave and Thomson. Bounded influence magnetotelluric response function estimation. Geophys. Jnl. Int. 1989
Siripunvaraporn, Egbert, Lenbury, and Uyeshima. Three dimensional magnetotelluric inversion: data-space method. Physics of the Earth and Planetary Interiors 150. 2005
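The "to frequency domain" step in the processing list above can be illustrated with a plain discrete Fourier transform. This is only a sketch of the idea: BIRRP's robust, bounded-influence estimation does far more than this, and a production code would use an FFT library rather than the direct sum shown here. The function names and the synthetic signal are assumptions for illustration.

```python
import cmath
import math


def dft(samples):
    """Direct discrete Fourier transform; O(n^2), fine for a short illustration."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
        for k in range(n)
    ]


def dominant_frequency(samples, sample_rate_hz):
    """Return the frequency (Hz) of the largest non-zero-frequency spectral peak."""
    spectrum = dft(samples)
    n = len(samples)
    k_peak = max(range(1, n // 2 + 1), key=lambda k: abs(spectrum[k]))
    return k_peak * sample_rate_hz / n


# Synthetic time series: a 2 Hz cosine sampled at 8 Hz for one second
signal = [math.cos(2 * math.pi * 2 * i / 8) for i in range(8)]
print(dominant_frequency(signal, sample_rate_hz=8.0))
```

Once the electric and magnetic channels are in the frequency domain, their ratio at each frequency gives the impedance from which resistivity and phase are derived.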
Reflections – September 2010
Value of cloud for PIRSA, our MT processing, and CRC DET
1. Access to cheap flexible computing
   1. Amazon runs Fortran, Matlab, Python, etc. E.g., T Dhu’s gridded execution
   2. On-demand purchase of a couple of hours of a more powerful computer (generally more powerful in memory: 8 Gbytes, for example); pricing is growing in sophistication (spot pricing, micro-instances, etc.)
2. Parallel execution
   1. Easy to get concurrent execution of steps, e.g., 45 stations
   2. Parallel within a step (Google’s MapReduce and Microsoft’s Dryad/LINQ) is hard work, but we have made a little progress
3. Our future work on integration in multi-layered databases has been strongly endorsed
Disappointments
   Honours student gave up on visualisation of sub-surface layers using Bing/Google Earth
eScience workflow was a major contribution (unexpected)
   1. Less human interaction, repeatable, provenance, sharing of workflows internationally
   2. Increasingly important, as volume of data grows
No machines Lab: “built first cloud-based server, which is the SVN server for C3 Lab in the Amazon EC2 cloud.”
Craig Mudge 29.9.2010
Scientific Workflow Systems
• Value proposition: More time on science, less time on code, admin
• How: By providing a language emphasizing sharing, reuse, reproducibility, rapid prototyping, efficiency
– Provenance
– Visual programming
– Integration with domain-specific tools
– Scheduling
– Data curation
2010: Honours project in Geophysics (Tristan Wurst): steps in MT processing
Bill Howe, UW, 3/12/09
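To show what provenance and repeatability mean in practice, here is a minimal sketch of a workflow runner that records, for every step, a fingerprint of its input and output. It does not represent any particular scientific workflow system; `run_workflow`, `fingerprint`, and the two example steps are hypothetical, and the data are assumed to be JSON-serialisable.

```python
import hashlib
import json


def fingerprint(value):
    """Short, stable hash of a JSON-serialisable value."""
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()[:12]


def run_workflow(steps, data):
    """Run named steps in order and return (result, provenance records)."""
    provenance = []
    for name, func in steps:
        result = func(data)
        provenance.append(
            {"step": name, "input": fingerprint(data), "output": fingerprint(result)}
        )
        data = result
    return data, provenance


def remove_outliers(samples):
    """Hypothetical cleaning step: drop values with magnitude of 100 or more."""
    return [x for x in samples if abs(x) < 100]


def demean(samples):
    """Hypothetical step: subtract the mean from every sample."""
    average = sum(samples) / len(samples)
    return [x - average for x in samples]


steps = [("remove_outliers", remove_outliers), ("demean", demean)]
result, provenance = run_workflow(steps, [1.0, 2.0, 3.0, 500.0])
print(result)
for record in provenance:
    print(record)
```

Re-running the same steps on the same input yields identical fingerprints, which is what allows a shared workflow to be checked for repeatability by someone else.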
Porosity Joint Inversion
Invert for a single parameter, porosity, to which both techniques are sensitive.
1. Start with a porosity model
2. Porosity-density relationship -> density model -> gravity model response
3. Archie’s Law -> conductivity model -> MT model response
4. Compare gravity model response with gravity observed data, and MT model response with MT observed data
5. Is (a) the required misfit obtained or (b) the max. number of iterations exceeded?
   No: update model and repeat from step 2
   Yes: Stop
(Rachel Maier, 2010)
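The two mappings in this flowchart, from one porosity model to a density model and to a conductivity model, can be sketched as follows. The slide does not say which porosity-density relationship or which Archie parameters were used, so everything here is an assumption: a linear volumetric mixing law for density, Archie's Law for a fully saturated rock, and illustrative default values (2650 kg/m^3 matrix, 1000 kg/m^3 pore fluid, cementation exponent 2, tortuosity factor 1).

```python
def bulk_density(porosity, matrix_density=2650.0, fluid_density=1000.0):
    """Assumed porosity-density relationship: volumetric mixing, in kg/m^3."""
    return (1.0 - porosity) * matrix_density + porosity * fluid_density


def archie_conductivity(porosity, water_conductivity,
                        cementation_exponent=2.0, tortuosity=1.0):
    """Archie's Law for a fully saturated rock; conductivity in S/m."""
    return water_conductivity * porosity ** cementation_exponent / tortuosity


def joint_model(porosity_model, water_conductivity):
    """Derive density and conductivity models from one shared porosity model."""
    density = [bulk_density(p) for p in porosity_model]
    conductivity = [archie_conductivity(p, water_conductivity) for p in porosity_model]
    return density, conductivity


porosity_model = [0.05, 0.2, 0.3]  # made-up values for three model cells
density, conductivity = joint_model(porosity_model, water_conductivity=5.0)
print(density)
print(conductivity)
```

Because both derived models come from the same porosity values, an update to porosity changes the gravity and MT responses together, which is what couples the two data sets in the joint inversion.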
Renmark Trough (figure)
Panels: MT Inversion; Joint Inversion; Seismic constrained Gravity
Section axes: distance (km), SW to NE; depth (km); resistivity (ohm.m); labels: Devonian, Basement, D=2360, D=2800
Data-fit plots: gz (mGals) against distance (km); App Res (ohm.m) and phase against period (s), showing TE data, TE model response, TM data, TM model response
RMS values shown: RMSMT=2.3, RMSJI=5.3, RMSMT=5.3, RMSGV=4.5
Data logging with near real-time feedback
Diagram labels: Data and geologist’s data integrations; Compute; Sub-surface
Future areas
1. Seismic
2. Inversion and forward modelling in general
3. Rapid inversion, too
4. Data integration or data fusion across multiple layers
5. Data mining
Vision:
A geologist steering a drill in real time, using real-time sensing of the sub-surface and updating geological models, while referring to her cloud-based data sets and collaborating with her team
Diagram labels:
- Data and geologist’s data integrations (Seismic, Satellite, MT, Petrophysical, Cores, Density, etc)
- Compute
- Back home; Geologist in field; Collaboration
- Drilling machine control system; steering
- Sensing: a dozen or more sensors (Seismic, XRF, Resistivity, etc)
- Sub-surface
www.cloudinnovation.com.au
0417 679 266
Searching the Deep Earth: sustaining your wealth for the next century
High Flyers Think Tank, Canberra, 19–20 Aug 2010
from draft report
... nationally coordinated program to deploy new geophysical tools (magnetotelluric, passive seismic) and methods (geochemical) integrated with a comprehensive drilling program. ...
next, using petascale computing, storage, and network resources these data will be integrated into multi-dimensional databases ...
The Power Wall
www.pacific-challenge.com