Databases and Global Environmental Change - DPI
Brazilian Academy of Sciences,
Annual Meeting, May 2012
Databases and Global
Environmental Change:
Information Technology for
Sustainable Development
Gilberto Câmara
INPE, Instituto Nacional de Pesquisas
Espaciais
The fundamental question of our time
source: IGBP
How is the Earth’s
environment changing, and
what are the consequences for
human civilization?
Global Change
Where are changes taking place?
How much change is happening?
Who is being impacted by the change?
Limits for Models
Figure: fields of science plotted by uncertainty on basic equations against complexity of the phenomenon. Fields shown: social and economic systems, quantum gravity, particle physics, living systems, chemical reactions, hydrological models, solar system dynamics, global change, meteorology.
source: John Barrow (after David Ruelle)
Limits for Models
Figure: the uncertainty-versus-complexity diagram again, now with "e-science" marked on it.
source: John Barrow (after David Ruelle)
Collaborative e-science
Connect expertise from different fields
Make the different conceptions explicit
Territory
(Geography)
Money
(Economy)
Modelling
(IT)
Culture
(Anthropology)
Deforestation in Amazonia
Map legend: up to 10%, 10-20%, 20-30%, 30-40%, 40-50%, 50-60%, 60-70%, 70-80%, 80-90%, 90-100%
Amazonia (4,000,000 km2 = size of Europe)
Data (we need a lot of it)
Deforestation in Brazilian Amazonia (1988-2011)
dropped from 27,000 km2 to 6,200 km2
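The size of this decline can be checked with simple arithmetic. The sketch below uses only the two figures quoted on the slide; the function name is illustrative.

```python
def percent_reduction(before_km2: float, after_km2: float) -> float:
    """Relative drop from `before_km2` to `after_km2`, in percent."""
    return 100.0 * (before_km2 - after_km2) / before_km2

# Figures quoted on the slide (km2).
drop = percent_reduction(27_000, 6_200)
print(f"Deforestation fell by about {drop:.0f}%")
```

With these two numbers the reduction is roughly 77%.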
Real-time Deforestation Monitoring
Daily warnings of newly deforested large areas
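One way to picture a daily warning step is as an area filter over newly detected polygons. The sketch below is an illustration only, not the operational system: the 0.25 km2 threshold, the planar coordinates in km, and the function names are all assumptions.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # planar (x, y) coordinates in km (assumed)

def polygon_area_km2(ring: List[Point]) -> float:
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def daily_warnings(new_polygons: Dict[str, List[Point]],
                   min_area_km2: float = 0.25) -> List[str]:
    """Ids of newly detected clearings at least `min_area_km2` in size.

    The threshold is hypothetical; the slide only says "large areas".
    """
    return [pid for pid, ring in new_polygons.items()
            if polygon_area_km2(ring) >= min_area_km2]

detections = {
    "a": [(0, 0), (1, 0), (1, 1), (0, 1)],          # 1 km x 1 km square
    "b": [(0, 0), (0.2, 0), (0.2, 0.2), (0, 0.2)],  # 0.2 km x 0.2 km square
}
print(daily_warnings(detections))
```

Only polygon "a" passes the filter here, since "b" is well below the assumed threshold.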
How much does it take to survey Amazonia?
30 TB of data
500,000 lines of code
150 man-years of software development
200 man-years of interpreters
(Figure labels: 116-112, 116-113, 166-112)
TerraAmazon – open source software
for large-scale land change monitoring
Spatial database (PostgreSQL with vectors and images)
2004-2008: 5 million polygons, 500 GB images
Welcome to the Age of Data-intensive Science!
Figure: vantage points and capabilities
Vantage points: FarSpace (L1/HEO/GEO), NearSpace (LEO/MEO), airborne, terrestrial
Capabilities: permanent; TDRSS and commercial satellites; commercial satellites and manned spacecraft; aircraft/balloon event tracking and campaigns; deployable
Other labels: forecasts and predictions; user community
Weather and climate
source: WMO
11,000 land stations (3000 automated)
900 radiosondes, 3000 aircraft
6000 ships, 1300 buoys
5 polar, 6 geostationary satellites
ARGOS Data Collection System (16,000 platforms)
650,000 messages
processed daily
Argo buoy network
Data chain in Earth System Science
source: NASA
Data-intensive Science = principles and
applications of information technology for
handling very large data sets
Conjectures
IT concepts are essential to global change
researchers (but most of them don’t know it)
Global change challenges will motivate new
research in IT (but most of us are not looking
there)
Challenges for data-intensive science
Which data is out there?
How to organize big data?
How to get the data I need?
How to model big data?
How to access and use big data?
Stage 1 – A scientist’s personal database
Components: user interface, database creation, database access, analysis, local database
The good: data is close to you (or so you think)
The bad: no long-term data preservation, no data sharing
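The Stage 1 pattern (create, access, analyse, all on one machine) can be sketched with Python's built-in sqlite3 module. The table name, columns and sample values are made up for illustration; an in-memory database stands in for a local file.

```python
import sqlite3

# Database creation: an in-memory SQLite database stands in for a local file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rainfall (station TEXT, day TEXT, mm REAL)")
conn.executemany(
    "INSERT INTO rainfall VALUES (?, ?, ?)",
    [("S1", "2012-05-01", 12.0), ("S1", "2012-05-02", 0.0),
     ("S2", "2012-05-01", 30.5), ("S2", "2012-05-02", 4.5)],
)

# Database access and analysis: total rainfall per station.
totals = dict(conn.execute(
    "SELECT station, SUM(mm) FROM rainfall GROUP BY station"))
print(totals)
conn.close()
```

Everything lives in one process on one machine, which is convenient but also why preservation and sharing are weak at this stage.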
Stage 2 – A scientific lab database
Components: user interface, database access, database creation, analysis, corporate database
The good: long-term data preservation, data sharing inside the lab, reusable corporate software
The bad: substantial costs on data admin, little outside data sharing
Metview (ECMWF)
Field plotting
source: ECMWF, Metview, MOPTC, June 2004
Stage 3 – A scientific lab database in the cloud
Components: user interface, database access, database creation, analysis, corporate database
The good: long-term data preservation, shared costs on data admin
The bad: rewrite software for cloud processing, outside data sharing still not solved
Risk Analysis
On-line data feeds for analysis:
DCP: rain total; fixed-time and irregular (alert) reports; point data; one file per DCP
Satellite/Radar: 4 km grid; total rain 1 h; total rain 24 h; current rate (mm/h); binary file
Models: ETA 40, 20, 5 km; ensemble 40 km; total rain 72 h; 72 files; ASCII grid file
TerraMA2 - Natural Disasters Monitoring and Alert System
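To make the idea of combining feeds concrete, here is a minimal sketch of one possible alert rule: flag grid cells where observed 24 h rain plus forecast 72 h rain reaches a threshold. The rule, the 100 mm threshold and the function name are assumptions for illustration, not TerraMA2's actual analysis.

```python
from typing import List, Tuple

Grid = List[List[float]]  # rainfall in mm, one value per grid cell

def alert_cells(observed_24h: Grid, forecast_72h: Grid,
                threshold_mm: float = 100.0) -> List[Tuple[int, int]]:
    """Return (row, col) of cells whose observed plus forecast rain
    reaches the threshold. Rule and threshold are illustrative only."""
    hits = []
    for r, (obs_row, fc_row) in enumerate(zip(observed_24h, forecast_72h)):
        for c, (obs, fc) in enumerate(zip(obs_row, fc_row)):
            if obs + fc >= threshold_mm:
                hits.append((r, c))
    return hits

observed = [[10.0, 60.0], [0.0, 45.0]]   # e.g. satellite/radar total rain 24 h
forecast = [[20.0, 50.0], [5.0, 55.0]]   # e.g. model total rain 72 h
print(alert_cells(observed, forecast))
```

Both grids are assumed to share the same resolution and extent; in practice the feeds listed above come on different grids and would need resampling first.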
Stage 4 – Multidatabase access
Components: modelling, data discovery, data access, analysis; three data sources, each with remote analysis
The good: long-term data preservation, shared costs on data admin, access to large external databases
The bad: rewrite software for cloud processing, finding data is a major problem
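The Stage 4 flow (discover sources, then run the analysis where the data lives) can be sketched in a few lines. The `DataSource` class, the keyword catalogue and the archive names are hypothetical, and the "remote" call is simulated in-process.

```python
from statistics import mean

class DataSource:
    """Hypothetical archive that keeps its data and runs analyses locally."""
    def __init__(self, name, keywords, values):
        self.name = name
        self.keywords = set(keywords)
        self._values = values  # stays at the source

    def remote_analysis(self, func):
        # Only the (small) result leaves the source, not the raw data.
        return func(self._values)

def discover(sources, keyword):
    """Data discovery: sources that advertise the given keyword."""
    return [s for s in sources if keyword in s.keywords]

sources = [
    DataSource("archive_a", ["rainfall", "amazonia"], [12.0, 30.0, 18.0]),
    DataSource("archive_b", ["temperature"], [25.0, 26.0]),
    DataSource("archive_c", ["rainfall"], [5.0, 7.0]),
]
results = {s.name: s.remote_analysis(mean)
           for s in discover(sources, "rainfall")}
print(results)
```

The keyword match is the weak point: it only works if every source describes its holdings in compatible terms, which is the discovery problem the slide points to.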
Data Access Hitting a Wall
Current science practice based on data download
How do you download a petabyte?
You don’t! Move the software to the archive
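A back-of-the-envelope calculation shows why downloading does not scale. The sketch assumes a decimal petabyte (10^15 bytes), a fully used link and no protocol overhead; the link speeds are examples, not measurements.

```python
def transfer_days(size_bytes: float, link_bits_per_s: float) -> float:
    """Days needed to move `size_bytes` over a link, ignoring overhead."""
    seconds = size_bytes * 8 / link_bits_per_s
    return seconds / 86_400  # seconds per day

PETABYTE = 10**15  # decimal petabyte, in bytes (assumption)
for label, rate in [("100 Mbit/s", 100e6),
                    ("1 Gbit/s", 1e9),
                    ("10 Gbit/s", 10e9)]:
    print(f"{label}: {transfer_days(PETABYTE, rate):.1f} days")
```

Even a dedicated 1 Gbit/s link needs about three months for a single petabyte, which is the argument for sending the analysis to the archive instead.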
Scientific Data Management in the Coming
Decade (Jim Gray, 2005)
Next-generation science instruments and
simulations will produce peta-scale
datasets. Such peta-scale datasets will be
housed by science centers that provide
substantial storage and processing for
scientists who access the data via smart
notebooks. The procedural stream-of-bytes-file-centric approach to data analysis
is both too cumbersome and too serial for
such large datasets. Database systems will
be judged by their support of common
metadata standards and by their ability to
manage and access peta-scale datasets.
Virtual Observatory
If data is online, internet is
the world’s best telescope
Scientific Data
Management in the Coming
Decade (Jim Gray)
Where are scientific databases going?
From tables to arrays
Relation (table), e.g. columns: name, CPF, job title
Operations: selection, projection, join (relational algebra)
SQL language: SELECT * FROM images WHERE date = 'today'
Scientific data (array)
Operations: spatial queries, math operations (array algebra)
AQL language: SELECT Mean(A.B) FROM Array A
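The contrast between the two views can be shown on a tiny image. This is a plain-Python sketch with made-up values; it does not use SQL or AQL, and the names `rows` and `B` are illustrative only.

```python
from statistics import mean

# Relational view: one row per pixel, position stored as ordinary columns.
rows = [
    {"row": 0, "col": 0, "b": 1.0}, {"row": 0, "col": 1, "b": 2.0},
    {"row": 1, "col": 0, "b": 3.0}, {"row": 1, "col": 1, "b": 4.0},
]
table_mean = mean(r["b"] for r in rows)

# Array view: position is implicit in the indices; attribute B is a 2-D array.
B = [[1.0, 2.0],
     [3.0, 4.0]]
array_mean = mean(v for line in B for v in line)          # analogous to Mean(A.B)
window_mean = mean(v for line in B[0:1] for v in line)    # spatial subset: first row

print(table_mean, array_mean, window_mean)
```

Both views give the same mean, but in the array view a spatial subset is just an index range, whereas the table needs a predicate on the coordinate columns.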
Communicating concepts is hard
vulnerability? climate change? poverty?
Image source: WMO
Communicating concepts is hard
We’re bad at representing meaning
deforestation? degradation? disturbance?
Communicating change is very hard
When did the Aral Sea reach the tipping point?
Describing events and processes is very hard
When did the flood occur?
Conclusions
Earth System Science data management poses a
major challenge for the database community
We need new methods, architectures and data-handling
techniques to deal with scientific data