Klausen | Big data: What’s GAW got to do with it

Big data
What’s GAW got to do with it …
Jörg Klausen
Chair ET-WDC
CAS EPAC SSC Meeting, 15-17 March 2016
WMO, Geneva
Pasted from <http://www.mckinsey.com/mgi/our-research
Outline
• «Big data»
• «GAW data»
• «Science as a service»
• Conclusion
J. Klausen | Big data: What’s GAW got to do with it …
16 March 2016 | CAS EPAC SSC Meeting | WMO, Geneva
BIG DATA
Evolution of term «big data»
Fig. 1. Frequency distribution of documents containing the term “big data” in ProQuest Research Library.
Source: Amir Gandomi, Murtaza Haider, Beyond the hype: Big data concepts, methods, and analytics, International Journal of Information Management,
Volume 35, Issue 2, 2015, 137–144, http://dx.doi.org/10.1016/j.ijinfomgt.2014.10.007
adapted from <https://www.linkedin.com/pulse/big-data-brokers-lost-privacy-saranya-anandh>
Facts about «big data»
• Google manages >1 million PB and processes >24 PB of data every day (a lot more than all printed material in the U.S. Library of Congress).
• >1 billion Google searches are conducted every day; >250 billion emails are sent every day.
• YouTube has >1 billion unique visitors per month; >6 billion hours of video are watched per month on YouTube (~1 hour for every person on Earth and 50% more than in 2014).
• 90% of the data in the world today has been created in the past 2 years.
• Data are forecast to double every 2 years until 2020.
• In 2020, the amount of digital data produced will exceed 40 zettabytes (5,200 GB for every Homo sapiens on Earth).
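As a rough cross-check of the last two figures, the sketch below assumes decimal (SI) units (1 ZB = 10^21 bytes, 1 GB = 10^9 bytes) and a strict doubling every 2 years. The function name and the choice of units are assumptions made for this illustration, not taken from the slide.

```python
# Assumption: decimal (SI) prefixes; the slide does not say which convention is used.
ZETTABYTE = 10**21  # bytes
GIGABYTE = 10**9    # bytes


def projected_volume(volume_now: float, years: float, doubling_period: float = 2.0) -> float:
    """Volume after `years`, assuming it doubles every `doubling_period` years."""
    return volume_now * 2 ** (years / doubling_period)


# World population implied by "40 ZB in total" and "5,200 GB per person".
implied_population = (40 * ZETTABYTE) / (5200 * GIGABYTE)

if __name__ == "__main__":
    print(f"Implied population: {implied_population:.2e}")
    print(f"Growth factor over 4 years: {projected_volume(1.0, 4):.1f}")
```

Under these assumptions the two numbers on the slide imply a world population of roughly 7.7 billion, and a doubling every 2 years corresponds to a factor of 4 over 4 years.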
What is «big data»?
• Big data is high volume, high velocity, high variability, low veracity (reliability), high value data (e.g., Dumbill (2012), https://www.oreilly.com/ideas/what-is-big-data)
• Big data is largely «unstructured» data
Handling «big data»
Fig. 3. Processes for extracting insights from big data.
Source: Amir Gandomi, Murtaza Haider, Beyond the hype: Big data concepts, methods, and analytics, International Journal of Information Management,
Volume 35, Issue 2, 2015, 137–144, http://dx.doi.org/10.1016/j.ijinfomgt.2014.10.007
• Facial recognition technologies (for customer profiling)
• «Clickstream» analysis (for web sites)
• Data mining from mobile devices
Techniques for analyzing «big data»
• Text analytics (text mining)
  • Information extraction, text summarization
  • Question answering («Siri», «Watson»)
  • Sentiment analysis (opinion mining)
• Audio, Video and Social media analytics
  • Facial recognition technologies (for customer profiling)
  • «Clickstream» analysis (for web sites)
  • Image and pattern recognition
• Predictive analytics
  • Uncover patterns and capture relationships in data
Source: Amir Gandomi, Murtaza Haider, Beyond the hype: Big data concepts, methods, and analytics, International Journal of Information Management,
Volume 35, Issue 2, 2015, 137–144, http://dx.doi.org/10.1016/j.ijinfomgt.2014.10.007
GAW DATA
Traditional sources
• Land-based in-situ and remote-sensing observations
• Balloon-borne in-situ and aircraft observations
• Satellite observations
New sources
• Potentially (a lot) more land-based in-situ and remote-sensing observations
• «GAW Local» stations
• «Citizen scientists» operating private stations
• Mobile devices
• Sensors mounted on vehicles
• «Personal health»-related sensors
• UAV networks
• Managed telecom balloon networks («Google Loon»)?
• Facebook statements, Tweets, WhatsApp messages, Instagram photos, … ?
New Sources: Examples
Governance and data curation
• Past and present
  • Under the auspices of NMHSs, other governmental organizations or academic institutions
  • Curation of data by GAW World Data Centers, and other international and national or program-specific data centres
• Future
  • Under the auspices of EPAs, local governments, private companies, academic institutions
  • Curation of data by national or program-specific data centres, private companies, non-profit organizations, “Google”
Are «GAW data» «Big data»?
Aspect                  Traditional sources                        New sources
Volume                  Relatively small (except satellite data)   Growing, (potentially) huge
Velocity                Most data have high latency                Most data in n.r.t.
Variability             Well-structured data                       Well-structured data
Veracity (reliability)  (Normally) high                            (Often) unknown

• Some aspects of «big data» (volume, velocity, veracity)
• Still expect data to be well-structured (low variability)
→ Existing approaches for data management and analysis need to be propped up, but concepts remain viable
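The comparison above can also be written down as a small data structure. The sketch below is illustrative only: the class name, the simplified category labels (for example "large" for "growing, (potentially) huge") and the rule for counting «big data» aspects are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FourVProfile:
    """Qualitative 4V description of a group of data sources (hypothetical type)."""
    volume: str       # "small" or "large" (simplified from the table)
    velocity: str     # "high latency" or "near-real time"
    variability: str  # "well-structured" or "unstructured"
    veracity: str     # "high" or "unknown"

    def big_data_aspects(self) -> list[str]:
        """Return the aspects on which this profile resembles «big data»."""
        checks = {
            "volume": self.volume == "large",
            "velocity": self.velocity == "near-real time",
            "variability": self.variability == "unstructured",
            "veracity": self.veracity == "unknown",
        }
        return [name for name, is_big in checks.items() if is_big]


# Values transcribed (and simplified) from the table.
traditional = FourVProfile("small", "high latency", "well-structured", "high")
new_sources = FourVProfile("large", "near-real time", "well-structured", "unknown")

if __name__ == "__main__":
    print("Traditional:", traditional.big_data_aspects())
    print("New sources:", new_sources.big_data_aspects())
```

With these labels, the new sources match «big data» on volume, velocity and veracity but not on variability, which is the point made in the bullets.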
«SCIENCE AS A SERVICE»
NextGAW
Standardize discovery, access and retrieval
Standardize exchange formats
Understand observations
Combine observations & models
Develop products
Standardize metadata formats
Standardize data formats
Standardize observing techniques
Provide data quality objectives
Earth observation initiatives
• UK Natural Environment Research Council-funded Environmental Virtual Observatory pilot (EVOp) project
• EarthCube initiative of the US National Science Foundation
• Global Earth Observation System of Systems
• «NextGEOSS» proposal
“GAW” Data: Current Situation (still…
6 WDCs → 6 different (meta)data formats, 1 data policy
• WOUDC: Ozone/UV
• WRDC: Radiation
• WDCPC: Precip Chem
• WDCGG: Gases
• WDCA: Aerosols
• WDC-RSAT: Satellites
Partial integration through GAWSIS
>15 other archives → many different (meta)data formats, several different data policies
• AERONET, AGAGE, BSRN, CapMon, CDIAC, EANET, EBAS (NILU), GALION (Earlinet, …), NADP, NOAA/ESRL/GMD, RAMCES, SHADOZ, SKYNET, TCCON (CalTech), [One for each satellite], …
>20 ways to submit data
Federated GAW data architecture (I)
[Figure: Providers of “GAW” data → Submission (Data + Metadata) → Virtual “GAW Data” Centre → Dissemination (Data + Products + Metadata) → Users of “GAW” data]
Federated GAW data architecture (II)
[Figure: data flow in six numbered steps between instrument (Inst), operator, WDCs and CDCs, GAWSIS, and user. Labels in the figure: n.r.t. data submission; delayed mode data submission; serve metadata; web service request; query for data; retrieve data + metadata; Modeling data?]
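The federated idea (a central metadata catalogue that knows where data are held, while the data stay at the data centres) can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: the class names, method names and example values are hypothetical and do not describe the actual GAWSIS, WDC or CDC interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class DataCentre:
    """Stand-in for a WDC or CDC that holds the actual observations."""
    name: str
    datasets: dict[str, list[float]] = field(default_factory=dict)

    def retrieve(self, dataset_id: str) -> list[float]:
        return self.datasets[dataset_id]


@dataclass
class MetadataRegistry:
    """Stand-in for the central metadata catalogue (the GAWSIS role in the figure)."""
    # Maps dataset id -> (descriptive metadata, centre that holds the data).
    records: dict[str, tuple[dict[str, str], DataCentre]] = field(default_factory=dict)

    def register(self, dataset_id: str, metadata: dict[str, str], centre: DataCentre) -> None:
        self.records[dataset_id] = (metadata, centre)

    def query(self, **criteria: str) -> list[str]:
        """Return ids of datasets whose metadata match all criteria."""
        return [
            dataset_id
            for dataset_id, (metadata, _) in self.records.items()
            if all(metadata.get(key) == value for key, value in criteria.items())
        ]

    def fetch(self, dataset_id: str) -> tuple[dict[str, str], list[float]]:
        """Retrieve data + metadata; the data come from the holding centre."""
        metadata, centre = self.records[dataset_id]
        return metadata, centre.retrieve(dataset_id)


# Made-up example values, for illustration only.
centre = DataCentre("example-wdc", {"ds-1": [401.2, 401.5]})
registry = MetadataRegistry()
registry.register("ds-1", {"variable": "CO2", "station": "XYZ"}, centre)
hits = registry.query(variable="CO2")
metadata, values = registry.fetch(hits[0])
```

The user only talks to the registry; which centre actually serves the values is resolved from the metadata record, so centres can keep their own storage.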
Tentative road map for metadata
1. Migrate GAWSIS from Empa to MeteoSwiss, integration with OSCAR/Surface, by mid-April 2016
2. Extend GAWSIS API to be compliant with WIGOS Metadata Standard (WMDS)
   • Draft specification for OGC-compliant XML schema, by mid-April ’16
   • Review by WMO expert teams, by mid-May ’16
3. Test API in context of OSCAR/Surface
   • Pilot projects with DWD, MeteoSwiss, BoM?, UKMO?
4. Connect GAW WDCs
   • First, use existing sources, by mid-July ’16
   • Later, use GAWSIS API (requires changes at WDCs), by mid ’17
5. Connect GAW Contributing Data Centers (GAW CDCs), by end ’17
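To make the idea of exchanging station metadata as XML concrete, here is a minimal round-trip sketch using Python's standard library. The element names and the example record are hypothetical placeholders chosen for this illustration; they are not the WMDS or any OGC schema, which the road map still lists as a draft to be specified.

```python
import xml.etree.ElementTree as ET


def station_to_xml(station: dict[str, str]) -> str:
    """Serialize a flat station record to XML.

    Element names are hypothetical placeholders, NOT the WMDS/OGC schema.
    """
    root = ET.Element("station")
    for key, value in station.items():
        child = ET.SubElement(root, key)
        child.text = value
    return ET.tostring(root, encoding="unicode")


def xml_to_station(document: str) -> dict[str, str]:
    """Parse the XML produced by station_to_xml back into a dict."""
    root = ET.fromstring(document)
    return {child.tag: child.text or "" for child in root}


# Made-up record, for illustration only.
record = {"name": "Example station", "latitude": "46.5", "longitude": "7.9"}
xml_text = station_to_xml(record)

if __name__ == "__main__":
    print(xml_text)
```

A real implementation would validate documents against the agreed schema rather than accept arbitrary element names as this sketch does.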
Tentative road map for data
1. ET-WDC + GAW CDC managers to agree on data exchange specification, by mid 2017
2. Implement web services at WDCs, CDCs to make data available in this format (amongst others); alternatively, implement a central harvester and pre-processor web service
   • Test beds at WDCA, WDC-RSAT, WOUDC, by mid 2018
   • Adoption by WDCGG, WRDC, CDCs asap
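The "central harvester and pre-processor" alternative could look roughly like the sketch below: each centre keeps its own format, and one parser per centre converts the payload to a common record. The record layout, the two input formats and all names are assumptions made for this illustration; the actual exchange specification is still to be agreed.

```python
import csv
import io
import json
from typing import Callable

# Common exchange record used in this sketch (hypothetical): (station, time, value).
Record = tuple[str, str, float]


def parse_csv_payload(payload: str) -> list[Record]:
    """Pre-process a centre that delivers 'station,time,value' CSV rows."""
    rows = csv.reader(io.StringIO(payload))
    return [(station, time, float(value)) for station, time, value in rows]


def parse_json_payload(payload: str) -> list[Record]:
    """Pre-process a centre that delivers a JSON list of objects."""
    return [
        (item["station"], item["time"], float(item["value"]))
        for item in json.loads(payload)
    ]


def harvest(
    sources: dict[str, tuple[Callable[[], str], Callable[[str], list[Record]]]]
) -> list[Record]:
    """Fetch from every centre and convert everything to the common record."""
    harvested: list[Record] = []
    for fetch, parse in sources.values():
        harvested.extend(parse(fetch()))
    return harvested


# The fetch functions return canned strings here; a real harvester would
# call each centre's web service instead.
sources = {
    "centre-a": (lambda: "XYZ,2016-03-16T00:00Z,401.2\n", parse_csv_payload),
    "centre-b": (
        lambda: '[{"station": "ABC", "time": "2016-03-16T00:00Z", "value": 35.0}]',
        parse_json_payload,
    ),
}
records = harvest(sources)
```

The trade-off against per-centre web services is that the format knowledge sits in one place (the harvester) instead of requiring changes at every WDC and CDC.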
Realization?
• Co-funding through NextGEOSS?
  → Additional requirements from GEOSS
  → Decision expected by July
• Co-funding through WMO resource mobilization department?
• Co-funding through OSCAR/Surface extension?
• Co-funding through WDCs, CDCs?
• Co-funding through GAW?
Conclusions
• «Big data» may be coming to GAW, but …
  • «GAW data» are not «big data» according to 4V definition
• «Big data» is coming, but …
  • GAW won’t be the owner of these «big data»
  • GAW will not necessarily benefit or be harmed
  • GAW can serve as a reference network
→ Vision of «federated GAW data infrastructure» agreed at Zurich workshop in 2015
→ Potential of a federated GAW data infrastructure probably larger than potential of «big data» in the foreseeable future
→ SSC needs to endorse, defend strategy, help mobilize resources.