Big Data and Official Statistics


The value of information: the challenge of Big Data
Instituto de Estadística y Cartografía de Andalucía
5 Feb 2016
Big data in official statistics in the European Statistical System: the Big Data Action Plan & Roadmap
EUROSTAT – Fernando Reis – 'Task Force Big Data'
Datafication
Digital footprint
Sensors
Big Data and Official Statistics
What will be the impact of ubiquitous data collection and networking on official statistics?
• Mobile communication
• Internet of [every]Things
• Social media
• Wearables
• Autonomous traffic
• Smart systems
• …
Expected benefits of using big data?
Outward-looking
• More adequate and flexible response to user needs
• Wider range of statistical products and services (without increasing burden)
• Better understand quality aspects of new sources
Inward-looking
• Acquisition of new competences for NSIs
• Increase efficiency in producing statistics
• We remain key players for statistical information
Big data at Eurostat – key points
ESS (European Statistical System)
Scheveningen Memorandum, Sept 2013
• Examine the potential of big data sources for official statistics
• Official statistics big data strategy as part of wider government strategy
• Address privacy and data protection
• Collaboration at European and global level
• Address need for skills
• Partnerships between different stakeholders (government, academics, private sector)
• Developments in methodology, quality assessment and IT
• Adopt action plan and roadmap for the ESS
Big data at Eurostat – key points
ESS (European Statistical System)
Scheveningen Memorandum, Sept 2013
• Task Force Big Data
• Big Data Roadmap and Action Plan 1.0, June 2014
• ESS Pilots 2016 - 2020
• Implementation of ESS Vision 2020: Big Data project = integral part of the portfolio
European Commission Communication "Towards a thriving data driven economy"
Private Public Partnership on big data
International cooperation (UNSD, UNECE, etc.)
• UN/ECE project “Big data in official statistics” (Sandbox)
• UNSD Global WG on Big Data
Big Data Action Plan and Roadmap at a glance
Diagram labels: Governance, Policy, Quality, Experience sharing, Legislation, Methods, Ethics / Communication, Pilots, Skills, IT Infrastructures, Big data sources
Challenges
▫ cooperation, sharing of know-how
▫ development of a sound methodology ("from design-based to model-based approach")
▫ exploration & tentative implementation
▫ looking for partners
Action (example)
▫ Pilot projects, carried out by the Member States (ESSnet)
  • 2015 – 2019 (FPA / SGA construction)
  • Exploring different big data sources (but also IT architecture, partnerships), developing generic guidelines and frameworks
  • Establish partnerships with data providers and research and international organisations
  • Cooperation with UN (lead) on a methodological framework
Action (example) – continued
▫ List of pilot projects (Framework Partnership Agreement signed)
  • Web scraping [job vacancies; enterprise characteristics]
  • Smart meters [electricity consumption; temporarily vacant dwellings]
  • AIS data [vessel identification systems]
  • Mobile phone data
▫ "The big data for official statistics competition" (2016)
Challenges
▫ new skills for NSI staff: statisticians vs. data scientists?
▫ computing capacity, hardware?
▫ analytical tools, software?
▫ storage?
Action (example)
▫ Training program for European statisticians (ESTP)
  • In the next years: dedicated courses on big data
  • Focus on big data sources and on big data tools
  • Acquiring the skills needed to assess sources and their quality, the skills to use tools and to explore big data sources
ESTP courses supporting big data (2016)
12 – 15 Sep
29 Feb – 2 Mar
Introduction to big data and its tools
21 – 24 Jun
Hands-on immersion on big data tools
5 – 7 Apr
The use of R in official statistics: model based estimates
Big data sources - Web, Social media and text analytics
7 – 10 Nov
Nowcasting
Advanced big data sources - Mobile phone and other sensors
8 – 10 Jun
Can a statistician become a data scientist?
Big data courses
Methodology courses
24 – 26 Feb
Time-series econometrics
Activity
Challenges
▫ integrating official statistics in big data strategies
▫ getting access to data & continuity of access
▫ data security & privacy concerns
▫ compensate for the burden?
Action (example)
▫ Project on the analysis of legislation and strategy (but also ethics and communication)
  • 2015-2017 (22 months)
  • Analysis for EU and for Member States at national level
▫ See also the Feasibility study on the use of mobile positioning data for tourism statistics (report on feasibility of access)
Challenges
▫ transversal challenges to all big data activities: quality and ethics & communication
▫ big data vs. statistics: "goodness of fit" (concepts, representativeness, …)
▫ impact of privacy and security concerns on public opinion?
Action (example)
▫ Cooperation with UN (lead) on a quality framework for big data
▫ Project on the analysis of ethics and communication (but also legislation and strategy)
  • 2015-2017 (22 months)
  • Analysis for EU and for Member States at national level
Big data sources (diagram labels)
Categories: Communication, WWW, Sensors, Process generated data, Crowd sourcing
Sources: Mobile phone data, Social Media, Web Searches, Businesses' Websites, e-commerce websites, Job advertisements, Real estate websites, Traffic loops, Smart meters, Vessel Identification, Satellite Images, Flight Booking transactions, Supermarket Cashier Data, Financial transactions, VGI websites (OpenStreetMap), Community pictures collection
Mobile phone data: currently a focal data source for big data
• Exists in all countries (≠ accessible in all countries)
• Many promising studies/experiments available
• Potential relevance to many areas of official statistics (synergies!)
• Most available studies linking big data to tourism statistics are based on mobile phone data
Mobile phone data
Eurostat:
• Feasibility study on the use of mobile positioning data for tourism statistics (2012-2014)
• Included in the forthcoming ESS Pilots on Big Data (2016-2019)
• GWG Big Data Pilot
NSIs (and tourism researchers)
• Many small or larger scale projects ongoing!
• GWG Big Data Task Team Mobile Phone Data
… slow data vs. quick data …
Article released one day after the 2015 Easter weekend about tourism on the Belgian coast: 150 000 same-day visitors on Sunday, 400 000 during the entire long weekend
• Data based on monitoring by the regional tourism board, in cooperation with the main mobile network operator Proximus and the road infrastructure administration;
• In comparison: Eurostat will receive data on same-day visitors for the 2nd quarter of 2015 (not a particular weekend) on 30 June 2016 (not the day after) for the entire country (not a coastal strip within a NUTS2 region);
• Methodology not clear, but it's a nice example of how flash estimates based on big data decrease the relevance of official statistics.
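The timeliness gap in this example can be made concrete with a little date arithmetic. The following is a minimal sketch in Python, assuming that the reference period of the official figure ends on 30 June 2015 (the last day of the 2nd quarter) and taking the one-day lag of the flash estimate directly from the slide. The function name lag_days is hypothetical and used only for illustration.

```python
from datetime import date


def lag_days(period_end: date, release: date) -> int:
    """Days between the end of a reference period and the data release."""
    return (release - period_end).days


# Assumption: the 2nd quarter of 2015 ends on 30 June 2015.
q2_2015_end = date(2015, 6, 30)
# Transmission date to Eurostat, as stated on the slide.
eurostat_delivery = date(2016, 6, 30)

official_lag = lag_days(q2_2015_end, eurostat_delivery)
flash_lag = 1  # "one day after" the weekend, as stated on the slide

print(f"Official statistics lag: {official_lag} days")
print(f"Flash estimate lag: {flash_lag} day")
```

The comparison is only indicative: the two figures differ in reference period, geographical coverage and methodology, as the slide itself points out.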
Big data = Multiple sources & Multiple outputs
Mobile phone data → Tourism Statistics, Commuting Statistics, Traffic Statistics, Population Statistics, Migration Statistics
Mobile Phone Data, Satellite Images, Smart Meters, VGI websites → Population Statistics
Lifecycle for the coming years?
SHORT TERM
Diagram labels: Mobile phone data, HOUSEHOLD & BUSINESS SURVEYS, Payment cards data, Other big data, Domain STATISTICS
→ 'Traditional' surveys as main input for tourism statistics
→ Big data sources slowly becoming auxiliary information
Lifecycle for the coming years? (2)
MID TERM
Diagram labels: Mobile phone data, HOUSEHOLD & BUSINESS SURVEYS, Payment cards data, Other big data, Domain STATISTICS
→ Weight of surveys decreases in favour of big data?
→ Surveys no longer 'main filter' but 'one of the sources'?
Lifecycle for the coming years? (3)
LONGER TERM
Diagram labels: Mobile phone data, Payment cards data, HOUSEHOLD & BUSINESS SURVEYS, Other big data, Domain STATISTICS
NEW: Web (prices), Bookings (nowcast/forecast)
→ Replacement of surveys continues (smaller samples, less frequent collection)?
→ Enhanced tourism statistics via embedding of newer sources?
The statistical office of the future
• Data flows in addition to surveys and censuses
• Embedded in data flow – smart statistics
• Product designers in addition to data collection designers
• Statistical modelling will be a major activity
• From descriptive indicators to nowcasting (and forecasting)
• Trust and quality will be key
• New role in teaching digital literacy
• Accreditation and certification instead of pure production
• Address issues linked to quality & transparency, privacy & confidentiality, access to third party data sources & data sharing, scientific standards & methodology, professional ethics, skills, …
Thank you for your attention
Fernando Reis
Eurostat Task Force on Big Data
[email protected]
https://github.com/reisfe/
https://twitter.com/reisfe/
https://linkedin.com/in/reisfe/