Working Group on Tourism Statistics
Luxembourg, 21 and 22 September 2015
Item 10
Special session dedicated to big data sources
with potential for tourism statistics
Overview of big data sources with
relevance for a system of tourism statistics
DG EUROSTAT – Christophe Demunter, tourism statistics & TF Big Data
Eurostat
Big variety in big data
Aim of this presentation:
• to give an overview of the main sources of big data
• … and their potential relevance for tourism statistics
Why tourism?

Big data sources are often about where people are, where
they intend to be soon, what their activities are, what they
buy or what they are looking to buy.

All of these 'topics of interest' also apply to the behaviour
and whereabouts of tourists
This photo, “Cartoon: Big Data”, is copyright (c) 2014 Thierry Gregorius and made available under an Attribution 2.0 Generic license.
Big variety in big data
[Diagram: big data sources grouped by type]
Communication: Mobile phone data; Social Media
WWW: Web Searches; Businesses' Websites; e-commerce websites; Job advertisements; Real estate websites
Sensors: Traffic loops; Smart meters; Vessel Identification; Satellite Images
Process generated data: Flight Booking transactions; Supermarket Cashier Data; Financial transactions
Crowd sourcing: VGI websites (OpenStreetMap); Community pictures collection
Mobile phone data
Currently a focal data source for big data
• Exists in all countries (≠ accessible in all countries)
• Many promising studies/experiments available
• Potential relevance to many areas of official statistics (synergies!)
• Most available studies linking big data to tourism statistics are based on mobile phone data
Mobile phone data
Eurostat:
• Feasibility study on the use of mobile positioning data for tourism statistics (2012-2014)
• Prime position in the forthcoming ESS Pilots on Big Data (2016-2019)
NSIs (and tourism researchers):
• Many small or larger scale projects ongoing!
• See examples for the Netherlands, Estonia and Belgium in this session
… slow data vs. quick data…
Article released one day after the 2015 Easter weekend about tourism on the Belgian coast:
150 000 same-day visitors on Sunday, 400 000 during the entire long weekend
• Data based on monitoring by the regional tourism board, in cooperation with the main mobile network operator Proximus and the road infrastructure administration;
• In comparison: Eurostat will receive data on same-day visitors for the 2nd quarter of 2015 (not a particular weekend) on 30 June 2016 (not the day after) for the entire country (not a coastal strip within a NUTS2 region);
• Methodology not clear, but it's a nice example of how flash estimates based on big data decrease the relevance of official statistics.
Mobile phone data
Promising, but some quality issues, e.g. coverage (representativeness?)
Source: DGE, SDP3E, bureau des études sur le tourisme et les catégories d’entreprise (France, Sept 2015)
Social Media
Intended or unintended, people leave their digital footprint when using social media
• social media posts as information source on people's movement and behaviour
• many methodological challenges:
  • representativeness (higher inclusion probability when higher 'posting' frequency)
  • socio-demographic info (but profiling exercises ongoing – e.g. CBS Netherlands)
  • continuity (players come & go)
  • …
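The representativeness issue noted for social media (frequent posters are more likely to be picked up) can be illustrated with a minimal Python sketch. The data, the function names and the idea of counting each user once are illustrative assumptions, not a method taken from the presentation, and counting users once does nothing about people who never post at all.

```python
def naive_share(posts, destination):
    """Share of posts located at `destination`.

    Biased towards frequent posters: a user with many posts
    counts many times.
    """
    hits = sum(1 for _, place in posts if place == destination)
    return hits / len(posts)


def user_level_share(posts, destination):
    """Share of distinct users with at least one post at `destination`.

    Each user counts once, whatever their posting frequency.
    """
    users = {user for user, _ in posts}
    visitors = {user for user, place in posts if place == destination}
    return len(visitors) / len(users)


# Hypothetical data: (user id, place of post).
# User "a" posts eight times from Bruges; "b" and "c" post once from Ghent.
posts = [("a", "Bruges")] * 8 + [("b", "Ghent"), ("c", "Ghent")]

post_based = naive_share(posts, "Bruges")       # 8 of 10 posts
user_based = user_level_share(posts, "Bruges")  # 1 of 3 users
```

The gap between the two figures shows how a post-level count can overstate a destination that happens to attract one very active poster.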
Social media
The back office
The front office
Does the Internet know where you were in October 2014?
Web Searches
Search engines as a source of topics of interest
• interest ≠ purchases (but correlated?)
• interest ≠ tourism ("Bangkok" vs. "Bangkok train")
• Relevant for breakdowns ("accommodation", "bike tour")
• Eurostat project "Internet as a data source"
• Work done by NSIs (e.g. ONS UK using Google Trends)
Businesses' Websites
Can give information on tourism offer of businesses
• ≈ capacity of accommodation (establishments, rooms, beds)
• webscraping of prices, availability & occupancy (see also e-commerce)
e-commerce websites
High potential to provide data on products & services and on prices
• Trends (bias?)
• Absolute figures (representativeness?)
• internet bots, webscraping!!
• See examples from the Netherlands and Spain in this session
Traffic loops
Traffic counting has been intensively used for tourism statistics
• In the past: quick & dirty border surveys (#cars * average of passengers = #tourists)
• Automation opens new perspectives
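The "quick & dirty" rule of thumb for traffic counting (#cars * average of passengers = #tourists) can be written out as a small Python sketch. The counts and the occupancy figure below are made-up illustrative values, and the function name is hypothetical.

```python
def estimate_travellers(vehicle_counts, avg_occupancy):
    """Estimate travellers as total vehicles times average occupancy.

    vehicle_counts: iterable of vehicle counts (e.g. one per counting point)
    avg_occupancy:  assumed average number of passengers per vehicle
    """
    if avg_occupancy <= 0:
        raise ValueError("avg_occupancy must be positive")
    return sum(vehicle_counts) * avg_occupancy


# Hypothetical counts at three border crossings, assumed 2.5 passengers per car.
travellers = estimate_travellers([120, 95, 130], 2.5)  # 345 cars * 2.5
```

The estimate is only as good as the occupancy assumption, and it treats every counted vehicle as carrying tourists, which is why the rule is described as quick and dirty.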
Traffic loops
CBS Netherlands – summer 2015

"Statistics Netherlands has recently
launched its first statistics purely based
on Big Data […]. A major advantage is that
results are more quickly available, more
up to date and more detailed"

"The Dutch traffic intensities statistics are
based on the total of counts performed each
minute of vehicles crossing the more than 20,000 traffic loops
on Dutch motorways over the period 2011–2014"

"It is a huge amount: over 115 billion measurements, with a
total size of 80 terabytes, more than 7 times the amount of
data generally processed by Statistics Netherlands in a year"
Smart meters
Electronic devices recording energy consumption (installed in private homes / enterprises)
• Possible source for population statistics
• Tourists as "temporary population": smart meters installed in holiday homes to record presence/absence of tourists (micro or aggregate level) or to monitor seasonal fluctuations?
Flight Booking transactions
Air travel leaves a trace via the booking and reservation systems
• Incomplete source of tourism statistics (concerns only trips by plane – 15% of all trips, 53% of outbound trips)
• More relevant for island countries or island regions
• Auxiliary information for poorly covered destinations (due to sample sizes) that are typically reached by plane
• See example of Amadeus data in this session
Supermarket Cashier Data
Many tourists leave a digital trace of their stay via purchases they make in local retail stores
• Seasonal fluctuations in turnover as a proxy for seasonality in tourism activity?
• Using electronic payments as a source for estimating tourism ratios for TSA (cards used all year round vs. cards used for a short period only) or to estimate countries of origin of tourists?
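The distinction between cards used all year round and cards used for a short period only could be sketched as follows. This is a minimal Python illustration: the 30-day threshold, the function name and the sample transactions are assumptions made for the example, not figures from the presentation.

```python
from datetime import date


def split_cards_by_activity_span(transactions, max_short_span_days=30):
    """Split card ids by the span between first and last local payment.

    transactions: iterable of (card_id, date) pairs observed in one area
    Returns (short_span_cards, long_span_cards) as two sets.
    """
    first_last = {}
    for card_id, day in transactions:
        first, last = first_last.get(card_id, (day, day))
        first_last[card_id] = (min(first, day), max(last, day))

    short_span, long_span = set(), set()
    for card_id, (first, last) in first_last.items():
        span_days = (last - first).days
        if span_days <= max_short_span_days:
            short_span.add(card_id)  # candidate visitor card
        else:
            long_span.add(card_id)   # candidate resident card
    return short_span, long_span


# Hypothetical transactions: card "A" is used in January and November,
# card "B" only during one week in July.
sample = [
    ("A", date(2015, 1, 5)),
    ("A", date(2015, 11, 20)),
    ("B", date(2015, 7, 10)),
    ("B", date(2015, 7, 17)),
]
short_cards, long_cards = split_cards_by_activity_span(sample)
```

A ratio of spending on short-span cards to total spending would be one crude proxy for a tourism ratio. Repeat visitors and residents who rarely pay by card would be misclassified under this simple rule.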
Financial transactions
The idea to use payment card data is much older than 'big data'
• Source with very high potential for tourism and BOP statistics!
• Important missing link in the big data sources (that often focus on physical flows of people)
• Quality of the data is constantly improving (e.g. distinction between e-commerce and POS transactions; merchant code ≈ NACE)
• See Statistics Austria's work in this session
Crowd sourcing
Example: Wikipedia as a big data source - Insights for World Heritage Sites from Wikipedia page views (Wikistats Big Data Sandbox project team – cooperation with UNECE)
• Exploratory work for cultural statistics ongoing
• Also relevant for tourism (destinations, attractions; methodology could be applied to 'cities', 'countries', …)
• Work-in-progress: CSO Ireland tries to move from pageviews to actual visits of tourists to the locations
Crowd sourcing -- Wikipedia
Total number of page views during 2012-2013 for the top
WHS with most visits to its articles
Final food for thought
Risk 1: Big data sources come and go
• MySpace → Facebook → what's next?
• SMS → WhatsApp & Facebook Messenger → what's next?
→ Huge impact on continuity of data
(… vs. stability and continuity as one of the unique selling propositions of official statistics)
Risk 2: Volume, velocity, volatility
• "can't see the wood for the trees"
• Main challenge: keeping the overview and synthesizing into a manageable, coherent & sustainable production system
How to turn a promising source into a healthy source?
Thank you for your attention !