big data - CIRCABC

Mobile positioning and other ‘big’ data
for tourism statistics
Experience of Statistics Netherlands
Nico Heerschap, Luxembourg, 2015
Gartner’s hype curve (figure: expectations plotted against time, phases 1 to 3; sources placed along the curve: big data, mobile positioning data, social media, internet job vacancies, internet housing market, smartphone measurements, internet prices, Marktplaats/Ebay data)
Content of presentation
– Big data projects SN -> roadmap
– Tourism big data projects SN -> examples and conclusions
– Plans for the coming year
Projects in the big data roadmap SN
– Internet data (job vacancies, housing, prices and tourism)
– All Dutch websites (e-commerce with Google, museums)
– Dutch Ebay data (stopped)
– Mobile positioning data (daytime population and tourism)
– Electronic loops in roads/cameras (together with traffic statistics and databases; tourism?)
– Installation of mobile apps (ICT, mobility and tourism)
– Social media, e.g. Twitter (e.g. social cohesion statistics)
– Polis (= job data of all employees in the Netherlands)
– Credit card and bank data
– Text mining
Installation of mobile apps
– Why: monitor the use of mobile phones (ICT) and mobility (part of which is tourism)
– How: installation of a (standard) app; 300 respondents; registration of location (GPS) and time every five minutes over a 4-week period
– Results: comparable with mobile phone data, but better in several respects:
‐ Background variables available, can be coupled with other data -> rich dataset
‐ Accurate location
‐ Fewer privacy problems
‐ Pop-up questions can be triggered by e.g. time, location and activity
‐ Drawbacks: difficult to get people to join; technical problems
– Use of GPS-trackers in Amsterdam
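A rough sense of the data volume in the app pilot follows from the sampling scheme above. The sketch below assumes that all 300 respondents take part for the whole 4-week period and that every five-minute fix is recorded; dropouts and technical problems will make the real number lower. The function name expected_fixes is hypothetical.

```python
# Upper bound on the number of (location, time) records in the app pilot:
# 300 respondents, one GPS fix every five minutes, over a 4-week period.
# Assumption: full participation and no missing fixes.

RESPONDENTS = 300
INTERVAL_MINUTES = 5
PERIOD_DAYS = 4 * 7  # 4-week period


def expected_fixes(respondents: int, interval_minutes: int, days: int) -> int:
    """Maximum number of location records collected."""
    fixes_per_day = (24 * 60) // interval_minutes  # 288 fixes per day at 5 minutes
    return respondents * fixes_per_day * days


if __name__ == "__main__":
    print("per respondent:", expected_fixes(1, INTERVAL_MINUTES, PERIOD_DAYS))  # 8064
    print("all respondents:", expected_fixes(RESPONDENTS, INTERVAL_MINUTES, PERIOD_DAYS))  # 2419200
```

Even under these optimistic assumptions, the panel yields about 2.4 million points, far fewer devices than passive mobile phone data but with richer background variables per person.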
Example of mobile apps
Source
Call Detail Records / Event Data Detail Records
Call Detail Records can contain many variables, such as:
– the starting time of the call (date and time)
– the call duration
– the identification of the telephone exchange or equipment writing the
record (for > 24 hours)
– call type (voice, SMS, etc.)
– Location (less accurate than GPS)
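The record layout above can be sketched as a Python data class, together with one possible way to turn such records into counts of devices present per cell (mast) and hour, a possible starting point for daytime population estimates. The field names, the pseudonymised device_id and the function devices_per_cell_hour are assumptions for illustration, not the format delivered by the provider or the method used by Mezuro or SN.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CallDetailRecord:
    """One call detail record with the variables listed above (hypothetical field names)."""
    device_id: str      # assumed pseudonymised device identifier
    start: datetime     # starting time of the call (date and time)
    duration_s: int     # call duration in seconds
    equipment_id: str   # telephone exchange or equipment writing the record
    call_type: str      # e.g. "voice" or "sms"
    cell_id: str        # location at cell (mast) level, less accurate than GPS


def devices_per_cell_hour(records):
    """Count distinct devices per (cell_id, hour) slot.

    A device with several records in the same cell within one hour is counted
    once; a device seen in two cells in the same hour is counted in both.
    """
    devices = defaultdict(set)
    for r in records:
        hour = r.start.replace(minute=0, second=0, microsecond=0)
        devices[(r.cell_id, hour)].add(r.device_id)
    return {slot: len(ids) for slot, ids in devices.items()}


if __name__ == "__main__":
    records = [
        CallDetailRecord("a", datetime(2015, 6, 1, 9, 5), 60, "sw1", "voice", "C1"),
        CallDetailRecord("a", datetime(2015, 6, 1, 9, 40), 0, "sw1", "sms", "C1"),
        CallDetailRecord("b", datetime(2015, 6, 1, 9, 15), 120, "sw1", "voice", "C1"),
        CallDetailRecord("b", datetime(2015, 6, 1, 10, 2), 30, "sw1", "voice", "C2"),
    ]
    for (cell, hour), n in sorted(devices_per_cell_hour(records).items()):
        print(cell, hour.isoformat(), n)
```

Because records are only written when a phone interacts with the network, these counts reflect active devices, which is one source of the bias and "too few interactions" issues mentioned in the conclusions.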
Internet data
– Why: no population of tourism accommodations in Dutch Caribbean
– How: internet robots
– Results:
‐ Population available (including the Airbnb segment), but only a small number of units
‐ Often no address information available, or units posted on more than one website (de-duplication of records needed)
‐ It’s important to scrape the right websites
– Museum statistics: use of a database with all Dutch websites; all Dutch URLs coupled to business register information.
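The de-duplication of accommodations posted on more than one website can be sketched as follows: when no address is available, a crude matching key is a normalised accommodation name combined with the island. The field names (name, island, site), the list of generic words and the matching rule are hypothetical; real record linkage would need fuzzier matching and manual checks.

```python
import re
import unicodedata

# Generic words assumed to carry no identifying information (hypothetical list).
GENERIC_WORDS = {"hotel", "apartment", "apartments", "villa", "resort", "the"}


def normalise(name: str) -> str:
    """Lowercase, strip accents and punctuation, drop generic words."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    words = re.findall(r"[a-z0-9]+", without_accents.lower())
    return " ".join(w for w in words if w not in GENERIC_WORDS)


def deduplicate(listings):
    """Group scraped listings into units keyed by (normalised name, island).

    Returns a dict mapping each key to the websites on which the unit was found.
    """
    units = {}
    for item in listings:
        key = (normalise(item["name"]), item["island"].strip().lower())
        units.setdefault(key, []).append(item["site"])
    return units


if __name__ == "__main__":
    scraped = [
        {"name": "Hotel Sea Breeze", "island": "Bonaire", "site": "site-a"},
        {"name": "Sea Breeze Apartments!", "island": "Bonaire ", "site": "site-b"},
        {"name": "Villa Sunset", "island": "Saba", "site": "site-a"},
    ]
    for key, sites in deduplicate(scraped).items():
        print(key, sites)
```

A key this coarse will merge distinct units that share a name and split units whose names differ per website, so it is only a first pass before checking against other sources.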
Mobile positioning data
– Why: daytime population and tourism statistics
– How: use of passive mobile positioning data from one provider (out of three in the Netherlands), via the intermediary Mezuro
– Results: see the following slides
Possible applications
‐ Daytime population
‐ Mobility, of which tourism
‐ Crowd management / safety
‐ Demographics
‐ Border traffic
‐ Crime statistics
‐ Disaster management or safety planning
‐ Use of public services
‐ Sociology (calling patterns, social cohesion, communities)
Daytime population
Tourism statistics: possible questions
Tourism:
• What is the customer base of a recreational area?
• How many (foreign) tourists does a city or a region attract and where do
they come from?
• How does the number of (foreign) tourists differ per day, per week or over a given time period?
• Which cities will be visited by tourists consecutively (tourist-related areas,
joint marketing)?
Events:
• How many people does an event attract and where do the (foreign) visitors come from?
• How successful is an event compared to other events (benchmarking)?
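For the question of where (foreign) visitors come from, roaming devices can in principle be assigned to a home country via the Mobile Country Code (MCC), the first three digits of the IMSI. The sketch assumes access to IMSIs, or to a pseudonym that preserves the MCC; that is an assumption about the data delivery, not something the slides state. The MCC table is deliberately incomplete and the function name is hypothetical.

```python
from collections import Counter

# A few Mobile Country Codes (ITU-T E.212); deliberately incomplete.
MCC_TO_COUNTRY = {
    "204": "Netherlands",
    "206": "Belgium",
    "234": "United Kingdom",
    "262": "Germany",
    "268": "Portugal",
}


def visitors_by_country(imsis):
    """Count distinct devices per home country, using the first three IMSI digits (MCC)."""
    counts = Counter()
    for imsi in set(imsis):  # each device counted once, however often it is observed
        counts[MCC_TO_COUNTRY.get(imsi[:3], "other")] += 1
    return counts


if __name__ == "__main__":
    seen_at_event = [
        "268011234567890",
        "268011234567890",  # same Portuguese device observed twice
        "234150000000001",
        "204080000000002",
        "310260000000003",  # MCC not in the table
    ]
    print(visitors_by_country(seen_at_event))
```

Running the same count for two events gives a crude basis for benchmarking, keeping in mind that only the customers of one provider (and, for foreigners, its roaming partners) are observed.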
Example 1 (Telecom): telecom provider
Source: Vodafone/Mezuro, compiled by SN
Example 2
German tourists (= devices) coming to the Dutch coastal area
Source: Vodafone/Mezuro, compiled by SN
Example 3
Asian tourists
Belgian tourists
Source: Vodafone/Mezuro, compiled by SN
Example 4
Portuguese roaming data during the 2013 UEFA Europa League final in Amsterdam,
Benfica (Portugal) - Chelsea (England)
Source: Vodafone/Mezuro, compiled by SN
Example 4 (continued)
Source: Vodafone/Mezuro, compiled by SN
Conclusions for tourism statistics
– Mobile positioning data have potential for tourism statistics, but so far less than expected (a mobile phone can only be followed for 24 hours):
‐ Complementary to existing statistics
‐ Estimations for smaller areas and smaller timeframes
‐ Events (benchmarking) / crowd management (e.g. Sail Amsterdam). A role for NSI?
‐ Less survey burden
– Trends rather than volumes
– Despite a great deal of research, no real statistics so far (many research papers on ‘smaller’ issues: getting the location right, bias in the data, changes in mast plans, the OD matrix, too few interactions with provider masts)
– Conclusion of pilots with mobile positioning data: use them together with other techniques, such as Wi-Fi hotspots, tourism cards, cameras/loops and physical counts -> especially for crowd management.
Plans for the coming year(s)
Continue with mobile positioning data
‐ Drop the 24-hour limit: day trips and number of overnight stays; flows of tourists (transit); origin-destination matrix; tourism-related areas
‐ Daytime population as a structural statistic
‐ Negotiation with the two other telecom providers / privacy
‐ Talking to potential customers; private-public competition
‐ Resources / sponsors
‐ Quality of the data (e.g. mast plans)
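A minimal sketch of an origin-destination (OD) matrix: per device, consecutive observations in the same area are merged and each move to another area counts as one trip. With a device identifier that is only stable for 24 hours, each trajectory covers at most one day, which is why multi-day trips and overnight stays cannot be followed. The input format and the function od_matrix are hypothetical.

```python
from collections import Counter
from itertools import groupby


def od_matrix(trajectories):
    """Count trips between areas from per-device observation sequences.

    trajectories: dict mapping a device id to a list of (timestamp, area) pairs.
    """
    od = Counter()
    for observations in trajectories.values():
        ordered = sorted(observations, key=lambda obs: obs[0])  # by timestamp
        areas = [area for _, area in ordered]
        stays = [area for area, _ in groupby(areas)]  # merge consecutive repeats
        for origin, destination in zip(stays, stays[1:]):
            od[(origin, destination)] += 1
    return od


if __name__ == "__main__":
    example = {
        "d1": [(3, "Amsterdam"), (1, "Schiphol"), (2, "Schiphol"), (5, "Zandvoort")],
        "d2": [(1, "Schiphol"), (2, "Amsterdam")],
    }
    for (origin, destination), trips in sorted(od_matrix(example).items()):
        print(f"{origin} -> {destination}: {trips}")
```

Merging repeated observations first means that a device pinging the same area many times does not create spurious trips; it does not correct for jumps between neighbouring masts, one of the data-quality issues listed above.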
Internet data
– Population data of Dutch Caribbean (update)
– Room prices / yearly financial accounts
– Museums
– Text mining
Other (big data) sources
Research into other (big data) sources (wish list)
‐ Tourism tax data
‐ Credit card and bank data
‐ Booking.com (booking systems)
‐ Airbnb data
‐ Justice department: centralised data on people who stay in hotels (safety
measures)
– The prerequisites for moving on to phase 3 are mainly non-technological:
‐ Access to data (telecom providers) / Privacy
‐ More (international) exchange of knowledge
‐ Resources (e.g. collaboration with the private sector and universities)
‐ Culture (e.g. management; a separate place on the Internet to publish beta indicators)
‐ Methodology (e.g. bias, representativeness, OD matrix)
‐ More content and customer driven. Not only ICT and methodology