
New methodological challenges for new societal phenomena
Barteld Braaksma
Stream-of-Consciousness
– Cross-border statistics
– Urban data (poverty, local economy, safety, education)
– From quantitative to qualitative analysis
– Remote linking and blockchain
– Interactive visualisation
– Trade in data
– Sharing economy
– Internet economy
– Personalised data (sensors etc.)
– M3 and NDSW
– Process mining
– Uncertainty
Trade in data seems very important, but there are no good, er, data on it
McKinsey Global Institute study (relying on rough measures)
– Data zipped across borders at a rate of 211 terabits per second in 2014
– Equivalent to 1.3 Libraries of Congress per second, and 45 times more than in 2005
– McKinsey reckons this contributed more to global growth in 2014 than trade in goods
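The McKinsey comparisons are rough measures, and a few lines of arithmetic show what they implicitly assume. This sketch only rearranges the three figures quoted above; the implied size of the Library of Congress and the implied 2005 flow are derived here, not reported in the study.

```python
# Figures quoted from the McKinsey Global Institute study on the slide.
rate_2014_tbps = 211       # cross-border data flow in 2014, terabits per second
loc_per_second = 1.3       # "equivalent to 1.3 Libraries of Congress per second"
growth_since_2005 = 45     # "45 times more than in 2005"

# What the comparisons implicitly assume (derived here, not reported by McKinsey).
implied_loc_terabits = rate_2014_tbps / loc_per_second
implied_loc_terabytes = implied_loc_terabits / 8   # 8 bits per byte
implied_rate_2005_tbps = rate_2014_tbps / growth_since_2005

print(f"One Library of Congress ~ {implied_loc_terabits:.0f} terabits "
      f"(~{implied_loc_terabytes:.0f} terabytes)")
print(f"Implied 2005 flow ~ {implied_rate_2005_tbps:.1f} terabits per second")
```

Note that none of this says anything about value, which is exactly the statisticians' problem listed below.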
Contributions of data to the economy
1. Spurs conventional trade in goods and services
2. A growing share of products being traded is digital
3. Data are increasingly important lubricants for global supply chains
Statisticians face three big problems
1. Current trade data lack information on digital services
2. No clear correlation between volume and value of data, and possibly a lot of double-counting
3. Identifying where data add value is nightmarish: bytes crossing borders are mostly unpriced
Anecdotal evidence of value
– Companies claiming to use data to make savings and generate value
– Willingness of companies to invest in new cables (a transatlantic cable costs $200m-300m)
– Governments are trying to value and regulate data flows and stores
Economics in the age of big data
• Increasing dependence on quantitative data (often not open)
• Look ‘inside’ firms (micro level)
• New indicators (Billion Prices Project, Economic Policy Uncertainty Index)
• Empirical economics
• More modelling! (bigger data requires leaner models)
Measuring the internet economy in the Netherlands
Barteld Braaksma, Lotte Oostrom
History: take-off and landing
– October 2013: initial discussion with Google (at Google's initiative): is CBS willing to repeat a study that Google did with NIESR and GrowthIntelligence in the UK?
– February 2014: “Go” from CBS management
– May 2014: one-pager describing the general layout of the study (guaranteeing research independence for CBS, among other things)
– October 2014: Google has secured access to the Google Translate API
– November 2014: formal CBS offer for contract research supported by Google
– December 2014: first discussions with the Chamber of Commerce about use of the commercial register by GrowthIntelligence
– June 2015: agreement with the Chamber of Commerce
– September 2015: GrowthIntelligence drops out; talks with Dataprovider start
– January 2016: project starts
– April 2016: first consultation of the external review board
– September 2016: project report finalised
– October 2016: press conference at Nieuwspoort
Press conference, 7 October 2016
- Report presented to Henk Kamp, Dutch Minister of Economic Affairs
- Well received by the press (extensive media coverage)
- Feared criticism from journalists did not materialise
- All's well that ends well
Aim of the study
Main research question:
“What is the importance of the internet economy to the Dutch economy?”
The aim of the research project was fourfold:
1. Determine a pragmatic definition of “the internet economy”
2. Show the importance and size of the internet economy in NL
3. Show the possibilities of new measurement methods
4. Explain differences from regular statistics/concepts
Definition of the internet economy
All Dutch businesses, by category (method; examples):
A. No income generated: businesses without a website.
   Method: businesses without a website
   Examples: hairdresser without website, bakery without website, freelancer without website
B. Income generated indirectly through the internet (internet presence).
   Method: businesses with a website that do not belong to category C, D or E
   Examples: hairdresser with website, Shell, DSM
C. Income generated directly through the internet: online stores.
   Method: businesses with a website and a high e-commerce certainty
   Examples: Bol.com, Wehkamp, Bijenkorf
D. Income generated directly through the internet: other online services.
   Method: businesses with a website that, according to their most important keywords, belong to this category
   Examples: Relatieplanet, Funda, Spil Games
E. Income generated with the internet: internet-related ICT.
   Method: businesses with a website that, according to their most important keywords, belong to this category
   Examples: web designers, hosting, internet marketing
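The category definitions amount to a set of decision rules. The sketch below expresses them as a single function, assuming each business already carries a website flag, an e-commerce certainty score and its most important keywords (as the Dataprovider data suggest). The threshold, the keyword lists and the order in which C, D and E are checked are hypothetical illustrations, not the study's actual rules.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical keyword lists and threshold; the study's actual values are not given.
ONLINE_SERVICE_KEYWORDS = {"dating", "real estate listings", "online games"}
INTERNET_ICT_KEYWORDS = {"web design", "hosting", "internet marketing"}
ECOMMERCE_THRESHOLD = 0.8  # assumed cut-off for "high e-commerce certainty"


@dataclass
class Business:
    name: str
    has_website: bool
    ecommerce_certainty: float = 0.0  # assumed score in [0, 1]
    top_keywords: set[str] = field(default_factory=set)


def classify(b: Business) -> str:
    """Assign category A-E; the order of the checks is an assumption."""
    if not b.has_website:
        return "A"  # no income generated via the internet
    if b.ecommerce_certainty >= ECOMMERCE_THRESHOLD:
        return "C"  # online store
    if b.top_keywords & ONLINE_SERVICE_KEYWORDS:
        return "D"  # other online services
    if b.top_keywords & INTERNET_ICT_KEYWORDS:
        return "E"  # internet-related ICT
    return "B"  # website present, but not C, D or E


examples = [
    Business("hairdresser", has_website=False),
    Business("hairdresser", has_website=True, top_keywords={"haircut"}),
    Business("webshop", has_website=True, ecommerce_certainty=0.95),
    Business("dating site", has_website=True, top_keywords={"dating"}),
    Business("web designer", has_website=True, top_keywords={"web design"}),
]
for b in examples:
    print(b.name, classify(b))
```

Because B is defined as "website, but not C, D or E", it is naturally the fall-through case; any overlap between C, D and E has to be settled by an explicit priority order such as the one assumed here.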
Merging to the GBR
Diagram (two examples): websites in the Dataprovider dataset (Website 1–5) linked to GBR units (Kvknr 1–4, KAU 1–3, EG 1–2); the linked units form the population
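The merge step can be pictured as a chain of lookups: website to Chamber of Commerce number (Kvknr), Kvknr to KAU, KAU to enterprise group (EG). This is a minimal sketch with hypothetical toy identifiers mirroring the diagram labels; it shows why several websites can collapse onto one unit and why some websites cannot be linked at all. The real matching procedure is not reproduced here.

```python
# Hypothetical toy records; identifiers mimic the diagram labels.
website_to_kvk = {
    "website1": "kvk1",
    "website2": "kvk2",
    "website3": "kvk3",
    "website4": "kvk3",   # two websites can belong to one registration
    "website5": None,     # no Chamber of Commerce number found
}
kvk_to_kau = {"kvk1": "KAU1", "kvk2": "KAU2", "kvk3": "KAU3", "kvk4": "KAU3"}
kau_to_eg = {"KAU1": "EG1", "KAU2": "EG2", "KAU3": "EG2"}


def link(website):
    """Follow website -> Kvknr -> KAU -> EG; return None if the chain breaks."""
    kvk = website_to_kvk.get(website)
    if kvk is None:
        return None
    kau = kvk_to_kau.get(kvk)
    return (kvk, kau, kau_to_eg.get(kau))


linked = {w: link(w) for w in website_to_kvk}
merged = {w: l for w, l in linked.items() if l is not None}
units_with_website = {kau for (_, kau, _) in merged.values()}
print(len(merged), "websites merged;", len(units_with_website), "unique KAUs with a website")
```

Counting distinct units rather than websites is what turns a website count into a business count; here the business unit is taken to be the KAU, which is an assumption for illustration.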
Calculating indicators
The population (KAU 1, KAU 2, …) is linked to CBS data:
– GBR: size, sector, age, regional distribution
– STS/tax data: turnover
– Wages and salaries data: employees
– Other: …
Plus models, assumptions and decision rules (!)
Results, for the total and for categories A to E: companies (number), turnover (mln euro), employees (number)
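Once every unit carries a category and linked turnover and employment figures, the results table is a grouped aggregation. This sketch uses hypothetical records and plain Python; the models, assumptions and decision rules mentioned above (for missing or inconsistent source data) are not reproduced.

```python
from collections import defaultdict

# Hypothetical linked records: (KAU id, category, turnover in mln euro, employees)
records = [
    ("KAU1", "A", 0.1, 1),
    ("KAU2", "B", 2.5, 12),
    ("KAU3", "C", 0.4, 1),
    ("KAU4", "C", 1.1, 3),
    ("KAU5", "D", 3.0, 10),
    ("KAU6", "E", 5.2, 40),
]


def indicators(rows):
    """Companies, turnover and employees per category, plus a total row."""
    table = defaultdict(lambda: {"companies": 0, "turnover": 0.0, "employees": 0})
    for _, cat, turnover, employees in rows:
        for key in (cat, "Total"):
            table[key]["companies"] += 1
            table[key]["turnover"] += turnover
            table[key]["employees"] += employees
    return dict(table)


result = indicators(records)
for cat in ["Total", "A", "B", "C", "D", "E"]:
    row = result[cat]
    print(f"{cat:>5}: {row['companies']} companies, "
          f"{row['turnover']:.1f} mln euro, {row['employees']} employees")
```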
Merging to the GBR: results
– ±2.5 million websites in the Dataprovider dataset
– Around 900 thousand of these are company websites (according to Dataprovider)
– ±840 thousand websites merged to the GBR
– ±550 thousand unique business units in the GBR with a website
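The funnel figures imply match rates that the slide does not state explicitly. This sketch computes them from the rounded numbers above, so the results are approximate as well.

```python
# Rounded figures from the slide.
funnel = [
    ("websites in Dataprovider dataset", 2_500_000),
    ("company websites (according to Dataprovider)", 900_000),
    ("websites merged to GBR", 840_000),
]
units_with_website = 550_000

# Share of each step relative to the previous one.
step_shares = [n / prev for (_, prev), (_, n) in zip(funnel, funnel[1:])]
for (label, n), share in zip(funnel[1:], step_shares):
    print(f"{label}: {n:,} ({share:.1%} of previous step)")

websites_per_unit = funnel[-1][1] / units_with_website
print(f"merged websites per business unit: {websites_per_unit:.2f}")
```

Roughly 93% of company websites could be merged, and on average about 1.5 merged websites belong to one business unit.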
Participation in the internet economy
Businesses without a website
– The majority (more than 80%) consist of only one employee
– Almost all large businesses (250 or more employees) have a website
– More than 80% of medium-sized businesses (50–249 employees) have a website
– ‘Information and communication’, ‘Public administration’, ‘Education’, ‘Wholesale and retail trade’ and ‘Manufacturing’ businesses most often have a website
– Almost half (45%) of businesses without a website were founded less than five years ago
– There are relatively more businesses without a website especially in the south-west of the Netherlands and in the province of Noord-Holland
Heatmap with/without website
Category C: Online stores
– Total of 28,500 online stores in 2015 (2% of all businesses)
– Nearly 70,000 webshops belong to these 28,500 businesses
– Almost 75% have only one employee
– Relatively ‘young’ companies compared to the other categories
– Half of the webshops belong to retail, almost 15% to wholesale
– Over 30% of online stores are found in less obvious sectors such as “Information and communication” or “Manufacturing”
– Important discrepancies with the webshops registered in the GBR (!)
Category D: Online services
– Approximately 5,700 companies with over 8,300 websites
– Relatively high productivity
– Companies are relatively small: on average they have 10 employees, and only a handful have more than 150 employees
– Nearly 40% of all online services businesses are under five years old
– Relatively many are based around Amsterdam and Groningen
– These businesses had a turnover of €10 billion (approx. 1% of the total for the Dutch business economy) and account for a total of 26,000 jobs
– Category not covered in the NACE classification (!)
Regional distribution of online services
NACE codes for the online services
Category E: Internet-related ICT
– Around 16 thousand companies in category E
– More than 40% belong to NACE code 62 (Service activities in the field of information technology). Other dominant NACE codes are 70 (Holdings), 73 (Advertising and market research), 74 (Design) and 63 (Service activities in the field of information)
– The companies in this category are slightly older and larger than the webshop and online services businesses: 36% were founded more than 10 years ago
– This category has a relatively high number of mid-sized companies (10–50 employees)
– In 2015, these more than 16,000 Category E businesses had a total turnover of €71 billion and provided 273,000 jobs
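The claim that online services have relatively high productivity can be checked roughly against the category E figures. This sketch uses turnover per job as a crude proxy; it is not value added per job, which is what a proper productivity measure would use.

```python
# Figures as reported on the slides (2015): turnover in euro, number of jobs.
categories = {
    "D: online services": (10e9, 26_000),
    "E: internet-related ICT": (71e9, 273_000),
}

# Turnover per job: a crude proxy, not value added per job.
turnover_per_job = {name: t / jobs for name, (t, jobs) in categories.items()}
for name, value in turnover_per_job.items():
    print(f"{name}: about EUR {value / 1000:,.0f} thousand turnover per job")
```

On this proxy, category D generates roughly €385 thousand per job against roughly €260 thousand for category E.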
Regional distribution of the Internet related ICT
High-level interest…
Prime Minister Mark Rutte and Prince Constantijn (Twitter and Facebook)
Future plans
– Improve the method by using machine learning/AI
– Obtain time series for several years
– Repeat the study in other countries (Dataprovider already has data on more than 40 countries!)
– Revisit/refine the definition of the internet economy
– Turn into regular statistics?
– Similar approach for cyber security?
– Or family-owned businesses?
Developing cross-border statistics
Johan van der Valk
EU perspective: many CBC regions
Opportunities: jobs available in region
X-border statistics not available?
– Need for comparable data on border regions: detailed at a low regional level, and harmonised
– Sources: admin data (+ big data), but not surveys
– EU data: harmonised (surveys), but not enough regional detail
– National data: regional detail (admin), but not harmonised
Income data (inter)national
Maps; sources: CBS, 2011; Eurostat Regional Yearbook 2014, 2011
X-border information is challenging
Cross-border statistics are not easy:
‐ Call for regional detail
‐ Internationally comparable methods
‐ Internationally comparable sources
‐ Measurement of flows (persons & goods)
Many potential users!
– Local authorities (province, border cities)
– Euregions
– National authorities
– International institutions (Benelux, EC)
– Academic community
Cooperation of CBS and IT.NRW
Publication on the labour market in the border region NL-NRW (August 2015)
Cross-border workers
DE -> NL: 39,000 cross-border workers
NL -> NRW: 9,000 cross-border workers
Cross-border information needed!
DG REGIO needs data to:
‐ Monitor INTERREG projects
‐ Obtain information comparable across regions
‐ Identify obstacles to cross-border cooperation
DG REGIO is prepared to finance pilot projects of limited scope
‐ Call for proposals issued September 2016
Conclusions
X-border stats: Need for collaboration
• To use admin data
• To explore big data
Financing is an issue
Exchange on international level is useful
Methods must be available (SAE, …)
Happiness meter
http://www.cbs-geluksmeter.nl
Jacqueline van Beuningen
Linda Moonen
Idea
– Interactive infographics on wellbeing
– Useful for indices and composite indicators
– Reusable
– Flexible
– Proof of concept, meant to be developed further
– Part of CBS innovation program
Goals
– Insight into complex statistics
– Reach the public at large
– Test reusability aspects
‐ Outsource development (with start-up Wayform)
‐ Support and maintenance internally
Long term goals
– Develop standard approach
– Reduce costs
Start screen
Happiness meter
Background characteristics
How about your province?
Personal questions
How happy are you about your financial situation?
Happiness score
Overall and elements
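The happiness meter shows an overall score next to its elements, the typical shape of a composite indicator. This sketch assumes the overall score is a weighted mean of element scores on a 0–10 scale; apart from "financial situation", the element names and the equal weights are hypothetical, and the meter's actual scoring method is not given here.

```python
# Hypothetical element scores on a 0-10 scale; element names other than
# "financial situation" and the equal weights are assumptions.
scores = {"financial situation": 7.0, "health": 8.0, "work": 6.0, "social contacts": 9.0}
weights = {name: 1.0 for name in scores}


def composite(scores, weights):
    """Weighted mean of the element scores."""
    total_weight = sum(weights[name] for name in scores)
    return sum(scores[name] * weights[name] for name in scores) / total_weight


print(f"overall happiness score: {composite(scores, weights):.1f}")
```

Keeping the weights as separate data is one way to make such a visualisation reusable for other indices, in line with the reusability goal above.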
Evaluation
– Publicity campaign launched on Blue Monday
– Users: 3,939 in the first two weeks
– Attention from the print press and national/local TV
– Devices used: desktop, mobile phone, tablet
– Users come from the CBS site, direct link typing, Facebook
– Drop-out analysis
– Recommendations for follow-up
Process mining
Johan Lammers
What is process mining?
Diagram: process questions + process data → event log → process mining
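An event log needs at least a case identifier, an activity and a timestamp per event. A common first step in process mining is counting how often one activity is directly followed by another within the same case, the kind of structure behind the process maps that tools draw. This is a minimal sketch on a hypothetical data-collection log; the case IDs and activity names are invented for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log: (case id, activity, timestamp)
event_log = [
    ("q1", "send questionnaire", "2016-03-01 09:00"),
    ("q1", "receive response", "2016-03-08 14:00"),
    ("q1", "edit", "2016-03-09 10:00"),
    ("q2", "send questionnaire", "2016-03-01 09:05"),
    ("q2", "send reminder", "2016-03-15 09:00"),
    ("q2", "receive response", "2016-03-20 11:00"),
    ("q2", "edit", "2016-03-21 16:00"),
]


def directly_follows(log):
    """Count how often activity a is directly followed by b within the same case."""
    traces = defaultdict(list)
    for case, activity, ts in log:
        traces[case].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), activity))
    counts = Counter()
    for events in traces.values():
        events.sort()  # order each case by timestamp
        acts = [a for _, a in events]
        counts.update(zip(acts, acts[1:]))
    return counts


for (a, b), n in directly_follows(event_log).most_common():
    print(f"{a} -> {b}: {n}")
```

Paths such as a reminder step that occurs in only some cases become visible immediately, which is the "fast insight based upon facts" argument below.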
Why process mining?
• Fast insight into the process
• Insight based on facts
• Exact determination of issues
• More depth and breadth
• More views
• Extend/expand to the chain
• Stimulates a culture of improvement
Types of waste
Examples at CBS
• Logistic processes in data collection
• Batch and interactive statistical processes
(profiling, editing, analysis)
• General business register production process
• Supports Lean Six Sigma program
• Use of the Disco tool by Fluxicon (spin-off of Eindhoven University of Technology)
• Useful for statistics compilation as well?
Twitter in Horst aan de Maas
Marco Parigini (UM)
Hans Schmeets (CBS / UM)
Horst aan de Maas: rural municipality with 8 villages, 40 thousand inhabitants and 2 thousand Polish migrants
From quantitative to qualitative
– Four billion tweets in NL
– 800 thousand can be attributed to Horst aan de Maas
– Narrowing down to topics: even less volume
– Quantitative analysis may have to be complemented with qualitative analysis
– Similar issues in other text-based media?
17 areas of Horst aan de Maas
Twitter in 17 areas, 2009-09/2015
Chart: per area, % of the Horst aan de Maas population vs % of Twitter users
Friendships in Horst aan de Maas
"I am my words"
Top 10 most popular
Topics
1. Horst
2. Sevenum
3. Joa
4. Venray
5. Veur
6. Merge
7. Nit
8. Oet
9. Toverland
10. Reindonk nieuws
Hashtags
1. #horst
2. #gtst
3. #ajax
4. #psv
5. #pvv
6. #twexit
7. #3fm
8. #weer
9. #koerier
10. #dtv
Polish (labour) migrants: an issue?
‐ 533 tweets containing Polen/Poolse (Dutch for Poland/Polish), out of 800,000
‐ Only 40 negative
‐ Most tweets about soccer (European Championship 2012)
‐ Not many other tweets about migrants
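Selecting tweets by keyword is the quantitative part of this analysis; judging whether a hit is negative, or is about soccer rather than migrants, still requires reading (the qualitative part). This sketch shows a whole-word, case-insensitive match on the two search terms from the slide; the example tweets are hypothetical.

```python
import re

# Whole-word, case-insensitive match on the two search terms from the slide.
PATTERN = re.compile(r"\b(polen|poolse)\b", re.IGNORECASE)

tweets = [  # hypothetical examples (Dutch, like the source data)
    "Polen speelt vanavond op het EK!",                  # Poland plays tonight at the Euros
    "Veel Poolse seizoenarbeiders in de aspergeteelt",   # many Polish seasonal workers in asparagus farming
    "Mooi weer in Horst vandaag",                        # nice weather in Horst today
    "Oranje tegen Polen, wie kijkt er mee?",             # Netherlands vs Poland, who is watching?
]


def count_mentions(texts):
    """Number of texts containing at least one of the search terms."""
    return sum(1 for t in texts if PATTERN.search(t))


print(count_mentions(tweets), "of", len(tweets), "tweets mention Polen/Poolse")
```

The word boundaries keep words like "polenta" out of the count; two of the three hits here are about soccer, illustrating why a keyword count alone says little about attitudes towards migrants.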
Polen + Poolse
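Economic crisis
Counts per keyword (the first figure is the total, “Totaal”)
Economie (economy): 1224 / 179 / 133
Crisis (crisis): 471 / 52 / 52
Inflatie (inflation): 19 / 0 / 1
Duurder (more expensive): 177 / 7 / 153
Financiën (finances): 26 / 19 / 12
Belasting (tax): 262 / 26 / 34
Werkloosheid (unemployment): 132 / 4 / 15
Werkgelegenheid (employment): 88 / 16 / 6
Banen (jobs): 378 / 46 / 17
Recessie (recession): 44 / 6 / 2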
CBS in Horst aan de Maas?
Total 127
12
9
Urban population: 50% worldwide (70% expected), 75% in Europe, 90% in the Netherlands
Progressing towards a data-driven society
- CBS Urban Data Center (UDC) is a joint project between CBS and city government(s)
- Aimed at broadening, deepening and enhancing municipal statistics by linking them to CBS data and contributing CBS expertise
- Municipality interests: added value of data for the local community, businesses and visitors
- Early July: initiative to create UDC Eindhoven
- 22 September: launch at Smalle Haven
- In 2017: evaluation
ISO 37120 standard: sustainable development at local level (collaboration with WCCD)
- 46 core and 56 supporting indicators
- Measures performance of local city services and quality of life
- Amsterdam and Rotterdam already certified
- National working party ISO 37120 established by Geonovum
- CBS has mapped currently available information
Launch on 28 November
Three more topics (almost there…)
Remote linking and processing of (big) data sets
• M3 proposal (Man, Molecule, Society) of the KNAW (Royal Netherlands Academy of Arts and Sciences): combine statistical databases, DNA data and large medical repositories (like Lifelines, data on 165 thousand people over 30 years) in a safe way for research purposes
• National Data Infrastructure for the Social Sciences (launched 27 October 2016)
• How to combine large and very privacy-sensitive datasets at a distance?
• Remote encryption
• Secure multiparty computing
• Blockchain and distributed ledger techniques
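Secure multiparty computing covers many protocols; one of the simplest building blocks is additive secret sharing, which lets parties compute a sum without any of them seeing the others' inputs. This sketch assumes honest-but-curious parties and leaves out the communication layer entirely; the data holders and counts are hypothetical, and real deployments for linked statistical and medical data would need far more (linking keys, protection against malicious parties, disclosure control on outputs).

```python
import secrets

MODULUS = 2**61 - 1  # all arithmetic is done modulo this (assumed) prime


def share(value, n_parties=3):
    """Split an integer into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def reconstruct(shares):
    return sum(shares) % MODULUS


# Two hypothetical data holders each secret-share a count;
# party i receives share i of both inputs.
a_shares = share(1200)
b_shares = share(345)
# Each party adds its two shares locally, without seeing either input.
sum_shares = [(x + y) % MODULUS for x, y in zip(a_shares, b_shares)]
print(reconstruct(sum_shares))
```

Any single share is a uniformly random number, so only the combination of all parties' results reveals the total.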
Crowd sourcing and citizen science
• Astronomers use laypeople to analyse large sets of data (like stellar photographs and IR scans)
• Sensors to measure noise and air quality in your backyard are emerging
• Time to invest in methods to put the crowd to work?
• We already use amateurs to count birds and so on, and post-process the data using Bayesian techniques!
• We could advise people (and municipalities) where to put their sensors…
Uncertainty measures
• Many statistics are not based on sample surveys
• Traditional measures for uncertainty don’t apply
• (And even with traditional sources we fool ourselves)
• There is a need for new measures of uncertainty
• For our own statistics but maybe also to judge others’
Can we do something similar to what meteorologists do?
The Times They Are A-Changin'
We badly need:
– Data and methods to support the digital economy
– Data and methods for local and X-border information
– Mix of quantitative and qualitative methods
– New uncertainty measures
– Interactive visualisations
– Safe remote techniques
– More information on Dylan