WP7 MULTI DOMAINS

1. Population
2. Tourism/border crossing
3. Agriculture
WP7 TEAM
Janusz Dygaszewicz
Project Manager
of Polish work
Anna Nowicka
Leader
cooperation
Jacek Maślankowski
Coordinator of
methodology
PARTNERS
Piet Daas
Nigel Swier
Coordinator of domain area (SGA-1)
Cooperation on domain area
John Sheridan
Cooperation on domain area
Country leaders of each domain
• Regional statistical office in Poznań
• Regional statistical office in Rzeszów
• Department of Agriculture
• Regional statistical office in Bydgoszcz
• Department of Social Research
• Regional statistical office in Olsztyn
Domains: Population, Tourism/border crossing, Agriculture
The aim of WP7 is to find out how a combination of:
• Big Data sources
• administrative data
• statistical data
may enrich statistical output in the domains Population, Tourism/border crossing, and Agriculture.
WP7 - Future perspectives
Suggest pilots and domains with successful implementation potential for further elaboration in the second wave of pilots in 2018
WP 7 – General tasks
Data access (SGA-1)
Data feasibility (SGA-1)
Data combination (SGA-2)
Summary plus future perspectives (SGA-2)
Milestones and deliverables (SGA-1)
Milestone 1. Progress and technical report of internal WP meeting; by M4
Milestone 2. List of available Big Data sources in the domain(s); by M8
Milestone 3. Recommendation for using two or three Big Data sources in the domain(s); by M12
We are here now
DELIVERABLE
The partial report for each domain, containing basic information on the following; by M13:
• The data access (with legal and privacy aspects)
• The data quality issues
• The methodology (focus also on combining data)
• The technical aspects
TASK 1 & TASK 2
BRAINSTORMING RESULTS
QUESTIONNAIRE RESULTS
MILESTONE 7.4
PROGRESS AND TECHNICAL REPORT OF INTERNAL WP-MEETING
INTERNAL MEETING
MILESTONE 7.5 "LIST OF AVAILABLE BIG DATA SOURCES IN THE DOMAIN(S)"
Why did we do the brainstorm?
• to create the widest possible range of Big Data sources (a cafeteria): possible sources of data that public statistics could use for new developments or to supplement existing ones, so that in later stages these sources can be verified from different points of view and the least useful ones gradually eliminated;
• to analyze as many use cases of Big Data sources as possible;
• to take into account the most popular sources.
Big Data is a new phenomenon, so we should take into account that the potential of each source may still change.
From BRAINSTORMING to the QUESTIONNAIRE
Why did WP7 carry out the questionnaire?
• to find out more about the technical, methodological, quality, and access possibilities in different countries;
• to know the Big Data plans of different countries when recommending sources for the pilots after 2018;
• to recognize the obstacles to using Big Data sources.
The questionnaire was sent to EU countries outside the FPA, because the recommendations reach beyond the period of its duration.
The questionnaire results
Q7: What kind of obstacles have you come across while using Big Data sources?
[Bar chart: percentage of Yes/No answers, on a 0-100% axis, for each obstacle category: Access, Legal aspects, Methodology aspects, Quality, Organization, IT.]
Results
Respondents were asked, among other things, to indicate the domain, assuming that the data source is accessible. For each of the three domains (Population, Agriculture, and Tourism/border crossing), respondents indicated the most promising BD sources:
Population
• Mobile sensors (tracking) – Mobile phone location;
• Social Networks;
• Data produced by Public Agencies;
• Internet searches;
• Websites.
Agriculture
• Mobile sensors (tracking) – Satellite images;
• Data produced by Public Agencies.
Tourism
• Mobile sensors (tracking) – Mobile phone location;
• Data produced by business – Credit cards;
• Websites;
• Traffic sensors.
A joint WP6 & WP7 face-to-face meeting took place on 28-30 June in Warsaw:
1. Exchange of information and experience in using BD sources, and arrangements for the future work of WP7
2. Building the list of potential sources for each domain
3. Preparing and establishing a framework for cooperation in SGA-2
Results
[Diagram linking the domains Population, Tourism/Border crossing, and Agriculture with the aspects Access, Legal, Quality, Organization, IT, and Methodology.]
Results
The results were used to elaborate the next milestone (Milestone 2): "List of available Big Data sources in the domain(s)"; by M8
Use cases for SGA-2
List of available Big Data sources in the domain(s)

Domain: Population
Name of the use case: Everyday citizen satisfaction
Big Data source: Social media/blogs/Internet portals
Responsibility: UK – coordinator (SGA-1), RSO Poznań/Bydgoszcz
Brief overview of the methodology: Webscraping, Data/Text/Web mining, Machine learning

Domain: Agriculture
Name of the use case: Estimation of Agricultural statistics – pilot case study on crop types based on satellite data
Big Data source: Satellite images
Responsibility: Department of Agriculture, RSO Olsztyn + IE
Brief overview of the methodology: combining data – data fusion on radar and optical remote sensing data; data comparison with traditional surveys, e.g. FSS; combining data – administrative data sources with satellite data.

Domain: Tourism/Border Crossing
Name of the use case: Border movement
Big Data source: Traffic sensors
Responsibility: RSO Rzeszów, Department of Social Survey + NL
Brief overview of the methodology: Intertemporal disaggregation and interpolation, Latent variable models, Cross entropy econometrics.
Use case for POPULATION
"Everyday citizen satisfaction"
Responsibility: UK – coordinator, supported by PL, PT
• Data sources: Social media/Blogs/Internet portals
• Methodology: Webscraping, Data/Text/Web mining, Machine learning
• The goal of the case study:
• to examine the level of daily satisfaction by analyzing the content of messages for the presence of defined expressions describing emotional states, e.g., happiness, joy, sadness, fear, anger;
• to present the moods of people associated with various public events;
• to observe morbidity areas, e.g., flu.
• Plan of Combining Datasets: combine the selected data from all Big Data sources in one repository; compare them with the results of social studies to add more detailed information; supplement the information gained in social studies.
Main benefits and value added for official statistics: support for the traditional European Social Survey; a supplement to the research methodology for phenomena that are difficult to measure through traditional polls.
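The first goal above (checking messages for defined expressions of emotional states) can be illustrated with a minimal sketch. The lexicon, the function names, and the sample messages are hypothetical and chosen only for illustration; a real pilot would need a validated lexicon, language-specific processing, and probably the machine learning methods named in the methodology.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon: each emotional state maps to a few defining expressions.
EMOTION_LEXICON = {
    "happiness": {"happy", "glad", "delighted"},
    "joy": {"joy", "joyful"},
    "sadness": {"sad", "unhappy"},
    "fear": {"afraid", "scared"},
    "anger": {"angry", "furious"},
}


def emotions_in(message):
    """Return the set of emotional states whose expressions occur in the message."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    return {emotion for emotion, terms in EMOTION_LEXICON.items() if words & terms}


def daily_emotion_shares(messages):
    """Share of messages per day that mention each emotional state.

    `messages` is an iterable of (day, text) pairs.
    """
    totals = Counter()   # number of messages per day
    hits = {}            # per day: number of messages mentioning each emotion
    for day, text in messages:
        totals[day] += 1
        day_hits = hits.setdefault(day, Counter())
        for emotion in emotions_in(text):
            day_hits[emotion] += 1
    return {
        day: {emotion: count / totals[day] for emotion, count in day_hits.items()}
        for day, day_hits in hits.items()
    }


# Invented sample messages, for illustration only.
sample = [
    ("2016-06-28", "So happy about the concert tonight"),
    ("2016-06-28", "I am angry and sad about the delays"),
    ("2016-06-29", "Scared of the storm"),
]
shares = daily_emotion_shares(sample)
```

Daily shares of this kind could then be compared with social survey results, as the plan of combining datasets suggests.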
Use case for TOURISM/BORDER CROSSING
"Border movement"
Responsibility: PL – coordinator, supported by NL and PT.
• Data sources: Traffic sensors.
• Methodology:
• intertemporal disaggregation and interpolation;
• latent variable models;
• cross entropy econometrics.
• The goal of the case study: to estimate border traffic across the internal borders of the EU (Polish-German, Polish-Slovakian, Polish-Czech and Polish-Lithuanian borders), also with regard to mirror statistics. A partial estimation of domestic traffic may be an extra result.
• Plan of Combining Datasets:
• Intertemporal disaggregation of data where needed (data frequency issue);
• Latent variable model for data imputation for roads without traffic sensors;
• Data smoothing if needed;
• Preparing comparable data sets (common set of variables);
• Combining traffic data from different sources with the cross-entropy econometrics method.
Main benefits and value added for official statistics: decreased burden on interviewers, more detailed results than from the survey alone, data consistent with mirror statistics.
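The data frequency issue in the plan above can be illustrated with a deliberately simple sketch. It is a stand-in, not the method of the pilot: gaps in a sensor series are filled by linear interpolation, and a low-frequency total is split across sub-periods in proportion to that series (pro-rata disaggregation). It does not implement the latent variable model or the cross-entropy method. All names and numbers are hypothetical.

```python
def fill_gaps_linear(series):
    """Fill None gaps by linear interpolation; gaps at the edges copy the nearest value."""
    known = [i for i, value in enumerate(series) if value is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i in range(len(series)):
        if filled[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def disaggregate_pro_rata(period_total, indicator):
    """Split one low-frequency total across sub-periods in proportion to an indicator."""
    indicator_sum = sum(indicator)
    if indicator_sum <= 0:
        # No usable signal: fall back to an even split.
        return [period_total / len(indicator)] * len(indicator)
    return [period_total * value / indicator_sum for value in indicator]


# Invented numbers: monthly sensor counts with one missing month,
# and a quarterly survey total of border crossings.
sensor_counts = fill_gaps_linear([100, None, 300])
monthly_estimates = disaggregate_pro_rata(1200, sensor_counts)
```

The monthly estimates add up to the quarterly total by construction, which is the main property any disaggregation step needs to preserve.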
Use case for AGRICULTURE
Estimation of Agricultural statistics – pilot case study on crop types based on satellite data
Responsibility: PL – coordinator, supported by IE.
• Data sources: Satellite images, administrative data, in situ surveys.
• Methodology:
• combining data – data fusion on radar and optical remote sensing data;
• data comparison with traditional surveys, e.g. FSS;
• combining data – administrative data sources with satellite data.
• The goal of the case study: Crop type: look at the types of crops being grown and see whether they can be identified accurately from the imagery; analysis of the possibilities of using satellite images.
• Plan of Combining Datasets: Data fusion – combining data sources by spatial reference.
Main benefits and value added for official statistics: increased quality of the agricultural surveys; decreased respondent burden; more detailed data published by official statistics; potential decrease in the cost of conducting surveys.
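Combining data sources by spatial reference, and checking whether crop types can be told accurately from imagery, can be sketched as a join on a shared spatial key followed by an agreement rate. The parcel identifiers, crop labels, and function names below are hypothetical; the sketch assumes that the satellite classification has already been reduced to one label per parcel.

```python
from collections import Counter


def fuse_by_parcel(admin_records, satellite_labels):
    """Join declared crops and satellite-derived crops on a shared parcel ID."""
    return {
        parcel_id: (declared, satellite_labels[parcel_id])
        for parcel_id, declared in admin_records.items()
        if parcel_id in satellite_labels
    }


def agreement(fused):
    """Return the overall agreement rate and counts of (declared, classified) pairs."""
    if not fused:
        raise ValueError("no parcels are present in both sources")
    pairs = Counter(fused.values())
    matches = sum(
        count for (declared, classified), count in pairs.items()
        if declared == classified
    )
    return matches / len(fused), pairs


# Invented records: administrative declarations and satellite classification.
admin = {"P1": "wheat", "P2": "maize", "P3": "rapeseed", "P4": "wheat"}
satellite = {"P1": "wheat", "P2": "wheat", "P3": "rapeseed", "P5": "maize"}

fused = fuse_by_parcel(admin, satellite)
rate, pairs = agreement(fused)
```

Parcels present in only one source (P4 and P5 here) drop out of the join; in practice they would deserve a separate look, since they point to coverage differences between the sources.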