DwB Standard Presentation

Improving transnational access to microdata
Proof of Concept for a European Network
of Secure Remote Access Systems
Presentation and demonstration
Roxane Silberman (CNRS), Jara Kampmann (Gesis), Maurice Brandt (Destatis), Eric Debonnel (Genes), Katharina Kinder-Kurlanda (Gesis), Mathias Zenke (Destatis), Philippe Donnay (Genes), Kamel Gadouche (Genes)
Data without Boundaries, European Data Access Forum 2015,
Luxembourg, March 25, 2015
Proof of concept : Main goals
• Building a trans-border network of secure remote access systems will :
 Allow access in the same environment to several data sources from different data-providers
 Improve collaboration between researchers in Europe
 Improve collaboration between data-providers in Europe
 Provide a secure infrastructure for hosting confidential microdata at a European level
• The main goal of the POC is to show how such a secure network could be defined and deployed
Proof of concept : Main security features
• High security level :
 Closed environment
 Input and output are controlled
 Connections are permitted only from accredited sites (IP address)
 Confidential data must remain inside the secure environment to
prevent datafile extraction
 Strong authentication is mandatory
• The same security level everywhere across the network :
 Every node (site) with the same security level
 Every access point with the same security level
 Every user (researchers and staff) authenticates with the same
procedure
 All the communications have to be encrypted
• By design, the network should be compliant with ISO 27001 (IT security
certification)
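The access conditions listed on this slide (accredited site, strong authentication, encrypted communication) can be read as a gate in which every condition must hold. The following is a minimal sketch of that idea, not the POC's actual implementation: the address ranges are documentation-only example blocks, and the names ACCREDITED_NETWORKS, is_accredited and may_connect are hypothetical.

```python
import ipaddress

# Hypothetical accredited site ranges (reserved documentation address blocks,
# not real sites of the network).
ACCREDITED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_accredited(source_ip: str) -> bool:
    """Return True if the source address belongs to an accredited site."""
    address = ipaddress.ip_address(source_ip)
    return any(address in network for network in ACCREDITED_NETWORKS)


def may_connect(source_ip: str, strongly_authenticated: bool, encrypted: bool) -> bool:
    """Allow a connection only if all three conditions hold."""
    return is_accredited(source_ip) and strongly_authenticated and encrypted


if __name__ == "__main__":
    print(may_connect("192.0.2.10", True, True))   # accredited site, all checks pass
    print(may_connect("203.0.113.5", True, True))  # address outside accredited ranges
```

Failing any single condition denies access, which matches the "same security level everywhere" principle: no node or access point can relax one check on its own.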
Proof of concept : Main features
• Flexibility of the network is a key point :
 The network should be able to manage different legal
frameworks (and different interpretations…)
 The network should be able to accept new members (easy to
join)
 The network has to be independent from existing RDC infrastructure (completely separated and isolated) : it does not aim to replace the local RDCs
• For flexibility, 3 different levels of trust between partners of the
network have been defined :
 Full trust inside the secured environment (free exchange of
microdata)
 Medium trust (exchange of lightly controlled output data)
 Minimum trust (exchange of fully controlled output data)
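The three trust levels can be thought of as a small policy table that a node consults before exchanging anything with a partner. A minimal sketch under that assumption follows; the names Trust, EXCHANGE_POLICY, can_exchange_microdata and required_output_check are hypothetical, and treating full trust as requiring no output check for exchanges inside the secured environment is an interpretation of the slide, not a stated rule.

```python
from enum import Enum


class Trust(Enum):
    FULL = "full"
    MEDIUM = "medium"
    MINIMUM = "minimum"


# What each trust level lets two partners exchange.
# "output_check" is None under full trust because microdata itself may be
# exchanged inside the secured environment (assumption, see lead-in).
EXCHANGE_POLICY = {
    Trust.FULL: {"microdata": True, "output_check": None},
    Trust.MEDIUM: {"microdata": False, "output_check": "light"},
    Trust.MINIMUM: {"microdata": False, "output_check": "full"},
}


def can_exchange_microdata(level: Trust) -> bool:
    """Microdata may be exchanged freely only under full trust."""
    return EXCHANGE_POLICY[level]["microdata"]


def required_output_check(level: Trust):
    """Return the output control required before exchange, or None."""
    return EXCHANGE_POLICY[level]["output_check"]
```

Keeping the policy in one table is one way to support the flexibility goal: a new partner only needs a trust level assigned, not new code.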
Proof of concept : Main features
• Usability :
 The virtual research environment (VRE), a Windows desktop, should provide every standard tool for processing microdata (SAS, Stata, R, SPSS, etc.)
• Collaboration :
 Users should be able to share data and results according to certain rules (legal or organisational)
• Management tools (not implemented in the POC) :
 RDC staff should be able to manage the network : project creation, output checking, etc.
Proof of concept : Context
• 3 institutions involved :
 DESTATIS – GERMANY – WIESBADEN
 GESIS – GERMANY – COLOGNE
 GENES – FRANCE – PARIS
• A previous DwB study has shown that existing RDC IT infrastructures are very different and therefore cannot be securely connected.
• The POC was designed and implemented from scratch, without using any existing RDC IT infrastructure.
• IT infrastructure implemented : GESIS node located in Cologne, DESTATIS node located in Wiesbaden, CASD node and the central node located in Paris
Proof of concept : Organisation
• September 2014 : The decision to implement this POC was taken
• October 2014 – January 2015 : Installation of the 4
infrastructures at CASD
• February 2015 : The servers (nodes) were sent to
GESIS and DESTATIS
• March 2015 :
 Physical installation of the servers
 Connections to the central node
 First tests and improvements
DwB Proof of concept diagram
DEMONSTRATION SCENARIO
Scenario overview
• The project is a study of female employment and opinions on gender roles, comparing France and Germany.
• The project involves 2 researchers in this fictional scenario :
 Jara (Gesis) plays a researcher from Humboldt University in Berlin.
 Kamel (CASD) plays a researcher from the Toulouse School of Economics.
• It requires access to confidential microdata from INSEE provided at CASD in Paris and from DESTATIS in Germany, as well as microdata provided at GESIS in Cologne.
• Jara does not speak French and is not very familiar with the French childcare context.
• Jara works with SPSS and Kamel uses SAS.
Scenario context
• Rules for the demonstration scenario :
 Rule 1 : Official German microdata must not be transferred to
another country (legal constraint)
 Rule 2 : French microdata could be transferred to Germany in a
secure environment
 Rule 3 : Gesis research microdata could be transferred to France
or Destatis in a secure environment
• Access point rules :
 Jara and Kamel use the same thin client (DwB SDBox)
 DESTATIS allows access for residents as well as non-residents, but only from accredited access points in Germany. Humboldt University is accredited.
 France allows access from researchers' own offices in France as well as in other European countries
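Rules 1 to 3 of the demonstration scenario can be expressed as a small decision function over the data provider and the target node. This is a sketch under stated assumptions, not the POC's code: the names Dataset, NODE_COUNTRY and may_transfer are hypothetical, "or Destatis" in Rule 3 is read as "any node of the network", and every transfer is assumed to require the secure environment.

```python
from dataclasses import dataclass

# Country of each node in the POC (GESIS and DESTATIS in Germany, CASD in France).
NODE_COUNTRY = {"GESIS": "DE", "DESTATIS": "DE", "CASD": "FR"}


@dataclass(frozen=True)
class Dataset:
    name: str
    provider: str  # "DESTATIS", "INSEE" or "GESIS"


def may_transfer(dataset: Dataset, target_node: str, secure_environment: bool = True) -> bool:
    """Apply the three scenario rules to a proposed transfer."""
    if not secure_environment or target_node not in NODE_COUNTRY:
        return False
    if dataset.provider == "DESTATIS":
        # Rule 1: official German microdata must not leave Germany.
        return NODE_COUNTRY[target_node] == "DE"
    if dataset.provider == "INSEE":
        # Rule 2: French microdata may be transferred to Germany.
        return True
    if dataset.provider == "GESIS":
        # Rule 3: Gesis research microdata may go to France or to Destatis.
        return True
    # Unknown providers are denied by default (assumption).
    return False
```

In the demonstration this is why the merge happens on the German side: the Microcensus file cannot move to CASD, while the French LFS and the EVS can move to the node where the Microcensus sits.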
Scenario context
• Different phases for access :
 Accreditation level (each institution/country has its own rules) : Jara and Kamel have both been accredited in Germany and in France
 Service provision level (after accreditation) : researchers working on a network of RDCs, which is the topic of the demonstration
Microdata needed
• For the demonstration, only public use files will be used (for legal reasons) :
 The Microcensus data for employment information (DESTATIS), available at DESTATIS
 The French LFS for employment information (INSEE), available at CASD
 The EVS for attitudes information, available at GESIS
• Because only public use files are used, data will be merged at the regional level for the demonstration.
• The DESTATIS and INSEE files contain only factual microdata, which will be merged with the GESIS EVS microdata in this project.
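Merging at the regional level means that each file is first aggregated by region and the aggregates are then joined on the region code. The sketch below illustrates that idea with the Python standard library only; the function names, the field names ("region", "employed", "score") and the tiny records are made up for illustration and do not come from the Microcensus, LFS or EVS files.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_region(records, value_key):
    """Average one variable per region over a list of dict records."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["region"]].append(record[value_key])
    return {region: mean(values) for region, values in grouped.items()}


def merge_on_region(*tables):
    """Inner join of regional aggregates: keep regions present in every table."""
    common = set(tables[0]).intersection(*tables[1:])
    return {region: tuple(table[region] for table in tables) for region in sorted(common)}


if __name__ == "__main__":
    # Made-up individual records from a factual source and from an attitudes survey.
    employment = aggregate_by_region(
        [{"region": "R1", "employed": 1}, {"region": "R1", "employed": 0},
         {"region": "R2", "employed": 1}],
        "employed",
    )
    attitudes = aggregate_by_region(
        [{"region": "R1", "score": 2}, {"region": "R1", "score": 4}],
        "score",
    )
    print(merge_on_region(employment, attitudes))
```

Only regions present in both sources survive the join, so individual records from different files are never matched to each other.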
DEMONSTRATION LIVE (20 MINUTES)
DwB Proof of concept : Workflows animation
Demonstration scenario (workflow diagram, numbered steps) :
1 Connection and EVS transfer
2 Microcensus copy to common folder
3 EVS and Microcensus preparation
4 LFS copy to local folder
5 Preparation, translation and conversion
6 Blind transfer of LFS to common folder
7 EVS, LFS, Microcensus file merge > output
Demonstration : Story board
• 1 - Jara connects to VRE-D1 from Berlin and transfers EVS from Gesis to the common folder
• 2 - Jara copies Microcensus data to the common folder
• 3 - Jara prepares data from EVS and Microcensus with SPSS and waits for the French microdata
• 4 - Kamel connects to VRE-C2 in France from Toulouse, and copies the French LFS (metadata is only in French) to the local folder
• 5 - Kamel, using SAS, prepares, translates and converts microdata from the French LFS
• 6 - Kamel transfers the prepared data to the "write-only" common folder (Kamel will show that he cannot see data files inside the common folder)
• 7 - Jara merges the 3 files, does the analysis and issues the results
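Step 6 relies on a "write-only" folder: Kamel can deposit files but cannot list or read what the folder contains. The in-memory model below sketches that behaviour; it is illustrative only, the class and method names are hypothetical, and a real deployment would enforce this through file system or share permissions rather than application code.

```python
class WriteOnlyFolder:
    """In-memory model of a drop folder: anyone may write, only readers may look."""

    def __init__(self, readers):
        self._readers = set(readers)
        self._files = {}

    def put(self, user, filename, content):
        """Any project member may deposit a file (blind transfer)."""
        self._files[filename] = content

    def list_files(self, user):
        """Only users with read rights may see the folder content."""
        if user not in self._readers:
            raise PermissionError(f"{user} may not list this folder")
        return sorted(self._files)

    def read(self, user, filename):
        """Only users with read rights may open a file."""
        if user not in self._readers:
            raise PermissionError(f"{user} may not read from this folder")
        return self._files[filename]


if __name__ == "__main__":
    common = WriteOnlyFolder(readers={"jara"})
    common.put("kamel", "lfs_prepared.sav", b"prepared LFS extract")
    print(common.list_files("jara"))
    try:
        common.list_files("kamel")
    except PermissionError as error:
        print(error)
```

This is what lets the French file sit next to the German Microcensus in the common folder without Kamel ever seeing the German data.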
The final output
• The final output has to be checked by DESTATIS with the
support of GESIS and CASD
• Jara receives the output outside the secure environment (for example, by encrypted email)
Conclusion of the POC
• The POC was designed in a collaborative way, with several NSIs and data archives involved.
• Currently the POC does not include management tools for RDC staff and data-producers, but it is possible to implement them (a question of time).
• The consortium contract for managing the network with
different stakeholders was not defined.
• There were some issues to solve :
 Installing servers inside different institutions.
 Network and firewall configuration adjustments.
• There was strong involvement of all partners to make the POC
successful.
Conclusion – the POC and afterwards
• The system has to be flexible enough to manage very different legal and
organisational contexts.
• The system has to provide a real collaborative environment for researchers
across Europe (because data are not harmonised and not always
documented in English).
• The system has to be highly secure to gain the trust of many data-producers. Such a system may encourage some countries to get more involved in cross-border collaboration.
• Such a system, once actually implemented, would allow :
 Better collaboration between researchers and data-providers.
 New possibilities to merge data from different institutions/countries and different domains.
 A better way to secure microdata, which could lead to more data being made available.
• Much remains to be done to implement a real production network.
Thanks for Listening
Questions & comments
Contact:
[email protected]
Website:
http://www.dwbproject.org/