single dataset traditional tools value added services various

Shirley Crompton
Source: Rob Allan
harmonisation tools
Sub-setting tools
modelling tools
simulation tools
dataset linking tools
annotation tools
methods capture tools
traditional tools
safe setting tools
statistical disclosure tools
Grid Repository
Institutional Repository
Subject Repository
Data Producer Repository
•single dataset
•traditional tools
•value added services
•various metadata
•various data storage
•search function
•ontologies
•metadata registry
•algorithm registry
•geo cross walk
•question bank
•classification mappings
•variable mapping
•share resources
•solve bigger problems
•integrate communities
•secure setting
•integrate data
1. Recommend that the British Household Panel
Survey, the Census 1991 Samples of
Anonymised Records and EDINA UKBorders
Census boundary data for SARs 1991 should be
the first priority for Grid-enabling.
2. Recommend that a Grid Shibboleth and/or
Athens authentication system is in place,
together with a GIS system that can utilise
the boundary data.
3. Recommend that initially the datasets reside in an
Oracle database but that in the long term the data should
be pulled to the Grid from the existing dataset provider.
Note: At present both the BHPS and SARs reside in
Nesstar servers where authentication and sub-setting
systems already exist. The outputs from these
servers are downloadable Zip files containing the
data in SPSS, SAS, Stata, NSDStat or delimited
format.
4. Recommend that the Grid projects that have
previously used these datasets, for example
GEMEDA, MoSeS, GEMS, are used as exemplars.
Where possible the data outputs, techniques and
methodology should also be made available. Note: A
further enhancement could be the actual modification
of the models and simulations so that researchers
can experiment with the systems.
5. Recommend that priority for Grid-enabling should
be given to health related datasets that are
available in Nesstar, such as the Health
Survey for England and the National Child
Development Study. Note: The BHPS also
contains health related questions.
6. Recommend that the Grid projects that have
previously used this type of data, for example
HYDRA and MoSeS, are used as exemplars
as above.
7. Recommend that datasets from other disciplines are
available on the Grid to social science researchers.
For example more sensitive medical data from the
Medical Research Council or environmental data from
the Natural Environment Research Council on air
pollution or global warming.
8. Recommend that the Grid projects that have
previously used this type of data, for example
GeoVUE and ESG II, are used as exemplars
as above.
9. Recommend that the experience and difficulties
encountered in the pilot projects should be pooled to
ensure that the metadata describing these datasets is
sufficient to allow ease of use and that the data are
accompanied by additional systems to ease
interoperability. These pooled experiences should
also form the basis of procedure and best practice
guides for Grid-enabling datasets.
10. Recommend that long running series of data,
suitable for harmonisation and available via Nesstar,
are considered for Grid-enabling. These include the
Quarterly Labour Force Survey, the General
Household Survey, the British Social Attitudes Survey,
the Workplace Employee Relations Survey, the ONS
Omnibus Survey and the Millennium Cohort Study.
11. Recommend that Grid tools are in place that
facilitate the harmonisation of long running data
series. Tools for sub-setting, modelling, simulation
and linking of datasets should also be available on
the Grid. The methods employed should be captured
for future use and where applicable added as
metadata to the appropriate dataset.
12. Recommend that additional Grid tools for
geographic mappings, metadata registries,
controlled vocabularies, ontologies, question
banks, classification schema and variable
mappings are also considered.
13. Recommend that consideration should be
given to making traditional social science
tools such as SPSS, SAS and Stata available
on the Grid.
14. Recommend that the following aggregate datasets
should be Grid-enabled, namely the International
Monetary Fund (IMF), World Bank and Organisation
for Economic Cooperation and Development (OECD)
macro databank series from the ESDS International
service. Note: These datasets are available via
Beyond 20/20.
15. Recommend that the Grid projects that have
previously used this type of data, for example
SAMD, are used as exemplars as above.
16. Recommend that other datasets for
consideration be the British Crime Survey and
the ONS Neighbourhood Statistics.
17. Recommend that the Grid projects that have
previously used this type of data, for example
the Offenders Personal and Area-based
Social Exclusion project, are used as
exemplars as above.
18. Recommend that European datasets, such
as the European Social Survey and the
Eurobarometer series, which are also
available via Nesstar, be considered for
Grid-enabling.
19. Recommend investigation into Grid-enabling
data that are not available via existing data
centres, such as administrative, retail,
consumer, video, CCTV and web usage data.
20. Recommend negotiations with ONS and
ESDS to establish a Grid virtual organisation
which could act as a safe setting for
statistically sensitive data such as the Census
Controlled Access Microdata Samples.
21. Recommend development of Grid software
to determine whether the combining or
sub-setting of datasets would lead to statistical
disclosure of individuals.
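As an illustration of the kind of check recommendation 21 has in mind, the sketch below counts how many records share each combination of identifying variables after datasets are combined or sub-set, and flags combinations that fall below a minimum count. This is only one simple, commonly used threshold rule, not a method taken from the report; the function name `risky_cells`, the variable names and the threshold of 3 are all assumptions made for the example.

```python
from collections import Counter


def risky_cells(records, key_vars, min_count=3):
    """Return value combinations shared by fewer than min_count records.

    Hypothetical helper: a small cell count suggests that individuals in
    that cell could be identified from the combined or sub-set data.
    """
    counts = Counter(tuple(r[v] for v in key_vars) for r in records)
    return {cell: n for cell, n in counts.items() if n < min_count}


# Invented records, for illustration only.
records = [
    {"region": "North", "age_band": "30-39", "occupation": "teacher"},
    {"region": "North", "age_band": "30-39", "occupation": "teacher"},
    {"region": "North", "age_band": "30-39", "occupation": "teacher"},
    {"region": "South", "age_band": "60-69", "occupation": "judge"},
]

flagged = risky_cells(records, ["region", "age_band", "occupation"])
for cell, n in flagged.items():
    print(f"Potentially disclosive cell {cell}: {n} record(s)")
```

A production tool would need considerably more than this (for example, checks on differences between successive sub-sets), so the sketch should be read as a starting point only.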
This report concludes that the Grid-enabling of datasets in itself is not
sufficient to stimulate the uptake by researchers of Grid technologies and
the new methodologies for research that are offered by exposing data to
the computational power of the Grid. To encourage uptake the report
suggests that the metadata associated with Grid-enabled datasets has to
be sufficient to support both the combination of data and the new forms of
research, and that systems have to be in place that facilitate and ease the
processes involved, such as metadata registries, geo-cross walks,
question banks, ontologies, classification schema and variable mappings.
The report also concludes that exposing statistically sensitive datasets in a
controlled safe setting, or providing systems for the analysis of the outputs
from modifiable simulation models, would offer a unique opportunity for social
science research and increase the uptake of the Grid.
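To make the idea of variable mappings for harmonisation more concrete, here is a minimal sketch of how two survey extracts with different variable names might be brought onto a common set of names before being combined. The variable names, the mapping tables and the `harmonise` function are hypothetical and are not drawn from any of the datasets or projects named in the report.

```python
def harmonise(record, mapping):
    """Rename source variables to harmonised names, dropping unmapped ones."""
    return {mapping[name]: value for name, value in record.items() if name in mapping}


# Hypothetical mappings from each source's variable names to common names.
source_a_map = {"sex": "sex", "agegrp": "age_band"}
source_b_map = {"gender": "sex", "age_band_5": "age_band"}

# Invented records, for illustration only.
source_a = [{"sex": "F", "agegrp": "30-39", "intdate": "1991"}]
source_b = [{"gender": "F", "age_band_5": "30-39"}]

combined = (
    [harmonise(r, source_a_map) for r in source_a]
    + [harmonise(r, source_b_map) for r in source_b]
)
print(combined)
```

In practice the mapping tables themselves would be the shared resource, held in something like the metadata registry described above, so that the same mappings can be reused across projects.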
•British Household Panel Survey, 1991
•Quarterly Labour Force Survey, 1992
•General Household Survey, 1971
•Family Expenditure Survey, 1961-2001
•Health Survey for England, 1991
•British Social Attitudes Survey, 1983
•British Cohort Study (BCS70) 1970-2005
•British Crime Survey, 1982
•British Election Studies, 1969
•Family Resources Survey, 1993
•National Child Development Study, 1958
•Workplace Employee Relations Survey, 1980-