SDMX - unece

Combining Metadata Standards:
Approaches and Benefits
Arofan Gregory
Open Data Foundation
Overview
• Recent events of interest
• The Standards: Comparison and
Explanation
• Emerging Implementation Approaches
– DDI and SDMX
– SDMX and the Semantic Web Technologies
– Classifications & Multiple Standards
• Ideas about Future Work
Recent Events of Interest
Note: Some of these
events/implementations have been or will
be described in detail in other papers –
they are only mentioned here.
• Schloss Dagstuhl, Germany, November
2009 (DDI 3 Workshop)
– SDMX 2.0 – DDI 3 field-level mapping work
started
– Topic: DDI and the Semantic Web???
Recent Events of Interest (2)
• Semantic Web and SDMX
– ONS hosted 2-day meeting in the UK, February 2009
(produced draft “SDMX-RDF”)
– Banca d’Italia has a prototype project
– New project launched at University of Tilburg in the
Netherlands (RDF expression of OECD SDMX data)
• Australian Bureau of Statistics (ABS) starts
looking at SDMX and DDI to support data
production lifecycle
– Prototype implementations
– Some other NSIs also very interested
Recent Events of Interest (3)
• Classifications and ISO/IEC 11179
– Australia: Government agencies looking to
exchange classifications with ABS from
existing ISO/IEC 11179 system, using SDMX,
DDI
– Statistics Canada: Evaluation of IMDB
(ISO/IEC 11179-based metadata repository)
for use in coordination with Canadian RDC
Network (based on DDI 3)
What Does This Mean?
• Not a complete list of
events/implementations, but…
• Indicates the interest we are seeing in the
combined use of standards!
– These are not just experiments!
– Organizations are looking at implementation
in a serious way now
Characterizing the Standards
• SDMX:
– Data structures and formats
– Reference metadata structures and formats
– Web-services architecture based on registry services
– Content-oriented guidelines
• ISO/IEC 11179:
– Model for managing concepts and data elements
– Metadata registries and lifecycle
• ISO 19115:
– Standard metadata model for geographies
– Used by DDI as geographical model
Characterizing the Standards (2)
• Dublin Core:
– Citation metadata
– Widely used in the Semantic Web
– Used natively by DDI for citations
• Semantic Web/ “Linked Data” / RDF
– See “Open Issues on the Semantic Web”
• DDI 3:
– Will give more detail, as it is not as familiar to
the METIS community…
Characterizing the Standards (3)
• DDI 1.*/2.* was a standard used by archives and data
libraries
– Based on a “codebook” model
– Used by some NSIs, especially in the developing world because
of the IHSN Metadata Management Toolkit
– Used by the European network of data archives, CESSDA
– Used by many data archives in North America
• Documentation of a single “Study” (survey)
– Designed to help researchers find and use microdata
• DDI 3 is more ambitious – capture and use of metadata
throughout the entire data lifecycle
DDI 3 Lifecycle Model
Notice: This is very similar to a high-level view of the METIS model!
Characterizing the Standards (4)
• DDI 3 provides machine-actionable metadata to
support “metadata-driven” systems throughout
the lifecycle
– Focus is on upstream metadata capture and reuse
• Describes tabulation/aggregation of microdata
• Provides support for comparison across surveys,
detailed geography, data processing, register
data
• Aggregate “NCube” model aligned with SDMX
• No architecture/web services support (yet)
An Observation…
• It is easy to say that two standards are
“aligned”
– Many of these standards were intentionally
aligned as they were developed
• It is much more difficult to understand how
to use them in combination effectively…
Approaches and Benefits
• SDMX and DDI
– DDI microdata production/SDMX aggregate
dissemination
– Using SDMX data in DDI-based systems (combining
aggregates and microdata)
– Combined SDMX/DDI supporting the entire data
lifecycle
– DDI register data reported to SDMX collection system
• SDMX and the Semantic Web
• Classifications and the Standards
[Diagram. Elements: DDI 3 metadata; surveys; registers; input data; cleaning, editing, estimation, aggregation, etc.; dissemination data; website/web service; SDMX-ML (data, metadata, structure)]
DDI – SDMX: Benefits
• The benefits of this approach are those
found by using the standards generally
– Supports “metadata-driven” system for data
production throughout the lifecycle (DDI)
– Metadata-rich dissemination format, preferred
by data collectors (SDMX)
– Shared tools; SDMX registry services, Web
Services for discovery and use of aggregates
SDMX – DDI: Integrating
Aggregates and Microdata
• Scenario is common in some research
– Economic data is often only available as
aggregates
– Challenge is to combine aggregates and other
microdata
[Diagram. Elements: SDMX web service; SDMX-to-DDI 3 transform; surveys; registers (DDI 3); data archive/repository (DDI 3); processing to produce integrated data and metadata (DDI 3)]
SDMX – DDI: Benefits
• Allows for easy use of official statistics by
researchers
– Solves problems of combining aggregates
and microdata
• Note: This does not involve disaggregation of published data
– Structural transformation only, to allow DDI 3
systems to process aggregates easily
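A structural transformation of this kind can be sketched in a few lines. This is an illustration only, assuming a simplified in-memory layout for the aggregate series: the dimension names, the `flatten_series` helper, and the figures are invented for the example, and no real SDMX-ML parsing or DDI 3 serialization is shown. Each published cell becomes exactly one flat record, so nothing is disaggregated.

```python
from typing import Dict, Iterator, List


def flatten_series(series: List[dict]) -> Iterator[Dict[str, object]]:
    """Yield one flat record per published cell (series key plus time period).

    Hypothetical helper: the input layout is an assumption, not SDMX-ML.
    """
    for s in series:
        for period, value in sorted(s["obs"].items()):
            record = dict(s["key"])          # copy the dimension values
            record["TIME_PERIOD"] = period   # the period becomes an ordinary column
            record["OBS_VALUE"] = value      # the published aggregate, unchanged
            yield record


# Invented demonstration values, not real statistics.
series = [
    {"key": {"FREQ": "A", "REF_AREA": "NL", "INDICATOR": "DEMO"},
     "obs": {"2007": 100.0, "2008": 102.5}},
    {"key": {"FREQ": "A", "REF_AREA": "IT", "INDICATOR": "DEMO"},
     "obs": {"2007": 200.0}},
]

records = list(flatten_series(series))
for r in records:
    print(r)
```

The record count equals the number of published cells, which is the point of the note above: the structure changes, the values do not.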
DDI + SDMX: The Data Lifecycle
• Uses a metadata model capable of
expression as either SDMX or DDI,
as appropriate
• Provides support for process management
– Uses many features of SDMX (process
model, structure sets, reporting taxonomies,
etc.)
• Uses SDMX architecture/services model
– Designed to allow incorporation of other
standards
[Diagram. Elements: process-management system (BPML) (SDMX); surveys (DDI 3); registers (DDI 3); input data store; dissemination data store; SDMX registry; web site/print/web services; data and metadata repositories/application databases (SDMX, DDI, etc.). Notes on the diagram: all registry interactions use SDMX; interactions between systems are DDI or SDMX Web Services, as appropriate]
SDMX + DDI: Benefits
• Leverages Web-Services technologies
(registry, event triggers, etc.) for efficient
automation, migration, flexibility
• Choice of tools is broad
– Use the “best” format for any given task
• All the benefits of DDI-SDMX case
• Good support for process management as
well as data management
SDMX and the Semantic Web
Technologies
• Potentially applies to other standards as well
(DDI, ISO/IEC 11179, etc.)
• Note that Semantic Web technologies only apply
to dissemination
– Not designed to support data production
• Terms:
– “Raw data” in an SW context does not mean “raw
data” in the statistical sense
– “Data” in an SW context means “anything that can be
described using RDF” – not numeric data
Assumptions
• Creation of a harmonized statistical model
based on proven models/standards, but
expressed as RDF (“ontology” or
“vocabulary” in SW terms)
• Implementation of an “SDMX-RDF” in
standard SDMX dissemination packages
[Diagram. Internal (production environment): SDMX-driven production system; dissemination data store (SDMX); SDMX web service (SDMX-ML). “SDMX-RDF” transform. External (dissemination to Web): triplestore (SDMX-RDF); RDF; SPARQL queries]
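To make the “SDMX-RDF” transform step concrete, here is a minimal sketch that turns one observation into N-Triples lines. It is not the draft “SDMX-RDF” itself: the `http://example.org/sdmx-rdf/` namespace, the property names, and the `observation_to_ntriples` helper are all hypothetical, and the observation values are invented. Only the `rdf:type` and `xsd:decimal` URIs are standard.

```python
from typing import Dict, Iterator

BASE = "http://example.org/sdmx-rdf/"  # hypothetical namespace, not a published vocabulary
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def observation_to_ntriples(dataset_id: str, obs: Dict[str, str]) -> Iterator[str]:
    """Yield N-Triples lines for one observation.

    Assumes dimension values are simple codes that need no string escaping.
    """
    dims = {k: v for k, v in obs.items() if k != "OBS_VALUE"}
    # Build a stable subject URI from the dimension values, in sorted name order.
    key = ".".join(dims[name] for name in sorted(dims))
    subject = f"<{BASE}{dataset_id}/obs/{key}>"
    yield f"{subject} <{RDF_TYPE}> <{BASE}Observation> ."
    for name in sorted(dims):
        yield f'{subject} <{BASE}dimension/{name}> "{dims[name]}" .'
    yield f'{subject} <{BASE}obsValue> "{obs["OBS_VALUE"]}"^^<{XSD_DECIMAL}> .'


# Invented demonstration observation.
obs = {"FREQ": "A", "REF_AREA": "NL", "TIME_PERIOD": "2008", "OBS_VALUE": "102.5"}
triples = list(observation_to_ntriples("DEMO", obs))
print("\n".join(triples))
```

A single observation with three dimensions already produces five triples, each repeating the full subject URI, which gives a feel for how quickly the RDF expression grows.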
SDMX and the Semantic Web:
Benefits
• Leverages the “Linked Data” phenomenon
without requiring a deep understanding of
RDF, etc.
• Uses existing standards/models and best
practices to do “heavy lifting” (data
production)
• Puts a lot of reliable, quality data into the
“Linked Data Web”
– Helps address issues of provenance
Warning
• RDF is verbose!
• 4.5 Megs of GESMES/TS = 45 Megs of
“compact” SDMX-ML XML = 420 Megs of
RDF triples
• This may encourage the on-demand
production of RDF data from web services,
rather than static files
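The growth factors implied by these figures can be worked out directly. The short calculation below uses only the three sizes quoted above; the variable names are illustrative.

```python
# Sizes in megabytes, as quoted on the slide.
sizes_mb = {"GESMES/TS": 4.5, "compact SDMX-ML": 45.0, "RDF triples": 420.0}

baseline = sizes_mb["GESMES/TS"]
# Size of each format relative to the GESMES/TS original.
factors = {name: size / baseline for name, size in sizes_mb.items()}

for name, factor in factors.items():
    print(f"{name}: {sizes_mb[name]:g} MB, about {factor:.1f}x GESMES/TS")

# Growth from SDMX-ML to RDF alone.
rdf_vs_sdmx = sizes_mb["RDF triples"] / sizes_mb["compact SDMX-ML"]
print(f"RDF vs SDMX-ML: about {rdf_vs_sdmx:.1f}x")
```

Each step is roughly a tenfold increase, so the RDF expression is on the order of ninety times the size of the original GESMES/TS file.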
Standards and Classifications
• Some maintainers of standard
classifications are looking at expressing
them in useful formats (SDMX, DDI)
– This is an easy thing to do
– It is very useful: promotes re-use,
comparability, etc.
– Could apply to Semantic Web RDF
expressions as well as XML-based standards
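As an illustration of how little is needed for an RDF expression of a classification, the sketch below emits N-Triples for a tiny hierarchical code list. SKOS is used here as an assumption, since the slides do not name a vocabulary; the scheme URI, the `classification_to_ntriples` helper, and the two codes are invented for the example.

```python
from typing import Iterator, List, Optional, Tuple

SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SCHEME = "http://example.org/classification/DEMO"  # hypothetical scheme URI

Code = Tuple[str, str, Optional[str]]  # (code, English label, parent code or None)


def classification_to_ntriples(codes: List[Code]) -> Iterator[str]:
    """Yield N-Triples lines describing the scheme and each of its codes.

    Assumes labels contain no characters that need N-Triples escaping.
    """
    yield f"<{SCHEME}> <{RDF_TYPE}> <{SKOS}ConceptScheme> ."
    for code, label, parent in codes:
        concept = f"<{SCHEME}/{code}>"
        yield f"{concept} <{RDF_TYPE}> <{SKOS}Concept> ."
        yield f"{concept} <{SKOS}inScheme> <{SCHEME}> ."
        yield f'{concept} <{SKOS}notation> "{code}" .'
        yield f'{concept} <{SKOS}prefLabel> "{label}"@en .'
        if parent is not None:
            # Link the child code to its parent level in the hierarchy.
            yield f"{concept} <{SKOS}broader> <{SCHEME}/{parent}> ."


# Invented two-level demonstration classification.
codes = [("A", "Agriculture", None), ("A01", "Crop production", "A")]
lines = list(classification_to_ntriples(codes))
print("\n".join(lines))
```

The same list of (code, label, parent) tuples could equally drive an SDMX or DDI serialization, which is what makes publishing a classification once in several formats inexpensive.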
Ideas for Future Work
• Endorse SDMX – DDI mappings now being
produced
• Develop an “SDMX-RDF” (?) or…
• Develop a harmonized statistical model for
expression in RDF (based on DDI, SDMX,
ISO/IEC 11179) (?)
– Encourage tools developers to implement it in
standard dissemination packages
• Publish standard classifications in standard
formats
Summary
• Combined use of standards is becoming a
reality
• Proactive engagement with the Semantic
Web world could provide benefits to all
concerned parties, as well as users