Name and organization
Have you worked with DDI before? (2 or 3)
If not, are you familiar with XML?
What kind of CAI systems do you use?
Goals for today
Introduction
• DDI 3 Background
• XML Background
• How DDI 3 documents survey instruments
Creating DDI 3 and Documentation
• Manual Markup
• Using functionality from CAI systems
• Custom development
• Colectica
Discussion
• Questions and discussion
• Additional documentation activities
Data Documentation Initiative
DDI3 Background
Background
• Concept of DDI and definition of needs grew out
of the data archival community
• Established in 1995 as a grant funded project
initiated and organized by ICPSR
• Members:
– Social Science Data Archives (US, Canada, Europe)
– Statistical data producers (including US Bureau of the
Census, the US Bureau of Labor Statistics, Statistics
Canada and Health Canada)
• February 2003 – Formation of DDI Alliance
– Membership-based alliance
– Formalized development procedures
Copyright © 2008 GESIS
Origins of the DDI Alliance
• Versions 1.* and 2.* were developed by an
informal network of individuals from the social
science community and official statistics
– Funding was through grants
• It was decided that a more formal organization
would help to drive the development of the
standard forward
– Many new features were requested
– The DDI Alliance was born to facilitate the development in a consistent and ongoing fashion
Requirements for 3.0
• Improve and expand the machine-actionable aspects of
the DDI to support programming and software systems
• Support CAI instruments through expanded description
of the questionnaire (content and question flow)
• Support the description of data series (longitudinal
surveys, panel studies, recurring waves, etc.)
• Support comparison, in particular comparison by design
but also comparison after the fact (harmonization)
• Improve support for describing complex data files (record
and file linkages)
• Provide improved support for geographic content to
facilitate linking to geographic files (shape files,
boundary files, etc.)
DDI 3.0 and the Data Life Cycle
• A survey is not a static process: it evolves dynamically over time and involves many agencies and individuals
• DDI 2.x is about archiving; DDI 3.0 covers the entire “life cycle”
• 3.0 focuses on metadata reuse (minimizes redundancies and discrepancies, supports comparison)
• Also supports multilingual content, grouping, geography, and others
• 3.0 is extensible
Development of DDI 3.0
• 2004 – Acceptance of a new DDI paradigm
– Lifecycle model
– Shift from the codebook-centric / variable-centric model to capturing the lifecycle of data
– Agreement on expanded areas of coverage
• 2005
– Presentation of schema structure
– Focus on points of metadata creation and reuse
• 2006
– Presentation of first complete 3.0 model
– Internal and public review
• 2007
– Vote to move to Candidate Version
– Establishment of a set of use cases to test application and implementation
• 2008
– April: DDI 3.0 published


XML: Extensible Markup Language
Designed to transport and store data
XML Schemas, DDI Modules, and DDI Schemes
Modules shown: Data Collection, Instance, Study Unit, Physical Instance, DDI Profile, Logical Product, Physical Data Structure, Archive, Conceptual Component, Comparative, Reusable, Ncube, Inline ncube, Tabular ncube, Proprietary, Dataset
XML Schemas, DDI Modules, and DDI Schemes
Modules shown without schemes: Instance, Study Unit, Physical Instance, DDI Profile, Comparative, Reusable, Ncube, Inline ncube, Tabular ncube, Proprietary, Dataset
Modules and the schemes they contain:
• Data Collection
– Question Scheme
– Control Construct Scheme
– Interviewer Instruction Scheme
• Logical Product
– Category Scheme
– Code Scheme
– Variable Scheme
– NCube Scheme
• Physical Data Structure
– Physical Structure Scheme
– Record Layout Scheme
• Archive
– Organization Scheme
• Conceptual Component
– Concept Scheme
– Universe Scheme
– Geographic Structure Scheme
– Geographic Location Scheme
Maintainable Schemes
Packages of reusable metadata maintained by a single agency
• Category Scheme
• Code Scheme
• Concept Scheme
• Control Construct Scheme
• Geographic Structure Scheme
• Geographic Location Scheme
• Interviewer Instruction Scheme
• Question Scheme
• NCube Scheme
• Organization Scheme
• Physical Structure Scheme
• Record Layout Scheme
• Universe Scheme
• Variable Scheme
Designed to Support Registries
• A “Registry” is a catalog of metadata resources
• Resource package
– Structure to publish non-study-specific materials for reuse
• Extracting specified types of information into schemes
– Universe, Concept, Category, Code, Question, Instrument,
Variable, etc.
• Allowing for either internal or external references
– Can include other schemes by reference and select only
desired items
• Providing Comparison Mapping
– Target can be external harmonized structure
Data Collection
• Methodology
• Question Scheme
– Question
– Response domain
• Instrument
– using Control Construct Scheme
• Coding Instructions
– question to raw data
– raw data to public file
• Interviewer Instructions
• Question and Response Domain designed to support question banks
– Question Scheme is a maintainable object
• Organization and flow of questions into Instrument
– Used to drive systems like CASES and Blaise
• Coding Instructions
– Reuse by Questions, Variables, and comparison
QuestionItem in DDI
Annotated QuestionItem example; callouts: opening tag & identification, QuestionText, NumericDomain
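The parts called out on this slide can be sketched in code. The following is a minimal illustration in Python, not the markup from the original slide: the namespace URI, the `LiteralText`/`Text` nesting, the identifier `Q_AGE`, and the question wording are assumptions and should be checked against the published DDI 3 schemas. Attributes that the schema may require on the response domain are omitted.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the DDI 3.0 data collection module.
D = "ddi:datacollection:3_0"
ET.register_namespace("d", D)


def make_question_item(item_id, text):
    """Build a QuestionItem with question text and a numeric response domain."""
    # Opening tag and identification (hypothetical id value).
    item = ET.Element(f"{{{D}}}QuestionItem", {"id": item_id})
    # QuestionText wraps the literal wording shown to the respondent.
    question_text = ET.SubElement(item, f"{{{D}}}QuestionText")
    literal = ET.SubElement(question_text, f"{{{D}}}LiteralText")
    ET.SubElement(literal, f"{{{D}}}Text").text = text
    # NumericDomain marks the answer as a number; its attributes are left out.
    ET.SubElement(item, f"{{{D}}}NumericDomain")
    return item


age_question = make_question_item("Q_AGE", "How old are you?")
xml_text = ET.tostring(age_question, encoding="unicode")
print(xml_text)
```

Building the element through a small function keeps the three callouts (identification, text, response domain) visible as separate steps.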
In a QuestionScheme
ControlConstructScheme with QuestionConstructs
An Instrument
Those all go in a DataCollection element
The DataCollection element goes in a StudyUnit, which goes in a DDIInstance or ResourcePackage

Create QuestionScheme and QuestionItems


Create ControlConstructScheme
Add QuestionReferences
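As a rough sketch of this step, the Python below creates a ControlConstructScheme and adds one QuestionConstruct per question, each pointing at its question through a QuestionReference. The namespace URIs, the use of an `ID` child element inside the reference, and all identifier values are assumptions for illustration; the helper name `add_question_construct` is hypothetical.

```python
import xml.etree.ElementTree as ET

D = "ddi:datacollection:3_0"  # assumed namespace URI
R = "ddi:reusable:3_0"        # assumed namespace URI
ET.register_namespace("d", D)
ET.register_namespace("r", R)


def add_question_construct(scheme, construct_id, question_id):
    """Append a QuestionConstruct that references an existing QuestionItem."""
    construct = ET.SubElement(
        scheme, f"{{{D}}}QuestionConstruct", {"id": construct_id}
    )
    reference = ET.SubElement(construct, f"{{{D}}}QuestionReference")
    # The reference carries the id of the question it points to.
    ET.SubElement(reference, f"{{{R}}}ID").text = question_id
    return construct


control_constructs = ET.Element(f"{{{D}}}ControlConstructScheme", {"id": "CCS_1"})
for question_id in ["Q_AGE", "Q_INCOME"]:
    add_question_construct(control_constructs, f"QC_{question_id}", question_id)

referenced_ids = [el.text for el in control_constructs.iter(f"{{{R}}}ID")]
print(referenced_ids)
```

Keeping the question text in the QuestionScheme and only a reference here is what lets the same question be reused in more than one instrument.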


Add control flow items to ControlConstructScheme
Include a main Sequence element
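A main Sequence can be sketched as an ordered list of references to other control constructs. The Python below is illustrative only: the namespace URIs, the identifiers, and the helper name `add_main_sequence` are assumptions, and other control flow items (conditions, loops) are not shown.

```python
import xml.etree.ElementTree as ET

D = "ddi:datacollection:3_0"  # assumed namespace URI
R = "ddi:reusable:3_0"        # assumed namespace URI
ET.register_namespace("d", D)
ET.register_namespace("r", R)


def add_main_sequence(scheme, sequence_id, construct_ids):
    """Append a Sequence that references the given constructs in order."""
    sequence = ET.SubElement(scheme, f"{{{D}}}Sequence", {"id": sequence_id})
    for construct_id in construct_ids:
        ref = ET.SubElement(sequence, f"{{{D}}}ControlConstructReference")
        ET.SubElement(ref, f"{{{R}}}ID").text = construct_id
    return sequence


scheme = ET.Element(f"{{{D}}}ControlConstructScheme", {"id": "CCS_1"})
main_sequence = add_main_sequence(scheme, "SEQ_MAIN", ["QC_AGE", "QC_INCOME"])

# The order of the references is the order in which questions are asked.
order = [ref.findtext(f"{{{R}}}ID") for ref in main_sequence]
print(order)
```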


Create the Instrument Element
Add the main ControlConstructReference




Create the DDIInstance element
Create the StudyUnit element
Create the DataCollection element
Add the QuestionScheme, ControlConstructScheme, and Instrument to the DataCollection element
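The whole nesting described in these steps can be assembled in one pass. The sketch below uses Python's standard library; every namespace URI and identifier in it is an assumption, and the helpers `q`, `child`, and `reference` are hypothetical conveniences. It serializes the result and parses it back, which confirms well-formedness only and is not a substitute for checking against the DDI schemas.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for the DDI 3.0 modules used here.
NS = {
    "ddi": "ddi:instance:3_0",
    "s": "ddi:studyunit:3_0",
    "d": "ddi:datacollection:3_0",
    "r": "ddi:reusable:3_0",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, name):
    """Return the qualified tag name for a prefix and local name."""
    return f"{{{NS[prefix]}}}{name}"


def child(parent, prefix, name, element_id=None):
    """Append a child element, optionally with an id attribute."""
    attrib = {"id": element_id} if element_id else {}
    return ET.SubElement(parent, q(prefix, name), attrib)


def reference(parent, name, target_id):
    """Append a reference element that points at target_id."""
    ref = child(parent, "d", name)
    child(ref, "r", "ID").text = target_id
    return ref


# DDIInstance > StudyUnit > DataCollection
instance = ET.Element(q("ddi", "DDIInstance"), {"id": "INST_1"})
study = child(instance, "s", "StudyUnit", "SU_1")
data_collection = child(study, "d", "DataCollection", "DC_1")

# QuestionScheme with one QuestionItem
questions = child(data_collection, "d", "QuestionScheme", "QS_1")
item = child(questions, "d", "QuestionItem", "Q_AGE")
literal = child(child(item, "d", "QuestionText"), "d", "LiteralText")
child(literal, "d", "Text").text = "How old are you?"
child(item, "d", "NumericDomain")

# ControlConstructScheme with a QuestionConstruct and the main Sequence
constructs = child(data_collection, "d", "ControlConstructScheme", "CCS_1")
construct = child(constructs, "d", "QuestionConstruct", "QC_AGE")
reference(construct, "QuestionReference", "Q_AGE")
main_sequence = child(constructs, "d", "Sequence", "SEQ_MAIN")
reference(main_sequence, "ControlConstructReference", "QC_AGE")

# Instrument pointing at the main Sequence
instrument = child(data_collection, "d", "Instrument", "INSTR_1")
reference(instrument, "ControlConstructReference", "SEQ_MAIN")

document = ET.tostring(instance, encoding="unicode")
reparsed = ET.fromstring(document)  # raises ParseError if not well-formed
```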

Check the XML document against the DDI schemas to see if we got it right.

We have DDI, now we need documentation
Custom Development
MQDS
Colectica
Michigan Questionnaire
Documentation System (MQDS)
Sue Ellen Hansen
Nicole Kirgis
What Does MQDS Do?
• Facilitates automated documentation and
harmonization of Blaise survey instruments
and datasets
– Extracts survey question metadata
– Standardized format
Survey Question Metadata
• Question universe
• Variable name and label
• Question text
• Question variable text (fills)
• Data type
• Code values and code text
• Skip instructions
• etc.
MQDS Version 1
• Extracted metadata from Blaise data model as
XML tagged data
• Provided user interface for selection of
– Blaise files
– Instrument questions and sections
– Types of metadata to extract
– Languages to display
– Style sheet for generation of instrument
documentation or codebook
Using MQDS V1 XML: Codebook in Five Languages
National Latino and Asian American Study
www.icpsr.umich.edu/CPES
MQDS Version 1
• Limitations
– XML not DDI-compliant
• DDI Version 2 did not have XML tags for all metadata
provided by Blaise
• Did not provide easy means of adding XML tags without
becoming noncompliant
– XML files for complex surveys can be very large (text files)
• Entire files had to be processed in computer memory
• Limited ability to fully automate documentation
DDI Version 3
• Released April 2008
• Focus on complete data lifecycle, going beyond the codebook
DDI Version 3
• Included extensions proposed by DDI
working group on instrument design
Persistent Content of Question
• Question text
– Static
– Dynamic or variable
• Multiple-part question
• Response domain
– Open
– Set categories
– Special types (date, time, etc.)
• Definitional text
• Instructions
Use of Question in Instrument
• Order and routing
– Sequence / skip patterns
– Loops
• Universe
• Analysis unit
MQDS Version 3
• Joint SRC and ICPSR venture
• Goals:
– Address version 2 limitations
• Process Blaise instrument of any size
– Exploit new elements and validate to the recently
released DDI version 3 standard
– Move from processing XML metadata in memory
to streaming metadata to a relational database
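The move from in-memory processing to streaming can be illustrated with a small sketch. It is not MQDS code: the input tag names (`instrument`, `question`, `text`) and the table layout are hypothetical, and SQLite stands in for SQL Server so the example stays self-contained. The point is only that each question is written to the database and released as soon as it has been read, so memory use does not grow with instrument size.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical metadata export; the real MQDS tag names are not given here.
SAMPLE = b"""<instrument>
  <question name="AGE"><text>How old are you?</text></question>
  <question name="INCOME"><text>What was your income last year?</text></question>
</instrument>"""


def stream_questions(source, connection):
    """Insert one row per <question>, clearing each element after use."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS question (name TEXT PRIMARY KEY, text TEXT)"
    )
    count = 0
    # iterparse yields elements as their closing tags are read,
    # instead of loading the whole document first.
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == "question":
            connection.execute(
                "INSERT INTO question (name, text) VALUES (?, ?)",
                (element.get("name"), element.findtext("text")),
            )
            element.clear()  # drop the element's children, text, and attributes
            count += 1
    connection.commit()
    return count


connection = sqlite3.connect(":memory:")
inserted = stream_questions(io.BytesIO(SAMPLE), connection)
rows = connection.execute(
    "SELECT name, text FROM question ORDER BY name"
).fetchall()
print(inserted, rows)
```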
MQDS Version 3
Relational Database: Import, Export, Transform
Diagram components: Blaise Datamodel (BMI), Blaise Database (BDB), other file types (e.g. SAS, SPSS, etc.), relational database (SQL Server / SQL Server Express), XML (DDI 3), Questionnaire, Codebook
1. Import: user specifies input files (location, file type, etc.)
2. Export: user specifies output files (location, language/locale, XML output options, etc.)
3. Transform: user specifies stylesheet selection criteria, type of output desired (html, rtf, pdf), etc.
Other labels: database connection settings; DDI 3 elements not in *.bmi
MQDS Version 3
• Relational database
– DDI compliant standardized tables
– Flexibility for SRC and ICPSR to add extensions that meet
their specific organizational needs
– Allows
• Automated documentation of any Blaise survey
instrument
• Importing and documenting data produced by other
software
• Lower cost development of other tools that facilitate
editing and disseminating data
MQDS V3 Prototype: Exporting Language XML
MQDS Development
• Expect to release Summer 2009
• Working out a distribution plan for Blaise
users