Proudfoot_Hambric
Experiences With Developing and
Using Metadata-Driven Processing
Systems for the Economic Census
June 19, 2007
Mark Wallace
1
What is Metadata?
Data            Metadata
• 12            • Number of months in operation
• Yes           • Does your company conduct research and development?
• 02152005      • Date completed
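The pairing above can be sketched as a record of raw values plus a separate lookup that describes each field. This is only an illustration: the field names (`months_in_operation`, `conducts_rd`, `date_completed`) are hypothetical, and reading `02152005` as MMDDYYYY is an assumption, not something the slide states.

```python
from datetime import datetime

# Raw data: the three values from the slide, keyed by hypothetical field names.
data = {
    "months_in_operation": "12",
    "conducts_rd": "Yes",
    "date_completed": "02152005",
}

# Metadata: the description that gives each value its meaning.
metadata = {
    "months_in_operation": "Number of months in operation",
    "conducts_rd": "Does your company conduct research and development?",
    "date_completed": "Date completed",
}


def describe(data, metadata):
    """Pair every raw value with the metadata text that explains it."""
    return [f"{metadata[field]}: {value}" for field, value in data.items()]


# Assumption: the date is stored as MMDDYYYY.
completed = datetime.strptime(data["date_completed"], "%m%d%Y").date()

for line in describe(data, metadata):
    print(line)
```

Without the `metadata` lookup, a value such as `12` could not be interpreted at all, which is the point the slide makes.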
2
Forms Design - 1997
Before Metadata:
• Each trade worked independently
• Inconsistent layout practices
• No standards for content or design
• Design of each paper questionnaire
one page at a time
• A handful of custom-coded
computerized questionnaires
3
Dissemination - 1997
Some Metadata in place:
• Major advancements to reduce paper and
focus on electronic data products
• Metadata was used to create publications
and dissemination products
– Metadata was handled as independent files
– No centralized Metadata system existed
• Inconsistencies existed across
questionnaires and data products
4
Examples of Inconsistencies from
1997 – Question Numbering
SECTOR                          EMPLOYMENT   PAYROLL    FRINGE BENEFITS
Construction                    Item 5       Item 6     Item 8
Mining Long                     Item 2       Item 3A    Item 3C
Mining Short                    Item 2       Item 3A    N/A
Annual Survey of Manufactures   Item 2       Item 3A    Item 3C
Manufacturing Long              Item 2       Item 3A    N/A
Manufacturing Short             Item 2       Item 3     N/A
Retail Long                     Item 6       Item 5a    N/A
Retail Short                    N/A          N/A        N/A
Service Long                    Item 7       Item 6a    N/A
Service Short                   N/A          N/A        N/A
Wholesale Long                  Item 6a      Item 5a    N/A
Wholesale Short                 Item 5a      Item 4a    N/A
Transportation/Utilities Long   Item 6       Item 5a    N/A
Transportation Short            N/A          N/A        N/A
Finance Long                    Item 6       Item 5a    N/A
Finance Short                   N/A          N/A        N/A
Auxiliaries                     Item 7       Item 6     N/A
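A check of the kind that would surface these numbering differences might look like the sketch below. It uses a subset of the 1997 table above; the function names are hypothetical and the approach is an illustration, not the method the Census Bureau used.

```python
# Item numbers copied from the 1997 table (a subset of sectors).
numbering_1997 = {
    "Construction": {"EMPLOYMENT": "Item 5", "PAYROLL": "Item 6", "FRINGE BENEFITS": "Item 8"},
    "Mining Long": {"EMPLOYMENT": "Item 2", "PAYROLL": "Item 3A", "FRINGE BENEFITS": "Item 3C"},
    "Retail Long": {"EMPLOYMENT": "Item 6", "PAYROLL": "Item 5a", "FRINGE BENEFITS": "N/A"},
    "Service Long": {"EMPLOYMENT": "Item 7", "PAYROLL": "Item 6a", "FRINGE BENEFITS": "N/A"},
}


def distinct_item_numbers(table, topic):
    """Return the sorted item numbers used for one topic, ignoring N/A."""
    return sorted({row[topic] for row in table.values() if row[topic] != "N/A"})


def inconsistent_topics(table):
    """Map each topic to its item numbers when more than one numbering is in use."""
    topics = next(iter(table.values())).keys()
    result = {}
    for topic in topics:
        numbers = distinct_item_numbers(table, topic)
        if len(numbers) > 1:
            result[topic] = numbers
    return result


for topic, numbers in inconsistent_topics(numbering_1997).items():
    print(topic, numbers)
```

For this subset, all three topics are reported, since no two of the four sectors agree on an item number for the same question.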
5
Examples of Inconsistencies from
1997 – Forms Design
Picture 1
Picture 2
6
Examples of Inconsistencies from
1997 – Dissemination
Picture 1
Picture 2
7
Questionnaire Design - 2002
After Metadata:
• Each trade worked only on trade-specific
questions
• Established standards for questions and
layouts
• Design of each question once, allowing for
re-use across all questionnaires
• Introduced a generalized system to offer
computerized questionnaires to all
respondents
8
Categorization of Forms Content
• Question Number (generated)
• Question Title (calculated)
• Item Wording
• Question Instructions
• Headers
• Item Numbers
• Item Instruction
• Data Elements
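The idea of designing each question once and generating its number per questionnaire can be sketched as follows. The `Question` fields, the `LIBRARY` contents, and `build_form` are all hypothetical; the payroll wording is a placeholder, and the real system's data model is not described in these slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    # Hypothetical fields, loosely following the content categories above.
    key: str
    title: str
    wording: str
    instructions: str = ""


# Each question is defined exactly once in a shared library.
LIBRARY = {
    "months_in_operation": Question(
        "months_in_operation", "MONTHS IN OPERATION", "Number of months in operation"
    ),
    "payroll": Question("payroll", "PAYROLL", "Placeholder wording for the payroll item"),
    "research_development": Question(
        "research_development",
        "RESEARCH AND DEVELOPMENT",
        "Does your company conduct research and development?",
    ),
}


def build_form(question_keys):
    """Assemble a questionnaire; item numbers are generated from position."""
    return [(f"Item {n}", LIBRARY[key]) for n, key in enumerate(question_keys, start=1)]


long_form = build_form(["months_in_operation", "payroll", "research_development"])
short_form = build_form(["months_in_operation", "payroll"])

for number, question in long_form:
    print(number, question.title, "-", question.wording)
```

Because the item number is derived from a question's position instead of being stored with it, the same `Question` object can appear on any number of forms without its numbering drifting between them.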
9
Reusable Content - Questions
10
Dissemination - 2002
With Integrated Metadata System:
• Metadata was entered once and used
multiple times
• Data products were more consistent
across subject area and format
• The system allowed hundreds of users
to analyze and output publications
simultaneously
11
Categorization of
Dissemination Content
• Headnote: Will be included in metadata for 2007. Processed outside the system in 2002.
• Table Layout/Header Contents: Metadata (both wording and layout from metadata)
• File Name/Table Title: Metadata
• Stub: Metadata (both wording and layout from metadata)
• Footnotes: Metadata
• Data: Physical data is not metadata. (Layout of data comes from metadata extract rules.)
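A rough sketch of this split, with title, header, stub, and footnotes coming from metadata and the physical data kept apart, could look like this. Every name and value below is made up for illustration; the slides do not describe the actual table structures or extract rules.

```python
# Hypothetical metadata for one table; none of these values come from the real system.
table_metadata = {
    "title": "Example Table Title",
    "header": ["Employment", "Payroll"],
    "stub": ["Sector A", "Sector B"],
    "footnotes": ["Example footnote."],
}

# Physical data is stored separately; it is not metadata. Values are invented.
data = [[10, 200], [30, 400]]


def render_table(meta, rows):
    """Combine metadata-driven labels with data rows into a plain-text table."""
    lines = [meta["title"]]
    # Header row: an empty first cell sits above the stub column.
    lines.append(" | ".join(["", *meta["header"]]))
    # One output line per stub label, paired with its row of data.
    for label, row in zip(meta["stub"], rows):
        lines.append(" | ".join([label, *map(str, row)]))
    lines.extend(meta["footnotes"])
    return "\n".join(lines)


print(render_table(table_metadata, data))
```

Changing a label in `table_metadata` changes every product rendered from it, which is one way to read "entered once and used multiple times".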
12
Forms Design Process
Improvements for 2007
• The redesign of the database has
streamlined processes and improved
performance.
• The integration of tools has fostered
concurrent development of paper and
electronic forms.
• Electronic forms have been completed
early and made available for the advance
customer outreach program.
13
Dissemination Plans for 2007
• Continue using same system with some
improvements
– System upgrade from UNIX to LINUX Blade
– Implementation of Software Quality Assurance
• Incorporate additional new products into the
current system
• Upgrade publication tools to utilize all
metadata
14
Questions
15