Terminology and Standards
Dan Gillman
US Bureau of Labor Statistics
Terminology
Principle –
To communicate, we need to agree on terms
Concept –
– unit of thought
Term –
– linguistic expression (similar to a word) linked to a concept
Special Language –
– set of terms describing a subject field
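The three definitions above can be read as a small data model. The following is a minimal sketch only: the class names, the `cps:unemployed` identifier, and the sample entry are assumptions made for illustration and do not come from any standard.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Concept:
    """A unit of thought, identified here by a hypothetical id."""
    concept_id: str
    definition: str


@dataclass(frozen=True)
class Term:
    """A linguistic expression linked to a concept."""
    label: str
    concept: Concept


@dataclass
class SpecialLanguage:
    """A set of terms describing one subject field."""
    subject_field: str
    terms: Dict[str, Term] = field(default_factory=dict)

    def add(self, term: Term) -> None:
        # Index by lower-cased label so lookups ignore capitalization.
        self.terms[term.label.lower()] = term

    def lookup(self, label: str) -> Optional[Concept]:
        term = self.terms.get(label.lower())
        return term.concept if term else None


# Illustrative entry; the identifier is made up for this sketch.
unemployed = Concept("cps:unemployed", "Not employed but still in the labor force")
cps = SpecialLanguage("US Current Population Survey")
cps.add(Term("Unemployed", unemployed))
```

Keeping `Concept` separate from `Term` lets several terms, or the same term in different special languages, point at different concepts.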
Terminology
Examples of special languages
Probability and statistics
Database theory
Statistical metadata
Statistical activity within each statistical institute (SI)
– E.g., US Current Population Survey
• Labor force
• Unemployed
Union of special languages within SI
Projects
UNECE Metadata Glossary
Glossary (a.k.a. Vocabulary) –
– Alphabetical listing of terms and their definitions
BLS Taxonomy and Lexicon
Taxonomy (artefact, not the science) –
– Scheme for organizing terms within some subject field, typically a hierarchy
Lexicon –
– Vocabulary, or dictionary, of terms
UNECE Metadata Glossary
Create glossary of terms
In order of importance
– UNECE statistical metadata standards
• GSIM, GSBPM, GAMSO, CSPA, etc.
– Other statistical metadata standards
• DDI, SDMX, etc.
– Other standards and specifications
• Maybe ISO/IEC 11179, Dublin Core, etc.
Disseminate in user-friendly format
UNECE Metadata Glossary
Build special language for
Statistical institutes
– Designing metadata systems
– Building interfaces to metadata systems
– Message frameworks for sharing metadata
Establish authoritative source
Terms
Definitions
For international use
BLS Taxonomy and Lexicon
Project to
Record terms describing BLS data
– For all disseminated time series
– Separate terms into facets
• Measures (estimates on populations)
• Characteristics (classifications used to subset measures)
Produce
– Taxonomy – hierarchy of terms
– Lexicon – list of terms
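The two products described above can be derived from one set of term records. This is a rough sketch under stated assumptions: the records, the use of the two facets as hierarchy roots, and the function names are all hypothetical and are not taken from the actual BLS taxonomy.

```python
from collections import defaultdict

# Hypothetical (term, parent) records. The two facets act as roots
# (parent is None); every other entry is illustrative only.
RECORDS = [
    ("Measures", None),
    ("Employment", "Measures"),
    ("Unemployment rate", "Measures"),
    ("Characteristics", None),
    ("Industry", "Characteristics"),
    ("Occupation", "Characteristics"),
]


def build_taxonomy(records):
    """Return a parent -> sorted children mapping (the hierarchy of terms)."""
    children = defaultdict(list)
    for term, parent in records:
        children[parent].append(term)
    return {parent: sorted(kids) for parent, kids in children.items()}


def build_lexicon(records):
    """Return the flat, alphabetically sorted list of terms."""
    return sorted(term for term, _parent in records)


taxonomy = build_taxonomy(RECORDS)
lexicon = build_lexicon(RECORDS)
```

Here `taxonomy[None]` lists the facets and `taxonomy["Measures"]` lists the terms filed under that facet, while `lexicon` is the same vocabulary with the hierarchy dropped.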
BLS Taxonomy and Lexicon
Goals
For each term, find related documents and data
– organize data – use taxonomy
– tag documents – use lexicon
Use taxonomy to drive and guide
– Web site reorganization
Provide plain English equivalent words
– Help unsophisticated users find resources
– Alleviate common confusions
BLS Taxonomy and Lexicon
Plain English examples
Inflation – CPI
Field of work – industry or occupation
Wages, earnings, income, compensation
Plain English names for categories
Authoritative source for BLS language
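One way to picture the plain English equivalents is as a lookup from everyday wording to BLS terms. The sketch below assumes a simple dictionary; the first two pairs come from the examples above, while the structure and the function name `bls_terms_for` are hypothetical.

```python
from typing import Dict, List

# Plain English phrase -> BLS terms (pairs taken from the examples above).
PLAIN_ENGLISH: Dict[str, List[str]] = {
    "inflation": ["CPI"],
    "field of work": ["industry", "occupation"],
}


def bls_terms_for(query: str) -> List[str]:
    """Return the BLS terms for a plain English phrase, or [] if unknown."""
    key = " ".join(query.lower().split())  # normalize case and whitespace
    return PLAIN_ENGLISH.get(key, [])
```

A phrase that maps to more than one term, such as "field of work", is exactly where a user needs guidance on which BLS term applies.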
Usage of Terms
Metadata models
Names of classes, attributes, relationships
E.g., Universe, Category, Specialization
Metadata content
Content stored in attributes in a model
E.g., establishment, retail grocery store, etc.
Terminology systems
Authoritative sources for terms / meaning
Standards
Why standards?
Consistency
– Eliminate inconsequential (gratuitous) differences
• Spelling and phrasing differences
Semantic interoperability
– Shared meaning w/o need for negotiation
Data harmonization
– Ability to combine data from different sources
Standards
Many levels
Program, Agency, National, Regional, International
Weaker condition
Authoritative sources
– Term and meaning for some subject field(s)
• E.g., unemployed in US CPS
• Plain English -> not employed
• US CPS -> not employed but still in Labor Force
– Not necessarily standard
Standards
Consistency and Interoperability
Handled by authoritative sources
Use URIs to point to terminological entries
Spelling and phrasing differences eliminated
Access to meaning ensured
But,
Differences across subject fields remain
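The points above can be illustrated with a toy registry of terminological entries keyed by URI. This is a sketch under assumptions: the `example.org` URIs are placeholders, the `resolve` function and its normalization rule are hypothetical, and the two definitions are the "unemployed" meanings quoted earlier for plain English and the US CPS.

```python
from typing import Optional

# Hypothetical authoritative entries; the URIs are placeholders only.
ENTRIES = {
    "https://example.org/terms/plain-english/unemployed": {
        "term": "unemployed",
        "subject_field": "Plain English",
        "definition": "not employed",
    },
    "https://example.org/terms/us-cps/unemployed": {
        "term": "unemployed",
        "subject_field": "US CPS",
        "definition": "not employed but still in the labor force",
    },
}


def resolve(subject_field: str, label: str) -> Optional[str]:
    """Return the URI of the entry for a label within one subject field."""
    # Remove gratuitous differences: case, hyphens, surrounding spaces.
    key = label.lower().replace("-", "").strip()
    for uri, entry in ENTRIES.items():
        if entry["subject_field"] == subject_field and entry["term"] == key:
            return uri
    return None


cps_uri = resolve("US CPS", "Un-employed")
plain_uri = resolve("Plain English", "unemployed")
```

Spelling variants collapse to one URI within a subject field, and each URI gives access to a meaning, yet `cps_uri` and `plain_uri` still point to different definitions.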
Standards
Data Harmonization
Authoritative sources not sufficient
– Subject fields may differ
– Gratuitous differences may exist too
Need new standards and agreements
– Bilateral agreements not scalable
Multiple standards on the same subject are a problem
– E.g., Geographical standards (US MSA vs. CSA)
– BLS has 6 definitions of Boston
Contact Information
Dan Gillman
[email protected]