Taxonomy 1-2-3


Taxonomy Strategies LLC
Co-Evolution of the Dublin Core and the Semantic Web
JPL Summer Series on Information Architecture
September 30, 2008
Copyright 2008 Taxonomy Strategies LLC. All rights reserved.
Agenda
• About the Speaker
• Introduction to the Dublin Core
• Co-Evolution of the Dublin Core and the Semantic Web
  • Timeline
  • Communities
• Use of the Dublin Core
  • In the NASA Taxonomy
  • In a Client Engagement
• Current DCMI Activities and Directions
Taxonomy Strategies LLC The business of organized information
About the Speaker: Ron Daniel, Jr.
http://www.taxonomystrategies.com/html/rondaniel.htm
• Over 15 years in the business of metadata & automatic classification
  • Principal, Taxonomy Strategies
  • Standards Architect, Interwoven
  • Senior Information Scientist, Metacode Technologies (acquired by Interwoven, November 2000)
  • Technical Staff Member, Los Alamos National Laboratory
• Metadata and taxonomies community leadership
  • Chair, PRISM (Publishing Requirements for Industry Standard Metadata) working group
  • Acting chair, XML Linking working group
  • Member, RDF working groups
  • Co-editor, PRISM, XPointer, 3 IETF RFCs, and Dublin Core 1 & 2 reports
Dublin Core: A little more complicated over time
Elements
1. Identifier
2. Title
3. Creator
4. Contributor
5. Publisher
6. Subject
7. Description
8. Coverage
9. Format
10. Type
11. Date
12. Relation
13. Source
14. Rights
15. Language
Refinements
Abstract
Access rights
Alternative
Audience
Available
Bibliographic citation
Conforms to
Created
Date accepted
Date copyrighted
Date submitted
Education level
Extent
Has format
Has part
Has version
Is format of
Is part of
Is referenced by
Is replaced by
Is required by
Issued
Is version of
License
Mediator
Medium
Modified
Provenance
References
Replaces
Requires
Rights holder
Spatial
Table of contents
Temporal
Valid
Encodings
Box
DCMIType
DDC
IMT
ISO3166
ISO639-2
LCC
LCSH
MESH
Period
Point
RFC1766
RFC3066
TGN
UDC
URI
W3CDTF
Types
Collection
Dataset
Event
Image
Interactive Resource
Moving Image
Physical Object
Service
Software
Sound
Still Image
Text
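To see how the four lists on this slide combine, here is a minimal sketch that writes a few statements as HTML meta tags, following the "DC." / "DCTERMS." naming convention used in the DCMI guidelines for expressing Dublin Core in HTML/XHTML. The helper name dc_meta is hypothetical, and treating the talk date as a "created" date is an assumption made only for illustration.

```python
from html import escape


def dc_meta(name, content, scheme=None):
    """Build one HTML <meta> tag for a Dublin Core statement (hypothetical helper)."""
    attrs = [f'name="{escape(name)}"']
    if scheme is not None:
        # The scheme attribute names the encoding scheme for the value.
        attrs.append(f'scheme="{escape(scheme)}"')
    attrs.append(f'content="{escape(content)}"')
    return "<meta " + " ".join(attrs) + ">"


tags = [
    # One of the 15 elements, no encoding scheme.
    dc_meta("DC.title", "Co-Evolution of the Dublin Core and the Semantic Web"),
    dc_meta("DC.creator", "Ron Daniel, Jr."),
    # A refinement of Date, with the W3CDTF encoding scheme (date use is assumed).
    dc_meta("DCTERMS.created", "2008-09-30", scheme="DCTERMS.W3CDTF"),
    # An element whose value comes from the DCMI Type vocabulary.
    dc_meta("DC.type", "Text", scheme="DCTERMS.DCMIType"),
]

print("\n".join(tags))
```

Each tag pairs an element or refinement with a value, and optionally with an encoding scheme that says how to read that value.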
Dublin Core: Even more complicated over time
Elements
Abstract
Access rights
Accrual Method
Accrual Periodicity
Accrual Policy
Alternative
Audience
Available
Bibliographic citation
Conforms to
Contributor
Created
Creator
Coverage
Date
Date accepted
Date copyrighted
Date submitted
Description
Education level
Extent
Format
Has format
Has part
Has version
Identifier
Instructional Method
Is part of
Is referenced by
Is replaced by
Is required by
Issued
Is version of
Language
License
Mediator
Medium
Modified
Provenance
Publisher
References
Relation
Replaces
Requires
Rights
Rights holder
Source
Spatial
Subject
Table of contents
Temporal
Title
Type
Valid
Is format of
Encodings
DCMIType
DDC
IMT
LCC
LCSH
MESH
NLM
TGN
UDC
Box
ISO3166
ISO639-2
ISO639-3
Period
Point
RFC1766
RFC3066
URI
W3CDTF
Types
Collection
Dataset
Event
Image
Interactive Resource
Moving Image
Physical Object
Service
Software
Sound
Still Image
Text
Classes
Agent
AgentClass
BibliographicResource
FileFormat
Frequency
Jurisdiction
LicenseDocument
LinguisticSystem
Location
LocationPeriodOrJurisdiction
MediaType
MediaTypeOrExtent
MethodOfAccrual
MethodOfInstruction
PeriodOfTime
PhysicalMedium
PhysicalResource
Policy
ProvenanceStatement
RightsStatement
SizeOrDuration
Standard
Current Efforts in the DCMI
• New Elements and Vocabularies?
  • Very few. DCMI is pushing mixed vocabulary approaches.
• Application Profiles
  • Collection Description, Education, Government, Libraries, …
• Singapore Framework
  • Defines a set of descriptive components that are necessary or useful for documenting an Application Profile.
  • Describes how these documentary standards relate to standard domain models and Semantic Web foundation standards.
• Abstract Model
DCMI Abstract Model
• An information model which is independent of any particular encoding syntax.
• Facilitates the development of better mappings and cross-syntax translations.
• Composed of three main parts
  • Resource Model
  • Description Set Model
  • Vocabulary Model
• Strongly based on RDF.
[Diagrams: Resource Model; Vocabulary Model]
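As a rough illustration of the Description Set Model and its RDF basis, the sketch below models a description set as descriptions made of statements, then flattens it into subject/predicate/object triples. It is deliberately simplified: the class and field names are assumptions, the example resource URI is hypothetical, and the real Abstract Model also covers value strings, syntax encoding schemes, and languages.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Statement:
    """One property/value pair; the value is either a URI or a literal string."""
    property_uri: str
    value_uri: Optional[str] = None
    literal: Optional[str] = None


@dataclass
class Description:
    """A set of statements about exactly one resource."""
    resource_uri: str
    statements: List[Statement] = field(default_factory=list)


@dataclass
class DescriptionSet:
    """One or more descriptions, grouped as a single record."""
    descriptions: List[Description] = field(default_factory=list)


def to_triples(description_set: DescriptionSet) -> List[Tuple[str, str, str]]:
    """Flatten a description set into RDF-style (subject, predicate, object) triples."""
    triples = []
    for description in description_set.descriptions:
        for statement in description.statements:
            obj = statement.value_uri if statement.value_uri is not None else statement.literal
            triples.append((description.resource_uri, statement.property_uri, obj))
    return triples


DC = "http://purl.org/dc/elements/1.1/"
talk = Description(
    resource_uri="http://example.org/talks/co-evolution",  # hypothetical URI
    statements=[
        Statement(DC + "title", literal="Co-Evolution of the Dublin Core and the Semantic Web"),
        Statement(DC + "type", value_uri="http://purl.org/dc/dcmitype/Text"),
    ],
)
triples = to_triples(DescriptionSet([talk]))
for triple in triples:
    print(triple)
```

Because the model is independent of any syntax, the same description set could be written out as HTML meta tags, XML, or RDF without changing these structures.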
Co-Evolution Timeline (1994–2008)
Dublin Core
• Second WWW Conference (Chicago)
• OCLC/NCSA Metadata Workshop; First DC Report
• Warwick Framework; Second DC Report
• Dublin Core Metadata Element Set 1.0
• Encoding Dublin Core in HTML
• Dublin Core Qualifiers
• DCMI Terms, DCMI Type Vocabulary, Expressing DC in HTML/XHTML
• Shift to Application Profiles and away from more elements
• DCMI Abstract Model, Guidelines for Encoding Bibliographic Citation Information in DC Metadata
• Expressing DC Metadata using RDF
Semantic Web
• First RDF working draft released
• RDF Model and Syntax Specification as W3C Rec.
• RDF Schema Specification as Candidate Rec.
• RDF Core WG, WebONT WG formed
• RDF Schema, RDF Concepts and Abstract Syntax, OWL Specifications as RECs
• GRDDL REC
• SPARQL RECs
Communities within the DCMI
• The participants at the OCLC/NCSA Metadata Workshop were “geeks, freaks, and the people in sensible shoes”
  • Web and Internet Engineering Task Force participants
  • SGML practitioners
  • Librarians and Library Standards participants
• Multiple formats have always been an issue
  • IAFA templates, HTML, “dot-kludge”, XML, RDF, …
• DCMI has NEVER been a standards body for leading-edge technology
  • Mix of participants with strong representation from libraries and technologists
  • Provides a place to try out technologies on information problems
  • Has conservative and liberal wings
    – Conservatives tend to the basic 15 elements
    – Liberals tend to the Abstract Model, Singapore Framework, etc.
• Semantic Web technology is not a good fit with the librarian culture
  • Explained blank nodes to someone off the street lately?
  • “You don’t have to be an automotive engineer to drive a car” – Tom Baker
NASA Taxonomy: Metadata Specification
Bold fieldnames are from Dublin Core or DC Terms
Field Name                  Element Name and Namespace
Title                       dc:title
Creator                     dc:creator
Creator Affiliation         dc:creator.affiliation
Subject                     dc:subject
Description                 dc:description
Publisher                   dc:publisher
Date                        dc:date
Type                        dc:type
Format                      dc:format
Identifier                  dc:identifier
Coverage                    dc:coverage
Audience                    dcterms:audience
Access Controls             dcterms:accessControls
Language                    dc:language
Rights                      dc:rights
Missions and Projects       nasa:missionsProjects
Workforce Competencies      nasa:workforceCompetencies
Instruments                 nasa:instruments
Business Purpose            nasa:businessPurpose
Work Breakdown Structure    nasa:workBreakdownStructure
Keywords                    nasa:keywords
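The specification above mixes three vocabularies by prefix. A minimal sketch of how such prefixed names can be expanded to full URIs and grouped by vocabulary follows. The dc and dcterms namespace URIs are the standard ones; the nasa namespace URI is a placeholder assumption, since the slides do not give it, and only a few of the fields are included.

```python
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "nasa": "http://example.org/nasa-taxonomy#",  # placeholder, not the real namespace
}

# A subset of the field table, keyed by field name.
FIELDS = {
    "Title": "dc:title",
    "Audience": "dcterms:audience",
    "Instruments": "nasa:instruments",
    "Keywords": "nasa:keywords",
}


def expand(qname):
    """Turn a prefixed name such as 'dc:title' into a full URI."""
    prefix, local_name = qname.split(":", 1)
    return NAMESPACES[prefix] + local_name


def group_by_vocabulary(fields):
    """Group field names by the prefix of the element they map to."""
    groups = {}
    for field_name, qname in fields.items():
        prefix = qname.split(":", 1)[0]
        groups.setdefault(prefix, []).append(field_name)
    return groups


print(expand("dc:title"))
print(group_by_vocabulary(FIELDS))
```

Grouping by prefix makes it easy to see which fields reuse Dublin Core and which are local extensions.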
NASA Taxonomy: Instruments Vocabulary Sample
<?xml version="1.0" encoding="ISO-8859-1"?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'
         xmlns:skos='http://www.w3.org/2004/02/skos/core#'
         xmlns:nt2='http://nasataxonomy.jpl.nasa.gov/cvFields#'
         xmlns:dcterms='http://purl.org/dc/terms/'
         xmlns:dc='http://purl.org/dc/elements/1.1/'>
  <skos:Concept rdf:about='inst:8'>
    <skos:prefLabel>Cameras</skos:prefLabel>
    <skos:broader rdf:resource='inst:1'/>
    <skos:narrower rdf:resource='inst:9'/>
    <skos:narrower rdf:resource='inst:10'/>
    <skos:narrower rdf:resource='inst:11'/>
    <skos:narrower rdf:resource='inst:12'/>
    <nt2:status>Approved</nt2:status>
    <nt2:type>Descriptor</nt2:type>
    <nt2:code>8</nt2:code>
    <nt2:inputdate>2004-05-01</nt2:inputdate>
    <dcterms:dateAccepted>2004-06-11</dcterms:dateAccepted>
    <dcterms:modified>2004-06-11</dcterms:modified>
  </skos:Concept>
…
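A vocabulary file in this shape can be read with a few lines of standard-library Python. The sketch below uses a trimmed, closed-off copy of the sample (the original is elided after the first concept) and pulls out each concept's label and its broader and narrower links. It reads this particular RDF/XML layout only and is not a general RDF parser; the function name read_concepts is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
NS = {"rdf": RDF, "skos": SKOS}

# Trimmed copy of the slide's sample, with the root element closed so it parses.
SAMPLE = """<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:skos='http://www.w3.org/2004/02/skos/core#'>
  <skos:Concept rdf:about='inst:8'>
    <skos:prefLabel>Cameras</skos:prefLabel>
    <skos:broader rdf:resource='inst:1'/>
    <skos:narrower rdf:resource='inst:9'/>
    <skos:narrower rdf:resource='inst:10'/>
  </skos:Concept>
</rdf:RDF>"""


def read_concepts(xml_text):
    """Return {concept URI: {label, broader, narrower}} for each skos:Concept."""
    root = ET.fromstring(xml_text)
    about = f"{{{RDF}}}about"
    resource = f"{{{RDF}}}resource"
    concepts = {}
    for node in root.findall("skos:Concept", NS):
        concepts[node.get(about)] = {
            "label": node.findtext("skos:prefLabel", namespaces=NS),
            "broader": [b.get(resource) for b in node.findall("skos:broader", NS)],
            "narrower": [n.get(resource) for n in node.findall("skos:narrower", NS)],
        }
    return concepts


concepts = read_concepts(SAMPLE)
print(concepts["inst:8"])
```

The broader and narrower links are what let a browsing interface walk up and down the Instruments hierarchy.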
NASA taxonomy demo
Technology: Siderean
http://demo.siderean.com/NASADemoV4/NASA-demoquery1.jsp
[Screenshot callouts]
• Search collection
• Shows top categories in ascending order
• Click on arrows to re-sort by frequency, or switch to descending order
• Shows distribution of entire collection across taxonomy facets.
• Multiple resources from heterogeneous sources are searched as single collection
NASA taxonomy demo: Search on “Rover”
[Screenshot callouts]
• Refine search results
• Re-sort search results
• More filters based on this result
• Click to see source document
• Click to refine search by subject
• Click to refine search by collection
• More filters
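The faceted behaviour in the demo (counts per facet value, re-sorting by frequency, refining by a clicked value) can be sketched in a few lines. This is an illustration of the general idea only, not Siderean's implementation; the records, facet names, and function names are all made up for the example.

```python
from collections import Counter

# Hypothetical records from different sources, already tagged with facet values.
RECORDS = [
    {"title": "Rover mobility test", "subject": ["Robotics"], "collection": ["Reports"]},
    {"title": "Rover camera design", "subject": ["Robotics", "Imaging"], "collection": ["Reports"]},
    {"title": "Rover landing site images", "subject": ["Imaging"], "collection": ["Image Archive"]},
]


def facet_counts(records, facet):
    """Count how many records carry each value of one facet."""
    counts = Counter()
    for record in records:
        counts.update(record.get(facet, []))
    return counts


def refine(records, facet, value):
    """Keep only the records tagged with the chosen facet value."""
    return [record for record in records if value in record.get(facet, [])]


by_subject = facet_counts(RECORDS, "subject")
print(sorted(by_subject.items()))            # alphabetical, ascending
print(by_subject.most_common())              # re-sorted by frequency
print(len(refine(RECORDS, "subject", "Imaging")))
```

Because every record carries the same facet fields, resources from different sources can be counted and filtered as one collection.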
Use of Dublin Core and DC Terms in a Client Metadata Specification
Field                    Near Equivalent
id                       identifier
url                      identifier
accessControl            access rights
coverage.ward            coverage
coverage.neighborhood    coverage
date.reviewed            date accepted
date.nextReview          date
date.lastModified        date submitted
date.embargoed           date
topic                    subject
Fields listed without a separate near equivalent: audience, bytecount, title, briefTitle, headline, subhead, description, thumbnail, language, type, format, publisher, publisherType, contributor, keywords, date
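A crosswalk like this is usually applied as a simple lookup. The sketch below maps a client record onto the near-equivalent Dublin Core names from the slide; the function name, the pass-through rule for unmapped fields, and the sample record are assumptions for illustration.

```python
# Client field -> near-equivalent Dublin Core / DC Terms name, as listed on the slide.
NEAR_EQUIVALENT = {
    "id": "identifier",
    "url": "identifier",
    "accessControl": "access rights",
    "coverage.ward": "coverage",
    "coverage.neighborhood": "coverage",
    "date.reviewed": "date accepted",
    "date.nextReview": "date",
    "date.lastModified": "date submitted",
    "date.embargoed": "date",
    "topic": "subject",
}


def to_dublin_core(record):
    """Re-key a client record by near-equivalent DC name.

    Fields without a listed equivalent pass through under their own name
    (an assumption of this sketch). Values are collected in lists because
    several client fields can land on the same DC name.
    """
    mapped = {}
    for field_name, value in record.items():
        target = NEAR_EQUIVALENT.get(field_name, field_name)
        mapped.setdefault(target, []).append(value)
    return mapped


sample = {"id": "42", "url": "http://example.org/a", "topic": "Parks", "title": "Park hours"}
result = to_dublin_core(sample)
print(result)
```

Note that the mapping is lossy: once id and url both become identifier, or four different dates become date, the distinction between them is gone.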
DCMI Recent Activity
• The future direction of the DCMI will emphasize:
  • Mixed-vocabulary use of the Dublin Core elements (initial 15 plus additions).
  • The maintenance of the standards.
  • The support of the community around those standards.
• DCMI Berlin conference has just concluded
  • Levels of Interoperation
    – Level 1: Shared natural language definitions
    – Level 2: Common semantic model (RDF)
    – Level 3: Shared notion of description sets
    – Level 4: Shared use of constraints and functional requirements
  • RDA (Resource Description and Access) – Possible successor to Anglo-American Cataloguing Rules (AACR).
    – Builds on FRBR (Functional Requirements for Bibliographic Records)
    – Some are pushing for this to be expressed in RDF.
    – Lots of testing needed
  • What will follow the MARC format and the AACR?
Predicted Directions
• DCMI will continue to investigate Semantic Technologies, make them more accessible to the library community, and provide requirements and testing input to the Semantic Web.
• DCMI will continue to display multiple personalities.
  • Dealing with multiple formats will remain important.
  • Mappings from fields in various systems will continue to limit sophistication of solutions.
• Some DCMI participants will drive towards more sophisticated information applications, e.g.
  • Library of Congress Subject Headings published in SKOS
  • Research into areas such as RDA (described earlier).
Taxonomy Strategies LLC
More Information:
www.taxonomystrategies.com
[email protected]
[email protected]