Metadata and Controlled Vocabularies

Taxonomy Strategies LLC
Metadata and Controlled
Vocabularies
Global Corporate Circle Working Session
Joseph Busch
28 August 2007
Copyright 2007 Taxonomy Strategies LLC. All rights reserved.
Focus of this session
• Best practices for specifying and using controlled vocabularies in DC-compliant information management applications.
• Tradeoffs and best practices around organization-dependent vs. sharable common controlled vocabularies.
• Tagging content for internal vs. external audiences using the same metadata and controlled vocabularies.
• When and how to map different taxonomies to each other.
Taxonomy Strategies LLC The business of organized information
For us, taxonomy work includes:
• Metadata specification defines the properties needed to describe content so that it can be found & used.
• Vocabularies are collections of terms that are used to specify some of the metadata properties.
• Some vocabularies are big and hierarchical, some are small and flat.
• An application profile specifies what metadata & vocabularies are required, and then represents them formally.
Best practices (1)
• Intranet and public taxonomies should be based on a common metadata specification and shared value vocabularies.
• Some metadata attributes are directly mappable to DC, some will be local (locally declared).
• Use qualified Dublin Core attributes.
• Some vocabularies are sharable industry standards, while others will be organization-dependent.
• Some value vocabularies will be particularly relevant to intranet content.
FDA Metadata specification (excerpt)

Element       Data Type  Length    Req./Repeat  Source

Asset Metadata …
Purpose: Text search & results display. Group & filter search results. Publish, feature, review content.
Title         String     Variable  1            User supplied
Content Type  String     Variable  1            Local Value Voc
Center        String     Variable  1            Local Value Voc
Date          Date       Fixed     1            System supplied

Subject Metadata …
Purpose: Search for, browse, group & filter search results.
Activity      String     Variable  *            Local Value Voc
Law           String     Variable  *            Standard Value Voc
Product       String     Variable  *            Standard Value Voc
Brand         String     Variable  *            Standard Value Voc
Company       String     Variable  *            Standard Value Voc
Condition     String     Variable  *            Standard Value Voc
Topic         String     Variable  *            Local Value Voc

Link Metadata …
Purpose: Reference related resources.
Relation      String     Variable  *            Validate by lookup

Use Metadata …
Purpose: Target, personalize content.
Audience      String     Variable  *            Local Value Voc
Geography     String     Variable  *            Standard Value Voc

Legend: ? = 1 or more; * = 0 or more

FDA Metadata specification (excerpt), with Dublin Core mappings

Element       Mapping           Data Type  Length    Req./Repeat  Source

Asset Metadata …
Purpose: Text search & results display. Group & filter search results. Publish, feature, review content.
Title         DC.Title          String     Variable  1            User supplied
Content Type  DC.Type           String     Variable  1            Local Value Voc
Center        DC.Publisher      String     Variable  1            Local Value Voc
Date          DC.Date           Date       Fixed     1            System supplied

Subject Metadata …
Purpose: Search for, browse, group & filter search results.
Activity      Local             String     Variable  *            Local Value Voc
Law           Local             String     Variable  *            Standard Value Voc (Blue Book)
Product       Local             String     Variable  *            Standard Value Voc (Orange Book)
Brand         Local             String     Variable  *            Standard Value Voc (Orange Book)
Company       Local             String     Variable  *            Standard Value Voc (Yellow Book)
Condition     Local             String     Variable  *            Standard Value Voc (ICD9)
Topic         DC.Subject        String     Variable  *            Local Value Voc

Link Metadata …
Purpose: Reference related resources.
Relation      DC.Relation       String     Variable  *            Validate by lookup

Use Metadata …
Purpose: Target, personalize content.
Audience      DCterms.Audience  String     Variable  *            Local Value Voc
Geography     DC.Coverage       String     Variable  *            Standard Value Voc (USGS)

Legend: ? = 1 or more; * = 0 or more
DC.Format=“text/html”, DC.Language=“en”

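A specification like the one above can be represented formally, which is what an application profile does. The following is a minimal Python sketch under stated assumptions: the `SPEC` dictionary copies only a few rows from the table, `1` is read as "exactly one value" and `*` as "zero or more", and the `validate` function, the record layout, and the sample title are hypothetical illustrations, not part of the FDA specification.

```python
# Hypothetical encoding of a few rows of the specification above.
# Assumption: "1" means exactly one value, "*" means zero or more values.
SPEC = {
    "DC.Title": "1",
    "DC.Type": "1",
    "DC.Publisher": "1",
    "DC.Date": "1",
    "DC.Subject": "*",
    "DCterms.Audience": "*",
    "DC.Coverage": "*",
}


def validate(record):
    """Return a list of problems found in a record (element -> list of values)."""
    problems = []
    for element, rule in SPEC.items():
        count = len(record.get(element, []))
        if rule == "1" and count != 1:
            problems.append(f"{element}: expected exactly 1 value, found {count}")
    for element in record:
        if element not in SPEC:
            problems.append(f"{element}: not declared in the specification")
    return problems


# Illustrative record; the title is made up and DC.Date is deliberately missing.
record = {
    "DC.Title": ["Spinach recall notice"],
    "DC.Type": ["Recalls"],
    "DC.Publisher": ["CFSAN"],
    "DCterms.Audience": ["Consumers"],
}

for problem in validate(record):
    print(problem)
```

A fuller profile would also check each value against its vocabulary (local or standard), which this sketch leaves out.
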
All facets and sub-facets

FDA* Taxonomy
  Center
  Geography
  Subject
    Activity
    Law
    Product
    Brand
    Company
    Condition
    Topic
  Audience
  Type

* U.S. Food and Drug Administration

Intranet facets – a taxonomy subset

FDA Taxonomy*
  Center
  Geography
  Subject
    Activity
      Administration
      Application & Approval
      Grant-Making & Sponsorship
      Investigation & Enforcement
      Public Awareness
      Research
      Rule-Making
      Training & Education
    Law
    Product
    Brand
    Company
    Condition
    Topic
  Audience
    Consumers
    Employees
    Healthcare
    Industry
  Type
    Directories
    Dockets
    Forms
    Instructions & How-To
    Job Information
    News
    Policies & Procedures
    Product Alerts
    Product Information
    Product Lists
    Publications
    Recalls
    Subject Indexes
    Tools & Databases
    Transcripts & Statements
    Warning Letters

* U.S. Food and Drug Administration

FDA.gov tagging example:
Information about what to do about bad spinach.

Taxonomy Facet        Tag Values
DC.Type               Recalls
DC.Publisher          CFSAN
DC.Subject.Activity   Public Awareness
DC.Subject.Law        n/a
DC.Subject.Product    Food: Produce
DC.Subject.Brand      n/a
DC.Subject.Company    n/a
DC.Subject.Condition  Gastroenteritis
DC.Subject.Topic      Food Safety
DCterms.Audience      Consumers
DC.Coverage           n/a

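The slides do not say how these tags are stored or published. One common convention for Dublin Core on web pages is HTML `<meta>` elements, so the sketch below assumes that approach and renders the spinach example that way. The `to_meta` helper is hypothetical, and facets tagged "n/a" are simply left out.

```python
from html import escape

# Tag values copied from the spinach example; "n/a" facets are omitted.
tags = {
    "DC.Type": ["Recalls"],
    "DC.Publisher": ["CFSAN"],
    "DC.Subject.Activity": ["Public Awareness"],
    "DC.Subject.Product": ["Food: Produce"],
    "DC.Subject.Condition": ["Gastroenteritis"],
    "DC.Subject.Topic": ["Food Safety"],
    "DCterms.Audience": ["Consumers"],
}


def to_meta(tags):
    """Render each (facet, value) pair as one HTML <meta> element."""
    lines = []
    for name, values in tags.items():
        for value in values:
            # escape() protects characters such as & and " in tag values.
            lines.append(f'<meta name="{escape(name)}" content="{escape(value)}">')
    return lines


for line in to_meta(tags):
    print(line)
```

Repeatable facets (marked `*` in the specification) map naturally to one `<meta>` element per value.
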
FDA.gov tagging example:
Information on “Accutane” for patients.

Taxonomy Facet        Tag Values
DC.Type               Product Information
DC.Publisher          CDER
DC.Subject.Activity   Public Awareness
DC.Subject.Law        n/a
DC.Subject.Product    Drugs: Prescription Drugs
DC.Subject.Brand      Accutane; isotretinoin
DC.Subject.Company    n/a
DC.Subject.Condition  Disease: Acne
DC.Subject.Topic      Drug Information; Consumer Education
DCterms.Audience      Healthcare; Consumers
DC.Coverage           n/a

Inside.FDA tagging example:
Instructions on how to replace a security badge.

Taxonomy Facet        Tag Values
DC.Type               Forms; Instructions & How-To
DC.Publisher          [applicable organizational dept]
DC.Subject.Activity   Administration
DC.Subject.Law        n/a
DC.Subject.Product    n/a
DC.Subject.Brand      n/a
DC.Subject.Company    n/a
DC.Subject.Condition  n/a
DC.Subject.Topic      n/a
DCterms.Audience      Employees
DC.Coverage           n/a

Best practices (2)
• Intranet and internet content should share a common repository, but not replicate the same content in two places.
• Tag content for appropriate audiences.
  – E.g., Public, Internal, Confidential
[Figure: Intranet and Internet views of content tagged Public, Internal, or Confidential]

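One way to read this practice: keep each item once, tag it, and let each site select what it may show. The Python sketch below assumes a single hypothetical `REPOSITORY` list and a `VISIBLE` rule table. The visibility rules (internet shows Public only, intranet shows Public and Internal, Confidential shown on neither) are an assumption for illustration, not something the slide specifies, and the third title is made up.

```python
# One shared repository: each item is stored once and carries an access tag.
# The first two titles echo the tagging examples; the third is invented.
REPOSITORY = [
    {"title": "What to do about bad spinach", "access": "Public"},
    {"title": "How to replace a security badge", "access": "Internal"},
    {"title": "Draft memo", "access": "Confidential"},
]

# Assumed visibility rules per site (not stated in the source).
VISIBLE = {
    "internet": {"Public"},
    "intranet": {"Public", "Internal"},
}


def view(site):
    """Return the titles a given site may display, without copying content."""
    allowed = VISIBLE[site]
    return [item["title"] for item in REPOSITORY if item["access"] in allowed]


print(view("internet"))
print(view("intranet"))
```

Because both sites filter the same list, a public item appears on the intranet without being duplicated.
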
Mapping taxonomies
• More complicated approach than multiple attributes with multiple value vocabularies.
• Cases:
  – One-to-one.
  – One-to-many.
  – Parallel, independent hierarchies.
• If mapping is done, then business rules can be used to:
  – Automatically add attribute values.
  – Improve search.
  – Create multiple views into the same content.
• An ontology specifies typed associative relationships.
  – Typically “Is a” relationships.

Taxonomy mapping

Case                               Level   Benefit                                       Example
One-to-one                         Easy    Automatic switching                           Ivory Coast = Côte d’Ivoire
One-to-many                        Medium  Automatic hedging (broadening/narrowing)      Czechoslovakia = Czech Republic; Slovakia
Parallel, independent hierarchies  Hard    Multiple views of same information space      Geographic vs. Political

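The first two cases in the table can be sketched as lookup tables applied at search time. This Python example uses only the two examples from the table. The `expand` function is hypothetical, and treating "switching" as bidirectional and "hedging" as adding the broader or narrower terms to a query is an interpretation, not a definition taken from the slides.

```python
# One-to-one: equivalent labels for the same concept.
ONE_TO_ONE = {"Ivory Coast": "Côte d’Ivoire"}

# One-to-many: one term corresponds to several terms in the other vocabulary.
ONE_TO_MANY = {"Czechoslovakia": ["Czech Republic", "Slovakia"]}


def expand(term):
    """Return the set of terms to search for, given the user's term."""
    terms = {term}
    # Automatic switching: swap in the equivalent label, in either direction.
    for left, right in ONE_TO_ONE.items():
        if term == left:
            terms.add(right)
        elif term == right:
            terms.add(left)
    # Hedging (narrowing): a broad term also matches its narrower terms.
    terms.update(ONE_TO_MANY.get(term, []))
    # Hedging (broadening): a narrow term also matches its broader term.
    for broad, narrow_terms in ONE_TO_MANY.items():
        if term in narrow_terms:
            terms.add(broad)
    return terms


print(expand("Ivory Coast"))
print(expand("Czechoslovakia"))
```

The third case, parallel independent hierarchies, does not reduce to a term-by-term table like this, which is consistent with the slide rating it "Hard".
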
Advanced relations

[Figure: Taxonomy with facets Person, Organization, Location, Products, Audience, and Type. Products relates to Product Line, Technology, Application, and Industry. Callout: “Is a” Groups of Products]

Product relationships provide tagging rules for product groupings

Product names are consistent labels: Product, Product Line
Generic labels: Technology, Application, Industry

Product: Oracle Business Activity Monitoring
  Product Line: Oracle Fusion Middleware
  Technology: Application Server; Middleware; SOA

Product: PeopleSoft Collaborative Supply Management
  Product Line: PeopleSoft Enterprise
  Application: Supplier Relationship Management

Product: Siebel Clinical
  Product Line: Siebel
  Application: Clinical
  Industry: Life Sciences & Pharma

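Relationships like these can act as business rules that add attribute values automatically once a product is tagged. The Python sketch below assumes the rules are stored in a hypothetical `PRODUCT_RULES` dictionary. The values are taken from the slide, but which column each value belongs to is a reading of the slide layout, and `apply_rules` is an illustrative name.

```python
# Rules keyed by product name; each rule lists the values to add per facet.
# Column assignments are an interpretation of the slide.
PRODUCT_RULES = {
    "Oracle Business Activity Monitoring": {
        "Product Line": ["Oracle Fusion Middleware"],
        "Technology": ["Application Server", "Middleware", "SOA"],
    },
    "PeopleSoft Collaborative Supply Management": {
        "Product Line": ["PeopleSoft Enterprise"],
        "Application": ["Supplier Relationship Management"],
    },
    "Siebel Clinical": {
        "Product Line": ["Siebel"],
        "Application": ["Clinical"],
        "Industry": ["Life Sciences & Pharma"],
    },
}


def apply_rules(tags):
    """Return a copy of tags with values implied by each Product added."""
    result = {facet: list(values) for facet, values in tags.items()}
    for product in tags.get("Product", []):
        for facet, values in PRODUCT_RULES.get(product, {}).items():
            bucket = result.setdefault(facet, [])
            for value in values:
                if value not in bucket:  # avoid duplicate tag values
                    bucket.append(value)
    return result


print(apply_rules({"Product": ["Siebel Clinical"]}))
```

With rules like this, an author tags only the product name, and the grouping facets are filled in consistently.
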
press room application
http://pressroom.oracle.com/prNavigator.jsp
“Is a” Groups of Products

events application
http://events.oracle.com/
“Is located” powers Google Maps mash-up
“Is a” Groups of Products

Taxonomy Strategies LLC
Questions
[email protected]
+1-415-377-7912
GCC (Global Corporate Circle) Topics
• Change focus to large organizations including governments & government agencies.
• Enterprise-Wide Metadata Applications Community (EnMAC)
• Is this agreeable?
• 2007-2008 activities.
• Best practices case studies.
• Identify and describe projects that are using DC.
  – What is the best way to do this?
• Other activities?