LBSC670_class04_metadata_092011

LBSC 670
Information Organization
Thoughts from last class
• “I feel like we are getting behind”
• “Why are we learning HTML/CSS?”
• “What is cloud computing?”
• “Can we have printouts again?”
LITA National Forum
http://www.ala.org/
LITA - Free your metadata
• http://freeyourmetadata.org/ (Amalia)
• http://code.google.com/p/google-refine/
LITA – Digital Institutes
Data Services Librarian
Advocate for data publishing, research, curation, collaboration
Class Plan
• Explore historical foundation of
cataloging
• Identify metadata standards central to
cataloging
• Explore metadata schemas useful for
libraries
Review
• HTML implements a metadata schema
(e.g., h1, h2, DOM . . .) and an encoding
system (e.g., XHTML) in concert with
supporting technologies (e.g., CSS,
JavaScript)
• Digital documents have embedded
structure that programs use to encode
and decode information for use
Storage
• Relational database
– Tables, SQL, indexes, abstracted but semi-fixed structure
• Object databases
– Storage of objects which are directly accessible via programs
• Flat text files
– embedded structure, tight association with application, quick,
simple
• XML files
– Abstracted structure, portable, extensible, slower?
• Embedded in digital objects
– Portable, associative
Representation of objects
Object
Representation Record
Encoding model
Metadata definitions
• Common
• Data about Data
• Data that describes a resource
• Information about Information
• Gilliland-Swetland, Baca
• “the sum total of what one can say about any information
object at any level of aggregation.”
• Content, Context, Structure
• Greenberg
• Structured data about an information object that facilitates
functions associated with the designated object
Metadata Life-cycle
• Codification
• Storage
• Use/Reuse
• Scalability
Gilliland, 2007
Standards Types
• Data structure standards
– Standards that govern the scope and
purpose of a metadata record (MARC,
Dublin Core, Text Encoding Initiative (TEI))
• Data communication standards
– Encoding (e.g., HTML/XHTML, XML)
• Data syntax standards
– Element ordering, content syntax, and
encoding syntax (e.g. date/time syntax)
Cataloging purposes
• “A list of books, maps, and other items
arranged in some definite order” (Cutter)
– Discovery
• Catalogs, indexes, databases
– Management
• Technical and administrative metadata
– Access
• User interfaces (OPACS)
Dewey’s rules for cataloging
• Based on:
– Panizzi’s 91 rules for cataloging
• British Museum
– Charles Coffin Jewett
• Smithsonian librarian, 1852
• Standard cataloging, printed entries on cards
• Dewey
• Alphabetic ordering by subject
• System of all knowledge
• Classification – browse http://dewey.info/
A quick history of classification
• 245 BCE – Callimachus creates Pinakes
– 120 volume catalog for 400 scrolls
– Title, author, teachers, biography
– Six genres (rhetoric, law, epic, tragedy, lyric poetry,
history, medicine, mathematics, natural science, misc)
• 48 BCE – Alexandria burns
• . . . Then for a long time nothing happened . . .
• 1876 – Dewey Decimal System
• 1882 – Charles Cutter – Cutter classification
• 1897 – Herbert Putnam – LC Classification
Images from Wikipedia 
Book Metadata (circa 1960)
Book Metadata (circa 1980)
• 100 2_ |a Berners-Lee, Tim.
• 245 10 |a Weaving the Web : |b the original design and ultimate
destiny of the World Wide Web by its inventor / |c Tim Berners-Lee with Mark Fischetti.
• 250 __ |a 1st ed.
• 260 __ |a San Francisco : |b Harper SanFrancisco, |c c1999.
• 300 __ |a xi, 226 p. ; |c 25 cm.
• 500 __ |a Includes index.
• 650 _0 |a World Wide Web |x History.
• 600 20 |a Berners-Lee, Tim.
• 700 1_ |a Fischetti, Mark.
• 856 42 |3 Publisher description |u
http://www.loc.gov/catdir/description/hc044/99027665.html
http://www.oclc.org/bibformats/en/default.shtm
Book Metadata (circa 2002)
Library uses of metadata
• Descriptive cataloging
• Inventory of holdings
• Technical and administrative
metadata about acquisitions
• Interoperability with other systems
• Facilitating acquisition decisions
• Federate searches from other catalogs
Cataloging process
An example MARC record
1. Description
3. Headings
4. References
Anatomy of a bibliographic record
• Encoding standards (MARC / MARCXML)
• Content/syntax standards (AACR / RDA)
• Classification systems (LCSH / DDC)
AACR2 processes
• Area 1: title, statement of responsibility
• Area 2: edition
• Area 3: material type
• Area 4: publication, distribution
• Area 5: physical description
• Area 6: series
• Area 7: notes
• Area 8: standard number, terms
How to enter a title into a
MARC record
– AACR2
• Transcribe title exactly according to spelling but not necessarily
punctuation/capitalization.
• If an alternative title is present, precede it by a comma following
the regular title
• Use a General Material Designation in brackets []
– MARC Standard
• Use 245 field – indicates Main title
• Indicator 2 – Number of non-filing characters (leading articles)
• Subfield a – main title
• Subfield b – remainder of title
• Subfield h – General Material Designation in brackets []
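The 245 rules above can be sketched in a few lines of Python. This is only an illustration: `nonfiling_count` and `build_245` are hypothetical helpers, not part of any MARC library, the article list covers English only, and the output imitates the `|a` display style used in the record earlier in these slides rather than a real MARC encoding.

```python
# Hypothetical helpers for illustration; not part of any MARC library.
LEADING_ARTICLES = ("the ", "an ", "a ")  # assumption: English articles only


def nonfiling_count(title):
    """Count the characters to skip when filing (article plus its space)."""
    lowered = title.lower()
    for article in LEADING_ARTICLES:
        if lowered.startswith(article):
            return len(article)
    return 0


def build_245(title, remainder=None, gmd=None, ind1="1"):
    """Assemble a display-style 245 field from its parts."""
    ind2 = str(nonfiling_count(title))  # indicator 2 = non-filing characters
    parts = [f"245 {ind1}{ind2}", f"|a {title}"]
    if gmd:
        parts.append(f"|h [{gmd}]")  # General Material Designation in brackets
    if remainder:
        parts.append(f": |b {remainder}")  # remainder of title
    return " ".join(parts)


print(build_245("Weaving the Web", remainder="the original design"))
print(build_245("The Example Title", gmd="sound recording"))
```

The second call uses a made-up title to show indicator 2 becoming 4 for a leading "The ".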
Dublin Core Overview
• Created out of a 1995 meeting in Dublin,
Ohio
• An intentionally simple standard focused
on resource description
• DCMI conference (2007)
• Enjoys widespread adoption in the library
and digital library community, particularly
as a lowest-common-denominator
standard
Initial Dublin Core
• Focused on digital document-like objects
• Simple description, human based
• Focus on descriptive metadata over
technical, preservation, use metadata
Dublin Core (1.0 -1995)
1. Subject: The topic addressed by the work
2. Title: The name of the object
3. Author: The person(s) primarily responsible for the intellectual content of the
object
4. Publisher: The agent or agency responsible for making the object available
5. OtherAgent: The person(s), such as editors and transcribers, who have made
other significant intellectual contributions to the work
6. Date: The date of publication
7. ObjectType: The genre of the object, such as novel, poem, or dictionary
8. Form: The physical manifestation of the object, such as Postscript file or
Windows executable file
9. Identifier: String or number used to uniquely identify the object
10. Relation: Relationship to other objects
11. Source: Objects, either print or electronic, from which this object is derived, if
applicable
12. Language: Language of the intellectual content
13. Coverage: The spatial locations and temporal durations characteristic of the
object
Weibel, 1995
Dublin Core (1.1 - 1999)
• Title
• Author or Creator
• Subject and Keywords
• Description
• Publisher
• Other Contributor
• Date
• Resource Type
• Format
• Resource Identifier
• Source
• Language
• Relation
• Coverage
• Rights Management
Qualified Dublin Core (Current)
• 71 properties, 35 classes . . .(Registry)
• Expansion of scope/purpose
• Multiple encoding models
(HTML/XHTML, XML, RDF)
• Addition of Application Profile concept
A possible record
• Title: New Web language promises smarter surfing
• Subject: World Wide Web
• Subject: Extensible Markup Language
• Subject: World Wide Web Consortium
• Subject: Standards, Web
• Creator: Heid, Jim
• Creator: Glenn McDonald
• Created: 01/07/1998
• Identifier: http://www.cnn.com/TECH/computing.......
• Publisher: Cable News Network
• Language: en
• Description: This article discusses the recent adoption of XML by the
W3C as a standard and its possible uses in a web
environment
• Format: text/html
• Rights: All Rights Reserved
Dublin Core Abstract Model
HTML Encoding of DC
Example
• Title: Weaving the Web: The Original Design
and Ultimate Destiny of the World Wide Web
• Author: Tim Berners-Lee
• Subject: World Wide Web
• Publisher: Collins
• Date: 2000
• Language: English
• ISBN-13: 978-0062515872
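One way to picture the HTML encoding of the example above is a short Python sketch that prints `<meta>` tags in the `DC.element` naming style commonly used for Dublin Core in HTML. The function name `dc_to_html_meta` is hypothetical, and mapping "Author" to `creator` and "ISBN-13" to `identifier` is an assumption made for illustration.

```python
from html import escape

# Values are taken from the example above; the element names are assumptions.
record = [
    ("title", "Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web"),
    ("creator", "Tim Berners-Lee"),        # "Author" mapped to DC creator
    ("subject", "World Wide Web"),
    ("publisher", "Collins"),
    ("date", "2000"),
    ("language", "English"),
    ("identifier", "ISBN-13: 978-0062515872"),  # ISBN mapped to DC identifier
]


def dc_to_html_meta(pairs):
    """Render (element, value) pairs as HTML <link>/<meta> lines."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in pairs:
        content = escape(value, quote=True)  # protect quotes and ampersands
        lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)


print(dc_to_html_meta(record))
```

The output would be pasted into the `<head>` of an HTML page describing the book.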
Work time
• Complete pages 1-4 of the worksheet
– What is Dublin Core
– Creating a Dublin Core record
Issues in cataloging
• focus on 'by-value' cataloging instead of
by-reference means that consistency is
poor
• focus on text identifiers (title, author)
over unique IDs means record
duplication is rampant
• focus on traditional descriptive
measures limits effectiveness in new
discovery systems that do not respect
complex metadata
New concepts in cataloging
• RDA: Resource description and
access
• FRBR: Functional requirements for
bibliographic records
• FRAD: Functional requirements for
authority data
• FRSAD: Functional requirements for
subject authority data
Resource Description and
Access
• RDA is an update to the AACR2
• RDA uses a new data model (FRBR)
• RDA includes new MARC fields
• http://www.loc.gov/marc/formatchangesRDA.html
• RDA is not yet implemented
Addresses user tasks
FRBR:
• Find
• Identify
• Select
• Obtain
FRAD:
• Find
• Identify
• Contextualize
• Justify
• ICP’s highest principle = “convenience of
the user”
Slide from http://www.loc.gov/aba/rda/training_modules.html
FRBR’s Entity-Relationship
Model
• Entities
• Relationships
• Attributes (data elements)
• National level required elements
Diagram: One Entity – relationship – Another Entity
Slide from http://www.loc.gov/aba/rda/training_modules.html
FRBR’s Entity-Relationship Model
• Person (Shakespeare) created Work (Hamlet)
• Work (Hamlet) was created by Person (Shakespeare)
Slide from http://www.loc.gov/aba/rda/training_modules.html
Terminology
• FRBR and FRAD “attributes” are
“elements” in RDA = identifying
characteristics
• FRBR and FRAD Group 1 entities:
– Work
– Expression
– Manifestation
– Item
Slide from http://www.loc.gov/aba/rda/training_modules.html
FRBR
• Functional requirements for
bibliographic records
– group 1 - entities - work, expression,
manifestation, item
– group 2 - person or corporate bodies
responsible for a work (FRAD)
– group 3 - subjects - concepts, events,
places. . . (FRSAD)
FRBR Model
http://fictionfinder.oclc.org/
http://worldcat.org
http://www.frbr.org
http://www.ifla.org/
FRBR components
• Work
– distinct intellectual or artistic creation
• Expression
– intellectual or artistic realization of a work
• Manifestation
– physical embodiment of an expression of a
work
• Item
– a single exemplar of a manifestation
http://frbr.oclc.org/pages/Pages?sn=460059802&instname=
Adapted from Jane Greenberg
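The four Group 1 entities nest inside one another, which a small Python sketch can make concrete. The class layout and the `all_items` helper are hypothetical, and every expression, manifestation, and item below is invented for illustration; only the work (Hamlet, created by Shakespeare) comes from the earlier slide.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Item:
    label: str  # a single exemplar of a manifestation


@dataclass
class Manifestation:
    label: str  # physical embodiment of an expression
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    label: str  # realization of a work
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    title: str  # distinct intellectual or artistic creation
    creator: str
    expressions: List[Expression] = field(default_factory=list)


def all_items(work):
    """Walk Work -> Expression -> Manifestation -> Item and collect the items."""
    return [i for e in work.expressions for m in e.manifestations for i in m.items]


hamlet = Work("Hamlet", "Shakespeare", [
    Expression("English text (hypothetical)", [
        Manifestation("Paperback edition (hypothetical)",
                      [Item("Library copy 1"), Item("Library copy 2")]),
    ]),
    Expression("French translation (hypothetical)", [
        Manifestation("E-book edition (hypothetical)", [Item("Downloaded file")]),
    ]),
])

print([i.label for i in all_items(hamlet)])
```

A catalog built this way can answer "which copies of this work do we hold?" with one walk down the tree, regardless of edition or translation.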
FRBR Example
• Rolling Stones’ IT'S ONLY ROCK-N-ROLL
(1974) (Work)
– Group’s performance recorded for the
album (Expression)
• Recording released in 1974 by MCA
Records on tape cassette (Manifestation)
• Recording released in 1974 by MCA
Records on compact disc (Manifestation)
• Sheet music released in 1992 (?)
Adapted from Jane Greenberg
FRBR diagram
• Work: the Performance (1974)
• Expressions (E): Music and lyrics; Music (just the instruments)
• Manifestations (M): RS, LP 1974; 8-track, RCA, 1975; CD, RCA, 2005
• Items (I): Your CD, RCA, 2005 c.1; My CD, RCA, 2005 c.2; UNC Mus. Lib. CD, RCA, 2005 c.3
Adapted from Jane Greenberg
FRBR Algorithm (1)
• Process
– Extract Author
• Construct Authority author entry from 100, 400 using
subfields and 008 data to limit
– Extract Title
• Construct Authority title entry from 130, 240, 245, etc.
Normalize using NACO
– Combine these two authorities to create a unique
Work identifier
• <author>Mitchell, Margaret</author><title>Gone with the
wind</title>
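The combining step can be sketched in Python as follows. This is a rough illustration, not the OCLC algorithm: `normalize` is a simplified stand-in for NACO normalization (it lowercases, strips diacritics and punctuation, and collapses spaces, so unlike the key shown above it also drops the comma in the author heading), and both function names are hypothetical.

```python
import re
import unicodedata


def normalize(text):
    """Simplified stand-in for NACO normalization; not the real rule set."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())  # punctuation becomes spaces
    return " ".join(text.split())                 # collapse runs of whitespace


def work_key(author, title):
    """Combine the normalized author and title into one work identifier."""
    return f"<author>{normalize(author)}</author><title>{normalize(title)}</title>"


# Two differently punctuated records should collapse to the same work key.
a = work_key("Mitchell, Margaret", "Gone with the wind")
b = work_key("MITCHELL, Margaret.", "Gone with the Wind /")
print(a)
print(a == b)
```

Records that share a key are treated as belonging to the same work set.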
FRBR Algorithm (2)
• Results from a sample extraction (From
FRBR doc)
• <author>/<title> (75.97%)
• <uniform title> (1.34%)
• /<title>/[one or more <name>] (17.35%)
• /<title>/<control number> (5.34%)
• http://www.oclc.org/research/software/frbr/frbr_workset_algorithm.pdf
Worktime
• Complete pages 5 & 6
– Mapping DC to MARC
Metadata tools
Tool Type: Uses
• Conversion / Crosswalk: Migrate data from one form to another
• Creation: Automatic or semi-automatic creation of metadata
• Extraction / Harvesting: Pull metadata from digital objects or systems for use/re-use
• Evaluation: Validate schema or encoding of metadata records
• Searching: Facilitate discovery and use of metadata
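A conversion / crosswalk tool is, at its core, a lookup table from one schema's elements to another's. The Python sketch below shows the idea for Dublin Core to MARC. The `DC_TO_MARC` table is an illustrative assumption covering only a few elements and should be checked against a published crosswalk before use; `crosswalk` is a hypothetical function name.

```python
# Illustrative mapping only; verify against a published DC-to-MARC crosswalk.
DC_TO_MARC = {
    "title": ("245", "a"),
    "creator": ("720", "a"),
    "subject": ("653", "a"),
    "description": ("520", "a"),
    "publisher": ("260", "b"),
    "date": ("260", "c"),
}


def crosswalk(dc_record):
    """Split a DC record into MARC-like (tag, subfield, value) triples and leftovers."""
    mapped, unmapped = [], []
    for element, value in dc_record:
        target = DC_TO_MARC.get(element.lower())
        if target is None:
            unmapped.append((element, value))  # no rule: report instead of dropping
        else:
            mapped.append((target[0], target[1], value))
    return mapped, unmapped


mapped, unmapped = crosswalk([
    ("Title", "Weaving the Web"),
    ("Publisher", "Collins"),
    ("Rights", "All Rights Reserved"),
])
print(mapped)
print(unmapped)
```

Returning the unmapped elements separately makes data loss during migration visible, which is the usual weak point of a crosswalk.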
Evaluation
• Metadata evaluation methods
• Greenberg Review (2002)
– Tozer (1999)
• Accuracy, completeness, consistency,
timeliness, and intelligibility
– Rothenberg (1996)
• Correctness, appropriateness
– Zeng (1993)
• Specificity, exhaustivity, record completeness
Evaluating Representation
• Completeness, specificity, exhaustivity
• Did the record capture essential elements of
the object?
• Does the encoded record differentiate
appropriately between elements?
• Document/Index surrogation, retrieval
• Is this a surrogate/abstraction and not a
codification of the resource?
• Is the level of surrogation/abstraction
appropriate for storage/retrieval/use goals?
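Completeness is the easiest of these criteria to measure automatically. A minimal Python sketch, assuming a hypothetical application profile that names five required elements (the `REQUIRED` list and the `completeness` function are inventions for illustration):

```python
# Hypothetical application profile: the elements a record must carry.
REQUIRED = ["title", "creator", "subject", "date", "identifier"]


def completeness(record, required=REQUIRED):
    """Return (share of required elements present, list of missing elements)."""
    present = [e for e in required if str(record.get(e, "")).strip()]
    missing = [e for e in required if e not in present]
    return len(present) / len(required), missing


score, missing = completeness({
    "title": "Weaving the Web",
    "creator": "Tim Berners-Lee",
    "date": "2000",
    "subject": " ",  # whitespace-only values count as missing
})
print(score, missing)
```

A score like this says nothing about accuracy or appropriateness; it only flags records that need a cataloger's attention.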
Evaluating Representation
• Accuracy, consistency
• Are the details of abstraction correct?
• Is the content represented/encoded accurately?
• Utility, effectiveness, timeliness
• Is the representation appropriate for a given
audience and use?
• Does the representation solve an information
need?
Worktime
• Complete pages 7-10 – Metadata tools
and evaluation
Next Week
• Online
– Read, complete worksheet, discuss
• Encoding systems
– XML overview
– More on MARC encoding
• Assignment 1 questions
http://bit.ly/lbsc670_questions