Digital Libraries and Multimedia retrieval techniques

Digital Libraries and Multimedia
Searching
MIT 026B
Winter 2002
Today’s Information Environment
• Library catalogues
• Periodical databases
• Internet resources
HTML
“Hypertext Markup Language”
The language used for mounting documents
on the World Wide Web, so that they can be
formatted and presented in today’s
browsers.
<HTML>
<HEAD><TITLE> Memo to MIT 026 </TITLE>
</HEAD>
<BODY>
To: MIT 026<BR>
From: D.G. Campbell <BR>
Date: March 25, 2002
<p>
It’s a pleasure working with you this term.
</BODY>
</HTML>
Features of Hypertext Markup
Language
• Descriptive rather than procedural markup
• Based on format rather than content
• Creates documents that are designed for
human beings to read, not for machines to
manipulate in any meaningful way
What if…
• Your computer could understand the
semantic meaning of documents?
XML
“Extensible Markup Language”
A meta-language that can be used to create
specialized markup for particular purposes.
What is a “memo”?
Memo
  Header
    To
    From
    Date
  Body
<XML DTD="Memo">
<HEADER>
<TO>MIT 026</TO>
<FROM>D.G. Campbell</FROM>
<DATE>March 25, 2002</DATE>
</HEADER>
<BODY>
It’s a pleasure working with you this term.
</BODY>
</XML>
What if…
• Your computer could understand data and
use that data to produce a document?
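A minimal sketch of this idea, assuming the memo markup above is rewritten as well-formed XML. The MEMO root element and the render_memo helper are illustrative assumptions, not part of the original slide. Python's standard xml.etree.ElementTree module reads the tagged fields, and the program assembles a display from them:

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the memo from the slide; the MEMO root
# element is a hypothetical name chosen for this sketch.
MEMO_XML = """
<MEMO>
  <HEADER>
    <TO>MIT 026</TO>
    <FROM>D.G. Campbell</FROM>
    <DATE>March 25, 2002</DATE>
  </HEADER>
  <BODY>It's a pleasure working with you this term.</BODY>
</MEMO>
"""


def render_memo(xml_text):
    """Build a plain-text memo from semantically tagged XML (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    header = root.find("HEADER")
    lines = [
        "To: " + header.findtext("TO").strip(),
        "From: " + header.findtext("FROM").strip(),
        "Date: " + header.findtext("DATE").strip(),
        "",
        root.findtext("BODY").strip(),
    ]
    return "\n".join(lines)


print(render_memo(MEMO_XML))
```

Because each field is identified by what it is rather than by how it looks, the same data could just as easily be rendered as HTML or loaded into a database.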
“The Semantic Web”
• A vision of Tim Berners-Lee
• A transformation of the current World Wide Web
in which:
• Information is semantically identified so that it
can be retrieved more efficiently
• Data is semantically identified so that it can be
assembled by the computer into meaningful
displays
Web Portal
• A website that gathers together a wide range
of content and services
– E-mail
– List-servs
– Search engines
– Online shopping services
Web Portals of the Future?
• Hospital Information Systems
• E-Commerce Systems
• Information Systems
Digital Libraries
• A collection of digital resources that have
been created and/or gathered by a particular
administrative body
• The resources may have been in another
format, or may have originated in digital
form
• Sophisticated “library-style” searching is
possible
Multimedia Information
Retrieval
Sound
Video
Images
Metadata
• “Data about data”
• Machine-understandable information about
electronic resources
“The Dublin Core”
A set of simple bits of metadata that can easily
be added to the headers of a Web document.
This “core” can be added to by any
community that has extra elements they
want to add.
There are metadata sets for:
• Museums (CIMI)
• Archives (Encoded Archival Description)
• Geospatial Information (FGDC)
• Government Information (GILS)
• Art Works (CDWA)
• Literature (TEI)
The Dublin Core
• Title
• Creator
• Subject
• Description
• Publisher
• Contributor
• Date
• Type
• Format
• Identifier
• Source
• Language
• Relation
• Coverage
• Rights
What does metadata look like?
• Attributes with values
<METATAG Name="Creator" Value="Campbell, G.">
Example of Dublin Core Metadata
<META NAME="DC.Title" LANG="en" CONTENT="Thinking critically about
discipline-based World Wide Web resources">
<META NAME="DC.Creator" LANG="en" CONTENT="Grassian, Esther G.">
<META NAME="DC.Subject" LANG="en" CONTENT="World Wide Web">
<META NAME="DC.Description" LANG="en" CONTENT="A website offering
guidelines on selecting reliable World Wide Web resources.">
<META NAME="DC.Publisher" LANG="en" CONTENT="UCLA College Library">
<META NAME="DC.Date.Available" LANG="en" CONTENT="September 6, 2000">
<META NAME="DC.Type" LANG="en" CONTENT="Web resource guide">
<META NAME="DC.Format" LANG="en" CONTENT="HTML">
<META NAME="DC.Identifier" SCHEME="URI" LANG="en"
CONTENT="http://www.library.ucla/libraries/college/help/critical/discipline.htm">
<META NAME="DC.Language" LANG="en" CONTENT="English">
<META NAME="DC.Relation.IsReferencedBy" LANG="en"
CONTENT="http://www.library.ucla/libraries/college/help/critical/index.htm">
<META NAME="DC.Coverage" LANG="en" CONTENT="Concerns evaluation
specifically of discipline-based resources.">
<META NAME="DC.Rights" LANG="en" CONTENT="copyright, Regents of
California">
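Because these tags are attribute-value pairs, a program can read them without understanding the rest of the page. The following is a rough sketch using Python's standard html.parser module; the DublinCoreParser class name is an assumption made for illustration, and the sample input reuses three of the tags shown above:

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect META tags whose NAME starts with 'DC.' (hypothetical helper class)."""

    def __init__(self):
        super().__init__()
        self.records = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and attribute names for us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.startswith("DC."):
            element = name[len("DC."):]
            self.records[element] = attributes.get("content") or ""


SAMPLE = '''
<META NAME="DC.Creator" LANG="en" CONTENT="Grassian, Esther G.">
<META NAME="DC.Subject" LANG="en" CONTENT="World Wide Web">
<META NAME="DC.Publisher" LANG="en" CONTENT="UCLA College Library">
'''

parser = DublinCoreParser()
parser.feed(SAMPLE)
for element, value in parser.records.items():
    print(element + ": " + value)
```

This sketch keeps only one value per element; a fuller version would need to allow repeated elements, since a resource can have several creators or subjects.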
Where is metadata found?
• In a separate database, just like a library
catalogue
• Embedded in the headers of the documents
themselves
[Figure: a Web page with a metadata header. Dublin Core metadata: Creator, Title, URL, Subject. Community-based metadata: Scale, Map Type.]
What’s New about Metadata?
• It takes information retrieval out of the
library.
• A lot of it will be done by Web authors and
publishers.
• There’ll be a lot more variety.
Metadata Harvesting
• A system in which:
– Organizations place their metadata in
repositories
– These repositories make the metadata available
through a special interface
– Software agents (robots) connect to different
repositories, collect metadata records, and
combine them into a new system
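The three steps above can be sketched in a few lines. This is an in-memory simulation only: the Repository class, its list_records method, and the sample records are assumptions made for illustration, and a real harvester would contact repositories over a network interface rather than call them directly.

```python
class Repository:
    """A stand-in for an organization's metadata repository (hypothetical)."""

    def __init__(self, name, records):
        self.name = name
        self._records = records

    def list_records(self):
        # The "special interface": hand out copies of the metadata records.
        return [dict(record) for record in self._records]


def harvest(repositories):
    """Collect records from every repository into one combined index."""
    combined = []
    for repo in repositories:
        for record in repo.list_records():
            record["source_repository"] = repo.name  # remember where it came from
            combined.append(record)
    return combined


# Made-up sample records, for illustration only.
library = Repository("library", [
    {"title": "Housetraining dogs the easy way", "type": "text"},
])
museum = Repository("museum", [
    {"title": "Canine behaviour patterns", "type": "text"},
    {"title": "Portrait of a dog", "type": "image"},
])

index = harvest([library, museum])
for record in index:
    print(record["source_repository"], "-", record["title"])
```

Tagging each harvested record with its source lets the combined system point searchers back to the repository that holds the resource.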
The Semantic Web
[Diagram; labels not recoverable]
Metadata Harvesting
[Diagram; labels not recoverable]
New Developments in
Information Retrieval
• Metadata for Information Retrieval
– Metadata Harvesting
– Semantic Web
• Multimedia Information Retrieval
– Images
– Sound
How do we retrieve non-textual
information?
Query: (dog* OR canine) AND (train*)
Document 1: Housetraining dogs the easy way.
Document 2: How to train your dog to fetch your slippers to retrieve objects.
Document 3: Canine behaviour patterns and their effect on the training process.
Image
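To see why the image is a problem, it helps to sketch how the Boolean query with truncation works on text. The following is a simplified illustration, not a description of any particular search system: the function names are hypothetical, the query is hard-coded, the document texts are adapted from the slide, and the truncation symbol * is assumed to mean "any word beginning with this stem".

```python
import re


def tokens(text):
    """Lowercase a text and split it into words."""
    return re.findall(r"[a-z]+", text.lower())


def matches_term(term, words):
    """True if any word matches the term; a trailing * means 'starts with'."""
    if term.endswith("*"):
        stem = term[:-1]
        return any(word.startswith(stem) for word in words)
    return term in words


def matches_query(text):
    """Hard-coded form of: (dog* OR canine) AND (train*)."""
    words = tokens(text)
    animal = matches_term("dog*", words) or matches_term("canine", words)
    return animal and matches_term("train*", words)


# Sample texts adapted from the slide; the image has no text at all.
DOCUMENTS = {
    "Document 1": "Housetraining dogs the easy way.",
    "Document 2": "How to train your dog to fetch your slippers.",
    "Document 3": "Canine behaviour patterns and their effect on the training process.",
    "Image": "",
}

hits = [name for name, text in DOCUMENTS.items() if matches_query(text)]
print(hits)  # ['Document 2', 'Document 3']
```

Under this word-prefix reading of truncation, Document 1 is not retrieved, because "Housetraining" does not begin with "train". The image is never retrieved, since it contains no words for the query to match.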
Image Retrieval
• 20th Century:
– Photography, film, television
• 1965 onward:
– Digital imaging
• 1980s onward:
– Cost-effective digital imaging
Key Players in Image Retrieval
• Fields that are heavily image-dependent:
– Medicine, Architecture, Engineering
• Geographic Information Systems
• Art galleries and museums
• Photograph libraries
Example: William Blake Archive
Problems as Archives Grow:
• Browsing
• Searching
Levels of Detail:
• Primitive attributes
• Logical attributes
• Abstract attributes
Current Methods of Image
Retrieval
• Controlled Vocabularies
– Art and Architecture Thesaurus
– Library of Congress Thesaurus for Graphic
Materials
• Classification
– ICONCLASS
Content-Based Image Retrieval
• Colour
• Texture
• Shape
• Position
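Of the four features listed, colour is the simplest to sketch. The example below compares images by coarse colour histograms and ranks a small collection against a query image. Histogram intersection is one common comparison, chosen here as an assumption; the slides do not name a specific technique. The function names and the tiny made-up "images" (lists of RGB pixels) are also illustrative.

```python
def colour_histogram(pixels, bins_per_channel=2):
    """Count pixels per coarse RGB bin and normalize to proportions."""
    step = 256 // bins_per_channel
    counts = {}
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        counts[key] = counts.get(key, 0) + 1
    total = len(pixels)
    return {key: n / total for key, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: sum of the overlap in each colour bin."""
    return sum(min(h1.get(key, 0.0), h2.get(key, 0.0)) for key in set(h1) | set(h2))


# Tiny made-up "images": lists of (R, G, B) pixels.
sunset = [(250, 120, 30)] * 6 + [(40, 30, 90)] * 2
beach = [(240, 110, 20)] * 5 + [(30, 60, 200)] * 3
forest = [(20, 200, 40)] * 7 + [(90, 60, 30)] * 1

collection = {"beach": beach, "forest": forest}
query = colour_histogram(sunset)

# Rank the collection by similarity to the query image, best match first.
ranked = sorted(
    collection,
    key=lambda name: histogram_intersection(query, colour_histogram(collection[name])),
    reverse=True,
)
print(ranked)  # ['beach', 'forest']
```

Note that the comparison uses no words at all: the query is itself an image, and the match depends only on the proportions of colour it shares with each stored image.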
Moving Picture Experts Group
(MPEG)
• MPEG-7: “Multimedia Content Description
Interface”
10-Level Indexing
SYNTAX FEATURES
1. Type/Technique
2. Global Distribution
3. Local Structure
4. Global Composition
SEMANTIC FEATURES
5. Generic Object
6. Generic Scene
7. Specific Object
8. Specific Scene
9. Abstract Object
10. Abstract Scene
Syntax Features
Semantic Features
The hills are alive…
Music Retrieval
Traditional Methods of Music
Retrieval
• Standard Metadata Elements
– Composer, lyricist, opus number, date of
composition, etc.
Traditional Methods of Music
Retrieval: the “Incipit”
Beethoven, Ludwig van. Romance for violin and
orchestra. Ed. Zino Francescatti. Opus 50.
Retrieving Music on the Basis of
an Input Melody
• Retrieval is based on the variations in pitch
from one note to another, rather than on the
absolute pitch of the notes themselves
• Retrieval is enhanced by the directions of
the intervals, up or down
• Retrieval often depends on a clear
segmentation of the notes: “ta” or “da”
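The first two points can be illustrated with a short sketch. Each melody is reduced to the direction of its intervals: U for up, D for down, R for a repeated note. This encoding is often called the Parsons code; using it here is an assumption, since the slides do not name a specific encoding. Pitches are given as MIDI note numbers, and the function names are hypothetical.

```python
def contour(pitches):
    """Turn a pitch sequence into a string of U (up), D (down), R (repeat)."""
    steps = []
    for previous, current in zip(pitches, pitches[1:]):
        if current > previous:
            steps.append("U")
        elif current < previous:
            steps.append("D")
        else:
            steps.append("R")
    return "".join(steps)


def find_tunes(sung_pitches, tunes):
    """Return titles whose contour contains the contour of the sung fragment."""
    sung = contour(sung_pitches)
    return [title for title, pitches in tunes.items() if sung in contour(pitches)]


# Opening notes as MIDI note numbers (60 = middle C).
TUNES = {
    "Twinkle, Twinkle, Little Star": [60, 60, 67, 67, 69, 69, 67],
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
}

# The same opening sung in a different key: absolute pitches differ,
# but the pattern of ups and downs is unchanged.
sung = [65, 65, 72, 72, 74, 74, 72]
print(contour(sung))            # RURURD
print(find_tunes(sung, TUNES))  # ['Twinkle, Twinkle, Little Star']
```

Exact substring matching is the least forgiving approach; a system meant for inexperienced singers would more likely use an approximate comparison that tolerates a few wrong or missing notes.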
Problems
• Making the system “forgiving” to
inexperienced singers
• Variations in popular tunes
• Size and complexity of database records