Lecture9 (expanded lecture8)
Multimedia Information Retrieval
• Unlike alphanumeric data, multimedia data do not have
any semantic structure
• Achieving symmetry between annotation and query is
difficult
• Retrieval is based on similarity between query and stored
information instead of exact match
• Stored information is represented using indexing
IR Model
• Information is preprocessed to extract features and
semantic contents
• Items are indexed based on these features and semantics
• User’s query is processed and main features are extracted
• Query’s features are then compared with features or index
of each information item in the database
• Information items whose features are similar to those of the
query are retrieved and presented to the user
Design Issues
• Indexing
– a mechanism that reduces the search space of an
operator without losing any relevant information
• Similarity Computation
– easy to compute and should conform to human
judgement
Performance Measures
• Retrieval speed, recall, precision
• Recall measures the ability to retrieve relevant
information items from the database
– defined as the ratio between the number of retrieved relevant items
and the total number of relevant items in the database
• Precision measures retrieval accuracy
– defined as the ratio between the number of retrieved relevant items
and the number of total retrieved items
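• Recall and precision are usually considered together
because there is a trade-off between them
– retrieving more items tends to give high recall and low precision
– retrieving fewer items tends to give high precision and low recall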
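The two ratios defined above can be computed directly from sets of item IDs. The following is a minimal sketch; the function name and the document IDs are made up for illustration.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one query.

    retrieved: item IDs returned by the system
    relevant:  item IDs in the database that are relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # retrieved items that are also relevant
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical query: 4 relevant items exist, the system returns 5 items.
recall, precision = recall_precision(
    ["d1", "d2", "d3", "d7", "d9"], ["d1", "d2", "d4", "d7"]
)
print(recall, precision)
```

Here three of the four relevant items are found (recall 3/4) and three of the five returned items are relevant (precision 3/5).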
Text Retrieval
• Text may be used to annotate other media such as audio,
images and video, so that conventional IR techniques can be
used to retrieve multimedia information
• Boolean IR systems or text-pattern search systems
• Substantial effort is spent in analyzing the contents of the
documents and in generating keywords and indices
• Boolean queries are keywords connected with logical
operators (AND, OR, NOT)
File Structures
• Flat files
• Inverted files
– for each term a separate index is constructed that stores the
document identifiers for all documents containing the term
– each term and the document IDs containing the term are organized
into one row
– searching and retrieval are fast because only rows containing the
query terms need to be retrieved and there is no need to search the
whole database
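As a rough illustration of an inverted file and of Boolean queries over it, the sketch below keeps one row (a set of document IDs) per term. The tokenization by whitespace, the function names, and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict


def build_inverted_file(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents that contain every one of the terms."""
    rows = [index.get(t, set()) for t in terms]
    return set.intersection(*rows) if rows else set()


def boolean_or(index, *terms):
    """Documents that contain at least one of the terms."""
    rows = [index.get(t, set()) for t in terms]
    return set.union(*rows) if rows else set()


def boolean_not(index, all_ids, term):
    """Documents that do not contain the term."""
    return set(all_ids) - index.get(term, set())


docs = {
    "D1": "multimedia information retrieval",
    "D2": "image retrieval by color",
    "D3": "information theory",
}
index = build_inverted_file(docs)
print(boolean_and(index, "information", "retrieval"))
print(boolean_not(index, docs, "retrieval"))
```

Only the rows for the query terms are touched; the document texts themselves are never scanned at query time.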
Extensions
• Nearness parameters used in query specification help
define the topic more precisely and therefore increase
probable relevance of the retrieved item
• Within Sentence and Adjacency specification in queries
• Term location information is included in the inverted file
– Term i : document id, paragraph no., sentence no., word no.
• For example, if an inverted file has the following entries:
information: R99, 10, 8, 3; R155, 15, 3, 6; R166, 2, 3,1
retrieval: R77, 9, 7, 2; R99, 10, 8, 4; R166, 10, 2, 5
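With the entries above, within-sentence and adjacency queries can be answered from the location information alone. The sketch below copies the two rows from the example; the function names are hypothetical.

```python
# Rows from the example: term -> list of (document, paragraph, sentence, word)
INVERTED = {
    "information": [("R99", 10, 8, 3), ("R155", 15, 3, 6), ("R166", 2, 3, 1)],
    "retrieval": [("R77", 9, 7, 2), ("R99", 10, 8, 4), ("R166", 10, 2, 5)],
}


def within_sentence(index, a, b):
    """Documents in which terms a and b occur in the same sentence."""
    return {
        da
        for (da, pa, sa, _) in index[a]
        for (db, pb, sb, _) in index[b]
        if (da, pa, sa) == (db, pb, sb)
    }


def adjacent(index, a, b):
    """Documents in which term b is the word immediately after term a."""
    return {
        da
        for (da, pa, sa, wa) in index[a]
        for (db, pb, sb, wb) in index[b]
        if (da, pa, sa) == (db, pb, sb) and wb == wa + 1
    }


print(within_sentence(INVERTED, "information", "retrieval"))
print(adjacent(INVERTED, "information", "retrieval"))
```

Only R99 satisfies both conditions: the two terms share paragraph 10, sentence 8, at words 3 and 4. R166 contains both terms, but in different paragraphs.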
Indexing
• Stop words -- grammatical functional words, such as “of,”
“the,” and “a.”
• Stemming -- reducing words to a common root form
• Thesaurus -- list of synonyms
• Weighting -- term significance derived from occurrence
frequency within a document and among different
documents
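The indexing steps can be sketched end to end as follows. The stop-word list and the suffix-stripping stemmer are toy stand-ins (not a real stemming algorithm), and tf x idf is assumed here as one common way to combine within-document and across-document frequency; the slides do not name a specific weighting formula.

```python
import math
from collections import Counter

STOP_WORDS = {"of", "the", "a", "and", "in"}  # tiny illustrative list


def crude_stem(word):
    """Toy stemmer: strips a few suffixes. Not a real stemming algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def index_terms(text):
    """Lowercase, drop stop words, and stem what remains."""
    return [crude_stem(w) for w in text.lower().split() if w not in STOP_WORDS]


def tf_idf(docs):
    """Return {doc_id: {term: weight}} with weight = tf * log(N / df)."""
    terms = {d: Counter(index_terms(t)) for d, t in docs.items()}
    n = len(docs)
    # df[t] = number of documents in which term t occurs
    df = Counter(t for counts in terms.values() for t in counts)
    return {
        d: {t: tf * math.log(n / df[t]) for t, tf in counts.items()}
        for d, counts in terms.items()
    }


docs = {
    "D1": "the retrieval of images",
    "D2": "indexing images",
    "D3": "the colors of a histogram",
}
print(index_terms("the indexing of images"))
print(tf_idf(docs)["D1"])
```

A term that appears in fewer documents receives a larger weight than one that appears in many, and a term present in every document gets weight zero under this formula.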
Relevance Feedback
• Query modification
– terms occurring in documents previously identified as relevant are
added to the original query or the weight of such terms is increased
– terms occurring in documents previously identified as irrelevant
are deleted from the query or the weight of such terms is reduced
• Document modification
– terms in the query, but not in the user-judged relevant documents,
are added to the document index list with an initial weight
– weights of index terms in the query and also in relevant documents
are increased by a certain amount
– weights of index terms not in the query but in the relevant
documents are decreased by a certain amount
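Query modification can be sketched with a weighted query held as a dictionary. This covers only the query side of the feedback loop; the fixed adjustment `step` and the function name are assumptions, since the slides do not say by how much weights change.

```python
def modify_query(query, relevant_docs, irrelevant_docs, step=0.5):
    """Return a new {term: weight} query after one round of feedback.

    query:           {term: weight}
    relevant_docs:   term lists of documents the user judged relevant
    irrelevant_docs: term lists of documents the user judged irrelevant
    step:            hypothetical amount by which a weight is adjusted
    """
    new_query = dict(query)
    for doc in relevant_docs:
        for term in set(doc):
            # add the term, or increase its weight if already present
            new_query[term] = new_query.get(term, 0.0) + step
    for doc in irrelevant_docs:
        for term in set(doc):
            if term in new_query:
                new_query[term] -= step  # reduce the weight
    # delete terms whose weight has dropped to zero or below
    return {t: w for t, w in new_query.items() if w > 0}


query = {"jaguar": 1.0, "car": 1.0}
relevant = [["jaguar", "cat", "habitat"]]
irrelevant = [["car", "dealer"], ["car", "engine"]]
print(modify_query(query, relevant, irrelevant))
```

In this made-up example the term "car" is pushed out of the query by two irrelevant documents, while "cat" and "habitat" are added from the relevant one.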
Problems with Annotation
• Automatic generation of descriptive key words or
extracting semantic information to build classification
hierarchies for broad varieties of images
• Involving human operators makes the process time-consuming and subjective
– retrieval fails if the user forms a query based on key words not
employed by the operator
– retrieval fails if the query refers to elements of image content that
were not described
– certain visual properties, such as textures and shapes, are difficult or nearly
impossible to describe with text for general-purpose usage
Content-based IR
• Retrieve visual data using queries based on the visual
content of an image/video: patterns, colors, textures,
shapes, layout, and location information
– when it is necessary to verify that a trademark or logo
has not been used by another company
– comparing fabric patterns
• Search is driven by first establishing one or more sample
images and then identifying specific features of those
sample images which need to match images from the
database
Audio Search and Retrieval
• Keywords can be highly subjective because of a different
perspective or even a different taxonomy
• Audio is hard to browse directly since it must be auditioned in real time
(unlike video, which can be keyframed)
• Two categories: speech and non-speech
– with speech, indexing and retrieval are based on obtaining the spoken
words either manually or by speech recognition techniques
– with non-speech, indexing and retrieval may be based on text
annotation (but will that help with a query like “find the first
occurrence of the note G-sharp”?)
Image Database Issues
• Selection, derivation, and computation of image features
and objects that provide useful query expressiveness
• Retrieval methods based on similarity, as opposed to exact
matching
• User interface that supports the visual expression of
queries and allows query refinement and navigation of
results
• Indexing which is compatible with the expressiveness of
the queries
• A system architecture that supports this approach
Color Analysis
• Color distribution is represented as a histogram of intensity values, each
of whose bins corresponds to a range of pixel values
• Histograms are compared by an intersection operation: the sum, over all
bins, of the smaller of the two bin counts
• This sum may be interpreted as the number of pixels
which are common to both histograms
• This value may be normalized by the total number of pixels in one of
the two histograms
• Computationally expensive -- O(NM) where N is the number of
histogram bins and M is the total number of images in the database
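Histogram intersection and the O(NM) scan over the database can be sketched as below. Images are represented simply as flat lists of intensities, and the bin count, the choice to normalize by the query histogram, and the function names are assumptions made for the example.

```python
def histogram(pixels, bins=4, max_value=256):
    """Count pixels per bin; each bin covers an equal range of values."""
    width = max_value // bins
    counts = [0] * bins
    for p in pixels:
        counts[min(p // width, bins - 1)] += 1
    return counts


def intersection(h_query, h_image):
    """Sum of bin-wise minima, normalized by the query's pixel count."""
    common = sum(min(q, i) for q, i in zip(h_query, h_image))
    return common / sum(h_query)


def rank(query_hist, database):
    """Compare the query with every stored histogram: N bins x M images."""
    scores = {name: intersection(query_hist, h) for name, h in database.items()}
    return sorted(scores, key=scores.get, reverse=True)


query = histogram([0, 10, 70, 130, 200, 255])
database = {"a": [2, 1, 1, 2], "b": [6, 0, 0, 0]}
print(query, rank(query, database))
```

An image with an identical histogram scores 1.0; the score falls as fewer pixels are shared between corresponding bins.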
Color Analysis (contd.)
• Reduce search time by reducing the number of histogram bins
– transform RGB representation (coarse segmentation of color
space)
– apply clustering technique to determine K best colors in a given
color space (clustering process takes into account the color
distribution of images over the entire database)
– a small number of histogram bins tends to capture the majority of
pixels of an image; only the largest bins in terms of pixel counts need
be selected as the representation of any histogram. As long as the bins
of the query and image histograms are appropriately matched,
intersection may be computed over this reduced set.
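The last idea, intersecting only over the largest bins, might look like the sketch below. Selecting the bins from the query histogram and normalizing by their total are assumptions; the slides leave the matching rule open.

```python
def largest_bins(hist, k):
    """Keep the k most populated bins as {bin_index: count}."""
    top = sorted(range(len(hist)), key=lambda b: hist[b], reverse=True)[:k]
    return {b: hist[b] for b in top}


def reduced_intersection(query_hist, image_hist, k):
    """Intersect only over the query's k largest bins."""
    q = largest_bins(query_hist, k)
    common = sum(min(count, image_hist[b]) for b, count in q.items())
    return common / sum(q.values())


query_hist = [50, 3, 40, 2, 5]
image_hist = [45, 10, 20, 0, 25]
print(largest_bins(query_hist, 2))
print(reduced_intersection(query_hist, image_hist, 2))
```

With k = 2 the comparison touches two bins instead of five, while those two bins still hold 90 of the query's 100 pixels.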
Color Analysis (contd.)
• Disadvantages:
– histogram-based similarity computation lacks
information about location (this problem may be solved
by dividing an image into sub-areas and calculating a
histogram for each of those sub-areas)
– the image representation used in the image database and
in queries has to be the same
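The sub-area remedy can be illustrated with two tiny images that have the same overall histogram but a mirrored layout. The block layout, the averaging of per-block scores, and the function names are assumptions for the sketch.

```python
def block_histograms(image, rows, cols, bins=4, max_value=256):
    """Split a 2-D list of intensities into rows x cols blocks and
    return one histogram per block, in row-major order."""
    h, w = len(image), len(image[0])
    width = max_value // bins
    hists = []
    for r in range(rows):
        for c in range(cols):
            counts = [0] * bins
            for y in range(r * h // rows, (r + 1) * h // rows):
                for x in range(c * w // cols, (c + 1) * w // cols):
                    counts[min(image[y][x] // width, bins - 1)] += 1
            hists.append(counts)
    return hists


def block_similarity(hists_a, hists_b):
    """Average of the normalized intersections of corresponding blocks."""
    scores = [
        sum(min(a, b) for a, b in zip(ha, hb)) / sum(ha)
        for ha, hb in zip(hists_a, hists_b)
    ]
    return sum(scores) / len(scores)


dark_left = [[0, 255], [0, 255]]
dark_right = [[255, 0], [255, 0]]
whole = block_similarity(
    block_histograms(dark_left, 1, 1), block_histograms(dark_right, 1, 1)
)
halves = block_similarity(
    block_histograms(dark_left, 1, 2), block_histograms(dark_right, 1, 2)
)
print(whole, halves)
```

A single global histogram cannot tell the two images apart, whereas the per-block comparison sees that the dark and bright halves are swapped.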
Texture Analysis
• Statistical methods are used to characterize texture in terms
of the spatial distribution of image intensity
• Tamura features:
– contrast : quantification is based on the statistical
distribution of pixel intensities
– coarseness : measure of the granularity of the texture
– directionality : to compute this measure, a gradient
vector is calculated at each pixel
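As one concrete case, the contrast feature can be sketched from intensity statistics. The slides give no formula; the version below assumes the commonly cited form, standard deviation divided by the fourth root of the kurtosis, and should be read as an illustration rather than the definition used in the lecture.

```python
def tamura_contrast(pixels):
    """Contrast = sigma / kurtosis**(1/4), computed over a flat list
    of intensities (assumed formulation, see the note above)."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((p - mean) ** 2 for p in pixels) / n
    if variance == 0:
        return 0.0  # a perfectly flat image has no contrast
    fourth_moment = sum((p - mean) ** 4 for p in pixels) / n
    kurtosis = fourth_moment / variance ** 2
    return variance ** 0.5 / kurtosis ** 0.25


print(tamura_contrast([100, 100, 100, 100]))  # flat image
print(tamura_contrast([120, 125, 130, 135]))  # narrow intensity spread
print(tamura_contrast([0, 0, 255, 255]))      # black and white only
```

The value grows as the intensity distribution spreads out and becomes more polarized between dark and bright pixels.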
Shape Analysis
• Histogram of significant edges
• Ordered list of interest points
• Chain-code-based shape representation and similarity
measure
Chain Code-based Shape Analysis
• Chain code
• 4-directional
• 8-directional
• Grid spacing
• Normalization process -- starting point, rotation, scale
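A 4-directional chain code can be generated by walking the boundary and recording the direction of each unit step. The direction numbering below (0 = right, 1 = up, 2 = left, 3 = down, increasing counterclockwise) is an assumed convention, and the function name is hypothetical.

```python
# Assumed convention: 0 = right, 1 = up, 2 = left, 3 = down.
STEP_TO_CODE = {(1, 0): 0, (0, 1): 1, (-1, 0): 2, (0, -1): 3}


def chain_code(points):
    """points: a closed boundary as grid vertices (x, y), with consecutive
    vertices exactly one grid step apart; the last connects to the first."""
    code = []
    for i, (x, y) in enumerate(points):
        nx, ny = points[(i + 1) % len(points)]
        code.append(str(STEP_TO_CODE[(nx - x, ny - y)]))
    return "".join(code)


# Unit square walked counterclockwise from the origin.
print(chain_code([(0, 0), (1, 0), (1, 1), (0, 1)]))
```

An 8-directional code would add the four diagonal steps to the lookup table; a coarser grid spacing yields a shorter code for the same shape.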
Starting Point Normalization
• Treat the chain code generated by an arbitrary starting
point as a circular sequence of direction numbers
• Redefine the starting point such that the resulting sequence
of numbers forms an integer of minimum magnitude
• 0303332221211010 (arbitrary starting point)
• 0030333222121101 (after normalizing)
• After normalizing, the shape boundary has a unique chain
code (for a fixed orientation and grid size)
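Starting point normalization amounts to choosing the smallest rotation of the circular sequence. A minimal sketch, with chain codes held as digit strings (the function name is hypothetical):

```python
def normalize_start(chain):
    """Return the rotation of a circular chain code that forms the
    integer of minimum magnitude."""
    rotations = (chain[i:] + chain[:i] for i in range(len(chain)))
    # All rotations have the same length, so comparing the digit strings
    # lexicographically is the same as comparing the integers.
    return min(rotations)


print(normalize_start("0303332221211010"))
```

Applied to the arbitrary-start code from the example, this returns the normalized code listed there, 0030333222121101.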
Shape Number
• Rotation normalization is needed because a boundary after
rotation has a different chain code. Rotation changes the
spatial relationships between the grid space and boundary.
• First difference of the chain code reflects spatial
relationships between boundary segments which are
independent of rotation
• The difference is computed by counting (in a counterclockwise
direction) the number of directions that separate two
adjacent elements in a code
• Shape number of a boundary is defined as the first
difference of the smallest magnitude
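The first difference and the shape number can be sketched for a 4-directional code as follows. The code is treated as circular, so the first element is compared with the last; the sample code 003221 and the function names are assumptions chosen for illustration.

```python
def first_difference(chain, directions=4):
    """Circular first difference: the number of counterclockwise direction
    steps from each element's predecessor to the element itself."""
    digits = [int(c) for c in chain]
    # digits[i - 1] with i == 0 wraps around to the last element
    return "".join(
        str((digits[i] - digits[i - 1]) % directions) for i in range(len(digits))
    )


def shape_number(chain, directions=4):
    """The rotation of the first difference with the smallest magnitude."""
    diff = first_difference(chain, directions)
    return min(diff[i:] + diff[:i] for i in range(len(diff)))


code = "003221"
rotated = "110332"  # same boundary turned by 90 degrees: each digit + 1 mod 4
print(first_difference(code), first_difference(rotated))
print(shape_number(code))
```

Rotating the boundary by a multiple of 90 degrees adds a constant to every digit, which cancels in the difference, so both codes give the same first difference and therefore the same shape number.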
Unique Shape Number
• Need for making the shape boundaries invariant to rotation
and scale
• Solution -- orient the resampling grid along the principal
axis of the shape boundary. In this case, the grid and the
boundary have fixed spatial relationships.
• Major axis is defined as the line segment between two
farthest points on the boundary. Minor axis is
perpendicular to the major axis and its length is such that a
rectangle formed by these axes will enclose the shape
boundary.
Scale Normalization
• Eccentricity of the boundary -- ratio of the major axis to the
minor axis
• Basic Rectangle -- rectangle formed by the major and the
minor axes of a boundary
• Shape number obtained using basic rectangle will be
unique
Unique Chain Code
• Algorithm
– select the first digit as any number within the chain
code direction range, say 0;
– the second digit differs from the first digit by an amount
determined by the first digit of the shape number
– use the shape number to determine the rest of the digits
in the unique chain code
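One reading of this algorithm is a running sum of the shape number's digits, taken modulo the number of directions. The sketch below assumes a 4-directional code and that the last digit of the shape number only closes the boundary back to the first digit; the function name is hypothetical.

```python
def unique_chain_code(shape_no, first_digit=0, directions=4):
    """Rebuild a chain code from a shape number (a circular first difference).

    Each new digit is the previous digit plus the corresponding difference.
    The final difference is not used: it leads back to the first digit.
    """
    code = [first_digit]
    for d in shape_no[:-1]:
        code.append((code[-1] + int(d)) % directions)
    return "".join(str(c) for c in code)


print(unique_chain_code("033033"))
```

Because the first digit is fixed by convention, two boundaries with the same shape number always produce the same chain code.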
Similarity Measurement
• The distance d between two boundaries is defined as the
number of grid cells not commonly covered by the two
boundaries
– boundaries with the same unique chain code have
distance 0
• Obtain a binary number for each boundary
• Take the exclusive OR of the binary numbers of the two boundaries;
the number of 1s in the result is the distance d
• Similarity is 1 - (d/N)
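The XOR-based distance can be sketched by packing each boundary's grid coverage into an integer, one bit per cell. The row-major packing and the reading of N as the total number of grid cells are assumptions; the slides do not define N.

```python
def to_binary(grid):
    """Pack a grid of 0/1 cells (row-major) into one integer."""
    bits = 0
    for row in grid:
        for cell in row:
            bits = (bits << 1) | cell
    return bits


def distance(a, b):
    """Number of cells covered by exactly one of the two boundaries."""
    return bin(a ^ b).count("1")


def similarity(grid_a, grid_b):
    n = len(grid_a) * len(grid_a[0])  # assumption: N = total grid cells
    return 1 - distance(to_binary(grid_a), to_binary(grid_b)) / n


shape_a = [[1, 1, 0], [1, 1, 0]]
shape_b = [[1, 1, 0], [1, 0, 0]]
print(distance(to_binary(shape_a), to_binary(shape_b)))
print(similarity(shape_a, shape_b))
```

The two shapes differ in a single cell, so d = 1 and the similarity is 1 - 1/6.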
Indexing and Retrieval of Video
• Video is normally made of a number of logical units or
segments (video shots)
– frames depicting the same scene
– frames signifying a single camera operation
– frames containing a distinct event or action (signifying
the presence of the same object)
• Consecutive frames on either side of a camera break
generally display a significant quantitative change in the
content (other camera operations such as dissolve, wipe,
fade-in, and fade-out require sophisticated measures to
quantify the change)
Shot Detection
• Difference metrics between frames are based on the
comparison of pixel intensity histograms
• Difference thresholds are chosen such that all boundaries
are detected and false detections are minimized
• Dealing with gradual changes requires sophisticated
techniques
• Indexing is done by finding a representative frame for each shot;
features of this frame are extracted and indexed based on
text, color, shape, and/or texture
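Detection of abrupt camera breaks by histogram comparison can be sketched as below. Frames are flat lists of intensities, the difference metric is a sum of absolute bin differences, and the threshold value is arbitrary; all of these are assumptions. Gradual transitions such as dissolves are not handled.

```python
def histogram(frame, bins=4, max_value=256):
    """Intensity histogram of one frame (a flat list of pixel values)."""
    width = max_value // bins
    counts = [0] * bins
    for p in frame:
        counts[min(p // width, bins - 1)] += 1
    return counts


def histogram_difference(h1, h2):
    """Sum of absolute bin-by-bin differences."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def detect_cuts(frames, threshold):
    """Indices i at which a camera break is declared between
    frame i - 1 and frame i."""
    hists = [histogram(f) for f in frames]
    return [
        i
        for i in range(1, len(hists))
        if histogram_difference(hists[i - 1], hists[i]) > threshold
    ]


dark = [10, 20, 30, 40]
dark_shifted = [12, 22, 28, 50]
bright = [200, 210, 220, 250]
print(detect_cuts([dark, dark_shifted, bright, bright], threshold=4))
```

Small changes within a shot leave the histogram almost unchanged, while the jump from the dark frames to the bright ones exceeds the threshold and is reported as a shot boundary.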