Content Based Image Retrieval
Problems, issues, future directions
Problem ???
MEDIA FLOODING
EXAMPLE: GENERAL PHOTOGRAPHY

 SNAPSHOT PREVIEWS
 REUSABLE MEMORY
 PRINTER TECHNOLOGY
 EASY SHARING VIA INTERNET
 CHEAP & DENSE STORAGE
 EFFECTS & PROCESSING

RESULT: DIGITAL MEDIA FLOOD
HOW DO WE COPE WITH, TRACK, AND ORGANIZE IT ALL?
MOTIVATION

 DEVICE FUNCTION CONVERGENCE
 DATA RAPIDLY GENERATED BY MANY DEVICES
 INTERNET ACTS AS GLOBAL TRANSPORT
 DATA CONSUMED BY DEVICES ON DEMAND

MULTIMEDIA DATA NEEDS TO BE
 EFFICIENTLY STORED
 ACCURATELY INDEXED
 EASILY RETRIEVED
History of Image Retrieval

 Traditional text-based image search engines
 Manual annotation of images
 Use text-based retrieval methods

E.g. an image of water lilies might be annotated "water lilies", "flowers in a pond", or with its biological name.

Text-based image retrieval is offered by Google, by Yahoo!, etc.


IMPORTANT QUESTION ARISES:
"WHY NOT SIMPLY INDEX USING TEXT?"
(YAHOO! HAS HAD SOME SUCCESS WITH THIS)

INTUITIVE, YET USING TEXT IS
 SIMPLE BUT SIMPLISTIC
 TIME CONSUMING – CAN'T BE AUTOMATED
 HIGHLY SUBJECTIVE & USER-DEPENDENT
 SUSCEPTIBLE TO TRANSLATION PROBLEMS
Content based image retrieval

 What is content in images?
 Colour, texture, shape, etc.

CBIR
Definition:
"The process of retrieving images from a collection on the basis of features (such as colour, texture and shape) automatically extracted from the images themselves"

CBIR
SIMPLE EXAMPLE

 FOR A GIVEN QUERY…
 EXAMPLE IMAGE
 ROUGH SKETCH
 EXPLICIT DESCRIPTION
…THE RETRIEVAL SYSTEM RETURNS ALL 'SIMILAR' IMAGES MATCHING THE CRITERIA

[Figure: a query image and its retrieval results, based on color content]
CBIR
QUERY TYPES

 COLOR
 SKETCH
 SHAPE
 EXAMPLE
 TEXTURE

MORE COMPLEX TYPES EXIST, YET THE ABOVE ARE THE MOST FUNDAMENTAL
CBIR
(DIS)SIMILARITY?
SIMILARITY IS NOT SO SIMPLE

 CONSIDER THREE IMAGES
 ON WHAT BASIS ARE THEY SIMILAR?
 COLOR CONTENT?
However …

 Finding the right image is not always easy.
 There are many millions of digital images available on the Web.
 Many more images are waiting to be digitised.
 Variable levels of metadata and associated content – if any exist at all …
 Diverse groups of users.
 Users who don't always know either what they want or how to express it in words.
Why Research Image Data?

 Increasing use of digital images, particularly via the Internet, by professionals of all kinds.
 Inadequacy of current technology in handling images.
 Increasing interest in how people perceive, search for and use images.
 Exciting new applications opening up (e-commerce?).
Why Is Image Retrieval Difficult?

 It is important to distinguish between the physical properties of an image and how it is perceived.
 Image retrieval should be based on the latter, not the former!
Two Classes of CBIR
Narrow vs. Broad Domain

 Narrow
 Medical Imagery Retrieval
 Fingerprint Retrieval
 Satellite Imagery Retrieval

 Broad
 Photo Collections
 Internet
Block diagram of CBIR

[Figure: client/server architecture. Client side: a query interface supporting user drawing and query by color, color sensation, images, shape, or spatial relation. Server side: feature extraction (color, shape, color sensation, spatial relation), indexing & filtering, a similarity measure, and a learning mechanism that weights the features, all over an image database. Client and server communicate via the Internet, an intranet, or an extranet.]
Query specification

 Interfaces
 Browsing and navigation
 Specifying the conditions the objects of interest must satisfy, by means of queries

 Queries can be specified in two different ways
 Using a specific query language
 Query by example
 Using actual data (an example object)
Conditions on multimedia data

 Query predicates
 Attribute predicates
 Concern the attributes for which an exact value is supplied for each object
 Exact-match retrieval
 Structural predicates
 Concern the structure of multimedia objects
 Can be answered using metadata and information about the database schema
 "Find all multimedia objects containing at least one image and a video clip"
Conditions on multimedia data

 Semantic predicates
 Concern the semantic content of the required data, depending on the features that have been extracted and stored for each multimedia object
 "Find all the red houses"
 Exact match cannot be applied
Uncertainty, proximity, and weights in query expressions

 Specify the degree of relevance of the retrieved objects
 Using some imprecise terms and predicates
 Represent a set of possible acceptable values with respect to which the attribute or the features have to be matched
 Normal, unacceptable, typical
 Particular proximity predicates
 The relationship represented is based on the computation of a semantic distance between the query object and the stored ones
 Nearest object search
Uncertainty, proximity, and weights in query expressions

 Assign each condition or term a given weight
 Specify the degree of precision by which a condition must be verified by an object
 "Find all the objects containing an image representing a screen (HIGH) and a keyboard (LOW)"
 The corresponding query is executed by assigning importance and preference values to each predicate and term
Issues related to feature extraction

 What are features of images?
 Colour
 Chromatic histograms, dominant colours, moments, etc.
 Shape
 Texture
For the purpose of Colour Based Image Retrieval

 Is the colour system device independent?
 Is the colour system perceptually uniform?
 Is the colour system linear?
 Is the colour system intuitive?
 Is the colour system robust against varying imaging conditions?
 Invariant to change in viewing direction
 Invariant to change in object geometry
 Invariant to change in direction of illumination
 Invariant to change in intensity of illumination
 Invariant to change in SPD of illumination

For the purpose of color based image retrieval, color systems are evaluated according to the following criteria:
• is the color system device independent?
• perceptually uniform?
• linear?
• intuitive?
• robust against varying imaging conditions?
 invariant to a change in viewing direction
 invariant to a change in object geometry
 invariant to a change in illuminant
Color Histogram

 The histogram of image I is defined as follows: for a color Ci, hCi(I) is the number of pixels of color Ci in image I. Equivalently, normalized by the image size, hCi(I) is the probability that a randomly chosen pixel of I has color Ci. (A minimal sketch follows.)

 Most commercial CBIR systems include the color histogram as one of the features (e.g., QBIC of IBM).
 No spatial information.
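A minimal sketch of this feature, assuming images arrive as numpy arrays of 8-bit RGB values; the quantization into four levels per channel is an illustrative choice, not one the slides mandate.

```python
import numpy as np

def color_histogram(image, bins_per_channel=4):
    """Return a normalized color histogram with bins_per_channel**3 bins."""
    # Map each 8-bit channel value to a coarse bin index.
    q = (image.astype(np.uint32) * bins_per_channel) // 256   # shape (H, W, 3)
    # Combine the three channel indices into a single color index Ci.
    idx = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(idx.ravel(), minlength=bins_per_channel ** 3)
    # Dividing by the pixel count turns "number of pixels of color Ci"
    # into "probability that a random pixel has color Ci".
    return hist / idx.size

# Usage: h = color_histogram(np.asarray(some_rgb_image))
```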
Improvement of color histogram

 Several techniques have been proposed to integrate spatial information with color histograms:
 W. Hsu, et al., "An integrated color-spatial approach to content-based image retrieval," 3rd ACM Multimedia Conf., Nov. 1995.
 Smith and Chang, "Tools and techniques for color image retrieval," SPIE Proc. 2670, 1996.
 Stricker and Dimai, "Color indexing with weak spatial constraints," SPIE Proc. 2670, 1996.
 Gong, et al., "Image indexing and retrieval based on human perceptual color clustering," Proc. 17th IEEE Conf. on Computer Vision and Pattern Recognition, 1998.
Color auto-correlogram

 Pick any pixel p1 of color Ci in the image I. At distance k away from p1, pick another pixel p2. What is the probability that p2 is also of color Ci?

[Figure: pixels p1 and p2 at distance k in image I]
Color auto-correlogram

 The auto-correlogram of image I for color Ci and distance k:

    γ_Ci^(k)(I) = Pr[ |p1 − p2| = k, p2 ∈ I_Ci | p1 ∈ I_Ci ]

 Integrates both color information and spatial information.
Color auto-correlogram
Implementations

 Pixel Distance Measures
 Use the D8 distance (also called chessboard distance):

    D8(p, q) = max(|px − qx|, |py − qy|)

 Choose distances k = 1, 3, 5, 7
 Computation complexity: O(n²) (roughly 134·n² operations for these distances; a naive sketch follows)
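A naive sketch of the auto-correlogram, assuming a quantized image (a 2-D array of small integer color indices) and the D8 chessboard distance above. Positions falling outside the image count as misses, and the runtime is quadratic in the image size, so this is for illustration only.

```python
import numpy as np

def auto_correlogram(quantized, n_colors, distances=(1, 3, 5, 7)):
    h, w = quantized.shape
    result = np.zeros((len(distances), n_colors))
    color_counts = np.bincount(quantized.ravel(), minlength=n_colors)
    for di, k in enumerate(distances):
        # All offsets at chessboard distance exactly k: the ring
        # max(|dx|, |dy|) == k contains 8k positions.
        ring = [(dx, dy) for dx in range(-k, k + 1) for dy in range(-k, k + 1)
                if max(abs(dx), abs(dy)) == k]
        same = np.zeros(n_colors)
        for dx, dy in ring:
            # Overlap the image with a shifted copy of itself and count
            # positions where both pixels have the same color.
            x0, x1 = max(0, dx), min(h, h + dx)
            y0, y1 = max(0, dy), min(w, w + dy)
            a = quantized[x0:x1, y0:y1]
            b = quantized[x0 - dx:x1 - dx, y0 - dy:y1 - dy]
            eq = a[a == b]
            same += np.bincount(eq, minlength=n_colors)
        # Pr[p2 has color Ci at distance k | p1 has color Ci]
        result[di] = same / np.maximum(color_counts * len(ring), 1)
    return result
```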
Implementations

 Feature Distance Measures
 D( f(I1), f(I2) ) is small ⇔ I1 and I2 are similar
 Absolute differences mislead: with f(a) = 1000, f(a') = 1050 and f(b) = 100, f(b') = 150, both pairs differ by 50, yet the second pair differs far more in relative terms, so a relative measure is used
 For histograms:

    |I − I'|_h = Σ_{i ∈ [m]} |h_Ci(I) − h_Ci(I')| / (1 + h_Ci(I) + h_Ci(I'))

 For correlograms:

    |I − I'|_γ = Σ_{i ∈ [m], k ∈ [d]} |γ_Ci^(k)(I) − γ_Ci^(k)(I')| / (1 + γ_Ci^(k)(I) + γ_Ci^(k)(I'))
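A sketch of this relative distance, assuming the two feature vectors (histograms, or flattened correlograms) are numpy arrays of the same shape. The +1 in the denominator keeps the measure stable when both entries are near zero and makes the comparison relative rather than absolute, which is exactly the point of the f(a)/f(b) example above.

```python
import numpy as np

def d1_distance(f1, f2):
    # Sum of per-bin relative differences.
    return np.sum(np.abs(f1 - f2) / (1.0 + f1 + f2))
```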
Human beings can perceive specific wavelengths as colors.

CBIR …
is about: "blue", "seashell"
but is not about: "Botticelli", "Venus"
MPEG-7

 Motivation
 To efficiently search/retrieve relevant information that people want to use
 Goal
 To make it easy to search/retrieve/filter/exchange content, to maintain archives, and to edit multimedia content, etc.

 MPEG-1, 2, 4: representation of the content itself
 MPEG-7: representation of information about the content

 Types of multimedia data
 Audio, speech, video, still pictures, graphics
Components of MPEG-7
1) MPEG-7 Systems
2) MPEG-7 Description Definition Language
3) MPEG-7 Visual
4) MPEG-7 Audio
5) MPEG-7 Multimedia DSs
6) MPEG-7 Reference Software
7) MPEG-7 Conformance
Color Descriptors
Color Spaces

 Constrained color spaces
 Scalable Color Descriptor uses HSV
 Color Structure Descriptor uses HMMD
 MPEG-7 color spaces:
 Monochrome
 RGB
 HSV
 YCrCb
 HMMD
Scalable Color Descriptor

 A color histogram in HSV color space
 Encoded by a Haar transform
Dominant Color Descriptor

 Clusters colors into a small number of representative colors
 Can be defined for each object, region, or the whole image (see the sketch below)
 F = { {ci, pi, vi}, s }
 ci : representative colors
 pi : their percentages in the region
 vi : color variances
 s : spatial coherency
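A minimal sketch of dominant-color extraction via k-means clustering, assuming scikit-learn is available. MPEG-7 does not mandate k-means, so this stands in for whatever clustering an encoder actually uses, and the spatial coherency term s is omitted for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans

def dominant_colors(image, n_colors=4):
    pixels = image.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=n_colors, n_init=10).fit(pixels)
    descriptor = []
    for i in range(n_colors):
        members = pixels[km.labels_ == i]
        descriptor.append({
            "ci": km.cluster_centers_[i],       # representative color
            "pi": len(members) / len(pixels),   # its percentage
            "vi": members.var(axis=0),          # color variance
        })
    return descriptor
```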
Color Layout Descriptor

 Partition the image into 64 (8x8) blocks
 Derive the average color of each block (or use the DCD)
 Apply a DCT and encode the coefficients (see the sketch below)
 Efficient for
 Sketch-based image retrieval
 Content filtering using image indexing
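A sketch of this pipeline, assuming scipy for the 2-D DCT, a numpy RGB image of at least 8x8 pixels, and `keep` low-frequency coefficients per channel. The real descriptor works on Y/Cb/Cr planes and zigzag-scans and quantizes the coefficients; both steps are elided here.

```python
import numpy as np
from scipy.fft import dctn

def color_layout(image, keep=6):
    h, w, _ = image.shape
    grid = np.zeros((8, 8, 3))
    for by in range(8):
        for bx in range(8):
            # Average color of each of the 64 (8x8) blocks.
            block = image[by * h // 8:(by + 1) * h // 8,
                          bx * w // 8:(bx + 1) * w // 8]
            grid[by, bx] = block.reshape(-1, 3).mean(axis=0)
    # One 8x8 DCT per channel; low-frequency coefficients capture layout.
    coeffs = [dctn(grid[..., c], norm="ortho") for c in range(3)]
    return [c.ravel()[:keep] for c in coeffs]
```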
Color Structure Descriptor

 Scan the image with an 8x8 pixel block
 Count the number of blocks containing each color (a naive sketch follows)
 Generate a color histogram (HMMD)
 Main usages:
 Still image retrieval
 Natural image retrieval
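A naive sketch of the idea, assuming a quantized image of integer color indices: slide an 8x8 window over the image and, for each color, count the number of window positions in which it occurs at least once, rather than the number of pixels as a plain histogram would. The standard subsampled scan is omitted, so this version is slow on large images.

```python
import numpy as np

def color_structure(quantized, n_colors, window=8):
    h, w = quantized.shape
    hist = np.zeros(n_colors)
    for y in range(h - window + 1):
        for x in range(w - window + 1):
            # Every color present anywhere in this window gets one count.
            present = np.unique(quantized[y:y + window, x:x + window])
            hist[present] += 1
    return hist / hist.sum()
```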
GoF/GoP Color Descriptor

 Extends the Scalable Color Descriptor
 Generates the color histogram for a video segment or a group of pictures
 Calculation methods:
 Average
 Median
 Intersection
Color Opponency

 Color Afterimages

Color Opponency

 Color Blindness
Color Constancy

 Discounting the illuminant – adaptation
 The color of the ambient lighting quickly fatigues photoreceptors to that color – there is no eye position that allows the photoreceptors to recover
 Once fatigued to the ambient color, that color is subtracted, or discounted, from the visual scene, and colors appear close to the way they would in white light
Color Constancy
There is a cognitive component as well.
Color Vision

1. Memory & Imagery
 Achromatopsia
2. Form & Motion
 Interactions of the color system with other visual components

Case of Achromatopsia

 Damage to V4 can cause the complete loss of color vision (as opposed to red-green color blindness): V4 is more sensitive to oxygen deprivation
 In addition, color imagery and color memory are also lost
 What are the implications for perception, imagery and memory?
Color, Form & Motion

 Although V4 interacts with other areas (V3 & V5 are monochromatic), its interactions are limited
 Equiluminant color conditions make form and motion perception difficult – but not impossible

Equiluminant Colors
Some Application Areas

 Planning and government: there is a lot of satellite imagery of the earth, which can be used to inform important political debates. For example: how far does urban sprawl extend? What acreage is under crops? How large will the maize crop be? How much rainforest is left? Etc.

 Military intelligence: satellite imagery can contain important military information. Typical queries involve finding militarily interesting changes – for example: is there a concentration of force? How much damage was caused by the last bombing raid? What happened today? – occurring at particular places on the earth.

 Stock photo and stock footage: commercial libraries – which often have extremely large and very diverse collections – survive by selling the rights to use particular images. Effective tools may unlock value in these collections by making it possible for relatively unsophisticated users to obtain images that are useful to them at acceptable expense in time and money.

 Access to museums: museums are increasingly creating web views of their collections, typically at restricted resolutions, to entice viewers into visiting the museum. Ideally, one would want viewers to get a sense of what is at the museum, why it is worth visiting, and the particular virtues of the museum's gift store.

 Trademark and copyright enforcement: as electronic commerce grows, so does the opportunity for automatic searches to find violations of trademark or of copyright. For example, at the time of writing, the owner of rights to a picture could register it with an organisation called BayTSP, who would then search for stolen copies of the picture on the web; recent changes in copyright law make it relatively easy to recover fines from violators (see http://www.baytsp.com/index.asp).

 Managing the web: indexing web pages appears to be a profitable activity; the images present on a web page should give cues to the content of the page. Users may also wish to have tools that allow them to avoid offensive images or advertising. A number of tools have been built to support searches for images on the web using CBIR techniques. There are tools that check images for potentially offensive content, in both the academic and commercial domains.

 Medical information systems: recovering medical images "similar" to a given query example might give more information on which to base a diagnosis or to conduct epidemiological studies. Furthermore, one might be able to cluster medical images in ways that suggest interesting and novel hypotheses to experts.
5-min Recap

 Why is Image IR important?
 "A picture is worth a thousand words"
 Alternative form of communication
 Not everything can be described in text; not everything can be described in images
 Popular medium of information on the Internet
Search and Retrieval Process
It’s all over
Earlier Works of CBIR
The main features of earlier works of CBIR:
 Focused on effective FEATURE representation, such as color, texture, shape.
 Indexing image contents based on features.
 Disadvantage of the previous work:
 Semantic gap between high-level concepts and low-level image feature representation. Hence it is hard to select appropriate features.

[Figure: relevance feedback loop – in each iteration the system displays results, the user gives feedback, and the system uses that feedback for estimation and display selection in the next iteration]
Problem Statement

 Assumption: images of the same semantic meaning/category form a cluster in feature vector space
 Given a set of positive examples, learn the user's preference and find better results in the next iteration
Former Approaches

 Multimedia Analysis and Retrieval System (MARS)
 IEEE Trans. CSVT 1998
 Weight updating, modification of distance function
 Pic-Hunter
 IEEE Trans. IP 2000
 Probability based, updated by Bayes' rule
Comparisons

Aspect                     | Model        | Description
---------------------------+--------------+----------------------------------------------------------
Modeling of user's target  | MARS         | Weighted Euclidean distance
                           | Pic-Hunter   | Probability associated with each image
                           | Our approach | User's target data points follow a Gaussian distribution
Learning method            | MARS         | Weight updating, modification of distance function
                           | Pic-Hunter   | Bayes' rule
                           | Our approach | Parameter estimation
Display selection          | MARS         | K-NN neighborhood search
                           | Pic-Hunter   | Maximum entropy principle
                           | Our approach | Simulated maximum entropy principle
Estimation of Target Distribution

 Assume the user's target follows a Gaussian distribution
 Construct a distribution that best fits the relevant data points into some "specific" region

[Figure: data points selected as relevant, shown over successive feedback rounds]
Expectation Function

 Best fit the relevant data points to the medium-likelihood region
 The estimated distribution represents the user's target
Updating Parameters

 After each feedback loop, parameters are updated (a sketch follows)
 New estimated mean = mean of relevant data points
 New estimated variance: found by differentiation
 Iterative approach
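A sketch of the iterative update under the Gaussian-target assumption: after each feedback round, re-estimate the mean from the relevant points and a diagonal variance from their spread. The slides obtain the variance by differentiating an expectation function; the sample variance below is a simple stand-in for that step.

```python
import numpy as np

def update_target(relevant_points):
    X = np.asarray(relevant_points)        # shape (n_relevant, dims)
    mean = X.mean(axis=0)                  # new estimated mean
    var = X.var(axis=0) + 1e-9             # new estimated variance (stand-in)
    return mean, var

def rank_database(db, mean, var):
    # Score every database point by its Gaussian log-likelihood; higher = better.
    scores = -np.sum((db - mean) ** 2 / var, axis=1)
    return np.argsort(scores)[::-1]
```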
Indexing and searching

 Searching similar patterns
 Distance function
 Given two objects, O1 and O2, the distance (= dissimilarity) of the two objects is denoted by D(O1, O2)
 Similarity queries
 Whole match
 Sub-pattern match
 Nearest neighbors
 All pairs
Spatial access methods

 Map objects into points in f-D space, and use multiattribute access methods (also referred to as spatial access methods or SAMs) to cluster them and to search for them
 Methods
 R*-trees and the rest of the R-tree family
 Linear quadtrees
 Grid-files
 Linear quadtrees and grid-files explode exponentially with the dimensionality
R-tree

 R-tree
 Represents a spatial object by its minimum bounding rectangle (MBR)
 Data rectangles are grouped to form parent nodes (recursively grouped)
 The MBR of a parent node completely contains the MBRs of its children
 MBRs are allowed to overlap
 Nodes of the tree correspond to disk pages
R-tree

 Range query
 Specify a region of interest, requiring all the data regions that intersect it
 Retrieval:
 Compute the MBR of the query region
 Recursively descend the R-tree, excluding the branches whose MBRs do not intersect the query region (a sketch follows)
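A sketch of this recursive descent, assuming a node is either an internal node with (mbr, child) entries or a leaf with (mbr, object_id) entries, and MBRs are (xmin, ymin, xmax, ymax) tuples; the node structure itself is invented for the example.

```python
def intersects(a, b):
    # Two axis-aligned rectangles overlap iff they overlap on both axes.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def range_query(node, query_mbr, results):
    for mbr, payload in node.entries:
        # Prune every branch whose MBR does not intersect the query region.
        if intersects(mbr, query_mbr):
            if node.is_leaf:
                results.append(payload)                    # an object id
            else:
                range_query(payload, query_mbr, results)   # a child node
    return results
```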
Generic multimedia indexing approach

 "Whole match" problem
 A collection of N objects: O1, O2, …, ON
 The distance/dissimilarity between two objects (Oi, Oj) is given by the function D(Oi, Oj)
 The user specifies a query object Q and a tolerance ε
 Goal: find all objects Oi in the collection with D(Oi, Q) ≤ ε
GEMINI
Generic Multimedia object INdexIng

 Ideas
 A 'quick-and-dirty' test, to quickly discard the vast majority of non-qualifying objects (possibly allowing some false alarms)
 The use of spatial access methods, to achieve faster-than-sequential searching
GEMINI

 Example
 Database: yearly stock price movements, with one price per day
 Distance function: Euclidean distance

    D(S, Q) = ( Σ_{i=1}^{365} (S[i] − Q[i])² )^(1/2)

 The idea behind the quick-and-dirty test is to characterize a sequence with a single number (feature), which helps us discard many non-qualifying sequences – e.g., the average stock price over the year, the standard deviation, etc. (a sketch of such a filter follows)
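A sketch of the quick-and-dirty test using the average as the single-number feature, assuming sequences of equal length n. By the Cauchy-Schwarz inequality, sqrt(n) · |avg(S) − avg(Q)| ≤ D(S, Q), so the filter below never causes a false dismissal; it only lets through false alarms that the clean-up stage removes.

```python
import numpy as np

def whole_match(database, query, eps):
    n = len(query)
    q_avg = query.mean()
    candidates = []
    for seq in database:
        # Quick-and-dirty test on the average feature (lower-bounds D).
        if np.sqrt(n) * abs(seq.mean() - q_avg) <= eps:
            candidates.append(seq)            # may include false alarms
    # Clean-up stage: actual distance for the few survivors only.
    return [s for s in candidates if np.linalg.norm(s - query) <= eps]
```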
GEMINI

 Mapping function
 Let F() be the mapping of objects to f-dimensional points; that is, F(O) is the f-D point that corresponds to object O
 Organize the f-D points into a spatial access method, clustering them in a hierarchical structure like the R*-tree
 Upon a query, we can exploit the R*-tree to prune out large portions of the database that are not promising
GEMINI

 Search algorithm (for whole match queries)
 Map the query object Q into a point F(Q) in feature space
 Using a spatial access method, retrieve all points within the desired tolerance ε from F(Q)
 Retrieve the corresponding objects, compute their actual distance from Q, and discard the false alarms
GEMINI

 Lower bounding lemma
 To guarantee no false dismissals for whole-match queries, the feature extraction function F() should satisfy:

    Dfeature( F(O1), F(O2) ) ≤ D(O1, O2)

 Dfeature(): distance of two feature vectors (the mapping F() from objects to points should make things look closer)
GEMINI

 GEMINI algorithm
 1. Determine the distance function D() between two objects
 2. Find one or more numerical feature-extraction functions, to provide a 'quick-and-dirty' test
 3. Prove that the distance in feature space lower-bounds the actual distance D(), to guarantee correctness
GEMINI

 'Feature-extracting' question
 If we are allowed to use only one numerical feature to describe each data object, what should this feature be?
 Successful answers to the question should meet two goals:
 They should facilitate step 3 (the distance lower-bounding)
 They should capture most of the characteristics of the data
Relational databases, e.g. Access and OLE

 Object Linking and Embedding was Microsoft's first architecture for integrating files of different types. Each file type in Windows is associated with an application. It is possible to place a file of one type inside another:
 either by wholly embedding the data, in which case it is rendered by a plug-in associated with the program
 or by placing a link to the data, in which case it is rendered by calling the original program
 Access works with this system by providing a domain type for OLE
 There's not much you can do with OLE fields, since the data is in a format that Access does not understand
Relational databases, e.g. BFILEs in Oracle

 The BFILE datatype provides access to BLOB files of up to 4 gigabytes that are stored in file systems outside an Oracle database.
 The BFILE datatype allows read-only support of large binary files; you cannot modify a file through Oracle. Oracle provides APIs to access file data.
Large Object Types in Oracle and SQL3

 Oracle and SQL3 support three large object types:
 BLOB – the BLOB domain type stores unstructured binary data in the database. BLOBs can store up to four gigabytes of binary data.
 CLOB – the CLOB domain type stores up to four gigabytes of single-byte character set data
 NCLOB – the NCLOB domain type stores up to four gigabytes of fixed-width and varying-width multi-byte national character set data
Cont …

 These types support
 Concatenation – making up one LOB by putting two of them together
 Substring – extract a section of a LOB
 Overlay – replace a substring of one LOB with another
 Trim – removing particular characters (e.g. whitespace) from the beginning or end
 Length – returns the length of the LOB
 Position – returns the position of a substring in a LOB
 Upper and Lower – converts a CLOB to upper or lower case
Large Object Types in MySQL

MySQL has four BLOB and four CLOB (called TEXT in MySQL) domain types:
 TINYBLOB and TINYTEXT – store up to 255 bytes
 BLOB and TEXT – store up to 64K bytes
 MEDIUMBLOB and MEDIUMTEXT – store up to 16M bytes
 LONGBLOB and LONGTEXT – store up to 4G bytes
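For illustration, a minimal sketch of round-tripping a BLOB from Python, using the built-in sqlite3 module; SQLite's single BLOB type stands in for the tiered MySQL/Oracle types above, and the table, column names, and photo.jpg file are invented for this example.

```python
import sqlite3

conn = sqlite3.connect("media.db")
conn.execute("CREATE TABLE IF NOT EXISTS images (id INTEGER PRIMARY KEY, data BLOB)")

# Store the raw bytes of an (assumed) local image file as a BLOB.
with open("photo.jpg", "rb") as f:
    conn.execute("INSERT INTO images (data) VALUES (?)", (sqlite3.Binary(f.read()),))
conn.commit()

# Read the bytes back out.
blob = conn.execute("SELECT data FROM images WHERE id = 1").fetchone()[0]
```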
Oracle interMedia Audio, Image, and Video

 Oracle interMedia supports multimedia storage, retrieval, and management of:
 BLOBs stored locally in Oracle8i onwards and containing audio, image, or video data
 BFILEs, stored locally in operating-system-specific file systems and containing audio, image, or video data
 URLs containing audio, image, or video data stored on any HTTP server such as Oracle Application Server, Netscape Application Server, Microsoft Internet Information Server, Apache HTTPD server, and Spyglass servers
The Object Relational Multimedia Domain Types in interMedia

 interMedia provides the ORDAudio, ORDImage, and ORDVideo object types and methods for:
 updateTime ORDSource attribute manipulation
 manipulating multimedia data source attribute information
 extracting attributes from multimedia data
 getting and managing multimedia data from Oracle interMedia, Web servers, and other servers
Cont …

 The properties available are:
 ORDImage – the height, width, data size of the on-disk image, file type, image type, compression type, and MIME type
 ORDAudio – the format, encoding, number of channels, sampling rate, sample size, compression type, and audio duration
 ORDVideo – the format, frame size, frame resolution, frame rate, video duration, number of frames, compression type, number of colours, and bit rate
Cont …

 Oracle also stores metadata including:
 source type, location, and source name
 MIME type and formatting information
 characteristics such as height and width of an image, number of audio channels, video frame rate, play time, etc.
Open issues

 Gap between low-level features and high-level concepts
 Human in the loop – interactive systems
 Retrieval speed – most research prototypes can handle only a few thousand images
 A reliable test-bed and measurement criterion, please!
Query Refinement in Multimedia Similarity Retrieval

 To refine the query to represent the information that the user is looking for.

[Figure: similarity contours (Sim = 0.7, 0.8, 0.9) around an initial query representation, refined toward the optimal query representation; some points are relevant according to the current query, others according to the optimal query]
Query Refinement Models

[Figure: user feedback drives two levels of refinement over a multi-feature query. Inter-feature refinement (feature re-weighting) combines the individual feature queries; intra-feature refinement (query modification & re-weighting) adjusts each individual feature query over its own index.]
Intra-feature Refinement

 Query Point Movement: move from the initial query representation to a new query representation C* (a sketch follows these formulas)

    Dist(Q_new, P) = Dist(C*, P) = Σ_{j=1..m} (1/σ_j) · (C*_j − P_j)²

    σ_j = standard deviation of Q in dimension j
    C* = weighted centroid of Q

 Query Expansion: move from the initial query representation to a new multi-point representation Q1…Q4

    Dist(Q, P) = Σ_i w_i · Dist(Q_i, P)

    w_i = weight of Q_i; weights are based on relevance level
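A sketch of query point movement, assuming each relevant example is a feature vector carrying a relevance weight: the new query C* is the weighted centroid, and distances are scaled per dimension by the spread of the relevant points, matching the formula above.

```python
import numpy as np

def refine_query(relevant, weights):
    Q = np.asarray(relevant, dtype=float)            # shape (n, dims)
    w = np.asarray(weights, dtype=float)
    c_star = (w[:, None] * Q).sum(axis=0) / w.sum()  # weighted centroid C*
    sigma = Q.std(axis=0) + 1e-9                     # per-dimension deviation
    return c_star, sigma

def dist_to_query(c_star, sigma, p):
    # Dist(Q_new, P) = sum_j (1/sigma_j) * (C*_j - P_j)^2
    return np.sum((c_star - p) ** 2 / sigma)
```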
Selecting Relevant Points in Query Expansion

 A clustering algorithm is used to cluster the relevant points
 The cluster centroids are chosen as the new query points and added to the query representation

[Figure: cluster centroids in feature space to be added to the query representation]
Query Expansion: multi-point approach

The distance of a node R from a multi-point query Q = {Q1, Q2, Q3} with weights w1, w2, w3 is defined as (a sketch follows):

    MinDist(Q, R) = w1·MinDist(Q1, R) + w2·MinDist(Q2, R) + w3·MinDist(Q3, R)

    ∀P ∈ R : Dist(Q, P) ≥ MinDist(Q, R)
Query Processing for Refined Query

 Naively, execute the refined query just like the initial query
 Observation: the query representation does not change dramatically across feedback iterations
 So, exploit the work done in the previous iteration by reusing the priority queue used in the previous kNN search for the next iteration

[Figure: the previous priority queue for query P = wp1·P1 + wp2·P2 + wp3·P3 is carried over to build the new priority queue for query Q = wq1·Q1 + wq2·Q2 + wq3·Q3 + wq4·Q4]
CBIR
SUMMARY

 BORN FROM MULTIMEDIA FLOOD
 TEXT TOO SIMPLE AND LABORIOUS
 SYSTEMS WORK DECENTLY IN VITRO
 QUERY BY SHAPE, COLOR, TEXTURE, EXAMPLE
 SHORTCOMINGS
ONGOING RESEARCH-1
RELEVANCE FEEDBACK

 ITERATIVE QUERY REFINEMENT
 PLACE USER IN LOOP TO ITERATIVELY IMPROVE RETRIEVAL RATES
 HIGH-DIMENSIONAL SPACE NEEDS PRUNING
 EMPHASIZED FEATURE(S) MUST BE FOUND
ONGOING RESEARCH-2
RELEVANCE FEEDBACK

 FEATURE SELECTIVE INTERFACE
 WHY CHOOSE IMAGES AS A WHOLE? REQUIRES PROCESSING/STATS TO FIND RELEVANT FEATURES
 EXPLICIT FEATURES: USER CAN EXPLICITLY INDICATE ELEMENTS OF THE IMAGE WHICH ARE RELEVANT TO THE R.F. ENGINE (E.G. SHAPE: GOOD, COLOR: GOOD)
ONGOING RESEARCH-3
SIMILARITY AGGREGATION/HYBRID QUERIES

 TYPICALLY USED APPROACHES
 BOOLEAN (AND, OR & NOT OPERATORS)
 EUCLIDEAN (MINKOWSKI W/ r=2)
 WEIGHTED AVERAGE (WA), i.e. SUPERVECTORS
 DISADVANTAGES
ONGOING RESEARCH-4
SIMILARITY AGGREGATION/HYBRID QUERIES

 FUZZY AGGREGATION OF DECISIONS
 USE MEMBERSHIP FUNCTION TO 'FUZZIFY' DISTANCES & GENERATE A 'FUZZY DECISION' (A SKETCH FOLLOWS)

[Figure: a similarity distance d mapped through a fuzzy membership function m to a fuzzy distance decision]
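A sketch of fuzzifying distances, assuming a simple exponential membership function and a minimum (fuzzy AND) for aggregation; the slides do not specify a particular membership function or aggregator, so both choices here are illustrative.

```python
import numpy as np

def fuzzy_membership(distance, scale=1.0):
    # m: map a distance d to a membership value in [0, 1].
    return np.exp(-distance / scale)

def fuzzy_decision(distances, scales):
    # Aggregate per-feature memberships with a fuzzy AND (the minimum).
    return min(fuzzy_membership(d, s) for d, s in zip(distances, scales))
```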
ONGOING RESEARCH-5
DISTRIBUTED MULTIMEDIA INDEXING

 INDEXES USUALLY CENTRALIZED
 ENTIRE SYSTEM FAILS IF COMPONENT FAILS
 NO GRACEFUL PERFORMANCE DEGRADATION
 HIGH DATA VOLUME = HIGH SYSTEM REQ'S
ONGOING RESEARCH-6
DISTRIBUTED MULTIMEDIA INDEXING

 SMALL WORLD INDEXING MODEL [1]
 SOCIOLOGICAL PEER DESCRIPTIONS
 WE ARE NOT BLIND TO WHO OUR PEERS ARE
 PEOPLE KEEP MEMORY OF THEIR PEERS

[1] P. Androutsos, D. Androutsos and A. N. Venetsanopoulos, "A distributed fault-tolerant MPEG-7 retrieval scheme based on small world theory," Distributed Media Technologies and Applications Special Issue of IEEE Transactions on Multimedia, under review.
RESEARCH AVENUES-1

 HYBRID QUERIES & AGGREGATION
 WHAT DO WEIGHTS MEAN? HOW TO CHOOSE THEM?
 ALTERNATIVE AGGREGATION METHODS
 ADAPTIVE SCHEMES USING REL. FEEDBACK
RESEARCH AVENUES-2

 PERCEPTUAL ISSUES
 EMPHASIS OF DOMINATING FEATURES
 FEATURE MASKING
 EMOTIONAL INDEXING
 ALL USERS DIFFERENT – CUSTOMIZED PROFILES
RESEARCH AVENUES-3

 DISTRIBUTED INDEXING
 DISTRIBUTED INDEXES & RETRIEVAL
 INDEX SYNCHRONIZATION
 RESULTS ORGANIZATION & RANKING
 SWIM OVERHEAD ESTIMATION
SUMMARY-1

 MULTIMEDIA PROCESSING
 RESULTS FROM MULTIMEDIA EXPLOSION
 USERS DEMANDING MORE FROM DEVICES
 DEVICES ARE CONVERGING
 CONTENT BASED IMAGE RETRIEVAL
Introduction

 As imaging systems evolved in complexity and openness, the need for device-independent image measures became clear.
 It was quickly recognized that device-dependent color coordinates (such as monitor RGB and printer CMYK) could not be used to specify and reproduce color images with accuracy and precision.
Colorimetry

 Device-independent color measurements are based on the internationally standardized CIE system of colorimetry, first developed in 1931.
 CIE colorimetry specifies a color stimulus with numbers proportional to the stimulation of the human visual system, independent of how the color stimulus was produced.
Outline

 Introduction:
 Colorimetry
 Color / Image Difference
 Color / Image Appearance Model
 The iCAM framework
 Input Images
 First Stage: Chromatic Adaptation (Color Appearance)
 Second Stage: Appearance Attributes
 Third Stage: Spatial Filtering (Image Difference)
Color Difference Eqs

 A color difference equation allows for the mapping of physically measured stimuli into perceived differences.
 CIELAB and CIELUV (ΔE*ab)
 CIE DE94 and CIEDE2000
Image Difference

 The CIE color difference formulas were developed using simple color patches in controlled viewing conditions. There is no reason to believe that they are adequate for predicting color differences for spatially complex image stimuli.
 S-CIELAB: contrast sensitivity functions (CSF)
Image Difference

 The CSF serves to remove information that is imperceptible to the visual system. For instance, when viewing dots at a certain distance, the dots tend to blur and integrate into a single color.
Outline

 Introduction:
 Colorimetry
 Color / Image Difference
 Color / Image Appearance Model
 The iCAM framework
 Input Images
 First Stage: Chromatic Adaptation (Color Appearance)
 Second Stage: Appearance Attributes
 Third Stage: Spatial Filtering (Image Difference)
Color Appearance Model

 CIE colorimetry is only strictly applicable to situations in which the original and the reproduction are viewed in identical conditions.
 Color appearance models were developed to predict color in different viewing conditions.
Image Appearance Model

 Color appearance models account for many changes in viewing condition, but they do not incorporate any of the spatial or temporal properties of human vision and the perception of images.
 One model for still images, referred to as iCAM, has recently been published by Fairchild and Johnson.
2. Color Appearance

[Images courtesy of John McCann]

Color Appearance

 More than a single color
 Adjacent colors (background)
 Viewing environment (surround)
 Appearance effects
 Adaptation
 Simultaneous contrast
 Spatial effects

[Figure: stimulus, background, and surround – from Color Appearance Models, Mark Fairchild]
Light/Dark Adaptation

 Adjust to overall brightness
 7 decades of dynamic range
 100:1 at any particular time
 Absolute illumination effects
 Hunt effect: higher brightness increases colorfulness
 Stevens effect: higher brightness increases apparent contrast
Chromatic Adaptation

 Change in illumination
 Cones "white balance"
 Scale cone sensitivities
 von Kries (a sketch follows)
 Also cognitive effects
 Creates unique white

[Figure: daylight vs. tungsten – from Color Appearance Models, fig 8-1]
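A minimal sketch of von Kries chromatic adaptation, assuming colors are already expressed as LMS cone responses (the conversion from XYZ via, e.g., a Bradford-style matrix is omitted). Each cone channel is scaled by the ratio of the destination white to the source white, which is what "scale cone sensitivities" refers to above.

```python
import numpy as np

def von_kries_adapt(lms, source_white_lms, dest_white_lms):
    # Per-channel gains: the cones' "white balance".
    gain = np.asarray(dest_white_lms) / np.asarray(source_white_lms)
    return np.asarray(lms) * gain
```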
Simultaneous Contrast

 "After image" of the background adds to the color
 Reality is more complex
 Affects the lightness scale
Effect of Spatial Frequency

 Smaller = less saturated
 The paint chip problem
 Color image perception
 S-CIELAB

[Figure redrawn from Foundations of Vision, fig 6, © Brian Wandell, Stanford University]
Keep it as simple as possible, but not simpler.
– Albert Einstein
R-trees

 B-trees in multiple dimensions
 A spatial object is represented by its MBR (Minimum Bounding Rectangle)

R-trees

 Nonleaf nodes
 <ptr, R>
 ptr – pointer to a child node
 R – MBR covering all rectangles in the child node
 Leaf nodes
 <obj-id, R>
R-trees

 Algorithms
 Insert
 Find the most suitable leaf node
 Possibly extend MBRs in parent nodes to enclose the new object
 Leaf node overflow ⇒ split
 Split
 Heuristics based
R-trees

 Range queries
 Traverse the tree
 Compare the query MBR with the current node's MBR
 Nearest neighbor
 Branch and bound:
 Traverse the most promising sub-tree
 Find neighbors
R-trees

 Spatial joins: "find intersecting objects"
 Naïve method:
 Build a list of pairs of intersecting MBRs
 Examine each pair, down to the leaf level
 (Faster methods exist)
Variants

 R+-tree (Sellis et al. 1987)
 Avoids overlapping rectangles in internal nodes
 R*-tree (Beckmann et al. 1990)
Applications

 Spatial databases
 Text retrieval
 Multimedia retrieval
 Color correction
Does Memorization = Learning?

 Test #1: Thomas learns his mother's face
 Memorizes: [example faces]
 But will he recognize: [a new view]?
Thus he can generalize beyond what he's seen!

Does Memorization = Learning? (cont'd)

 Test #2: Nicholas learns about trucks & combines
 Memorizes: [example vehicles]
 But will he recognize others?
So learning involves the ability to generalize from labeled examples
(in contrast, memorization is trivial, especially for a computer)
Some examples

That is all, folks…
Thank you for your patience!
Questions?
Good Luck!