Selected Advanced Topics
Storing and Retrieving Images
Content-based Image/Video Indexing and Retrieval
G52IIP, School of Computer Science, University of Nottingham
Problem
Find all images that contain horses …..
Text-based technology
• Annotation: Each image is indexed with a set of relevant text phrases, e.g., appropriate phrases to describe the content of this image include: Mother, Child, Vegetable, Yellow, Green, Purple ….
• Retrieval: based on text search technology
Text-based technology - Drawbacks
Annotation - subjective: different people may use different phrases to describe the same or very similar image/content
Text-based technology - Drawbacks
Annotation - laborious: it will take a lot of man-hours to label large image/video databases with 1m+ items
Content-based Technology
Using Visual Examples
Content-based Technology
Using Visual Features
[Figure: example visual feature - the proportions r%, g%, b% of red, green and blue in the image]
Content-based Technology
• Content-based image indexing and retrieval (CBIR) is an image database management technique that indexes the data items (images or video clips) using visual features (e.g., colour, shape, and texture) of the images or video clips.
• A CBIR system lets users find pictorial information in large image and video databases based on visual cues, such as colour, shape, texture, and sketches.
Content-based Technology
• The visual features, computed using image processing and computer vision techniques, are used to represent the image contents numerically.
• Image Content - a high-level concept, e.g., this image is a sunset scene, a landscape scene, etc.
• Numerical Content Representations - low-level numbers; often the same set of numbers can come from very different images, making the task very hard!
Content-based Technology
• Techniques for Computing Visual Features / Representing Image Contents
  • some are very sophisticated, and many are still not mature
  • hence the computational processes
    • in some cases are automatic
    • in other cases are semi-automatic
    • in the most difficult cases may have to be done manually
Content-based Technology
• Comparing Image Content / Retrieving Images based on Content
  • Simple approaches - compute a metric distance between the low-level numerical representations (see the sketch below)
  • Advanced approaches - use sophisticated pattern recognition, artificial intelligence, neural networks, and interactive (relevance feedback) techniques to compare the visual content (low-level numerical features)
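A minimal sketch of the simple approach, not part of the original slides: assuming Python with NumPy, and using made-up 64-dimensional feature vectors, it ranks a feature database by metric distance to a query.

import numpy as np

# Hypothetical database of low-level feature vectors (one row per image),
# e.g. colour histograms or colour moments produced by an indexing stage.
database = np.random.rand(1000, 64)        # 1000 images, 64-D features (illustrative values)
query = np.random.rand(64)                 # features extracted from the query image

# Simple approach: Euclidean (L2) distance between the query and every record,
# then return the indices of the closest images.
distances = np.linalg.norm(database - query, axis=1)
top_matches = np.argsort(distances)[:10]   # indices of the 10 most similar images
print(top_matches, distances[top_matches])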
Content-based Technology - IBM QBIC System
• IBM's QBIC (Query by Image and Video Content) system is one of the earliest examples of a CBIR system, developed in the 1990s.
• The system lets users find pictorial information in large image and video databases based on colour, shape, texture, and sketches.
Content-based Technology - IBM QBIC System
• The User Interfaces Module
  • Lets the user specify a visual query by drawing, selecting from a colour wheel, selecting a sample image …
  • Displays results as an ordered set of images
• The Database Population and Database Query Modules
  • Database population - process images and video to extract features describing their content (colours, textures, shapes, and camera and object motion) and store the features in a database
  • Database query - let the user compose a query graphically, extract features from the graphical query, and input them to a matching engine that finds images or video clips with similar features
Content-based Technology - IBM QBIC System
• The Data Model
  • Still image, or scene - the full image
  • Objects contained in the full image - subsets of an image
  • Videos - broken into clips called shots: sets of contiguous frames
  • Representative frames, the r-frames, are generated for each shot
  • R-frames are treated as still images, from which features are extracted and stored in the database
  • Further processing of shots generates motion objects - e.g., a car moving across the screen
Content-based Technology - IBM QBIC System
• Queries are allowed on
  • Objects - e.g., find images with a red round object
  • Scenes - e.g., find images that have approximately 30% red and 15% blue colours
  • Shots - e.g., find all shots panning from left to right
  • A combination of the above - e.g., find images that have 30% red and contain a blue textured object
Content-based Technology - IBM QBIC System
• Similarity Measures
  • Similarity queries are done against the database of pre-computed features using distance functions between the features
  • Examples include Euclidean distance, city-block distance, ….
  • These distance functions are intended to mimic human perception, to approximate a perceptual ordering of the database
  • But it is often the case that a distance metric in a feature space will bear little relevance to perceptual similarity.
Content-based Technology - Basic Architecture
[Diagram: a query (colour, texture, shape, positions, …) is compared, using similarity measures, against every record in the image database; each record (Record 1, Record 2, …, Record n) stores the imagery, meta data and the same visual features (colour, texture, shape, positions, …).]
Colour - An Effective Visual Cue
Colours can be a more powerful visual cue than you might initially think!
Which soft drink? Which fruit?
Colour - An Effective Visual Cue
In many cases, colour can be very effective. Here is an example:
[Results of content-based image retrieval using 4096-bin colour histograms]
Colour Spaces
Colour Models
RGB Model: This colour model uses the three NTSC primary colours to describe a colour within a colour image.
[RGB colour cube with corners labelled R, G, B, Cyan, Magenta, Yellow and White, and grays along the main diagonal]
Sometimes in computer vision it is convenient to use the rg chromaticity space:
r = R/(R+G+B)
g = G/(R+G+B)
Colour Spaces
YIQ Model: The YIQ model is used in commercial colour TV broadcasting; it is a re-coding of RGB for transmission efficiency and for maintaining compatibility with the monochrome TV standard.
Y = 0.299R + 0.587G + 0.114B
I = 0.596R - 0.275G - 0.321B
Q = 0.212R - 0.523G + 0.311B
In YIQ, the luminance (Y) and colour information (I and Q) are de-coupled.
YCbCr Model
Y  = 0.299R + 0.587G + 0.114B
Cb = -0.169R - 0.331G + 0.500B
Cr = 0.500R - 0.419G - 0.081B
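To make the re-coding concrete, here is a small sketch (Python/NumPy is assumed; it is not part of the original slides) that converts one RGB pixel to YIQ and YCbCr using the coefficients above.

import numpy as np

# Conversion matrices taken from the formulas above (rows: Y, I, Q and Y, Cb, Cr).
RGB_TO_YIQ = np.array([[0.299,  0.587,  0.114],
                       [0.596, -0.275, -0.321],
                       [0.212, -0.523,  0.311]])
RGB_TO_YCBCR = np.array([[ 0.299,  0.587,  0.114],
                         [-0.169, -0.331,  0.500],
                         [ 0.500, -0.419, -0.081]])

rgb = np.array([200.0, 120.0, 40.0])   # an arbitrary example pixel
print("YIQ:", RGB_TO_YIQ @ rgb)
print("YCbCr:", RGB_TO_YCBCR @ rgb)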
Perceived Color Differences
• One problem with the RGB colour system is that colorimetric distances between the individual colours don't correspond to perceived colour differences.
• For example, in the chromaticity diagram, a difference between green and greenish-yellow is relatively large, whereas the distance distinguishing blue and red is quite small.
[rg chromaticity diagram, where r = R/(R+G+B) and g = G/(R+G+B)]
CIELAB
• The CIE (Commission Internationale de l'Eclairage) solved this problem in 1976 with the development of the Lab colour space. The result was a three-dimensional colour space in which the colour differences you perceive correspond to distances when measured colorimetrically. The a axis extends from green (-a) to red (+a) and the b axis from blue (-b) to yellow (+b). The brightness (L) increases from the bottom to the top of the three-dimensional model.
• With CIELAB, what you see is what you get (in theory at least).
Colour Histogram
• Given a discrete colour space defined by some colour axes (e.g., red, green, blue), the colour histogram is obtained by discretizing the image colours and counting the number of times each discrete colour occurs in the image.
• The image colours that are transformed to a common discrete colour are usefully thought of as being in the same 3D histogram bin centred at that colour.
Colour Histogram Construction
Step 1: Colour quantization (discretizing the image colours)
Step 2: Count the number of times each discrete colour occurs in the image
Colour Quantization
• A true-colour, 24-bit/pixel image (8 bits R, 8 bits G, 8 bits B) will have 2^24 = 16,777,216 bins!
• That is, each image would have to be represented by over 16 million numbers
  • computationally impossible
  • in practice not necessary
• Colour quantization - reduce the number of colour bins
Simple Colour Quantization
• Simple colour quantization (non-adaptive)
  • Divide each colour axis into equal-length sections (different axes can be divided differently).
  • Map (quantize) each colour into its corresponding bin
Simple Colour Quantization
Example: In RGB space, quantize each image colour into one of 8 x 8 x 8 = 512 colour bins.
Each of the R, G and B axes is divided into 8 equal-length sections, with boundaries at 0, 31, 63, 95, 127, 159, 191, 223, 255.

Colour            Bin
(123, 23, 45)     (3, 0, 1)
(122, 28, 46)     (3, 0, 2)
(132, 29, 50)     (4, 0, 1)
(122, 172, 27)    (3, 5, 0)
(121, 26, 48)     (x, x, x)
(142, 28, 46)     (x, x, x)
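A minimal sketch of this uniform (non-adaptive) quantization, assuming Python and 8 bins of width 32 per axis; it is not from the slides, and the printed colours are a subset of the table above.

def quantize_colour(rgb, bins_per_axis=8):
    """Map a 24-bit (R, G, B) colour to its (r_bin, g_bin, b_bin) indices."""
    width = 256 // bins_per_axis                    # 32 intensity levels per bin
    return tuple(int(c) // width for c in rgb)

for colour in [(123, 23, 45), (132, 29, 50), (122, 172, 27)]:
    print(colour, "->", quantize_colour(colour))    # (3, 0, 1), (4, 0, 1), (3, 5, 0)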
Advanced Colour Quantization
• Adaptive colour quantization (not required)
  • Each pixel is a point in the 3D colour space
  • Vector quantization
  • K-means clustering
  • K representative colours
  • The colour histogram consists of K bins, each corresponding to one of the representative colours.
  • A pixel is classified as belonging to the nth bin if the nth representative colour is the one (amongst all the representative colours) that is closest to the pixel.
[Figure: image pixels plotted in the R-G-B colour space, with the representative colours marked]
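A sketch of adaptive quantization via K-means, assuming Python with NumPy and scikit-learn (neither is part of the slides); the number of representative colours K and the synthetic image are arbitrary choices for illustration.

import numpy as np
from sklearn.cluster import KMeans

def adaptive_histogram(image, k=16):
    """K-means colour quantization: each pixel is assigned to the nearest of
    K representative colours, and the histogram has one bin per colour."""
    pixels = image.reshape(-1, 3).astype(float)        # each pixel is a point in 3D colour space
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    hist = np.bincount(km.labels_, minlength=k)         # count pixels per representative colour
    return hist, km.cluster_centers_                    # bin counts and the K representative colours

image = np.random.randint(0, 256, size=(64, 64, 3))     # a synthetic test image
hist, reps = adaptive_histogram(image, k=16)
print(hist.sum(), reps.shape)                           # 4096 pixels, (16, 3) representative colours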
Colour Histogram Construction - An Example
• A 3 x 3, 24-bit/pixel image has the following Red, Green and Blue planes:

Red:           Green:            Blue:
23  11  22     213  11  22       23  12  22
24  24  12      24 232  12       24  24 123
77  69  12      77 239  12       77  69 123

• Construct an 8-bin colour histogram (using simple colour quantization, treating each axis as equally important).
Bin (0,0,0) =
Bin (0,0,1) =
Bin (0,1,0) =
Bin (0,1,1) =
Bin (1,0,0) =
Bin (1,0,1) =
Bin (1,1,0) =
Bin (1,1,1) =
Colour Histogram Construction - An Example
• Quantized colour planes (2 bins per axis: values 0-127 -> 0, values 128-255 -> 1):

Red:        Green:       Blue:
0  0  0     1  0  0      0  0  0
0  0  0     0  1  0      0  0  0
0  0  0     0  1  0      0  0  0

• Count the number of times each discrete colour occurs in the image.
Bin (0,0,0) = 6
Bin (0,0,1) = 0
Bin (0,1,0) = 3
Bin (0,1,1) = 0
Bin (1,0,0) = 0
Bin (1,0,1) = 0
Bin (1,1,0) = 0
Bin (1,1,1) = 0
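The worked example can be reproduced in a few lines of NumPy (an illustration added here, not part of the slides); the three planes are the ones given above.

import numpy as np

R = np.array([[23, 11, 22], [24, 24, 12], [77, 69, 12]])
G = np.array([[213, 11, 22], [24, 232, 12], [77, 239, 12]])
B = np.array([[23, 12, 22], [24, 24, 123], [77, 69, 123]])

# 2 bins per axis: values 0-127 -> bin 0, values 128-255 -> bin 1
q = np.stack([R, G, B], axis=-1) // 128                 # (3, 3, 3) array of 0/1 bin indices
bins, counts = np.unique(q.reshape(-1, 3), axis=0, return_counts=True)
for b, c in zip(bins, counts):
    print(tuple(int(x) for x in b), int(c))             # (0, 0, 0) 6  and  (0, 1, 0) 3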
Colour Based Image Indexing
The histogram of colours in an image defines the image colour distribution.
[Histogram plot: number of pixels in each colour bin]
Colour Distribution = (10, 0, 0, 0, 100, 10, 30, 0, 0)
Colour based Image Retrieval
Images are similar if their histograms are similar!
Colour Distribution = (10, 0, 0, 0, 100, 10, 30, 0, 0)
Dissimilar: Colour Distribution = (0, 40, 0, 0, 0, 0, 0, 110, 0)
Similar!    Colour Distribution = (10, 0, 0, 0, 90, 10, 40, 0, 0)
Formalizing Similarity
Colour Distribution 1 = H1 = (10, 0, 0, 0, 100, 10, 30, 0, 0)
Colour Distribution 2 = H2 = (0, 40, 0, 0, 0, 0, 0, 110, 0)
Similarity(Image 1, Image 2) = D(H1, H2)
where D( ) is a distance measure between vectors (histograms) H1 and H2
Metric Distances
A distance measure D( ) is a good measure if it is a metric!
D(a,b) is a metric if
D(a,a) = 0 (the distance from a to itself is 0)
D(a,b) = D(b,a) (the distance from a to b = the distance from b to a)
D(a,c) <= D(a,b) + D(b,c) (triangle inequality: the straight-line distance is always the least, so D(a,b) + D(b,c) should be no smaller than D(a,c))
Common Metric Distance Measures
Histogram Intersection, HI:
HI(H1, H2) = Σ_i min(H1_i, H2_i), summing over all n bins i = 1, …, n
H1 = (10,  0, 0, 0, 100, 10, 30,   0, 0)
H2 = ( 0, 40, 0, 0,   0,  6,  0, 110, 0)
Similarity = HI(H1, H2) = 0 + 0 + 0 + 0 + 0 + 6 + 0 + 0 + 0 = 6
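A minimal NumPy sketch of histogram intersection (an added illustration, not from the slides), applied to H1 and H2 above:

import numpy as np

def histogram_intersection(h1, h2):
    """Similarity = sum over bins of the element-wise minimum."""
    return np.minimum(h1, h2).sum()

H1 = np.array([10, 0, 0, 0, 100, 10, 30, 0, 0])
H2 = np.array([0, 40, 0, 0, 0, 6, 0, 110, 0])
print(histogram_intersection(H1, H2))   # 6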
Common Metric Distance Measures
Euclidean or straight-line distance or L2-norm, D2:
D2(H1, H2) = sqrt( Σ_i (H1_i - H2_i)^2 ) = ||H1 - H2||_2
(root-mean-square error)
H1 = (10,  0, 0)
H2 = ( 0, 40, 0)
Similarity = D2(H1, H2) = sqrt(100 + 1600 + 0) = 41.23
Common Metric Distance Measures
Manhattan or city-block or L1-norm, D1:
D1(H1, H2) = Σ_i |H1_i - H2_i| = ||H1 - H2||_1
(sum of absolute differences)
H1 = (10,  0, 0)
H2 = ( 0, 40, 0)
Similarity = D1(H1, H2) = 10 + 40 + 0 = 50
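Both distances above can be computed in a couple of lines of NumPy (an added illustration); the vectors are the 3-bin examples from these slides.

import numpy as np

H1 = np.array([10.0, 0.0, 0.0])
H2 = np.array([0.0, 40.0, 0.0])

d2 = np.linalg.norm(H1 - H2, ord=2)   # Euclidean / L2: sqrt(100 + 1600 + 0) ≈ 41.23
d1 = np.linalg.norm(H1 - H2, ord=1)   # city-block / L1: 10 + 40 + 0 = 50
print(d2, d1)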
Histogram Intersection vs City Block Distance
Theorem: if H1 and H2 are colour histograms and the total count in each is N (there are N pixels in the image), then:
N - HI(H1, H2) = 0.5 ||H1 - H2||_1
(Histogram intersection is inversely related to a metric distance!)
Proof
(1) HI(H1, H2) = Σ_i min(H1_i, H2_i)   (by definition)
(2) min(a, b) = max(a, b) - |a - b|
Histogram Intersection vs City Block Distance
(3) max(a, b) = a + b - min(a, b)
Substituting (2) and (3) in (1):
(4) Σ_i min(H1_i, H2_i) = Σ_i max(H1_i, H2_i) - Σ_i |H1_i - H2_i|
                        = Σ_i H1_i + Σ_i H2_i - Σ_i min(H1_i, H2_i) - Σ_i |H1_i - H2_i|
(5) 2 Σ_i min(H1_i, H2_i) = N + N - ||H1 - H2||_1
    => N - HI(H1, H2) = 0.5 ||H1 - H2||_1
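The identity can also be checked numerically. Here is a small sketch (Python/NumPy assumed, not part of the slides) that draws two random histograms with the same total count N and confirms that N - HI(H1, H2) equals half the L1 distance.

import numpy as np

rng = np.random.default_rng(0)
N, n_bins = 1000, 16

# Two random histograms, each containing N counts in total.
H1 = np.bincount(rng.integers(0, n_bins, size=N), minlength=n_bins)
H2 = np.bincount(rng.integers(0, n_bins, size=N), minlength=n_bins)

lhs = N - np.minimum(H1, H2).sum()   # N - histogram intersection
rhs = 0.5 * np.abs(H1 - H2).sum()    # 0.5 * city-block (L1) distance
print(lhs, rhs, lhs == rhs)          # the two sides agree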
Colour Histogram Database
Build: (1) compute a colour histogram for each image; (2) store the histograms in a histogram database.
How well does colour histogram intersection work?
Swain's original test: 66 test histograms in the database, 31 query images.
Recognition rate: almost 100%.
Indeed, because colour indexing worked so well, it is at the heart of almost all image database systems.
Google Image Search
Google Image Search
[Screenshot: results after clicking a colour patch]
Problems with colour histogram matching
1. Colour quantization problem:
Colour Distribution = (0, 40, 0, 0, 0, 0, 0, 110, 0)
Colour Distribution = (0, 0, 40, 0, 0, 0, 0, 0, 110)
Zero intersection! Because the two images have slightly different colour distributions, their histograms have nothing in common!
Sources of quantization error: noise, illumination, camera
Problems with colour histogram matching
2. The resolution of a colour histogram
Colour Distribution = (0, 40, 0, 0, 0, …, 0, 0, 110, 0)
For the best results, Swain quantized the colour space into 4096 distinct colours => each colour distribution is a 4096-dimensional vector.
=> Histogram intersection costs O(4096) operations (some constant * 4096).
4096 comparisons per database histogram => histogram intersection will be very slow for large databases.
Many newer methods work well using 8- to 64-dimensional features.
Problems with colour histogram matching
3. The colour of the light
Under a yellowish light, all image colours are more yellow than they ought to be.
Problems with colour histogram matching
4. The structure of the colour distribution
All four images have the same colour distribution - need to take into account spatial relationships!
Problem solution => Use statistical moments
1st-order statistics = mean: μ_R, μ_G, μ_B
2nd-order statistics = variance/covariance: σ²_R, σ²_G, σ²_B, σ_RG, σ_RB, σ_GB
σ²_R = (1/N) Σ_i (R_i - μ_R)²
σ_RG = (1/N) Σ_i (R_i - μ_R)(G_i - μ_G)
Statistical similarity
Image 1: mean RGB = (50, 50, 50)
Image 2: mean RGB = (20, 70, 40)
Compare mean RGBs (in general, compare all statistical measures):
Statistical similarity = sqrt( [μ_R(I1) - μ_R(I2)]² + [μ_G(I1) - μ_G(I2)]² + … + [σ²_R(I1) - σ²_R(I2)]² + … )
(Euclidean distance between corresponding statistical measures)
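A sketch of the moment-based representation and comparison above, assuming Python with NumPy (the images here are synthetic, and stacking means with covariance entries is just one plausible way to form the feature vector):

import numpy as np

def colour_moments(image):
    """First-order (mean) and second-order (variance/covariance) colour statistics."""
    pixels = image.reshape(-1, 3).astype(float)
    mu = pixels.mean(axis=0)                         # (mu_R, mu_G, mu_B)
    cov = np.cov(pixels, rowvar=False, bias=True)    # 3x3 variance/covariance matrix, 1/N normalised
    return np.concatenate([mu, cov[np.triu_indices(3)]])   # means + unique covariance entries

def statistical_similarity(img1, img2):
    """Euclidean distance between corresponding statistical measures."""
    return np.linalg.norm(colour_moments(img1) - colour_moments(img2))

img1 = np.random.randint(0, 256, size=(32, 32, 3))
img2 = np.random.randint(0, 256, size=(32, 32, 3))
print(statistical_similarity(img1, img2))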
Histogram vs Statistical Similarity

                             Completeness of              #params /          Sensitivity to
                             representation               match speed        quantization error
Histogram                    complete                     many / slow        sensitive
Low-order stats              incomplete                   few / fast         insensitive
Low- and high-order stats    complete (or over-complete)  many / moderate    sensitive
Advanced Topics
• Fast indexing
• Interactive / relevance feedback
• Reducing the semantic gap
• Visualization, navigation, browsing
• Internet-scale image/video retrieval
  • Flickr – billions of photos
  • YouTube – billions of videos
• …