Indexing technique for Image DB

Download Report

Transcript Indexing technique for Image DB

IMAGE DATABASES
Prof. Hyoung-Joo Kim
OOPSLA Lab.
Computer Engineering
Seoul National University
Contents
•
•
•
•
•
•
•
Introduction
Image database systems
Indexing issues and basic mechanisms
A taxonomy on image indexes
Color-spatial hierarchical indexes
Signature-based color-spatial retrieval
Summary
Introduction
• Traditional DBMS
– effective in managing structured data
– not effective for images that are non-alphanumeric
and unstructured
• Applications to manage images
– medical application
– geographic information system
– satellite image database
– criminal database
– interior design
– art galleries and museum management
– etc...
Introduction(2)
• Content-based image retrieval techniques
– techniques that retrieve image based on their
visual properties such as texture, color, shape, etc
– sequentially comparing image feature is timeconsuming and impractical
– need content-based image index
Image Database System
• Additional functionalities
– feature extraction
• the system must be able to analyze an image to extract
key features such as shape, color, texture
– feature-based indexing
• the system must build indexes based on the features
extracted
– content-based retrievals
• the system should support a wide range of queries that
involve the contents of the image
– measure of similarity
• the system requires a measure to capture what we
humans perceive as similarity between two images
Architecture of Image DBMS
Input Image
PREPROCESSING MODULE
Image
Input/Scanner
Feature
Extraction
Update
Index/
Database
QUERY MODULE
User
Query
Output
Retrieved
Image
Interactive
Query
Formulation
Browsing
&
Feedback
Runtime
Processor
Feature
Extraction
Feature
Matching
Concurrency
Control &
Recovery
Manager
Feature/Image
Database
Indexing Issues
• Three issues in content-based index design
– determine a representation for the indexing feature
– determine a similarity measure between two images
based on their representation
– determine an appropriate index organization
Indexing Issues
• Desirable properties of a representation(first issue)
– exactness
– space efficiency
– computationally inexpensive similarity matching
– preservation of the similarity between the features
– automatic extraction
– insensitivity to noise, distortion, rotation
Indexing Issues
• Main criterion of the similarity measure(second issue)
– if two images are similar under the indexing feature,
then their representations should remain so.
• Several alternatives to determine the similarity
– exact match
• the representation of an image feature is usually coarse, in
sense that images with similar feature will be mapped to
the same representation
– approximate match
• the degree of similarity between the image representations
is computed based on some approximation techniques
Indexing Issues
• Criteria for selection of an index structure
– the similarity measure can be supported efficiently
– storage efficiency
– maintenance(update) overhead
• Appropriate index structure
– be determined by the representation and similarity
measure
– example
• if image feature is represented as a vector, an the
similarity measure is the Euclidean distance, then a
natural choice is the multi-dimensional point access
method
Basic Index Schemes
• Spatial access methods(SAMs)
– basic idea
• extract k image features for each image
• map images into points in a k-dimensional feature space
• use SAMs such as the grid file, quad-tree, the family of Rtree
– problem: “high-dimensionality curse”
• these techniques perform no better than sequential
scanning as the dimension becomes sufficiently large
Basic Index Schemes
• Inverted file
– basic idea
• an inverted list is created for each distinct key(indexed
features)
• the inverted list consists of a list of pointers to the objects
that contain features that are similar to the indexed feature
– problem
• high storage overhead & expensive update
• Signature file
– refer [Text Database]
Taxonomy on Image Indexes
•
•
•
•
Shape feature
Spatial relationship
Texture
Color
Taxonomy on Image Indexes
Content-based indexes
Objects
Spatial relationship
Signature
Texture
2-D string
Multi-level
signature file
Sequential file
Tamura features
Signature
file
Multi-dimensional
index
Color
Shape
1-D string
Inverted file
Signature
Numerical vectors
Multi-dimensional
index
Similarity against
representative
objects
Inverted file
Color histogram
Color-Spatial
Multi-dimensional
index
Multi-level
histogram
2-level
B+-tree
3-tier
color index
Shape Feature
• Representation
– boundary information
– a collection of rectangle that forms a rectangular cover of the
shape
– mathematical morphology
– pattern spectrum
– numerical vector using 2-D Fourier transformation or
Wavelet transformation
• Problems
– shape vary widely from object to object
– unless the images have very distinct shape, the performance
may suffer
Spatial relationship
• Representation
– 2-D string
• semantic representation for spatial relationship using a
two-dimensional string
• projection of the symbols along the x-axis and y-axis
• example : “O1 is to left of O2 which is left of O3”
• the projection on the x-axis - O1 < O2 < O3
• variations and extensions to the 2-D string have been
proposed
– Problem
• spatial relationships can be drastically affected by the
orientation of the image
Texture and Color
• Texture
– can be represented by the coarseness, contrast, directionality
– the extraction of text information is a computationally intensive
operations
• Color
– color histogram that captures the color composition of images
– color alone is not sufficient to characterize an image
• consider two image - one with the top half blue and bottom half red,
while the other’s left half is red and it’s right half is blue
• although these two images are similar in color composition, the
are entirely different to a human observer.
– recent studies have proposed to integrate color and its spatial
distribution
Color-spatial hierarchical Indexes
• Hierarchical Indexes
– multiple indexing mechanisms are integrated to form
a single index structure
– three indexes that have been proposed to integrated
color and spatial information for retrieval
• Two-level B+-tree structure
• Three-tier color index
• Sequential Multi-Attribute Tree(SMAT)
Two-level B+-tree
• Feature extraction
– the color-spatial information of an image is
modeled by splitting the image into 9 equal subareas(33)
– the color information within each sub-area is
represented by a color histogram
– one can obtain a more accurate similarity by
matching the corresponding color histogram of two
image
– the color histogram within each sub-area is
mapped into a numerical key
• Retrieval technique
Two-level B+-tree
• Retrieval technique
– use two level information
– the first level
• describes the composition of colors corresponding to the
histogram of the region
• the colors are grouped into 11 bins
• each group is assigned a range which bounds the
percentage of pixels in the mage with colors of group
– the second level
• contains the average H, average V, and average C values
of all the 11 histogram bins
Two-level B+-tree
Level 1:
B+-tree on Normal Pixel Count
Level 2:
B+-tree on Average H,V, and
C values
Three-tier Color Index
• Layer 1:
– dominant color classification
• a fixed number of dominant colors is extracted
• the dominant colors are those with the largest number of
pixel count
• Layer 2:
– multidimensional R-tree structure
• the image can be assigned to a partition
• prune away image within the candidate partitions that are not
relevant
• Layer 3:
– multi-level color histogram(quad-tree structure)
• compare the histograms of the query image with those of
remaining potential candidate image
Tree-tier Color Index
Tier 1: Dominant Color Classification
K=1
K=2
K=3
0
1
(0,1) (0,2) .. (0,n)
2
n(no.of colors)
(1,2) …...…(2,n) ….
(0,1,2) ... (0,1,n) ……. (0,2,n) …...… ….
Tier 2: R-tree
Tier 1: Multi-Level Color Histogram
K : dominant colors
SMAT
• Height-balanced color-spatial index
– the problem with the two approaches
• individual tree structures(B+-tree, R-tree, Dominant Color
Classification) are height-balanced, the entire hierarchical
index structure may not be so.
– height-balancing of SMAT
• SMAT is a multi-tier index structure similar to two
approaches, but remains height-balancing of the entire
index structure by controlling the growth of the tree using a
certain threshold which is application-specific value.
• refer to the VLDB Journal(1998) 7: 115-128 for more
information
Signature-based Color-Spatial
Retrieval
• Representation of color-spatial information
– an image is partitioned into a grid of mn cells of equal size
– the colors that can be used to represent a cell are determined
– for a given color, each cell is examined to determine the
percentage of the total number of pixels in the cell having that
color
0
1
2
Cell does satisfy the threshold
Cell does not satisfy the threshold
31
An image partitioned into a 4x8 grid
- an image is represented by 32-bit color signature,
10001111001000110000000001100011
Signature-based Color-Spatial
Retrieval
• The retrieval process
– an image with k colors has k color signatures
– let Qi and Di denote the signature of color i for a query image Q
and a database image D.
– let the representative color sets of Q and D be CQ and CD.
BitSet(Qi  Di )
if color i  C D
SIM basic(Q, D, i)  { BitSet(Qi )
0
otherwise
SIM basic(Q, D) 
 SIM
iCQ
(Q, D, i)
basic
Summary
• Promising area that require further research
– performance evaluation
• comparative study will be useful for application designers
and practitioners to pick the best method for their
applications
– more on access method
• designing efficient access methods will make the contentbased retrieval techniques more practical and useful
– concurrent access and distributed indexing
• we expect to see more real-time application as well as
application running in parallel or distributed environment
Summary
– Integration and optimization
• most content-based image retrieval techniques only capture
a part of the image’s semantics
• important issues
– selecting an “optimal” set of image features that fits best
for an application
– developing techniques that can integrate them into
achieve the optimal results
• one promising method(semantic-based retrieval technique)
– use content-based techniques as the basis, but also
exploits semantic meanings of the image and queries to
support concept-based queries