A Distributed Multimedia Data Management over the Grid

Kasturi Chatterjee
Advisors for this Project:
Dr. Shu-Ching Chen &
Dr. Masoud Sadjadi
Distributed Multimedia Information System Laboratory
School of Computing and Information Sciences
Florida International University, Miami, FL 33199, USA
Outline
• Motivation
– Why multimedia data?
– Why is handling and representing multimedia data challenging?
– Why a distributed environment?
– Why content-based image/video retrieval?
• Multimedia data management
– Representation
– Storage and Indexing
– Popular retrieval strategies
• Proposed Work Outline
– Issues to be addressed
– Components and Related Work
• Conclusion
Global Cyberbridges 2008
Proposal
Motivation
Why multimedia data?
– Attractive
– Informative
– Compact
– Cheap memory makes storage easy
Why is handling and representing multimedia data challenging?
– Huge size (a typical 10-second MPEG video is about 4 MB)
– Temporal and spatial information
– High-level meaning and the semantic gap
– Multidimensional representation
– Traditional databases cannot accommodate the above characteristics
Motivation
Why a distributed environment?
– Shared storage
– Shared resources
– Shared computing power
– No single point of failure
Why content-based image/video retrieval?
– Unlike traditional data, multimedia data must be queried with its temporal, spatial and semantic content taken into account
Can image/video databases be queried textually? Maybe not!
– Metadata
– Keywords
• e.g., the query "sunset" in Google Images
Query by example, similarity measurement, content interpretation, user feedback, etc. need to be considered
Multimedia data management
• Representation
– Multidimensional: Unlike traditional data, which is one-dimensional, multimedia data such as images and videos is multidimensional.
– Semantic interpretation: The same multimedia object can have varied semantic interpretations.
– Feature selection: Identifying the feature space that represents the multimedia data is a crucial step in a multimedia database management system (MDBMS). Features include color, texture and temporal information.
Because of this atypical nature, multimedia data needs a special representation in the form of multidimensional feature vectors.
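To make the idea of a multidimensional feature vector concrete, here is a minimal sketch that uses a quantized RGB color histogram, one common color feature. The function name `color_histogram`, the flat-list pixel format and the default of 4 bins per channel are assumptions for illustration, not the representation used in this work.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel in 0..255


def color_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Map an image, given as a flat list of RGB pixels, to a normalized color histogram.

    Each channel is quantized into `bins_per_channel` bins, so the feature
    vector has bins_per_channel ** 3 dimensions and its entries sum to 1.
    """
    if not pixels:
        raise ValueError("image has no pixels")
    if not 1 <= bins_per_channel <= 256:
        raise ValueError("bins_per_channel must be in 1..256")
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Quantize each 0..255 channel value into a bin index in 0..bins_per_channel-1.
        ri, gi, bi = (c * bins_per_channel // 256 for c in (r, g, b))
        # Flatten the (ri, gi, bi) bin triple into a single vector dimension.
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1
    return [count / len(pixels) for count in hist]
```

With 4 bins per channel every image becomes a point in a 64-dimensional space, and images with similar color distributions land close together, which is what similarity-based retrieval relies on.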
Multimedia data management
• Storage and Indexing
– Indexing is an integral part of database system design: it reduces computational overhead and optimizes retrieval.
Multimedia Data Indexing Requirements
• Multimedia data is stored as multidimensional feature vectors.
• The index must cover a high-dimensional feature space.
• The index structure should map low-level representations to high-level semantic relationships.
• The index structure should support popular multimedia retrieval strategies such as content-based image retrieval (CBIR), relevance feedback (RF) and video event retrieval.
Existing multidimensional indexing strategies fail to fulfill the above requirements efficiently!
Multimedia data management
• Popular Retrieval Strategies
(Content-Based Image/Video Retrieval)
Figure: Image Database → Feature Descriptor Extraction → Image Descriptor Space → Similarity Measurement → Retrieval Results
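The CBIR pipeline can be sketched as query by example over precomputed descriptors: extract a descriptor for the query image, measure its similarity to every descriptor in the image descriptor space, and return the closest images. Euclidean distance and the function name `query_by_example` are assumptions; the slides do not fix a particular similarity measure.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def query_by_example(
    query: Vector, descriptors: Dict[str, Vector], k: int
) -> List[Tuple[float, str]]:
    """Return (distance, image_id) pairs for the k descriptors closest to `query`.

    Similarity measurement is Euclidean distance (math.dist), so a smaller
    value means a more similar image. Every stored descriptor is compared with
    the query: this linear scan is the cost an index structure tries to avoid.
    """
    scored = ((math.dist(query, vec), image_id) for image_id, vec in descriptors.items())
    return heapq.nsmallest(k, scored)
```

The number of distance computations here equals the size of the database, which is why the index requirements above matter as collections grow.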
Proposed Work Outline
A typical Grid Architecture
Source: http://gridcafe.web.cern.ch/gridcafe/gridatwork/architecture.html
Proposed Work Outline
Research Issues
– Development of a technique that enables a uniform representation of multimedia data
– Development of an efficient index structure that handles multimedia data, supports applications such as CBIR/CBVR (content-based image/video retrieval), and spans multiple storage sites in a Grid/distributed environment
– Devising a mechanism that takes users’ concepts of similarity across multiple network domains into account when providing query results
In short, we envision developing a distributed multimedia storage and management system capable of supporting popular retrieval applications such as CBIR/CBVR.
Proposed Work Outline
Designing and developing multimedia data management over the Grid involves two critical components:
– Proper data management, which requires a distributed multidimensional index structure and distributed retrieval algorithms (distributed k-NN or range search) supported by that index structure
– Efficient retrieval, which requires techniques that map low-level features to high-level semantic concepts across a distributed environment in order to provide relevant query results
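As a baseline for distributed k-NN and range retrieval, a simple scatter-gather scheme can be sketched: each storage node answers the query over its own data, and a coordinator merges the partial results. This is a generic technique, not the algorithm proposed here; it ignores the index structure, network cost and node failures, and the node layout (one dict of vectors per node) is a hypothetical stand-in.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Hit = Tuple[float, str]  # (distance to query, object id)


def local_knn(store: Dict[str, Vector], query: Vector, k: int) -> List[Hit]:
    """Runs on one storage node: that node's k best candidates."""
    return heapq.nsmallest(k, ((math.dist(query, v), oid) for oid, v in store.items()))


def local_range(store: Dict[str, Vector], query: Vector, radius: float) -> List[Hit]:
    """Runs on one storage node: every local object within `radius` of the query."""
    hits = ((math.dist(query, v), oid) for oid, v in store.items())
    return sorted(h for h in hits if h[0] <= radius)


def distributed_knn(nodes: List[Dict[str, Vector]], query: Vector, k: int) -> List[Hit]:
    """Coordinator: scatter the query, then merge the per-node top-k lists.

    The global top-k is always contained in the union of the local top-k
    lists, so no node needs to return more than k candidates.
    """
    partial = [local_knn(store, query, k) for store in nodes]  # would run in parallel
    return heapq.nsmallest(k, (hit for hits in partial for hit in hits))


def distributed_range(nodes: List[Dict[str, Vector]], query: Vector, radius: float) -> List[Hit]:
    """Coordinator: a range result is simply the union of the local results."""
    return sorted(hit for store in nodes for hit in local_range(store, query, radius))
```

Range search merges trivially, while k-NN needs the bounded per-node candidate lists; a distributed index would additionally let nodes that cannot contain answers skip the query entirely.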
Proposed Work Outline
Concepts to Be Utilized and Related Work
– We have developed an index structure called the Affinity Hybrid Tree (AH-Tree) [1] for single-node or stand-alone applications; it can index multidimensional images/videos and supports CBIR/CBVR
• We plan to extend it into the basic indexing and storage framework, since it has proved very efficient in stand-alone environments
– To capture high-level similarity concepts among users in a distributed environment, we will develop a novel architecture called the Distributed Affinity Capture Model (DACM), based on the hierarchical Markov model mediator (HMMM) [2].
Proposed Work Outline
Components
• Affinity Hybrid Tree
– Feature-based index mechanism: filters the feature space and reduces the number of distance computations to be performed, reducing computational overhead
– Distance-based index mechanism: incorporates high-level image relationships as they are, without translating them into low-level equivalents, increasing the relevance of retrieved images by capturing the user's concept as it is
Proposed Work Outline
Components
• Building the AH-Tree
Figure: Feature vectors are fed to the root, where feature-space filtering (the space index) divides them into indexed subspaces; after semantic relationships are introduced, distance-based indexing within each indexed subspace produces the indexed data.
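The two stages in the figure, feature-space filtering at the root and distance-based indexing inside each subspace, can be illustrated with a toy two-level index. This is a hypothetical sketch, not the AH-Tree algorithm of [1]: it uses a single split on one dimension, a naive pivot per subspace and plain Euclidean distance in place of the affinity (semantic) relationships the AH-Tree introduces. The class and method names are invented for illustration, and the counter makes the number of distance computations visible so it can be compared with a linear scan.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


@dataclass
class Subspace:
    """Leaf of the toy index: a pivot plus entries (d(pivot, v), id, v) sorted by distance."""
    pivot: Vector
    entries: List[Tuple[float, str, Vector]] = field(default_factory=list)


class ToyTwoLevelIndex:
    """Hypothetical two-level index: one feature-space split, then distance-based leaves."""

    def __init__(self, data: Dict[str, Vector], split_dim: int, split_value: float) -> None:
        self.split_dim = split_dim
        self.split_value = split_value
        self.distance_computations = 0  # counted per query, not during the build
        # Feature-space filtering at the root: split on one dimension.
        groups: Dict[bool, List[Tuple[str, Vector]]] = {False: [], True: []}
        for oid, v in data.items():
            groups[v[split_dim] >= split_value].append((oid, v))
        # Distance-based indexing inside each indexed subspace.
        self.leaves: Dict[bool, Subspace] = {}
        for side, items in groups.items():
            if items:
                pivot = items[0][1]  # naive pivot choice: the first object
                entries = sorted((math.dist(pivot, v), oid, v) for oid, v in items)
                self.leaves[side] = Subspace(pivot, entries)

    def _dist(self, a: Vector, b: Vector) -> float:
        self.distance_computations += 1
        return math.dist(a, b)

    def range_query(self, q: Vector, r: float) -> List[str]:
        """Return the sorted ids of objects within distance r of q."""
        self.distance_computations = 0
        x = q[self.split_dim]
        hits: List[str] = []
        for side, leaf in self.leaves.items():
            # Since |v[dim] - q[dim]| <= d(q, v), a subspace can hold matches
            # only if the query ball reaches across to its side of the split.
            reachable = (x + r >= self.split_value) if side else (x - r < self.split_value)
            if not reachable:
                continue
            d_qp = self._dist(q, leaf.pivot)
            for d_pv, oid, v in leaf.entries:
                # Triangle inequality: |d(q, p) - d(p, v)| <= d(q, v), so skip v
                # without computing d(q, v) when this lower bound already exceeds r.
                if abs(d_qp - d_pv) > r:
                    continue
                if self._dist(q, v) <= r:
                    hits.append(oid)
        return sorted(hits)
```

For example, with six points on a line split at 5.0, a range query of radius 1.0 around (1.2, 0) skips the right-hand subspace entirely and prunes one object in the left one, needing 3 distance computations instead of the 6 a linear scan would perform.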
Proposed Work Outline
Components
Sample Results
• Computation Cost
– Feature-space filtering reduces the number of image objects to be examined and hence reduces the number of distance computations many-fold.
• Accuracy
– AH-Tree: 80%
– M-Tree: 10-20%
Proposed Work Outline
Components
Hierarchical Markov Model Mediator (HMMM) [2]
– An HMMM is represented by an 8-tuple $\lambda = (d, S, F, A, B, \Pi, O, L)$, where
$d$: number of levels in the HMMM
$S$: multimedia objects in different levels
$F$: distinctive features or semantic concepts (depending on the level)
$A$: affinity relationships between multimedia objects
$B$: features/concepts at each level
$\Pi$: initial state probability distribution
$O$: weights of importance for the lower-level features and higher-level concepts
$L$: link conditions between higher-level and lower-level states
The model has been used successfully in several applications, such as CBIR and web document clustering.
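To make the 8-tuple concrete, its elements can be held in a simple data structure. The sketch below is an assumed layout, not the data model of [2], and every name in it is hypothetical. In particular, `affinity_from_log` is one plausible way to fill $A$ from user access patterns (objects retrieved together by frequently issued queries get high affinity); it loosely follows the affinity idea of [2] and [4] rather than reproducing their exact formulation.

```python
from dataclasses import dataclass
from typing import List, Sequence

Matrix = List[List[float]]


@dataclass
class HMMMLevel:
    """Per-level elements of the 8-tuple (hypothetical layout)."""
    S: List[str]     # multimedia objects (states) at this level
    F: List[str]     # distinctive features or semantic concepts at this level
    A: Matrix        # A[m][n]: affinity between states m and n
    B: Matrix        # B[s][f]: value of feature/concept f for state s
    Pi: List[float]  # initial state probability distribution over S
    O: List[float]   # importance weight of each feature/concept in F


@dataclass
class HMMM:
    levels: List[HMMMLevel]
    L: List[Matrix]  # L[i][s][t] = 1 if state s at level i links to state t at level i + 1

    @property
    def d(self) -> int:
        """Number of levels in the model."""
        return len(self.levels)


def affinity_from_log(use: Sequence[Sequence[int]], access: Sequence[int]) -> Matrix:
    """Row-normalized affinity matrix A built from a query log (assumed construction).

    use[m][k] is 1 if object m is accessed by query pattern k, else 0, and
    access[k] is how often pattern k was issued. Objects retrieved together by
    frequent queries get high mutual affinity. Rows of never-accessed objects stay 0.
    """
    n, q = len(use), len(access)
    raw = [
        [sum(use[m][k] * use[j][k] * access[k] for k in range(q)) for j in range(n)]
        for m in range(n)
    ]
    normalized: Matrix = []
    for row in raw:
        total = sum(row)
        normalized.append([x / total for x in row] if total else [0.0] * n)
    return normalized
```

Row-normalizing makes each row of $A$ usable as transition probabilities between objects, which is the role affinity plays in a Markov-model-based mediator.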
Tentative Road Map
• Detailed literature review of the following:
– available data management tools and techniques in Grid computing
– peer-to-peer file sharing systems
• Development of the following algorithms and models:
– distributed k-NN search supporting CBIR/CBVR from within an index structure
– the Distributed Affinity Capture Model (DACM), to capture users’ concept of high-level similarity
• Implementation of the entire system
Conclusion
We propose to:
– Develop an efficient multimedia data management framework over a distributed environment such as the Grid
– Develop distributed content-based retrieval algorithms that span the Grid to provide
• semantically close query results
• quickly and efficiently
– Devise a way to capture users’ concept of similarity across the Grid (bridging the gap between low-level features and high-level semantics is a challenge) with
• an architecture called the Distributed Affinity Capture Model (DACM)
Questions
Selected References
[1] Kasturi Chatterjee and Shu-Ching Chen, "A Novel Indexing and Access Mechanism using Affinity
Hybrid Tree for Content-Based Image Retrieval in Multimedia Databases," International Journal of
Semantic Computing (IJSC), Vol. 1, Issue 2, pp. 147-170, June 2007.
[2] Mei-Ling Shyu, Shu-Ching Chen, Min Chen, Chengcui Zhang, and Chi-Min Shu, "MMM: A
Stochastic Mechanism for Image Database Queries," Proceedings of the IEEE Fifth International
Symposium on Multimedia Software Engineering (MSE2003), pp. 188-195, December 10-12,
2003, Taichung, Taiwan, ROC.
[3] M.-L. Shyu, S.-C. Chen, C. Haruechaiyasak, C.-M. Shu, and S.-T. Li, "Disjoint Web Document Clustering and Management in Electronic Commerce," Proceedings of the Seventh International Conference on Distributed Multimedia Systems (DMS'2001), pp. 494-497, 2001.
[4] Mei-Ling Shyu, Shu-Ching Chen, Min Chen, Chengcui Zhang, and Kanoksri Sarinnapakorn, "Image Database Retrieval Utilizing Affinity Relationships," accepted for publication, the First ACM International Workshop on Multimedia Databases (ACM MMDB'03), November 7, 2003, New Orleans, Louisiana, USA.