PPT - web.iiit.ac.in


Video Google:
A Google approach to Video Retrieval
Introduction
Problem:
Retrieve the key frames and shots of a video that contain
a particular object or scene,
with the ease and accuracy of Google.
Approach:
• Effectively precompute matches
• Textual analogy
Architecture
Visual Word
User-End
Storage
Indexing
Video Google –Visual words
•Dhruvan
•Dileep
•Nishant
•Pradeep
•Pramod
•Sunil
MSER

Maximally Stable Extremal Regions

A Maximally Stable Extremal Region (MSER) is a
connected component of an appropriately thresholded
image
SA
• The Shape Adapted (SA) regions are invariant to affine transformations.
• The SA regions tend to be centered on corner-like features.
SIFT
• Scale Invariant Feature Transform
• Invariant to image scaling and rotation
– Partially invariant to changes in illumination and viewpoint
– 128-dimensional descriptor
Clustering
• Aim: to vector quantize descriptors into clusters to be used as visual words
• Clustering techniques (n points, k clusters, e iterations)
  • Agglomerative
    • O(n^2) space.
  • K-means
    • O(n+k) space, O(n*k*e) time complexity
  • Fast K-means
    • Triangle inequality used.
    • O(n*k) space.
    • Number of distance calculations reduced to roughly n, rather than n*k*e
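The vector quantization step can be illustrated with plain k-means. This is only a minimal sketch on toy 2-D points, assuming squared Euclidean distance and a deterministic, evenly spaced initialization; the function names are hypothetical, the real descriptors are 128-dimensional, and the triangle-inequality speed-up of fast k-means is not shown.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_center(point, centers):
    """Index of the closest center; this index acts as the visual word id."""
    return min(range(len(centers)),
               key=lambda j: squared_distance(point, centers[j]))


def kmeans(points, k, iterations=10):
    """Plain k-means (no triangle-inequality pruning)."""
    n = len(points)
    # Assumption: evenly spaced points serve as the initial centers.
    centers = [list(points[i * n // k]) for i in range(k)]
    for _ in range(iterations):
        assignments = [nearest_center(p, centers) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers


# Toy descriptors forming two well-separated groups.
descriptors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
               (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centers = kmeans(descriptors, k=2)
visual_words = [nearest_center(d, centers) for d in descriptors]
```

Each descriptor is replaced by the index of its nearest cluster center, which is what lets the later indexing stage treat descriptors as discrete words.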
Statistics
• 19 half-hour videos:

Region   Points    Clusters   Time
SA       102823    4000       3 hr 45 min
MSER     508191    2000       2 hr 50 min

• Classification: 1060842 points, 9 hours
Clustering Evaluation
DB and API
Indexing/Retrieval
Visual Words
Record fields: Vid_id, Frame_id, Pos_x, Pos_y, V_id
UI
Results
Future Work
• Vocabulary Tree for interest point classification
• Increase the visual vocabulary through efficient clustering.
Indexing and Retrieval in Vgoogle
- D Pavan Kumar
- B Rakesh Babu
- B Naveen Kumar
- Ankur Jaiswal
- V Sreekanth
- P Kowshik
- J Shashank
Overview
Visual Words
Indexing
Query
Results
Input format
Pre-processing: Video Id, Frame Id, pos_x, pos_y, Visual word Id
Query: set of visual words in the query rectangle
Output format
Retrieved results: Rank, Video Id, Frame Id
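The query step, collecting the visual words that fall inside the query rectangle, might look like the following sketch. The `Feature` record mirrors the pre-processing fields listed above; the class and function names are hypothetical, and an inclusive rectangle test is assumed.

```python
from typing import NamedTuple


class Feature(NamedTuple):
    """One pre-processed interest point (field names follow the slide)."""
    video_id: int
    frame_id: int
    pos_x: float
    pos_y: float
    word_id: int


def words_in_rectangle(features, video_id, frame_id, xmin, ymin, xmax, ymax):
    """Return the visual word ids of one frame that lie inside the rectangle."""
    return [
        f.word_id
        for f in features
        if f.video_id == video_id
        and f.frame_id == frame_id
        and xmin <= f.pos_x <= xmax
        and ymin <= f.pos_y <= ymax
    ]


# Toy data: two points in frame 7 of video 1, one point in another frame.
features = [
    Feature(1, 7, 20.0, 30.0, 101),
    Feature(1, 7, 300.0, 200.0, 102),
    Feature(1, 8, 25.0, 35.0, 103),
]
query_words = words_in_rectangle(features, 1, 7, 0, 0, 100, 100)
```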
Objectives

Efficient Indexing

Fast Retrieval Time

Good Recall
Approach …

Removing the common words

Reverse Indexing

Ranking of results
Indexing and Retrieval in Document Retrieval

Stop list
Used to remove the common words.

Inverted File Structure
An entry for each word in the corpus, followed by a list of all the
documents in which it appears.

Spatial Consistency Ranking
Uses the ordering and separation of words to calculate the relevance of a
document.
Stop list

In textual context

Words are extracted from text.
Words are filtered based on their level of usefulness.

For instance, words that are independent of the subject or event being described are
filtered out.

Removing such words has no effect on the results.

E.g.: The way to the school is long and hard when walking in the rain.

Removing `the` has no effect on the result.
Stop list (contd.)

In the current context

Stop list: a list of visual words that occur very often or very rarely.

Determine the stop list boundaries empirically.

Advantages

Reduces the number of mismatches
Reduces the size of the inverted file
Gives a more meaningful visual vocabulary
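A frequency-based stop list can be sketched as below. The slides say the boundaries are determined empirically, so the two fractions here are placeholder assumptions rather than the values actually used, and `build_stop_list` is a hypothetical name.

```python
from collections import Counter


def build_stop_list(word_ids, top_fraction=0.05, bottom_fraction=0.10):
    """Return the visual words to drop: the most and the least frequent ones.

    The default fractions are illustrative placeholders only.
    """
    counts = Counter(word_ids)
    ranked = [word for word, _ in counts.most_common()]  # most frequent first
    n_top = int(len(ranked) * top_fraction)
    n_bottom = int(len(ranked) * bottom_fraction)
    stop = set(ranked[:n_top])
    if n_bottom:
        stop.update(ranked[-n_bottom:])
    return stop


# Toy corpus: word 0 occurs 10 times, word 1 nine times, ..., word 9 once.
corpus = [w for w in range(10) for _ in range(10 - w)]
stop_list = build_stop_list(corpus, top_fraction=0.1, bottom_fraction=0.2)
filtered = [w for w in corpus if w not in stop_list]
```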
Stop list (contd…)
Inverted File Structure
• Inverted file structure for indexing
• Popular data structure in document retrieval
• Mapping from words to documents
• Less query time compared to forward indexing
• Forward indexing: sequential access
• Inverted indexing: random access
[Figure: inverted file for text. Each word (Movie, Spain, Table, Song, ...) points to the list of documents containing it, e.g. Movie: D3, D23, D25; Spain: D1, D3, D8, D1051.]
Visual Analogy
Words ~ Visual Words
Documents ~ Frames
Query vector ~ visual words in Sub-Part of frame
[Figure: inverted file for video. Each visual word (V1, V2, V3, ..., Vn) points to the list of frames containing it, e.g. V1: D3, D23, D25; V2: D1, D3, D8, D1051.]
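The word-to-frame mapping can be sketched with a dictionary of sets. This is an illustrative in-memory version only; the function names are hypothetical, and the frame keys reuse the D-style labels from the figure.

```python
from collections import defaultdict


def build_inverted_file(frames):
    """Map each visual word id to the set of frames in which it occurs.

    `frames` maps a frame key to the list of visual word ids found in it.
    """
    inverted = defaultdict(set)
    for frame_key, word_ids in frames.items():
        for word_id in word_ids:
            inverted[word_id].add(frame_key)
    return inverted


def candidate_frames(inverted, query_words):
    """Frames sharing at least one visual word with the query."""
    hits = set()
    for word_id in query_words:
        hits |= inverted.get(word_id, set())
    return hits


# Toy forward index: frame -> visual words.
frames = {"D1": [2, 5], "D3": [1, 2], "D23": [1], "D25": [1, 9]}
inverted = build_inverted_file(frames)
hits = candidate_frames(inverted, [2, 9])
```

A query touches only the lists of its own words, which is why lookup is faster than scanning every frame in a forward index.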
Ranking the results - tf-idf
• Document: a vector of word frequencies
• Each component of the vector is given some weight
• Standard weighting method
• TF-IDF
Ranking the results - tf-idf

Each document is represented as a vector $\langle t_1, t_2, t_3, \ldots, t_i, \ldots, t_{k-1}, t_k \rangle$, where

$$t_i = \frac{n_{id}}{n_d} \log \frac{N}{n_i}$$

$n_{id}$ - number of occurrences of the ith word in document d.
$n_d$ - total number of words in document d.
$n_i$ - number of occurrences of the ith visual word in the whole database.
$N$ - number of documents in the whole database.

IDF down-weights the most frequent words.

Ranked by cosine of angle between query vector and all document vectors.
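The weighting and cosine ranking can be sketched as follows, using the definitions given above (in particular, n_i counts occurrences in the whole database, so the log term can become negative for extremely frequent words). The function names and the toy frames are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(words, occurrences, n_docs):
    """Sparse vector with t_i = (n_id / n_d) * log(N / n_i)."""
    counts = Counter(words)
    n_d = len(words)
    vector = {}
    for word, n_id in counts.items():
        n_i = occurrences.get(word, 0)
        if n_i > 0:  # words unseen in the database carry no weight
            vector[word] = (n_id / n_d) * math.log(n_docs / n_i)
    return vector


def cosine(u, v):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * v.get(word, 0.0) for word, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_frames(query_words, frames):
    """Return (frame, score) pairs, best match first."""
    n_docs = len(frames)
    occurrences = Counter(w for words in frames.values() for w in words)
    query = tfidf_vector(query_words, occurrences, n_docs)
    scores = [(key, cosine(query, tfidf_vector(words, occurrences, n_docs)))
              for key, words in frames.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


frames = {"f1": [1, 1, 2], "f2": [3, 4, 4], "f3": [2, 5, 5]}
ranking = rank_frames([1, 2], frames)
```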
Ranking the results – Spatial Consistency
I have been there
once , while ……..
There it is. That’s
what I ….. been ....
have ….
“Google increases the probability of documents having
all the search words close to one another"
Spatial Consistency Ranking

Spatial arrangement of objects in images.

The spatial consistency measure is used to re-rank the results.

Neighboring matches in the query region should also lie in a surrounding area in the
retrieved image.
Spatial Consistency Ranking

Search area is defined by 15 nearest neighbors.

A neighbor in the surrounding area in the retrieved image counts as a vote.

Match with no support / hits is rejected.

Repeat this for every match.

Total number of votes decides the rank.
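The voting procedure can be sketched as below. This is one simplified reading of the slide, assuming the search area of a match is the set of k nearest feature points around it in both the query region and the retrieved frame; the names are hypothetical and the toy call uses k=2 instead of 15.

```python
import math


def k_nearest(index, points, k):
    """Indices of the k points closest to points[index], excluding itself."""
    others = [j for j in range(len(points)) if j != index]
    others.sort(key=lambda j: math.dist(points[index], points[j]))
    return set(others[:k])


def spatial_votes(query_points, frame_points, matches, k=15):
    """Return (total votes, supported matches) for one retrieved frame.

    `matches` holds (query index, frame index) pairs. Another match votes for
    a match if it lies in that match's search area in both images.
    """
    total = 0
    kept = []
    for qi, fi in matches:
        near_q = k_nearest(qi, query_points, k)
        near_f = k_nearest(fi, frame_points, k)
        votes = sum(1 for qj, fj in matches
                    if (qj, fj) != (qi, fi) and qj in near_q and fj in near_f)
        if votes > 0:  # a match with no support is rejected
            kept.append((qi, fi))
            total += votes
    return total, kept


query_points = [(0, 0), (1, 0), (0, 1), (10, 10)]
frame_points = [(5, 5), (6, 5), (5, 6), (50, 50), (51, 50), (50, 51)]
matches = [(0, 0), (1, 1), (2, 2), (3, 3)]
total, kept = spatial_votes(query_points, frame_points, matches, k=2)
```

In the toy data the first three matches support each other, while the fourth has no matching neighbor in the retrieved frame and is dropped.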
[Figure: a match V in the query and in the retrieved frame, with number of votes = 3]
Visual words   Frame 1   Frame 2   Frame 3   ...   Frame N
V1             10        7         0         ...   4
V2             4         3         0         ...   8
V3             14        2         0         ...   9
...            ...       ...       ...       ...   ...
Vn             0         2         0         ...   8
Total          36        23        4         ...   57
Initial Match
After Stoplist
After Spatial Consistency
Future Work

More efficient implementation of spatial consistency.

Improve the retrieval time.
USER INTERFACE
Chetan
Chhaya
Nishant
Revanth
Sandeep
Sheetal
Objective



Build a web interface for retrieving shots from a news
video database that match a given image query.
Display the ranked list of shots.
E.g. Date, Channel, Maximum match, Month
Input & Output
About The Interface…




The interface consists of the following three parts.
Database Schema
Data Directories
Source Code Files
Database Schema


All the videos and the metadata corresponding to them are stored
in an SQL database, which can be queried using MySQL.
The following two tables are used:
Table 1: videos (videoId, channelId, Date)
Table 2: channels (channelId, channelName)
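The two tables and a date/channel lookup can be sketched as follows. The slides name MySQL; this sketch substitutes Python's built-in sqlite3 module so that it runs stand-alone. The column types, the sample rows, and the `videos_for` helper are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE channels (
    channelId   INTEGER PRIMARY KEY,
    channelName TEXT
);
CREATE TABLE videos (
    videoId   INTEGER PRIMARY KEY,
    channelId INTEGER REFERENCES channels(channelId),
    Date      TEXT
);
""")

# Made-up rows, only to exercise the query below.
conn.executemany("INSERT INTO channels VALUES (?, ?)",
                 [(1, "Channel A"), (2, "Channel B")])
conn.executemany("INSERT INTO videos VALUES (?, ?, ?)",
                 [(10, 1, "2006-01-15"), (11, 2, "2006-01-15"),
                  (12, 1, "2006-01-16")])


def videos_for(conn, date, channel_name):
    """Video ids broadcast on a given date by a given channel."""
    rows = conn.execute(
        "SELECT v.videoId FROM videos v "
        "JOIN channels c ON v.channelId = c.channelId "
        "WHERE v.Date = ? AND c.channelName = ? "
        "ORDER BY v.videoId",
        (date, channel_name),
    )
    return [row[0] for row in rows]


selected = videos_for(conn, "2006-01-15", "Channel A")
```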
Data Directories






The data is stored in the following five directories:
Thumbnails
Keyframes
Stories
Shots
Videos
Source Files

The interface part consists of 8 files.
index.cgi
server.cgi
shots.cgi
keyframes.cgi
SelectRect.js
display.cgi
play.cgi
conf.py

Each file is a module.








index.cgi



Home page of the interface.
This page lists today's videos, each shown as a thumbnail of the first
keyframe of the first shot of the video.
It also gives the user the option to select specific videos
based on the criteria of date and channel through
combo boxes.
server.cgi

The user can be directed to this page from any of the other pages, since all
of them let the user select from the combo boxes.
This page lists the results of the user's selection from the
combo boxes (based on the criteria of date and channel).
The displayed result shows the thumbnail of the first keyframe of each
video.
shots.cgi



Page used to display the shots of the video selected on the
previous page.
The stories that make up the video are displayed on the screen one
after another.
For each story, we display the thumbnails of the
keyframes of all the shots in that story.
keyframes.cgi



This page displays the keyframe of the selected shot in its original
size (352 x 288 here).
The user can select a rectangular region of the frame to query the
database.
This query (the rectangle coordinates of the selected region, xmin,
ymin, xmax, ymax, together with videoId and shotId) is passed to display.cgi.
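The slides do not say how these values reach display.cgi. The sketch below assumes they travel as URL query parameters, with names taken from the slide; the `display_url` helper is hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def display_url(xmin, ymin, xmax, ymax, video_id, shot_id):
    """Build a relative URL that carries the query to display.cgi."""
    params = {
        "xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax,
        "videoId": video_id, "shotId": shot_id,
    }
    return "display.cgi?" + urlencode(params)


url = display_url(10, 20, 110, 90, video_id=3, shot_id=7)

# The receiving side can recover the values from the query string.
received = {key: int(values[0])
            for key, values in parse_qs(urlsplit(url).query).items()}
```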
SelectRect.js





This module of the library takes care of user interaction.
It is a selection tool, used to select a part of the image.
It is JavaScript code that works with just two clicks on the
required part of the image: the first click gives the start coordinates
and the second click gives the end coordinates.
Input to the module: loaded image
Output: selected coordinates.
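The two-click logic can be illustrated in Python (the actual module is JavaScript and is not reproduced here). This sketch assumes the clicks may arrive in any order and are clamped to the 352 x 288 keyframe; both behaviors are assumptions, and `rectangle_from_clicks` is a hypothetical name.

```python
def rectangle_from_clicks(first, second, width=352, height=288):
    """Turn two clicked points into (xmin, ymin, xmax, ymax).

    Points are (x, y) pixel positions; values outside the image are clamped.
    """
    def clamp(value, low, high):
        return max(low, min(value, high))

    x1, y1 = clamp(first[0], 0, width - 1), clamp(first[1], 0, height - 1)
    x2, y2 = clamp(second[0], 0, width - 1), clamp(second[1], 0, height - 1)
    return min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)


# The second click may be above and to the left of the first.
rect = rectangle_from_clicks((200, 150), (50, 30))
```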
display.cgi




This page is used to display the results of the query.
The matching keyframe and its adjacent keyframes are
displayed for all the results.
Their corresponding thumbnails are displayed on the
screen.
The input for this file comes from the indexing module.
play.cgi

This page is used to play the video starting from the matching
frame.
We use an embedded QuickTime player to play the video.
The functionalities of the player include seek, play, and pause.
These functionalities are controlled by buttons using JavaScript.
conf.py


This file contains information regarding the Directory
paths of the Data Directories and Database details.
It is imported in all the cgi files used.
Thank You