CS322 Multimedia Information Systems Lecture 1

CS257 Modelling Multimedia Information
LECTURE 6
Introduction
• See beginning of Lecture 5…
Queries to Video Databases
• Users may want to query for a particular event
involving particular people, e.g. “find me video
with Bill hitting Tom” – why not use a list of
keywords [hit, Bill, Tom] for the query and to
represent film content? A keyword list cannot
say whether Bill hit Tom or Tom hit Bill.
→ Need more structured descriptions of what’s
happening (both for queries and for video
metadata), i.e. who is doing what to whom with
what and why. [More on this in PART 1]
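The limitation of keyword lists can be illustrated with a minimal Python sketch. The `Event` class and its field names (`action`, `agent`, `patient`) are assumptions made for this illustration only; they are not taken from any particular video database system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical structured description of one event in a video."""
    action: str   # what is happening
    agent: str    # who is doing it
    patient: str  # to whom it is done


def keywords(event):
    """Flatten an event into an unordered bag of keywords."""
    return frozenset({event.action, event.agent, event.patient})


bill_hits_tom = Event(action="hit", agent="Bill", patient="Tom")
tom_hits_bill = Event(action="hit", agent="Tom", patient="Bill")

# The keyword view cannot tell the two events apart ...
same_keywords = keywords(bill_hits_tom) == keywords(tom_hits_bill)
# ... but the structured view can.
same_structure = bill_hits_tom == tom_hits_bill
```

Here `same_keywords` is true while `same_structure` is false, which is why roles such as agent and patient need to be recorded in both the query and the metadata.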
• Users may want to specify a temporal
sequence of events, e.g. “find me video
where this happens then this happens
while that happens”
[More on this in PART 2]
• How to express queries / How to describe
content – can be considered two sides of
the same coin; both require dealing with
the same kinds of issues
Creating Metadata for Video Data
• Content-descriptive metadata for video often
needs to be manually annotated
• However, in some cases the process can be
automated (partially) by:
– Video segmentation
– Feature recognition, e.g. to detect faces, explosions, etc.
– Extracting keywords from time-aligned collateral texts,
e.g. subtitles and audio description
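As a rough illustration of the last option, the sketch below builds a keyword index from time-aligned subtitles. The subtitle data, the stopword list and the function name `build_keyword_index` are all invented for this example; a real system would use a proper subtitle parser and a fuller stopword list.

```python
from collections import defaultdict

# Hypothetical time-aligned subtitles: (start_sec, end_sec, text)
SUBTITLES = [
    (12.0, 15.5, "Put the briefcase on the table."),
    (40.0, 43.0, "The bomb is in the briefcase!"),
]

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "on", "is", "in", "put", "a"}


def build_keyword_index(subtitles):
    """Map each non-stopword to the time intervals whose subtitle contains it."""
    index = defaultdict(list)
    for start, end, text in subtitles:
        # Strip simple punctuation and lowercase each word.
        words = {w.strip(".,!?").lower() for w in text.split()}
        for word in sorted(words - STOPWORDS):
            index[word].append((start, end))
    return dict(index)


index = build_keyword_index(SUBTITLES)
```

Looking up `index["briefcase"]` then gives the two intervals in which the word is spoken, which can serve as candidate segments for a query.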
Overview of LECTURE 6
• PART 1: Need to be able to formally describe video content in
terms of objects and events in order to make a query to a video
database, e.g. specify who is doing what.
→ Subrahmanian’s Video SQL
• PART 2: May wish to specify temporal and / or causal
relationships between events, e.g. X happens before Y, A causes B
to happen
→ Allen’s temporal logic
→ Roth’s system for video browsing by causal links
• LAB – Bring coursework questions
PART 1: Querying Video Content
Four kinds of retrieval according to Subrahmanian (1998)
Segment Retrieval: “find all video segments where an
exchange of a briefcase took place at John’s house”
Object Retrieval: “find all the people in the video
sequence (v,s,e)”
Activity Retrieval: “what was happening in the video
sequence (v,s,e)”
Property-based Retrieval: “find all segments where
somebody is wearing a blue shirt”
Querying Video Content
• Subrahmanian (1998) proposes an extension to
SQL in order to express a user’s information
need when querying a video database
– Based on video functions
• Recall that SQL is a database query language
for relational databases; queries expressed in
terms of:
SELECT (which attributes)
FROM (which table)
WHERE (these conditions hold)
Subrahmanian’s
Video Functions
FindVideoWithObject(o)
FindVideoWithActivity(a)
FindVideoWithActivityandProp(a,p,z)
FindVideoWithObjectandProp(o,p,z)
Subrahmanian’s
Video Functions (continued)
FindObjectsInVideo(v,s,e)
FindActivitiesInVideo(v,s,e)
FindActivitiesAndPropsInVideo(v,s,e)
FindObjectsAndPropsInVideo(v,s,e)
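To make the video functions concrete, here is a minimal Python sketch of four of them over an in-memory list of annotated segments. The `Annotation` record, the sample data and the rule that a segment matches (v,s,e) when its frames overlap [s,e] are assumptions for illustration; Subrahmanian's book describes the functions abstractly and does not prescribe this storage.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """Hypothetical metadata record for one annotated video segment."""
    vid: str               # video identifier
    start: int             # first frame of the segment
    end: int               # last frame of the segment
    objects: frozenset     # objects visible in the segment
    activities: frozenset  # activities taking place in the segment


# Invented sample annotations.
ANNOTATIONS = [
    Annotation("v1", 0, 100, frozenset({"Denis Dopeman"}),
               frozenset({"Walking"})),
    Annotation("v1", 101, 250, frozenset({"Denis Dopeman", "Jane Shady"}),
               frozenset({"ExchangeObject"})),
    Annotation("v2", 0, 80, frozenset({"Jane Shady"}),
               frozenset({"Driving"})),
]


def find_video_with_object(o):
    """All (vid, start, end) segments in which object o appears."""
    return {(a.vid, a.start, a.end) for a in ANNOTATIONS if o in a.objects}


def find_video_with_activity(activity):
    """All (vid, start, end) segments in which the activity occurs."""
    return {(a.vid, a.start, a.end) for a in ANNOTATIONS
            if activity in a.activities}


def find_objects_in_video(v, s, e):
    """Objects in any annotated segment of v overlapping frames [s, e]."""
    found = set()
    for a in ANNOTATIONS:
        if a.vid == v and a.start <= e and s <= a.end:
            found |= a.objects
    return found


def find_activities_in_video(v, s, e):
    """Activities in any annotated segment of v overlapping frames [s, e]."""
    found = set()
    for a in ANNOTATIONS:
        if a.vid == v and a.start <= e and s <= a.end:
            found |= a.activities
    return found
```

The first two functions go from content to segments, the last two from a segment to its content. The property-based variants would need each object and activity to carry attribute-value pairs as well.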
A Query Language for Video
SELECT may contain
Vid_Id : [s,e]
FROM may contain
video : <source>
WHERE condition allows statements like
term IN func_call
(term can be variable, object, activity or property value
func_call is a video function)
EXAMPLE 1
“Find all video sequences from the library
CrimeVidLib1 that contain Denis Dopeman”

SELECT vid : [s,e]
FROM video : CrimeVidLib1
WHERE
(vid,s,e) IN FindVideoWithObject(Denis Dopeman)
EXAMPLE 2
“Find all video sequences from the library
CrimeVidLib1 that show Jane Shady giving
Denis Dopeman a briefcase”
SELECT vid : [s,e]
FROM video : CrimeVidLib1
WHERE
(vid,s,e) IN FindVideoWithObject(Denis Dopeman) AND
(vid,s,e) IN FindVideoWithObject(Jane Shady) AND
(vid,s,e) IN FindVideoWithActivityandProp(ExchangeObject, Item, Briefcase) AND
(vid,s,e) IN FindVideoWithActivityandProp(ExchangeObject, Giver, Jane Shady) AND
(vid,s,e) IN FindVideoWithActivityandProp(ExchangeObject, Receiver, Denis Dopeman)
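One way to read the WHERE clause of Example 2 is as an intersection of the segment sets returned by each function call. The Python sketch below assumes invented annotation data and assumes that objects and activities are annotated on identical segment boundaries, so that (vid,s,e) triples can be compared directly; neither assumption comes from the source.

```python
# Hypothetical activity annotations:
# (vid, start, end, activity, {property: value})
ACTIVITIES = [
    ("v1", 101, 250, "ExchangeObject",
     {"Item": "Briefcase", "Giver": "Jane Shady",
      "Receiver": "Denis Dopeman"}),
    ("v3", 10, 90, "ExchangeObject",
     {"Item": "Briefcase", "Giver": "Denis Dopeman",
      "Receiver": "Jane Shady"}),
]

# Hypothetical object annotations: (vid, start, end, object)
OBJECTS = [
    ("v1", 101, 250, "Denis Dopeman"),
    ("v1", 101, 250, "Jane Shady"),
    ("v3", 10, 90, "Denis Dopeman"),
    ("v3", 10, 90, "Jane Shady"),
]


def find_video_with_object(o):
    """Segments in which object o appears."""
    return {(v, s, e) for v, s, e, obj in OBJECTS if obj == o}


def find_video_with_activity_and_prop(a, p, z):
    """Segments containing activity a whose property p has value z."""
    return {(v, s, e) for v, s, e, act, props in ACTIVITIES
            if act == a and props.get(p) == z}


# Each AND in the WHERE clause becomes a set intersection.
result = (
    find_video_with_object("Denis Dopeman")
    & find_video_with_object("Jane Shady")
    & find_video_with_activity_and_prop("ExchangeObject", "Item", "Briefcase")
    & find_video_with_activity_and_prop("ExchangeObject", "Giver", "Jane Shady")
    & find_video_with_activity_and_prop("ExchangeObject", "Receiver",
                                        "Denis Dopeman")
)
```

Only the segment in which Jane Shady is the giver survives. The segment in v3 contains the same people and the same kind of activity but with the roles reversed, which is what the Giver and Receiver properties are there to distinguish.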
EXAMPLE 3
“Which people have been seen with Denis
Dopeman in CrimeVidLib1”
SELECT vid : [s,e], Object
FROM video : CrimeVidLib1
WHERE
(vid,s,e) IN FindVideoWithObject(Denis Dopeman) AND
Object IN FindObjectsInVideo(vid,s,e) AND
Object ≠ Denis Dopeman AND
type of (Object, Person)
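Example 3 combines a content-to-segment function with a segment-to-content function. The sketch below is one possible reading in Python, assuming the intent is to exclude Denis Dopeman himself from the answer. The sample data, the `TYPES` table and the exact-match segment lookup are assumptions for illustration.

```python
# Hypothetical object annotations: (vid, start, end, object)
OBJECTS = [
    ("v1", 101, 250, "Denis Dopeman"),
    ("v1", 101, 250, "Jane Shady"),
    ("v1", 101, 250, "Briefcase"),
    ("v2", 0, 80, "Jane Shady"),
]

# Hypothetical type information for each object.
TYPES = {"Denis Dopeman": "Person", "Jane Shady": "Person",
         "Briefcase": "Thing"}


def find_video_with_object(o):
    """Segments in which object o appears."""
    return {(v, s, e) for v, s, e, obj in OBJECTS if obj == o}


def find_objects_in_video(v, s, e):
    """Objects annotated on exactly the segment (v, s, e)."""
    return {obj for vid, start, end, obj in OBJECTS
            if (vid, start, end) == (v, s, e)}


def type_of(obj, type_name):
    """True if obj is recorded as having the given type."""
    return TYPES.get(obj) == type_name


# SELECT vid : [s,e], Object ... with each AND applied as a filter.
rows = sorted(
    (v, s, e, obj)
    for (v, s, e) in find_video_with_object("Denis Dopeman")
    for obj in find_objects_in_video(v, s, e)
    if obj != "Denis Dopeman" and type_of(obj, "Person")
)
```

With this data `rows` holds a single row naming Jane Shady in segment (v1, 101, 250). The briefcase is filtered out by the type test and Denis Dopeman by the inequality.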
Exercise 6-1
Given a video database of old sports broadcasts, called
SportsVidLib, express the following users’ information needs
using the extended SQL as best as possible. You should
comment on how well the extended SQL is able to capture
each user’s information need and discuss alternative ways of
expressing the information need more fully.
• Bob wants to see all the video sequences with Michael Owen kicking a ball
• Tom wants to see all the video sequences in which Vinnie Jones is tackling
Paul Gascoigne
• Mary wants to see all the video sequences in which Roy Keane is arguing
with the referee, because Jose Reyes punched Gary Neville, while Thierry
Henry scores a goal, and then Roy Keane is sent off.
Think about…
What metadata would be required in
order to execute these kinds of video
query?
How could this be stored and searched
most efficiently?
Part 2: Enriching Video Data
Models and Queries
• More sophisticated queries to video databases
can be supported by considering:
– Temporal relationships between video intervals
– Causal relationships between events
→ Need to be able to describe temporal
relationships between intervals formally and
make inferences about temporal sequences…
Temporal Relationships
between Intervals
• Allen’s (1983) work on temporal logic is often discussed
in the video database literature (and in other computing
disciplines)
• He defines 13 relationships that cover every way two temporal
intervals can be related (e.g. intervals or events in video)
→ these can be used to formulate video queries
• A transitivity table allows a system to infer which relationships
can hold between A and C, if A r1 B and B r2 C are known (where
r1 and r2 each stand for one temporal relationship, and A, B, C
are intervals)
SEE MODULE WEB-PAGE FOR EXTRA NOTES ON THIS
Relation       Symbol   Inverse   Pictorial example
X equal Y      =        =         XXXXX
                                  YYYYY
X before Y     <        >         XXXX  YYYY
X meets Y      m        mi        XXXXYYYY
X overlaps Y   o        oi        XXXXX
                                     YYYYY
X during Y     d        di           XXX
                                  YYYYYYYYY
X starts Y     s        si        XXXX
                                  YYYYYYYY
X finishes Y   f        fi             XXXXX
                                  YYYYYYYYYY
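The relationships pictured above can be decided from the interval endpoints alone. The following Python sketch classifies a pair of intervals, each given as a (start, end) tuple with start < end. The function name `allen_relation` and the tuple representation are choices made for this illustration.

```python
def allen_relation(x, y):
    """Return the symbol r of Allen's relationship such that x r y.

    x and y are (start, end) tuples with start < end.
    """
    (xs, xe), (ys, ye) = x, y
    if xs >= xe or ys >= ye:
        raise ValueError("intervals must satisfy start < end")
    if xe < ys:
        return "<"    # x before y
    if ye < xs:
        return ">"    # x after y
    if xe == ys:
        return "m"    # x meets y
    if ye == xs:
        return "mi"   # x is met by y
    if xs == ys and xe == ye:
        return "="    # x equal y
    if xs == ys:
        return "s" if xe < ye else "si"   # x starts y / is started by y
    if xe == ye:
        return "f" if xs > ys else "fi"   # x finishes y / is finished by y
    if ys < xs and xe < ye:
        return "d"    # x during y
    if xs < ys and ye < xe:
        return "di"   # x contains y
    # Only the two proper-overlap cases remain.
    return "o" if xs < ys else "oi"
```

For example, `allen_relation((0, 5), (3, 8))` returns `"o"` and `allen_relation((3, 6), (0, 9))` returns `"d"`. Exactly one of the 13 symbols is returned for any valid pair, which reflects that the relationships are mutually exclusive and exhaustive.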
Temporal Relationships
between Intervals
• Crucial aspect of Allen’s work is the transitivity
table that enables inferences to be made about
temporal sequences
• Inferences take the form:
If A r1 B, and B r2 C, then only certain
relationships r3, r4, r5… may hold between A and C
For example:
If A < B and B < C, then A < C
Another Example
• If A “contains” B, and B < C then what
relationships can hold between A and C?
A: AAAAAAAAAAAAA
B:    BBBBB
C:                CCCCC    A < C
C:              CCCCC      A “meets” C
C:           CCCCCC        A “overlaps” C
C:          CCC            A “contains” C
C:           CCC           A “is finished by” C
Possibilities: A < C; A “overlaps” C; A “meets”
C; A “contains” C; A “is finished by” C
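Entries of the transitivity table can be checked by brute force instead of being looked up. The Python sketch below is not Allen's table or algorithm: it enumerates every triple of intervals with small integer endpoints and collects the relationships seen between A and C. The names `allen_relation` and `compose` and the `max_point` parameter are assumptions of this sketch.

```python
from itertools import combinations


def allen_relation(x, y):
    """Return the symbol r of Allen's relationship such that x r y."""
    (xs, xe), (ys, ye) = x, y
    if xe < ys:
        return "<"
    if ye < xs:
        return ">"
    if xe == ys:
        return "m"
    if ye == xs:
        return "mi"
    if xs == ys and xe == ye:
        return "="
    if xs == ys:
        return "s" if xe < ye else "si"
    if xe == ye:
        return "f" if xs > ys else "fi"
    if ys < xs and xe < ye:
        return "d"
    if xs < ys and ye < xe:
        return "di"
    return "o" if xs < ys else "oi"


def compose(r1, r2, max_point=6):
    """Relationships r with A r C for some A, B, C where A r1 B and B r2 C.

    Three intervals have six endpoints, so integer endpoints in
    0..max_point (seven values by default) cover every ordering.
    """
    # combinations yields (start, end) pairs with start < end.
    intervals = list(combinations(range(max_point + 1), 2))
    possible = set()
    for a in intervals:
        for b in intervals:
            if allen_relation(a, b) != r1:
                continue
            for c in intervals:
                if allen_relation(b, c) == r2:
                    possible.add(allen_relation(a, c))
    return possible
```

Calling `compose("<", "<")` yields only `"<"`, and `compose("di", "<")` yields the five relationships listed in the example above (`<`, `o`, `m`, `di`, `fi`). A real system would store the table once and look entries up, since the enumeration repeats the same work on every call.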
Modelling the Relationships between
Entities and Events in Film
• Some temporal relationships might be
interpreted as causal relationships
• Roth (1999) proposed the use of a semantic
network to represent the relationships between
entities and events in a movie – including
causal relations
• The user can then browse between scenes in
a movie, e.g. if they are watching the scene of
an explosion, they may browse to the scene in
which a bomb was planted, via the semantic
network (extra notes on semantic networks will
be on the module website).
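Browsing by causal links can be sketched as a walk over a small labelled graph. The Python example below is not Roth's implementation: the event names, frame ranges, edge labels and the functions `causes_of` and `scene_of` are all invented to illustrate the idea of following "causes" links backwards from the scene being watched.

```python
from collections import deque

# Hypothetical semantic network. Nodes are events, each tied to a
# scene given as (video, start_frame, end_frame).
SCENES = {
    "plant_bomb": ("movie", 300, 420),
    "explosion": ("movie", 900, 960),
    "building_collapses": ("movie", 961, 1100),
}

# Labelled edges: (source event, relation, target event).
EDGES = [
    ("plant_bomb", "causes", "explosion"),
    ("explosion", "causes", "building_collapses"),
]


def causes_of(event):
    """Follow 'causes' links backwards, nearest causes first."""
    found = []
    queue = deque([event])
    while queue:
        current = queue.popleft()
        for source, label, target in EDGES:
            if label == "causes" and target == current and source not in found:
                found.append(source)
                queue.append(source)
    return found


def scene_of(event):
    """Return the (video, start_frame, end_frame) scene for an event."""
    return SCENES[event]
```

A viewer watching the explosion could be offered `scene_of("plant_bomb")` as a jump target, because `causes_of("explosion")` returns `["plant_bomb"]`. Other edge labels (for example relations between entities and events) could be added to the same structure.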
Organising and Querying Video
Content
• Should consider…
– Which aspects of the video are likely to be of
interest to the users who access the video
archive?
– How to store relevant information about the
video efficiently?
– How to express and process queries?
– What scope is there for automatic content extraction?
EXERCISE 6-2
• For a video database application domain of your
choosing, write five video queries that use some of
Allen’s 13 temporal relationships
• If event A is ‘before (<)’ event B, and event B is
‘during’ event C, then what relationships could hold
between A and C?
• How do you think such reasoning about temporal
relationships could be used in a video database?
LECTURE 6:
LEARNING OUTCOMES
After the lecture, you should be able to:
• Express a user’s query to a video database
using Subrahmanian’s VideoSQL and discuss
the limitations of this formalism
• Explain how and why temporal and causal
relationships between events are represented in
metadata for video databases
OPTIONAL READING
Dunckley (2003), pages 38-39; 393-395.
For details of the extended video SQL, see:
Subrahmanian (1998). Principles of Multimedia Databases
- pages 191-195. IN LIBRARY ARTICLE COLLECTION
For temporal relationships:
Allen (1983). J. F. Allen, ‘Maintaining Knowledge About Temporal
Intervals.’ Communications of the ACM 26 (11), pp. 832-843.
Especially Figure 2 for the 13 relationships and Figure 4 for the full
transitivity table. [In Library – on shelf]
For causal relationships:
Roth (1999). Volker Roth, ‘Content-based retrieval from digital video.’
Image and Vision Computing 17, pp. 531-540. [Available online
through library eJournals]