Research Directions
Web Databases
Will discuss:
Mobile MM systems
Pull Access
Push Access
Watermarking and Steganography
Rani Hoitash
Web Databases
WWW
a collection of HTML documents
text, images, forms, tables
information is distributed
There is no unifying schema
Information duplication
Semi-structured data
Retrieval is costly and not guaranteed
optimization is important
• use of similarity values
• use of feedback
Searching
Search engines
Yahoo
Infoseek
Excite
Maintain indices of HTML pages
document retrieval
use of confidence values & ranking is essential
Problems:
keyword-based only
Information overload
No media search capability
No complex search capability
retrieves a single page at a time
web structure is ignored
page structure is ignored
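The search-engine model on this slide can be sketched as an inverted index over HTML pages, with keyword queries ranked by a confidence score. This is a minimal sketch under several assumptions:

- Queries use AND semantics.
- The score is a summed tf-idf with a smoothed idf.
- The page URLs and texts are hypothetical.

The sketch also shows the limitations listed above. It matches only keywords, returns individual pages, and never looks at links between pages or at page structure.

```python
import math
from collections import defaultdict

def build_index(pages):
    """Inverted index: term -> {url: occurrences of term in that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term in text.lower().split():
            index[term][url] = index[term].get(url, 0) + 1
    return index

def search(index, n_pages, query):
    """Keyword search: every query term must occur; rank pages by summed tf-idf."""
    terms = query.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Conjunctive semantics: intersect the posting lists of all query terms.
    hits = set(index[terms[0]])
    for t in terms[1:]:
        hits &= set(index[t])
    scores = {}
    for url in hits:
        score = 0.0
        for t in terms:
            idf = math.log(1 + n_pages / len(index[t]))  # smoothed so idf > 0
            score += index[t][url] * idf
        scores[url] = score
    # Highest confidence first; URL breaks ties deterministically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical pages, for illustration only.
pages = {
    "http://example.org/a": "web databases and hypertext",
    "http://example.org/b": "hypertext hypertext links",
    "http://example.org/c": "image retrieval",
}
index = build_index(pages)
print(search(index, len(pages), "hypertext"))        # b ranks above a
print(search(index, len(pages), "hypertext links"))  # only b matches both terms
```

Each result is one page with a score. Nothing here can express "a page that links to a page about X", which is the gap that the systems on the following slides try to fill.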
MAVIS (Microcosm)
Content-based retrieval for non-textual data
media-dependent features
links are enriched with media-dependent signatures
feasible in a small-scale hypermedia environment
Harder to implement at a larger scale:
it is not realistic to expect authors to specify signatures
it is not realistic to expect indexing tools (search engines) to extract this information
The servers can do this: we can augment servers to extract such information off-line and annotate pages and links with signatures.
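The slide proposes that servers compute media signatures off-line and attach them to links. The sketch below illustrates that idea for images. It does not describe how MAVIS actually computes its signatures.

- The signature is a coarse, normalized RGB color histogram, which is also the kind of feature the histogram-based systems on the next slide use.
- Similarity is measured by histogram intersection.
- `BINS`, `Link`, `annotate_links`, the file names and the pixel lists are all hypothetical.
- Images are plain lists of `(r, g, b)` tuples, to avoid any imaging library.

```python
from dataclasses import dataclass, field

BINS = 4  # hypothetical: 4 levels per RGB channel -> 64-bin histogram

def color_signature(pixels):
    """Normalized coarse RGB histogram of an image given as (r, g, b) tuples in 0..255."""
    hist = [0.0] * (BINS ** 3)
    for r, g, b in pixels:
        idx = (r * BINS // 256) * BINS * BINS + (g * BINS // 256) * BINS + (b * BINS // 256)
        hist[idx] += 1
    total = len(pixels) or 1
    return [h / total for h in hist]

def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical normalized histograms, 0.0 for disjoint."""
    return sum(min(a, b) for a, b in zip(h1, h2))

@dataclass
class Link:
    """Hypothetical hyperlink record kept by the server."""
    source: str
    target: str
    signature: list = field(default_factory=list)

def annotate_links(links, images):
    """Off-line server pass: attach the target image's signature to each link to it."""
    for link in links:
        if link.target in images:
            link.signature = color_signature(images[link.target])
    return links

def match_links(links, query_pixels, threshold=0.5):
    """Content-based query: targets of annotated links similar enough to the query image."""
    q = color_signature(query_pixels)
    scored = [(intersection(q, l.signature), l.target) for l in links if l.signature]
    return [t for s, t in sorted(scored, reverse=True) if s >= threshold]

# Hypothetical server content, for illustration only.
images = {
    "red.gif": [(255, 0, 0)] * 10,
    "blue.gif": [(0, 0, 255)] * 10,
    "mix.gif": [(255, 0, 0)] * 5 + [(0, 0, 255)] * 5,
}
links = annotate_links([
    Link("index.html", "red.gif"),
    Link("index.html", "blue.gif"),
    Link("gallery.html", "mix.gif"),
    Link("index.html", "missing.gif"),  # target not on this server: left unannotated
], images)
print(match_links(links, [(250, 10, 10)] * 4))  # reddish query image
```

The expensive step, `annotate_links`, runs once on the server. Query-time matching then only compares stored signatures, so neither authors nor search engines have to produce them.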
Other Systems
Image Surfer (Yahoo)
Manual categorization of images
Histogram-based image retrieval (plus keywords)
only images!
WebSeer (University of Chicago)
Image retrieval
keyword retrieval
face detection
Only images!
Infoseek (RPI)
A combined (integrated) search engine
parallel execution of queries using multiple indices
query translation
result merging
Keyword-based
does not provide complex query functionality
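A combined search engine of the kind listed above does three things: it translates one user query for each back-end index, runs the translated queries in parallel, and merges the results. The sketch below assumes a simple policy for each step.

- **Translation:** a case rewrite, standing in for real syntax differences between engines.
- **Parallel execution:** a thread pool from Python's standard `concurrent.futures`.
- **Merging:** each engine's scores are normalized to 0..1, then summed per URL, so pages found by several engines rank higher.

The engine names, indices and URLs are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical back-end indices: keyword -> [(url, raw score)], each on its own scale.
ENGINE_A = {"hypertext": [("http://example.org/1", 8.0), ("http://example.org/2", 4.0)]}
ENGINE_B = {"HYPERTEXT": [("http://example.org/2", 0.9), ("http://example.org/3", 0.3)]}

def translate(query, engine):
    """Query translation: rewrite the user's keyword into each engine's assumed syntax."""
    return query.upper() if engine == "B" else query.lower()

def run(engine, index, query):
    """Execute one translated query against one index."""
    return index.get(translate(query, engine), [])

def normalize(results):
    """Scale one engine's scores to 0..1 so different engines become comparable."""
    top = max((s for _, s in results), default=0.0)
    return [(url, s / top) for url, s in results] if top else []

def meta_search(query):
    engines = {"A": ENGINE_A, "B": ENGINE_B}
    # Parallel execution: submit the query to every index concurrently.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run, name, idx, query) for name, idx in engines.items()]
        result_lists = [f.result() for f in futures]
    # Result merging: sum normalized scores per URL (also removes duplicates).
    merged = {}
    for results in result_lists:
        for url, score in normalize(results):
            merged[url] = merged.get(url, 0.0) + score
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))

print(meta_search("Hypertext"))  # page 2 wins: both engines return it
```

Even after merging, the input is still a bag of keywords. This is why the slide notes that such engines offer no complex query functionality.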
WebSQL (University of Toronto)
The WWW is modeled as a table of documents
URL, title, date, etc. are treated as attributes
SQL is extended to query these attributes
E.g.:
SELECT d.url, d.title, d.length, d.modif
FROM Document d SUCH THAT d MENTIONS "hypertext"
WHERE d.type = "text/html"
Find all HTML documents that mention "hypertext" and return the URL, title, length, and modification date of each.
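To make the WebSQL example concrete, the sketch below evaluates the same request (URL, title, length and modification date of HTML documents mentioning "hypertext") over a small in-memory `Document` table. It is an illustration, not WebSQL's evaluation strategy.

- `MENTIONS` is approximated by a case-insensitive substring test.
- The table rows are made up.
- The `Document` class and `run_query` function are hypothetical names, with fields named after the query's attributes.

Each clause of the query is marked in a comment next to the code that mimics it.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Document:
    """One row of a hypothetical Document table; fields mirror the query's attributes."""
    url: str
    title: str
    text: str
    type: str
    length: int
    modif: date

def mentions(doc, phrase):
    # Assumption: approximate WebSQL's MENTIONS with a case-insensitive substring test.
    return phrase.lower() in doc.text.lower()

def run_query(documents):
    return [
        (d.url, d.title, d.length, d.modif)   # SELECT d.url, d.title, d.length, d.modif
        for d in documents                    # FROM Document d
        if mentions(d, "hypertext")           # SUCH THAT d MENTIONS "hypertext"
        and d.type == "text/html"             # WHERE d.type = "text/html"
    ]

# Hypothetical rows, for illustration only.
docs = [
    Document("http://example.org/a.html", "Intro to Hypertext",
             "Hypertext systems link documents.", "text/html", 33, date(1997, 3, 1)),
    Document("http://example.org/b.ps", "Hypertext Survey",
             "A survey of hypertext.", "application/postscript", 22, date(1996, 5, 2)),
    Document("http://example.org/c.html", "Image Search",
             "Histogram-based retrieval.", "text/html", 26, date(1997, 1, 9)),
]
print(run_query(docs))  # only the HTML page that mentions hypertext
```

The point of WebSQL is that this filter is written declaratively once, and the system, rather than the user, decides how to find the candidate documents.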