Image searching on the Web

Qunyan Mao
SIMS, UC Berkeley
Images on the Web
• More images on the web
• Image formats
– GIF: CompuServe Graphics Interchange Format, a dominant web format
– JPEG: supports “true color”, ideal for photographs
– TIF: large file size, not well suited for web pages
Image formats
– BMP: used for images such as desktop wallpaper; not supported by any Web browser
– PNG: Portable Network Graphics, an image format for the future
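Each of the formats listed above starts with a characteristic byte signature, so a crawler does not have to trust the file extension. The following is a minimal sketch of that check; the function name `sniff_format` and the table layout are illustrative assumptions, not part of the original slides.

```python
# Leading bytes ("magic numbers") of the five formats named in the slides.
SIGNATURES = [
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF"),           # little-endian TIFF
    (b"MM\x00*", "TIFF"),           # big-endian TIFF
    (b"BM", "BMP"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
]


def sniff_format(header: bytes) -> str:
    """Return the format name for the first bytes of a file, or 'unknown'."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"
```

Passing the first eight or so bytes of a downloaded file is enough for all five signatures.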
Image indexing
• Two methods used in indexing images
– text-based
– content-based
Text-based vs. content-based
• Text-based: the commonly used image indexing and searching method
– descriptive text
– controlled vocabulary
• Drawbacks:
– each image needs an accompanying descriptor to be accessible
– consistency: different types of textual data
– requires human intervention
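The text-based approach can be sketched as a small inverted index over human-written descriptions, with a controlled vocabulary that maps variant terms to one preferred term. Everything here (the vocabulary entries, file names, and function names) is a hypothetical illustration, not a description of any particular search engine.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: variant term -> preferred term.
VOCABULARY = {"giraffes": "giraffe", "camelopard": "giraffe", "photo": "photograph"}


def normalize(term):
    """Lowercase a term, strip punctuation, and map it to its preferred form."""
    term = term.lower().strip(".,;:!?")
    return VOCABULARY.get(term, term)


def build_index(descriptions):
    """Map each normalized term to the set of images whose description uses it."""
    index = defaultdict(set)
    for image, text in descriptions.items():
        for word in text.split():
            term = normalize(word)
            if term:
                index[term].add(image)
    return index


def search(index, query):
    """Return the images indexed under the query term, in sorted order."""
    return sorted(index.get(normalize(query), set()))
```

An image with no description never enters the index, which is the first drawback listed above; the vocabulary table is the part that needs human upkeep.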
Text-based vs. content-based
• Content-based indexing and searching
– goal: provide algorithms that can automatically recognize the important features in an image without human intervention
– search by color, shape, and spatial relationship
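Searching by color can be illustrated with a coarse color histogram and histogram intersection as the similarity score. This is a generic sketch under the assumption that an image is available as a list of (r, g, b) tuples with values from 0 to 255; it is not claimed to be the method used by any engine named in these slides.

```python
def color_histogram(pixels, bins=4):
    """Quantize each RGB channel into `bins` levels; return normalized counts."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # value * bins // 256 maps 0..255 onto 0..bins-1
        cell = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[cell] += 1
    total = len(pixels) or 1
    return [count / total for count in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the sum of the bin-wise minima of two histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Two images dominated by similar yellows score close to 1, while a yellow image and a blue one score close to 0. The score ignores shape and spatial layout entirely.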
Where to start
• Search engines:
– general search engines: AltaVista, Lycos
– image search engine: WebSEEK
• Specialized image databases:
– museum, archive, and library digital image databases
How search engines work
• How does a Web search engine identify images and match them to your criteria?
– look for graphic files referenced by HTML tags: <img src> and <a href>. Example: AltaVista
– look for a caption: the alt attribute of the <img> tag
– look for the title of the Web page
– employ human intervention to catalog images
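The first three steps above can be sketched with Python's standard `html.parser` module: collect every `<img src>`, every `<a href>` that points at an image file, the `alt` text, and the page title. The class name `ImageFinder`, the helper `find_images`, and the extension list are assumptions made for this illustration.

```python
from html.parser import HTMLParser

# Extensions treated as image links (an assumed, non-exhaustive list).
IMAGE_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png")


class ImageFinder(HTMLParser):
    """Collect the page title and (url, caption) pairs for images on a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.images = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # the parser lowercases tag and attribute names
        if tag == "title":
            self._in_title = True
        elif tag == "img" and attrs.get("src"):
            self.images.append((attrs["src"], attrs.get("alt") or ""))
        elif tag == "a":
            href = attrs.get("href") or ""
            if href.lower().endswith(IMAGE_EXTENSIONS):
                self.images.append((href, ""))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def find_images(html):
    """Return (page title, list of (url, caption)) for an HTML string."""
    finder = ImageFinder()
    finder.feed(html)
    finder.close()
    return finder.title.strip(), finder.images
```

The page title and captions would then feed a text index, while the collected URLs tell the crawler which files to fetch.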
Search example: Giraffe
• AltaVista
• Lycos
• WebSEEK
AltaVista
• http://image.altavista.com/cgi-bin/avncgi
• Text-based indexing: file name and path name
• (live search demo)
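Indexing by file name and path name can be sketched as splitting an image URL's path into lowercase terms. This is only an illustration of the idea, not AltaVista's actual algorithm; the function names and the example.com URLs in the usage note are hypothetical.

```python
import re
from urllib.parse import urlparse


def path_terms(url):
    """Split the path of an image URL into lowercase index terms."""
    path = urlparse(url).path
    # Drop the file extension of the last path segment, if there is one.
    path = re.sub(r"\.[A-Za-z0-9]+$", "", path)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", path) if t]


def matches(url, query):
    """True if the query word is one of the terms in the URL's path."""
    return query.lower() in path_terms(url)
```

With this scheme, `http://example.com/animals/africa/Giraffe_01.gif` is found by the query "giraffe", but the same picture saved as `http://example.com/img/dsc0042.jpg` is not.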
Lycos
• http://lycospro.lycos.com
• Text-based indexing: file name, path name, and caption (the alt attribute)
• (live search demo)
WebSEEK
• http://www.disney.ctr.columbia.edu/webseek
• content-based image search engine
• (live search demo)
Specialized image databases
• Museum, archive, and library digital image databases
– Fine Arts Museums of San Francisco
http://www.thinker.org/imagebase/index-2.html
– California Heritage Collection
http://sunsite.berkeley.edu/CalHeritage/collection.html
– National Museum of American Art
Problems in using search engines
• Search results rely heavily on how the webmaster names the image file and directory.
• Even with content-based search, there are many unexpected results.
Specialized image databases
• More likely to use a controlled vocabulary to describe images
• Well organized compared to other images on the Web
• Have their own search tools or finding aids
Before using the image
• Available for viewing doesn’t mean available for reuse
• Make sure you have the right to use it