Visiwords
John Tait
Chief Scientific Officer
Warning
• A few half-formed ideas from the world of
image and video indexing which may be of
interest to MT people
• Not original ideas (apart, I think, from the
link)
• In fact, a line of work which derives
originally from MT
The nub
• Unsupervised Clustering of bundles of
features
– Colour, texture from image segments
– Words, phrases from sentences or paragraphs?
• Associate these bundles with
“translations” by supervised machine
learning
– Categorised images
– Parallel texts
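The two steps above can be sketched in a few lines. This is only a toy illustration, not the method of the papers cited under Origins: the feature triples (redness, blueness, texture), the category names, and the helper names `kmeans`, `label_clusters` and `annotate` are all invented for the example. Plain k-means stands in for the unsupervised clustering, and a simple majority vote over categorised examples stands in for the supervised learning step.

```python
from collections import Counter, defaultdict

def dist2(p, q):
    """Squared Euclidean distance between two feature bundles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centroids):
    """Index of the centroid closest to feature bundle p."""
    return min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))

def kmeans(points, k, iters=10):
    """Step 1 (unsupervised): cluster feature bundles with plain k-means."""
    # Deterministic farthest-first initialisation, so the sketch is repeatable.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for i in range(k):
            members = [p for p, a in zip(points, assign) if a == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(
                    sum(col) / len(members) for col in zip(*members))
    return centroids

def label_clusters(points, labels, centroids):
    """Step 2 (supervised, simplified): give each cluster the label that
    most often accompanies its members in the categorised examples."""
    votes = defaultdict(Counter)
    for p, lab in zip(points, labels):
        votes[nearest(p, centroids)][lab] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}

# Hypothetical image segments: (redness, blueness, texture), with the
# category of the image each segment came from.
segments = [(0.9, 0.1, 0.2), (0.8, 0.2, 0.3), (0.85, 0.15, 0.25),
            (0.1, 0.9, 0.6), (0.2, 0.8, 0.7), (0.15, 0.85, 0.65)]
categories = ["sunset", "sunset", "sunset", "sea", "sea", "sea"]

centroids = kmeans(segments, k=2)
cluster_label = label_clusters(segments, categories, centroids)

def annotate(segment):
    """'Translate' an unseen segment via its nearest cluster."""
    return cluster_label[nearest(segment, centroids)]

print(annotate((0.88, 0.12, 0.22)))  # sunset
print(annotate((0.12, 0.9, 0.6)))    # sea
```

For text, the same skeleton would apply with bundles built from words or phrases and with labels drawn from the other side of a parallel corpus; how best to build such bundles is left open here, as it is on the slide.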
Origins
• “Matching Words and Pictures”: Barnard,
Duygulu, Forsyth, de Freitas, Blei, Jordan.
Journal of Machine Learning Research 3
(2003) 1107-1135
• “Image Classification Using Hybrid Neural
Networks” Tsai, McGarry and Tait.
Proceedings of the 26th ACM SIGIR
Conference on Research and Development
in Information Retrieval (SIGIR 2003),
Toronto, July, 2003. pp 431-432.
More or less general clusters
General Concepts
Specific Concepts
Visiwords
• Derived from Visiterms
– These feature cluster nodes
– A notion of an area of the “semantic field”
– Remember these are colour, texture etc. for
an area of an image … No relation to
language … or at least a very deep one
Matching
L1 General Concepts
Examples
L1 Specific Concepts
L2 General Concepts
L2 Specific Concepts
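One possible reading of this diagram, offered only as an assumption, is that clusters built separately for each language are linked through the examples they share, such as aligned sentence pairs. The sketch below counts how often a cluster from the first language co-occurs with a cluster from the second across aligned examples and keeps the most frequent partner. The cluster identifiers and the function name `match_clusters` are hypothetical.

```python
from collections import Counter, defaultdict

def match_clusters(aligned_examples):
    """Link each first-language cluster to the second-language cluster it
    co-occurs with most often across the aligned examples."""
    cooccurrence = defaultdict(Counter)
    for first_cluster, second_cluster in aligned_examples:
        cooccurrence[first_cluster][second_cluster] += 1
    return {c: counts.most_common(1)[0][0]
            for c, counts in cooccurrence.items()}

# Hypothetical data: each pair records the cluster assigned to one side of
# an aligned example and the cluster assigned to the other side.
aligned_examples = [
    ("L1-A", "L2-X"), ("L1-A", "L2-X"), ("L1-A", "L2-Y"),
    ("L1-B", "L2-Y"), ("L1-B", "L2-Y"),
]

print(match_clusters(aligned_examples))
# {'L1-A': 'L2-X', 'L1-B': 'L2-Y'}
```

Raw counts are the crudest possible association measure; a statistical model tuned to the data would normally replace them.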
Fast Forward to 2009
• Better statistical models tuned to the data
• Much bigger vocabularies of words and
categories
• … and lots of other advances
A question
• Is there anything like this in current MT
research?
Concluding remarks
• I’m surprised this worked at all
– Why should image data be coherent and
cohesive?
• But text is!!!
• Is this a better way to deal with unknown
and changing vocabulary?
Some other references
• Hardoon, D., Saunders, C., Szedmak, S. and Shawe-Taylor, J. (2006) A
Correlation Approach for Automatic Image Annotation. In: Second
International Conference on Advanced Data Mining and Applications,
ADMA 2006, August, Xi'an, China.
• Kucuktunc, O., Sevil, S. G., Tosun, A. B., Zitouni, H., Duygulu, P., and
Can, F. 2008. Tag Suggestr: Automatic Photo Tag Expansion Using Visual
Information for Photo Sharing Websites. In Proceedings of the 3rd
International Conference on Semantic and Digital Media Technologies:
Semantic Multimedia (Koblenz, Germany, December 03 - 05, 2008). D.
Duke, L. Hardman, A. Hauptmann, D. Paulus, and S. Staab, Eds.
Lecture Notes in Computer Science, vol. 5392. Springer-Verlag, Berlin,
Heidelberg, 61-73. DOI: http://dx.doi.org/10.1007/978-3-540-922353_7