LEARNING SEMANTICS OF WORDS AND PICTURES
TEJASWI DEVARAPALLI
CONTENT
• INTRODUCTION
• MODELING IMAGE DATASET STATISTICS
• HIERARCHICAL MODEL
• TESTING AND USING BASIC MODEL
• AUTO ILLUSTRATION
• AUTO ANNOTATION
• RESULTS
• DISCUSSION
SEMANTICS
• LANGUAGE USES A SYSTEM OF LINGUISTIC SIGNS, EACH OF WHICH IS A
COMBINATION OF MEANING AND PHONOLOGICAL AND/OR ORTHOGRAPHIC
FORMS.
• SEMANTICS IS TRADITIONALLY DEFINED AS THE STUDY OF MEANING IN
LANGUAGE.
ABSTRACT
• A STATISTICAL MODEL FOR ORGANIZING IMAGE COLLECTIONS.
• INTEGRATES SEMANTIC INFORMATION PROVIDED BY ASSOCIATED TEXT AND
VISUAL INFORMATION PROVIDED BY IMAGE FEATURES.
• A PROMISING MODEL FOR INFORMATION RETRIEVAL TASKS SUCH AS DATABASE BROWSING AND SEARCHING FOR IMAGES.
• SUPPORTS NOVEL APPLICATIONS SUCH AS AUTO-ILLUSTRATION AND AUTO-ANNOTATION.
INTRODUCTION
• METHOD FOR ORGANIZING IMAGE DATABASES.
• INTEGRATES TWO KINDS OF INFORMATION DURING MODEL CONSTRUCTION.
• LEARNS LINKS BETWEEN IMAGE FEATURES AND SEMANTICS.
• THE LEARNED LINKS ARE USEFUL FOR
BETTER BROWSING
BETTER SEARCH
NOVEL APPLICATIONS
INTRODUCTION (CONTINUED)
• MODELS STATISTICS ABOUT THE OCCURRENCE AND CO-OCCURRENCE OF WORDS AND IMAGE FEATURES.
• HIERARCHICAL STRUCTURE.
• A GENERATIVE MODEL THAT IMPLICITLY CONTAINS PROCESSES FOR PREDICTING IMAGE COMPONENTS: WORDS AND FEATURES.
COMPARISON
• THIS MODEL SUPPORTS BROWSING FOR IMAGE RETRIEVAL PURPOSES.
• SYSTEMS FOR SEARCHING IMAGE DATABASES INCLUDE SEARCH BY QUERY ON:
TEXT
IMAGE FEATURE SIMILARITY
SEGMENT FEATURES
IMAGE SKETCH
MODELING IMAGE DATASET STATISTICS
• GENERATIVE HIERARCHICAL MODEL
• COMBINATION OF
AN ASYMMETRIC CLUSTERING MODEL (MAPS DOCUMENTS INTO CLUSTERS)
A SYMMETRIC CLUSTERING MODEL (MODELS THE JOINT DISTRIBUTION OF DOCUMENTS AND FEATURES).
• DATA MODELED AS FIXED HIERARCHY OF NODES.
• NODES GENERATE
WORDS
IMAGE SEGMENTS
(ILLUSTRATION)
• DOCUMENTS ARE MODELED AS A SEQUENCE OF WORDS AND A SEQUENCE OF SEGMENTS USING THE BLOBWORLD REPRESENTATION.
• THE “BLOBWORLD” REPRESENTATION IS CREATED BY CLUSTERING PIXELS IN A JOINT COLOR-TEXTURE-POSITION FEATURE SPACE (A ROUGH SKETCH FOLLOWS BELOW).
• THE DOCUMENT IS MODELED AS A SUM OVER THE CLUSTERS, TAKING ALL CLUSTERS INTO CONSIDERATION.
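A minimal sketch of a Blobworld-style segmentation in Python, assuming k-means over a joint color-position feature space; the real Blobworld also uses texture features and an EM-based clustering rather than k-means, so the function below is illustrative only.

```python
# Hedged sketch (not the authors' code): approximating a Blobworld-style
# segmentation by clustering pixels in a joint color-position feature space.
import numpy as np
from sklearn.cluster import KMeans

def blob_segments(image, n_blobs=8):
    """image: H x W x 3 float array in [0, 1]. Returns a label map and per-blob features."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Joint feature per pixel: (r, g, b, x, y), positions scaled to [0, 1]
    feats = np.column_stack([
        image.reshape(-1, 3),
        (xs / w).reshape(-1, 1),
        (ys / h).reshape(-1, 1),
    ])
    km = KMeans(n_clusters=n_blobs, n_init=4, random_state=0).fit(feats)
    labels = km.labels_.reshape(h, w)
    # Per-blob feature vector: mean color, mean position, and relative size
    blob_feats = np.array([
        np.concatenate([feats[km.labels_ == k].mean(axis=0), [np.mean(km.labels_ == k)]])
        for k in range(n_blobs)
    ])
    return labels, blob_feats
```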
HIERARCHICAL MODEL
• EACH NODE HAS A PROBABILITY OF GENERATING EACH WORD AND EACH IMAGE SEGMENT, W.R.T. THE DOCUMENT UNDER CONSIDERATION.
• THE CLUSTER DEFINES THE PATH.
HIGHER-LEVEL NODES EMIT MORE GENERAL WORDS AND BLOBS (E.G. SKY).
MIDDLE-LEVEL NODES EMIT MODERATELY GENERAL WORDS AND BLOBS (E.G. SUN, SEA).
LOWER-LEVEL NODES EMIT MORE SPECIFIC WORDS AND BLOBS (E.G. WAVES).
• CLUSTER AND LEVEL TOGETHER IDENTIFY THE NODE.
(FIGURE: EXAMPLE HIERARCHY WITH NODES EMITTING SKY, SUN, SEA, WAVES.)
The mathematical process for generating the set of observations ‘D’ associated with a document ‘d’ is described below, where c indexes clusters, i indexes items, and l indexes levels.
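A hedged reconstruction of the formula the slide omits, using the mixture-of-clusters form implied by the surrounding text (c ranges over clusters, i over the items in D, l over levels):

\[
p(D \mid d) \;=\; \sum_{c} p(c) \prod_{i \in D} \sum_{l} p(i \mid l, c)\, p(l \mid c, d)
\]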
GAUSSIAN DISTRIBUTIONS
• FEATURES DESCRIBING ASPECTS OF SIZE, POSITION, COLOR, TEXTURE AND SHAPE TOGETHER FORM THE FEATURE VECTOR ‘X’ OF A SEGMENT.
• THE PROBABILITY DISTRIBUTION OVER IMAGE SEGMENTS IS GIVEN BY THE USUAL GAUSSIAN FORMULA:
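The “usual formula” is presumably the multivariate Gaussian density; a generic form is given below (the paper restricts the covariance structure, so treat the full Σ here as an illustrative assumption):

\[
p(x \mid l, c) \;=\; \frac{1}{\sqrt{(2\pi)^{k}\,\lvert\Sigma\rvert}} \exp\!\left(-\tfrac{1}{2}\,(x-\mu)^{\top}\Sigma^{-1}(x-\mu)\right)
\]

where μ and Σ are the mean and covariance associated with node (l, c) and k is the dimension of x.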
MODELING IMAGE DATASET STATISTICS
• THIS APPROACH USES A HIERARCHICAL MODEL BECAUSE IT BEST SUPPORTS
BROWSING OF LARGE COLLECTIONS OF IMAGES
A COMPACT REPRESENTATION
• THE PAPER PROVIDES IMPLEMENTATION DETAILS FOR AVOIDING OVER-TRAINING.
• THE TRAINING PROCEDURE CLUSTERS A FEW THOUSAND IMAGES IN A FEW HOURS ON A STATE-OF-THE-ART PC.
MODELING IMAGE DATASET STATISTICS
• RESOURCE REQUIREMENTS SUCH AS MEMORY INCREASE RAPIDLY WITH THE NUMBER OF IMAGES, SO EXTRA CARE IS NEEDED.
• THERE ARE DIFFERENT APPROACHES FOR AVOIDING OVER-TRAINING AND EXCESSIVE RESOURCE USAGE.
FIRST APPROACH
• WE TRAIN ON A RANDOMLY SELECTED SUBSET OF IMAGES UNTIL THE LOG LIKELIHOOD OF HELD-OUT DATA, RANDOMLY SELECTED FROM THE REMAINING DATA, BEGINS TO DROP.
• THE MODEL SO FOUND IS USED AS THE STARTING POINT FOR THE NEXT TRAINING ROUND, USING A SECOND RANDOM SET OF IMAGES (A SKETCH FOLLOWS BELOW).
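A rough sketch of this training schedule, using scikit-learn's GaussianMixture as a stand-in for the hierarchical model (an assumption; the paper fits its own hierarchical EM, not this class):

```python
# Hedged sketch of the subset-training loop with held-out early stopping.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_in_rounds(data, n_clusters=16, subset_size=1000, n_rounds=5, max_iters=50):
    """data: (N, d) array of feature vectors, with N >= 2 * subset_size."""
    rng = np.random.default_rng(0)
    model = GaussianMixture(n_components=n_clusters, max_iter=1, warm_start=True)
    for _ in range(n_rounds):
        idx = rng.permutation(len(data))
        train = data[idx[:subset_size]]
        held_out = data[idx[subset_size:2 * subset_size]]
        best_ll = -np.inf
        for _ in range(max_iters):
            model.fit(train)              # one EM step (max_iter=1, warm_start=True)
            ll = model.score(held_out)    # mean held-out log likelihood
            if ll < best_ll:              # stop when held-out likelihood begins to drop
                break
            best_ll = ll
        # the fitted model becomes the starting point for the next round's subset
    return model
```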
SECOND APPROACH
• THE SECOND METHOD FOR REDUCING RESOURCE USAGE IS TO LIMIT CLUSTER MEMBERSHIP.
FIRST COMPUTE AN APPROXIMATE CLUSTERING BY TRAINING ON A SUBSET.
THEN CLUSTER THE ENTIRE DATASET, MAINTAINING THE PROBABILITY THAT A POINT IS IN A CLUSTER ONLY FOR THE TOP TWENTY CLUSTERS.
THE REST OF THE MEMBERSHIP PROBABILITIES ARE ASSUMED TO BE ZERO FOR THE NEXT FEW ITERATIONS (SEE THE SKETCH BELOW).
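A small sketch of the truncated-membership idea: keep each point's twenty largest cluster probabilities, zero the rest, and renormalize. The array layout is an assumption, not the paper's data structure.

```python
# Hedged sketch of truncated cluster membership (top-k per point).
import numpy as np

def truncate_membership(probs, k=20):
    """probs: (n_points, n_clusters) membership probabilities; returns a truncated copy."""
    out = np.zeros_like(probs)
    top = np.argsort(probs, axis=1)[:, -k:]        # indices of the k largest per row
    rows = np.arange(probs.shape[0])[:, None]
    out[rows, top] = probs[rows, top]              # keep only the top-k values
    out /= out.sum(axis=1, keepdims=True)          # renormalize each row
    return out
```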
TESTING AND USING BASIC MODEL
• METHOD STABILITY IS TESTED BY RUNNING THE FITTING PROCESS ON THE SAME DATA SEVERAL TIMES WITH DIFFERENT INITIAL CONDITIONS, SINCE THE EXPECTATION MAXIMIZATION (EM) PROCESS IS SENSITIVE TO THE STARTING POINT.
• THE CLUSTERING FOUND DEPENDS MORE ON THE STARTING POINT THAN ON THE EXACT IMAGES CHOSEN FOR TRAINING.
• THE SECOND TEST IS TO VERIFY WHETHER CLUSTERING ON BOTH IMAGES AND TEXT HAS AN ADVANTAGE OR NOT.
TESTING AND USING THE BASIC MODEL
THIS FIGURE SHOWS 16 IMAGES FROM A CLUSTER FOUND USING TEXT ONLY.
TESTING AND USING THE BASIC MODEL
THIS FIGURE SHOWS 16 IMAGES FROM A CLUSTER FOUND USING ONLY IMAGE FEATURES.
TESTING AND USING THE BASIC MODEL
BROWSING
• MOST IMAGE RETRIEVAL SYSTEMS DO NOT SUPPORT BROWSING.
• THEY FORCE THE USER TO SPECIFY A QUERY.
• THE ISSUE IS WHETHER THE CLUSTERS FOUND THROUGH BROWSING MAKE
SENSE TO THE USER.
• IF THE USER FINDS THE CLUSTERS COHERENT THEN THEY CAN BEGIN TO
INTERNALIZE THE KIND OF STRUCTURE THEY REPRESENT.
BROWSING
• USER STUDY
GENERATE 64 CLUSTERS FROM 3000 IMAGES.
GENERATE 64 RANDOM CLUSTERS FROM THE SAME IMAGES.
PRESENT THE REAL AND RANDOM CLUSTERS TO THE USER AND ASK THEM TO RATE COHERENCE (YES/NO).
94% ACCURACY
IMAGE SEARCH
• SUPPLY A COMBINATION OF TEXT AND IMAGE FEATURES AS THE QUERY.
• APPROACH: COMPUTE, FOR EACH CANDIDATE IMAGE, THE PROBABILITY OF EMITTING THE QUERY ITEMS (SEE THE RECONSTRUCTION BELOW).
• Q = SET OF QUERY ITEMS, D = CANDIDATE DOCUMENT.
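A hedged reconstruction of the query score, mirroring the generative formula given earlier (whether the sum over clusters sits inside or outside the product over query items is an assumption here):

\[
p(Q \mid d) \;=\; \prod_{q \in Q} \sum_{c} \sum_{l} p(q \mid l, c)\, p(l \mid c, d)\, p(c \mid d)
\]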
IMAGE SEARCH
THE FIGURE SHOWS THE RESULTS OF THE “RIVER” AND “TIGER” QUERY.
IMAGE SEARCH
• SECOND APPROACH
FIND THE PROBABILITY THAT EACH CLUSTER GENERATES THE QUERY, THEN SAMPLE IMAGES ACCORDING TO THE WEIGHTED CLUSTERS (A SKETCH FOLLOWS BELOW).
SINCE CLUSTER MEMBERSHIP PLAYS AN IMPORTANT ROLE IN GENERATING DOCUMENTS, WE CAN SAY THE CLUSTERS ARE COHERENT.
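A rough sketch of this second approach, assuming per-cluster query scores and hard image-to-cluster assignments are already available (both are illustrative inputs, not the paper's exact representation):

```python
# Hedged sketch: sample result images from clusters weighted by p(Q | c).
import numpy as np

def sample_results(cluster_query_prob, image_clusters, n_results=16, seed=0):
    """cluster_query_prob: (n_clusters,) scores; image_clusters: (n_images,) assignments."""
    rng = np.random.default_rng(seed)
    weights = cluster_query_prob / cluster_query_prob.sum()   # normalize cluster weights
    results = []
    for _ in range(n_results):
        c = rng.choice(len(weights), p=weights)               # pick a cluster by weight
        members = np.flatnonzero(image_clusters == c)
        if members.size:                                      # pick an image from that cluster
            results.append(int(rng.choice(members)))
    return results
```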
IMAGE SEARCH
• PROVIDING A MORE FLEXIBLE METHOD OF SPECIFYING IMAGE FEATURES IS AN IMPORTANT NEXT STEP.
• THIS IS EXPLORED IN MANY “QUERY BY EXAMPLE” IMAGE RETRIEVAL SYSTEMS.
EXAMPLE:
WE CAN QUERY FOR A DOG WITH THE WORD “DOG”; IF WE ALSO WANT BLUE SKY, WE CAN ADD AN IMAGE SEGMENT FEATURE TO THE QUERY.
PICTURES FROM WORDS AND WORDS FROM PICTURES
• THERE ARE TWO TYPES OF APPROACHES FOR LINKING WORDS TO PICTURES AND
PICTURES TO WORDS.
AUTO ILLUSTRATION
AUTO ANNOTATION
AUTO ILLUSTRATION
• “AUTO ILLUSTRATION” – THE PROCESS OF LINKING PICTURES TO WORDS.
• GIVEN A SET OF QUERY ITEMS, Q AND A CANDIDATE DOCUMENT D, WE CAN
EXPRESS THE PROBABILITY THAT A DOCUMENT PRODUCES THE QUERY BY:
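The omitted expression is presumably the same query-emission probability p(Q | d) reconstructed in the image-search section above; candidate documents are ranked by this probability to select illustrations for a given text passage.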
AUTO ANNOTATION
• TO GENERATE WORDS FOR A GIVEN IMAGE:
CONSIDER THE PROBABILITY OF THE IMAGE BELONGING TO THE CURRENT CLUSTER.
CONSIDER THE PROBABILITY OF THE ITEMS IN THE IMAGE BEING GENERATED BY THE NODES AT VARIOUS LEVELS IN THE PATH ASSOCIATED WITH THE CLUSTER.
WORK THE ABOVE OUT FOR ALL CLUSTERS (A SKETCH FOLLOWS BELOW).
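A compact sketch of the procedure above, with all probabilities assumed precomputed from a fitted model (array names and shapes are illustrative, not the paper's code):

```python
# Hedged sketch: mix word-emission probabilities over levels within each cluster,
# then average over clusters weighted by the image's cluster membership.
import numpy as np

def predict_words(cluster_post, level_probs, word_emission, top_n=5):
    """
    cluster_post:   (C,)       p(c | image segments B)
    level_probs:    (C, L)     p(l | c) along the path of cluster c
    word_emission:  (C, L, W)  p(word | l, c)
    Returns indices of the top_n most probable words.
    """
    # p(w | B) = sum_c p(c | B) * sum_l p(l | c) * p(w | l, c)
    per_cluster = np.einsum("cl,clw->cw", level_probs, word_emission)
    word_probs = cluster_post @ per_cluster
    return np.argsort(word_probs)[::-1][:top_n]
```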
AUTO ANNOTATION
WE ARE COMPUTING THE PROBABILITY THAT AN IMAGE EMITS A PROPOSED WORD, GIVEN THE OBSERVED SEGMENTS, B:
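A hedged reconstruction of the omitted formula, consistent with the sketch above (the posterior p(c | B) over clusters given the blobs is an assumed intermediate quantity):

\[
p(w \mid B) \;\propto\; \sum_{c} p(c \mid B)\sum_{l} p(w \mid l, c)\, p(l \mid c),
\qquad
p(c \mid B) \;\propto\; p(c)\prod_{b \in B}\sum_{l} p(b \mid l, c)\, p(l \mid c)
\]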
AUTO ANNOTATION
THE FIGURE SHOWS SOME ANNOTATION RESULTS: THE ORIGINAL IMAGE, THE BLOBWORLD SEGMENTATION, THE COREL KEYWORDS, AND THE PREDICTED WORDS IN RANK ORDER.
AUTO ANNOTATION
• THE TEST IMAGES WERE NOT IN THE TRAINING SET, BUT THEY COME FROM THE SAME SET OF CD’S USED FOR TRAINING.
• THE KEYWORDS IN UPPER-CASE ARE IN THE VOCABULARY.
AUTO ANNOTATION
• TESTING THE ANNOTATION PROCEDURE:
WE USE THE MODEL TO PREDICT THE IMAGE WORDS BASED ONLY ON THE SEGMENTS, THEN COMPARE THE PREDICTED WORDS WITH THE ACTUAL KEYWORDS.
THE TEST IS PERFORMED ON THE TRAINING DATA AND ON TWO DIFFERENT TEST SETS:
1ST SET - A RANDOMLY SELECTED HELD-OUT SET FROM THE PROPOSED TRAINING DATA, COMING FROM THE SAME COREL CD’S.
2ND SET - IMAGES FROM OTHER CD’S.
AUTO ANNOTATION
• QUANTITATIVE PERFORMANCE
USE 160 COREL CD’S, EACH WITH 100 IMAGES (GROUPED BY THEME).
SELECT 80 OF THE CD’S AND SPLIT THEM INTO TRAINING (75%) AND TEST (25%) SETS.
THE REMAINING 80 CD’S FORM A ‘HARDER’ TEST SET.
MODEL SCORING:
N = NUMBER OF WORDS FOR THE IMAGE, R = NUMBER OF WORDS PREDICTED CORRECTLY.
RESULTS
ANNOTATION RESULTS ON THREE KINDS OF TEST DATA, WITH THREE DIFFERENT SCORING METHODS.
RESULTS
• THE TABLE ABOVE SUMMARIZES THE ANNOTATION RESULTS USING THE THREE SCORING METHODS AND THE THREE HELD-OUT SETS.
• WE AVERAGE THE RESULTS OF 5 SEPARATE RUNS WITH DIFFERENT HELD-OUT SETS.
• USING THE COMPARISON WITH SAMPLING FROM THE WORD PRIOR, WE SCORE 3.14 ON THE TRAINING DATA, 2.70 ON NON-TRAINING DATA FROM THE SAME CD SET AS THE TRAINING DATA, AND 1.65 ON TEST DATA TAKEN FROM A COMPLETELY DIFFERENT SET OF CD’S.
DISCUSSION
• PERFORMANCE OF THE SYSTEM CAN BE MEASURED BY TAKING ADVANTAGE OF
ITS PREDICTIVE CAPABILITIES.
• WORDS WITH NO RELEVANCE TO VISUAL CONTENT ACT AS RANDOM NOISE, TAKING PROBABILITY AWAY FROM MORE RELEVANT WORDS.
• SUCH WORDS CAN BE REMOVED BY OBSERVING THAT THEIR EMISSION PROBABILITIES ARE SPREAD OUT OVER THE NODES.
• HOW WELL THIS AUTOMATIC VOCABULARY-REDUCTION METHOD WORKS DEPENDS ON THE NATURE OF THE DATA SET.
REFERENCES
• KOBUS BARNARD AND DAVID FORSYTH, “LEARNING THE SEMANTICS OF WORDS AND PICTURES”, COMPUTER SCIENCE DIVISION, UNIVERSITY OF CALIFORNIA, BERKELEY.
HTTP://WWW.WISDOM.WEIZMANN.AC.IL/~VISION/COURSES/2003_2/BARNARD00LEARNING.PDF
• C. CARSON, S. BELONGIE, H. GREENSPAN AND J. MALIK, “BLOBWORLD: IMAGE SEGMENTATION USING EXPECTATION-MAXIMIZATION AND ITS APPLICATION TO IMAGE QUERYING”, IN REVIEW.
HTTP://WWW.CS.BERKELEY.EDU/~MALIK/PAPERS/CBGM-BLOBWORLD.PDF
QUERIES
THANK YOU