MSc_2011x - University of Alberta

Cindy M. Wong, August 2011
Electronic Imaging Lab
University of Alberta
HUMAN-BASED COMPUTATION FOR MICROFOSSIL IDENTIFICATION
Outline
 Introduction
 Evolutionary Prototyping
 Human Interaction
 Computation Algorithms
 Conclusion
Cindy Wong
Aug-11
2
Introduction
Introduction: Motivation
 Image understanding is considered an artificial
intelligence-complete (AI-complete) problem
 Human-based computation is gaining popularity
as a method to solve AI-complete problems
 Progress in this area may be made with a
concrete application of sufficient importance
 Microfossil identification is one such application,
which is the focus of this work
Introduction: Crowdsourcing
[Figure: crowdsourcing diagram labelled Humans and Computers]
Introduction: Foraminifera
 Microfossils help to locate hydrocarbon deposits
via biostratigraphy and to study prehistoric
environmental conditions via geochemistry
 Foraminifera (forams) – single-celled protozoa
with shells (~1 mm) that live in bodies of water
[Figure: specimens labelled Acarinina, Subbotina, and Morozovella]
 Identified manually by experts at present
 Research has been performed on automated
identification with limited success
Introduction: Automated Identification
Rule-based approaches need a person to
input features
 Require experts to manually view and
manipulate specimens (example: VIDES)
Artificial neural network-based approaches
involve training a system
 Need high-quality SEM images (COGNIS),
generate high incorrect rates (COGNIS Light),
or are difficult to understand (SYRACO)
Evolutionary Prototyping
Evolutionary Prototyping: Design Cycle
[Figure: design cycle linking Requirements Refinement, Prototype Modification, and Testing and Validation]
 Ideal design cycle because:
 Exploratory research requires validation
 Crowdsourcing is unpredictable
 Modifying old prototypes saves time
Evolutionary Prototyping: Prototypes
 Prototype timeline (year 0 is Jan. 1, 2006):
 CASSIE 1 (0 to 1 1/12)
 CASSIE 2 (1 1/12 to 3 1/2)
 Microfossil Quest (3 1/2 to 5 2/3)
Evolutionary Prototyping: First Prototype
[Figure: system diagram with Specimen Acquisition, Computation Algorithms, and Human Interaction components]
 Computer-aided system for specimen
identification and examination (CASSIE) 1
prototype (Jan. 2006–Feb. 2007)
 Requirement: reduce expert workload
 Modification: clustering using image correlation
to compare similarity
 Validation: identifications obtained via
Microfossil Wiki for analysis
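The slides do not say which correlation measure CASSIE 1 used to compare specimens. As a minimal sketch, assuming zero-mean normalized correlation between two equally sized grayscale images stored as nested lists (the function name `image_correlation` and the toy images are hypothetical):

```python
import math

def image_correlation(a, b):
    """Zero-mean normalized correlation of two equally sized grayscale images.

    Returns a value in [-1, 1]; 1 means the images match up to brightness
    and contrast. This is an assumed measure, not one named in the slides.
    """
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same number of pixels")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) *
                    sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0  # a flat image carries no pattern

# Hypothetical 2x2 images for illustration only.
img = [[10, 20], [30, 40]]
brighter = [[2 * p + 5 for p in row] for row in img]  # same pattern, new contrast
inverted = [[50 - p for p in row] for row in img]     # reversed pattern
flat = [[7, 7], [7, 7]]
```

A similarity of this kind can feed a clustering step directly, since every pair of specimens receives one score regardless of overall brightness.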
Evolutionary Prototyping: Second Prototype
[Figure: system diagram with Specimen Acquisition, Computation Algorithms, Human Interaction, and Specimen Dissemination components]
 CASSIE 2 prototype (Feb. 2007–Jun. 2009)
 Requirement: improve digital representations to
account for illumination variability
 Modification: automatic video capture
 Validation: ground truth identifications were difficult
to obtain, but illumination variability was addressed
Evolutionary Prototyping: Third Prototype
[Figure: system diagram with Specimen Acquisition, Human Interaction, Specimen Dissemination, and Computation Algorithms components]
 Microfossil Quest prototype (Jun. 2009–Aug.
2011)
 Requirement: transition from computer-aided to
crowdsourcing system
 Modification: leverage crowdsourcing
 Validation: individual components validated
Evolutionary Prototyping: Languages and Architectures
 Quest code organization, execution location,
inter- and intra-component interaction, and
programming languages
Human Interaction
Human Interaction: Overview
 Created the Microfossil Quest website to
interact with volunteers and inform users
 For this human-based computation system,
the human interaction part incorporates
citizen science in its design
Human Interaction: Organization
 The Microfossil Quest site uses a menu for
non-linear navigation
 Layout goes left-to-right from more specific
information to more general information
[Figure: menu layout labelled Specific and General]
Human Interaction: Home
 Users search the
database for a subset
of specimens or use the
default search
 Users update captions
to update specimen
identifications
 Website demo
(http://www.ece.ualberta.ca/~imagesci/microfossilQuestO865)
Human Interaction: Tutorial
 Training for volunteers
and information for
other users
 Focus is placed on
teaching features
 Topics are organized top to bottom,
from least to most required
knowledge
Human Interaction: System
 Gives an overview of the Microfossil Quest system
 Users are able to click on the different modules to get more details
[Figure: system diagram with Users, Specimen Acquisition, Knowledge Base, Computer Intelligence, and Human Intelligence modules]
Computation Algorithms
Computation Algorithms: Overview
 Dynamic hierarchical identification (DHI)
 Unsupervised learning
 Supervised learning
 Dynamic learning
 Experimental results
Computation Algorithms: Unsupervised Learning
 Generates clusters to increase thoroughness
 Does not require user input
 Uses agglomerative hierarchical clustering
 Formation of clusters visualized with trees
[Figure, built up over five slides: tree of specimens 2104, 2105, 1472, 1205, and 1633, starting from ten pairwise similarity values and ending with merge levels 0.9, 0.7, 0.5, and 0.2]
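The unsupervised learning step builds a tree by repeatedly merging the most similar clusters. The slides name agglomerative hierarchical clustering but not the linkage rule, so the following is only a sketch that assumes average linkage; the function `agglomerate`, the specimen names, and the similarity values are all hypothetical.

```python
from itertools import combinations

def agglomerate(names, sim):
    """Average-linkage agglomerative clustering on a similarity table.

    names: list of specimen labels.
    sim:   dict mapping frozenset({a, b}) to a similarity in [0, 1].
    Returns the merges as (level, left_members, right_members), from the
    most similar pair down to the final merge at the root of the tree.
    """
    clusters = [(n,) for n in names]
    merges = []

    def linkage(c1, c2):
        # Assumed rule: mean similarity over all cross-cluster pairs.
        pairs = [sim[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((linkage(c1, c2), c1, c2))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical pairwise similarities between four specimens.
sim = {
    frozenset(("A", "B")): 0.9,
    frozenset(("C", "D")): 0.7,
    frozenset(("A", "C")): 0.3,
    frozenset(("A", "D")): 0.2,
    frozenset(("B", "C")): 0.4,
    frozenset(("B", "D")): 0.1,
}
merges = agglomerate(["A", "B", "C", "D"], sim)
for level, left, right in merges:
    print(f"{level:.2f}: {left} + {right}")
```

Each recorded level plays the role of a merge level in the tree: tight clusters merge at high similarity, and the root merges at the lowest.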
Computation Algorithms: Supervised Learning
 Propagates identifications reliably
 Assumes only some specimen identifications are
known (direct identifications)
 Uses the trees to propagate identifications
(indirect identifications)
 Propagates identifications according to majority
identification in the cluster
 Assigns confidence level for indirect
identifications according to merge level
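One possible reading of these rules can be sketched as follows. It assumes that an unidentified specimen takes the most common direct identification in the tightest cluster where one label wins outright, and that the merge level of that cluster becomes its confidence. The function `propagate`, the input layout, and the toy tree are hypothetical; `M. subb` and `M. vela` are the label abbreviations used on the slides.

```python
from collections import Counter

def propagate(merges, direct):
    """Assign indirect identifications from the tightest cluster with a winner.

    merges: list of (merge_level, members), ordered from tightest to loosest.
    direct: dict of specimen -> identification supplied by users.
    Returns dict of specimen -> (identification, confidence). Direct
    identifications get confidence 1.0, which is an assumption of this sketch.
    """
    result = {s: (label, 1.0) for s, label in direct.items()}
    for level, members in merges:
        votes = Counter(direct[s] for s in members if s in direct)
        if not votes:
            continue  # nothing known in this cluster yet
        ranked = votes.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            continue  # tie between labels: defer to a larger cluster
        for s in members:
            # Keep any identification already assigned at a tighter level.
            result.setdefault(s, (ranked[0][0], level))
    return result

# Hypothetical tree over five specimens and three direct identifications.
merges = [
    (0.9, ("a", "b")),
    (0.7, ("c", "d")),
    (0.35, ("a", "b", "c", "d")),
    (0.1, ("a", "b", "c", "d", "e")),
]
direct = {"a": "M. subb", "c": "M. vela", "d": "M. vela"}
identified = propagate(merges, direct)
```

In this toy tree, specimen `b` inherits `M. subb` at confidence 0.9 from its close neighbour, while `e` only receives `M. vela` at confidence 0.1 from the root cluster.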
[Figure: tree whose leaves are labelled M. subb or M. vela, with the values 0.9, 0.75, 0.51, 0.35, and 0.108 at its merges, showing how identifications propagate]
Computation Algorithms: Dynamic Learning
 Serves to increase throughput with a priority
generation algorithm
 Assumes users are only able to identify a small
number of specimens at a time
 Encourages users to identify specimens
according to what increases the average
confidence of the dataset the most
 Calculates distance, or amount of improvement
if identified, to determine priority (one minus
merge level equals new priority)
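The priority rule in the last bullet can be illustrated with a short sketch. It assumes that each specimen already has a confidence equal to the merge level of its current identification, and that directly identified specimens are pushed to the bottom of the queue with a priority of negative infinity. The function `priorities` and the toy values are hypothetical, and the full algorithm in the thesis may weigh more than this single term.

```python
import math

def priorities(confidence, direct):
    """Rank specimens by how much a direct identification could add.

    confidence: dict specimen -> current confidence (merge level) in [0, 1].
    direct:     set of specimens already identified by a user.
    Returns (order, scores): specimens sorted by descending priority, and
    the priority of each one (one minus merge level, or -inf if identified).
    """
    scores = {
        s: (-math.inf if s in direct else 1.0 - c)
        for s, c in confidence.items()
    }
    order = sorted(scores, key=lambda s: scores[s], reverse=True)
    return order, scores

# Hypothetical confidences: "a" was identified directly by a user.
confidence = {"a": 1.0, "b": 0.9, "c": 0.5, "d": 0.2}
order, scores = priorities(confidence, direct={"a"})
```

Here `d` comes first because its identification is the least certain, so asking a volunteer about it would raise the average confidence of the dataset the most under this assumption.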
[Figure: worked example of priority generation on a tree with leaves labelled 2011 to 2017, showing merge levels, priorities between −∞ and ∞ (for example 1 − 0.9), and the resulting priority order (1) to (6)]
Computation Algorithms: Multiple Trees
[Table: specimens marked known or unknown at the Order, Genus, and Species levels]
 Computation algorithms depend on taxonomic
detail available for specimens in the tree
 Run algorithms with different trees using specimens
from the top to the bottom of the table
Computation Algorithms: Experimental Results
 Results were validated by comparing DHI
to a standard identification algorithm: k-nearest
neighbours (KNN)
 Testing materials used were 238 specimens with
particle-based identifications (ground truth)
 Examined:
 correct identification rates
 incorrect identification rates
 impact of thresholding
 average confidences
Computation Algorithms: Correct Rates
 Correct rates illustrate the thoroughness in
dataset identification
 DHI has more thorough and predictable results
than KNN
Computation Algorithms: Incorrect Rates
 Incorrect rates show the reliability of the
generated identifications in the dataset
 DHI is more reliable and predictable than KNN
Computation Algorithms: Threshold Results
 Lower thresholds imply more leveraging of propagated identifications
 Comparing threshold results illustrates how
limiting propagation confidence affects
throughput of identification
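The effect of a confidence threshold can be shown with a small sketch. The slides do not define the rates precisely, so this assumes that identifications below the threshold count as unknown and that both rates are fractions of the whole dataset. The function `rates` and the toy data are hypothetical.

```python
def rates(predicted, truth, threshold):
    """Correct and incorrect rates when only confident identifications count.

    predicted: dict specimen -> (identification, confidence).
    truth:     dict specimen -> ground-truth identification.
    Identifications with confidence below `threshold` are treated as unknown.
    Returns (correct_rate, incorrect_rate) as fractions of all specimens.
    """
    accepted = {
        s: label for s, (label, conf) in predicted.items() if conf >= threshold
    }
    correct = sum(1 for s, label in accepted.items() if truth[s] == label)
    incorrect = len(accepted) - correct
    total = len(truth)
    return correct / total, incorrect / total

# Hypothetical identifications with confidences, plus ground truth.
predicted = {
    "a": ("X", 1.0),
    "b": ("X", 0.9),
    "c": ("Y", 0.5),
    "d": ("X", 0.2),
}
truth = {"a": "X", "b": "X", "c": "Y", "d": "Y"}

for t in (0.0, 0.4, 0.95):
    print(t, rates(predicted, truth, t))
```

In this toy case, raising the threshold from 0.0 to 0.4 removes the one wrong low-confidence identification, and raising it further to 0.95 also discards correct ones, which is the trade-off between reliability and throughput.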
Computation Algorithms: Average Confidence
 Average confidence illustrates how quickly
dataset identifications are propagated
 Results predict thoroughness of correct rates
Conclusion
Conclusion: Contributions (Evolutionary Prototyping)
 Created the first crowdsourcing design for
microfossil identification
 Developed components of the Microfossil Quest
prototype, a crowdsourcing approach evolved
from a computer-aided approach
 Provided a case study on developing a
crowdsourcing project using the evolutionary
prototyping design cycle
Conclusion: Contributions (Human Interaction)
 Unlike most crowdsourcing projects that involve
websites, the Microfossil Quest design:
 Enables volunteer control over identification tasks
 Incorporates educational material on the system
 A new interactive digital representation, which
presents illumination and depth information,
was included in the website – it is a contribution
to a coauthored Journal of Microscopy paper
Conclusion: Contributions (Computation Algorithms)
 Created a supervised learning algorithm to
propagate identifications using tree structures
computed by unsupervised learning
 Created a dynamic learning algorithm, which
prioritizes specimens for identification
 Testing of the DHI algorithm verifies an increase
in thoroughness, reliability, predictability, and
throughput, when compared to a benchmark
KNN identification algorithm
Acknowledgements
 Thank you to Dr. Dileepan Joseph, Dr. Kamal
Ranaweera, and Adam Harrison for their
guidance and support
 Thank you to family and friends for their support
through both undergraduate and graduate
school
Appendix
Special Cases: Genus
 Correct and incorrect genus rates versus image quality:
(left) using specialist ratings of quality (S. Bains); (right)
using automatic ratings of quality (Fourier method)
Special Cases: Species
 Correct and incorrect species rates versus image quality:
(left) using specialist ratings of quality (S. Bains); (right)
using automatic ratings of quality (Fourier method)