Cindy M. Wong, August 2011
Electronic Imaging Lab
University of Alberta
HUMAN-BASED COMPUTATION FOR
MICROFOSSIL IDENTIFICATION
Outline
Introduction
Evolutionary Prototyping
Human Interaction
Computation Algorithms
Conclusion
Cindy Wong
Aug-11
2
Introduction
Introduction: Motivation
Image understanding is considered an artificial
intelligence (AI) complete problem
Human-based computation is gaining popularity
as a method to solve AI-complete problems
Progress in this area may be made with a
concrete application of sufficient importance
Microfossil identification is one such application,
which is the focus of this work
Introduction: Crowdsourcing
[Figure: crowdsourcing illustration with labels Humans and Computers]
Introduction: Foraminifera
Microfossils help to locate hydrocarbon deposits
via biostratigraphy and to study prehistoric
environmental conditions via geochemistry
Foraminifera (forams) – single-celled protozoa
with shells (~1 mm) that live in bodies of water
[Figure: example genera Acarinina, Subbotina and Morozovella]
Identified manually by experts at present
Research has been performed on automated
identification with limited success
Introduction: Automated
Identification
Rule-based approaches need a person to
input features
Require experts to manually view and
manipulate specimens (example: VIDES)
Artificial Neural Network based approaches
involve training a system
Need high-quality SEM images (COGNIS),
produce high incorrect identification rates (COGNIS Light),
or are difficult to understand (SYRACO)
Evolutionary Prototyping
Evolutionary Prototyping:
Design Cycle
[Figure: design cycle of Requirements Refinement, Prototype Modification, and Testing and Validation]
Ideal design cycle because:
Exploratory research requires validation
Crowdsourcing is unpredictable
Modifying old prototypes saves time
Evolutionary Prototyping:
Prototypes
Prototype timeline (in years, where year 0 is Jan. 1, 2006):
CASSIE 1 (0 to 1 1/12)
CASSIE 2 (1 1/12 to 3 1/2)
Microfossil Quest (3 1/2 to 5 2/3)
Evolutionary Prototyping:
First Prototype
[Figure: system modules: Specimen Acquisition, Computation Algorithms, Human Interaction]
Computer-aided system for specimen
identification and examination (CASSIE) 1
prototype (Jan. 2006–Feb. 2007)
Requirement: reduce expert workload
Modification: clustering using image correlation
to compare similarity
Validation: identifications obtained via
Microfossil Wiki for analysis
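The slides do not say which form of image correlation was used. As a minimal sketch, assuming equal-size grayscale images stored as nested lists and a zero-mean normalized (Pearson) correlation, a pairwise similarity for clustering could be computed as follows. The function names `correlation` and `similarity_matrix` are hypothetical.

```python
import math

def correlation(image_a, image_b):
    """Zero-mean normalized correlation of two equal-size grayscale images.

    Assumption: images are lists of rows of pixel intensities. The result
    lies in [-1, 1]; higher values mean more similar images.
    """
    a = [p for row in image_a for p in row]
    b = [p for row in image_b for p in row]
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a constant image has no pattern to correlate
    return cov / (norm_a * norm_b)

def similarity_matrix(images):
    """Pairwise similarities for a dict of specimen id -> image."""
    ids = sorted(images)
    return {frozenset((i, j)): correlation(images[i], images[j])
            for i in ids for j in ids if i < j}

# Made-up 2x2 images, for illustration only
images = {
    "s1": [[0, 1], [2, 3]],
    "s2": [[0, 2], [4, 6]],   # same pattern as s1, brighter
    "s3": [[3, 2], [1, 0]],   # inverted pattern
}
similarities = similarity_matrix(images)
```

Normalizing by the mean and magnitude makes the score insensitive to overall brightness and contrast, which is one plausible reason to prefer it over a raw pixel difference.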
Evolutionary Prototyping:
Second Prototype
[Figure: system modules: Specimen Acquisition, Computation Algorithms, Human Interaction, Specimen Dissemination]
CASSIE 2 prototype (Feb. 2007–Jun. 2009)
Requirement: improve digital representations to
account for illumination variability
Modification: automatic video capture
Validation: ground truth identifications were
difficult to obtain, but variability was addressed
Evolutionary Prototyping:
Third Prototype
[Figure: system modules: Specimen Acquisition, Human Interaction, Specimen Dissemination, Computation Algorithms]
Microfossil Quest prototype (Jun. 2009–Aug.
2011)
Requirement: transition from computer-aided to
crowdsourcing system
Modification: leverage crowdsourcing
Validation: individual components validated
Evolutionary Prototyping:
Languages and Architectures
Quest code organization, execution location,
inter- and intra-component interaction, and
programming languages
Human Interaction
Human Interaction: Overview
Created the Microfossil Quest website to
interact with volunteers and inform users
For this human-based computation system,
the human interaction part incorporates
citizen science in its design
Human Interaction:
Organization
Microfossil Quest site is navigated using a menu
for non-linear navigation
Layout goes left-to-right from more specific
information to more general information
[Figure: site menu, running from Specific (left) to General (right)]
Human Interaction: Home
Users search the database for a subset of specimens or use the default search
Users update captions to update specimen identifications
Website demo (http://www.ece.ualberta.ca/~imagesci/microfossilQuestO865)
Human Interaction: Tutorial
Training for volunteers and information for other users
Focus is placed on teaching features
Topics are organized top-to-bottom, from those requiring the least knowledge to those requiring the most
Human Interaction: System
Gives an overview of the Microfossil Quest system
Users are able to click on the different modules to get more details
[Figure: system modules: Users, Specimen Acquisition, Knowledge Base, Computer Intelligence, Human Intelligence]
Computation Algorithms
Computation Algorithms:
Overview
Dynamic hierarchical identification (DHI)
Unsupervised learning
Supervised learning
Dynamic learning
Experimental results
Computation Algorithms:
Unsupervised Learning
Generates clusters to increase thoroughness
Does not require user input
Uses agglomerative hierarchical clustering
Formation of clusters visualized with trees
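Agglomerative hierarchical clustering starts with every specimen in its own cluster and repeatedly merges the two most similar clusters, recording the similarity at which each merge happens. The sketch below is illustrative only: the linkage rule (here the smallest pairwise similarity between two clusters, i.e. complete linkage) is an assumption, and the function name `agglomerative_tree` and the specimen labels are hypothetical.

```python
from itertools import combinations

def agglomerative_tree(similarity):
    """Build a merge tree from pairwise similarities.

    similarity: dict mapping frozenset({a, b}) -> similarity value.
    Returns a list of (cluster_a, cluster_b, merge_level) tuples in the
    order the merges happen (most similar first).
    """
    specimens = sorted({s for pair in similarity for s in pair})
    clusters = [frozenset([s]) for s in specimens]
    merges = []

    def linkage(c1, c2):
        # Assumed rule: two clusters are as similar as their
        # least similar pair of members (complete linkage).
        return min(similarity[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        c1, c2 = max(combinations(clusters, 2),
                     key=lambda pair: linkage(*pair))
        level = linkage(c1, c2)
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((c1, c2, level))
    return merges

# Made-up similarities for four specimens, for illustration only
sims = {
    frozenset(("A", "B")): 0.9, frozenset(("C", "D")): 0.7,
    frozenset(("A", "C")): 0.5, frozenset(("A", "D")): 0.4,
    frozenset(("B", "C")): 0.3, frozenset(("B", "D")): 0.2,
}
tree = agglomerative_tree(sims)
levels = [level for _, _, level in tree]
```

Because no labels are consulted, the tree can be built before any specimen has been identified; the merge levels are what later stages can reuse as a measure of how tightly a cluster holds together.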
[Figure sequence: agglomerative clustering of specimens 2104, 2105, 1472, 1205 and 1633. Starting from ten pairwise similarities (0.9141, 0.7087, 0.5854, 0.5027, 0.4118, 0.4104, 0.3122, 0.3066, 0.2474, 0.2458), values are eliminated over successive slides as clusters merge, leaving a tree with labelled levels 0.9, 0.7, 0.5 and 0.2]
Computation Algorithms:
Supervised Learning
Propagates identifications reliably
Assumes only some specimen identifications are
known (direct identifications)
Uses the trees to propagate identifications
(indirect identifications)
Propagates identifications according to majority
identification in the cluster
Assigns confidence level for indirect
identifications according to merge level
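The propagation step can be sketched as a walk over the merges of the tree, tightest cluster first. This is a minimal illustration, not the thesis implementation: it assumes that a tie between labels gives no majority, that a specimen keeps the first indirect identification it receives, and that direct identifications carry confidence 1.0. The function name `propagate` and the specimen labels are hypothetical.

```python
from collections import Counter

def propagate(merges, direct):
    """Propagate direct identifications through a merge tree.

    merges: list of (cluster_a, cluster_b, merge_level), most similar first.
    direct: dict specimen -> identification supplied by a person.
    Returns dict specimen -> (identification, confidence, kind).
    """
    # Assumption: a direct identification has confidence 1.0
    result = {s: (label, 1.0, "direct") for s, label in direct.items()}
    for cluster_a, cluster_b, level in merges:
        members = cluster_a | cluster_b
        votes = Counter(direct[s] for s in members if s in direct)
        if not votes:
            continue  # nothing known in this cluster yet
        ranked = votes.most_common(2)
        if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
            continue  # tie: no majority identification (assumed rule)
        majority = ranked[0][0]
        for s in members:
            if s not in result:
                # Confidence of an indirect identification = merge level
                result[s] = (majority, level, "indirect")
    return result

A, B, C, D = (frozenset([x]) for x in "ABCD")
merges = [(A, B, 0.9), (C, D, 0.7), (A | B, C | D, 0.2)]
direct = {"A": "M. subb", "C": "M. vela"}
identified = propagate(merges, direct)
```

In this made-up example, specimen B inherits "M. subb" at confidence 0.9 and D inherits "M. vela" at 0.7; the final merge at 0.2 changes nothing because the two labels tie.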
[Figure: example tree with specimens labelled M. subb or M. vela, showing identifications propagated through clusters at levels 0.9, 0.75, 0.51, 0.35 and 0.108]
Computation Algorithms:
Dynamic Learning
Increases throughput with a priority
generation algorithm
Assumes users are only able to identify a small
number of specimens at a time
Encourages users to identify specimens
according to what increases the average
confidence of the dataset the most
Calculates distance, the amount of improvement
in confidence if a specimen is identified, to determine
priority (priority equals one minus merge level)
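The priority rule stated on this slide, one minus merge level, can be sketched as below. The sketch assumes each specimen without a direct identification already has a confidence equal to the merge level of its indirect identification, and it ranks only those specimens. The slide figure also shows ∞ and −∞ entries whose exact role cannot be recovered from the transcript, so they are left out here. The function name `prioritize` and the specimen labels are hypothetical.

```python
def prioritize(confidence, direct):
    """Rank specimens for identification by expected improvement.

    confidence: dict specimen -> merge level of its current
                indirect identification (between 0 and 1).
    direct: set of specimens already identified by a person.
    Returns a list of (specimen, priority), highest priority first.
    """
    priorities = {
        specimen: 1.0 - level          # priority = one minus merge level
        for specimen, level in confidence.items()
        if specimen not in direct      # already identified: nothing to gain
    }
    return sorted(priorities.items(), key=lambda item: item[1], reverse=True)

# Made-up confidences, for illustration only
confidence = {"s1": 0.9, "s2": 0.2, "s3": 0.5, "s4": 1.0}
queue = prioritize(confidence, direct={"s4"})
order = [specimen for specimen, _ in queue]
```

A specimen whose indirect identification rests on a weak merge (low level) gains the most from a direct identification, so it is offered to volunteers first.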
[Figure: worked priority example for items labelled 2011 to 2017. Entries include merge levels, priorities computed as one minus merge level (for example, 1 - 0.9), and ∞ or −∞ markers; the resulting priority ranks run from (1) to (6)]
Computation Algorithms:
Multiple Trees
[Table: taxonomic detail (Order, Genus, Species) per specimen, each marked known or unknown]
Computation algorithms depend on taxonomic
detail available for specimens in the tree
Run algorithms with different trees using specimens
from the top to the bottom of the table
Computation Algorithms:
Experimental Results
Validation of results was done by comparing DHI
to a standard identification algorithm: k-nearest
neighbours (KNN)
Testing materials used were 238 specimens with
particle-based identifications (ground truth)
Examined:
correct identification rates
incorrect identification rates
impact of thresholding
average confidences
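The slides do not give formulas for these quantities. A minimal sketch under stated assumptions: rates are fractions of the whole ground-truth dataset, an identification below the confidence threshold counts as unidentified, and unidentified specimens contribute zero to the average confidence. The function name `evaluate` and the sample data are hypothetical.

```python
def evaluate(predicted, ground_truth, threshold=0.0):
    """Compare generated identifications with ground truth.

    predicted: dict specimen -> (identification, confidence).
    ground_truth: dict specimen -> true identification.
    Returns (correct_rate, incorrect_rate, average_confidence).
    """
    total = len(ground_truth)
    correct = incorrect = 0
    confidence_sum = 0.0
    for specimen, truth in ground_truth.items():
        label, confidence = predicted.get(specimen, (None, 0.0))
        if label is None or confidence < threshold:
            continue  # treated as unidentified (assumption)
        confidence_sum += confidence
        if label == truth:
            correct += 1
        else:
            incorrect += 1
    return correct / total, incorrect / total, confidence_sum / total

# Made-up data, for illustration only
truth = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
predicted = {"a": ("X", 1.0), "b": ("X", 0.9), "c": ("X", 0.6)}
rates_all = evaluate(predicted, truth)
rates_strict = evaluate(predicted, truth, threshold=0.7)
```

With these definitions the correct and incorrect rates need not sum to one; the remainder is the share of the dataset left unidentified.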
Computation Algorithms:
Correct Rates
Correct rates illustrate the thoroughness in
dataset identification
DHI has more thorough and predictable results
than KNN
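For reference, a k-nearest neighbours baseline of the general kind named here can be sketched as follows. The slides do not state the value of k or the similarity measure used in the comparison, so both are assumptions, as are the function name `knn_identify` and the toy one-dimensional features.

```python
from collections import Counter

def knn_identify(specimen, similarity, direct, k=3):
    """Identify one specimen by majority vote of its k most similar
    directly identified specimens.

    similarity: function (a, b) -> similarity score, higher = closer.
    direct: dict specimen -> known identification.
    Returns the majority identification, or None if nothing is known.
    """
    candidates = [s for s in direct if s != specimen]
    neighbours = sorted(candidates,
                        key=lambda s: similarity(specimen, s),
                        reverse=True)[:k]
    if not neighbours:
        return None
    votes = Counter(direct[s] for s in neighbours)
    return votes.most_common(1)[0][0]

# Toy 1-D features standing in for image similarity (illustration only)
feature = {"a": 0.0, "b": 0.1, "c": 1.0, "d": 1.1, "q": 0.2}

def toy_similarity(x, y):
    return -abs(feature[x] - feature[y])

known = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
guess = knn_identify("q", toy_similarity, known, k=3)
```

Unlike the tree-based propagation, this baseline always returns an answer once any specimen is identified, with no confidence attached to it.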
Computation Algorithms:
Incorrect Rates
Incorrect rates show the reliability of the
generated identifications in the dataset
DHI is more reliable and predictable than KNN
Computation Algorithms:
Threshold Results
Lower thresholds imply more leveraging of each direct identification
Comparing threshold results illustrates how
limiting propagation confidence affects
throughput of identification
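The effect of a confidence threshold can be illustrated by counting how many propagated identifications survive at each setting. This is a sketch only, assuming an identification is accepted when its confidence is at least the threshold; the function names `accepted_fraction` and `sweep` and the confidence values are hypothetical.

```python
def accepted_fraction(confidences, threshold):
    """Fraction of specimens whose propagated identification has a
    confidence at or above the threshold."""
    if not confidences:
        return 0.0
    accepted = sum(1 for c in confidences.values() if c >= threshold)
    return accepted / len(confidences)

def sweep(confidences, thresholds):
    """Accepted fraction at each threshold, in the order given."""
    return [(t, accepted_fraction(confidences, t)) for t in thresholds]

# Made-up confidences of indirect identifications, for illustration only
confidences = {"s1": 0.9, "s2": 0.75, "s3": 0.51, "s4": 0.35}
results = sweep(confidences, [0.2, 0.5, 0.8])
```

In this example the accepted fraction falls from all four specimens at a threshold of 0.2 to one in four at 0.8, which is the trade-off between reliability and throughput that the threshold controls.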
Computation Algorithms:
Average Confidence
Average confidence illustrates how quickly
dataset identifications are propagated
Results predict thoroughness of correct rates
Conclusion
Conclusion: Contributions
(Evolutionary Prototyping)
Created the first crowdsourcing design for
microfossil identification
Developed components of the Microfossil Quest
prototype, a crowdsourcing approach evolved
from a computer-aided approach
Provided a case study on developing a
crowdsourcing project using the evolutionary
prototyping design cycle
Conclusion: Contributions
(Human Interaction)
Unlike most crowdsourcing projects that involve
websites, the Microfossil Quest design:
Enables volunteer control over identification tasks
Incorporates educational material on the system
A new interactive digital representation, which
presents illumination and depth information,
was included in the website – it is a contribution
to a coauthored Journal of Microscopy paper
Conclusion: Contributions
(Computation Algorithms)
Created a supervised learning algorithm to
propagate identifications using tree structures
computed by unsupervised learning
Created a dynamic learning algorithm, which
prioritizes specimens for identification
Testing of the DHI algorithm verifies an increase
in thoroughness, reliability, predictability, and
throughput, when compared to a benchmark
KNN identification algorithm
Acknowledgements
Thank you to Dr. Dileepan Joseph, Dr. Kamal
Ranaweera, and Adam Harrison for their
guidance and support
Thank you to family and friends for their support
through both undergraduate and graduate
school
Appendix
Special Cases: Genus
Correct and incorrect genus rates versus image quality:
(left) using specialist ratings of quality (S. Bains); (right)
using automatic ratings of quality (Fourier method)
Special Cases: Species
Correct and incorrect species rates versus image quality:
(left) using specialist ratings of quality (S. Bains); (right)
using automatic ratings of quality (Fourier method)