
Do Humans Make Good Observers – and Can They Reliably Fuse Information?
Dr. Mark Bedworth
MV Concepts Ltd.
[email protected]
What we will cover:
• The decision making process
• The information fusion context
• The reliability of the process
• Where the pitfalls lie
• How not to get caught out
• Suggestions for next steps
What we will not cover:
• Systems design and architectures
• Counter-piracy specifics
• Inferencing frameworks
• Tracking
• Multi-class problems
• Extensive mathematics
• In fact… most of the detail!
Our objectives:
• Understanding of the context of data fusion for
decision making
• Quantitative grasp of a few key theories
• Appreciation of how to put the theory into practice
• Knowledge of where the gaps in theory remain
Warning
This presentation contains
audience participation experiments
Decision Making
• To make an informed decision:
– Obtain data on the relevant factors
– Reason within the domain context
– Understand the possible outcomes
– Have a method of implementation
Boyd Cycle
• This is captured more formally as a fusion
architecture:
– Observe: acquire data
– Orient: form perspective
– Decide: determine course of action
– Act: put into practice
• Also called OODA loop
OODA loop
[Figure: the OODA loop – Observe → Orient → Decide → Act]
Adversarial OODA Loops
[Figure: two adversarial OODA loops, own and adversary, each observing and acting on the shared physical world and feeding its own information]
Winning the OODA Game
• To achieve dominance:
– Make better decisions
– In a more timely manner
– And implement more effectively
Dominance History
• Action dominance (-A)
– Longer range, more destructive, more accurate
weapons
• Observation dominance (O-)
– Longer range, more robust, more accurate sensors
• Information dominance (-O-D-)
– More timely and relevant information with better support
to the decision maker
Information Dominance
Part One: Orientation
“Having acquired relevant data;
to undertake reasoning about the data
within the domain context to form a
perspective of the current situation;
so that an informed decision can
subsequently be made”
A number of approaches
• Fusion of hard decisions
– Majority rule
– Weighted voting
– Maximum a posteriori fusion
– Behaviour knowledge space
• Fusion of soft decisions
– Probability fusion
Reasoning Frameworks
• Boolean
– Truth and falsehood
• Fuzzy (Zadeh)
– Vagueness
• Evidential (Dempster-Shafer)
– Belief and ignorance
• Probabilistic (Bayesian)
– Uncertainty
Probability theory
• 0 ≤ P(H) ≤ 1
• if P(H)=1
then H is certain to occur
• P(H) + P(~H) = 1
either H or not-H is certain to occur (negation rule)
• P(G,H) = P(G|H) P(H) = P(H|G) P(G)
the joint probability is the conditional probability
multiplied by the prior (conjunction rule)
Bayes’ Theorem
P(H|X) = P(X|H) P(H) / P(X)
where P(H|X) is the posterior probability, P(X|H) the likelihood, P(H) the prior probability and P(X) the marginal likelihood
Perspective Calculation
• Usually the marginal likelihood is awkward
to compute
– But is not needed since it is independent of the
hypothesis
– Compute the products of the likelihoods and
priors; then normalise over hypotheses
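As a minimal sketch of that normalisation step (plain Python; the two-hypothesis likelihoods and priors below are illustrative values of my own, not from the talk):

```python
def posteriors(likelihoods, priors):
    """Return P(h|X) for each hypothesis h, given P(X|h) and P(h).

    The marginal likelihood P(X) is never computed explicitly: we form
    likelihood x prior for every hypothesis and normalise over hypotheses.
    """
    unnormalised = {h: likelihoods[h] * priors[h] for h in priors}
    total = sum(unnormalised.values())  # this total is, in effect, P(X)
    return {h: value / total for h, value in unnormalised.items()}

# Illustrative two-hypothesis example (made-up numbers):
print(posteriors({"threat": 0.60, "no threat": 0.10},
                 {"threat": 0.05, "no threat": 0.95}))
```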
Human Fusion Experiment (1)
• A threat is present 5% of the time it is looked for
• Observers A and B both independently look for the
threat
• Both report an absence of the threat with posterior
probabilities 70% and 80%
• What is the fused probability that the threat is
absent?
Human Fusion Experiment (2)
• Threat absent ≡ the hypothesis (H)
• P(~H) = 0.05
• P(H) = 0.95
• P(H|XA) = 0.70
• P(H|XB) = 0.80
• P(H|XA,XB) = ?
Human Fusion Experiment (3)
[Figure: probability scale for H = no threat – certainty 1.00, prior P(H) = 0.95, report A P(H|XA) = 0.70, report B P(H|XB) = 0.80]
Conditional Independence
• Assume the data to be conditionally independent given the
class:
P(A, B|H) = P(A|H) P(B|H)
• Note that this does not necessarily imply:
P(A, B) = P(A) P(B)
[Figure: four scatter plots of sensor 1 measurement against sensor 2 measurement – two conditionally independent cases and two that are not conditionally independent]
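A small simulation makes the distinction concrete. This sketch assumes a simple two-class setup of my own (class values 0 and 5, unit-variance noise): given the class, the two sensor readings are independent, yet marginally they are strongly correlated.

```python
import random

random.seed(0)
pairs = []
for _ in range(10_000):
    cls = random.choice([0.0, 5.0])      # hidden class (assumed values)
    a = cls + random.gauss(0.0, 1.0)     # sensor 1: class plus independent noise
    b = cls + random.gauss(0.0, 1.0)     # sensor 2: class plus independent noise
    pairs.append((a, b))

mean_a = sum(a for a, _ in pairs) / len(pairs)
mean_b = sum(b for _, b in pairs) / len(pairs)
cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs) / len(pairs)
print(f"marginal covariance between the sensors: {cov:.2f}")  # clearly non-zero
```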
Fusion: Product Rule (1)
• We require:
P(H|A, B)
• From Bayes’ theorem:
P(H|A, B) = P(A, B|H) P(H) / P(A, B)
Fusion: Product Rule (2)
• We assume conditional independence so may write:
P(H|A, B) = P(A|H) P(B|H) P(H) / P(A, B)
Fusion: Product Rule (3)
• Applying Bayes’ theorem again:
P(H|A, B) = [P(H|A) P(A) / P(H)] × [P(H|B) P(B) / P(H)] × P(H) / P(A, B)
• And collecting terms:
P(H|A, B) = [P(H|A) P(H|B) / P(H)] × [P(A) P(B) / P(A, B)]
Fusion: Product Rule (4)
• We may drop the marginal likelihoods again
and normalise:
P(H|A, B) ∝ P(H|A) P(H|B) / P(H)
where P(H|A) and P(H|B) are the source posterior probabilities, P(H) is the prior probability and P(H|A, B) is the fused posterior probability
Multisource Fusion Rule
• The generalisation of this fusion rule to
multiple sources:
P(H|X) ∝ [ ∏ i=1..N P(H|xi) ] / P(H)^(N-1)
• This is commutative
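A minimal sketch of this rule for a binary hypothesis, normalising over H and ~H instead of computing the marginal likelihood; the final lines re-run the two-observer experiment from earlier and check single-stage against two-stage fusion:

```python
def fuse(posteriors, prior):
    """Fuse source posteriors P(H|x_i) for a binary hypothesis H with prior P(H)."""
    n = len(posteriors)
    h, not_h = 1.0, 1.0
    for p in posteriors:
        h *= p
        not_h *= 1.0 - p
    h /= prior ** (n - 1)
    not_h /= (1.0 - prior) ** (n - 1)
    return h / (h + not_h)              # normalise over H and ~H

# The experiment: prior 0.95, reports 0.70 and 0.80 -> fused 0.33.
print(round(fuse([0.70, 0.80], 0.95), 2))

# Commutativity: fusing a third (made-up) report of 0.60 in one stage or in
# two stages gives the same answer.
print(round(fuse([0.70, 0.80, 0.60], 0.95), 3),
      round(fuse([fuse([0.70, 0.80], 0.95), 0.60], 0.95), 3))
```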
Commutativity of Fusion (1)
P(H|X) ∝ [ ∏ i=1..N P(H|xi) ] / P(H)^(N-1)
= { [ ∏ i=1..R P(H|xi) ] / P(H)^(R-1) } × { [ ∏ i=1..S P(H|xi) ] / P(H)^(S-1) } / P(H)
(splitting the N sources into a group of R and a group of S, with N = R + S)
Commutativity of Fusion (2)
• The probability fusion rule commutes:
– It doesn’t matter what the architecture is
– It doesn’t matter if it is single stage or multistage
Experiment: Results
P(H|A, B) ∝ P(H|A) P(H|B) / P(H) = (0.70 × 0.80) / 0.95 = 0.59
P(~H|A, B) ∝ P(~H|A) P(~H|B) / P(~H) = (0.30 × 0.20) / 0.05 = 1.20
• Normalising gives:
P(H|A,B) = 0.33   P(~H|A,B) = 0.67
Human Fusion Experiment (3)
[Figure: probability scale for H = no threat – certainty 1.00, prior P(H) = 0.95, report A P(H|XA) = 0.70, report B P(H|XB) = 0.80, fused P(H|XA,XB) = 0.33]
Why was that so hard?
• Most humans find it difficult to intuitively fuse
uncertain information
– Not because they are innumerate
– But because they cannot comfortably balance
the evidence (likelihood) with their
predisposition (prior)
Prior Sensitivity (1)
• If the issue is with the priors – do they
matter?
• Can we ignore the priors?
• Do we get the same final decision if we
change the priors?
Prior Sensitivity (2)
• If P(H|A) = P(H|B)
• What value of P(H) makes P(H|A,B) = 0.5?
P(H) = P(H|A)^2 / [ P(H|A)^2 + (1 - P(H|A))^2 ]
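A quick check of this expression (a sketch; the example values in the loop are mine):

```python
def indifference_prior(p):
    """Prior P(H) at which two equal source posteriors p fuse to exactly 0.5."""
    return p ** 2 / (p ** 2 + (1.0 - p) ** 2)

for p in (0.6, 0.7, 0.8, 0.9):
    print(f"P(H|A) = P(H|B) = {p:.1f} -> decision flips at prior P(H) = {indifference_prior(p):.3f}")
```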
Prior Sensitivity (3)
[Figure: the prior P(H) that gives a fused probability of 0.5, plotted against P(H|A) = P(H|B) over the range 0 to 1]
Prior Sensitivity (4)
• Between 0.2 < P(H|A) < 0.8 the prior has a
significant effect
• Carefully define the domain over which the
prior is evaluated
• Put effort into using a reasonable value
Sensitivity to Posterior Probability
• What about the posterior probabilities
delivered to the fusion centre?
• Can we endure errors here?
• Which types of errors hurt most?
Probability Experiment (1)
• 10 estimation questions
• Write down lower and upper bound
• So that you are 90% sure it covers the
actual value
• All questions relate to the highest point in
various countries (in metres)
Probability Experiment (2)
• Winner defined as:
– Person with most answers correct
– Tie-break decided by smallest sum of ranges
(for all 10 questions)
• Pick a range big enough
• But not too big!
The questions:
1. Australia
2. Chile
3. Cuba
4. Egypt
5. Ethiopia
6. Finland
7. Hong Kong
8. India
9. Lithuania
10. Poland
The answers:
1. Australia (2228m)
2. Chile (6893m)
3. Cuba (1974m)
4. Egypt (2629m)
5. Ethiopia (4550m)
6. Finland (1324m)
7. Hong Kong (958m)
8. India (8586m)
9. Lithuania (294m)
10. Poland (2499m)
Overconfidence (1)
• Large trials show that most people get fewer
than 40% correct
• Should be 90% correct!
• People are often overconfident
(even when primed that they are being
tested!)
Overconfidence (2)
[Figure: declared probability against actual probability, marking overconfident and underconfident regions]
Confidence Amplification(1)
[Figure: fused class probability against input class probability for 2, 3, 4 and 5 sensors]
Confidence Amplification(2)
Veto Effect
• If any local decision-maker outputs a probability of
close to zero for a class then the fused probability
is close to zero
– even if all the other decision-makers output a high
probability
– about 40% of the response surface for two sensors is
either <0.1 or >0.9
– this rises to 50% for three sensors and nearly 60% for
four
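A short sketch of the veto effect and of how much of the two-sensor response surface ends up in the extreme regions. The uniform prior and the grid resolution are my own choices:

```python
def fuse(posteriors, prior=0.5):
    """Product-rule fusion of source posteriors for a binary class (uniform prior assumed)."""
    n = len(posteriors)
    h, not_h = 1.0, 1.0
    for p in posteriors:
        h *= p
        not_h *= 1.0 - p
    h /= prior ** (n - 1)
    not_h /= (1.0 - prior) ** (n - 1)
    return h / (h + not_h)

# Veto: one near-zero report overrides two confident ones.
print(f"{fuse([0.9, 0.9, 1e-6]):.5f}")

# Fraction of the two-sensor response surface that is < 0.1 or > 0.9
# (compare with the ~40% figure quoted above).
grid = [i / 100 for i in range(1, 100)]
extreme = sum(1 for a in grid for b in grid if not 0.1 <= fuse([a, b]) <= 0.9)
print(f"extreme fraction, two sensors: {extreme / len(grid) ** 2:.2f}")
```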
Moderation of probabilities
• If we suspect that the posterior probabilities
are overconfident then we should moderate
them
– By building it into automatic techniques
– By allowing for it if this is not possible
Gaussian Moderation
• For Gaussian classifiers the Bayesian
correction is analytically tractable
• By integrating over the mean and variance
rather than taking the maximum likelihood
value
Student t-distribution(1)
• For Gaussian data this is:
P(xi | D) = ∫ dμ ∫ dσ² P(xi | μ, σ²) P(μ, σ² | D)
(with μ integrated over (-∞, ∞) and σ² over (0, ∞))
• Which is a “Student” t-distribution:
P(xi | μ̂, σ̂², N) = Γ(N/2) / [ σ̂ √((N-1)π) Γ((N-1)/2) ] × [ 1 + (xi - μ̂)² / ((N-1)σ̂²) ]^(-N/2)
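A small sketch evaluating the moderated likelihood above against the plug-in maximum-likelihood Gaussian (the values of μ̂, σ̂² and N below are illustrative, not from the talk):

```python
import math

def moderated_likelihood(x, mu_hat, var_hat, n):
    """Student t form of P(x | mu_hat, sigma_hat^2, N) from the slide above."""
    norm = math.gamma(n / 2) / (
        math.sqrt(var_hat) * math.sqrt((n - 1) * math.pi) * math.gamma((n - 1) / 2))
    return norm * (1 + (x - mu_hat) ** 2 / ((n - 1) * var_hat)) ** (-n / 2)

def ml_gaussian_likelihood(x, mu_hat, var_hat):
    """Gaussian likelihood using the maximum-likelihood point estimates."""
    return math.exp(-(x - mu_hat) ** 2 / (2 * var_hat)) / math.sqrt(2 * math.pi * var_hat)

# With only N = 5 samples the moderated distribution has much heavier tails,
# so extreme measurements are not treated as impossibly unlikely.
for x in (0.0, 2.0, 4.0):
    print(x, round(ml_gaussian_likelihood(x, 0.0, 1.0), 4),
          round(moderated_likelihood(x, 0.0, 1.0, 5), 4))
```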
Student t-distribution(2)
[Figure: likelihood of data plotted against measurement value]
Student t-distribution(3)
[Figure: probability of class 1 plotted against measurement value]
Approximate Moderation(1)
• We can get a similar effect at the fusion centre
using the posteriors
– Convert back to “likelihoods” by dividing by the prior
– Add a constant to everything
– Convert back to “posteriors” by multiplying by the prior
– Renormalise
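A minimal sketch of this recipe for a binary hypothesis; the correction constant is an arbitrary illustrative value rather than one learned from data:

```python
def moderate(posterior_h, prior_h, correction=0.2):
    """Soften a possibly overconfident posterior before fusion.

    Divide by the prior to get pseudo-likelihoods, add a constant,
    multiply back by the prior, then renormalise over H and ~H.
    """
    like_h = posterior_h / prior_h + correction
    like_not_h = (1.0 - posterior_h) / (1.0 - prior_h) + correction
    post_h = like_h * prior_h
    post_not_h = like_not_h * (1.0 - prior_h)
    return post_h / (post_h + post_not_h)

# An overconfident 0.99 report is pulled back towards the prior (0.5 here).
print(round(moderate(0.99, 0.5), 3))   # ~0.91
```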
Approximate Moderation(2)
• How much to add depends on the source of
the posterior probabilities
– Correction factor for each source
– Learned from data
Other Issues
• Conditional independence not holding
• Information incest
• Missing data
• Communication errors
• Asynchronous information
Information Dominance
Part Two: Decision
“Having reasoned about the data
to form a perspective of the current
situation; to make an informed decision
which optimises the desirability of the
outcome”
Deciding what to do
“Decision theory is trivial, apart from the
details”
• Select an action that maximises the
expected utility of the outcome
Utility functions?
• A utility function describes how desirable
each possible outcome is
– People are sometimes irrational
– Desirability cannot be captured by a single
valued function
– Allais paradox
Utility Experiment(1)
1. Guaranteed €1 million
2. 89% chance of €1 million
10% chance of €5 million
1% chance of nothing
Utility Experiment(2)
1. 89% chance of nothing
11% chance of €1 million
2. 90% chance of nothing
10% chance of €5 million
Utility Experiment(3)
• If you prefer 1 to 2 on the first slide
You should prefer 1 to 2 on the second slide
as well
• If not you are acting irrationally…
Decision Theory
• Assume we are able to construct a utility function
(or at least get our superior to define one!)
• Enumerate the possible actions
– Use our fused probabilities to weight the utility of the
possible outcomes
– Choose the action for which the expected utility of the
outcome is greatest
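A minimal sketch of that selection step; the actions, hypotheses and utility values below are made up for illustration (the fused probabilities echo the earlier experiment):

```python
def best_action(actions, fused_probs, utility):
    """Choose the action whose expected utility, under the fused probabilities, is greatest.

    fused_probs[h] is the fused P(h); utility[(action, h)] is the utility of
    taking 'action' when hypothesis h turns out to be true.
    """
    def expected_utility(action):
        return sum(p * utility[(action, h)] for h, p in fused_probs.items())
    return max(actions, key=expected_utility)

# Illustrative numbers: a false interception is mildly costly, a missed threat far worse.
fused = {"threat": 0.67, "no threat": 0.33}
utils = {("intercept", "threat"): 10, ("intercept", "no threat"): -2,
         ("ignore", "threat"): -50, ("ignore", "no threat"): 0}
print(best_action(["intercept", "ignore"], fused, utils))
```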
Timing the decision
• What about timing?
• When should the decision be made?
– If we wait then maybe the (fused) probabilities
will be more accurate
– Or the action will be more effective
Explore versus Exploit
• By waiting you can explore the situation
• By stopping you can exploit the situation
• Stopping rule
– Sequential analysis
– SPRT
– Bayesian optimal stopping
Experiment with timing
• I will show you 20 numbers
• They are drawn from the same (uniform)
distribution
• Select the highest value
• But no going back
• A bit like ¡Allá tú!
Experiment with timing (1–20)
• The 20 numbers, shown one per slide:
131, 16, 125, 189, 105, 172, 39, 94, 57, 133, 52, 69, 7, 242, 148, 163, 23, 139, 146, 211
The answer…
• How many people chose 242?
• Balance between collecting data on how big
the numbers might be (exploration)
and actually picking a big number
(exploitation)
The 1/e Law(1)
• Consider a rule of the form:
– Observe the first M values and remember the best value (V)
– Observe the remaining N-M values and pick the first that exceeds V
The 1/e Law(2)
• It can be shown that the optimum value for
M is N/e
• And that for this rule the probability of
selecting the maximum is at least 1/e
• Even for huge values of N
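A quick simulation of this rule, using my own setup of N = 20 uniform draws (as in the exercise above) to check that M ≈ N/e succeeds at least about 1/e of the time:

```python
import math
import random

def pick_with_rule(values, m):
    """Observe the first m values, then take the first later value that beats them."""
    best_seen = max(values[:m])
    for v in values[m:]:
        if v > best_seen:
            return v
    return values[-1]                    # otherwise forced to accept the last value

def success_rate(n=20, trials=20_000):
    m = round(n / math.e)                # the 1/e rule: observe roughly N/e values
    wins = 0
    for _ in range(trials):
        values = [random.uniform(0, 250) for _ in range(n)]
        wins += pick_with_rule(values, m) == max(values)
    return wins / trials

random.seed(1)
print(f"success rate with M = N/e: {success_rate():.2f}  (1/e is about {1 / math.e:.2f})")
```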
Time Pressure (1)
• Individuals tend to make the decision too
early
• Committees tend to leave the decision too
late
Time Pressure (2)
• Lecturers tend to overrun their time slot!
Time Pressure (3)
• Apologies for skipping over so much of the detail
• Some of the other areas that warrant mention:
– Game theory
– Sensor management
– Graphical models
– Cognitive inertia
– Inattentional blindness
Please feel free
to contact me
[email protected]
www.mv-concepts.com
Or just come and introduce yourself…
Thank you!
Questions…