Bentley Slides - staff.city.ac.uk


How bad is Human Judgment?
Peter Ayton
Department of Psychology
City University, London
How do psychologists study human judgment?
• Psychological experiments often compare the actual
with the ideal. The actual can be measured by
monitoring human decision making. The ideal is usually
determined from laws in logic or statistics.
• Discrepancies suggest that the human brain does not solve
problems by applying the laws of logic or statistics
– so how does it work?
• Because people can’t utilise vast amounts of
information, the brain uses ‘heuristics’ – simple rules of
thumb – to make judgments and decisions quickly.
How bad is Human Judgment?
But psychological research has undermined
confidence in the quality of human judgment.
E.g. psychologist Daniel Kahneman was awarded the
2002 Economics Nobel: “…discovered how
human judgment may take heuristic shortcuts that
systematically depart from basic principles of
probability”
How do psychologists study human judgment?
Reflecting on the fact that their intent in studying heuristic
“errors” was akin to the use of optical illusions,
forgetfulness, or tongue twisters in order to understand
sight, memory, and language, the researchers wrote:
“Although errors of judgment are but a method by which
some cognitive processes are studied, the method has
become a significant part of the message.” (Kahneman &
Tversky, 1982, p. 492).
Illusions
Visual and Cognitive
Is the blue on the inner left back or the outer left front?
Is the left centre circle bigger?
No, they're both the same size.
It's a spiral, right?
No, these are a set of independent circles
Count the black dots
How many legs does this elephant have?
Are the horizontal lines parallel or do they slope?
How do people consider risks?
1) Relative insensitivity to probability information.
2) Driven by evaluation of qualities of outcomes (Risk
as Emotions)
Judgement and Description
Effects of “unpacking” hypotheses.
E.g.
p(death from unnatural causes)   = 32%
But,
p(death from accident)           = 32%
p(death by homicide)             = 10%
p(other unnatural causes)        = 11%
SUM                              = 53%
Experts (stockbrokers making stock forecasts; oil engineers making safety
assessments) show similar effects.
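The unpacking effect can be checked with simple arithmetic. The sketch below uses the percentages from the slide; the variable names are illustrative. Because the components describe the same event as the packed question, a coherent judge's components would sum to the packed figure. Tversky and Koehler's support theory calls the pattern where the parts are judged more likely than the whole subadditivity.

```python
# Judged probabilities (in percentage points) from the slide above.
packed = 32  # p(death from unnatural causes), judged as a single hypothesis
unpacked = {
    "accident": 32,
    "homicide": 10,
    "other unnatural causes": 11,
}

# The unpacked components describe the same event, so a coherent judge's
# components should sum to the packed judgment.
unpacked_total = sum(unpacked.values())
gap = unpacked_total - packed        # points added by unpacking
ratio = unpacked_total / packed      # unpacked total relative to packed judgment

print(f"packed = {packed}%, unpacked total = {unpacked_total}%")
print(f"unpacking added {gap} points (ratio {ratio:.2f})")
```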
Judgement and Description
Effects of “unpacking” hypotheses.
How to Be Incoherent and Seductive: Bookmakers’ Odds and Support Theory
The Planning fallacy
• WHY does everything take longer to finish and cost more than we think it will?
• The Channel Tunnel was supposed to cost £2.6 billion. In fact, the final bill came to £15
billion. The Jubilee Line extension to the London Underground cost £3.5 billion, about
four times the original estimate. There are many other examples: the London Eye, the
Channel Tunnel rail link, the Dome.
• This is not an exclusively British disease. In 1957, engineers forecast that the Sydney
Opera House would be finished in 1963 at a cost of A$7 million. A scaled-down version
costing A$102 million finally opened in 1973. In 1969, the mayor of Montreal announced
that the 1976 Olympics would cost C$120 million and "can no more have a deficit than a
man can have a baby". Yet the stadium roof alone, which was not finished until 13 years
after the games, cost C$120 million.
• Is gross incompetence behind such fiascos? Or a Machiavellian plot to secure approval for
projects that once started cannot easily be cancelled?
• Research carried out by psychologist Roger Buehler suggests that the main cause may lie
deeper. Buehler found that students consistently underestimated how long it would take
them to finish their assignments. They seemed to have an over-idealised vision of a
smooth future and rarely anticipated more than trivial impediments.
Partition Dependence
How you frame a question affects the answer
‘Case prime’: “Will Sunday be the hottest day of the week?”
A two-fold partition of the sample space is evoked:
Sunday versus the rest of the week (1/2).
‘Class prime’: “Will the hottest day of the week be Sunday?”
A seven-fold partition is evoked:
Sunday is one of 7 possible options (1/7).
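To see how the framing alone can shift a judgment, the sketch below computes the "ignorance prior" (1/n for an n-fold partition) and blends it with an evidence-based belief. The linear blend, the 0.25 belief, and the 0.4 weight are hypothetical assumptions chosen for illustration, not parameters from any study.

```python
from fractions import Fraction

def ignorance_prior(n_outcomes: int) -> Fraction:
    """Equal weight on each of the n outcomes the question evokes."""
    if n_outcomes < 1:
        raise ValueError("need at least one outcome")
    return Fraction(1, n_outcomes)

def judged_probability(belief: float, n_outcomes: int, anchor_weight: float) -> float:
    """HYPOTHETICAL model: a linear blend of the ignorance prior and an
    evidence-based belief. anchor_weight = 0 means no partition dependence."""
    return anchor_weight * float(ignorance_prior(n_outcomes)) + (1 - anchor_weight) * belief

# Same event (Sunday is the hottest day), same evidence, two framings.
belief = 0.25         # hypothetical evidence-based belief
anchor_weight = 0.4   # hypothetical pull toward the ignorance prior

case_prime = judged_probability(belief, 2, anchor_weight)   # Sunday vs. the rest of the week
class_prime = judged_probability(belief, 7, anchor_weight)  # one of seven days
print(f"case prime: {case_prime:.3f}, class prime: {class_prime:.3f}")
```

With any positive anchor weight, the two-fold framing yields the higher judgment even though the event and the evidence are identical.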
Overconfidence
Typical experiments have presented a series of two-alternative
general knowledge questions to subjects and asked them to
indicate the correct answer and state their subjective
probability, expressed as a percentage, that they have selected
the correct answer.
E.g. Which is longer?
(a) Panama Canal
(b) Suez Canal
“%sure”: ____
[“%sure” responses vary from 50% (completely uncertain)
to 100% (completely certain).]
Early general knowledge experiments suggested that people’s
confidence judgments are poorly “calibrated”.
Points below the diagonal represent overconfident responses – the
expressed confidence is higher than the proportion correct.
But some experts (e.g. weather forecasters) produce very well
calibrated subjective likelihood judgments in the domain of their
expertise.
But not all experts are well calibrated. Experienced physicians’
probabilistic diagnoses of pneumonia are poorly calibrated.
What makes experts well calibrated? Some experts (e.g. weather
forecasters) get prompt, unambiguous feedback; others (e.g.
doctors) may not.
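Calibration can be scored by grouping answers by stated confidence and comparing each group's proportion correct with its confidence. A minimal sketch, assuming confidences are recorded as percentages between 50 and 100; the responses are invented for illustration, and the function names are hypothetical.

```python
from collections import defaultdict

def calibration_curve(confidences, correct):
    """Map each stated confidence level (in %) to the proportion of answers
    given at that level that were actually correct."""
    by_level = defaultdict(list)
    for conf, ok in zip(confidences, correct):
        by_level[conf].append(ok)
    return {conf: sum(oks) / len(oks) for conf, oks in sorted(by_level.items())}

def overconfidence(confidences, correct):
    """Mean confidence minus proportion correct (both as fractions).
    Positive values indicate overconfidence."""
    mean_conf = sum(confidences) / len(confidences) / 100
    prop_correct = sum(correct) / len(correct)
    return mean_conf - prop_correct

# Hypothetical responses to ten two-alternative questions.
confidences = [100, 100, 100, 100, 70, 70, 70, 70, 50, 50]
correct = [True, True, True, False, True, False, True, False, True, False]

for level, hit_rate in calibration_curve(confidences, correct).items():
    position = "below" if hit_rate < level / 100 else "on or above"
    print(f"{level}% sure: {hit_rate:.0%} correct ({position} the diagonal)")
print(f"overconfidence score: {overconfidence(confidences, correct):+.2f}")
```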
The hot-hand fallacy and the gambler’s fallacy:
Two faces of Subjective Randomness?
Pinker (1997) is critical of the presumption of faulty reasoning
typically accompanying observations of the gambler’s fallacy:
“It would not surprise me if a week of clouds really did predict that
the trailing edge was near and the sun was about to be unmasked,
just as the hundredth railroad car on a passing train portends the
caboose with greater likelihood than the third car. Many events
work like that. …An astute observer should commit the gambler’s
fallacy. A gambling device is by definition a machine designed to
defeat our intuitive predictions. It’s like calling our hands badly
designed because they fail to get out of handcuffs.” (p. 346).
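Pinker's point, that in some real processes a "due" outcome genuinely becomes more likely, can be illustrated by contrasting draws with replacement (independent trials, as in a gambling device) with draws without replacement (a finite supply being used up, like the cars of a train). The box-of-items set-up and the function name are illustrative assumptions, not Pinker's own example.

```python
from fractions import Fraction

def p_next_red(reds: int, blacks: int, blacks_drawn: int, replaced: bool) -> Fraction:
    """Probability that the next draw is red after a run of `blacks_drawn`
    blacks from a box holding `reds` red and `blacks` black items."""
    if not 0 <= blacks_drawn <= blacks:
        raise ValueError("cannot have drawn more blacks than the box holds")
    if replaced:
        # Independent trials (like a fair roulette wheel): history is irrelevant.
        return Fraction(reds, reds + blacks)
    # Depleting process: every black drawn leaves proportionally more reds behind.
    return Fraction(reds, reds + blacks - blacks_drawn)

for run in (0, 3, 6, 9):
    print(run, p_next_red(10, 10, run, replaced=True), p_next_red(10, 10, run, replaced=False))
```

In the depleting case, expecting red after a run of blacks is correct reasoning; it only becomes a fallacy when the device is built to make trials independent.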
Gilden and Wilson (1995; 1996) have shown that for golf putting, dart throwing and auditory and
visual signal detection there are streaks in performance; Adams (1995) reports “momentum” in
the performance of pocket billiards players and Smith (in press) reports that horseshoe pitchers
have modest hot and cold spells.
Thus, belief in the hot hand is not always fallacious. Perhaps, then, people have learned to expect
the hot hand from observing human performances where it does occur.
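One common way to look for streaks in a record of hits and misses is the Wald–Wolfowitz runs test, which compares the observed number of runs with the number expected under independent trials. It is offered here as an illustration, not as the method used in the studies cited above; the putting records are hypothetical, and with samples this small exact tables would normally be preferred to the z approximation used here.

```python
import math

def runs_test(hits):
    """Wald-Wolfowitz runs test for a sequence of hits (1) and misses (0).

    Returns (observed runs, expected runs, z). A strongly negative z means
    fewer, longer runs than independent trials would produce, i.e. streaks.
    """
    n1 = sum(hits)
    n2 = len(hits) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need at least one hit and one miss")
    n = n1 + n2
    runs = 1 + sum(1 for a, b in zip(hits, hits[1:]) if a != b)
    expected = 1 + 2 * n1 * n2 / n
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if variance == 0:
        raise ValueError("sequence too short to test")
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z

# Hypothetical records of ten putts each.
streaky = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
alternating = [1, 0] * 5
print(runs_test(streaky))
print(runs_test(alternating))
```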
Gains and losses
Samuelson’s paradox: a bet on a coin toss.
Heads you win $200; tails you lose $100.
No one takes it once, but people would play it ten times.
Loss aversion: losses are weighted more heavily than gains.
E.g. insurance and extended warranties.
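A loss-averse value function offers one account of Samuelson's paradox if the ten tosses are evaluated together as a single bundle. The sketch assumes a piecewise-linear value function with a loss-aversion coefficient of 2.25 (a figure often quoted from Tversky and Kahneman's 1992 estimates) and ignores probability weighting and diminishing sensitivity; the function names are illustrative.

```python
from math import comb

WIN, LOSE = 200, -100   # one coin toss: heads win $200, tails lose $100
LOSS_AVERSION = 2.25    # ASSUMED coefficient: losses weigh 2.25 times as much as gains

def value(x):
    """Piecewise-linear, loss-averse value of a net outcome x (an assumption)."""
    return x if x >= 0 else LOSS_AVERSION * x

def net_outcomes(plays):
    """(net payoff, probability) for `plays` independent fair tosses."""
    return [
        (WIN * heads + LOSE * (plays - heads), comb(plays, heads) / 2 ** plays)
        for heads in range(plays + 1)
    ]

def expectation(dist, f=lambda x: x):
    """Probability-weighted average of f(payoff)."""
    return sum(p * f(x) for x, p in dist)

for plays in (1, 10):
    dist = net_outcomes(plays)
    p_loss = sum(p for x, p in dist if x < 0)
    print(plays, expectation(dist), expectation(dist, value), p_loss)
```

Under these assumptions a single toss has a positive expected value but a negative loss-averse value, while the ten-toss bundle has a positive value because only about 17% of its outcomes are net losses.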
Gains and losses
Q1. Imagine that you face the following pair of concurrent decisions. First examine
both decisions and then indicate the options that you prefer.
Decision I: Choose between
A. A sure gain of £2,400
B. A 25% chance to gain £10,000, and a 75% chance to gain nothing
Decision II: Choose between
C. A sure loss of £7,500
D. A 75% chance to lose £10,000, and a 25% chance to lose nothing
Most people choose A and D – hardly anyone prefers B and C. They like the sure gain in Decision I and
dislike the certain loss in Decision II. But the pair of choices B and C is much better than A and D.
If you combine the outcomes of the two choices, you can add the sure gain of £2,400 to the risky outcomes
in D. So A and D gives you:
A and D: 25% chance to gain £2,400, and
75% chance to lose £7,600
Similarly, B and C can be combined: the sure loss of £7,500 in C can be subtracted from the risky
outcomes of B:
B and C: 25% chance to gain £2,500, and
75% chance to lose £7,500
With B and C the chances of winning and losing are the same as in A and D but the amount you might win
is more and the amount you might lose is less.
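The combination step can be automated by treating each option as a probability distribution over payoffs. A sketch, assuming the two decisions are resolved independently; exact fractions avoid rounding in the probabilities, and the names `combine` and `expected_value` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Each gamble is a list of (payoff in pounds, probability) pairs.
A = [(2400, Fraction(1))]
B = [(10000, Fraction(1, 4)), (0, Fraction(3, 4))]
C = [(-7500, Fraction(1))]
D = [(-10000, Fraction(3, 4)), (0, Fraction(1, 4))]

def combine(g1, g2):
    """Distribution of the total payoff when both gambles are played,
    assuming they are resolved independently. Sorted by payoff."""
    totals = {}
    for (x1, p1), (x2, p2) in product(g1, g2):
        totals[x1 + x2] = totals.get(x1 + x2, 0) + p1 * p2
    return sorted(totals.items())

def expected_value(dist):
    return sum(x * p for x, p in dist)

AD = combine(A, D)
BC = combine(B, C)
print("A and D:", AD, "EV =", expected_value(AD))
print("B and C:", BC, "EV =", expected_value(BC))
```

B and C pays £100 more than A and D in both the 25% state and the 75% state, so it dominates; its expected value (-£5,000) is correspondingly £100 higher than that of A and D (-£5,100).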
Gains and losses
The same notions of loss aversion and certainty
weighting can explain the sunk cost effect.
Mindful of their investment, people can’t quit
(but animals can and do).
Gains and losses
The mental accounting of wine cellars.
You purchase several cases of wine at $20 a bottle and, after several years, it has increased
in value. You have been offered $75 a bottle.
You decide to drink a bottle to help you decide about the offer. How much does this cost
you?
Possible mental accounts                                       Students   Experts (wine collectors)
____________________________________________________________________________
(a) Nothing (I already own it)                                      30%       30%
(b) $20 (what I paid)                                               10%       18%
(c) $20 + interest (what I paid + interest)                          1%        7%
(d) $75 (what I am offered)                                         37%       20%
(e) A gain of $55 (I drank a $75 bottle and it only cost $20)       22%       25%
A patient with severe chest pains is rushed to the emergency department in
a hospital. The physicians must (quickly) decide: Should the patient be
sent to the coronary care unit or to a regular bed with ECG telemetry?
In two Michigan hospitals, emergency physicians sent 90% of all patients
to the care unit. Such “defensive” decision-making led to over-crowding,
decreased quality of care, and greater health risks for patients.
Researchers taught the physicians to use the Heart Disease Predictive
Instrument, an expert system consisting of a chart with some 50
probabilities and a logistic formula with which the physician, aided by a
pocket calculator, computes the probability of requiring the coronary care
unit for each patient. If the probability is higher than a certain value, then
the patient is sent to the care unit, otherwise not.
Physicians don’t like using this and similar systems. They don’t understand
it (it does not conform to their intuitive thinking), and so they avoid using it.
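The general shape of a logistic decision aid like the Heart Disease Predictive Instrument can be sketched as follows. Everything numeric here (intercept, weights, cue names, cut-off) is hypothetical; the real instrument's chart of roughly 50 probabilities is not given in these slides.

```python
import math

# HYPOTHETICAL intercept, weights, and cut-off. The real instrument's chart of
# roughly 50 probabilities is not reproduced in these slides.
INTERCEPT = -3.0
WEIGHTS = {
    "st_segment_change": 2.0,
    "chest_pain_main_complaint": 1.0,
    "prior_heart_attack": 0.8,
}
CUTOFF = 0.5

def probability_needs_ccu(patient):
    """Logistic formula: p = 1 / (1 + exp(-(intercept + sum of weight * cue)))."""
    score = INTERCEPT + sum(w * patient.get(cue, 0) for cue, w in WEIGHTS.items())
    return 1 / (1 + math.exp(-score))

def assign(patient):
    """Send the patient to the care unit only if p exceeds the cut-off."""
    p = probability_needs_ccu(patient)
    return ("coronary care unit" if p > CUTOFF else "regular bed with ECG telemetry"), p

print(assign({"st_segment_change": 1, "chest_pain_main_complaint": 1, "prior_heart_attack": 1}))
print(assign({"chest_pain_main_complaint": 1}))
```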
The researchers tried a third alternative: a heuristic that has the structure of physicians’
intuitions, but is based on empirical evidence. This fast and frugal tree (Figure 2) asks
only a few yes-no questions. If a patient has a certain anomaly in his electrocardiogram
(the so-called ST segment), he is immediately sent to the coronary care unit. No other
information is required. If that is not the case, a second cue is considered: whether the
patient’s primary complaint was chest pain. If this is not the case, he is immediately
assigned to a regular nursing bed. No further information is sought. If the answer is yes,
then a third question is asked to finally classify the patient.
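The tree's logic can be written as a few nested checks. The first two questions come from the text; the third question is not named, so `third_cue_positive` and the direction of its exits are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass
class Patient:
    st_segment_anomaly: bool          # first question (from the text)
    chest_pain_main_complaint: bool   # second question (from the text)
    third_cue_positive: bool          # HYPOTHETICAL stand-in for the unnamed third question

def fast_frugal_tree(p: Patient) -> str:
    # Question 1: an ST-segment anomaly sends the patient straight to the CCU.
    if p.st_segment_anomaly:
        return "coronary care unit"
    # Question 2: if chest pain is not the primary complaint, stop: regular bed.
    if not p.chest_pain_main_complaint:
        return "regular nursing bed"
    # Question 3 (assumed exits): the final cue classifies the remaining patients.
    return "coronary care unit" if p.third_cue_positive else "regular nursing bed"

print(fast_frugal_tree(Patient(False, True, False)))
```

Because each question can end the search, many patients are classified after one or two cues, and, unlike the logistic formula, no arithmetic is required.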
Gaudi’s “Stereostatic Model”
Between the inverted rope-and-weight model and these painted
photographs, Gaudi obtained an unorthodox, but architecturally flawless
set of plans for his famous chapel, one that no engineer could have
derived using traditional methods.
“Since the plan of the church was so complicated – towers and arcs
emerging from unexpected places, leaning on other arcs and towers –
it is practically impossible to solve the set of equations which
corresponds to the requirement of equilibrium in this complex.
[But through Gaudi’s model] all the computation was instantaneously
done by gravity! The set of arcs arranged itself such that the whole
complex is in equilibrium, but upside down.”
(Dorit Aharonov, “Quantum Computation”, Annual Reviews of
Computational Physics VI, Dietrich Stauffer, ed., 1998.)
How Dogs Navigate to Catch Frisbees
According to the notion of bounded rationality (Simon, 1956;
1992), the computational limits of cognition and the structure of
the environment may foster the use of "satisficing" rather than
optimal strategies.
Thus for many of our decisions "fast and frugal" heuristics
would be a serviceable substitute for the “proper” rule.
But not always. E.g. U.K. magistrates’ bail decisions are well
modelled by one-reason decision models (despite the magistrates’
insistence that they look at all the information).
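The contrast between satisficing and optimising described above can be sketched as two search rules over options met one at a time. The aspiration level and option values are hypothetical, as are the function names.

```python
from typing import Optional, Sequence, Tuple

def satisfice(values: Sequence[float], aspiration: float) -> Optional[Tuple[int, float, int]]:
    """Examine options in order and take the first that is good enough.
    Returns (index, value, number of options examined), or None if none qualifies."""
    for i, v in enumerate(values):
        if v >= aspiration:
            return i, v, i + 1
    return None

def optimise(values: Sequence[float]) -> Tuple[int, float, int]:
    """Examine every option and take the best."""
    best = max(range(len(values)), key=lambda i: values[i])
    return best, values[best], len(values)

options = [4, 7, 9, 6, 10, 3]   # hypothetical option values, in the order they are met
print(satisfice(options, aspiration=8))
print(optimise(options))
```

Here the satisficer stops after three options with a value of 9; the optimiser finds 10 but has to examine all six.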
Human Judgment and choice: Rational or irrational?
How can anyone be perfectly “rational” in a world where
knowledge is limited, time is pressing, and deep thought is
often an unattainable luxury?
Traditional models of unbounded rationality and
optimization in cognitive science, economics, and animal
behavior have tended to view decision-makers as
possessing supernatural powers of reason, limitless
knowledge, and endless time.
But understanding judgment and decisions in the real
world requires a more psychologically plausible notion of
bounded rationality.
How bad is Human Judgment?
The good news is that, counter to some views, human judgment can
be very accurate – though it may not always be.
However, we are closer to understanding the conditions under which
judgement may be more reliable (formats of information; learning
conditions with feedback).
Understanding judgement means understanding not just the mind –
but how it interacts with its environment.
The Beauty Contest
The game is called a beauty contest after a famous passage in
Keynes’ (1936) “The General Theory of Employment, Interest and
Money”.
Keynes remarked that the stock market is like a beauty contest. He had
in mind contests that were popular in England at the time, where a
newspaper would print 100 photographs, and people would write in and
say which six faces they liked most. Everyone who picked the most
popular face was automatically entered in a raffle, where they could win
a prize.
Keynes wrote, “It is not a case of choosing those [faces] which, to the
best of one’s judgment, are really the prettiest, nor even those which
average opinion genuinely thinks the prettiest. We have reached the
third degree where we devote our intelligences to anticipating what
average opinion expects the average opinion to be. And there are
some, I believe, who practise the fourth, fifth and higher degrees.”
The Beauty Contest
If you played this game repeatedly, your thoughts might run as follows. You might
assume that the starting average would probably be 50, so you’d guess 33. But then
you’d say, hmmm, if other people are as clever as I am, they will all pick 33, so I should
pick 22. But if everyone else does that, too, I should pick two-thirds of 22. And if you
carry this through infinitely many levels of reasoning to the logical end, you’ll wind up
picking zero.
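The chain of reasoning above can be sketched as repeated multiplication: a level-k guess is 50 × (2/3)^k, which shrinks toward zero as k grows. The sketch assumes, as the text implies, that guesses lie between 0 and 100 and that the target is two-thirds of the average; the function name is illustrative.

```python
def level_k_guesses(levels: int, start: float = 50.0, factor: float = 2 / 3) -> list:
    """Guess at each level of reasoning: level 0 is `start`, and each further
    level multiplies the previous guess by `factor`."""
    guesses = [start]
    for _ in range(levels):
        guesses.append(guesses[-1] * factor)
    return guesses

for k, g in enumerate(level_k_guesses(5)):
    print(f"level {k}: {g:.2f}")
```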
Zero is what game theory predicts for this situation. Game theory is the branch of social
science that analyzes strategic interactions in mathematical terms. It was founded quite a
long time ago, but it has had a slow fuse: only in the last 10 or 15 years has it come to the
fore in reasoning about economics and political science.
So how do people actually behave? Do they pick zero? The data here are
from undergrads from Singapore, Germany, the Wharton School of Business
at the University of Pennsylvania, and Caltech. The average choice across
all these experiments was around 40, so if you guessed about two-thirds
of 40, or 27, you’d probably win.
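Scoring a round is equally simple: compute two-thirds of the mean guess and pick the entry closest to it. The guesses below are invented so that their mean is 40, matching the average reported above; the tie-breaking rule (earliest entry wins) and the function name are assumptions.

```python
def beauty_contest_result(guesses, factor=2 / 3):
    """Return the target (factor times the mean guess) and the index of the
    guess closest to it. Ties go to the earliest guess."""
    target = factor * sum(guesses) / len(guesses)
    winner = min(range(len(guesses)), key=lambda i: abs(guesses[i] - target))
    return target, winner

guesses = [50, 40, 33, 27, 22, 68]   # hypothetical entries with a mean of 40
target, winner = beauty_contest_result(guesses)
print(f"target = {target:.2f}, winning guess = {guesses[winner]}")
```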
If we use these data to gauge how many steps of reasoning people are
doing about other people’s reasoning, something from one to three seems
reasonable. That is far from the infinitely many steps that game theory
predicts, but it clearly shows that people perform at least one step of
reasoning.
Three Newspaper studies
The most popular numbers in all three experiments
are two-thirds of 50 (about 33), two-thirds
of this number (about 22) and the equilibria
of the game (0 and 1 in The FT, 1 in Expansion
and 0 in Spektrum).
The steps-of-iterated-dominance interpretation
claims that in the beauty-contest game people
reason in steps. Step 0, the preliminary step of
any reasoning, yields numbers that are arbitrarily
distributed over the interval.
Level-1 reasoning is (2/3)·50 ≈ 33.33. Level-2
reasoning is (2/3)·33.33 ≈ 22.22, and so on.
[Figure: results for University of Chicago Economics PhDs, other Economics PhDs, CEOs, and the Caltech Board (eminent in various fields)]