P(h) - John Wilcox


Bayesian Epistemology
PHIL 218/338
Welcome and thank you!
Outline

Part I: What is Bayesian epistemology?

Probabilities as credences

The axioms of probability

Conditionalisation

Part II: Applications and problems:

Theism

Bear with me! Ideally we would discuss these
topics over several lectures.
What is Bayesian Epistemology?

Bayesianism is our “leading theory of uncertainty”


Alan Hájek and Stephan Hartmann
It concerns credences, or degrees of belief, which often fall short of certainty

E.g. I’m not going to be attacked by a duck tomorrow

Bayesianism ≈ a theory about when our credences are rational or
justified (one which may complement other theories of justification)

There are many varieties of Bayesianism


(Irving Good calculated that there are at least 46,656!)
Bayesian epistemology is the “application of Bayesian methods to
epistemological problems.”
First component of Bayesianism:
Probabilities as credences
Credences

Traditional epistemology deals primarily with qualitative
concepts

Belief/disbelief

Knowledge/ignorance

In Bayesian epistemology, these binary concepts are
arguably less central and therefore receive less
attention

Bayesian epistemology deals largely with a quantitative
concept of credences

Credences ≈ degrees of belief or disbelief
First component of Bayesianism:
Probabilities as credences

In the 17th century, mathematicians Blaise Pascal and Pierre de
Fermat pioneered a representation of uncertainty as probabilities

Subjective interpretation: ‘Probability is degree of belief’

But whose degree of belief?

Some actual person, or

Some ideal person
This is the subjective or personal interpretation of probability because
these probabilities concern the psychological state of a subject or
person
Terminology

Terminology
h = hypothesis/proposition
~h = negation of the hypothesis
P(h) = probability of the hypothesis

Example:
h = It will rain tomorrow
P(h) = probability that it will rain tomorrow
These terms are on your handout
Quantitative nature of credences

Credences (or subjective probabilities) are taken to be associated with a numerical value or an interval:

P(h) as decimal   P(h) in %   P(h) in normal language   P(~h) in normal language
P(h) = 1          100%        h is certainly true       ~h is certainly false
P(h) = 0          0%          h is certainly false      ~h is certainly true
P(h) = .8         80%         h is probably true        ~h is probably not true
P(h) = .2         20%         h is probably not true    ~h is probably true
Measuring credences

Consider your credence that h, the sun will
rise tomorrow

Consider your credence that you will (after
random selection) draw a red marble from
an urn containing

5 red marbles

5 black marbles

Are you more confident that the sun will rise
tomorrow?

If yes, then P(h)>.5
Measuring credences

Consider your credence that h, the sun will
rise tomorrow

Consider your credence that you will (after
random selection) draw a red marble from
an urn containing

90 red marbles

10 black marbles

Are you more confident that the sun will rise
tomorrow?

If yes, then P(h)>.9
Measuring credences

Consider your credence that h, the sun will
rise tomorrow

Consider your credence that you will (via
random selection) draw a red marble from
an urn containing

9,999 red marbles

1 black marble

Are you more confident that the sun will rise
tomorrow?

If yes, then P(h)>.9999
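The urn comparisons above can be sketched as a small procedure. This is a hypothetical illustration, not part of the lecture: `more_confident_than` stands in for your own comparative judgment, and each comparison you "win" raises the lower bound on your credence.

```python
# A sketch of the urn-comparison method of measuring credences: if you are
# more confident in h than in drawing red from an urn with k red marbles
# out of n, then P(h) > k/n. `more_confident_than` is a hypothetical
# stand-in for the agent's own comparative judgment.

def lower_bound_credence(more_confident_than):
    bound = 0.0
    for red, total in [(5, 10), (90, 100), (9999, 10000)]:
        if more_confident_than(red / total):
            bound = red / total
    return bound

# Someone more confident in the sunrise than in any of these draws:
print(lower_bound_credence(lambda p: True))  # 0.9999
```
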
Measuring credences


What about your credence that:

It will rain tomorrow

You will be attacked by a duck tomorrow
Perhaps an interval would represent your credences better

If h = It will rain tomorrow


Then P(h) = [.6, .7]
What do you think?

Can all of our credences be represented with numerical values?
Objections to the subjective
interpretation

The probability of h given some evidence e does not mean someone’s actual
credence since there may be no actual credence that is relevant

It’s not clear that the probability of h given some evidence e is the credence
of some epistemically rational agent

When is an agent’s credence epistemically rational?

When their credence for h given e equals the (inductive) probability of h given e?

  This is uninformative! (Patrick Maher)

When their belief is not blameworthy from an epistemic point of view?

  But someone might accidentally mistake the probability of h given e to be low and not be blameworthy, yet the probability of h given e might still be high (Patrick Maher)

Isn’t this just like saying “A proposition is true if and only if an omniscient God were to believe it”? It’s uninformative.
Alternatives

Inductive probabilities are conceptual primitives – they can
be understood, but not expressed in terms of other simpler
concepts (Patrick Maher)

Probabilities are relative frequencies, which we might loosely
understand as the proportion of the time that something is
true (the frequentist interpretation of probability)

80% of the time when a student sits this course, it is true that they
pass

60% of the time when a patient undergoes chemotherapy, it is true
that they will recover
Second component of
Bayesianism:
Credences should conform to the
axioms (or rules) of probability
Second component of Bayesianism:
Credences should conform to the
axioms (or rules) of probability

(A1) All probabilities are between 0 and 1, i.e. 0 ≤ P(h) ≤ 1 for any h.

(A2) Logical truths have a probability of 1, i.e. P(T) = 1 for any tautology T.

(A3) Where h1 and h2 are two mutually exclusive hypotheses, the probability of h1 or h2 (h1 ∨ h2) is the sum of their respective probabilities, i.e. P(h1 ∨ h2) = P(h1) + P(h2).
These are on your handout
The axioms in action


Suppose you draw a marble from an urn:

r = the marble you have drawn is red

~r = the marble you have drawn is not red

Suppose the urn contains 3 red marbles and 7 black marbles

You set

P(r) = .3 (30%)

P(~r) = .7 (70%)

These assignments conform to axiom 1

By axioms 2 and 3, P(r ∨ ~r) = 1 (100%)
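As a minimal sketch, the urn assignments can be checked against the axioms mechanically; exact fractions are used here to avoid floating-point noise.

```python
# Check the urn assignments against the axioms:
# A1: each probability lies in [0, 1]
# A3: r and ~r are mutually exclusive, so P(r or ~r) = P(r) + P(~r)
# A2: "r or ~r" is a tautology, so that sum should be 1
from fractions import Fraction

p_r = Fraction(3, 10)      # 3 red marbles out of 10
p_not_r = Fraction(7, 10)  # 7 black marbles out of 10

assert 0 <= p_r <= 1 and 0 <= p_not_r <= 1  # A1
p_r_or_not_r = p_r + p_not_r                # A3
assert p_r_or_not_r == 1                    # A2
print(p_r_or_not_r)  # 1
```
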
Arguments for conformity to the
axioms

Argument from cases

Lindley draws out rules of probability from the urn example

We can prove other theorems using the axioms and see that they make
sense using the example


E.g. P(~r) = 1 − P(r)
Dutch book arguments

Dutch book = a combination of bets, each of which an individual might accept on its own, but which collectively guarantee that they will lose money
A Dutch book

If one violates the probability axioms, then they are vulnerable to having
a Dutch book made against them

E.g. suppose you violate A2 or A3 by setting

1. P(r) = .7
2. P(~r) = .5

If you conform to axiom 2, then you do not conform to axiom 3:

By axiom 2, P(r ∨ ~r) = 1

But by the above assignments 1 and 2, P(r) + P(~r) = .7 + .5 = 1.2

So, contrary to axiom 3, P(r ∨ ~r) ≠ P(r) + P(~r) because 1 ≠ 1.2
A Dutch book

If one violates the probability axioms, then they are vulnerable to having
a Dutch book made against them

E.g. suppose you violate A2 or A3 by setting

1. P(r) = .7
2. P(~r) = .5

But if you conform to axiom 3, then you do not conform to axiom 2:

By axiom 3, P(r ∨ ~r) = P(r) + P(~r)

So by assignments 1 and 2, P(r ∨ ~r) = .7 + .5 = 1.2

So, contrary to axiom 2, P(r ∨ ~r) ≠ 1 because 1.2 ≠ 1
A Dutch book

If one violates the probability axioms, then they are vulnerable to having
a Dutch book made against them

E.g. suppose you violate A2 or A3 by setting

1. P(r) = .7
2. P(~r) = .5

If you conform to axiom 2, then you do not conform to axiom 3

But if you conform to axiom 3, then you do not conform to axiom 2

So you cannot conform to the axioms
A Dutch book

Suppose you violate A2 or A3 by setting

1. P(r) = .7
2. P(~r) = .5

                           r     ~r
Bet 1 (for assignment 1)  +$3   -$7
Bet 2 (for assignment 2)  -$5   +$5

If r occurs, then you win $3 according to the first bet and lose $5 according to the second, so you lose $2

If r does not occur, then you lose $7 according to the first bet and gain $5 according to the second, so you lose $2

Either way, you lose $2.
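The bets can be derived from the agent's own credences under the standard betting interpretation: a credence of x in an event means accepting to pay $10x for a bet that returns $10 if the event occurs. This is a sketch; the $10 stake size is an illustrative assumption.

```python
# Sketch of the Dutch book above. An agent with credence x in an event
# accepts paying x * stake dollars for a bet paying `stake` if the event
# occurs (the $10 stake is an illustrative assumption).

def net_payoff(stake, credence, event_occurs):
    """Net gain from buying the bet at price credence * stake."""
    price = credence * stake
    return (stake - price) if event_occurs else -price

for r_occurs in (True, False):
    bet1 = net_payoff(10, 0.7, r_occurs)       # bet on r at P(r) = .7
    bet2 = net_payoff(10, 0.5, not r_occurs)   # bet on ~r at P(~r) = .5
    print(r_occurs, bet1 + bet2)               # total is -2.0 either way
```
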
Dutch book argument

1. If someone violates the probability axioms, then she is vulnerable to having a Dutch book made against her
2. One should avoid being vulnerable to having a Dutch book made against oneself (because this is a rational defect)
3. Therefore, one should avoid violating the axioms of probability
An objection to the second
component

Conformity to the axioms requires logical omniscience,
but no one is omniscient

“You’re right, but the component only sets an ideal standard, irrespective of whether anyone can meet it”
Questions?
Do you think that one’s credences
should conform to the axioms of
probability?
Third component of Bayesianism:
Credences should be updated via
conditionalisation
Terminology

Before examining this component, we need to introduce some terms

Conditional probability

P(p|q)
= the probability of p on the condition that q obtains
= the probability of p given q

RATIO formula as an analysis of conditional probability:

P(p|q) = P(p & q) / P(q), where P(q) > 0.
Example of a conditional probability

m = Taylor is a mother

f = Taylor is a female

P(m|f) = the probability that Taylor is a mother given that Taylor is a female

P(f) = .5

P(m & f) = .2

So:

P(m|f) = P(m & f) / P(f) = .2 / .5 = .4

Note the big difference between P(m|f) and P(f|m)

P(m|f) = .4

P(f|m) = 1
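The RATIO formula can be sketched directly, using the slide's illustrative values P(f) = .5 and P(m & f) = .2:

```python
# The RATIO formula: P(p | q) = P(p & q) / P(q), defined when P(q) > 0.
# Values below are the slide's illustrative numbers for the Taylor example.

def conditional(p_and_q, p_q):
    if p_q <= 0:
        raise ValueError("conditional probability undefined when P(q) = 0")
    return p_and_q / p_q

print(conditional(p_and_q=0.2, p_q=0.5))  # 0.4
```
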
Likelihoods

A likelihood = P(e|h), where e represents some evidence and h a hypothesis.

P(e|h) is called the likelihood of h on e.
Prior probabilities

Pi(h) = your prior probability = “your subjective probability for the hypothesis immediately before the evidence comes in” (Michael Strevens, emphasis added)
Terms:

e = A person, such as Taylor, smiles at you

h = A person, such as Taylor, likes you

~h = A person, such as Taylor, does not like you

Pi(h) = prior probability of a person, such as Taylor, liking you

P(h|e) = probability of a person, such as Taylor, liking you given that s/he smiles at you

What is the probability that Taylor likes you given that he or she smiled at you?

P(h|e)
What is the prior probability that Taylor
likes you?

Suppose you surveyed 100 people and found the following:
What is the probability that Taylor likes
you given the evidence?
P(h|e) = ?
P(h|e) = 9/(9+36) = 9/45 = 1/5 = 20% = .2
Posterior probabilities


What is the probability that Taylor likes you given the evidence?

Pi(h) = your prior probability = “your subjective probability for the hypothesis immediately before the evidence comes in” (Michael Strevens, emphasis added)

Pf(h) = your posterior probability = “your subjective probability immediately after the evidence (and nothing else) comes in” (emphasis added)
Conditionalisation:

One should adjust their probability for h from their prior probability Pi(h) to a posterior probability Pf(h) which equals P(h|e) upon acquiring some evidence e (which has a non-zero initial probability).

This is called conditionalising h on e.

Conditionalisation should occur through Bayes’s theorem (where applicable).
Conditionalisation via Bayes’s theorem
Bayes’s theorem:

P(h|e) = P(e|h) × Pi(h) / Pi(e)

where Pi(e) = P(e|h) × Pi(h) + P(e|~h) × Pi(~h)

Application to the case:

.2 = (.9 × .1) / .45

where .45 = .9 × .1 + .4 × .9
Bayes’s theorem was expressed in a paper by Rev. Thomas Bayes that was published
posthumously.
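The application above can be reproduced with a short function; this is a sketch assuming the slide's figures (Pi(h) = .1, P(e|h) = .9, P(e|~h) = .4):

```python
# Conditionalisation via Bayes's theorem, with P_i(e) computed by the
# law of total probability, as in the slide's worked example.

def posterior(prior_h, like_h, like_not_h):
    p_e = like_h * prior_h + like_not_h * (1 - prior_h)
    return like_h * prior_h / p_e

print(round(posterior(0.1, 0.9, 0.4), 4))  # 0.2
```
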
Arguments for the conditionalisation norm

Case-by-case evidence

Bayes’s theorem is used widely in statistics

Dutch-book arguments
Part II: Applications and problems
Does God exist?

Pi(h) = ? (where h = theism)

(One version of) The principle of indifference:

In the absence of evidence favouring one possibility over another, assign each possibility an equal probability

The principle of indifference seems intuitively plausible in many cases

E.g. all you know is that a prize is behind one of three doors

Presumably the probability that it is behind a given door is 1/3, or approximately .33
Application to theism:

Either h or ~h, so Pi(h) = .5

Sounds reasonable, right?

WRONG!
Multiple partitions problem


Suppose you’re cooking dinner for Jed, but you don’t know
whether he eats meat

One partition of possibilities: either 1) Jed is a meat eater (h) or 2) he is not a meat eater (~h), so Pi(h) = .5

Another partition of possibilities: 1) Jed is a meat eater (h), 2) Jed is a vegetarian (v1), or 3) Jed is a vegan (v2), so Pi(h) = 1/3

The problem is that the space of possibilities can be partitioned differently, so it is unclear how or whether to apply the principle of indifference
Application to theism

Either h or ~h, so Pi(h) = .5

But what about another partition?

Either:

1. There is no ultimate cause of the universe, or
2. There is an ultimate cause of the universe, but this cause is not a person (or conscious being), or
3. There is a personal and ultimate cause of the universe, but this cause is not omnibenevolent, or
4. There is a personal, omnibenevolent and ultimate cause of the universe, but this cause is not omnipotent, or

…

Or theism is true

So already Pi(h) < 1/5 according to the principle of indifference!
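The multiple-partitions point can be put in two lines; this is a sketch, and the partition labels are shorthand for the slide's categories:

```python
# Two partitions of the same possibility space give theism different
# priors under the principle of indifference (labels are shorthand).
partition_a = ["theism", "not theism"]
partition_b = ["no ultimate cause", "impersonal ultimate cause",
               "not omnibenevolent", "not omnipotent", "theism"]

print(1 / len(partition_a))  # 0.5
print(1 / len(partition_b))  # 0.2
```
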
The problem of the priors:
Subjective and objective Bayesianism

We can partition the logical possibilities differently so as to yield conflicting results when the principle of indifference is applied

So which partition do we go with?

Some think that there is no uniquely correct partition

So how do we determine Pi(h)?

Subjectivists: Well, just pick any value you like – no value is incorrect, except perhaps 1 or 0

Objectivists: There is a uniquely correct value for Pi(h), and it is…

Let’s move on and assume that Pi(h) = .5, just for illustration
What evidence is there that God
exists?
Theistic evidence:

Fine-tuning of laws and constants

A universe

Moral truths

Miracle reports

Abiogenesis (origins of life)

Consciousness

Atheistic evidence:

Human suffering

Animal suffering

Non-resistant non-belief in God

Scale of the universe

Contradictory theistic theories

Theism is less simple (Occam’s razor)
The fine-tuning argument

e1 = the laws of the universe are finely tuned to permit meaningful life:

According to philosopher Robin Collins, if the strength of the gravitational force were to change by one part in 10^36, then any land-based or aquatic organisms the size of humans would be crushed.

Likelihoods:

P(e1|h) = .5

P(e1|~h) = 1/10^36

Note that I will assume that ~h is equivalent to Western philosophical atheism (rather than also including polytheism, pantheism, etc.)

What is the posterior probability of theism?

P(h|e1) ≈ 1
The fine-tuning argument – Just
kidding!

e1 = the laws of the universe are finely tuned to permit meaningful life:

According to philosopher Robin Collins, if the strength of the gravitational force were to change by one part in 10^36, then any land-based or aquatic organisms the size of humans would be crushed.

Likelihoods:

P(e1|h) = .5

P(e1|~h) = .01

Note that I will assume that ~h is equivalent to Western philosophical atheism (rather than also including polytheism, pantheism, etc.)

What is the posterior probability of theism?

P(h|e1) ≈ .98
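Plugging the assumed numbers into Bayes's theorem gives both versions of the result; this is a sketch, with the prior of .5 taken from the earlier indifference assumption and the likelihoods as the slides assume them:

```python
# Posterior probability of theism given fine-tuning evidence e1, under
# the slides' assumed likelihoods (all numbers are illustrative).

def posterior(prior_h, like_h, like_not_h):
    p_e = like_h * prior_h + like_not_h * (1 - prior_h)
    return like_h * prior_h / p_e

print(posterior(0.5, 0.5, 1e-36))           # effectively 1.0
print(round(posterior(0.5, 0.5, 0.01), 4))  # 0.9804
```
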
The multiverse objection

If there were (infinitely) many universes with the values of their laws randomly generated by chance, then we wouldn’t be surprised to see that one of them happens to have life-permitting values

In Bayesian terms:

Perhaps a is true, where a = there is an (infinitely) large number of other universes with values randomly generated by chance, and P(e1|~h & a) = 1 (or some relatively high figure)
The argument from suffering

e2 = humans suffer and this is a bad thing

Genocide

Oppression

Missing buses

Now our prior probability relative to e2 is our posterior probability relative to e1, so Pi(h) ≈ .98

What are the likelihoods?

Logical argument from evil (J.L. Mackie):

P(e2|h) = 0

P(e2|~h) = .5

So, P(h|e2) = 0
The argument from suffering

e2 = humans suffer and this is a bad thing

Genocide

Oppression

Missing buses

Now our prior probability relative to e2 is our posterior probability relative to e1, so Pi(h) ≈ .98

What are the likelihoods?

Evidential argument from evil (William Rowe):

P(e2|h) = .01

P(e2|~h) = .5

So, P(h|e2) ≈ .5
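Updating on the evidence of suffering can be sketched the same way, where the new prior of .98 is the posterior from the fine-tuning evidence and the likelihoods are the slides' assumed values:

```python
# Conditionalising on the evidence of suffering e2 with prior P_i(h) = .98.
# Likelihoods are the slides' assumed values for Mackie and Rowe.

def posterior(prior_h, like_h, like_not_h):
    p_e = like_h * prior_h + like_not_h * (1 - prior_h)
    return like_h * prior_h / p_e

print(posterior(0.98, 0.0, 0.5))             # Mackie's likelihoods: 0.0
print(round(posterior(0.98, 0.01, 0.5), 2))  # Rowe's likelihoods: about 0.49
```
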
Sceptical theism

“God knows a lot more than us and would have
reasons to justify his actions which we do not know of”

“So if God existed, there was suffering, and we did not see any reason that would justify God’s permission of the suffering, then we would not be surprised”

More sophisticated defences of versions of sceptical
theism are given by Stephen Wykstra and Daniel
Howard-Snyder
The problem of the priors

There is sometimes a lot of debate about the likelihoods, or at least about what the
relevant likelihoods are

Suppose we agree that:

P(e|h) = .9

P(e|~h) = .1

So if we assume that Pi(h) = .5, then P(h|e) = .9

But if we assume that Pi(h) = .1, then P(h|e) = .5

And if we assume that Pi(h) = .00001, then P(h|e) ≈ .00009
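The sensitivity of the posterior to the prior can be seen in a short loop; a sketch assuming the agreed likelihoods P(e|h) = .9 and P(e|~h) = .1:

```python
# Same likelihoods, different priors, very different posteriors.

def posterior(prior_h, like_h=0.9, like_not_h=0.1):
    p_e = like_h * prior_h + like_not_h * (1 - prior_h)
    return like_h * prior_h / p_e

for prior in (0.5, 0.1, 0.00001):
    print(prior, round(posterior(prior), 5))
# priors 0.5, 0.1 and 0.00001 give roughly 0.9, 0.5 and 0.00009
```
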
The problem of the priors

The posterior probability is sensitive to the value of the prior probability

Subjective Bayesians often think that the subjectivity of the prior is not a major
problem since the subjectivity will be “washed out” as evidence accumulates


So two people starting off with different priors will converge on the probable truth
given their conditioning on a growing body of evidence
However, as Alan Hájek notes:

“Indeed, for any range of evidence, we can find in principle an agent whose prior is
so pathological that conditionalizing on that evidence will not get him or her
anywhere near the truth, or the rest of us.”

And there are other worries

So does the problem of the priors render Bayesianism practically useless?

Does it eliminate scepticism about the reliability of inductive inference?
Questions?
Thank you!