Tutorial 8, STAT1301 Fall 2010, 16NOV2010,
MB103@HKU
By Joseph Dong
Recall: A Partition on a Set

• Any exhaustive and pairwise disjoint collection of nonempty subsets of a given set forms a partition of that set.
• E.g.
  - $\{B, B^c\}$ forms a trivial partition of the presumed set $S = B \cup B^c$.
  - If $f: S \to T = \{1, 2, 3\}$, then the collection of pre-images of atoms of the range, $\{f^{-1}(1), f^{-1}(2), f^{-1}(3)\}$, forms a partition of the domain $S$.
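A minimal sketch of the pre-image example, to make the "exhaustive and disjoint" check concrete. The set $S = \{0, \dots, 8\}$, the map $f(s) = s \bmod 3 + 1$, and the helper names `preimage_blocks` and `is_partition` are assumptions chosen for illustration only.

```python
from itertools import combinations

def preimage_blocks(domain, f):
    """Group the points of `domain` by their image under `f`."""
    blocks = {}
    for s in domain:
        blocks.setdefault(f(s), set()).add(s)
    return blocks

def is_partition(blocks, domain):
    """Check that the blocks are nonempty, pairwise disjoint, and exhaustive."""
    nonempty = all(len(b) > 0 for b in blocks)
    disjoint = all(a.isdisjoint(b) for a, b in combinations(blocks, 2))
    exhaustive = set().union(*blocks) == set(domain)
    return nonempty and disjoint and exhaustive

# Hypothetical example: S = {0, ..., 8} and f(s) = s % 3 + 1, so T = {1, 2, 3}.
S = range(9)
blocks = preimage_blocks(S, lambda s: s % 3 + 1)
print(blocks)
print(is_partition(list(blocks.values()), S))
```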
Recall: Conditioning on a Partition

• Shares the same idea with
  - Divide and conquer
  - Casewise enumeration
  - A tree diagram
• Formal language:
  - Goal: find the probability of the event $E$, $\mathbb{P}(E)$.
  - This is equivalent to finding the probability of its intersection with the sure event $\Omega$:
    $\mathbb{P}(E) \equiv \mathbb{P}(E \cap \Omega)$.
Recall: Conditioning on a Partition (cont’d)

• Formal language (continued)
  - Now break the sure event down into a number of manageable smaller pieces that together form a partition $\{A_k \mid k \in I\}$ of the sure event $\Omega$ (with $I$ countable).
  - If we investigate all the events $E \cap A_k$, then we’re done:
    $\mathbb{P}(E \cap \Omega) = \sum_{k \in I} \mathbb{P}(E \cap A_k)$
  - The core of the problem now becomes finding each $\mathbb{P}(E \cap A_k)$, and this is where the conditioning takes place:
    $\mathbb{P}(E \cap A_k) = \mathbb{P}(E \mid A_k) \cdot \mathbb{P}(A_k)$
  - This assumes that it is more straightforward to find $\mathbb{P}(E \mid A_k)$ and $\mathbb{P}(A_k)$.
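The divide, conquer, and merge steps can be sketched numerically. The three-urn setup and all the probabilities below are made-up numbers for illustration, not taken from the handout.

```python
from fractions import Fraction as F

# Hypothetical setup: pick one of three urns (the partition A_1, A_2, A_3),
# then draw a ball; E = "the ball is red".
p_A = {1: F(1, 2), 2: F(1, 3), 3: F(1, 6)}          # P(A_k), sums to 1
p_E_given_A = {1: F(1, 4), 2: F(1, 2), 3: F(3, 4)}  # P(E | A_k)

# Conquer each piece: P(E ∩ A_k) = P(E | A_k) * P(A_k)
p_E_and_A = {k: p_E_given_A[k] * p_A[k] for k in p_A}

# Merge: P(E) = sum over k of P(E ∩ A_k)
p_E = sum(p_E_and_A.values())
print(p_E)
```

Exact fractions are used so that the merge step can be checked by hand.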
Recall: What does an R.V. do to its State Space?

• An r.v. cuts the state space into blocks. On each of these blocks, the r.v. sends all points there to a common atom in the sample space.
• An r.v. induces a partition on the state space.
• Conversely, given a partition on the state space, you can also define random variables on it that “conform” to the partition by taking one distinct value on each block.

[Figure labels: Random Variable; Partition on $\Omega$]
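A small sketch of the converse direction: start from a partition, build a random variable that conforms to it, then recover the partition from the level sets of that variable. The die-roll state space and the helper names `rv_from_partition` and `partition_from_rv` are illustrative assumptions.

```python
def rv_from_partition(blocks):
    """Return X as a dict omega -> label, with one distinct value per block."""
    return {omega: label for label, block in enumerate(blocks) for omega in block}

def partition_from_rv(X):
    """Return the level sets {X = x} as a set of frozensets."""
    levels = {}
    for omega, x in X.items():
        levels.setdefault(x, set()).add(omega)
    return {frozenset(b) for b in levels.values()}

# Hypothetical state space: outcomes of one die roll, split into three blocks.
blocks = [{1, 2}, {3, 4, 5}, {6}]
X = rv_from_partition(blocks)
print(X)
print(partition_from_rv(X) == {frozenset(b) for b in blocks})
```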
Conditioning an Event on an R.V.

• Since an r.v. cuts the state space into a partition, conditioning on an r.v. is just conditioning on the partition it induces on the state space.
• The meaning of $\mathbb{P}(E \mid X)$ is now clearly illustrated in the figure on the right.
$\mathbb{P}(E \mid X)$ as a Random Variable

• It contains a random variable $X$ inside, making it a function of $X$.
  - It has a distribution and an expectation.
  - LOTUS (the law of the unconscious statistician) applies.
  - Question: What is the meaning of its expected value?
• To fix its value, fix a value of $X$:
  - $\mathbb{P}(E \mid X = x_1)$, $\mathbb{P}(E \mid X \in \{x_1, x_2\})$
  - Every fixed value is now a conditional probability involving two events.
Exercise: Finding $\mathbb{P}(E)$ from $\mathbb{P}(E \mid X)$

• This is the prototypical problem of finding the probability of an event via the technique of conditioning on a random variable.
• Hint: Ponder the link between the Law of Total Probability and expectation.
• Ans: $\mathbb{P}(E) = \mathbb{E}[\mathbb{P}(E \mid X)]$
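The answer can be checked by brute force on a small finite state space. The two-dice experiment, the choice $X$ = first die, and the event $E$ = {sum at least 10} are assumptions made up for this sketch.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical example: two fair dice, X = first die, E = {sum >= 10}.
omega = list(product(range(1, 7), repeat=2))
p = F(1, len(omega))                       # uniform probability of each outcome
in_E = lambda w: w[0] + w[1] >= 10
X = lambda w: w[0]

def cond_prob_E(x):
    """P(E | X = x) = P(E ∩ {X = x}) / P(X = x), by counting (uniform measure)."""
    block = [w for w in omega if X(w) == x]
    return F(sum(1 for w in block if in_E(w)), len(block))

# P(E | X) takes the value cond_prob_E(x) on the block {X = x}.
p_X = {x: sum(p for w in omega if X(w) == x) for x in range(1, 7)}
expectation = sum(cond_prob_E(x) * p_X[x] for x in p_X)   # E[P(E | X)]
p_E = sum(p for w in omega if in_E(w))                    # P(E) directly
print(expectation, p_E)
```

The outer sum is exactly the Law of Total Probability over the partition $\{X = x\}$.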
$\mathbb{P}(Y \mid X)$

• It involves two r.v.’s now.
• $\mathbb{P}(Y \mid X)$ is a function of the bivariate random vector $(X, Y)$.
• Fixing $X$ gives you back the conditional density of $Y$ given $X$ at the fixed position.
• Given $\mathbb{P}(Y \mid X)$:
  - Q1: How to find $\mathbb{P}(Y) = \dots$
Conditional, Marginal, and Joint Densities

• Differences among the 3 types of densities:
  - a conditional density $\mathbb{P}(Y \mid X)$
    · is normalized by the marginal density $\mathbb{P}(X)$
    · is a point value divided by a row sum/integral (taking rows to be indexed by the values of $X$)
    · is the density of $Y|X$
  - a joint density $\mathbb{P}(X, Y)$
    · is normalized by the entire joint space
    · is a point value divided by the sum/integral over the entire space
    · is the density of $(X, Y)$
  - a marginal density $\mathbb{P}(Y)$
    · is also normalized by the entire space
    · is a line sum/integral (a column, if rows are indexed by $X$) divided by the sum/integral over the entire space
    · is the density of $Y$
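A minimal sketch of the three densities on a discrete table. The joint pmf below is invented for illustration, and the convention that rows are indexed by $x$ and columns by $y$ is an assumption of this sketch.

```python
from fractions import Fraction as F

# Hypothetical joint pmf P(X = x, Y = y); rows are indexed by x, columns by y.
joint = {
    (0, 0): F(1, 8), (0, 1): F(3, 8),
    (1, 0): F(1, 4), (1, 1): F(1, 4),
}
total = sum(joint.values())                 # the entire joint space; equals 1

def marginal_X(x):
    return sum(p for (a, _), p in joint.items() if a == x)   # row sum

def marginal_Y(y):
    return sum(p for (_, b), p in joint.items() if b == y)   # column sum

def conditional_Y_given_X(y, x):
    return joint[(x, y)] / marginal_X(x)    # point value over its row sum

print(marginal_Y(1), conditional_Y_given_X(1, 0))
```

Note that each conditional density sums to 1 along its own row, while the joint and marginal densities sum to 1 over the entire space.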
Handout Problem 1

Recall: What’s the Expectation of a Random Variable

• First of all, the random variable has to be numerically valued. That’s why expectation is also known as the “expected value” and is a numerical characteristic of the sample space (a subset of $\mathbb{R}$, or simply $\mathbb{R}$ itself with zero density assigned to the impossible points).
  $\mathbb{E}(X) = \int_{-\infty}^{+\infty} x f_X(x)\, dx$
• The expectation is both conceptually and technically equivalent to the location of the center of probability mass of the sample space.
• Expectation provides only partial information about the random variable because it eliminates randomness by giving you back only one representative point of the sample space.
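The center-of-mass reading can be illustrated with a discrete analogue of the integral: the probability masses balance about the mean, so the net moment about it is zero. The pmf below is a made-up example.

```python
from fractions import Fraction as F

# Hypothetical pmf on a finite sample space: value -> probability mass.
pmf = {0: F(1, 2), 1: F(1, 4), 4: F(1, 4)}

# Discrete analogue of the integral of x * f_X(x).
mean = sum(x * p for x, p in pmf.items())

# Net moment ("torque") of the masses about the mean; zero at the balance point.
torque = sum((x - mean) * p for x, p in pmf.items())
print(mean, torque)
```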
For example,

• $E|X$ is a set-valued random variable.
  - Given $X = x$, it evaluates to the set $E \cap \{X = x\}$.
  - We cannot define an expected value for $E|X$.
  - Clarification: $E|X$ is not $\mathbb{P}(E \mid X)$. The latter is numerically valued, as we have previously established for its expected value: $\mathbb{E}[\mathbb{P}(E \mid X)] = \mathbb{P}(E)$.
  - More elaboration: On the set-theory layer, $E|X$ is not strictly different from the set-r.v. pair $(E, X)$. But on the probability-theory layer, $\mathbb{P}(E \mid X)$ is normalized by a different space than $\mathbb{P}(E, X)$ is.
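$\mathbb{E}(X \mid E)$

• $X|E$ is a numerically valued random variable. We can compute its expected value.
• $\mathbb{E}(X \mid E)$ vs $\mathbb{E}(X)$: their sample spaces are different.
  - Compute $\mathbb{E}(X \mid E)$ using $\mathbb{P}_E = \mathbb{P}(\cdot \mid E) := \dfrac{\mathbb{P}(\cdot, E)}{\mathbb{P}(E)}$:
    $\int_{\Omega} X(\omega)\, \dfrac{\mathbb{P}(d\omega, E)}{\mathbb{P}(E)} = \int_{-\infty}^{+\infty} x f_{X|E}(x)\, dx$
  - Compute $\mathbb{E}(X)$ using $\mathbb{P}$:
    $\int_{\Omega} X(\omega)\, \mathbb{P}(d\omega) = \int_{-\infty}^{+\infty} x f_X(x)\, dx$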
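A discrete sketch of the two computations, where the only change between them is the measure: $\mathbb{P}$ versus the renormalized $\mathbb{P}_E$. The fair die and the event "the roll is even" are assumptions for illustration.

```python
from fractions import Fraction as F

omega = range(1, 7)                    # hypothetical example: one fair die, X(w) = w
P = {w: F(1, 6) for w in omega}
E = {w for w in omega if w % 2 == 0}   # event "the roll is even"

p_E = sum(P[w] for w in E)
# P_E = P(. | E) := P(., E) / P(E); outcomes outside E get zero mass.
P_E = {w: (P[w] / p_E if w in E else F(0)) for w in omega}

exp_X = sum(w * P[w] for w in omega)            # E(X) under P
exp_X_given_E = sum(w * P_E[w] for w in omega)  # E(X | E) under P_E
print(exp_X, exp_X_given_E)
```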
Warm-up exercise

• Handout Problem 2
$\mathbb{E}(Y \mid X)$: concepts

• First of all, this is a random variable: a function of $X$.
  - Its randomness comes from the state space of $X$, but the mapping mechanism is determined jointly by $X$ and $Y$.
• This expression is known as the conditional expectation of the conditionee $Y$ given the conditioner $X$.
• The expectation is taken with respect to $Y$.
  - To be precise, we should say w.r.t. $Y|X$.
  - There are multiple (or even a continuum of) sample spaces of $Y|X$, depending on which atom value $X$ takes. After fixing $X$ to an atom, or equivalently to a block in the state space that has been partitioned by $X$, the expression $\mathbb{E}(Y \mid X = x_1)$ is just a constant.
• The expectation eliminates the randomness of $Y$ given $X$.
$\mathbb{E}(Y \mid X)$ as an r.v.

• It uses the joint state space of $X$ and $Y$ as its own state space.
• It uses a degenerated version of the sample space of $Y$ as its own sample space.
• The degeneration preserves the locus of the overall center of mass.
• Each point in the degenerated space is a block center of mass.
“Degeneration preserves overall center of mass”

• $X$ cuts its own state space as well as the joint state space of $X$ and $Y$.
• This partition of the joint state space is mapped by $Y$ to a partition of its own sample space (a set of numbers).
• Then the expression $\mathbb{E}(Y \mid X = x_1)$ represents the locus of the center of mass of the first block of the partition.
• $\mathbb{E}(Y \mid X)$ represents the totality of the loci of these block centers of mass.
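A brute-force sketch of the block centers of mass and of the claim that collapsing each block to its center leaves the overall center of mass unchanged. The two-dice experiment with $X$ = first die and $Y$ = larger of the two dice is an assumed example, not one from the handout.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical example: two fair dice, X = first die, Y = larger of the two.
omega = list(product(range(1, 7), repeat=2))
p = F(1, 36)                               # uniform probability of each outcome
X = lambda w: w[0]
Y = lambda w: max(w)

def block_center(x):
    """E(Y | X = x): center of mass of Y over the block {X = x} (uniform measure)."""
    block = [w for w in omega if X(w) == x]
    return F(sum(Y(w) for w in block), len(block))

# E(Y | X) as a random variable on the same state space: constant on each block.
cond_exp = {w: block_center(X(w)) for w in omega}

overall = sum(Y(w) * p for w in omega)             # center of mass of Y
degenerated = sum(cond_exp[w] * p for w in omega)  # center of mass of E(Y | X)
print(sorted(set(cond_exp.values())), overall, degenerated)
```

The six distinct values of `cond_exp` form the degenerated sample space; `overall` and `degenerated` agree.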
Exercise: Finding $\mathbb{E}(Y)$ from $\mathbb{E}(Y \mid X)$

• This is the prototypical problem of finding the expectation of a random variable via the technique of conditioning on another random variable.
• Ans. $\mathbb{E}(Y) = \mathbb{E}[\mathbb{E}(Y \mid X)]$
• In the divide-conquer-merge paradigm:
  - Divide is done by the conditioner $X$.
  - Conquer refers to the inner expectation carried out on each division.
  - Merge refers to the outer expectation that pieces the whole plate back together. This exercise addresses the merge step.
• Compare with conditional probability and ponder the link between them.
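A sketch of divide, conquer, and merge on a two-stage experiment. The experiment (roll a fair die, then flip that many fair coins and count heads) is a standard textbook illustration assumed here, not one of the handout problems.

```python
from fractions import Fraction as F
from math import comb

# Hypothetical experiment: roll a fair die (X), then flip X fair coins;
# Y = number of heads. Divide: condition on the value of X.
p_X = {x: F(1, 6) for x in range(1, 7)}

def pmf_Y_given_X(y, x):
    """P(Y = y | X = x) for a Binomial(x, 1/2) count."""
    return F(comb(x, y), 2 ** x)

# Conquer: inner expectation E(Y | X = x) on each block.
inner = {x: sum(y * pmf_Y_given_X(y, x) for y in range(x + 1)) for x in p_X}

# Merge: outer expectation over X.
exp_Y = sum(inner[x] * p_X[x] for x in p_X)
print(inner[4], exp_Y)
```

Each inner expectation equals $x/2$, so the merge step reduces to $\mathbb{E}(X)/2$.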
Conditional Variance

• Finding variance by conditioning:
  $\mathbb{V}(Y) = \mathbb{E}[\mathbb{V}(Y \mid X)] + \mathbb{V}[\mathbb{E}(Y \mid X)]$
• Pf.
• Unfortunately, the degeneration of the sample space of $Y$ does not preserve second moments.
• That’s why there is the additional term $\mathbb{V}[\mathbb{E}(Y \mid X)]$ in the formula.
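One standard derivation of the conditional variance formula, offered as a sketch rather than as the proof given in the tutorial. It assumes $\mathbb{E}(Y^2) < \infty$ and uses only $\mathbb{E}(Y) = \mathbb{E}[\mathbb{E}(Y \mid X)]$, applied to both $Y$ and $Y^2$, together with the definition $\mathbb{V}(Y \mid X) = \mathbb{E}(Y^2 \mid X) - [\mathbb{E}(Y \mid X)]^2$.

```latex
\begin{align*}
\mathbb{V}(Y)
  &= \mathbb{E}(Y^2) - [\mathbb{E}(Y)]^2 \\
  &= \mathbb{E}[\mathbb{E}(Y^2 \mid X)] - \{\mathbb{E}[\mathbb{E}(Y \mid X)]\}^2 \\
  &= \mathbb{E}\{\mathbb{V}(Y \mid X) + [\mathbb{E}(Y \mid X)]^2\}
     - \{\mathbb{E}[\mathbb{E}(Y \mid X)]\}^2 \\
  &= \mathbb{E}[\mathbb{V}(Y \mid X)]
     + \Big( \mathbb{E}\{[\mathbb{E}(Y \mid X)]^2\}
     - \{\mathbb{E}[\mathbb{E}(Y \mid X)]\}^2 \Big) \\
  &= \mathbb{E}[\mathbb{V}(Y \mid X)] + \mathbb{V}[\mathbb{E}(Y \mid X)].
\end{align*}
```

The bracketed difference in the fourth line is the variance of the block centers of mass, which is exactly the spread that is lost when each block is collapsed to a single point.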
Summary: Conditional Expectation

The key observations are
• Obs1: To find the center of mass of a piece of material, you can divide it into a few blocks, find their centers of mass, and then find the center of mass of these block centers of mass (each weighted by the mass of its block). The initial division of the piece is quite arbitrary.
  - This fundamental law of physics supports the many nice properties of expectation in the calculus of probability.
• Obs2: A random variable partitions its state space into a collection of atom-valued blocks.
  - This suggests using a random variable as a general device to divide the piece mentioned in Obs1. Such a random variable is called the conditioner.
Linking $\mathbb{P}(E \mid X)$ to $\mathbb{E}(Y \mid X)$

• Trick: Use the indicator $I_E$ of the set $E$. The indicator is a Bernoulli random variable.
• Reason: $\mathbb{P}(E \mid X) \equiv \mathbb{E}(I_E \mid X)$
• Conclusion: The conditional probability of an event conditioned on a random variable (a partition) is, in disguise, a conditional expectation of the indicator of that event conditioned on the same random variable.
• All properties of conditional expectation should apply to conditional probability. For example, the Law of Total Probability is just $\mathbb{E}(Y) \equiv \mathbb{E}[\mathbb{E}(Y \mid X)]$ in disguise, with $Y = I_E$.
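The indicator trick can be checked by computing both sides separately on a finite state space. The two-dice setup and the event $E$ = {the dice match} are assumptions for this sketch.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical example: two fair dice, X = first die, E = {the dice match}.
omega = list(product(range(1, 7), repeat=2))
X = lambda w: w[0]
E = {w for w in omega if w[0] == w[1]}
I_E = lambda w: 1 if w in E else 0      # Bernoulli indicator of E

def cond_prob(x):
    """P(E | X = x), computed by counting outcomes (uniform measure)."""
    block = [w for w in omega if X(w) == x]
    return F(len([w for w in block if w in E]), len(block))

def cond_exp_indicator(x):
    """E(I_E | X = x), computed as a conditional average of the indicator."""
    block = [w for w in omega if X(w) == x]
    return sum(F(I_E(w), len(block)) for w in block)

print(all(cond_prob(x) == cond_exp_indicator(x) for x in range(1, 7)))
```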
Choosing Conditioner

• The art of conditioning lies in the choice of the conditioner.
• Usually, if our unknown target is the r.v. $Y$, and we know that $Y$ is a known function of a known r.v. $X$, then it is natural to use $X$ as the conditioner for $Y$, that is:
  - Divide the state space of $Y$ by $X$
  - Conquer every $\mathbb{E}(Y \mid X)$
  - Merge them into $\mathbb{E}(Y)$
Exercises

• Handout problem 3
• Handout problem 4
• Handout problem 5
• Handout problem 6
