Transcript
Discrete Distributions
Random Variable
A random variable X is a function
that maps the possible outcomes
of an experiment to real numbers.
That is X: C --> R, where C is
the set of all outcomes of an
experiment and R is the set of
real numbers.
The space of X is the set of real numbers S = {x : X(c) = x, c ∈ C}.
An Example of Random
Variable
If we toss a coin one time, then
there are two possible outcomes,
namely “head up” and “tail up”.
We can define a random variable X
that maps “head up” to 1 and
“tail up” to 0.
We also can define a random
variable Y that maps “head up” to
0 and “tail up” to 1.
The spaces of both random
variables X and Y are {0,1}.
Further Illustration of
Random Variables
A random variable corresponds to
a quantitative interpretation of
the outcomes of an experiment.
For example, a company offers its
employees a drawing in its
year-end party. A computer will
randomly select an employee for
the first prize of $100,000 based
on the employees’ ID number,
which ranges from 1 to 100.
In addition, the computer will
randomly select two more
employees for the second and
third prizes of $50,000 and
$10,000, respectively.
Assume that each employee can
receive only one award and the
drawing starts with the third
prize and ends with the first
prize.
Then, there are in total 100 × 99 × 98 = 970,200 possible outcomes.
To Edward, whose employee ID number is
10, the random variable of his interest
is as follows:
X(<10, *, *>) = 10,000
X(<*, 10, *>) = 50,000
X(<*, *, 10>) = 100,000
X(all other outcomes) = 0
To Grace, whose employee ID number is
30, the random variable of her interest
is as follows:
Y(<30, *, *>) = 10,000
Y(<*, 30, *>) = 50,000
Y(<*, *, 30>) = 100,000
Y(all other outcomes) = 0
The outcome spaces of random variables
X and Y are identical. However, X and
Y map some outcomes to different real
numbers.
The spaces of X and Y are also
identical and both are {0, 10000, 50000,
100000}.
The probability functions of X and Y
are also equal.
Prob(X=10,000) = Prob(Y=10,000) = 0.01
Prob(X=50,000) = Prob(Y=50,000)
= 0.01
Prob(X=100,000) = Prob(Y=100,000)
= 0.01
Prob(X=0) = Prob(Y=0) = 0.97
The expected values of X and Y are equal to
E[X] = E[Y] = 10,000 × 0.01 + 50,000 × 0.01 + 100,000 × 0.01 = 1,600.
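As a quick check, the expected value can be computed directly from the p.m.f. stated above; a minimal Python sketch (the variable names are ours, not part of the example):

# Expected value of Edward's prize X, computed from the p.m.f. given above.
pmf = {10_000: 0.01, 50_000: 0.01, 100_000: 0.01, 0: 0.97}
expected = sum(value * prob for value, prob in pmf.items())
print(round(expected, 2))  # 1600.0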
Discrete Random Variables
Given a random variable X,
let S denote the space of X.
If S is a finite or countably infinite set, then X is said to be a discrete random variable.
Countably Infinite
A set is said to be countably infinite if it contains an infinite number of elements and there exists a one-to-one mapping between the elements of the set and the positive integers.
Examples of Countable /
Uncountable Infinite
The set of integers is countable.
The set of rational numbers is countable.
The set of real numbers is
uncountable.
Probability Mass Function
The probability mass
function (p.m.f.) of a
discrete random variable X
is defined to be
$P_X(k) = \mathrm{Prob}(X = k) = \sum_{q \in Q_k} \mathrm{Prob}(q)$,
where $Q_k$ contains all outcomes that are mapped to k by the random variable X.
In the previous example of
drawing,
$P_X(10{,}000) = \mathrm{Prob}(X = 10{,}000) = \sum_{\langle 10,\, i,\, j\rangle:\ i \neq 10,\ j \neq 10,\ i \neq j} \mathrm{Prob}(\langle 10, i, j\rangle) = 99 \times 98 \times \frac{1}{100 \times 99 \times 98} = 0.01.$
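This probability can also be verified by enumerating all 970,200 ordered outcomes of the drawing; a small Python sketch under the assumptions of the example (IDs 1 to 100, Edward's ID is 10, outcomes listed as <third, second, first> prize winners):

from itertools import permutations

# Count the outcomes in which Edward (ID 10) wins the third prize.
total = 100 * 99 * 98                      # 970,200 possible outcomes
hits = sum(1 for o in permutations(range(1, 101), 3) if o[0] == 10)
print(hits / total)                        # 0.01, i.e. P_X(10,000)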
In fact, the p.m.f. of a random
variable is defined on a set of
events of the experiment
conducted.
In the previous drawing example,
the set of outcomes that are
mapped to 10,000 by X is an
event.
Furthermore, in the previous
drawing example, random variables
X and Y map some outcomes to
different real numbers. However,
X and Y have the same
distribution, i.e. the p.m.f. of
X and the p.m.f. of Y are equal.
More precisely,
$P_X(k) = P_Y(k)$ for every $k \in \{0, 10{,}000, 50{,}000, 100{,}000\}$.
Properties of the Probability
Mass Function
The p.m.f. of a random
variable X satisfies the
following three properties:
1. $P_X(x) > 0$ for every $x \in S$, the space of X; if $x \notin S$, then $P_X(x) = 0$.
2. $\sum_{x_i \in S} P_X(x_i) = 1$.
3. $\mathrm{Prob}(A) = \sum_{x_j \in A} P_X(x_j)$, where $A \subseteq S$.
Probability Distribution
Function
For a random variable X, we
define its probability
distribution function F as
$F_X(t) = \mathrm{Prob}(X \leq t)$.
Properties of a Probability
Distribution Function
1. $\lim_{t \to \infty} F_X(t) = 1$.
2. $\lim_{t \to -\infty} F_X(t) = 0$.
3. $F_X(w) \geq F_X(t)$, if $w \geq t$.
Any function that satisfies
these conditions above can
be a distribution function.
An Example of the
Probability Distribution
Function of a Discrete
Random Variable
Assume that we toss a 4-sided die twice. Then, we have 16 possible outcomes:
(1,1), (1,2), (1,3), (1,4), (2,1), (2,2), (2,3), (2,4),
(3,1), (3,2), (3,3), (3,4), (4,1), (4,2), (4,3), (4,4).
Let random variable X be the sum of the two tosses.
Then,
Prob(X=2) = 1/16, Prob(X=3) = 2/16,
Prob(X=4) = 3/16, Prob(X=5) = 4/16,
Prob(X=6) = 3/16, Prob(X=7) = 2/16,
Prob(X=8) = 1/16.
$F_X(5) = \mathrm{Prob}(X \leq 5) = \frac{1}{16} + \frac{2}{16} + \frac{3}{16} + \frac{4}{16} = \frac{5}{8}.$
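The p.m.f. above and the value F_X(5) can be reproduced by enumerating the 16 equally likely outcomes; a minimal Python sketch:

from fractions import Fraction
from collections import Counter

# All 16 outcomes of tossing a 4-sided die twice; X is the sum of the two tosses.
counts = Counter(i + j for i in range(1, 5) for j in range(1, 5))
pmf = {s: Fraction(c, 16) for s, c in counts.items()}
print(pmf[2], pmf[3], pmf[5])                      # 1/16 1/8 1/4
print(sum(p for s, p in pmf.items() if s <= 5))    # F_X(5) = 5/8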
Operations of Random
Variables
Let X and Y be two random
variables defined on the same
outcome space of an experiment.
Then, we can define a new random
variable Z=f(X,Y).
For example, in the drawing example, if Edward and Grace are
husband and wife, then we can
define a new random variable
Z=X+Y.
We have
X(<30, 10, *>) = 50,000
Y(<30, 10, *>) = 10,000
Z(<30, 10, *>) = 60,000
Function of Random
Variables
Let X be a random variable and G
be a function. Then, random
variable Y = G(X) maps an outcome ν in the outcome space of X to the value G(X(ν)).
With respect to the probability distribution functions, if G is a monotonically increasing, one-to-one mapping, then
$F_Y(t) = \mathrm{Prob}(Y \leq t) = \mathrm{Prob}(G(X) \leq t) = \mathrm{Prob}(X \leq G^{-1}(t)) = F_X(G^{-1}(t)).$
An Example of Functions of
Random Variables
Let random variable X be the sum of two tosses of a 4-sided die and Y = X².
Then,
$F_Y(16) = \mathrm{Prob}(Y \leq 16) = \mathrm{Prob}(X^2 \leq 16) = \mathrm{Prob}(X \leq 4) = F_X(4) = P_X(4) + P_X(3) + P_X(2) = \frac{6}{16} = \frac{3}{8}.$
Expected Value of a Discrete
Random Variable
Let X be a discrete random
variable and S be its space.
Then, the expected value of
X is
$E[X] = \sum_{z \in C} \mathrm{Prob}(z)\, X(z) = \sum_{x_i \in S} P_X(x_i)\, x_i$
μ is a widely used symbol
for expected value.
Expected Value of a Function of
a Random Variable
Let X be a random variable
and G be a function. Then,
the expected value of random
variable Y = G(X) is equal to
$E[Y] = \sum_{x_i \in S} G(x_i)\, P_X(x_i).$
Expected Value of a Function of
a Random Variable
Proof:
$E[Y] = \sum_{y_i \in S'} P_Y(y_i)\, y_i$, where $S'$ is the space of Y
$= \sum_{y_i \in S'} \mathrm{Prob}(Y = y_i)\, y_i$
$= \sum_{y_i \in S'} \; \sum_{\text{all } x_j \text{ such that } G(x_j) = y_i} \mathrm{Prob}(X = x_j)\, G(x_j)$
$= \sum_{x_j \in S} P_X(x_j)\, G(x_j).$
For example, let X correspond to
the outcome of tossing a die once.
Then,
Px(1)=Px(2)=Px(3)=Px(4)=Px(5)=Px(6)=1/6.
and E[X]=3.5
If we are concerned about the difference between the observed outcome and the mean, and define Y = |X − E[X]|, then P_Y(1/2) = 1/3, P_Y(3/2) = 1/3, P_Y(5/2) = 1/3.
Therefore,
$E[Y] = \frac{1}{2}\cdot\frac{1}{3} + \frac{3}{2}\cdot\frac{1}{3} + \frac{5}{2}\cdot\frac{1}{3} = \frac{9}{6} = \frac{3}{2}.$
On the other hand,
$\sum_{x_i} |x_i - E[X]|\, P_X(x_i) = \sum_{x_i} |x_i - 3.5| \cdot \frac{1}{6} = \frac{1}{6}(2.5 + 1.5 + 0.5 + 0.5 + 1.5 + 2.5) = \frac{9}{6} = \frac{3}{2}.$
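Both routes to E[Y] can be checked numerically; a short Python sketch for the fair six-sided die:

from fractions import Fraction

die = {x: Fraction(1, 6) for x in range(1, 7)}
mean = sum(x * p for x, p in die.items())                 # 7/2

# Route 1: build the p.m.f. of Y = |X - E[X]| and take its expectation.
pmf_y = {}
for x, p in die.items():
    y = abs(x - mean)
    pmf_y[y] = pmf_y.get(y, Fraction(0)) + p
e_y = sum(y * p for y, p in pmf_y.items())

# Route 2: sum |x - E[X]| P_X(x) directly over the space of X.
e_y_direct = sum(abs(x - mean) * p for x, p in die.items())

print(e_y, e_y_direct)                                    # 3/2 3/2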
Theorems about the
Expected Value
(a) If c is a constant, E[c] = c.
(b) If c is a constant and g is a function, E[c g(X)] = c E[g(X)].
(c) If c1 and c2 are constants and g1 and g2 are functions, then
E[c1 g1(X) + c2 g2(X)] = c1 E[g1(X)] + c2 E[g2(X)].
Theorems about the
Expected Value
Proof of (a):
Trivial.
Proof of (b):
$E[c\, g(X)] = \sum_{x_i \in S} c\, g(x_i)\, P_X(x_i)$, where S is the space of X and $P_X(x)$ is the p.m.f. of X,
$= c \sum_{x_i \in S} g(x_i)\, P_X(x_i) = c\, E[g(X)].$
Theorems about the
Expected Value
Proof of (c):
$E[c_1 g_1(X) + c_2 g_2(X)] = \sum_{x_i \in S} (c_1 g_1(x_i) + c_2 g_2(x_i))\, P_X(x_i)$
$= c_1 \sum_{x_i \in S} g_1(x_i)\, P_X(x_i) + c_2 \sum_{x_i \in S} g_2(x_i)\, P_X(x_i)$
$= c_1 E[g_1(X)] + c_2 E[g_2(X)].$
An extension of (c):
$E\!\left[\sum_{i=1}^{k} c_i g_i(X)\right] = \sum_{i=1}^{k} c_i E[g_i(X)].$
Variance of a Discrete
Random Variable
The variance of a random variable is defined to be $E[(X - \mu)^2]$ and is typically denoted by σ².
For a discrete random variable X,
$\mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2 - 2\mu X + \mu^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2.$
σ is normally called the standard
deviation.
Variance of a Discrete
Random Variable
Let X be a random variable with mean μ_X and variance σ_X². Let Y = aX + b, where a and b are constants. Then,
$E[Y] = E[aX + b] = aE[X] + b = a\mu_X + b$
$\mathrm{Var}[Y] = E[(Y - \mu_Y)^2] = E[(aX + b - a\mu_X - b)^2] = E[a^2(X - \mu_X)^2] = a^2 E[(X - \mu_X)^2] = a^2 \sigma_X^2.$
Variance of a Random
Variable
The variance of a random variable
measures the deviation of its
distribution from the mean.
For example, in one drawing,
Robert has 0.1% of chance to win
$100,000, while in another
drawing, he has 0.01% of chance
to win $1,000,000.
The expected amounts of award in
these two drawings are equal.
0.001 * 100,000 = 100
0.0001 * 1,000,000 = 100
However, their variances are different.
0.001 * (100,000 − 100)² + 0.999 * (0 − 100)² = 9,990,000
0.0001 * (1,000,000 − 100)² + 0.9999 * (0 − 100)² = 99,990,000
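Both variances follow directly from the definition Var(X) = E[(X − μ)²]; a minimal Python sketch:

def variance(pmf):
    # pmf maps each prize amount to its probability.
    mean = sum(x * p for x, p in pmf.items())
    return sum(p * (x - mean) ** 2 for x, p in pmf.items())

drawing_1 = {100_000: 0.001, 0: 0.999}
drawing_2 = {1_000_000: 0.0001, 0: 0.9999}
print(round(variance(drawing_1)))   # 9990000
print(round(variance(drawing_2)))   # 99990000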
In many distributions, the mean
and variance together uniquely
determine the parameters of the
random variables.
The Bernoulli Experiment
and Distribution
A Bernoulli experiment is a random
experiment, the outcome of which
can be classified in one of two
mutually exclusive and exhaustive
ways, say, success and failure.
A sequence of Bernoulli trials
occurs when a Bernoulli experiment
is performed several independent
times, so that the probability of
success, say p, remains the same
from trial to trial.
The Bernoulli Distribution
Let X be a Bernoulli random
variable. The p.m.f of X can be
written as
$P_X(k) = p^k (1 - p)^{1-k}$,
where k = 0 or 1 and p is the probability of success.
The expected value of X is
$E[X] = \sum_{k=0}^{1} k\, p^k (1-p)^{1-k} = p.$
The variance of X is
$\sigma_X^2 = \sum_{k=0}^{1} (k - p)^2\, p^k (1-p)^{1-k} = p(1-p).$
The Binomial Distribution
Let X be the random variable
corresponding to the number of
successes in a sequence of Bernoulli
trials.
Then,
$P_X(k) = \mathrm{Prob}(X = k) = C_k^n\, p^k (1-p)^{n-k}$,
where n is the number of Bernoulli trials and p is the probability of success in one trial.
X is said to have a binomial distribution, which is normally denoted by b(n, p).
Example of the Binomial
Distribution
Assume that Tiger and Whale are the two
teams that enter the Championship
series of the professional basketball
league. Based on prior records, Tiger
has a 60% chance of beating Whale in a
single game. Larry, who is a fan of
Tiger, makes a bet with Peter, who is a
fan of Whale.
According to their
agreement, Larry will pay Peter $1000,
should Whale win the 5-game series. In
order to make a fair bet, how much
should Peter pay Larry, if Tiger wins
the series?
The probability that Tiger wins the series is
$C_3^5 (0.6)^3 (0.4)^2 + C_4^5 (0.6)^4 (0.4) + C_5^5 (0.6)^5 = 0.6826.$
For the bet to be fair, Z × 0.6826 = 1000 × (1 − 0.6826), so Z ≈ 465.
If the championship series consists of 3 games, then what is the probability that Tiger wins the series?
$C_2^3 (0.6)^2 (0.4) + C_3^3 (0.6)^3 = 0.648 < 0.6826.$
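The series probabilities and the fair stake can be reproduced with the binomial p.m.f.; a small Python sketch (0.6 is Tiger's single-game win probability from the example):

from math import comb

def p_win_series(p, games):
    # Probability of winning more than half of `games` independent games.
    need = games // 2 + 1
    return sum(comb(games, k) * p**k * (1 - p)**(games - k)
               for k in range(need, games + 1))

p5 = p_win_series(0.6, 5)
print(round(p5, 4))                       # 0.6826
print(round(1000 * (1 - p5) / p5))        # 465, the fair amount Peter pays Larry
print(round(p_win_series(0.6, 3), 4))     # 0.648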
The Moment-Generating
Function
Let X be a discrete random variable with p.m.f. $P_X(x)$ and space S. If there is a positive number h such that
$E[e^{tX}] = \sum_{x_i \in S} e^{t x_i}\, P_X(x_i)$
exists and is finite for −h < t < h, then the function of t defined by
$M(t) = E[e^{tX}]$
is called the moment-generating function of X, often abbreviated as m.g.f.
The Moment-Generating
Function
Let X and Y be two discrete random
variables with the same space S.
If $E[e^{tX}] = E[e^{tY}]$,
then the probability mass functions of X and Y are equal.
Insight of the argument above:
Assume that S = {s₁, s₂, …, s_k} contains only positive integers.
Then, we have
$P_X(s_1) e^{t s_1} + P_X(s_2) e^{t s_2} + \cdots + P_X(s_k) e^{t s_k} = P_Y(s_1) e^{t s_1} + P_Y(s_2) e^{t s_2} + \cdots + P_Y(s_k) e^{t s_k}.$
Therefore, $P_X(s_i) = P_Y(s_i)$, i.e. X and Y have the same p.m.f.
The Moment-Generating
Function
Let $M_X(t)$ be the m.g.f. of a discrete random variable X. Then
$\frac{d^k M_X(t)}{dt^k} = \sum_{x_i \in S} x_i^k\, e^{t x_i}\, P_X(x_i).$
Furthermore,
$\frac{d^k M_X(0)}{dt^k} = \sum_{x_i \in S} x_i^k\, P_X(x_i) = E[X^k].$
In particular,
$\mu_X = M_X'(0)$ and $\sigma_X^2 = M_X''(0) - [M_X'(0)]^2.$
The Moment-Generating Function
of the Binomial Distribution
Let X be b(n , p).
$E[X] = \sum_{k=0}^{n} k\, C_k^n\, p^k (1-p)^{n-k} = \sum_{k=1}^{n} \frac{n!}{(k-1)!\,(n-k)!}\, p^k (1-p)^{n-k}$
and
$E[X^2] = \sum_{k=0}^{n} k^2\, C_k^n\, p^k (1-p)^{n-k}$
are both difficult to compute.
On the other hand, we can easily derive the m.g.f. of a binomial distribution:
$M_X(t) = E[e^{tX}] = \sum_{k=0}^{n} e^{tk}\, C_k^n\, p^k (1-p)^{n-k} = \sum_{k=0}^{n} C_k^n\, (p e^t)^k (1-p)^{n-k} = \left(p e^t + 1 - p\right)^n.$
The Moment-Generating Function
of the Binomial Distribution
$M_X'(t) = n\left(p e^t + 1 - p\right)^{n-1} p e^t$
$M_X''(t) = n(n-1)\left(p e^t + 1 - p\right)^{n-2} (p e^t)^2 + n\, p e^t \left(p e^t + 1 - p\right)^{n-1}$
$M_X'(0) = np$
$M_X''(0) = n(n-1)p^2 + np.$
Therefore,
$\mu_X = M_X'(0) = np$
$\sigma_X^2 = M_X''(0) - [M_X'(0)]^2 = n^2 p^2 - np^2 + np - n^2 p^2 = np(1-p).$
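The results μ = np and σ² = np(1 − p) can be checked against the direct sums for a concrete case; a minimal sketch with n = 10 and p = 0.3 (values chosen here only for illustration):

from math import comb

n, p = 10, 0.3
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

mean = sum(k * q for k, q in pmf.items())
var = sum(k**2 * q for k, q in pmf.items()) - mean**2
print(round(mean, 6), round(var, 6))   # 3.0 2.1
print(n * p, n * p * (1 - p))          # 3.0 2.1, i.e. np and np(1-p)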
The Poisson Process
A Poisson process models the
number of times that a particular
type of events occur during a time
interval.
The Poisson process is based on
the following 3 assumptions:
(1) The numbers of event occurrences in non-overlapping intervals are independent.
(2) $\lim_{\Delta t \to 0} \mathrm{Prob}(\text{one occurrence between times } t \text{ and } t + \Delta t) = \lambda\, \Delta t.$
(3) $\lim_{\Delta t \to 0} \mathrm{Prob}(\text{two occurrences between times } t \text{ and } t + \Delta t) = 0.$
λ is the only parameter of the Poisson process.
One example of the Poisson
process is to model the number of
Web accesses that a Web server
receives between 8 AM and 9 AM.
The Basis of the Assumptions
of the Poisson Process
Assume that an ideal random number generator generates λ numbers in [0, 1].
If we divide [0, 1] evenly into n subintervals, then the probability that exactly one of the generated numbers falls in [0, 1/n] is
$C_1^{\lambda}\left(\frac{1}{n}\right)\left(1 - \frac{1}{n}\right)^{\lambda-1} = \frac{\lambda}{n}\left(1 - \frac{1}{n}\right)^{\lambda-1}.$
The probability that exactly two of the generated numbers fall in [0, 1/n] is
$C_2^{\lambda}\left(\frac{1}{n}\right)^2\left(1 - \frac{1}{n}\right)^{\lambda-2} = \frac{\lambda(\lambda-1)}{2n^2}\left(1 - \frac{1}{n}\right)^{\lambda-2}.$
Let Δt = 1/n. Then,
$\lim_{\Delta t \to 0} \mathrm{Prob}(\text{one occurrence in } [0, \Delta t]) = \lim_{n \to \infty} \lambda\,\Delta t\left(1 - \frac{1}{n}\right)^{\lambda-1} = \lambda\, \Delta t$
$\lim_{\Delta t \to 0} \mathrm{Prob}(\text{two occurrences in } [0, \Delta t]) = \lim_{n \to \infty} \frac{\lambda(\lambda-1)}{2}\,\Delta t^2\left(1 - \frac{1}{n}\right)^{\lambda-2} = \frac{\lambda(\lambda-1)}{2}\,\Delta t^2.$
The Poisson Distribution
Assume that we are concerned
about a Poisson process with
parameter λ and want to count the
number of event occurrences
during one time interval.
We can divide the time interval
evenly into n subintervals as the
following figure shows.
[Figure: the time interval from Time = 0 to Time = 1 divided into n subintervals, each of length 1/n.]
The Poisson Distribution
The probability that the event occurs k times during the time interval is
$\lim_{n \to \infty} C_k^n \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k} = \lim_{n \to \infty} \frac{n!}{k!\,(n-k)!} \cdot \frac{\lambda^k}{n^k} \left(1 - \frac{\lambda}{n}\right)^{n-k}.$
Since
$\lim_{n \to \infty} \frac{n!}{(n-k)!\, n^k} = 1$ and $\lim_{n \to \infty} \left(1 - \frac{\lambda}{n}\right)^{n-k} = e^{-\lambda},$
the final result is $\frac{\lambda^k}{k!} e^{-\lambda}.$
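The limit can be observed numerically: as n grows with λ fixed, the binomial probability approaches λ^k e^{−λ}/k!. A small Python sketch with λ = 2 and k = 3 (values chosen here only for illustration):

from math import comb, exp, factorial

lam, k = 2.0, 3
for n in (10, 100, 1000, 10_000):
    binom = comb(n, k) * (lam / n)**k * (1 - lam / n)**(n - k)
    print(n, round(binom, 6))          # approaches the Poisson value as n grows
print(round(lam**k * exp(-lam) / factorial(k), 6))   # 0.180447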
The Poisson Distribution
We say that a random variable X
has a Poisson distribution, if
$P_X(k) = \frac{\lambda^k}{k!} e^{-\lambda}.$
By the Maclaurin series, we have
$e^{\lambda} = \sum_{k=0}^{\infty} \frac{\lambda^k}{k!}.$
Therefore,
$\sum_{k=0}^{\infty} P_X(k) = e^{-\lambda} \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = e^{-\lambda} e^{\lambda} = 1.$
The Poisson Distribution
The moment-generating function of a random variable with the Poisson distribution is
$M_X(t) = E[e^{Xt}] = \sum_{k=0}^{\infty} e^{kt}\, \frac{\lambda^k e^{-\lambda}}{k!} = e^{-\lambda} \sum_{k=0}^{\infty} \frac{(\lambda e^t)^k}{k!} = e^{-\lambda} e^{\lambda e^t} = e^{\lambda(e^t - 1)}.$
$M_X'(t) = \lambda e^t\, e^{\lambda(e^t - 1)}$
$M_X''(t) = (\lambda e^t)^2\, e^{\lambda(e^t - 1)} + \lambda e^t\, e^{\lambda(e^t - 1)}$
The Poisson Distribution
Therefore,
$\mu_X = M_X'(0) = \lambda$ and $\sigma_X^2 = M_X''(0) - [M_X'(0)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda.$
Therefore, λ is the average rate of event occurrence per unit of time.
Let Y be the random variable corresponding to the number of event occurrences during a time interval of length t. Then,
$P_Y(k) = \frac{(\lambda t)^k}{k!} e^{-\lambda t}.$
The Poisson Distribution
The probability that the event occurs k times during a time interval of length t is
$\lim_{n \to \infty} C_k^n \left(\frac{\lambda t}{n}\right)^k \left(1 - \frac{\lambda t}{n}\right)^{n-k} = \lim_{n \to \infty} \frac{n!}{k!\,(n-k)!} \cdot \frac{(\lambda t)^k}{n^k} \left(1 - \frac{\lambda t}{n}\right)^{n-k} = \frac{(\lambda t)^k}{k!} e^{-\lambda t}.$
Joint Distributions
Joint Probability Mass
Function
Let X and Y be two discrete random variables
defined on the same outcome set. The
probability that X=x and Y=y is denoted by
PX,Y(x, y)= Prob(X=x,Y=y) and is called the
joint probability mass function (joint p.m.f.)
of X and Y. PX,Y(x, y) satisfies the following 3 properties:
(1) $0 \leq P_{X,Y}(x, y) \leq 1$
(2) $\sum_{(x, y) \in S \times S} P_{X,Y}(x, y) = 1$
(3) $\mathrm{Prob}((X, Y) \in A) = \sum_{(x, y) \in A} P_{X,Y}(x, y)$,
where A is a subset of S × S.
Example of Joint
Distributions
Assume that a supermarket collected the
following statistics of customers’
purchasing behavior:
         Purchasing Wine   Not Purchasing Wine
Male            45                 255
Female          70                 630

         Purchasing Juice   Not Purchasing Juice
Male            60                 240
Female         210                 490
Example of Joint
Distributions
Let random variable M correspond
to whether a customer is male,
random variable W correspond to
whether a customer purchases wine,
random variable J correspond to
whether a customer purchases
juice.
The joint p.m.f. of M and W is
PMW(0,1) = 0.07     PMW(1,1) = 0.045
PMW(0,0) = 0.63     PMW(1,0) = 0.255
The joint p.m.f. of M and J is
PMJ(0,1) = 0.21     PMJ(1,1) = 0.06
PMJ(0,0) = 0.49     PMJ(1,0) = 0.24
Marginal Probability Mass
Function
Let PXY(x,y) be the joint p.m.f. of
discrete random variables X and Y.
$P_X(x) = \mathrm{Prob}(X = x) = \sum_{y_j} \mathrm{Prob}(X = x, Y = y_j) = \sum_{y_j} P_{XY}(x, y_j)$
is called the marginal p.m.f. of X.
Similarly,
$P_Y(y) = \sum_{x_i} P_{X,Y}(x_i, y)$
is called the marginal p.m.f. of Y.
More on Joint Probability
Mass Function
Note that we can always create a common
outcome set for any two or more random
variables. For example, let X and Y
correspond to the outcomes of the first
and second tosses of a coin,
respectively. Then, the outcome set of
X is {head up, tail up} and the outcome
set of Y is also {head up, tail up}.
The common outcome set of X and Y is
{(head up,head up),(head up,tail
up),(tail up,head up),(tail up,tail
up)}.
Independent Random
Variables
Two discrete random variables X and Y are said to be independent if and only if, for all possible combinations of x and y,
$P_{X,Y}(x, y) = P_X(x)\, P_Y(y).$
Otherwise, X and Y are said to be
dependent.
Example of Independent
Random Variables
Assume that a supermarket collected the
following statistics of customers’
purchasing behavior:
         Purchasing soft drinks   Not purchasing soft drinks
Male              90                      210
Female           210                      490
Example of Independent
Random Variables
Let random variable M correspond to
whether a customer is male or not and
random variable S correspond to whether
a customer purchases soft drinks or not.
Then, M and S are independent, since
for all possible combinations of the
values of M and S, we have
Prob(M=i,S=j)=Prob(M=i)Prob(S=j).
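The claim can be checked mechanically from the counts: divide by the 1,000 customers to get the joint p.m.f. and compare each cell with the product of its marginals. A minimal Python sketch:

counts = {(1, 1): 90, (1, 0): 210, (0, 1): 210, (0, 0): 490}   # (M, S) -> count
total = sum(counts.values())
joint = {cell: c / total for cell, c in counts.items()}

p_m = {m: sum(p for (mm, s), p in joint.items() if mm == m) for m in (0, 1)}
p_s = {s: sum(p for (m, ss), p in joint.items() if ss == s) for s in (0, 1)}

for (m, s), p in joint.items():
    # Each joint probability equals the product of the marginals.
    print((m, s), round(p, 4), round(p_m[m] * p_s[s], 4))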
Another Example of Joint
Distribution
Object    X      Y     Class      Object    X      Y     Class
   1      7.1    9.1     1          11     10.9    8.8     2
   2      6.7   10.2     1          12     10.8   10.3     2
   3      7.5   10.6     1          13     11.1   11.0     2
   4      7.6    8.8     1          14     12.3    9.1     2
   5      8.1   10.3     1          15     12.1    9.7     2
   6      8.0   11.0     1          16     12.0   10.9     2
   7      8.6    8.9     1          17     13.1    8.9     2
   8      8.7    9.8     1          18     12.8   10.1     2
   9      9.2   11.2     1          19     13.2   11.3     2
  10      6.5   10.1     1          20     13.7    9.9     2
Average   7.8   10.0     -        Average  12.2   10.0     -
[Scatter plots omitted: the joint p.m.f. of X, Y, and C; the joint p.m.f. of X and C; the joint p.m.f. of Y and C. Each point is labeled with its class, 1 or 2.]
Joint Distribution Function
Let X and Y be two random variables. The joint
distribution function is defined as follows:
FXY(x,y)=Prob( X≤x, Y≤y).
Note that this definition applies to both
discrete and continuous random variables.
Joint Probability Density
Function
Assume that X and Y are two continuous random variables defined on the same space S. The joint probability density function of X and Y is defined as follows:
$f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y}.$
X and Y are said to be independent if and only if
$f_{XY}(x, y) = f_X(x)\, f_Y(y).$
In some textbooks, it is defined that two random variables are independent if and only if
$F_{XY}(x, y) = F_X(x)\, F_Y(y).$
We have
$F_{XY}(x, y) = F_X(x) F_Y(y) \;\Rightarrow\; f_{XY}(x, y) = \frac{\partial^2 F_{XY}(x, y)}{\partial x\, \partial y} = \frac{\partial^2 F_X(x) F_Y(y)}{\partial x\, \partial y} = f_X(x)\, f_Y(y).$
The marginal p.d.f. of X is
$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy$
and the marginal p.d.f. of Y is
$f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx.$
Jointly Independent and
Pairwise Independent
Note that even if we have
PX,Y(x,y) = PX(x)PY(y)
PY,Z(y,z) = PY(y)PZ(z)
PX,Z(x,z) = PX(x)PZ(z),
it is not necessarily true that
PX,Y,Z(x,y,z) = PX(x)PY(y)PZ(z).
An Example of Pairwise
Independence
Let X and Y be two random variables that correspond to tossing an unbiased coin two times. Let Z = X ⊕ Y, the exclusive-or of X and Y.
Then
Prob(Z=0) = Prob(X=0,Y=0) + Prob(X=1,Y=1) = ½
Prob(X=0,Z=0) = Prob(X=0,Y=0) = ¼ = Prob(X=0)Prob(Z=0).
Therefore, X, Y and Z are pairwise
independent.
However, Prob(X=0,Y=0,Z=1) = 0 and
Prob(X=0)Prob(Y=0)Prob(Z=1)= 1/8
Hence, X, Y and Z are not jointly
independent.
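The pairwise-but-not-jointly-independent behaviour can be verified by enumerating the four equally likely coin outcomes; a short Python sketch (Z is taken as the exclusive-or of X and Y, as above):

from itertools import product

outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]   # (X, Y, Z), each with prob 1/4

def prob(pred):
    return sum(1 for o in outcomes if pred(o)) / len(outcomes)

# Pairwise: Prob(X=0, Z=0) equals Prob(X=0) * Prob(Z=0).
print(prob(lambda o: o[0] == 0 and o[2] == 0),
      prob(lambda o: o[0] == 0) * prob(lambda o: o[2] == 0))        # 0.25 0.25

# But not jointly independent: Prob(X=0, Y=0, Z=1) = 0, not 1/8.
print(prob(lambda o: o == (0, 0, 1)))                               # 0.0
print(prob(lambda o: o[0] == 0) * prob(lambda o: o[1] == 0) * prob(lambda o: o[2] == 1))   # 0.125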
On the other hand, jointly
independent implies pairwise
independent. For example,
$P_{X,Y}(x, y) = \sum_{z} P_{X,Y,Z}(x, y, z) = \sum_{z} P_X(x) P_Y(y) P_Z(z) = P_X(x) P_Y(y) \sum_{z} P_Z(z) = P_X(x) P_Y(y).$
Addition of Two Random
Variables
Let X and Y be two random variables. Then,
E[X+Y]=E[X]+E[Y].
Note that the above equation holds even if X
and Y are dependent.
Proof of the discrete case:
$E[X + Y] = \sum_{x}\sum_{y} P_{XY}(x, y)(x + y)$
$= \sum_{x}\sum_{y} x\, P_{XY}(x, y) + \sum_{x}\sum_{y} y\, P_{XY}(x, y)$
$= \sum_{x} x \sum_{y} P_{XY}(x, y) + \sum_{y} y \sum_{x} P_{XY}(x, y)$
$= \sum_{x} x\, P_X(x) + \sum_{y} y\, P_Y(y) = E[X] + E[Y].$
On the other hand,
$\mathrm{Var}[X + Y] = E[((X + Y) - (\mu_X + \mu_Y))^2]$
$= E[(X + Y)^2 + (\mu_X + \mu_Y)^2 - 2(X + Y)(\mu_X + \mu_Y)]$
$= E[(X + Y)^2] + (\mu_X + \mu_Y)^2 - 2(\mu_X + \mu_Y)^2$
$= E[X^2] + E[Y^2] + 2E[XY] - \mu_X^2 - \mu_Y^2 - 2\mu_X\mu_Y$
$= (E[X^2] - \mu_X^2) + (E[Y^2] - \mu_Y^2) + 2(E[XY] - \mu_X\mu_Y)$
$= \mathrm{Var}[X] + \mathrm{Var}[Y] + 2(E[XY] - E[X]E[Y]).$
Note that if X and Y are independent, then
$E[XY] = \sum_{x}\sum_{y} x y\, P_{XY}(x, y) = \sum_{x}\sum_{y} x y\, P_X(x) P_Y(y) = \left(\sum_{x} x\, P_X(x)\right)\left(\sum_{y} y\, P_Y(y)\right) = E[X] E[Y].$
Therefore, if X and Y are independent,
then Var[X+Y]=Var[X]+Var[Y].
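For independent random variables these identities are easy to confirm numerically; a minimal sketch using two independent tosses of a fair 4-sided die (an example of our choosing, not from the slides):

from itertools import product

pairs = list(product(range(1, 5), repeat=2))    # 16 equally likely (X, Y) outcomes

def expect(f):
    return sum(f(x, y) for x, y in pairs) / len(pairs)

e_x, e_y = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - e_x) ** 2)
var_y = expect(lambda x, y: (y - e_y) ** 2)

print(expect(lambda x, y: x * y), e_x * e_y)                           # 6.25 6.25
print(expect(lambda x, y: (x + y - e_x - e_y) ** 2), var_x + var_y)    # 2.5 2.5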
Covariance
Let X and Y be two random variables.
Then, E[(X-µX)(Y- µY)] is called the
covariance of X and Y, and is
denoted by σXY, where µX and µY are
the means of X and Y, respectively.
Covariance
E[(X-µX)(Y- µY)]
= E[XY- µYX- µXY+ µXµY]
= E[XY]- µYE[X]- µXE[Y]+E[µXµY]
= E[XY]- µXµY
Therefore, if X and Y are
independent, then Cov[X,Y]=0.
Examples of Correlated
Random Variables
Assume that a supermarket collected the
following statistics of customers’
purchasing behavior:
         Purchasing Wine   Not Purchasing Wine
Male            45                 255
Female          70                 630

         Purchasing Juice   Not Purchasing Juice
Male            60                 240
Female         210                 490
Examples of Correlated
Random Variables
Let random variable M correspond
to whether a customer is male,
random variable W correspond to
whether a customer purchases wine,
random variable J correspond to
whether a customer purchases
juice.
The joint p.m.f. of M and W is
PMW(0,1) = 0.07     PMW(1,1) = 0.045
PMW(0,0) = 0.63     PMW(1,0) = 0.255
Cov(M,W)= E[MW]-E[M]E[W]
= 0.045 – 0.3*0.115
= 0.0105 >0
M and W are positively correlated.
The joint p.m.f. of M and J is
PMJ(0,1) = 0.21     PMJ(1,1) = 0.06
PMJ(0,0) = 0.49     PMJ(1,0) = 0.24
Cov(M,J)= E[MJ]-E[M]E[J]
= 0.06 – 0.3*0.27
= -0.021 < 0
M and J are negatively correlated.
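Both covariances follow from the joint p.m.f.s above; a small Python sketch:

def covariance(joint):
    # joint maps (m, w) to its probability.
    e_m = sum(m * p for (m, w), p in joint.items())
    e_w = sum(w * p for (m, w), p in joint.items())
    e_mw = sum(m * w * p for (m, w), p in joint.items())
    return e_mw - e_m * e_w

wine  = {(0, 1): 0.07, (1, 1): 0.045, (0, 0): 0.63, (1, 0): 0.255}
juice = {(0, 1): 0.21, (1, 1): 0.06,  (0, 0): 0.49, (1, 0): 0.24}
print(round(covariance(wine), 4))    # 0.0105, positive
print(round(covariance(juice), 4))   # -0.021, negative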
Covariance of
Independent Random
Variables
Assume that the supermarket also
collected the following statistics of
customers’ purchasing behavior:
         Purchasing soft drinks   Not purchasing soft drinks
Male              90                      210
Female           210                      490
The joint p.m.f. of M and S is
PMS(0,1) = 0.21     PMS(1,1) = 0.09
PMS(0,0) = 0.49     PMS(1,0) = 0.21
Cov(M,S)= E[MS]-E[M]E[S]
= 0.09 – 0.3*0.3
= 0,
due to the fact that M and S are
independent.
Correlation Coefficient
The correlation coefficient of two
random variables X and Y is defined
as follows:
$\rho_{XY} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}$
Bounds of a Correlation
Coefficient
Let $K(b) = E[((Y - \mu_Y) - b(X - \mu_X))^2] = \sigma_Y^2 - 2b\rho\sigma_X\sigma_Y + b^2\sigma_X^2.$
We have
$K\!\left(\frac{\rho\sigma_Y}{\sigma_X}\right) = \sigma_Y^2(1 - \rho^2).$
Since K(b) is the expected value of a square, K(b) ≥ 0 for all b ∈ R.
Therefore, −1 ≤ ρ ≤ 1.
Implication of the Value
of the Correlation
Coefficient
Assume that the supermarket collected
the following statistics of customers’
purchasing behavior:
         Purchasing cosmetics   Not purchasing cosmetics
Male             10                     290
Female          260                     440
Implication of the Value
of the Correlation
Coefficient
Let random variable M correspond to
whether a customer is male, random
variable C correspond to whether a
customer purchases cosmetics.
Then, the correlation coefficient of
M and C is
-0.349.
Implication of the Value
of the Correlation
Coefficient
On the other hand, we also have the
following dataset:
         Purchasing juice   Not purchasing juice
Male            60                  240
Female         210                  490
The correlation coefficient of M and J
is -0.103.
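Both correlation coefficients can be reproduced from the counts (1,000 customers per table); a minimal Python sketch that uses the Bernoulli form of the standard deviations, since the variables take only the values 0 and 1:

from math import sqrt

def correlation(counts):
    # counts maps (m, c) to a number of customers.
    total = sum(counts.values())
    joint = {cell: n / total for cell, n in counts.items()}
    e_m = sum(m * p for (m, c), p in joint.items())
    e_c = sum(c * p for (m, c), p in joint.items())
    cov = sum(m * c * p for (m, c), p in joint.items()) - e_m * e_c
    return cov / (sqrt(e_m * (1 - e_m)) * sqrt(e_c * (1 - e_c)))

cosmetics = {(1, 1): 10, (1, 0): 290, (0, 1): 260, (0, 0): 440}
juice     = {(1, 1): 60, (1, 0): 240, (0, 1): 210, (0, 0): 490}
print(round(correlation(cosmetics), 3))   # -0.349
print(round(correlation(juice), 3))       # -0.103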
Another Example of
Correlation Coefficient
Object    X      Y     Class      Object    X      Y     Class
   1      7.1    9.1     1          11     10.9    8.8     2
   2      6.7   10.2     1          12     10.8   10.3     2
   3      7.5   10.6     1          13     11.1   11.0     2
   4      7.6    8.8     1          14     12.3    9.1     2
   5      8.1   10.3     1          15     12.1    9.7     2
   6      8.0   11.0     1          16     12.0   10.9     2
   7      8.6    8.9     1          17     13.1    8.9     2
   8      8.7    9.8     1          18     12.8   10.1     2
   9      9.2   11.2     1          19     13.2   11.3     2
  10      6.5   10.1     1          20     13.7    9.9     2
Average   7.8   10.0     -        Average  12.2   10.0     -
[Scatter plots omitted: the joint p.m.f. of X, Y, and C; the joint p.m.f. of X and C; the joint p.m.f. of Y and C. Each point is labeled with its class, 1 or 2.]
Another Example of
Correlation Coefficients
The correlation coefficient of X and C is
$\frac{E[XC] - E[X]E[C]}{\sigma_X \sigma_C} = \frac{16.1 - 10 \times 1.5}{2.379 \times 0.5} = 0.925.$
On the other hand, the covariance of Y and C is
E[YC] − E[Y]E[C] = 15 − 10 × 1.5 = 0,
and therefore the correlation coefficient of Y and C is 0.
With respect to data analysis,
random variable X provides
valuable information about the
class of an object.
On the other hand, random
variable Y essentially provides
no information about the class of
an object.
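Both results can be recomputed from the table above; a short Python sketch using the population standard deviation, which matches the σ_X = 2.379 used in the computation above:

from statistics import mean, pstdev

x = [7.1, 6.7, 7.5, 7.6, 8.1, 8.0, 8.6, 8.7, 9.2, 6.5,
     10.9, 10.8, 11.1, 12.3, 12.1, 12.0, 13.1, 12.8, 13.2, 13.7]
y = [9.1, 10.2, 10.6, 8.8, 10.3, 11.0, 8.9, 9.8, 11.2, 10.1,
     8.8, 10.3, 11.0, 9.1, 9.7, 10.9, 8.9, 10.1, 11.3, 9.9]
c = [1] * 10 + [2] * 10

def corr(a, b):
    cov = mean(ai * bi for ai, bi in zip(a, b)) - mean(a) * mean(b)
    return cov / (pstdev(a) * pstdev(b))

print(round(corr(x, c), 3))   # 0.925
print(round(corr(y, c), 3))   # 0.0 (up to floating-point rounding)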
Example of Uncorrelated
Random Variables
Assume X and Y have the following joint p.m.f.:
PXY(0,1) = PXY(1,0) = PXY(2,1) = 1/3.
We have the following marginal p.m.f.s:
$P_X(0) = \sum_y P_{XY}(0, y) = 1/3$; $P_X(1) = \sum_y P_{XY}(1, y) = 1/3$;
$P_X(2) = \sum_y P_{XY}(2, y) = 1/3$; $P_Y(0) = \sum_x P_{XY}(x, 0) = 1/3$;
$P_Y(1) = \sum_x P_{XY}(x, 1) = 2/3$.
Example of Uncorrelated
Random Variables
Since PXY(0,1) = 1/3 ≠ PX(0) × PY(1) = 1/3 × 2/3 = 2/9,
X and Y are not independent.
However,
Cov(X, Y) = E[XY] − E[X]E[Y] = [1/3 × 0 + 1/3 × 0 + 1/3 × 2] − [1 × 2/3] = 2/3 − 2/3 = 0.
Therefore, independence implies uncorrelatedness, but the converse is not true.
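A quick numeric check of this example; a minimal Python sketch:

from fractions import Fraction

joint = {(0, 1): Fraction(1, 3), (1, 0): Fraction(1, 3), (2, 1): Fraction(1, 3)}

# Marginal p.m.f.s of X and Y.
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, Fraction(0)) + p
    p_y[y] = p_y.get(y, Fraction(0)) + p

e_x = sum(x * p for x, p in p_x.items())
e_y = sum(y * p for y, p in p_y.items())
e_xy = sum(x * y * p for (x, y), p in joint.items())

print(joint[(0, 1)], p_x[0] * p_y[1])   # 1/3 vs 2/9 -> not independent
print(e_xy - e_x * e_y)                 # 0 -> uncorrelated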