Random Variable - The University of Vermont
Random Variable
• A random variable X is a function that assigns a real number, X(ζ), to
each outcome ζ in the sample space of a random experiment.
• Domain of the random variable -- S
• Range of the random variable -- Sx
• Example 1: Suppose that a coin is tossed 3 times and the sequence
of heads and tails is noted.
Sample space S={HHH,HHT,HTH,HTT,THH,THT,TTH, TTT}
X :number of heads in three coin tosses.
ζ :    HHH  HHT  HTH  THH  HTT  THT  TTH  TTT
X(ζ):   3    2    2    2    1    1    1    0
Sx={0,1,2,3}
Probability of random variable
• Example 2: The event {X=k} ={k heads in three coin tosses} occurs
when the outcome of the coin tossing experiment contains k heads.
P[X=0]=P[{TTT}]=1/8
P[X=1]=P[{HTT}]+P[{THT}]+P[{TTH}]=3/8
P[X=2]=P[{HHT}]+P[{HTH}]+P[{THH}]=3/8
P[X=3]=P[{HHH}]=1/8
• Conclusion:
For B ⊂ SX, let A = {ζ: X(ζ) ∈ B}. Then
P[X ∈ B] = P[A] = P[{ζ: X(ζ) ∈ B}].
Events A and B are referred to as equivalent events.
All numerical events of practical interest involve {X = x} or {X ∈ I} for some interval I.
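The equivalent-event calculation can be checked by enumerating the sample space directly; a minimal Python sketch (the helper names `X` and `pmf` are illustrative, not from the slides):

```python
from itertools import product
from fractions import Fraction

# Sample space of three fair coin tosses: all 8 outcomes, equally likely
outcomes = ["".join(t) for t in product("HT", repeat=3)]

def X(zeta):
    """X(zeta) = number of heads in the outcome zeta."""
    return zeta.count("H")

# P[X = k] = P[{zeta : X(zeta) = k}], the probability of the equivalent event
pmf = {k: Fraction(sum(1 for z in outcomes if X(z) == k), len(outcomes))
       for k in range(4)}

print(pmf)  # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
```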
Events Defined by Random Variable
• If X is a r.v. and x is a fixed real number, we can define the event
(X=x) as
(X=x)={ζ: X(ζ)=x}
(x1<X≤x2)={ζ: x1<X(ζ)≤x2}
These events have probabilities that are denoted by
P[X=x]=P[{ζ: X(ζ)=x}]
P[x1<X≤x2]=P[{ζ: x1<X(ζ)≤x2}]
Distribution Function
The cumulative distribution function (cdf) of a random variable X is
defined as the probability of events {X ≤ x}:
FX(x)=P[X ≤ x] for −∞ < x < +∞
In terms of the underlying sample space, the cdf is the probability of the
event {ζ: X(ζ)≤x}.
• Properties:
0 ≤ FX(x) ≤ 1
lim_{x→+∞} FX(x) = 1
lim_{x→−∞} FX(x) = 0
a < b ⟹ FX(a) ≤ FX(b)  (FX is nondecreasing)
FX(b) = lim_{h→0+} FX(b+h) = FX(b+)  (FX is continuous from the right)
P[a < X ≤ b] = FX(b) − FX(a)
P[X = b] = FX(b) − FX(b−)
P[X > b] = 1 − FX(b)
A typical example of cdf
• Tossing a coin 3 times and counting the number of heads
x    {X ≤ x}                          FX(x)
-1   Ø                                0
0    {TTT}                            1/8
1    {TTT,TTH,THT,HTT}                4/8
2    {TTT,TTH,THT,HTT,HHT,HTH,THH}    7/8
3    S                                1
4    S                                1
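The staircase values in the table can be reproduced by summing the pmf; a short sketch assuming the pmf of the three-toss example (the name `F` is illustrative):

```python
from fractions import Fraction

# pmf of X = number of heads in three fair coin tosses
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def F(x):
    """Cumulative distribution function F_X(x) = P[X <= x]."""
    return sum(p for k, p in pmf.items() if k <= x)

# Reproduce the table row by row
print([F(x) for x in (-1, 0, 1, 2, 3, 4)])
```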
Two types of random variables
• A discrete random variable has a countable number of possible
values.
X: number of heads when a coin is tossed 5 times.
The values are countable.
• A continuous random variable takes all values in an interval of
numbers.
X: the time it takes for a bulb to burn out.
The values are not countable.
Example of cdf for discrete random variables
• Consider the r.v. X defined in example 2.
Discrete Random Variable And
Probability Mass Function
• Let X be a r.v. with cdf FX(x). If FX(x) changes value only in jumps
and is constant between jumps, i.e. FX(x) is a staircase function,
then X is called a discrete random variable.
• Suppose the values are ordered so that xi < xj if i < j. Then
P(X=xi) = P(X≤xi) − P(X≤xi−1) = FX(xi) − FX(xi−1)
Let pX(x) = P(X=x).
The function px(x) is called the probability mass function (pmf) of the
discrete r.v. X.
• Properties of pX(x):
0 ≤ pX(xk) ≤ 1,  k = 1, 2, ...
pX(x) = 0 if x ≠ xk  (k = 1, 2, ...)
Σk pX(xk) = 1
Example of pmf for discrete r.v.
• Consider the r.v. X defined in example 2.
Continuous Random variable and
Probability Density function
• Let X be a r.v. with cdf FX(x) . If FX(x) is continuous and also has a
derivative dFX(x)/dx which exists everywhere except at possibly a
finite number of points and is piecewise continuous, then X is called
a continuous random variable.
• Let
fX(x) = dFX(x)/dx
• The function fX(x) is called the probability density function (pdf) of
the continuous r.v. X . fX(x) is piecewise continuous.
• Properties:
fX(x) ≥ 0
∫_{−∞}^{+∞} fX(x) dx = 1
P(a < X ≤ b) = ∫_a^b fX(x) dx
Conditional distribution
• Conditional probability of an event A given event B is defined as
P(A|B) = P(A ∩ B) / P(B)
• The conditional cdf FX(x|B) of a r.v. X given event B is defined as
FX(x|B) = P(X ≤ x | B) = P[{X ≤ x} ∩ B] / P(B)
• If X is discrete, then the conditional pmf pX(x|B) is defined by
pX(xk|B) = P(X = xk | B) = P[{X = xk} ∩ B] / P(B)
• If X is a continuous r.v., then the conditional pdf fX(x|B) is defined by
fX(x|B) = dFX(x|B)/dx
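For a discrete r.v. these definitions reduce to a renormalization of the pmf over B; a sketch using the three-toss example with the conditioning event B = {X > 0} (this choice of B is an assumption for illustration):

```python
from fractions import Fraction

# pmf of X = number of heads in three fair coin tosses
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

# Conditioning event B = {X > 0}: at least one head
B = [k for k in pmf if k > 0]
P_B = sum(pmf[k] for k in B)   # P(B) = 7/8

# p_X(x_k | B) = P({X = x_k} and B) / P(B)
cond_pmf = {k: (pmf[k] / P_B if k in B else Fraction(0)) for k in pmf}

print(P_B, cond_pmf[1])
```

Note that the conditional pmf still sums to 1 over SX, as any pmf must.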
Mean and variance
• Mean:
The mean (or expected value) of a r.v. X, denoted by μX or E(X), is defined by
μX = E(X) = Σk xk pX(xk)  (discrete case)
μX = E(X) = ∫ x fX(x) dx  (continuous case)
• Moment:
The nth moment of a r.v. X is defined by
E(X^n) = Σk xk^n pX(xk)  (discrete case)
E(X^n) = ∫ x^n fX(x) dx  (continuous case)
• Variance:
The variance of a r.v. X, denoted by σX² or Var(X), is defined by
σX² = Var(X) = E{[X − E(X)]²} = E(X²) − [E(X)]²
σX² = Σk (xk − μX)² pX(xk)  (discrete case)
σX² = ∫ (x − μX)² fX(x) dx  (continuous case)
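For a discrete r.v. these definitions are finite sums; a sketch with the three-toss pmf, also checking the shortcut Var(X) = E(X²) − [E(X)]²:

```python
from fractions import Fraction

# pmf of X = number of heads in three fair coin tosses
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

mean = sum(k * p for k, p in pmf.items())              # E(X)
second_moment = sum(k**2 * p for k, p in pmf.items())  # E(X^2), the 2nd moment
var = second_moment - mean**2                          # E(X^2) - [E(X)]^2

print(mean, var)
```

The results agree with the binomial formulas np = 1.5 and np(1 − p) = 0.75 for n = 3, p = 0.5.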
Expectation of a Function of a Random variable
• Given a r.v. X and its probability distribution (pmf in the discrete
case and pdf in the continuous case), how do we calculate the expected
value of a function of X, E[g(X)]?
• Proposition:
(a) If X is a discrete r.v. with pmf pX(x), then for any real-valued
function g,
E[g(X)] = Σk g(xk) pX(xk)
(b) If X is a continuous r.v. with pdf fX(x), then for any real-valued
function g,
E[g(X)] = ∫ g(x) fX(x) dx
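Part (a) of the proposition is a one-line sum; a sketch with the three-toss pmf (the helper name `expect` is illustrative):

```python
from fractions import Fraction

# pmf of X = number of heads in three fair coin tosses
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def expect(g):
    """E[g(X)] = sum_k g(x_k) p_X(x_k), for a real-valued g."""
    return sum(g(k) * p for k, p in pmf.items())

print(expect(lambda x: x))                            # E[X] with g(x) = x
print(expect(lambda x: (x - Fraction(3, 2))**2))      # Var(X) via g(x) = (x - mu)^2
```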
Limit Theorem
• Markov's Inequality: If X is a r.v. that takes only nonnegative values,
then for any value a > 0,
P{X ≥ a} ≤ E[X]/a
• Chebyshev's Inequality: If X is a random variable with mean μ and
variance σ², then for any value k > 0,
P{|X − μ| ≥ k} ≤ σ²/k²
Application of Limit theorem
• Suppose we know that the number of items produced in a factory
during a week is a random variable with mean 500.
• (a) What can be said about the probability that this week's
production will be at least 1000?
• (b) If the variance of a week's production is known to equal 100,
then what can be said about the probability that this week's
production will be between 400 and 600?
• Solution: Let X be the number of items that will be produced in a week.
(a) By Markov's inequality, P{X≥1000}≤E[X]/1000=0.5
(b) By Chebyshev's inequality,
P{|X-500|≥100} ≤ σ²/100² = 100/10000 = 0.01
P {|X-500|<100}≥1-0.01=0.99.
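The two bounds above are simple arithmetic; a sketch that recomputes them:

```python
# Weekly production X: E[X] = 500; in part (b), Var(X) = 100
mean, var = 500, 100

# (a) Markov's inequality: P{X >= 1000} <= E[X]/1000
markov_bound = mean / 1000

# (b) Chebyshev's inequality with k = 100:
# P{|X - 500| >= 100} <= var / 100**2, so P{400 < X < 600} >= 1 - bound
cheb_bound = var / 100**2

print(markov_bound, cheb_bound)  # 0.5 0.01
```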
Some Special Distribution
• Bernoulli Distribution
• Binomial Distribution
• Poisson Distribution
• Uniform Distribution
• Exponential Distribution
• Normal (or Gaussian) Distribution
• Conditional Distribution
• ……
Bernoulli Random Variable
An experiment whose outcome is either a "success" or a "failure" is
performed. Let X=1 if the outcome is a "success" and X=0 if it is a
"failure". If the pmf is as given below, such experiments are called
Bernoulli trials and X is said to be a Bernoulli random variable.
SX = {0, 1}
pX(1) = p
pX(0) = 1 − p
μX = E[X] = p   σX² = Var[X] = p(1 − p)
Note: 0 ≤ p ≤ 1
Example: Toss a coin once. Heads and tails are equally likely
to occur, so p = 0.5: pX(1) = P(H) = 0.5, pX(0) = P(T) = 0.5.
Binomial Random Variable
• Suppose n independent Bernoulli trials, each of which results in a
"success" with probability p and in a "failure" with probability 1 − p, are
to be performed. Let X represent the number of successes that occur
in the n trials; then X is said to be a binomial random variable with
parameters (n, p).
SX = {0, 1, 2, ..., n}
pX(k) = C(n, k) p^k (1 − p)^(n−k),  k = 0, 1, ..., n
μX = E[X] = np   σX² = np(1 − p)
Example: Toss a coin 3 times, X = number of heads, p = 0.5:
pX(0) = 0.125  pX(1) = 0.375  pX(2) = 0.375  pX(3) = 0.125
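The coin example can be checked against the binomial formula; a sketch using exact rational arithmetic (the helper name `binom_pmf` is illustrative):

```python
from math import comb
from fractions import Fraction

def binom_pmf(k, n, p):
    """P(X = k) = C(n, k) p^k (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 3, Fraction(1, 2)
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * q for k, q in enumerate(pmf))   # should equal np = 1.5

print([float(q) for q in pmf], float(mean))  # [0.125, 0.375, 0.375, 0.125] 1.5
```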
Geometric Random Variable
• Suppose independent trials, each having probability p of being
a success, are performed until a success occurs. Let X be the
number of trials required until the first success occurs; then X is said
to be a geometric random variable with parameter p.
SX = {1, 2, ...}
pX(k) = p(1 − p)^(k−1),  k = 1, 2, ...
μX = E[X] = 1/p   σX² = (1 − p)/p²
Example: Consider an experiment of rolling a fair die. The average
number of rolls required in order to obtain a 6:
μX = E(X) = 1/p = 1/(1/6) = 6
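The die example can be verified numerically by summing the series E[X] = Σ k p(1 − p)^(k−1); a sketch with a truncated partial sum (the truncation point 400 is an arbitrary choice, large enough that the tail is negligible):

```python
from fractions import Fraction

p = Fraction(1, 6)   # probability of rolling a 6 on a fair die

def geom_pmf(k, p):
    """P(X = k) = p (1 - p)**(k - 1), k = 1, 2, ..."""
    return p * (1 - p)**(k - 1)

# Partial sum of E[X] = sum_k k p (1-p)^(k-1); the full series equals 1/p = 6
approx_mean = sum(k * geom_pmf(k, p) for k in range(1, 400))
print(float(approx_mean))  # close to 6
```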
Poisson Random Variable
• A r.v. X is called a Poisson random variable with parameter λ (> 0) if
its pmf is given by
pX(k) = P(X = k) = e^(−λ) λ^k / k!,  k = 0, 1, 2, ...
SX = {0, 1, 2, ...}
μX = E[X] = σX² = λ
An important property of the Poisson r.v. is that it may be used to
approximate a binomial r.v. when the binomial parameter n is
large and p is small. Let λ = np; then
P(X = k) ≈ e^(−λ) λ^k / k!
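The quality of the approximation is easy to see numerically; a sketch comparing binomial(100, 0.02) with Poisson(λ = np = 2) (these particular n and p are an assumed illustration, not from the slides):

```python
from math import comb, exp, factorial

def binom_pmf(k, n, p):
    """Exact binomial probability C(n,k) p^k (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson probability e^(-lam) lam^k / k!."""
    return exp(-lam) * lam**k / factorial(k)

# n large, p small: binomial(100, 0.02) vs Poisson with lambda = np = 2
n, p = 100, 0.02
lam = n * p
for k in range(5):
    print(k, round(binom_pmf(k, n, p), 4), round(poisson_pmf(k, lam), 4))
```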
Uniform Random Variable
SX = [a, b]
fX(x) = 1/(b − a),  a ≤ x ≤ b
μX = E[X] = (a + b)/2   σX² = (b − a)²/12
A uniform r.v. X is often used when we have no prior
knowledge of the actual pdf and all continuous values in
some range seem equally likely.
Exponential Random Variable
SX = [0, ∞)
fX(x) = λe^(−λx),  x ≥ 0, λ > 0
FX(x) = 1 − e^(−λx)
μX = E[X] = 1/λ   σX² = 1/λ²
The most interesting property of the exponential r.v. is that it is
"memoryless":
P(X > x + t | X > t) = P(X > x),  x, t ≥ 0
X can be the lifetime of a component.
Gaussian (Normal) Random Variable
• SX = (−∞, ∞)
fX(x) = (1/(√(2π) σ)) e^(−(x−μ)²/(2σ²)),  −∞ < x < ∞, σ > 0
μX = E[X] = μ   σX² = σ²
An important fact about a normal r.v. is that if X is normally
distributed with parameters μ and σ², then Y = aX + b is normally
distributed with parameters aμ + b and a²σ².
Application: by the central limit theorem, the sum of a large number of
independent r.v.'s can, under certain conditions, be approximated by
a normal r.v., denoted N(μ, σ²).
The Moment Generating Function
• φ(t) = E[e^(tX)] = Σk e^(t xk) p(xk)  (discrete case)
φ(t) = E[e^(tX)] = ∫ e^(tx) f(x) dx  (continuous case)
The important property: all of the moments of X can be obtained by
successive differentiation:
φ^(n)(0) = E[X^n]
Application of Moment Generating Function
• The Binomial Distribution (n, p)
φ(t) = E[e^(tX)] = Σ_{k=0}^{n} e^(tk) C(n, k) p^k (1 − p)^(n−k) = (pe^t + 1 − p)^n
φ′(t) = n(pe^t + 1 − p)^(n−1) pe^t
E[X] = φ′(0) = np
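The derivative identity E[X] = φ′(0) can be checked numerically; a sketch using a central-difference estimate of φ′(0) for an assumed n = 3, p = 0.5:

```python
from math import exp

# MGF of a binomial(n, p) r.v.: phi(t) = (p e^t + 1 - p)^n
n, p = 3, 0.5

def phi(t):
    return (p * exp(t) + 1 - p)**n

# E[X] = phi'(0), estimated here with a central difference
h = 1e-6
mean_est = (phi(h) - phi(-h)) / (2 * h)
print(mean_est)  # close to np = 1.5
```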
Entropy
• Entropy is a measure of the uncertainty in a random experiment.
• Let X be a discrete r.v. with SX={x1,x2, …,xk} and pmf pk=P[X=xk].
Let Ak denote the event {X=xk}.
Intuitive facts: the uncertainty of Ak is low if pk is close to one, and it
is high if pk is close to zero.
Measure of uncertainty:
I(X = xk) = ln(1/P[X = xk]) = −ln P[X = xk]
Entropy of a random variable
• The entropy of a r.v. X is defined as the expected value of the
uncertainty of its outcomes:
HX = E[I(X)] = Σk p(xk) ln(1/p(xk)) = −Σk p(xk) ln p(xk)
The entropy is in units of "bits" when the logarithm is base 2.
Independent fair coin flips have an entropy of 1 bit per flip.
A source that always generates a long string of A's has an
entropy of 0, since the next character will always be an 'A'.
Entropy of Binary Random Variable
• Suppose a r.v. X with SX = {0,1} and p = P[X=0] = 1 − P[X=1] (flipping a coin).
• HX = h(p) is symmetric about p = 0.5 and achieves its maximum at p = 0.5.
• The uncertainties of the events (X=0) and (X=1) vary together in a
complementary manner.
• The highest average uncertainty occurs when p(0) = p(1) = 0.5.
Reduction of Entropy Through Partial
Information
• Entropy quantifies uncertainty by the amount of information required
to specify the outcome of a random experiment.
• Example:
If r.v. X equally likely takes on the values from the set
{000,001,010,…,111} (flipping a coin 3 times), given the event A = {X
begins with a 1} = {100,101,110,111}, what is the change in the entropy of
r.v. X?
HX = −(1/8)log2(1/8) − (1/8)log2(1/8) − ... − (1/8)log2(1/8) = 3 bits
HX|A = −(1/4)log2(1/4) − ... − (1/4)log2(1/4) = 2 bits
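The one-bit reduction can be computed directly from the entropy formula; a minimal sketch (the helper name `entropy` is illustrative):

```python
from math import log2

def entropy(probs):
    """H = -sum p log2(p), in bits (terms with p = 0 contribute 0)."""
    return -sum(p * log2(p) for p in probs if p > 0)

H_X = entropy([1 / 8] * 8)          # 8 equally likely strings 000..111
H_X_given_A = entropy([1 / 4] * 4)  # 4 outcomes remain once A is known

print(H_X, H_X_given_A)  # 3.0 2.0
```

Learning the first bit of X removes exactly 1 bit of uncertainty.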
Thanks!
Questions?
Extending discrete entropy to the continuous
case: differential entropy
• Quantization method: Let X be a continuous r.v. that takes on
values in the interval [a, b]. Divide [a, b] into a large number K of
subintervals of length ∆. Let Q(X) be the midpoint of the subinterval
that contains X. Find the entropy of Q.
• Let xk be the midpoint of the kth subinterval; then P[Q = xk] = P[X is in
the kth subinterval] = P[xk − ∆/2 < X < xk + ∆/2] ≈ fX(xk)∆
•
HQ = −Σ_{k=1}^{K} P[Q = xk] ln P[Q = xk]
   ≈ −Σ_{k=1}^{K} fX(xk)∆ ln(fX(xk)∆) = −Σ_{k=1}^{K} fX(xk) ln(fX(xk)) ∆ − ln ∆
Trade-off: as ∆ → 0, HQ → ∞.
Differential entropy is defined as
HX = −∫ fX(x) ln fX(x) dx = E[−ln fX(X)]
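For a uniform pdf the differential entropy has the closed form ln(b − a); a sketch comparing that with a Riemann-sum approximation of the integral (the interval [0, 2] and the subdivision count K are assumptions for illustration):

```python
from math import log

# Uniform(a, b): f(x) = 1/(b - a); differential entropy h = ln(b - a)
a, b = 0.0, 2.0
f = 1.0 / (b - a)

closed_form = log(b - a)   # ln 2

# Riemann-sum approximation of -integral of f ln f over [a, b]
K = 100000
dx = (b - a) / K
numeric = -sum(f * log(f) * dx for _ in range(K))

print(closed_form, numeric)
```

Unlike discrete entropy, this value can be negative (e.g. when b − a < 1), which is why it is called "differential" rather than absolute entropy.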
The Method of Maximum Entropy
The maximum entropy method is a procedure for estimating the pmf
or pdf of a random variable when only partial information about X, in
the form of expected values of functions of X, is available.
Discrete case:
Let X be a r.v. with SX = {x1, x2, …, xK} and unknown pmf pX(xk).
Given the expected value of some function g(X) of X:
Σ_{k=1}^{K} g(xk) pX(xk) = c