Chapter 2
Probability and Risk
What Is Risk In Finance?
Risk can refer to
1. Uncertainty,
2. Volatility,
3. Bad outcomes (e.g., bankruptcy),
4. Probability of bad outcomes,
5. Extent of bad outcomes,
6. Certainty of bad outcomes, etc.
Sets
A set A is a collection of well-defined objects called elements such that any given object x either belongs to A (x ∈ A) or does not belong to A (x ∉ A), but not both.
The cardinality of the set is its number of members, which can be either
finite or infinite.
Set Notations
A ⊆ B: Set A is a subset of B if and only if every element of A is also an element of B.
A ⊂ B: Set A is a proper subset of B if and only if every element of A is also an element of B but A ≠ B.
A ∪ B: The union of sets A and B is the set of all distinct elements that are either in A or B, or both in A and B.
A ∩ B: The intersection of sets A and B is the set of all distinct elements they share in common.
Illustration: Toss of 2 Dice
Consider an experiment based on the total of a single toss of two dice. Let
D be the set of all possible outcomes, which range from 2 to 12. Let B be
the set of outcomes between 5 and 10 inclusive. Let C be the set of all
outcomes not less than 9. Let A be the set of odd outcomes between 5 and
9 inclusive.
We have the following sets:
A={5,7,9},
B={5,6,7,8,9,10},
C={9,10,11,12},
D ={2,3,4,5,6,7,8,9,10,11,12}.
We can infer the following:
I. A ⊂ B ⊂ D
II. B ∩ C = {9, 10}
III. B ∪ C = {5, 6, 7, 8, 9, 10, 11, 12}
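These set relations can be checked directly with Python's built-in set type; the following is a minimal illustrative sketch, not part of the original chapter:

```python
# Verify the set relations from the two-dice illustration.
A = {5, 7, 9}
B = {5, 6, 7, 8, 9, 10}
C = {9, 10, 11, 12}
D = set(range(2, 13))  # all possible totals of two dice: 2 through 12

print(A < B < D)  # True: A is a proper subset of B, and B of D
print(B & C)      # {9, 10}, the intersection B ∩ C
print(B | C)      # {5, 6, 7, 8, 9, 10, 11, 12}, the union B ∪ C
```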
Finite, Countable, and Uncountable Sets
A finite set consists of a finite number of elements, meaning that the
elements in the set can be numbered from 1 to n for some positive
integer n. A finite set is countable.
A countably infinite set can be put in a one-to-one correspondence with the natural numbers, ℕ = {1, 2, 3, …}.
The set ℝ of all numbers on the real number line contains numbers such as 3, −4, 7/5, 0, and √2. The infinite set ℝ is said to be uncountable because it cannot be put into a one-to-one correspondence with the natural numbers.
Measurable Spaces
A measurable space (Ω, Φ) is a collection of outcomes ω comprising the
set Ω, referred to as the universal set or sample space, and a set Φ, a
specified collection of subsets of Ω.
The set Φ, comprising events ϕ, forms a σ-algebra, which has the following three properties:
1. ∅ ∈ Φ and Ω ∈ Φ, where ∅ = { } is the empty set (the set containing no elements).
2. If ϕ ∈ Φ, then ϕᶜ ∈ Φ (ϕᶜ is the complement of ϕ).
3. If ϕᵢ ∈ Φ for every positive integer i, then ⋃_{i=1}^{∞} ϕᵢ ∈ Φ.
Probability Spaces
A probability space (Ω, Φ, P) consists of a measurable space (Ω, Φ) and
a probability measure P, to be defined shortly. A probability space
consists of a sample space Ω, events Φ and their associated probabilities
P. The probability (measure) P is a function that assigns a value to each
event taken from Φ so that the following properties are satisfied:
1. For every event ϕ taken from Φ, 0 ≤ P(ϕ) ≤ 1.
2. P(Ω) = 1.
3. For any countable set of pairwise disjoint (mutually exclusive) events ϕ₁, ϕ₂, …, we have P(⋃_{i=1}^{∞} ϕᵢ) = Σ_{i=1}^{∞} P(ϕᵢ).
Physical and Risk Neutral Probabilities
A physical probability is a specific type of probability that reflects the
frequency or likelihood of an event.
Risk-neutral probability is a synthetic probability that leads to correct
and consistent valuations.
Illustration: Probability Space
Suppose that a stock will either increase (u) or decrease (d) in each of two
periods with equal probabilities. The sample space of outcomes is the set
Ω = {uu, ud, du, dd} of all potential outcomes or elementary states w of
the sample space (“world”) Ω. An event is a subset of elementary states
w taken from Ω and is an element of the set of events Φ.
If a set Ω has n elements, its power set has 2ⁿ elements. Thus, in this case the set of events Φ has 2⁴ = 16 events (including the empty set):
{ }
{uu}
{ud}
{du}
{dd}
{uu,ud}
{uu,du}
{uu,dd}
{ud,du}
{ud,dd}
{du,dd}
{uu, ud,du}
{uu, ud,dd}
{uu, du,dd}
{ud,du,dd}
{uu, ud,du,dd}
Illustration: Probability Space (continued)
Physical probabilities associated with each event in the σ-algebra are
then:
P{} = 0
P{uu} = .25
P{ud} = .25
P{du} = .25
P{dd} = .25
P{uu,ud} = .5
P{uu,du} = .5
P{uu,dd} = .5
P{ud,du} = .5
P{ud,dd} = .5
P{du,dd} = .5
P{uu, ud,du} = .75
P{uu, ud,dd} = .75
P{uu, du,dd} = .75
P{ud,du,dd} = .75
P{uu, ud,du, dd} = 1
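A short Python sketch (illustrative only) generates the same 16 events as the power set of Ω and assigns each its physical probability:

```python
from itertools import combinations

# Elementary outcomes of two up/down moves, each with probability .25.
omega = ["uu", "ud", "du", "dd"]
p = {w: 0.25 for w in omega}

# The set of events here is the full power set of omega: 2**4 = 16 events.
events = [set(c) for r in range(len(omega) + 1)
          for c in combinations(omega, r)]

for event in events:
    # P(event) is the sum of the probabilities of its elementary outcomes.
    print(sorted(event), sum(p[w] for w in event))
```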
Random Variables
A random variable X defined on a probability space (Ω, Φ,P) is a
function from the set Ω of outcomes to ℝ (the set of real numbers) or a
subset of the real numbers.
A random variable X is said to be discrete if it can assume at most a
countable number of possible values: x1, x2, x3,….
A random variable X is said to be continuous if it can assume any value
on a continuous range on the real number line.
Illustration: Discrete Random Variables
Returning to the Illustration: Probability Space
The random variable X(ω) could denote the value of the stock given an
outcome ω in the sample space. Suppose that the stock price is defined in
the following way: X(uu) = 30, X(ud) = 20, X(du) = 20, and X(dd) = 10.
Thus,
P(X = 30) = P({uu}) = 1/4 = .25,
P(X = 20) = P({ud, du}) = 2/4 = .50, and
P(X = 10) = P({dd}) = 1/4 = .25.
Metrics in Discrete Spaces
Expected Value
Variance and Standard Deviation
Co-movement Statistics:
1. Covariance
2. Coefficient of Correlation
Expected Value
Two important characteristics of a distribution are central location
(measured by mean, median, or mode) and dispersion (measured by
range, variance or standard deviation).
The mean is also called the average, and in the setting of probability is often called the expected value:
μ = (Σ xᵢfᵢ)/n
where:
μ: the mean of a population
xᵢ: the possible values from the population
fᵢ: frequencies
n: the total number of data values
Expected Value (continued)
In the event that we have a random variable X with possible values xi for
i=1,…,n and associated probabilities Pi, then the expected value is
E[X] = Σ_{i=1}^{n} xᵢPᵢ
The expectation operator E[·] is linear. More precisely, given the random variables X₁, X₂, …, Xₙ and the constants c₁, c₂, …, cₙ, we have
E[Σ_{j=1}^{n} cⱼXⱼ] = Σ_{j=1}^{n} cⱼE[Xⱼ]
Variance and Standard Deviation
Variance σ² is a measure of the dispersion (degree of spread) of the values in a data set. In a finance setting, variance is also used as an indicator of risk.
σ² = Σ_{i=1}^{n} (xᵢ − μ)²/n = Σ_{i=1}^{n} xᵢ²/n − μ²
In the event that we have a random variable X,
σ² = E[(X − E[X])²] = E[X²] − E[X]²
The standard deviation is simply the square root of the variance.
Illustration:
Expected Value, Variance and Standard Deviation
Consider again the stock price example:
P(X=30)=P({uu})=.25, P(X=20)=P({ud,du})=.5, and
P(X=10)=P({dd})=.25.
The expected value of the stock price is:
E[X] = Σ xᵢPᵢ = 30(.25) + 20(.5) + 10(.25) = 20.
The variance of the stock price is:
σ² = E[(X − E[X])²] = (30 − 20)²(.25) + (20 − 20)²(.5) + (10 − 20)²(.25) = 50.
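The same calculation as a minimal Python sketch:

```python
# Expected value and variance of the discrete stock price X.
values = [30, 20, 10]
probs = [0.25, 0.50, 0.25]

mean = sum(x * p for x, p in zip(values, probs))               # E[X]
var = sum((x - mean) ** 2 * p for x, p in zip(values, probs))  # E[(X - E[X])^2]
print(mean, var)  # 20.0 50.0
```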
Covariance
Covariance measures the mutual variability of outcomes selected from
each set; that is, covariance measures the variability in one data set
relative to variability in the second data set, where variables are selected
one at a time from each data set and paired.
The covariance between random variables X and Y is:
σ_{X,Y} = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]
Covariance (continued)
Interpretations of Covariance:
1. The sign of the covariance indicates whether the relationship between the random variables is direct (positive sign) or inverse (negative sign).
2. If the data sets are unrelated, the covariance is zero.
Coefficient Of Correlation
The coefficient of correlation ρ_{X,Y} is:
ρ_{X,Y} = σ_{X,Y}/(σ_X σ_Y) = (E[XY] − E[X]E[Y]) / (√(E[X²] − E[X]²) · √(E[Y²] − E[Y]²))
Note: Correlation coefficients will always range between -1 and +1.
Coefficient of Correlation (continued)
Interpretations of Coefficient of Correlation:
A correlation coefficient indicates the degree of the linear relationship
between the variables X and Y.
A correlation coefficient of -1 means that there is a perfect linear
relationship and that as the X values increase the Y values decrease on
the line (negative slope).
A correlation coefficient of +1 means that there is a perfect linear
relationship and that as the X values increase the Y values also increase
on the line (positive slope).
A correlation coefficient equal to zero implies no linear relationship
between the two random variables.
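A small sketch of the covariance and correlation formulas; the paired return outcomes below are hypothetical (assumed equally likely), chosen only to exercise the formulas:

```python
import math

# Hypothetical, equally likely paired return outcomes for X and Y.
xs = [0.10, 0.05, -0.02, 0.08]
ys = [0.12, 0.03, -0.05, 0.10]
n = len(xs)

ex, ey = sum(xs) / n, sum(ys) / n
cov = sum((x - ex) * (y - ey) for x, y in zip(xs, ys)) / n  # E[XY] - E[X]E[Y]
sx = math.sqrt(sum((x - ex) ** 2 for x in xs) / n)
sy = math.sqrt(sum((y - ey) ** 2 for y in ys) / n)

rho = cov / (sx * sy)  # always lies between -1 and +1
print(cov, rho)
```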
Metrics in Continuous Spaces
Probability Density Function
Cumulative Distribution Function
The Mean And Variance
Probability Density Function
A probability density function p(x) is a theoretical model for a
frequency distribution of a continuous random variable.
A density function p(x) can be computed based on a differentiable
distribution function P(x) as follows:
p(x) = dP(x)/dx = P′(x)
Cumulative Distribution Function
The distribution function P(x), also called the cumulative distribution function, is defined to be the probability that the value of the random variable X falls at or below a given value x; that is, P(x) = P(X ≤ x).
The distribution function P(x) is found by integration:
P(x) = ∫_{−∞}^{x} p(t) dt
The Mean And Variance
The mean and variance for a random variable Y = f(X), where f is a function of the random variable X:
Mean: E[Y] = ∫_{−∞}^{∞} f(x) p(x) dx
Variance: σ_Y² = E[Y²] − E[Y]² = ∫_{−∞}^{∞} f(x)² p(x) dx − (∫_{−∞}^{∞} f(x) p(x) dx)²
Note: The product p(x)dx can be interpreted as the probability that the value of a continuous random variable X lies between x and (x + dx) as dx → 0.
Illustration: Distributions in a Continuous Space
Consider a random variable X with a simple density function:
p(x) = 0.375x² for 0 ≤ x ≤ 2
p(x) = 0 elsewhere
Distribution function:
P(x) = ∫_{−∞}^{x} p(t) dt = ∫_{0}^{x} .375t² dt = .125x³ for 0 ≤ x ≤ 2.
P(x) = 0 for x < 0, and P(x) = 1 for x > 2.
Illustration: Distributions in a Continuous Space
(continued)
Suppose that the random return on a given stock is r = f(X) = .5X. What
is the probability that the stock's return r will be between .05 and .15?
P(.05 < r < .15) = P(.1 < X < .3)
= ∫_{.1}^{.3} .375x² dx = [.125x³]_{.1}^{.3}
= .003375 − .000125 = .00325
Illustration: Distributions in a Continuous Space
(continued)
The return expected value and variance for our numerical example are
calculated as follows:
E[r] = ∫_{−∞}^{∞} f(x) p(x) dx = ∫_{0}^{2} .5x p(x) dx = ∫_{0}^{2} .5x(.375x²) dx
= [.046875x⁴]_{0}^{2} = .75
σ² = E[r²] − E[r]² = ∫_{−∞}^{∞} f(x)² p(x) dx − (.75)²
= ∫_{0}^{2} (.5x)²(.375x²) dx − .5625 = ∫_{0}^{2} .09375x⁴ dx − .5625
= [.01875x⁵]_{0}^{2} − .5625 = .6 − .5625 = .0375
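The probability, mean, and variance above can be checked numerically; a sketch assuming SciPy is available:

```python
from scipy.integrate import quad

def p(x):
    """Density from the illustration: 0.375 x^2 on [0, 2], zero elsewhere."""
    return 0.375 * x**2 if 0 <= x <= 2 else 0.0

def f(x):
    """Return as a function of X: r = f(X) = .5X."""
    return 0.5 * x

prob, _ = quad(p, 0.1, 0.3)                       # P(.1 < X < .3)
e_r, _ = quad(lambda x: f(x) * p(x), 0, 2)        # E[r]
e_r2, _ = quad(lambda x: f(x) ** 2 * p(x), 0, 2)  # E[r^2]
print(prob, e_r, e_r2 - e_r**2)  # ~0.00325 0.75 0.0375
```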
Conditional Probability
Conditional probability is used to determine the likelihood of a particular
event A contingent on the occurrence of another event B.
P[A|B] = P[A ∩ B] / P[B]
Illustration: Drawing a Spade
Suppose that we draw a card at random from an ordinary deck of 52
playing cards. What is the probability that a spade is drawn given that the
drawn card is black? Assume that spades and clubs are black cards, and
that each type comprises 25% of the deck.
Solution:
Let A be the event that a spade is drawn, and let B be the event that a black card
is drawn.
P[A ∩ B] = 13/52 = 1/4 and P[B] = 26/52 = 1/2,
P[A|B] = P[A ∩ B]/P[B] = (1/4)/(1/2) = 1/2.
Conditional Expectation
The expected value of a random variable V conditioned on its exceeding some constant C:
In the discrete case:
E[V | V > C] = Σ_{Vᵢ>C} Vᵢ P(V = Vᵢ) / Σ_{Vᵢ>C} P(V = Vᵢ)
In the continuous case:
E[V | V > C] = ∫_{C}^{∞} x p(x) dx / ∫_{C}^{∞} p(x) dx
where p(x) is the density function for V.
Bayes Theorem
Bayes Theorem is useful for computing the conditional probability of an
event B given event A, when we know the value of the reverse conditional
probability; that is, the probability of event A given event B. In its
simplest form, Bayes Theorem states that:
P[B|A] = P[A|B] P[B] / P[A]
Illustration: Detecting Illegal Insider Trading
Suppose that 1% of all NYSE trades are known to be motivated by the illegal use of inside information. Further suppose that the NYSE is considering implementation of a new software system for detecting illegal insider trading. In this proposed system, a trade that is illegally motivated has a 90% probability of generating a positive signal. Further, assume that among all trades there is a 5% probability that the system signals positive for an illegal trade. Based on this information, what is the probability that a trade generating a positive signal actually was illegally motivated? Based on your calculation, what is the probability that the system's signal for illegal trading is false?
Illustration: Detecting Illegal Insider Trading (continued)
Let A be the event that a trade tests positive for being initiated illegally.
Let B be the event that a trade actually is illegally motivated by inside
information.
Thus, we have P[A|B] = .9, P[B] = .01, and P[A] = .05.
P[B|A] = P[A|B] P[B] / P[A] = (.9)(.01)/.05 = .18.
Thus, there is an 18% probability that a trade indicated by the system to
be illegal actually was illegal. That is, 82% of signals of illegal trading are
actually false.
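The Bayes calculation as a minimal sketch:

```python
# Bayes Theorem applied to the insider-trading illustration.
p_signal_given_illegal = 0.90  # P[A|B]
p_illegal = 0.01               # P[B]
p_signal = 0.05                # P[A]

p_illegal_given_signal = p_signal_given_illegal * p_illegal / p_signal
print(p_illegal_given_signal)      # 0.18
print(1 - p_illegal_given_signal)  # 0.82, the share of false signals
```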
Independent Random Variables
Events A and B are independent if:
P[A ∩ B] = P[A]P[B]
If events A and B are independent, then
P[A] = P[A|B] and P[B] = P[B|A]
Independent Random Variables (continued)
Discrete random variables X and Y are said to be independent if for each
possible value of X = x and Y = 𝑦,
P[X = x, Y = y] = P[X = x] P[Y = y].
Continuous random variables X and Y are said to be independent if their joint
density function f(x,𝑦) satisfies f(x,𝑦) = g(x)h(𝑦), where g(x) is the density
function for X and h(𝑦) is the density function for Y.
f(x, y) dx dy = P[x ≤ X ≤ x + dx, y ≤ Y ≤ y + dy]
= P[x ≤ X ≤ x + dx] P[y ≤ Y ≤ y + dy]
= g(x)h(y) dx dy
Independent Random Variables (continued)
Multiple Random Variables:
The discrete random variables X1,X2,…,Xn are said to be pairwise
independent if
P[Xᵢ = xᵢ, Xⱼ = xⱼ] = P[Xᵢ = xᵢ] P[Xⱼ = xⱼ]
for every i≠j.
The random variables are mutually independent if
P[X₁ = x₁, X₂ = x₂, …, Xₙ = xₙ] = P[X₁ = x₁] P[X₂ = x₂] ⋯ P[Xₙ = xₙ].
(Note that pairwise independence does not by itself imply mutual independence.)
Independent Random Variables (continued)
Properties of independent random variables:
1. E[X₁X₂⋯Xₙ] = E[X₁]E[X₂]⋯E[Xₙ].
2. Var[c₁X₁ + c₂X₂ + ⋯ + cₙXₙ] = c₁²Var[X₁] + c₂²Var[X₂] + ⋯ + cₙ²Var[Xₙ].
3. The random variables c₁X₁ + d₁, c₂X₂ + d₂, …, cₙXₙ + dₙ are pairwise independent.
*Where the random variables X₁, X₂, …, Xₙ are independent.
Distributions and Probability Density Functions
A probability distribution is a function that assigns a probability to
each possible outcome in an experiment involving chance producing a
discrete random variable. In the case of a continuous random variable,
a probability density function is used to obtain probabilities of events.
Bernoulli Trial
Suppose that a random trial (or experiment) with two possible
outcomes, one of which is called a “success” and the other a “failure,” is
performed. We define variable X=1 when the outcome is a success and
define X=0 when the outcome is a failure. Suppose that the probability
of success is p, with 0 ≤p≤ 1, such that P[X = 1]=p and P[X = 0]=1-p.
This random variable X is called a Bernoulli trial or process.
Binomial Random Variable
Suppose that n independent identically distributed Bernoulli trials are performed. Let X represent the number of successes after n trials. Note that we can express X = Σ_{i=1}^{n} Xᵢ, where each Xᵢ is a Bernoulli process. X is a binomial random variable.
The probability of obtaining exactly k successes after n trials equals:
P[X = k] = C(n, k) pᵏ(1 − p)ⁿ⁻ᵏ
where C(n, k) = n!/(k!(n − k)!) for k = 0, 1, …, n.
Illustration: Coin Tossing
Suppose 5 fair coins are tossed. Find the physical probability that exactly 3 of the coins land on heads. Here "heads" is a "success," and since the coins are fair, p = 1/2. If X equals the number of heads that are obtained with 5 tosses, then:
Solution:
P[X = 3] = C(5, 3)(1/2)³(1 − 1/2)² = 10 · (1/8) · (1/4) = 5/16
Illustration: DK Trades
Suppose that the probability that any given trade is a DK is .02. In a series of 10 trades, find the probability that 2 or more of them are DKs. Assume that among the 10 trades, the event that any one trade is a DK is independent of any other trade being a DK. In this problem, a trade being a DK is a "success." We wish to find P[X ≥ 2]. Since Σ_{i=0}^{10} P[X = i] = 1 (having accounted for all possible outcomes), the easiest way to calculate P[X ≥ 2] is:
Solution:
P[X ≥ 2] = 1 − P[X = 0] − P[X = 1]
= 1 − C(10, 0)(.02)⁰(.98)¹⁰ − C(10, 1)(.02)(.98)⁹
= 1 − (.98)¹⁰ − 10(.02)(.98)⁹ = .01618
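Both binomial illustrations can be reproduced with a few lines of Python (math.comb requires Python 3.8 or later):

```python
from math import comb

def binom_pmf(k, n, p):
    """P[X = k] for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Coin-tossing illustration: P[X = 3] with n = 5, p = 1/2.
print(binom_pmf(3, 5, 0.5))  # 0.3125 = 5/16

# DK-trades illustration: P[X >= 2] with n = 10, p = .02.
print(1 - binom_pmf(0, 10, 0.02) - binom_pmf(1, 10, 0.02))  # ~0.01618
```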
The Uniform Random Variable
Let X be a continuous random variable whose values lie in the interval
𝑎 ≤ 𝑋 ≤ 𝑏 where a and b are real numbers and a < b. X is a uniform
random variable on the interval [a,b] if the probability that X takes on a
range of values in any interval contained inside [a,b] is equal to the
probability that X takes on a range of values in any other interval
contained inside [a,b] of the same width. The density function for X is
equal to:
f(x) = 1/(b − a) for a ≤ x ≤ b
f(x) = 0 otherwise
The Uniform Random Variable (continued)
The mean of a uniform distribution on the interval [a, b]:
E[X] = (a + b)/2
The variance of a uniform distribution on the interval [a, b]:
σ_X² = (b − a)²/12
Illustration: Uniform Random Variable
Suppose that the returns on a series of bonds of varying credit risks in a given
year happened to be uniformly distributed on the interval [.02,.07]. What is
the probability that the return for a single bond is between .03 and .05 ?
Assume X is the annual return on the bond.
Solution:
f(x) = 1/(.07 − .02) = 20 for .02 ≤ x ≤ .07, and f(x) = 0 otherwise.
P(.03 < X < .05) = ∫_{.03}^{.05} f(x) dx = ∫_{.03}^{.05} 20 dx = .4
The Normal Random Variable
We say that Y is a normal random variable with parameters mean equal
to μ and variance equal to σ2, and we write Y ~ N(μ, σ2) if Y has the
density function:
f(y) = (1/(σ√(2π))) e^{−(y−μ)²/(2σ²)}
The distribution function for Y is:
F(y) = P(Y ≤ y) = (1/(σ√(2π))) ∫_{−∞}^{y} e^{−(t−μ)²/(2σ²)} dt
The Normal Random Variable (continued)
The area under the normal density function’s graph within a specified
range (a,b) is found by evaluating the following definite integral:
P(a ≤ Y ≤ b) = (1/(σ√(2π))) ∫_{a}^{b} e^{−(y−μ)²/(2σ²)} dy
[Figure: normal density curve with the area between a and b shaded; the graph shown uses μ = 0, σ = 1.]
The Normal Random Variable (continued)
The standard normal random variable, usually denoted by Z, has mean 0
and variance 1. Thus, we can write Z ~ N(0,1). It is also easy to see that if
Y ~ N(μ,σ2), then Y = μ + σZ. This linear relationship means that every
calculation of the distribution function for a normal random variable Y
can be reduced to a calculation of the distribution function for the
standard normal random variable Z. It is convenient to use the notation
N(z) to refer to the distribution function of Z; that is, N(z) = P(Z ≤ z). The
formula that relates the distribution function for Y in terms of that for Z is
quite simple:
P(Y ≤ y) = P(μ + σZ ≤ y) = P(Z ≤ (y − μ)/σ) = N((y − μ)/σ).
The Normal Random Variable (continued)
Calculating Cumulative Normal Density:
N(z) is approximated with a polynomial function (valid for z ≥ 0; for z < 0, use the symmetry N(z) = 1 − N(−z)):
N(z) ≈ 1 − (1/√(2π)) e^{−z²/2} (c₁k + c₂k² + c₃k³ + c₄k⁴ + c₅k⁵)
where:
k = 1/(1 + c₀z)
c₀ = .2316419
c₁ = .319381530
c₂ = −.356563782
c₃ = 1.781477937
c₄ = −1.821255978
c₅ = 1.330274429
In practice, one uses a software package such as Microsoft Excel to compute values for N(z).
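A sketch of this polynomial approximation in Python, using the constants above; the symmetry rule N(z) = 1 − N(−z) handles negative arguments:

```python
import math

# Constants of the polynomial approximation given above.
C0, C1, C2, C3, C4, C5 = (0.2316419, 0.319381530, -0.356563782,
                          1.781477937, -1.821255978, 1.330274429)

def norm_cdf(z):
    """Approximate the standard normal distribution function N(z)."""
    if z < 0:
        return 1.0 - norm_cdf(-z)  # N(z) = 1 - N(-z) by symmetry
    k = 1.0 / (1.0 + C0 * z)
    # Horner evaluation of c1*k + c2*k^2 + c3*k^3 + c4*k^4 + c5*k^5
    poly = k * (C1 + k * (C2 + k * (C3 + k * (C4 + k * C5))))
    return 1.0 - math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi) * poly

print(norm_cdf(2.046))   # ~0.9796
print(norm_cdf(-2.626))  # ~0.0043
```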
Linear Combinations of Independent
Normal Random Variables
Theorem: If X₁, X₂, …, Xₙ is a family of mutually independent random variables, each with a normal distribution so that the mean of Xₖ is μₖ and its variance is σₖ², then for any constants c₁, c₂, …, cₙ, the random variable defined by
X = Σ_{k=1}^{n} cₖXₖ
has a normal distribution with mean Σ_{k=1}^{n} cₖμₖ and variance Σ_{k=1}^{n} cₖ²σₖ².
The Lognormal Random Variable
A random variable Y is said to be lognormal if the logarithm of Y is a
normal random variable X. Therefore, lnY = X where 𝑋~𝑁 𝜇, 𝜎 2 , or
equivalently Y = eX. Because -∞ < X <∞, 0 < Y < ∞.
Since the density function for X satisfies:
∫_{−∞}^{∞} (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)} dx = 1,
the change of variables x = ln(y) and dx = dy/y yields:
∫_{0}^{∞} (1/(√(2π)σy)) e^{−(ln y − μ)²/(2σ²)} dy = 1.
Thus, the density function for the lognormal random variable is:
f(y) = (1/(√(2π)σy)) e^{−(ln y − μ)²/(2σ²)}
The Expected Value of the Lognormal Distribution
Let X have a normal distribution with mean μ and variance σ². We can express X as X = μ + σZ, where Z has a standard normal distribution. Thus:
E[e^X] = ∫_{−∞}^{∞} (1/√(2π)) e^{μ+σz} e^{−z²/2} dz.
We complete the square in the exponent:
σz − z²/2 = −(z − σ)²/2 + σ²/2
and make the change of variables u = z − σ in the integral. We then obtain:
E[e^X] = e^{μ+σ²/2} (1/√(2π)) ∫_{−∞}^{∞} e^{−u²/2} du = e^{μ+σ²/2}.
Illustration: Risky Securities
Suppose that the returns on a security are lognormally distributed with parameters μ =
.08 and σ = .05. This means that the distribution of price-relatives St/St-1 is lognormal
with the given parameters μ = .08 and σ = .05. The logs of the price-relatives are
distributed normally. If the value of the security was $100 at time 0, we will find the
probability that its value after one year lies between $95 and $120.
Solution: First, let S₁ refer to the security's value after one year. Given that ln(S₁/100) = .08 + .05Z, we wish to determine P(95 < S₁ < 120):
P(95 < S₁ < 120) = P(.95 < S₁/100 < 1.2) = P(ln(.95) < ln(S₁/100) < ln(1.20))
= P(−.05129 < .08 + .05Z < .1823)
= P((−.05129 − .08)/.05 < Z < (.1823 − .08)/.05)
= P(−2.626 < Z < 2.046) = N(2.046) − N(−2.626) = .97962 − .00432 = .9753
Illustration: Risky Securities (Continued)
We can find the one-period expected return for our security as follows:
E[S₁/100 − 1] = E[e^X] − 1 = e^{μ+σ²/2} − 1 = e^{.08+(.05)²/2} − 1 = .0846
The Poisson Random Variable
Suppose that we are interested in the number of occurrences X of an event over a fixed time
period of length t in which X is a Poisson random variable. An example might be one in
which a trader might wish to predict the number of 10% price jumps experienced by a stock
over a given time period t. We will derive a distribution function for P[X = k]. It is assumed
that the expected value of the number of occurrences X of the event over a period of time t is
known, and we denote this parameter by λt. Variable X will be a Poisson random variable
with parameter λt if the following three properties hold:
1. The probability is negligible that the event occurs more than once in a sufficiently small interval of length h → 0.
2. The probability that the event occurs exactly once in a short time interval of length h is approximately equal to λh, such that this probability divided by λh approaches one as h approaches 0.
3. The number of times that the event occurs over a given interval is independent of the number of times that the event occurs over any other non-overlapping interval.
Deriving the Poisson Distribution
Suppose that we divide the time interval [T,T+t] into n subintervals of
equal length so that the ith interval takes the form [T+t(i-1)/n,T+ti/n] for
i=1,…,n. By property 1 , the probability that the event occurs more than
once in the ith time interval is negligible if n is large. By property 2, the
probability that the event occurs in the ith time interval is approximately
p=λt/n for i =1,…,n. Thus, the Poisson process follows a Bernoulli process
within each of the n time intervals, with probability p=λt/n of success and
probability 1-p=1-λt/n of failure. Furthermore, by property 3, we can
regard the Poisson process as approximating n independent identically
distributed Bernoulli processes. Thus, the process follows a binomial
distribution with n trials and p=λt/n.
Deriving the Poisson Distribution (continued)
Referring back to the binomial distribution, we have:
P[X = k] ≈ C(n, k) pᵏ(1 − p)ⁿ⁻ᵏ
= C(n, k)(λt/n)ᵏ(1 − λt/n)ⁿ⁻ᵏ
= ((λt)ᵏ/nᵏ) · (n(n − 1)⋯(n − k + 1)/k!) · (1 − λt/n)ⁿ(1 − λt/n)⁻ᵏ
= ((λt)ᵏ/k!) · (n(n − 1)⋯(n − k + 1)/nᵏ) · (1 − λt/n)ⁿ(1 − λt/n)⁻ᵏ
= ((λt)ᵏ/k!) · (1 − 1/n)(1 − 2/n)⋯(1 − (k − 1)/n) · (1 − λt/n)ⁿ(1 − λt/n)⁻ᵏ
Deriving the Poisson Distribution (continued)
Holding k fixed as 𝑛 → ∞, it follows that:
(1 − 1/n)(1 − 2/n)⋯(1 − (k − 1)/n)(1 − λt/n)⁻ᵏ → 1
Recall from calculus that (1 − λt/n)ⁿ → e^{−λt} as n → ∞. Thus, the Poisson random variable X with parameter λt > 0 is a discrete random variable that takes on values 0, 1, 2, … with probability:
P[X = k] = e^{−λt}(λt)ᵏ/k! for k = 0, 1, 2, …
Illustration: Stock Price Jumps
Suppose that a trader needs to predict the probability that a given stock will not experience any 10% price jumps over the course of a month. The number of jumps per year can be modeled as a Poisson variable with an expected value of λ = 3; that is, on a monthly basis, λt = 3/12 = .25:
Solution:
P[X = 0] = e^{−.25}(.25)⁰/0! = .7788
Illustration: Bank Auditor
Suppose the average number of visits from a regulatory bank auditor per
year is 1.5. Assume that the number of audits per year is a Poisson
random variable. We will find the probability that the number of visits by
the auditor to the bank in a two-year time period is between 3 and 5,
inclusive. In this case, λ=1.5 and t=2, so that λt=3. Thus, we find the
probability of between 3 and 5 visits as follows:
Solution:
P[3 ≤ X ≤ 5] = P[X = 3] + P[X = 4] + P[X = 5]
= e⁻³(3³/3!) + e⁻³(3⁴/4!) + e⁻³(3⁵/5!) = .4929
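Both Poisson illustrations in a minimal sketch:

```python
import math

def poisson_pmf(k, lam_t):
    """P[X = k] for a Poisson random variable with parameter lambda * t."""
    return math.exp(-lam_t) * lam_t**k / math.factorial(k)

# Stock price jumps: no 10% jumps in a month when lambda*t = 3/12 = .25.
print(poisson_pmf(0, 0.25))  # ~0.7788

# Bank auditor: 3 to 5 visits in two years when lambda*t = 1.5 * 2 = 3.
print(sum(poisson_pmf(k, 3.0) for k in range(3, 6)))  # ~0.4929
```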
The Central Limit Theorem
The Central Limit Theorem states that the distribution of the sample
means drawn from a population of independently and identically
distributed random variables will approach the normal distribution as the
sample size approaches infinity. Regardless of the distribution of the
population, as long as observations are independently and identically
distributed, the distribution of the sample mean will approach a normal
distribution.
The Central Limit Theorem (continued)
Let Xᵢ, i = 1 to n, be independent and identically distributed (iid) random variables such that E[Xᵢ] = μ and Var[Xᵢ] = σ² < ∞, and denote:
X̄ = Σ_{i=1}^{n} Xᵢ/n
where X̄ is the mean of a sample of Xᵢ of size n. As n approaches ∞, the distribution of (X̄ − μ)/(σ/√n) approaches the standard normal distribution (~N(0,1)):
lim_{n→∞} P[a ≤ (X̄ − μ)/(σ/√n) ≤ b] = (1/√(2π)) ∫_{a}^{b} e^{−x²/2} dx
Illustration of the Central Limit Theorem:
[Figure: frequency histograms of sample means, one for samples of 3 random numbers and one for samples of 30 random numbers.]
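A simulation sketch of the effect pictured above, using only the standard library; the means of size-30 samples cluster more tightly around the population mean than those of size-3 samples:

```python
import random
import statistics

def sample_means(sample_size, num_samples=1000):
    """Means of num_samples samples of uniform random numbers on [0, 1]."""
    return [statistics.mean(random.random() for _ in range(sample_size))
            for _ in range(num_samples)]

for n in (3, 30):
    means = sample_means(n)
    # By the CLT, the standard deviation of the sample mean should be near
    # sigma/sqrt(n), where sigma^2 = 1/12 for a uniform variable on [0, 1].
    print(n, statistics.mean(means), statistics.stdev(means),
          (1 / 12) ** 0.5 / n**0.5)
```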
Joint Probability Distributions
We have been studying probability from the perspective of a single
random variable. Often, one is interested in the probabilistic behavior
of two (or more) random variables studied jointly.
The Joint Probability Distribution: Discrete Random Variables
The joint probability distribution for discrete random variables X and Y
is specified by:
P(X = x,Y = 𝑦) = the probability that X = x and Y = 𝑦.
If X and Y are independent random variables, then
P(X = 𝑥 ,Y = 𝑦) = P(X = 𝑥)P(Y = 𝑦)
The Joint Probability Distribution: Continuous Random Variables
If X and Y are continuous random variables, then one specifies the joint
density function p(x,𝑦) for X and Y.
The joint distribution function for X and Y is:
P(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^{y} ∫_{−∞}^{x} p(s, t) ds dt
If X and Y are independent random variables, then p(x, y) = pX(x)pY(y), where pX(x) is the density function for X and pY(y) is the density function for Y.
Illustration: Transaction Execution Delay
Suppose that X is the amount of time that it takes for a security’s
transaction to transmit from a hedge fund to a stock exchange. Assume
that this time delay has a uniform distribution from 0 to 20 milliseconds.
Let Y be the amount of time required for the stock exchange to actually
execute the transaction once received. Assume this time has a uniform
distribution from 0 to 10 milliseconds. Also assume that the random
variables X and Y are independent. Determine the probability that the
total transaction completion time (transmission and execution) is less
than 10 milliseconds.
Illustration: Transaction Execution Delay (continued)
Solution:
To solve this problem, consider that X has a uniform distribution and that its
density function is pX(x) = 1/20 for 0 ≤ x ≤ 20 and equals 0 otherwise.
Similarly, pY(𝑦) = 1/10 for 0 ≤ y ≤ 10 and 0 otherwise. Since X and Y are
independent, then their joint density function is p(x,𝑦) = (1/20)(1/10) = 1/200
for 0 ≤ x ≤ 20 and 0 ≤ y ≤ 10, and 0 otherwise. We wish to determine:
P(X + Y ≤ 10) = P(X + Y ≤ 10, 0 ≤ Y ≤ 10) = P(0 ≤ X ≤ 10 − Y, 0 ≤ Y ≤ 10)
= ∫_{0}^{10} ∫_{0}^{10−y} (1/200) dx dy = ∫_{0}^{10} [x/200]_{0}^{10−y} dy
= ∫_{0}^{10} ((10 − y)/200) dy = [y/20 − y²/400]_{0}^{10} = 1/4.
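A Monte Carlo check of this result (illustrative sketch):

```python
import random

# P(X + Y <= 10) with X ~ Uniform[0, 20] (transmission delay) and
# Y ~ Uniform[0, 10] (execution delay), X and Y independent.
trials = 1_000_000
hits = sum(random.uniform(0, 20) + random.uniform(0, 10) <= 10
           for _ in range(trials))
print(hits / trials)  # should be close to the analytic answer 1/4
```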
The Bivariate Normal Distribution
The joint probability distribution for random variables X and Y is said to have a bivariate
normal distribution if its density function is given by:
f(x, y) = (1/(2πσxσy√(1 − ρ²))) exp{−(1/(2(1 − ρ²)))[(x − μx)²/σx² + (y − μy)²/σy² − 2ρ((x − μx)/σx)((y − μy)/σy)]}
The joint distribution function for X and Y is:
F(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f(s, t) ds dt
= ∫_{−∞}^{y} ∫_{−∞}^{x} (1/(2πσxσy√(1 − ρ²))) exp{−(1/(2(1 − ρ²)))[(s − μx)²/σx² + (t − μy)²/σy² − 2ρ((s − μx)/σx)((t − μy)/σy)]} ds dt
The Bivariate Normal Distribution (continued)
Consider the case involving two standard normal random variables Zx = (X − μx)/σx and Zy = (Y − μy)/σy, in which each has mean 0 and standard deviation 1. We can write in this special case that our standard normal variables Zx and Zy with correlation coefficient ρ have the joint distribution function M(zx, zy; ρ):
M(zx, zy; ρ) = ∫_{−∞}^{zy} ∫_{−∞}^{zx} (1/(2π√(1 − ρ²))) e^{−(s² + t² − 2ρst)/(2(1 − ρ²))} ds dt
Note: A spreadsheet-based bivariate distribution calculator is available on John Hull's website at http://www.rotman.utoronto.ca/~hull/software/bivar.xls
Illustration: Calculating Bivariate Normal Densities
Suppose that the log of one stock’s price relative Xt =ln(St /St-1) is
distributed normally with mean 5% and standard deviation 1%. The log
of another stock’s price relative Yt =ln(St /St-1) is distributed normally
with mean 6% and standard deviation 2%. Assume that Xt and Yt have
a bivariate normal distribution and that their correlation coefficient is ρ
= .5. Find the probability of the event that the first stock’s return is less
than 6.5% and the second stock’s return is less than 6.6%.
Illustration: Calculating Bivariate Normal Densities
(continued)
Solution:
P(X ≤ 6.5%, Y ≤ 6.6%) = P((X − 5%)/1% ≤ (6.5% − 5%)/1%, (Y − 6%)/2% ≤ (6.6% − 6%)/2%)
= P(Z₁ ≤ 1.5, Z₂ ≤ .3)
where Z₁ = (X − 5%)/1% and Z₂ = (Y − 6%)/2% are the standardized normal random variables for X and Y.
The correlation coefficient for Z₁ and Z₂ is also .5.
The joint distribution function M(1.5, .3; .5) = .603.
Thus: P(X ≤ 6.5%, Y ≤ 6.6%) = M(1.5, .3; .5) = .603
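M(1.5, .3; .5) can also be evaluated numerically; a sketch assuming a recent SciPy, whose multivariate normal distribution exposes a cdf method:

```python
from scipy.stats import multivariate_normal

rho = 0.5
# Standard bivariate normal: zero means, unit variances, correlation rho.
biv = multivariate_normal(mean=[0.0, 0.0], cov=[[1.0, rho], [rho, 1.0]])

# M(1.5, .3; .5): the probability that Z1 <= 1.5 and Z2 <= .3 jointly.
print(biv.cdf([1.5, 0.3]))  # ~0.603
```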
Portfolio Mathematics: Portfolio Arithmetic
(continued)
The expected return on security i based on m potential return outcomes
Ri,k and associated probabilities Pk :
E[Rᵢ] = Σ_{k=1}^{m} R_{i,k} Pₖ
The expected return on a portfolio p based on m potential return outcomes R_{p,k} and associated probabilities Pₖ:
E[Rₚ] = Σ_{k=1}^{m} R_{p,k} Pₖ
Portfolio Mathematics: Portfolio Arithmetic
(continued)
The portfolio return Rp is simply a weighted average of the individual
security returns 𝑅𝑖 .
Rₚ = Σ_{i=1}^{n} wᵢRᵢ = wᵀr
where: the weights wᵢ = ($ invested in security i)/($ invested in p), the vector of returns is r = (R₁, …, Rₙ)ᵀ, and the vector of weights is w = (w₁, …, wₙ)ᵀ.
Portfolio Mathematics: Portfolio Arithmetic
(continued)
The expected return of the portfolio in terms of the expected value of each of
the individual returns :
E[Rₚ] = Σ_{i=1}^{n} wᵢE[Rᵢ].
Portfolio return variance is a function of potential portfolio returns and associated probabilities, or of individual security weights and covariance pairs:
σₚ² = E[(Rₚ − E[Rₚ])²] = Σ_{i=1}^{n} Σ_{j=1}^{n} wᵢwⱼσ_{i,j} = wᵀVw
where V = (σ_{i,j}) is the n × n matrix of return covariances.
Optimal Portfolio Selection
The investor’s portfolio problem is to select n portfolio weights wi so as to
minimize portfolio risk 𝜎𝑝2 subject to some target return rp as follows:
min_w σₚ² = wᵀVw
subject to:
wᵀr = rₚ
wᵀι = 1
where ι = (1, 1, …, 1)ᵀ.
Optimal Portfolio Selection (Continued)
The investor’s portfolio problem is solved with the following Lagrangian:
L = wᵀVw − λ₁(wᵀr − rₚ) − λ₂(wᵀι − 1)
with n + 2 first-order conditions as follows:
∇L(w) = 0
∂L/∂λ₁ = −(wᵀr − rₚ) = 0
∂L/∂λ₂ = −(wᵀι − 1) = 0
The gradient ∇L(w) is an n×1 vector whose ith component is 2σ_{i,1}w₁ + 2σ_{i,2}w₂ + … + 2σ_{i,n}wₙ − λ₁Rᵢ − λ₂ for i = 1, 2, …, n.
Portfolio Selection Illustration
Suppose that an investor wishes to construct the lowest-risk portfolio of three assets. The expected returns of the three assets A, B and C are .1, .2 and .3, respectively. Their standard deviations are .2, .3 and .4. The returns covariance between assets A and B is .03, the returns covariance between assets A and C is .04, and the returns covariance between assets B and C is .06. What should be the weights of this optimal or lowest-risk portfolio if the investor seeks an expected portfolio return of 15%?
Portfolio Selection Illustration (continued)
Solution:
The Lagrange function is written as either of the following:
L = .04wA² + .09wB² + .16wC² + 2(.03)wAwB + 2(.04)wAwC + 2(.06)wBwC + λ₁(.15 − .1wA − .2wB − .3wC) + λ₂(1 − wA − wB − wC)
or, in matrix form,
L = wᵀVw − λ₁(wᵀr − .15) − λ₂(wᵀι − 1)
where w = (wA, wB, wC)ᵀ, r = (.1, .2, .3)ᵀ, ι = (1, 1, 1)ᵀ, and
V = [ .04  .03  .04 ]
    [ .03  .09  .06 ]
    [ .04  .06  .16 ]
Portfolio Selection Illustration (continued)
First-order conditions are as follows, combining ∇L(w) = 0, ∂L/∂λ₁ = −(wᵀr − rₚ) = 0, and ∂L/∂λ₂ = −(wᵀι − 1) = 0, the latter two of which are slightly rearranged:
[ .08  .06  .08  −.1  −1 ] [wA]   [   0  ]
[ .06  .18  .12  −.2  −1 ] [wB]   [   0  ]
[ .08  .12  .32  −.3  −1 ] [wC] = [   0  ]
[ −.1  −.2  −.3   0    0 ] [λ₁]   [ −.15 ]
[ −1   −1   −1    0    0 ] [λ₂]   [  −1  ]
            V*              w*  =    c*
We invert matrix V* and then solve for vector w* = V*⁻¹c*. We find the following weights: wA = .625, wB = .25, and wC = .125; our Lagrange multipliers are λ₁ = .225 and λ₂ = .0525.
Portfolio Selection Illustration (continued)
The expected return and standard deviation of portfolio returns are .15 and .207665, respectively:
E[Rₚ] = wᵀr = .625(.1) + .25(.2) + .125(.3) = .15
σₚ² = wᵀVw = .043125, so that σₚ = √.043125 = .207665
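The whole illustration can be reproduced by solving the first-order-condition system with NumPy; a sketch assuming NumPy is available:

```python
import numpy as np

# First-order-condition system V* w* = c* from the three-asset illustration.
V_star = np.array([
    [0.08, 0.06, 0.08, -0.1, -1],
    [0.06, 0.18, 0.12, -0.2, -1],
    [0.08, 0.12, 0.32, -0.3, -1],
    [-0.1, -0.2, -0.3,  0.0,  0],
    [-1.0, -1.0, -1.0,  0.0,  0],
])
c_star = np.array([0, 0, 0, -0.15, -1])

w_star = np.linalg.solve(V_star, c_star)  # solves for (wA, wB, wC, l1, l2)
w, lam = w_star[:3], w_star[3:]
print(w)    # [0.625 0.25 0.125]
print(lam)  # [0.225 0.0525]

# Portfolio expected return and variance from the optimal weights.
r = np.array([0.1, 0.2, 0.3])
V = np.array([[0.04, 0.03, 0.04],
              [0.03, 0.09, 0.06],
              [0.04, 0.06, 0.16]])
print(w @ r, w @ V @ w)  # 0.15 0.043125
```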