Lecture 10 Slides

Risk Analysis & Modelling
Lecture 10: Extreme Value Theory
http://www.angelfire.com/linux/lecturenotes
What we will learn in this lecture
• We will look at a method for dealing specifically
with infrequent, extreme events: EVT (Extreme
Value Theory)
• EVT can be used to describe the tails of almost
any distribution
• EVT can be used as a method of calculating a
distribution-independent measure of VaR
• The limitations of EVT
• More advanced programming techniques in VBA!
The Tails Of A Distribution & Risk
• In our look at Value At Risk we estimated the
likely loss by describing the complete behaviour
of a random variable
• From this distribution we took the 5% lower tail or
1% lower tail as our estimate of a serious, but
possible loss
• We had to make some strict assumptions about the
distribution in order to estimate the position of
these losses
• Since we are only interested in the tails of the
distribution, can't we make fewer assumptions and
just focus on the tails?
• The answer is yes and the method is EVT!
Tail Risks From VaR
We make a lot of strict assumptions about the random variable in order
to estimate the whole distribution
All we are interested in is the lower tail
We do not need a complete description of the distribution,
just the lower tail. Isn’t this inefficient?
Probability Distribution Recap
• Before we discuss EVT we will recap some statistics
• Imagine we have a probability distribution F which
describes a continuous random variable
• The definition of a probability distribution is that it
describes the chance of observing a value equal to or less
than a given level
• It is also called the cumulative distribution function (CDF)
• So F(0.5) would give the probability that our random
variable will take a value equal to or less than 0.5
• We will assume we are dealing with variables that can
be between –infinity and +infinity
• F(-infinity) is 0, F(+infinity) is 1
• If the upper bound is +infinity then we are certain
the random variable will be below it!
Probability Distribution Example
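[Figure: cumulative distribution for random variable X. The vertical axis runs from 0.0 to 1.0 and the horizontal axis from -X to +X; the height of the curve at the point C is P(X<=C).]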
Peak Over Threshold Distribution
• The Peak Over Threshold Distribution (POTD)
describes the distribution of a random variable given
that we know it has exceeded a given boundary
• An example of such a distribution would be the
distribution describing daily returns greater than 5%
• This distribution is obviously related to the
distribution describing the complete behaviour of the
random variable
• More specifically the POTD is:
$$F_u(y) = P(X \le y + u \mid X > u) = P(X - u \le y \mid X > u)$$
• The probability of observing a value of X that is less
than or equal to y+u given that X is above u
• If we know the Probability Distribution describing a random
variable X, F, then we can express the POTD in terms of F, u, y:
$$F_u(y) = \frac{F(y + u) - F(u)}{1 - F(u)}$$
• F(y+u) is the probability that X will be less than or equal to y+u
• F(u) is the probability that X will be less than or equal to u
• F(y+u) - F(u) is the probability that X will be greater than u but
less than or equal to y+u
• 1-F(u) is the probability that X will be greater than u (since F(+infinity) = 1)
• Since our probability distribution is conditional on the fact that we
are above the boundary u we divide by the probability 1-F(u) (i.e.
rescale our probabilities)
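The formula for F_u(y) can be checked numerically. Below is a minimal Python sketch, assuming purely for illustration that F is the standard normal CDF (the slides do not fix a particular F). The function names `normal_cdf` and `potd_cdf` and the threshold value 1.5 are hypothetical choices.

```python
import math

def normal_cdf(x):
    """CDF F of a standard normal variable (an illustrative choice of F)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def potd_cdf(F, u, y):
    """F_u(y) = (F(y + u) - F(u)) / (1 - F(u)), for y >= 0."""
    return (F(y + u) - F(u)) / (1.0 - F(u))

u = 1.5  # hypothetical threshold
for y in (0.0, 0.5, 1.0, 2.0):
    print(y, round(potd_cdf(normal_cdf, u, y), 4))
```

With y = 0 the conditional probability is 0, and it rises towards 1 as y grows, as a distribution function should.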
POTD Interpretation
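[Figure: PDF(X) plotted against X with the threshold u and the point u+y marked. The area to the right of u is the probability that X is greater than the threshold value u. The POTD measures the probability that X will be greater than u and less than u + y, given that we know X is greater than u.]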
A Very Important Result
• It can be shown that for almost any probability
distribution F of the variable the POTD
distribution approaches a set distribution as the
threshold u increases:
$$\lim_{u \to \infty} F_u(y) = G(y)$$
• Where G is a Generalised Pareto Distribution (GPD):
$$G(y) = 1 - \left(1 + \frac{\xi y}{\beta}\right)^{-1/\xi}$$
• ξ is the shape parameter, β is a scaling parameter (like
σ scales the normal distribution)
• When ξ > 0 we say the GPD has ‘heavy’ tails
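To get a feel for the shape parameter, the GPD formula can be evaluated directly. The following is a small Python sketch, assuming ξ > 0 and β > 0. The function name `gpd_cdf` and the parameter values are illustrative and not taken from the lecture.

```python
def gpd_cdf(y, xi, beta):
    """G(y) = 1 - (1 + xi*y/beta)^(-1/xi), for y >= 0, xi > 0, beta > 0."""
    return 1.0 - (1.0 + xi * y / beta) ** (-1.0 / xi)

# Probability of exceeding the threshold by more than y, for two shape values.
beta = 1.0
for xi in (0.1, 0.5):
    for y in (1.0, 5.0, 10.0):
        tail = 1.0 - gpd_cdf(y, xi, beta)
        print(f"xi={xi}, y={y}: 1 - G(y) = {tail:.6f}")
```

The larger value of ξ leaves more probability far above the threshold, which is what a ‘heavy’ tail means here.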
Use Of POTD Limit
• Re-expressing this result we can say that for
all values of x greater than u (writing y = x - u):
$$G(y) = G(x - u) = \frac{F(x) - F(u)}{1 - F(u)}$$
$$F(x) = G(x - u)\,(1 - F(u)) + F(u)$$
• This means we can estimate the Probability
Distribution F(x) for the tail (x>u) in terms
of the Generalised Pareto Distribution
• All we have to do is find out a way of
estimating G’s parameters!
Peak Over Threshold Approach
• If we have observations of a random variable X then
EVT tells us that if we set a high enough boundary (u) the
distribution describing the random points above this
boundary will be a Generalised Pareto Distribution
[Figure: observations of X plotted against the threshold u. Random points over the threshold u are described by the GPD.]
• We can describe the tail of the distribution by fitting a GPD to the data points above the boundary
• If 10% of the points are above the boundary then we say that F(u) is
90% (i.e. 90% of the points are below the boundary u; this is our
estimate for the specific value F(u), remember we do not know what
F(x) looks like!)
• We estimate F(u) from the number of points above the boundary
(Nu) relative to the total number of points (N) in our data set:
F(u) = 1 - (Nu/N)
• For this boundary the relationship between F(x) and G(x - u) would be:
$$F(x) = G(x - u)\,(1 - 0.9) + 0.9 = 0.1\,G(x - u) + 0.9$$
• F(x) describes the probability distribution for x above the
boundary u
• We can estimate the parameters of G (ξ, β) using
maximum likelihood
• What values for ξ, β are most likely to produce the points
we observe above the boundary?
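Once ξ and β are known, the tail estimator F(x) = G(x - u)(1 - F(u)) + F(u) with F(u) = 1 - Nu/N is a one-line calculation. The Python sketch below is only an illustration: the counts, the threshold and the GPD parameters are made-up numbers, and `tail_cdf` is a hypothetical helper name.

```python
def gpd_cdf(y, xi, beta):
    """G(y) = 1 - (1 + xi*y/beta)^(-1/xi), for y >= 0, xi > 0, beta > 0."""
    return 1.0 - (1.0 + xi * y / beta) ** (-1.0 / xi)

def tail_cdf(x, u, xi, beta, n_total, n_over):
    """Estimate F(x) for x > u as G(x - u) * (1 - F(u)) + F(u), with F(u) = 1 - Nu/N."""
    if x <= u:
        raise ValueError("the tail estimator is only valid above the threshold u")
    f_u = 1.0 - n_over / n_total
    return gpd_cdf(x - u, xi, beta) * (1.0 - f_u) + f_u

n_total, n_over = 1000, 100      # assumed: 10% of the points lie above u
u, xi, beta = 0.02, 0.2, 0.01    # assumed threshold and GPD parameters
for x in (0.03, 0.05, 0.10):
    print(x, round(tail_cdf(x, u, xi, beta, n_total, n_over), 5))
```

Every value lies between F(u) = 0.9 and 1, because the GPD only redistributes the 10% of probability that sits above the threshold.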
Maximum Likelihood Estimator
• We have a set of data points (S) that we observe above the
boundary u we set (A1, A2, A3, ...), each measured as its excess
over u, since G describes y = x - u
• What is the probability of observing this set of results?
Given they are independent it is the product of the
probability of observing each individually:
$$P(S) = P(A_1)\,P(A_2)\,P(A_3)\cdots = \prod_{i=1}^{N} P(A_i)$$
• We want to select the GPD which maximises P(S)
• We can also express this problem as selecting the GPD to
maximise the log likelihood (which is often a simpler
problem to solve):
$$\ln(P(S)) = \ln(P(A_1)\,P(A_2)\,P(A_3)\cdots) = \sum_{i=1}^{N} \ln(P(A_i))$$
• We still have to work out how to calculate the
probability of observing a point above the boundary
• This is given by the probability density function
(pdf) for the GPD:
$$P(A) = GD(A) = \frac{1}{\beta}\left(1 + \frac{\xi A}{\beta}\right)^{-\frac{1}{\xi} - 1}$$
• Note this is just the derivative of G with respect to y
• We notice that the log of this is:
$$\ln(GD(A)) = -\ln(\beta) - \left(1 + \frac{1}{\xi}\right)\ln\left(1 + \frac{\xi A}{\beta}\right)$$
• So to find the GPD that is most likely
to have produced the data points we have observed
above the boundary we simply have to find the
values of ξ, β that maximise:
$$\ln(P(S)) = \sum_{i=1}^{N}\left[-\ln(\beta) - \left(1 + \frac{1}{\xi}\right)\ln\left(1 + \frac{\xi A_i}{\beta}\right)\right]$$
Max ln(P(S)) by changing ξ, β
• Where the Ai are the observations above the
boundary we set
• We cannot use Solver to find these values!
• We need to use a grid searching algorithm because
the log likelihood can have multiple peaks
• Once we have ξ, β we have a GPD tail distribution
that we can use to calculate Value at Risk!
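The slides call for a grid search rather than Solver. A minimal Python sketch of that idea is given here (the lecture itself works in Excel and VBA, so this illustrates the method rather than the course's own code). The data are synthetic excesses generated from a GPD with known parameters, and the grid ranges and step sizes are arbitrary assumptions. The function names are hypothetical.

```python
import math

def gpd_log_likelihood(excesses, xi, beta):
    """Sum over i of -ln(beta) - (1 + 1/xi) * ln(1 + xi * A_i / beta)."""
    total = 0.0
    for a in excesses:
        z = 1.0 + xi * a / beta
        if z <= 0.0:  # point lies outside the support of this GPD
            return float("-inf")
        total += -math.log(beta) - (1.0 + 1.0 / xi) * math.log(z)
    return total

def grid_search(excesses, xi_grid, beta_grid):
    """Try every (xi, beta) pair and keep the one with the largest log likelihood."""
    best_ll, best_xi, best_beta = float("-inf"), None, None
    for xi in xi_grid:
        for beta in beta_grid:
            ll = gpd_log_likelihood(excesses, xi, beta)
            if ll > best_ll:
                best_ll, best_xi, best_beta = ll, xi, beta
    return best_xi, best_beta, best_ll

# Synthetic excesses: evenly spaced quantiles of a GPD with xi = 0.3, beta = 0.02,
# obtained by inverting G(y) = p, i.e. y = (beta/xi) * ((1 - p)^(-xi) - 1).
true_xi, true_beta, n = 0.3, 0.02, 200
excesses = [
    (true_beta / true_xi) * ((1.0 - (i + 0.5) / n) ** (-true_xi) - 1.0)
    for i in range(n)
]

xi_grid = [0.05 * k for k in range(1, 21)]     # 0.05, 0.10, ..., 1.00
beta_grid = [0.005 * k for k in range(1, 11)]  # 0.005, 0.010, ..., 0.050
xi_hat, beta_hat, ll_hat = grid_search(excesses, xi_grid, beta_grid)
print(xi_hat, beta_hat, ll_hat)
```

A grid search evaluates the log likelihood everywhere on the grid, so it cannot get stuck on a local peak, at the cost of only being as precise as the grid spacing. A finer grid around the best point can be searched afterwards if more precision is needed.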
Using EVT To Calculate VaR
• VaR tells us how far into the tail of a distribution
we have to go to be sure only 5% or 1% of
possible outcomes will be below that point
• Since EVT describes the tails we should be able to
use it to calculate VaR
• We want to use the tail distribution to ask at what
level of loss can we say that only X% of losses
will be greater than that loss level?
• There are 3 problems that must be solved before
we can calculate VaR using EVT
Problem 1: EVT Deals With The Upper Tail
• The EVT model we have looked at deals exclusively with
the upper tail of the distribution, while VaR deals with the
lower tail
• The solution to this is fairly simple: instead of measuring
returns on the portfolio we measure losses. A positive loss
(L) is a negative return (R), and a positive return is a negative
loss.
$$L = -R = -\ln\left(\frac{P_t}{P_{t-1}}\right) = \ln\left(\frac{P_{t-1}}{P_t}\right)$$
• Using this definition the problem of finding the maximum
loss becomes finding the upper tail of the distribution describing
L.
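As a quick illustration of the sign convention, the Python sketch below turns a price series into log losses. The prices are hypothetical and `log_losses` is an assumed helper name.

```python
import math

def log_losses(prices):
    """L_t = -ln(P_t / P_{t-1}) = ln(P_{t-1} / P_t) for each consecutive pair of prices."""
    return [math.log(prices[t - 1] / prices[t]) for t in range(1, len(prices))]

prices = [100.0, 98.0, 101.0, 97.0]  # hypothetical closing prices
losses = log_losses(prices)
for t, loss in enumerate(losses, start=1):
    print(t, round(loss, 5))
```

A fall in price gives a positive loss and a rise gives a negative one, so the worst days now sit in the upper tail.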
Problem 2: Selecting The Upper Boundary
• To use the Peak Over Threshold we need to set an
upper boundary on the level of loss and only look
at points above that line
• Since the tail distribution is only valid above this
threshold we must select a threshold which is not
above the level of VaR we wish to calculate
• For example if we set our threshold so that only
3% of the distribution is above it we cannot then
use this tail distribution to estimate the 5% tail
Diagram of the relationship between the
threshold and the VaR level we can estimate
[Figure: distribution of X with the threshold u marked. The level of the threshold determines how much of the tail our GPD estimates, and we only estimate the distribution above our boundary. The VaR confidence interval must be contained by our tail estimation!]
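One simple way to respect this constraint is to pick the threshold as an empirical quantile of the losses and then check that the VaR probability falls inside the estimated tail. The Python sketch below assumes that approach, which is a common convention rather than something the slides prescribe. The helper names and the loss data are hypothetical.

```python
def choose_threshold(losses, tail_fraction):
    """Return (u, Nu) so that roughly `tail_fraction` of the losses exceed u."""
    ordered = sorted(losses)
    n_over_target = round(len(ordered) * tail_fraction)
    if not 0 < n_over_target < len(ordered):
        raise ValueError("tail_fraction leaves no points above or below the threshold")
    u = ordered[len(ordered) - n_over_target - 1]
    # Count strictly above u; tied values at u can make this smaller than the target.
    n_over = sum(1 for loss in losses if loss > u)
    return u, n_over

def threshold_covers_var(n_total, n_over, var_probability):
    """True if the VaR tail probability lies inside the estimated tail (Nu/N >= V)."""
    return n_over / n_total >= var_probability

losses = [i / 1000.0 for i in range(100)]  # hypothetical losses: 0.000, 0.001, ..., 0.099
u, n_over = choose_threshold(losses, 0.10)
print(u, n_over, threshold_covers_var(len(losses), n_over, 0.05))
```

With 3% of the points above the threshold, `threshold_covers_var` rejects a 5% VaR, which matches the example on the slide.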
Problem 3: Inverting the Tail Estimator
• Our tail estimator for the function is:
$$F(x) = G(x - u)\,(1 - F(u)) + F(u)$$
• F(x) tells us the probability of observing a value
less than or equal to x (loss)
• We are interested in finding the level of x
(loss) for which there is only a probability P of
observing losses greater than x
• For example we want to find the loss level which
we can say only 1% of losses will be greater than
• We need to rearrange the above to get x in terms of
the probability rather than the probability in terms
of x
• This comes down to reordering
$$F(x) = \left[1 - \left(1 + \frac{\xi (x - u)}{\beta}\right)^{-1/\xi}\right](1 - F(u)) + F(u)$$
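• After some work (subtract F(u), divide by 1 - F(u), isolate the
bracketed power and raise both sides to the power -ξ):
$$x = u + \frac{\beta}{\xi}\left[\left(1 - \frac{F(x) - F(u)}{1 - F(u)}\right)^{-\xi} - 1\right]$$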
• We observe that F(x) measures the probability that the loss
L is less than or equal to some level x (L <= x). We want
V(x), the probability that the loss is greater than x,
which is simply V(x) = 1-F(x). Since
1 - (F(x) - F(u))/(1 - F(u)) = (1 - F(x))/(1 - F(u)):
$$x = u + \frac{\beta}{\xi}\left[\left(\frac{V(x)}{1 - F(u)}\right)^{-\xi} - 1\right]$$
The Terms Of EVT VaR
$$x = u + \frac{\beta}{\xi}\left[\left(\frac{N \cdot V(x)}{N_u}\right)^{-\xi} - 1\right]$$
• V(x) is the VaR confidence level, such as 5% for 5% VaR
• x is the upper boundary on loss that we only expect to be
above V(x) % of the time
• u is the level of the threshold we set (the loss level) to
estimate the tail
• N is the total number of observations of losses in our
dataset and Nu is the number of those above the threshold u;
our estimate for 1-F(u) is therefore Nu/N
• ξ, β are terms we estimate using Maximum Likelihood
from the points over the threshold from our dataset of
losses
• This is only valid for risk estimates above u; our tail
estimator is only valid above the threshold we set!
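The VaR formula translates almost directly into code. The Python sketch below uses invented inputs (counts, threshold and GPD parameters), and `evt_var` is a hypothetical function name. It also refuses tail probabilities larger than Nu/N, since the estimator is not valid below the threshold.

```python
def evt_var(v, u, xi, beta, n_total, n_over):
    """Loss level x exceeded with probability v: x = u + (beta/xi) * ((N*v/Nu)**(-xi) - 1)."""
    if not 0.0 < v <= n_over / n_total:
        raise ValueError("v must lie inside the estimated tail: 0 < v <= Nu/N")
    return u + (beta / xi) * ((n_total * v / n_over) ** (-xi) - 1.0)

n_total, n_over = 1000, 100      # assumed: 100 of 1000 losses exceed the threshold
u, xi, beta = 0.02, 0.2, 0.01    # assumed threshold and fitted GPD parameters
for v in (0.05, 0.01):
    print(f"{v:.0%} VaR: loss level {evt_var(v, u, xi, beta, n_total, n_over):.4f}")
```

Setting v equal to Nu/N returns the threshold u itself, and smaller values of v push the loss level further into the tail.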
The Advantages of EVT
• The advantage of EVT is that it just focuses
on the tail of the distribution
• Using the GPD, this method can estimate the fat-tailed
losses we observe in financial
instruments and insurance liabilities
• The calculation is not excessively complex
or computationally intensive
The Problem With EVT
• The problem with EVT is that we need a lot of
data to estimate the GPD of the tail
• The higher we set the boundary for the Peak Over
Threshold the closer our distribution will be to the
GPD
• Unfortunately the higher we set the boundary the
fewer data points we have to estimate the GPD
• EVT is still under development, we will have to
wait and see what people come up with!
Part 2: Arrays, Objects & More VBA Tricks
A Recap of Last Week
• Last week we introduced the key concepts of
variables and statements
• Variables were like boxes that store a single piece
of data
• Statements were instructions to the computer
• We looked at If statements and Loops
• This week we will look at a special type of variable
called an array
• We will look at how to create our own variable
types with objects!
An Array
• An array is a variable that can store more than one
value
• Last week we looked at variables as boxes that
only contain one value
• An array is a variable that can contain many
variables
• Arrays are important because often we want to
store lists or blocks of things of
variable length (such as a list of all the students in
the class)
• Each element in the array is identified by a
number
Creating an Array
• Let us say we want to create an Array of 10 strings
we would write:
Dim StudentNames(1 To 10) As String
• If we wanted to create an Array of 500 Daily
Returns on a stock:
Dim DailyReturns(1 To 500) As Double
• If we wanted to create a list of student ages where
ClassSize is an integer variable we would say:
Dim StudentAges() As Integer
ReDim StudentAges(1 To ClassSize)
• Writing 1 To 10 makes the first element number 1 (by default VBA
starts counting at 0); a size held in a variable must be set with ReDim
• If the variable ClassSize contained 35 then the
Array StudentAges would contain 35 variables
Accessing The Elements Of An Array
• Just like a Variable the Array is initially
blank
• Let us say we wanted to assign the name
Frank Bloggs to the first element in the
StudentNames array we would write:
StudentNames(1) = "Frank Bloggs"
• StudentNames(1) would now contain
the string "Frank Bloggs"
The For Loop
• The For loop is a different type of loop
• It is especially designed for the case where we use an
integer variable to count the number of loops
• If we wanted to count the cells in the first 100 rows of column A which have
a value greater than 0.5 we would write
Dim I As Integer
Dim CellCount As Integer
Dim CellValue As Double
CellCount = 0                      ' start the count at zero
For I = 1 To 100                   ' rows 1 to 100
    CellValue = Cells(I, 1).Value  ' column 1 is column A
    If CellValue > 0.5 Then
        CellCount = CellCount + 1
    End If
Next I
If..Then..Else..End If
• Last week we looked at If Then Blocks
• There is an extension to this called If Then Else Blocks:
If MyNumber > 0.5 Then
    Call MsgBox("MyNumber is Greater Than 0.5")
Else
    Call MsgBox("MyNumber is Not Greater Than 0.5")
End If
• This is useful when we want to say to the computer: “if
this is true then do this, else do something else”, rather than just
saying “if this is true then do this”.
Introduction To Objects
• Using objects we can create our own variable
types
• So we could say:
Dim TopStudent as New Student
• This is very useful and is the basis for object
oriented programming (OO)
• The type Student is known as a class (the type of
the variable) and TopStudent is an object of type
Student (an instance or example of that class)
Class Modules
• To create our own variable types or “Classes” we
have to create a Class Module
• The name we give the Class Module is the name
of the new Variable Type we create
• There is a class module called Student containing
the following:
Public StudentName As String
Public StudentAge As Integer
Public StudentGrade As Double
• Every object of type Student will have 3 sub-variables or members: StudentName, StudentAge
and StudentGrade
• They are declared as Public so we can access them
outside the class module
Using Objects
• Here is some example code using an object of
type Student:
Dim TopStudent As New Student
TopStudent.StudentName = "Frank Bloggs"
TopStudent.StudentAge = 32
TopStudent.StudentGrade = 72.1
• Notice how we access the members of the object
using a ‘.’
• Objects make the code readable
• Objects have many uses in advanced programming
techniques but those are for you to discover!
THE END
