Université Claude Bernard Lyon 1
Institut de Science Financière et d'Assurances

Work of Studies and Research

Statistics of Extremes
Theory and Applications with R

Author: Mehdi Cherid
Supervisor: Pierre Ribereau

April 26, 2016
Contents

Acknowledgement
Introduction
1 Basics and Recalled Results
  1.1 Order Statistics
  1.2 Cumulative Distribution Function
  1.3 Strong Law of Large Numbers
  1.4 Central Limit Theorem
  1.5 Quantile–Quantile Plot
2 Basics of the Univariate Extreme Value Theory
  2.1 Limit Distribution of the Maximum
  2.2 Domain of Attraction
  2.3 Max–Stability
3 Statistical Inference and Estimation
  3.1 Nonparametric Estimators of the Tail Index ξ
    3.1.1 Pickands Estimator
    3.1.2 Hill Estimator
    3.1.3 Moment Estimator
  3.2 Parametric Estimation of the GEV Parameters
    3.2.1 Estimation of the Parameters by Maximum Likelihood (MLE)
    3.2.2 Estimation of the Parameters by the Probability Weighted Moment (PWM) Method
4 Extreme Value Methods
  4.1 Block Maxima Method
  4.2 Peaks-Over-Threshold Method
    4.2.1 The Generalized Pareto Distribution (GPD)
    4.2.2 Estimation of the Parameters
    4.2.3 Choice of the Threshold
5 Application
  5.1 Block Maxima Method
  5.2 Peaks-Over-Threshold Method
Conclusion
Bibliography
Annex
Acknowledgement

First of all, I would like to thank my supervisor Pierre Ribereau for suggesting the topic of extreme value theory; I enjoyed the statistics course during the first semester and wanted to continue working in this direction. He offered his continuous help and advice throughout the course of this work.
I am thankful to my brother, who lent me his computer so that I could finish this work and who helped me check for typos.
Finally, I take this opportunity to express my profound gratitude to my beloved parents for their love and continuous support throughout my life.
Introduction
Emil J. Gumbel said: "It is impossible that the improbable will never happen." Rare events have a low probability of occurrence, but even though they are discrete occurrences that are statistically "improbable" in the sense of being very infrequent, such events are plausible. Rare events are everywhere and encompass natural phenomena (earthquakes, tsunamis, hurricanes, ...), financial market crashes, etc. They are events which deviate strongly from the mean or the usual trend and whose consequences are disastrous for human beings and the environment.
Studying the occurrence of these events is of prime importance today for insurers and financiers. Measurement and risk management have become major issues for financial market operators, actuaries, etc.
The question is how to model rare phenomena that lie outside the range of available observations. Extreme value theory provides a good theoretical foundation on which we can build statistical models describing extreme events. This modeling corresponds to the study of the tail of the distribution and will enable us to calculate or estimate rather extreme quantiles. Indeed, we would like to be able to answer two complementary questions: find the threshold zp which will be exceeded only with a very small probability p (i.e. the (1 − p)-VaR), and find the probability p that a disaster exceeds a given threshold zp. This is the purpose of our study.
The outline of the present report is as follows. First, we give or recall some definitions and theorems. Then, extreme value theory is introduced and we see how to estimate the extreme value index and the other parameters. Finally, we present two methods to estimate extreme quantiles, the block maxima method and the peaks-over-threshold method, and all these approaches are then applied to a real dataset.
1 Basics and Recalled Results
1.1 Order Statistics
Let X1, ..., Xn be a sample, i.e. a family of non-degenerate independent and identically distributed random variables with a common distribution F. We define the ordered sample as the family X_{n,n} ≤ ··· ≤ X_{1,n}, so that X_{n,n} = min(X1, ..., Xn) and X_{1,n} = Mn = max(X1, ..., Xn).
The random variable X_{k,n} is called the k-th order statistic. X_{n,n} and X_{1,n} = Mn are called the smallest and the largest order statistic, respectively. The difference X_{1,n} − X_{n,n} is called the sample range.
1.2 Cumulative Distribution Function
The cumulative distribution function of a real-valued random variable X is the function given by F(x) = P(X ≤ x), and the survival function is defined by F̄(x) = 1 − F(x).
We also define the right endpoint xF of the distribution function F as:

xF = sup{x : F(x) < 1}.

Let X1, ..., Xn be independent, identically distributed real random variables with common cumulative distribution F. Then we define the empirical distribution function as:

F̂n(x) = (1/n) Σ_{i=1}^{n} 1_{Xi ≤ x}
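The empirical distribution function is simple to compute directly. The following is a small sketch in Python (a stand-in for the report's R code), on hypothetical simulated data:

```python
import numpy as np

# Sketch of the empirical distribution function
# F_hat_n(x) = (1/n) * #{i : X_i <= x}, on simulated normal data.
rng = np.random.default_rng(7)
sample = rng.normal(size=1000)

def ecdf(sample, x):
    # proportion of observations not exceeding x
    return float(np.mean(sample <= x))

print(ecdf(sample, 0.0))  # close to 0.5 for standard normal data
```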
1.3 Strong Law of Large Numbers

Let X1, ..., Xn be a sequence of independent and identically distributed random variables with finite expected value E[X1] = E[X2] = ··· = µ < ∞. Then the sample mean X̄ = (1/n)(X1 + ··· + Xn) → µ almost surely as n → ∞.
1.4 Central Limit Theorem

Let X1, ..., Xn be a sequence of independent and identically distributed random variables with mean µ and finite, positive variance σ². Then, defining:

Sn = (X1 + ··· + Xn)/n,

we have

√n (Sn − µ)/σ →d Z   (1.1)

as n → ∞, where Z ∼ N(0, 1).
The central limit theorem is generally used in statistical applications by interpreting (1.1) as an approximation for the distribution of the sample mean X̄ for large n. This theorem is helpful for applications because the approximating distribution of the sample mean is normal regardless of the parent population of the Xi. Similar arguments will be used in this paper to obtain approximating distributions for sample extremes.
1.5 Quantile–Quantile Plot
The Quantile–Quantile (QQ) plot is a graph for determining whether two data sets come from populations with a common distribution. It is a plot of the quantiles of the first data set against the quantiles of the second data set, so it compares empirical quantiles with theoretical quantiles. The plot should be approximately linear (close to the diagonal) if the fit is good.
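The computation behind a QQ plot can be sketched as follows (in Python rather than the R used later in the report; the normal reference distribution and sample size are illustrative assumptions):

```python
import numpy as np
from scipy import stats

# Sketch of a QQ-plot computation: empirical quantiles of a sample
# against theoretical quantiles of a reference (here normal) distribution.
rng = np.random.default_rng(0)
sample = np.sort(rng.normal(size=500))        # empirical quantiles
probs = (np.arange(1, 501) - 0.5) / 500       # plotting positions
theo = stats.norm.ppf(probs)                  # theoretical quantiles
# For data really drawn from the reference distribution, the points
# (theo[i], sample[i]) fall close to the diagonal:
corr = float(np.corrcoef(theo, sample)[0, 1])
print(round(corr, 3))
```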
2 Basics of the Univariate Extreme Value Theory

2.1 Limit Distribution of the Maximum
The model focuses on the statistical behavior of Mn. We know that the distribution of Mn is given by:

FMn(x) = P(Mn ≤ x) = P(∩_{i=1}^{n} {Xi ≤ x}) = Π_{i=1}^{n} P(Xi ≤ x) = F^n(x)   ∀x ∈ IR
But this is not helpful in practice, since the distribution function F is generally unknown. Another nonparametric approach would be to use the empirical distribution function, but F̂n(x) = 1 when x > Mn, so we see that standard approaches are not suitable for studying the tails of distributions.
One approach is similar to the usual practice of approximating the distribution of sample means by the normal distribution, as justified by the central limit theorem. The arguments we develop here are essentially an extreme value analogue of central limit theory. The question is: can we find sequences of normalizing constants {an > 0}, {bn} and a non-degenerate distribution G such that (Mn − bn)/an → G?
Theorem 2.1. (Fisher–Tippett–Gnedenko)
Let X1, ..., Xn be a sequence of i.i.d. variables with common distribution F. If there exist sequences of constants {an > 0}, {bn} and a non-degenerate distribution function G such that:

P((Mn − bn)/an ≤ x) = F^n(an x + bn) → G(x)   as n → ∞,

then G belongs to the generalized extreme value (GEV) family:

G(x) = exp{ −[1 + ξ(x − µ)/σ]^(−1/ξ) }   if ξ ≠ 0 and 1 + ξ(x − µ)/σ > 0,
G(x) = exp{ −exp(−(x − µ)/σ) }           if ξ = 0.
The GEV family encompasses 3 possible limit distributions of the maxima Mn:

• Gumbel or Type I (ξ = 0):

  Gµ,σ(x) = exp(−exp(−(x − µ)/σ)),   x ∈ IR

• Fréchet or Type II (ξ > 0):

  Gµ,σ,ξ(x) = 0 if x ≤ µ,   exp(−((x − µ)/σ)^(−1/ξ)) if x > µ

• Weibull or Type III (ξ < 0):

  Gµ,σ,ξ(x) = exp(−(−(x − µ)/σ)^(−1/ξ)) if x < µ,   1 if x ≥ µ

µ ∈ IR, σ > 0 and ξ ∈ IR are the location parameter, the scale parameter and the shape parameter, respectively.
With µ = 0 and σ = 1, we obtain the standard expressions of these distributions:
• Λ(x) = exp(−exp(−x)),   x ∈ IR   (Gumbel)
• Φθ(x) = exp(−x^(−θ)),   x > 0, θ > 0   (Fréchet)
• Ψθ(x) = exp(−(−x)^θ),   x ≤ 0, θ > 0   (Negative Weibull)
Many classical laws satisfy the hypothesis of the existence of G for the maximum Mn. It is the case in particular for the following distributions:

• Let F be the cumulative distribution function of the Pareto distribution, defined as F(x) = 1 − x^(−α), x ≥ 1 and α > 0. Choosing an = n^(1/α) and bn = 0, we verify that ∀x ∈ IR:

  lim_{n→∞} F^n(an x + bn) = e^(−x^(−α)) 1_{x>0}
• Let F be the cumulative function of the standard Cauchy distribution, admitting a density f defined by:

  f(x) = 1/(π(1 + x²)),   ∀x ∈ IR.

  Selecting an = n/π and bn = 0, we verify that ∀x ∈ IR:

  lim_{n→∞} F^n(an x + bn) = e^(−x^(−1)) 1_{x>0}
• Let F be the cumulative function of the standard exponential distribution, defined by F(x) = 1 − e^(−x), ∀x ≥ 0. Choosing an = 1 and bn = log n, we verify that ∀x ∈ IR:

  lim_{n→∞} F^n(an x + bn) = e^(−e^(−x))
• Let F be the cumulative distribution function of the uniform distribution on the interval [0, 1], defined by F(x) = x for 0 ≤ x ≤ 1. Choosing an = 1/n and bn = 1, we verify that ∀x ∈ IR:

  lim_{n→∞} F^n(an x + bn) = e^x 1_{x≤0} + 1_{x>0}
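These limits are easy to check numerically. As a sketch (in Python; the choice n = 10^6 and the test points are illustrative), the exponential case with an = 1, bn = log n converges to the Gumbel limit:

```python
import math

# Numerical illustration of the exponential example: with a_n = 1 and
# b_n = log n, F^n(a_n x + b_n) approaches the Gumbel limit exp(-exp(-x)).
def F(x):
    # standard exponential CDF
    return 1.0 - math.exp(-x) if x >= 0 else 0.0

n = 10**6
for x in (-1.0, 0.0, 2.0):
    approx = F(x + math.log(n)) ** n
    limit = math.exp(-math.exp(-x))
    print(x, round(approx, 6), round(limit, 6))  # the two columns agree closely
```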
Figure 1: Normal, Gumbel, Fréchet and Weibull distributions
We can see that the Fréchet is bounded below and the Weibull is bounded above.
2.2 Domain of Attraction
Theorem 2.2.
Let Gξ be a non-degenerate distribution. The distribution F is in the domain of attraction of the extreme value distribution Gξ (written F ∈ D(Gξ)) if there exist sequences {an > 0} and {bn} such that:

F^n(an x + bn) → Gξ(x)   as n → ∞.

Belonging to a domain of attraction is tied to the form of the tail of the distribution, and more specifically to the speed at which F̄(x) decreases toward 0.
Therefore, we have 3 domains of attraction:
• Gumbel domain of attraction if ξ = 0. This domain encompasses laws whose survival function decays exponentially. It is the case with normal distributions, gamma distributions and exponential distributions, for example.
• Fréchet domain of attraction if ξ > 0. It contains laws whose survival function decreases as a power function. These are heavy-tailed laws, as is the case with Pareto distributions, Student's t-distributions and Cauchy distributions.
• Weibull domain of attraction if ξ < 0. This domain contains laws whose endpoint xF is finite. It is the case, for example, with uniform distributions and beta distributions.
In practice, we wish to determine, for a given sample, the shape of the tail of the distribution, or at least to get an idea of the domain of attraction of the distribution. The quantile–quantile method can be used to provide an indication of the probable domain of attraction.
2.3 Max–Stability
Proposition 2.1.
The GEV distribution has the property of max-stability, understood as follows: if Y1, ..., Yn are i.i.d. draws from G, then max(Y1, ..., Yn) also has distribution G, i.e. G^n(z) = G(an z + bn) for appropriate sequences {an > 0} and {bn}. In fact, a distribution is max-stable if and only if it is a member of the GEV family.
For that matter, what about the minima of the sample? The theoretical approach is the same as for the maximum:

P(min(X1, ..., Xn) ≤ x) = P(−max(−X1, ..., −Xn) ≤ x)
                        = P(max(−X1, ..., −Xn) ≥ −x)
                        = 1 − P(max(−X1, ..., −Xn) < −x)
                        ≈ 1 − Gξ((−x − µ)/σ)

3 Statistical Inference and Estimation

3.1 Nonparametric Estimators of the Tail Index ξ
We present here three nonparametric estimators of the tail index, all based on the order statistics X_{k,n} ≤ ··· ≤ X_{1,n} obtained from the initial series by considering the k largest values (we could equally consider the smallest); k depends a priori on n, even if we do not mention it in the notation. The idea is to have k → ∞ when n → ∞, but without taking too many values from the sample, which leads us to impose k/n → 0.
This means the question of the optimal choice of k will arise. Indeed, it is essential to compute these estimators on the distribution tails: choosing k too large risks taking into account values that are not extreme; conversely, too small a subsample does not allow the estimators to reach their level of stability.
Finally, we note that the nonparametric approach is only possible if we have a large number of observations.
3.1.1 Pickands Estimator
The Pickands estimator is defined by the statistic:

ξ̂_{k,n}^P = (1/ln 2) ln[ (X_{k,n} − X_{2k,n}) / (X_{2k,n} − X_{4k,n}) ]

It has the advantage of being valid whatever the extreme value distribution (Gumbel, Fréchet or Weibull).
The graphical representation of this estimator, as a function of the number of considered observations k, generally shows very volatile behavior at first, which harms the legibility of the graph. Furthermore, it is very sensitive to the size of the selected sample, which makes it not very robust. It is thus delicate to handle.
We can note that the estimator is asymptotically normal:

√k (ξ̂_{k,n}^P − ξ)/σ(ξ) → N(0, 1)   as k → ∞,

the asymptotic standard deviation being given by:

σ(ξ) = ξ √(2^(2ξ+1) + 1) / (2(2^ξ − 1) ln 2)
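As a sketch (in Python, a stand-in for the report's R; the Pareto sample with α = 2, i.e. true ξ = 0.5, and the choice k = 500 are illustrative assumptions):

```python
import numpy as np

# Pickands estimator on a simulated Pareto(alpha = 2) sample,
# whose true tail index is xi = 1/alpha = 0.5.
rng = np.random.default_rng(1)
x = np.sort(rng.pareto(2.0, size=100_000) + 1.0)[::-1]  # descending order

def pickands(x_desc, k):
    # x_desc[k-1] is the k-th largest observation X_{k,n}
    num = x_desc[k - 1] - x_desc[2 * k - 1]
    den = x_desc[2 * k - 1] - x_desc[4 * k - 1]
    return float(np.log(num / den) / np.log(2.0))

print(pickands(x, 500))  # fluctuates around 0.5, with notable variance
```

The notable sampling variance of this estimator, visible when varying k, is exactly the volatility the text describes.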
3.1.2 Hill Estimator
This estimator assumes the tail index ξ to be positive, so it is usable only for the Fréchet domain of attraction, for which it supplies a more efficient estimator of the tail index than the Pickands estimator.
The Hill estimator is defined by:

ξ̂_{k,n}^H = (1/(k − 1)) Σ_{j=1}^{k−1} ln(X_{j,n}/X_{k,n})

If we choose k, n → ∞ such that k/n → 0, then we can show that

lim_{k→∞} ξ̂_{k,n}^H = ξ

The Hill estimator is also asymptotically normal:

√k (ξ̂_{k,n}^H − ξ)/ξ → N(0, 1),   the convergence being in law.

However, the Hill estimator has an important bias, and for this reason it is delicate to handle.
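The Hill estimator is a one-line computation. A sketch on the same kind of simulated Pareto(α = 2) data as above (illustrative parameters, in Python rather than R):

```python
import numpy as np

# Hill estimator (valid only for xi > 0) on simulated Pareto(alpha = 2)
# data, true xi = 0.5: the mean of log(X_{j,n} / X_{k,n}) over the
# k-1 largest observations.
rng = np.random.default_rng(2)
x = np.sort(rng.pareto(2.0, size=100_000) + 1.0)[::-1]  # descending order

def hill(x_desc, k):
    return float(np.mean(np.log(x_desc[: k - 1] / x_desc[k - 1])))

print(hill(x, 2000))  # close to 0.5
```

For an exact Pareto tail the bias vanishes; on real data the bias mentioned in the text appears as a drift of the Hill plot in k.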
Note: the Hill estimator plays another role as well, that of estimating the threshold (see the Peaks-Over-Threshold method below).
3.1.3 Moment Estimator
This estimator is defined by the statistic:

ξ̂_{k,n}^M = 1 + ξ̂_{k,n}^{(1)} − (1/2) [1 − (ξ̂_{k,n}^{(1)})² / ξ̂_{k,n}^{(2)}]^{−1}

where

ξ̂_{k,n}^{(i)} = (1/k) Σ_{j=1}^{k} [ln(X_{j,n}/X_{k+1,n})]^i

This estimator is convergent and asymptotically Gaussian:

√k (ξ̂_{k,n}^M − ξ) / √(1 + ξ²) → N(0, 1)
Unlike the Hill estimator, however, the moment estimator remains usable beyond the Fréchet domain of attraction, for any ξ ∈ IR.
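The formula above can be sketched directly (Python stand-in for R; the Pareto(α = 2) sample with true ξ = 0.5 and k = 2000 are illustrative assumptions):

```python
import numpy as np

# Moment estimator on a simulated Pareto(alpha = 2) sample, true xi = 0.5.
rng = np.random.default_rng(8)
x = np.sort(rng.pareto(2.0, size=100_000) + 1.0)[::-1]  # descending order

def moment_estimator(x_desc, k):
    logs = np.log(x_desc[:k] / x_desc[k])   # log(X_{j,n} / X_{k+1,n})
    m1 = float(logs.mean())                 # first empirical moment
    m2 = float((logs ** 2).mean())          # second empirical moment
    return 1.0 + m1 - 0.5 / (1.0 - m1 ** 2 / m2)

print(moment_estimator(x, 2000))  # close to 0.5
```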
3.2 Parametric Estimation of the GEV Parameters

We will see through an example how to estimate the GEV parameters by the maximum likelihood method and the probability weighted moment method. Then, we will estimate the Value-at-Risk.
Simulated example: we simulate data z1, ..., zm which represent, for example, the costs of car accidents and which are naturally maxima, obtained with M = rgev(100, loc = 9, scale = 2.6, shape = 0.3).
3.2.1 Estimation of the Parameters by Maximum Likelihood (MLE)
We know these observations are realizations of a GEV distribution (since they are maxima). We assume in addition that they are independent with a common distribution, so we will estimate the parameters of this distribution.
The log-likelihood associated with the case ξ ≠ 0 is then:

l(µ, σ, ξ) = −m log σ − (1 + 1/ξ) Σ_{i=1}^{m} log(1 + ξ(zi − µ)/σ) − Σ_{i=1}^{m} [1 + ξ(zi − µ)/σ]^{−1/ξ}

with 1 + ξ(zi − µ)/σ > 0 for i = 1, ..., m.
The log-likelihood associated with the case ξ = 0 is different, as the GEV then corresponds to the Gumbel distribution:

l(µ, σ) = −m log σ − Σ_{i=1}^{m} (zi − µ)/σ − Σ_{i=1}^{m} exp(−(zi − µ)/σ)

The asymptotic normality of the estimators is only valid for ξ > −1/2. In practice, the estimators have no explicit expression, and many problems can occur if there are too many data or, on the contrary, too few.
With the function gev.fit, we find:

µ = 8.6916369,   σ = 2.4248499,   ξ = 0.2882475
Since we have asymptotic normality of the estimators for ξ > −1/2, we have confidence intervals of the form:

CI_{1−α}: [ ξ̂ − q_{1−α/2} √Var(ξ̂) ;  ξ̂ + q_{1−α/2} √Var(ξ̂) ]

where q_{1−α/2} is the (1 − α/2)-quantile of the standard normal distribution, i.e. P(Y ≤ q_{1−α/2}) = 1 − α/2 with Y ∼ N(0, 1).
Thus, the results of the estimation of the parameters by maximum likelihood, with 95% confidence intervals, are:

ξ = 0.29 [0.11, 0.46],   µ = 8.69 [8.14, 9.23],   σ = 2.42 [1.97, 2.88]
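The same fit can be sketched in Python (an analogue, not the report's gev.fit; note that scipy's genextreme uses the opposite shape sign convention, c = −ξ, and we use a larger sample than the report's m = 100 to stabilize the fit):

```python
import numpy as np
from scipy import stats

# MLE fit of a GEV: simulate from GEV(mu=9, sigma=2.6, xi=0.3) and refit.
# scipy parameterizes the shape as c = -xi.
rng = np.random.default_rng(3)
data = stats.genextreme.rvs(c=-0.3, loc=9.0, scale=2.6, size=1000,
                            random_state=rng)
c_hat, mu_hat, sigma_hat = stats.genextreme.fit(data)
xi_hat = -c_hat
print(round(xi_hat, 2), round(mu_hat, 2), round(sigma_hat, 2))
```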
3.2.2 Estimation of the Parameters by the Probability Weighted Moment (PWM) Method
It is a general method introduced by Greenwood et al. [1979]. The weighted moments of a random variable Z having a cumulative function F are defined by:

M_{p,r,s} = E[Z^p (F(Z))^r (1 − F(Z))^s]   for p, r, s ∈ IR.

The study for the GEV was carried out by Hosking et al. [1985]. We obtain, for ξ ≠ 0 and ξ < 1:

M_{1,r,0} = β_r = (1/(r + 1)) { µ − (σ/ξ) [1 − (r + 1)^ξ Γ(1 − ξ)] }

where Γ(t) = ∫_0^∞ x^{t−1} e^{−x} dx.
If Z1, ..., Zm is a sample from a GEV distribution, the weighted-moment estimators (σ̂, µ̂, ξ̂) of (σ, µ, ξ) are the solutions of the system:

β0 = µ − (σ/ξ)(1 − Γ(1 − ξ))
2β1 − β0 = (σ/ξ) Γ(1 − ξ)(2^ξ − 1)
(3β2 − β0)/(2β1 − β0) = (3^ξ − 1)/(2^ξ − 1)

replacing β_r by the unbiased estimator based on the m observations Z1, ..., Zm:

β̂_r = (1/m) Σ_{j=1}^{m} ( Π_{l=1}^{r} (j − l)/(m − l) ) Z_{j,m}

(for further details, see Beirlant et al. [2004], chapter 5).
For this method, we can use the function gevFit in the fExtremes package, and we find:

ξ = 0.2888079,   µ = 8.6770687,   σ = 2.4246313

The asymptotic normality is valid here for ξ < 1/2, but this domain of validity is restrictive for many applications. However, it can be extended with the generalized probability weighted moments method. In practice, this method performs well in simulations for small sample sizes.
We now turn to the calculation of the 99% VaR (VaR(1 − p) = zp). We have:

zp = µ − (σ/ξ) [1 − (−log(1 − p))^{−ξ}]   if ξ ≠ 0
zp = µ − σ log(−log(1 − p))   if ξ = 0

We will then use the following estimators:

ẑp = µ̂ − (σ̂/ξ̂) [1 − (−log(1 − p))^{−ξ̂}]   if ξ̂ ≠ 0
ẑp = µ̂ − σ̂ log(−log(1 − p))   if ξ̂ = 0
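The return-level formula can be evaluated directly. A worked sketch (in Python) using the simulation parameters µ = 9, σ = 2.6, ξ = 0.3 and p = 0.01:

```python
import math

# Evaluate the VaR / return-level formula
# z_p = mu - (sigma/xi) * (1 - (-log(1-p))^(-xi)) for xi != 0,
# z_p = mu - sigma * log(-log(1-p)) for xi = 0.
def gev_quantile(mu, sigma, xi, p):
    if xi != 0.0:
        return mu - (sigma / xi) * (1.0 - (-math.log(1.0 - p)) ** (-xi))
    return mu - sigma * math.log(-math.log(1.0 - p))

zp = gev_quantile(9.0, 2.6, 0.3, 0.01)
print(round(zp, 2))  # about 34.78 with the true parameters
```

With the estimated parameters (µ̂ = 8.69, σ̂ = 2.42, ξ̂ = 0.29), the same formula reproduces the value near 31.96 reported below.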
We can also calculate confidence intervals for zp. For that, it is first necessary to calculate the associated variance. Using the delta method, we have:

Var(ẑp) ≈ ∇zp^t V ∇zp

with V the variance-covariance matrix of (µ̂, σ̂, ξ̂) and

∇zp^t = [∂zp/∂µ, ∂zp/∂σ, ∂zp/∂ξ]

evaluated at (µ̂, σ̂, ξ̂).
We find the 99% VaR with a 95% confidence interval: 31.96 ∈ [19.97, 43.94].
Note: We can check the stability of the VaR. We simulate two different samples from the same distribution:

A = rgev(100, loc = 9, scale = 2.6, shape = 0.3)
B = rgev(100, loc = 9, scale = 2.6, shape = 0.3)

Using this time the function fgev in the evd package, we find:

fgev(A)$estimate:  loc 9.1707366,  scale 2.5334243,  shape 0.1961111
fgev(B)$estimate:  loc 8.4849578,  scale 2.3563261,  shape 0.3680578

The function fgev also provides the VaR directly:

for A:  quantile 28.1156989,  scale 2.5323606,  shape 0.1967211
for B:  quantile 28.4638800,  scale 2.3528401,  shape 0.2484344

We can see that the parameter estimates are quite different, but we obtain similar values of the 99% VaR in both cases.
4 Extreme Value Methods

In extreme value theory, there are two important approaches: the block maxima (BM) method and the peaks-over-threshold (POT) method. Whereas much theoretical research has gone into the POT method, the BM method has been studied less thoroughly. In this paper we present both methods with an application example, and we compare them.
4.1 Block Maxima Method

We have so far considered maxima directly. Now, we suppose we have independent realizations, with a common distribution F, of a certain phenomenon of interest.
One approach to working with extreme value data is to divide the sample into m blocks of equal length and fit the GEV to the maxima of the blocks. Indeed, the block maxima method involves fitting a GEV distribution to maximum data values taken over fixed time intervals of the chosen block length.
The delicate point of this method is the "good" choice of the block size, as blocks that are too small can lead to bias, while blocks that are too large generate too few block maxima, which leads to large estimation variance. This choice depends on the context, on the number of data points, etc. With blocks of length k:

Z1 = max(X1, ..., Xk),  Z2 = max(Xk+1, ..., X2k),  ...,  Zm = max(X(m−1)k+1, ..., Xmk)

We obtain a sample of m realizations (of a GEV distribution) to which we can apply the Fisher–Tippett–Gnedenko theorem and implement the methods seen previously.
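The construction above can be sketched in a few lines (Python stand-in for the R code in the annex; the Cauchy toy series and its length are illustrative, not the BMW data used later):

```python
import numpy as np

# Block-maxima construction: split n = m*k observations into m blocks
# of length k and keep each block's maximum.
rng = np.random.default_rng(4)
series = rng.standard_cauchy(6000)    # heavy-tailed toy series
k = 20                                # block length, as in Section 5
maxima = series.reshape(-1, k).max(axis=1)
print(maxima.shape)  # 300 block maxima
```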
Figure 2: Block Maxima method illustrated
However, the block maxima method is wasteful of data, as only one data point is taken from each block. The second highest value in one block may be larger than the highest of another block, and this is generally not accounted for.
The peaks-over-threshold (POT) method is a way to avoid this drawback, as it uses the data more efficiently: the idea is to consider several large values instead of only the largest one.
4.2 Peaks-Over-Threshold Method

This approach consists in using the observations that lie above a given threshold, called exceedances.
Figure 3: Peaks-Over-Threshold method illustrated
Definition 4.1.
Let X be a random variable with a distribution function F and right endpoint xF. For all u < xF, the function:

Fu(x) = P(X − u ≤ x | X > u),   x ≥ 0,

is called the distribution function of exceedances above the threshold u.
Using conditional probabilities, Fu can also be written as:

Fu(x) = (F(u + x) − F(u)) / (1 − F(u)) if x ≥ 0,   and 0 otherwise.
Let Y = X − u for X > u. For n observed variables X1, ..., Xn, we can write Yj = Xi − u, where i is the index of the j-th exceedance, j = 1, ..., nu. The distribution of the exceedances (Y1, ..., Ynu) can be approximated by a Generalized Pareto Distribution.
4.2.1 The Generalized Pareto Distribution (GPD)

Definition 4.2.
Let σu be a strictly positive function and ξ ∈ IR. The Generalized Pareto Distribution (GPD) is given by:

Gξ,σu(y) = 1 − (1 + ξy/σu)^{−1/ξ}   if ξ ≠ 0   (4.1)
Gξ,σu(y) = 1 − exp(−y/σu)   if ξ = 0

where y ≥ 0 if ξ ≥ 0, and 0 ≤ y ≤ −σu/ξ if ξ < 0.
Theorem 4.1. (Pickands–Balkema–de Haan)
If F is in one of the three domains of attraction of the GEV (Gumbel, Fréchet or Weibull), then there exist a strictly positive function σu and ξ ∈ IR such that:

lim_{u→xF} sup_{0≤y≤xF−u} |Fu(y) − Gξ,σu(y)| = 0

where Gξ,σu is the GPD and Fu the distribution function of the exceedances above u.
Thus, for a threshold u large enough, the distribution of exceedances above u is approximated by a GPD:

Fu ≈ Gξ,σu

The parameters of the GPD are uniquely determined by those of the GEV: the shape parameter ξ is equal to that of the corresponding GEV, and the scale parameter σu is a function of the GEV location and shape parameters: σu = σ + ξ(u − µ).
The notation σu was used to distinguish the scale parameter from the corresponding parameter of the GEV and to emphasize its dependence on u. From now on, the distinction will be dropped for notational convenience.
As with the GEV, the shape parameter of the GPD dominates the behavior of the tail. If ξ < 0, the distribution of excesses has an upper bound at u − σ/ξ, while if ξ > 0 the distribution has no upper limit.
Note: The expected value of the GPD with parameters σ and ξ is:

E[Y] = σ/(1 − ξ)   provided ξ < 1,   (4.2)

and E[Y] = ∞ when ξ ≥ 1.
4.2.2 Estimation of the Parameters

As for the GEV distribution, the method for estimating σ and ξ that combines theoretical efficiency with a general basis for inference is maximum likelihood (but here as well, several methods exist, such as the Pickands estimator, the probability weighted moments method, the method of moments, etc.).
Let y1, ..., ynu be a sequence of nu exceedances of a threshold u. For ξ ≠ 0, the log-likelihood is derived from (4.1) as:

l(σ, ξ) = −nu log σ − (1 + 1/ξ) Σ_{i=1}^{nu} log(1 + ξ yi/σ)

provided (1 + ξ yi/σ) > 0 for i = 1, ..., nu; otherwise l(σ, ξ) = −∞.
When ξ = 0, the log-likelihood can be derived in a similar way:

l(σ) = −nu log σ − (1/σ) Σ_{i=1}^{nu} yi

However, one more time, the maximum likelihood estimator (MLE) is not always valid and the regularity conditions do not always hold. The MLE is valid for ξ > −1, but asymptotic normality of the MLE only holds for ξ > −1/2. When ξ < −1, maximum likelihood estimators generally do not exist.
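A sketch of this fit (in Python, an analogue of the R gpd call used later; the Student-t sample, whose true tail index is ξ = 1/ν = 0.25 for ν = 4 degrees of freedom, and the 98% threshold are illustrative assumptions):

```python
import numpy as np
from scipy import stats

# MLE fit of a GPD to exceedances. scipy's genpareto shape c plays the
# role of xi; loc is fixed at 0 since y = x - u are excesses.
rng = np.random.default_rng(5)
x = stats.t.rvs(df=4, size=50_000, random_state=rng)  # true xi = 0.25
u = float(np.quantile(x, 0.98))                       # a fairly high threshold
y = x[x > u] - u                                      # the exceedances
xi_hat, _, sigma_hat = stats.genpareto.fit(y, floc=0.0)
print(round(xi_hat, 2))  # near 0.25, up to sampling noise and threshold bias
```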
4.2.3 Choice of the Threshold

The choice of the threshold is not straightforward. Indeed, a compromise between bias and variance has to be found: a high threshold value reduces the bias but increases the variance of the estimators of the GPD, while a low threshold value results in the opposite, i.e. a high bias but a low variance of the estimators, since there are more data with which to estimate the parameters.
A solution is to fix a number of excesses r and to deduce the threshold u from it. We then look for u such that P(X > u) = r/n. In other words, we estimate u = F̄^{−1}(r/n) by X_{n−r+1,n}.
This solution, widely used in practice, moves the problem from choosing the threshold u to choosing the number of excesses r to consider.
The Mean Excess Function (MEF)

The mean excess function, or mean residual life plot, is a graphical method for the selection of the threshold. This method is based on the mean of the GPD. Suppose the GPD is valid as a model for the excesses of a threshold u0 generated by a series X1, ..., Xn. By (4.2) we have:

E(X − u0 | X > u0) = σ_{u0}/(1 − ξ)

The threshold stability property of the GPD means that if the GPD is a valid model for excesses over some threshold u0, then it is valid for excesses over all thresholds u > u0. Hence, for all u > u0 and ξ < 1:

E(X − u | X > u) = σu/(1 − ξ) = (σ_{u0} + ξ(u − u0))/(1 − ξ)
                 = (ξ/(1 − ξ)) u + (σ_{u0} − ξu0)/(1 − ξ)

Thus, for all u > u0, the MEF is a linear function of u. Furthermore, E(X − u | X > u) is the mean of the excesses of the threshold u, and can be estimated by the sample mean of the threshold excesses.
In practice, we plot the locus of points:

{ (u, (1/nu) Σ_{i=1}^{nu} (x(i) − u)) : u < xmax }

where x(1), ..., x(nu) are the nu observations that exceed u, and xmax is the largest of the Xi.
That is, for a range of thresholds u, we compute the corresponding mean threshold excess, plot it against u, and look for the value u0 above which the plot is approximately linear.
Note:
• If the MEF is increasing, this corresponds to the case ξ > 0 and hence to the Fréchet domain of attraction.
• If the MEF is decreasing, this corresponds to the case ξ < 0 and hence to the Weibull domain of attraction.
• If the MEF is constant, this corresponds to the case ξ = 0 and hence to the Gumbel domain of attraction.
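The empirical MEF behind the plot is a one-liner. A sketch (in Python; the exponential sample is an illustrative assumption, chosen because for ξ = 0 the GPD mean σ/(1 − ξ) predicts a flat curve at the scale parameter):

```python
import numpy as np

# Empirical mean-excess function: average excess x - u over the
# observations exceeding u, for a grid of thresholds u.
rng = np.random.default_rng(6)
x = rng.exponential(scale=2.0, size=200_000)

def mean_excess(x, u):
    exc = x[x > u] - u
    return float(exc.mean())

for u in (1.0, 2.0, 4.0):
    print(u, round(mean_excess(x, u), 2))  # all close to 2.0 (constant MEF)
```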
5 Application

Our dataset is bmwRet, available in the fExtremes package. These data correspond to daily log-returns of BMW share prices.
In finance we generally do not have i.i.d. observations, but we will suppose it is the case here; for reasons we will not develop in this paper, we may assume stationarity of the underlying time series, so that many limit results (such as the strong law of large numbers) remain valid under general conditions and allow us to interpret these data in a way similar to the i.i.d. case.
Furthermore, we will not distinguish here the positive daily log-returns from the negative ones.
5.1 Block Maxima Method

As we have seen above, the block maxima method models the distribution of extremes by the GEV. The data are daily log-returns of BMW share prices for the period January 2, 1973 to July 23, 1996 (n = 6146). We then create the maxima on blocks of length 20.
Graphical diagnostics

We then calculate the parameters of the GEV by maximum likelihood and, for each parameter, the 95% confidence interval. For the 99% VaR, we find 0.039 with a 95% confidence interval of [0.036, 0.042].
5.2 Peaks-Over-Threshold Method

We have seen that the POT method models the distribution of excesses above a threshold by the GPD.
According to the mean excess plot, we can take the threshold u = 0.035, because the locus of points (x, e(x)) is approximately linear for x > u. In addition, with the findThreshold function in the evir package and a number of excesses equal to 100, we also find a threshold approximately equal to 0.035.
We then estimate the GPD by maximum likelihood using the gpd function, and we find a 99% VaR equal to 0.042.
Conclusion

We have illustrated extreme value theory and how it can be used to model risk measures such as the Value-at-Risk. Our conclusion is that extreme value theory can be useful for assessing the size of extreme events.
While we focused in this paper on the univariate case, when we look at extreme events it is often the simultaneous occurrence of several events that matters, so many variables come into play. But the univariate case cannot be generalized straightforwardly to the multivariate case. Leaving the univariate setting raises new questions: How do we order vectors of random variables? What is a multivariate extreme event? Is it enough that only one component is an extreme realization, or must all the components be extreme, or do we need another condition?
To be continued...
Bibliography

[1] Stuart Coles. An Introduction to Statistical Modeling of Extreme Values. Springer, 2001.
[2] P. Embrechts, C. Klüppelberg, T. Mikosch. Modelling Extremal Events for Insurance and Finance. Springer-Verlag, 1997.
[3] Pierre Ribereau. Valeurs extrêmes : applications en R, 2011.
[4] Jan Beirlant, Yuri Goegebeur, Jozef Teugels. Statistics of Extremes. John Wiley & Sons, Ltd, 2004.
[5] Jean-Noël Bacro, Pierre Ribereau. Théorie des valeurs extrêmes appliquée avec R, 2011.
[6] Frédéric Planchet. Utilisation de la théorie des valeurs extrêmes dans le contexte solvabilité 2, 2012. URL http://www.ressources-actuarielles.net/EXT/ISFA/fp-isfa.nsf/0/72EE1310B7EBC2A2C1256FD2002E9C76/$FILE/Seance301.pdf?OpenElement
[7] J.-J. Droesbeke, M. Maumy-Bertrand, G. Saporta, C. Thomas-Agnan. Approches statistiques du risque. Technip, 2014, pp. 85-112.
[8] Esther Bommier. Peaks-Over-Threshold Modelling of Environmental Data, 2014. URL https://uu.diva-portal.org/smash/get/diva2:760802/FULLTEXT01.pdf
[9] Abhinay Sawant. Extreme Value Theory with High-Frequency Financial Data, 2010.
[10] Marco Rocco. Extreme value theory for finance: a survey, 2011. URL http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1998740
[11] Younes Bensalah. Steps In Applying Extreme Value Theory to Finance: A Review, 2000.
Annex

# Simulated example
library(evd)
M = rgev(100, loc = 9, scale = 2.6, shape = 0.3)
plot(M, xlab = "i", ylab = "maximum cost of car accidents")
library(ismev)
# Estimation of the parameters by maximum likelihood
simuGEV = gev.fit(M)
simuGEV$mle
# and 95% confidence intervals for these parameters
simuGEV$mle - 1.96*simuGEV$se
simuGEV$mle + 1.96*simuGEV$se
# Calculation of the 99% VaR with 95% confidence interval
zp = simuGEV$mle[1] - (simuGEV$mle[2]/simuGEV$mle[3])*
  (1 - (-log(0.99))^(-simuGEV$mle[3]))
zp
ypm1 = -log(0.99)
gradzp <- c(1,
  -simuGEV$mle[3]^(-1)*(1 - ypm1^(-simuGEV$mle[3])),
  simuGEV$mle[2]*simuGEV$mle[3]^(-2)*(1 - ypm1^(-simuGEV$mle[3])) -
    simuGEV$mle[2]*simuGEV$mle[3]^(-1)*ypm1^(-simuGEV$mle[3])*log(ypm1))
varzp <- t(gradzp) %*% simuGEV$cov %*% gradzp
c(zp - 1.96*sqrt(varzp), zp + 1.96*sqrt(varzp))
# Estimation of the parameters by PWM
library(fExtremes)
resgevFit = gevFit(M, type = "pwm")
resgevFit
# VaR stability
A = rgev(100, loc = 9, scale = 2.6, shape = 0.3)
B = rgev(100, loc = 9, scale = 2.6, shape = 0.3)
fgev(A)$estimate
fgev(B)$estimate
fgev(A, prob = 0.01)$estimate
fgev(B, prob = 0.01)$estimate

# Block Maxima method
library("fExtremes")
library("ismev")
data(bmwRet)
head(bmwRet)
BMW = as.timeSeries(bmwRet)
# Construction of the maxima on blocks of length 20
B = blockMaxima(BMW, block = 20)
maxB = B$max.BMW.RET
plot(maxB, main = "BMW.RET maxima by blocks of length 20",
     xlab = "t", ylab = "BMW.RET")
# Estimation of the GEV parameters by PWM
BMWGEV = gevFit(maxB, type = "pwm")
BMWGEV
# Estimation of the GEV parameters by MLE
BMWGEV2 = gev.fit(maxB)
BMWGEV2
BMWGEV2$mle
# 95% confidence interval for every parameter
BMWGEV2$mle - 1.96*BMWGEV2$se
BMWGEV2$mle + 1.96*BMWGEV2$se
# Calculation of the 99% VaR with 95% confidence interval
ypm1 <- -log(0.99^20)
zp <- BMWGEV2$mle[1] - (BMWGEV2$mle[2]/BMWGEV2$mle[3])*
  (1 - ypm1^(-BMWGEV2$mle[3]))
zp
gradzp <- c(1,
  -BMWGEV2$mle[3]^(-1)*(1 - ypm1^(-BMWGEV2$mle[3])),
  BMWGEV2$mle[2]*BMWGEV2$mle[3]^(-2)*(1 - ypm1^(-BMWGEV2$mle[3])) -
    BMWGEV2$mle[2]*BMWGEV2$mle[3]^(-1)*ypm1^(-BMWGEV2$mle[3])*log(ypm1))
varzp <- t(gradzp) %*% BMWGEV2$cov %*% gradzp
c(zp - 1.96*sqrt(varzp), zp + 1.96*sqrt(varzp))
# Calculation of the VaR with the VaR function
VaR(BMW, 0.99, type = "sample")

# Peaks-Over-Threshold method
library("fExtremes")
data(bmwRet)
BMW = as.timeSeries(bmwRet)
# Mean excess distribution
mePlot(BMW)
# Mean residual life plot
mrlPlot(BMW)
# We estimate here the threshold in a different way,
# by defining a number of excesses equal to 100
findThreshold(BMW, 100)
# Estimation of the GPD by maximum likelihood
library("evir")
T = gpd(BMW, 0.035, method = "ml")
T
# Calculation of the 99% VaR
riskmeasures(T, 0.99)