Importance Sampling in the Presence of PD-LGD Correlation

Adam Metzler; Alexandre Scott

doi:10.3390/risks8010025

Importance Sampling in the Presence of PD-LGD Correlation

Metzler, Adam;Scott, Alexandre 2020-03-10 00:00:00 risks Article Importance Sampling in the Presence of PD-LGD Correlation 1, 2 Adam Metzler * and Alexandre Scott Department of Mathematics, Wilfrid Laurier University, Waterloo, ON N2L 3C5, Canada Department of Applied Mathematics, University of Western Ontario, London, ON N6A 3K7, Canada; alexandre.scott202@gmail.com * Correspondence: ametzler@wlu.ca Received:20 January 2020; Accepted: 5 March 2020; Published: 10 March 2020 Abstract: This paper seeks to identify computationally efﬁcient importance sampling (IS) algorithms for estimating large deviation probabilities for the loss on a portfolio of loans. Related literature typically assumes that realised losses on defaulted loans can be predicted with certainty, i.e., that loss given default (LGD) is non-random. In practice, however, LGD is impossible to predict and tends to be positively correlated with the default rate and the latter phenomenon is typically referred to as PD-LGD correlation (here PD refers to probability of default, which is often used synonymously with default rate). There is a large literature on modelling stochastic LGD and PD-LGD correlation, but there is a dearth of literature on using importance sampling to estimate large deviation probabilities in those models. Numerical evidence indicates that the proposed algorithms are extremely effective at reducing the computational burden associated with obtaining accurate estimates of large deviation probabilities across a wide variety of PD-LGD correlation models that have been proposed in the literature. Keywords: importance sampling; acceptance-rejection sampling; portfolio credit risk; tail probabilities; large deviation probabilities; stochastic recovery; PD-LGD correlation; credit risk; loss probabilities 1. Introduction This paper seeks to identify computationally efﬁcient importance sampling (IS) algorithms for estimating large deviation probabilities for the loss on a portfolio of loans. Related literature assumes that realised losses on defaulted loans can be predicted with certainty, i.e., that loss given default (LGD) is non-random. In practice, however, LGD is impossible to predict and tends to be positively correlated with the default rate and the latter phenomenon is typically referred to as PD-LGD correlation (here PD refers to probability of default, which is often used synonymously with default rate). There is a large literature on modelling stochastic LGD and PD-LGD correlation, but there is a paucity of literature on using importance sampling to estimate large deviation probabilities in those models. This gap in the literature was brought to our attention by a risk management professional at a large Canadian ﬁnancial institution, and ﬁlling that gap is the ultimate goal of this paper. Problem Formulation and Related Literature Consider a portfolio of N exposures of equal size. Let L , L , . . . , L denote the losses on the 1 2 N individual loans, expressed as a percentage of notional value. The percentage loss on the entire portfolio is: L := L . (1) N å i i=1 Risks 2020, 8, 25; doi:10.3390/risks8010025 www.mdpi.com/journal/risks Risks 2020, 8, 25 2 of 36 We are interested in using IS to estimate large deviation probabilities of the form: p := P(L x) , (2) x N where x >> E[L ] = E[L ] is some large, user-deﬁned, threshold. i N In practice the number of exposures is large (e.g., in the thousands) and prudent risk management requires one to assume that the individual losses are correlated. In practice, then, L is the average of a large number of correlated variables. As such, its probability distribution is highly intractable and Monte Carlo is the method of choice for approximating p . As the probability of interest is typically 3 4 small (e.g., on the order of 10 or 10 ), the computational burden required to obtain an accurate estimate of p using Monte Carlo can be prohibitive. For instance if p is on the order of 10 and x x N is on the order of 1000 then, in the absence of any variance reduction techniques, the sample size required to reduce the estimator ’s relative error to 10% is on the order of one hundred thousand. Since each realisation of L requires simulation of one thousand individual losses, a sample size of 100, 000 requires one to generate one hundred million variables. If the desired degree of accuracy is reduced to 1%, the number of variables that must be generated increases to a staggering 10 billion. Importance sampling (IS) is a variance reduction technique that has the potential to signiﬁcantly reduce the computational burden associated with obtaining accurate estimates of large deviation probabilities. In the present context, effective IS algorithms have been identiﬁed for a variety of popular risk management models, but most are limited to the special case that loss given default (LGD) is non-random. The seminal paper in the area is (Glasserman and Li 2005), other papers include (Chan and Kroese 2010) and (Scott and Metzler 2015). It is well documented empirically, however, that portfolio-level LGD is not only stochastic, but positively correlated with the portfolio-level default rate as seen, for instance, in any of the studies listed in (Kupiec 2008) or (Frye and Jacobs 2012). This phenomenon is typically referred to as PD-LGD correlation. (Miu and Ozdemir 2006) show that ignoring PD-LGD correlation when it is in fact present can lead to material underestimates of portfolio risk measures. There is a large literature on modelling PD-LGD correlation (Frye 2000); (Pykhtin 2003); (Miu and Ozdemir 2006); (Kupiec 2008); (Sen 2008); (Witzany 2011); (de Wit 2016); (Eckert et al. 2016); and others listed in (Frye and Jacobs 2012), but there is a much smaller literature on using IS to estimate large deviation probabilities in such models. To the best of our knowledge only (Deng et al. 2012) and (Jeon et al. 2017) have developed algorithms that allow for PD-LGD correlation (the former paper considers a dynamic intensity-based framework, the latter considers a static model with asymmetric and heavy-tailed risk factors). The present paper contributes to this nascent literature by developing algorithms that can be applied in a wide variety of PD-LGD correlation models that have been proposed in the literature, and are popular in practice. The paper is structured as follows. Section 2 outlines important assumptions, notation, and terminology. Section 3 theoretically motivates the proposed algorithm in a general setting, and Section 4 discusses a few practical issues that arise when implementing the algorithm. Section 5 describes a general framework for PD-LGD correlation modelling that includes, as special cases, many of the models that have been developed in the literature and Section 6 describes how to implement the proposed algorithm in this general framework. Numerical results are presented and discussed in Section 7 and demonstrate that the proposed algorithms are extremely effective at reducing the computational burden required to obtain an accurate estimate of p . Relative error is the preferred measure of accuracy for large deviation probabilities. If pb is an estimator of p , its relative x x error is deﬁned as SD( pb )/ p , where SD denotes standard deviation. x x Risks 2020, 8, 25 3 of 36 2. Assumptions, Notation and Terminology We assume that individual losses are of the form L = L(Z, Y ), where L is some deterministic i i function, Z = (Z , . . . , Z ) is a d-dimensional vector of systematic risk factors that affect all exposures, and Y is a vector of idiosyncratic risk factors that only affect exposure i. We assume that Z, Y , Y , . . . i 1 2 are independent, and that the Y are identically distributed. The primary role of the systematic risk factors is to induce correlation among the individual exposures, and it is common to interpret the realised values of the systematic risk factors as determining the overall macroeconomic environment. It is worth noting that the we do not require the components of Z to be independent of one another, etc. for the components of Y . 2.1. Large Portfolios and the Region of Interest In a large portfolio, the inﬂuence of the idiosyncratic risk factors is negligible. Indeed, since individual losses are conditionally independent, given the realised values of the systematic risk factors, we have the almost sure limit: lim L = m(Z) , (3) N!¥ where m(z) := E[L jZ = z] = E[L jZ = z] . (4) i N Since m(Z) L for large N by Equation (3), the random variable m(Z) is often called the large portfolio approximation (LPA) to L . The LPA is often used to formalise the intuitive notion that, in a large portfolio, all risk is systematic (i.e., idiosyncratic is “diversiﬁed away”). We deﬁne the region of interest as the set: fz 2 R : m(z) xg . (5) The region of interest is “responsible” for large deviations in the sense that: lim P(m(Z) xjL x) = 1 (6) N!¥ for most values of x. Together, Equations (3) and (6) suggest that for large portfolios, it is relatively more important to identify an effective IS distribution for the systematic risk factors, as compared to the idiosyncratic risk factors. 2.2. Systematic Risk Factors We assume that Z is continuous and let f (z) denote its joint density. We assume that f is a member of an exponential family (see Bickel and Doksum 2001 for deﬁnitions and important properties) with d p natural sufﬁcient statistic S : R 7! R . Any other member of the family can be put in the form: f (z) := exp(l S(z) K(l)) f (z) , (7) where K() is the cumulant generating function (cgf) of S(Z) and l 2 R is such that K(l) is well-deﬁned. The parameter l is called the natural parameter of the family in Equation (7). Appendix B embeds the Gaussian and multivariate t families into this general framework. In light of the almost sure limit in Equation (3), we have that L converges to m(Z) in distribution, which implies that Equation (6) is valid for all values of x such that P(m(Z) = x) = 0. If m(Z) is a continuous random variable (which it is in most cases of practical interest) then Equation (6) is satisﬁed for every value of x. Risks 2020, 8, 25 4 of 36 We will eventually be using densities of the form in Equation (7) as IS densities for the systematic risk factors. The associated IS weight is: f (Z) = exp(l S(Z) + K(l)) , (8) f (Z) and it will be important to know when the variance of the IS weight is ﬁnite. The following observation is readily veriﬁed. Remark 1. If Z f , then Equation (8) has ﬁnite variance if and only if both K(l) and K(l) are well deﬁned. A standard result in the theory of exponential families is that: rK(l) = E [S(Z)] , (9) where r denotes gradient and E denotes expectation with respect to the density f . l l 2.3. Individual Losses We assume that L takes values in the unit interval. In general L will have a point mass at zero i i (if it did not, the loan would not be prudent) and the conditional distribution of L , given that L > 0, i i is called the (account-level) LGD distribution. We allow the LGD distribution to be arbitrary in the sense that it could be either discrete or continuous, or a mixture of both. This contrasts with the case of non-random LGD, where the LGD distribution is degenerate at a single point. We let ` 2 (0, 1] max denote the supremum of the support of L . Individual losses will therefore never exceed ` but could max take on values arbitrarily close (and possibly equal) to ` . max Remark 2. Despite the fact that L is not a continuous variable, in what follows we will proceed as if it was and make repeated reference to its “density." This is done without loss of generality, and in an interest of simplifying the presentation and discussion. Nothing in the sequel requires L to be a continuous variable, and everything carries over to the case where it is either discrete or continuous, or has both a discrete and continuous component. For z 2 R we let g(`jz) denote the conditional density of L , given that Z = z. We assume that the support of g(jz) is identical to the unconditional support, in particular it does not depend on the value of z. Note that m(z) is the mean of g(jz). In practice (i.e., for all of the PD-LGD correlation models listed in the introduction) g(jz) is not a member of an established parametric family, and direct simulation from g(jz) using a standard technique such as inverse transform or rejection sampling is not straightforward. Simulation from g(jz) is most easily accomplished by simulating the idiosyncratic risk factors, Y , from their density, say h(y), and then setting L = L(z, Y ). In other words, in order to simulate from g(jz) we make use i i of the fact that L = L(z, Y ) is a drawing from g(jz) whenever Y is a drawing from h(). i i i For q 2 R and z 2 R we let: k(q, z) := log(E[exp(q L )jZ = z]) and ¶k k (q, z) := (q, z) . ¶q Then k(, z) is the conditional cgf of L , given that Z = z, and k (, z) is its ﬁrst derivative. In practice, neither k(, z) nor k (, z) is available in closed form. In the examples we consider later in the paper each can be expressed as a one-dimensional integral, but the numerical values of those integrals must Risks 2020, 8, 25 5 of 36 be approximated using quadrature. This contrasts with the case of non-random LGD, where the conditional cgf can be computed in closed form . d 0 For x 2 (0, ` ) and z 2 R we let q(x, z) denote the unique solution to the equation k (q, z) = max ˆ ˆ ˆ max(x, m(z)). We often suppress dependence on x and z, and simply write q instead of q(x, z). That q is well-deﬁned follows immediately from the developments in Appendix A.1. Based on the discussion there we ﬁnd that q is zero whenever z lies in the region of interest, and is strictly positive otherwise. Remark 3. In practice, the value of q cannot be computed in closed form and must be approximated using a numerical root-ﬁnding algorithm. Since each evaluation of the function k (, z) requires quadrature, computing ˆ ˆ q is straightforward but relatively time consuming. This contrasts with the case of non-random LGD, where q can be computed in closed form at essentially no cost. For z 2 R we let q(, z) denote the Legendre transform of k(, z) over [0, ¥). That is, ˆ ˆ q(x, z) := max(q x k(q, z)) = q x k(q, z) . (10) q0 That q is the uniquely deﬁned point at which the function q 7! q x k(q, z) attains its maximum on [0, ¥) follows from the developments in Appendix A.2. Based on the discussion there, we ﬁnd that both q and q are equal to zero whenever z lies in the region of interest, and that both are strictly positive otherwise. 2.4. Conditional Tail Probabilities Given the realised values of the systematic risk factors, individual losses are independent. Large deviations theory can therefore provide useful insights into the large-N behaviour of the tail probability P(L xjZ = z). For instance, Chernoff’s bound yields the estimate: P(L > xjZ = z) exp(Nq(x, z)) , (11) and Cramér ’s (large deviation) theorem yields the limit: log(P(L > xjZ = z)) lim = q(x, z) . (12) N!¥ N Together these results are often used to justify the approximation: P(L > xjZ = z) exp(Nq(x, z)) , (13) which will be used repeatedly throughout the paper. The approximation in Equation (13) is often called the large deviation approximation (LDA) to the tail probability P(L > xjZ = z). Note that since q(x, z) = 0 whenever m(z) x, the LDA suggests that P(L > xjZ = z) 1 whenever z lies in the region of interest. 2.5. Conditional Densities N d Let L = (L , . . . , L ), noting that L takes values in [0, ` ] . For z 2 R and ` = (` , . . . , ` ) 2 max 1 N 1 N [0, ` ] , we let h (z, `) denote the conditional density of (Z, L), given that L > x. Then h is max x N x given by: f (z) g(` jz) i=1 h (z, `) = 1 , (14) f`2 A g N,x 3 (1R)q In the case of non-random LGD we have k(q, z) = log(1 + (e 1) P(L > 0jZ = z)), where R is the known recovery rate on the exposure. Risks 2020, 8, 25 6 of 36 N 1 where A is the set of points ` 2 [0, ` ] for which N ` > x. N,x max i i=1 We let f (z) denote the conditional density of the systematic risk factors, given that L > x, noting that: P(L > xjZ = z) f (z) = f (z) . (15) P(L x) In the examples we consider the mean of f tends to lie inside, but close to the boundary of, the region of interest. And relative to the unconditional density f , the conditional density f tends to be much more concentrated about its mean. Finally, we let g (`jz) denote the conditional density of an individual loss, given that Z = z and L > x, noting that: x` P(L > x + jZ = z) N1 N1 g (`jz) = g(`jz) . (16) P(L > xjZ = z) If the realised value of z lies inside the region of interest, the conditional density g (jz) tends to resemble the unconditional density g(jz). Intuitively, for such values of z the LDA informs that the event fL > xg is very likely, and conditioning on its occurrence is not overly informative. If the realised value of z does not lie in the region of interest then g (jz) tends to resemble the exponentially tilted version of g(jz) whose mean is exactly x. See Appendix A.3 for more details. Neither h , f , nor g are numerically tractable, but as we will soon see they do serve as useful x x x benchmarks against which to compare candidate IS densities. In addition, it is worth noting here that the representations of Equations (15) and (16) lend themselves to numerical approximation via the LDA in Equation (13). 3. Proposed Algorithm In practice, the most common approach to estimating p via Monte Carlo simulation in this framework is summarised in Algorithm 1 below. Algorithm 1 Standard Monte Carlo Algorithm for Estimating p 1: Simulate M i.i.d. copies of the systematic risk factors. Think of these as different economic scenarios and denote the simulated values by z , . . . , z . 1 M 2: For each scenario m: (a) Simulate the idiosyncratic risk factors for each exposure. Denote the simulated values y , . . . , y . 1,m N,m 1 N (b) Set ` = L(z , y ) for each exposures i, and ` = ` . i,m m i,m m i,m i=1 1 M 3: Return pb = 1 . å ¯ M m=1 f` >xg Algorithm 1 consists of two stages. In the ﬁrst stage one simulates the systematic risk factors, and in the second stage one simulates the idiosyncratic risk factors for each exposure. Mathematically, the ﬁrst stage induces independence among the individual exposures, so that the second stage amounts to simulating a large number of i.i.d. variables. Intuitively, it is useful to think of the ﬁrst stage as determining the prevailing macroeconomic environment, which ﬁxes economy-wide quantities such as default and loss-given-default rates. The second stage of the algorithm overlays idiosyncratic noise on top of economy-wide rates, to arrive at the default and loss-given-default rates for a particular portfolio. Relative error is the preferred measure of accuracy for estimators of rare event probabilities. The relative error of the estimator pb in Algorithm 1 is: 1 1 p p , M x Risks 2020, 8, 25 7 of 36 and the sample size required to ensure the relative error does not exceed some predetermined threshold e is: 1 1 p M(e) = . (17) e p The number of variables that must be generated in order to achieve the desired degree of accuracy e is therefore (N + d) M(e), which grows without bound as p ! 0. For instance if p = 10 , x x 3 2 N = 10 , d = 2, and e = 5 10 then the number of variables that must be generated is approximately four hundred million, which is an enormous computational burden for a modest degree of accuracy. In the next section we discuss general principles for selecting an IS algorithm that can reduce the computational burden required to obtain an accurate estimate of p . 3.1. General Principles For practical reasons, we insist that our IS procedure retains conditional independence of individual losses, given the realised value of the systematic risk factors. This is important because it allows us to reduce the problem of simulating a large number of dependent variables to the (much) more computationally efﬁcient problem of simulating a large number of independent variables. In the ﬁrst stage we simulate the systematic risk factors from the IS density f (z). The IS weight I S associated with this ﬁrst stage is therefore: f (z) L (z) := . f (z) I S In the second stage we simulate the individual losses as i.i.d. drawings from the density g (`jz). The I S IS weight associated with this second stage is: g(` jz) L (z, `) = , 2 Õ g (` jz) I S i i=1 and the IS density from which we sample (Z, L) is therefore of the form: h (z, `) = f (z) g (` jz) . (18) I S I S Õ I S i i=1 The so-described algorithm, with as-yet unspeciﬁed IS densities, is summarised in Algorithm 2. Algorithm 2 IS Algorithm for Estimating p 1: Simulate M i.i.d. copies of the systematic risk factors from the density f (z). Think of these as I S different economic scenarios and denote the simulated values by z , . . . , z . 1 M 2: For each scenario m: (a) Independently simulate ` , ` , . . . ,` from the density g (jz ). 2,m N,m m 1,m I S 1 N (b) Set ` = ` . m å i,m N i=1 3: Return pb = L (z ) L (z , ` ) 1 , where ` = (` , . . . ,` ). å ¯ x 1 m 2 m m m 1,m N,m m=1 f` >xg It is important to note that in the second stage, we will not be simulating individual losses directly from the (conditional) IS density g . Rather, we will simulate the idiosyncratic risk factors Y in such I S i a way as to ensure that for a given value of z, the variable L = L(z, Y ) has the desired density g . i i I S Risks 2020, 8, 25 8 of 36 Focusing on the “indirect" IS density of L , as opposed to “direct" IS density of Y , allows us to identify i i a much more effective second stage algorithm . The estimator pb produced by Algorithm 2 is demonstrably unbiased and its variance is: 2 2 2 E [(L(Z, L) 1 p ) ] = p E [(L (Z, L) 1 1) ] , (19) ¯ ¯ I S x I S x fL >xg x fL >xg N N where E denotes expectation under the IS distribution, L(z, `) := L (z) L (z, `) and I S 1 2 L(z, `) L (z, `) := . Note that L is the ratio of (i) the IS density in Equation (18) to (ii) the conditional density in Equation (14). The estimator ’s squared relative error can then be decomposed as: E [(L (Z, L) 1) 1 ] + [1 P (L > x)] , (20) x ¯ I S I S N fL >xg where P denotes probability under the IS distribution. I S Inspecting Equation (20) we see that an effective IS density should (i) assign a high probability to the event of interest and (ii) should resemble the conditional density in Equation (14) as closely as possible, in the sense that the ratio L should deviate as little as possible from unity. Clearly, an estimator that satisﬁes (ii) should also satisfy (i), since h assigns probability one to the event that L > x. The task now is to identify a density of the form in Equation (18) that resembles the ideal density in Equation (14), in some sense. 3.2. Identifying the Ideal IS Densities Our measure of similarity is Kullback–Leibler divergence (KLD), or divergence for short. See Chatterjee and Diaconis (2018) for a general discussion of the merits of minimum divergence as a criteria for identifying effective IS distributions. We begin by writing: h (z, `) f (z) g ˜ (`jz) x x x = , (21) h (z, `) f (z) g ˜ (`jz) I S I S I S where for ﬁxed z, g(` jz) i=1 g ˜ (`jz) = 1 f`2 A g N,x P(L > xjZ = z) is the joint density of N independent variables having marginal density g(jz), conditioned on their average value exceeding the threshold x, and g ˜ (`jz) = g (` jz) . I S Õ I S i i=1 is the joint density of N independent variables having marginal density g (jz). I S Using Equation (21) it is straightforward to decompose the divergence of h from h as: I S x D(h jjh ) = D( f jj f ) + E [ D(g ˜ (jZ)jjg ˜ (jZ))j L > x] , (22) x x x I S I S I S N where D(xjjh) denotes the divergence of the density x from the density h. The ﬁrst term in Equation (22) is the divergence of f from f , and is therefore minimised by setting f = f . In other words, the best I S x I S x In the earliest stages of this project we focused directly on an IS density for Y and had difﬁculties identifying effective candidates. Risks 2020, 8, 25 9 of 36 possible IS density for the systematic risk factors (according to the criteria of minimum divergence) is the conditional density f . The second term in Equation (22) is the average divergence of g ˜ (jz) I S from g ˜ (jz), averaged over all possible realisations of the systematic risk factors and conditioned on portfolio loss exceeding the threshold. Based on the developments in Appendix A.5, for ﬁxed z 2 R the divergence of g ˜ (jz) from g ˜ (jz) is minimised by setting g (jz) = g (jz). The average I S x I S x divergence in Equation (22) is, therefore, also minimised by setting g (jz) = g (jz) for every z 2 R . I S Remark 4. Among all densities of the form in Equation (18), the one that most resembles the ideal density h (in the sense of minimum divergence) is the density: d N h (z, `) := f (z) g (` jz) , z 2 R , ` 2 [0, ` ] . x x x max Õ i i=1 In other words, h is the best possible IS density (among the class Equation (18) and according to the criteria of minimum divergence) from which to simulate (Z, L). It is worth noting that the IS density h “gets marginal behaviour correct”, in the sense that the marginal distribution of the systematic risk factors, as well as the marginal distribution of an individual loss, is the same under h as it is under the ideal density h . The dependence structure of individual x x losses is different under h and h —this is the price that we must pay for insisting on conditional x x independence (i.e., computational efﬁciency). 3.3. Approximating the Ideal IS Densities Simulating directly from h requires an ability to simulate directly from f and g . Unfortunately, x x x neither f nor g is numerically tractable (witness the unknown quantities in Equations (15) and (16)), x x and it does not appear that either is amenable to direct simulation. Our next task is to identify tractable densities that resemble f and g . x x 3.3.1. Systematic Risk Factors As a tractable approximation to f , we suggest using that member of the parametric family in Equation (7) that most resembles f in the sense of minimum divergence. Using Equations (7) and (15) we get that: f (z) log = l S(z) + K(l) + log (P(L > xjZ = z)) log( p ) , f (z) whence the divergence of f from f is: l x ¯ ¯ ¯ D( f jj f ) = l E[S(Z)jL > x] + K(l) + E[log (P(L > xjZ = z))jL > x] log( p ) . (23) x x l N N N As a cgf, K() is strictly convex. As such, Equation (23) attains its unique minimum at that value of l such that: rK(l) = E[S(Z)jL > x] , (24) which, in light of Equation (9), is equivalent to: E [S(Z)] = E[S(Z)jL > x] . (25) Intuitively, we suggest using that value of the IS parameter l for which the mean of S(Z) under the IS density matches the conditional mean of S(Z), given that portfolio losses exceed the threshold. In what follows we let l denote that suggested value of the IS parameter l, i.e., that value of l that solves Equation (24). Risks 2020, 8, 25 10 of 36 Remark 5. The ﬁrst-stage IS weight associated with the so-described density is: ˆ ˆ L (Z) = exp(l S(Z) + K(l )) . (26) 1 x It is entirely possible—and quite common in the examples we consider in this paper—that K(l ) is not well-deﬁned, in which case Equation (26) has inﬁnite variance under f (recall Remark 1). At ﬁrst glance it might seem absurd to consider IS densities whose associated weights have inﬁnite variance, but as we discuss in Section 4.2 it is straightforward to circumvent this issue by trimming large ﬁrst-stage IS weights . It remains to develop a tractable approximation to the right hand side of Equation (24), so that we can approximate the value of l . To this end we write the natural sufﬁcient statistic as S(z) = (S (z), . . . , S (z)) and note that: E[S (Z) 1 ] ¯ E[S (Z) P(L > xjZ)] i fL >xg i N E[S (Z)jL > x] = = . i N ¯ ¯ P(L > x) E[P(L > xjZ)] N N Next, we use the LDA in Equation (13) to get: E[S (Z) exp(Nq(x, Z))] E[S (Z)jL > x] . (27) i N E[exp(Nq(x, Z))] As it only involves the systematic risk factors (and not the large number of idiosyncratic risk factors), the expectation on the right hand side of Equation (27) is amenable to either quadrature or Monte Carlo simulation. 3.3.2. Individual Losses We encourage the reader unfamiliar with exponential tilts to consult Appendix A.3, before reading the remainder of this section. Our approximation to g (`jz) is obtained by using the LDA of Equation (13) to approximate both conditional probabilities appearing in Equation (16) (see Appendix A.4 for details). The resulting approximation is: ˆ ˆ g ˆ (`jz) := exp(q` k(q, z)) g(`jz) , (28) where we recall that q is deﬁned and discussed in Section 2.3. If the realised values of the systematic risk factors obtained in the ﬁrst stage lie in the region of interest then q = 0 and g ˆ is identical to g. Otherwise, q is strictly positive and g ˆ is the exponentially tilted version of g whose mean is x. Intuitively, we can interpret g ˆ as that density that most resembles (in the sense of minimum divergence) g , among all densities whose mean is at least x, and the numerical value of q as the degree to which the density g(jz) must be deformed, in order to produce a density whose mean is at least x. Remark 6. The mean of Equation (28) is max(m(z), x). The implication is that the event of interest is not a rare event under the proposed IS algorithm. Indeed, E [L ] = E [E [L jZ]] = E [E [L jZ]] = E [max(x, m(Z))] x , I S i I S I S i f g ˆ i f ˆ ˆ l l which implies that lim P (L > x) = 1. N!¥ I S N An alternative to trimming is truncation of large weights; see Ionides (2008) for a general and rigorous treatment of truncated IS. Risks 2020, 8, 25 11 of 36 The second-stage IS weight associated with Equation (28) is: ˆ ˆ ˆ ¯ ˆ L (Z, L) = exp(q L + k(q, z)) = exp(N[q L k(q, Z)]) . 2 i N i=1 ¯ ¯ Since the second stage weight depends only on Z and L , we will often write L (Z, L ) instead of N 2 N L (Z, L). In order to assess the stability of the second-stage IS weight, we note that: ˆ ¯ ˆ ˆ ¯ exp(N[q L k(q, Z)]) = exp(q N[L x]) exp(Nq(x, Z)) . N N ¯ ¯ If Z lies in the region of interest then q = q = 0, whence L (Z, L ) = 1 whatever the value of L . 2 N N ˆ ¯ ¯ Otherwise, both q and q are strictly positive, which implies that L (Z, L ) < 1 whenever L > x. The 2 N N net result of this discussion is that: ¯ ¯ L (Z, L ) 1 whenever L > x . (29) 2 N N The implication is that large, unstable, IS weights in the second stage will never be a problem. If the realised value of z does lie in the region of interest then g ˆ and g are identical, and simulation from g is straightforward. Our ﬁnal task is to determine how to sample from Equation (28) in the case where z does not lie in the region of interest. One approach would be to identify a family of densities fh (y) : z 2 R g such that L = L(z, Y ) is a draw from g ˆ (jz) whenever Y is a draw from h (), but z i i x i z this approach appears to be overly complicated. A simpler approach is to sample from Equation (28) using rejection sampling with g as the proposal density. To this end, we note that for ﬁxed z, the ratio ˆ ˆ of g ˆ to g is exp(q` k(q, z)), which is bounded and strictly increasing on [0, ` ]. The best possible x max (i.e., smallest) rejection constant is therefore: ˆ ˆ c ˆ = c ˆ(x, z) := exp(q` k(q, z)) , (30) max and the algorithm for sampling from g ˆ would proceed as follows. First, sample Y from its actual density and set L = L(z, Y ). Then generate a random number U, uniformly distributed on [0, 1] and i i independent of Y . If, g ˆ (L jz) ˆ ˆ U = exp(q(` L )) , max i c ˆ g(L jz) set L = L and proceed to the next exposure. Otherwise return to the ﬁrst step and sample another i i pair (Y , U). 3.4. Summary and Intuition The proposed algorithm is summarised in Algorithm 3 below. The initial step is to approximate the value of the ﬁrst-stage IS parameter, l . In our numerical examples we use a small pilot simulation (10% of the sample size that we eventually use to estimate p ) and the approximation of Equation (27) in order to estimate l . Having computed l , the ﬁrst stage of the algorithm proceeds by simulating independent realisations of the systematic risk factors from the density f , and computing the associated ﬁrst-stage weights of Equation (26). Recall that we can interpret these realisations as corresponding to different economic scenarios. Intuitively, sampling from f instead of f increases the proportion of adverse scenarios that are generated in the ﬁrst stage. In the examples we consider, f concentrates most of its mass near the boundary of the region of interest, and the effect is to concentrate the distribution of m(Z) near x. In the second stage, one ﬁrst checks whether or not the realised values of the systematic risk factors lie inside the region of interest. If they do then the event of interest is no longer rare and there is no need to apply further IS in the second stage. Otherwise, if we “miss” the region of interest in the Risks 2020, 8, 25 12 of 36 ﬁrst stage, we “correct” this mistake by applying an exponential tilt to the conditional distribution of individual losses. Speciﬁcally, we transfer mass from the left tail of g to the right tail, in order to produce a density whose mean is exactly x. Algorithm 3 Proposed IS Algorithm for Estimating p 1: Compute l using a small pilot simulation. 2: Simulate M i.i.d. copies of the systematic risk factors from f (z) and compute the corresponding ﬁrst-stage IS weights. Denote the realised values of the factors by z , . . . , z and the associated IS 1 M weights by L (z ), . . . , L (z ). 1 1 1 M 3: For each scenario m, determine whether or not z lies in the region of interest (i.e., whether or not m(z ) x). If it does lie in the region, proceed as follows: (a) Simulate the idiosyncratic risk factors for each exposure. Denote the simulated values by y , . . . , y . 1,m N,m 1 N ¯ ¯ (b) Set ` = L(z , y ), ` = å ` and L (z , ` ) = 1. m m 2 m m i,m i,m i,m N i=1 Otherwise, proceed as follows: ˆ ˆ ˆ ˆ ˆ ˆ (a) Compute q = q(x, z ), k = k(q, z ) and c ˆ = exp(q` k). For each exposure i: m m max (i) Simulate the exposure’s idiosyncratic risk factor (denote the realised value by y ˆ ) and i,m set ` = L(z , y ). i,m i,m (ii) Simulate a random number drawn uniformly from the unit interval (denote the realised ˆ ˆ ˆ value by u) and determine whether or not u exp(q(` ` )). If it is, set ` = ` max i,m i,m i,m and proceed to the next exposure. Otherwise, return to step (i). 1 N ¯ ¯ ˆ ¯ ˆ (b) Set ` = å ` and L (z , ` ) = exp(N[q` k]) m 2 m m m i,m N i=1 1 M 4: Return pb = L (z ) L (z , ` ) 1 . å ¯ x 1 m 2 m m m=1 f` >xg M m 4. Practical Considerations In this section we discuss some of the practical issues that arise when implementing the proposed methodology. 4.1. One- and Two-Stage Estimators The rejection sampling procedure employed in the second stage of the proposed algorithm involves repeated evaluation of q, which requires a non-trivial amount of computational time time. In addition, rejection sampling in general requires relatively complicated code. As such, it is worth considering a simpler algorithm that only applies importance sampling in the ﬁrst stage, and is therefore easier to implement and faster to run. In what follows we will distinguish between one- and two-stage IS algorithms. A one-stage algorithm only applies IS in the ﬁrst stage and samples (Z, L) from the IS density: h (z, `) := f (z) g(` jz) . (31) 1S ˆ i i=1 The associated IS weight is L (z) and the one-stage algorithm is summarised in Algorithm 4 below. Note the simplicity of Algorithm 4, relative to Algorithm 3. The two-stage algorithm applies IS in both the ﬁrst stage and the second stage, sampling (Z, L) from the IS density: h (z, `) := f (z) g (` jz) . (32) 2S ˆ x Õ i i=1 Risks 2020, 8, 25 13 of 36 The associated IS weight is L (z) L (z, ` ), and the two-stage algorithm was summarised previously 1 2 N in Algorithm 3. Algorithm 4 Proposed One-Stage IS Algorithm for Estimating p 1: Compute l using a small pilot simulation. 2: Simulate M i.i.d. copies of the systematic risk factors from f (z) and compute the corresponding ﬁrst-stage IS weights. Denote the realised values of the factors by z , . . . , z and the associated IS 1 M weights by L (z ), . . . , L (z ). 1 1 1 M 3: For each scenario m: (a) Simulate the idiosyncratic risk factors for each exposure. Denote the simulated values by y , . . . , y . 1,m N,m 1 N (b) Set ` = L(z , y ) and ` = ` . i,m m i,m m i,m i=1 1 M 4: Return pb = L (z ) 1 . x å m ¯ M m=1 f` >xg Although it is simpler to implement and faster to run, the one-stage algorithm is less accurate than the two-stage algorithm. More precisely, the two-stage estimator never has larger variance than the one-stage estimator. To see this, ﬁrst let E denote expectation under the one-stage IS density 1S h (z, `) given in Equation (31). Then the variance of the one-stage estimator is: 1S 2 2 E [(L (Z) 1 ) ] p 1S 1 x fL xg where M denotes sample size. And if we let E denote expectation under the two-stage IS density 2S h (z, `) given in Equation (32) then the variance of the two-stage estimator is: 2S 2 2 E [(L (Z) L (Z, L ) 1 ) ] p 2S 1 2 N fL xg x In order to compare variances it sufﬁces to compare the second moments appearing above under the actual density h(z, `), and we let E denote expectation with respect to this density. To this end we note that: E [(L (Z) 1 ) ] = E[L (Z) 1 ] ¯ ¯ 1S 1 1 fL xg fL xg N N and ¯ ¯ E [(L (Z) L (Z, L ) 1 ) ] = E[L (Z) L (Z, L )] 1 ] 2S 1 2 N ¯ 1 2 N ¯ fL xg fL xg N N In light of Equation (29) we get that: L (Z, L ) 1 1 1 = 1 , (33) 2 N ¯ ¯ ¯ fL >xg fL >xg fL >xg N N N whence ¯ ¯ E [(L (Z) L (Z, L ) 1 ) ] = E[L (Z) L (Z, L ) 1 ] ¯ ¯ 2S 1 2 N 1 2 N fL xg fL xg N N E[L (Z) 1 ] 1 fL xg = E [(L (Z) 1 ) ] . 1S 1 ¯ fL xg The two-stage estimator will therefore never have larger variance than the the one-stage estimator. 4.2. Large First-Stage Weights In the examples that we consider in this paper, the systematic risk factors are Gaussian. When selecting their IS density, one could either (i) shift their means and leave their variances (and correlations) unchanged or (ii) shift their means and adjust their variances (and correlations). In general Risks 2020, 8, 25 14 of 36 the latter approach will lead to a much better approximation to the ideal density f , but could lead to an IS weight that has inﬁnite variance. By contrast, the former approach will always lead to an IS weight with ﬁnite variance, but could lead to a poor approximation of the ideal density. At ﬁrst glance it might seem absurd to consider IS densities whose weights are so unstable as to have inﬁnite variance, but we have found that adjusting the variances of the systematic risk factors can lead to more effective estimators, in terms of both statistical accuracy and run time (see Section 6.1 for more details), provided one stabilises the resulting IS weights in some way. In the remainder of this section we describe a simple stabilisation technique that leads to a computable upper bound on the associated bias (an alternative would be to stabilize unruly IS weights via truncation, as discussed in Ionides (2008)). Returning now to the general case, suppose that the ﬁrst-stage IS parameter, l , is such that the ﬁrst-stage IS weight, L (Z), has inﬁnite variance. We trim large ﬁrst-stage weights by ﬁxing a set A R such that L () is bounded over A, and discarding those simulations for which Z 2 / A. Speciﬁcally, the last line of Algorithm 3 would be altered to return the trimmed estimate: p = L (z ) L (z, ` ) 1 1 , x 1 m 2 m ¯ å fz 2 Ag f` >xg m m=1 etc. for Algorithm 4. The variance of the so-trimmed estimator is necessarily ﬁnite (recall that ¯ ¯ L (z, `) 1 if ` > x), and its bias is: ¯ ¯ E [L (Z) L (Z, L ) 1 1 ] = E[1 1 ] = E[P(L > xjZ) 1 ] , ¯ ¯ 2S 1 2 N N fL >xg fZ2 / Ag fL >xg fZ2 / Ag fZ2 / Ag N N where we have used the tower property (conditioning on Z) to obtain the last equality. Using Chernoff’s bound in Equation (11) we get that: E[P(L > xjZ) 1 ] E[exp(Nq(x, Z)) 1 ] . (34) fZ2 / Ag fZ2 / Ag As it only depends on the small number of systematic risk factors, and not the large number of idiosyncratic risk factors, the right-hand side of Equation (34) is a tractable upper bound on the bias committed by trimming large (ﬁrst-stage) IS weights. This upper bound can be used to assess whether or not the bias associated with a given set A is acceptable. 4.3. Large Rejection Constants The smaller the c ˆ, the more efﬁcient is the rejection sampling algorithm employed in the second stage. Indeed the average number of proposals that must be generated in order to obtain one realisation from g ˆ is 1/c ˆ. In the examples we consider in this paper, c ˆ is (essentially) a decreasing function m(z), such that c ˆ ! 1 as m(z) ! x and c ˆ ! ¥ as m(z) ! 0 (see Figure 1). The second-stage rejection algorithm is therefore quite efﬁcient when m(z) x and quite inefﬁcient when m(z) 0. Now, the IS density for the ﬁrst-stage risk factors is such that the distribution of m(Z) concentrates most of its mass near x (where c ˆ is a reasonable size), but it is still theoretically possible to obtain a realisation of the systematic risk factors for which m(z) is very small and c ˆ is unacceptably large (e.g., 10 ). In such situations the algorithm effectively grinds to a halt, as one endlessly generates proposed losses that have no realistic chance of being accepted. It is extremely unlikely that one does obtain such a scenario under the ﬁrst-stage IS distribution, but it is still important to protect oneself against this unlikely event. To this end we suggest ﬁxing some maximum acceptable rejection constant c , and only max applying the second stage IS to those ﬁrst-stage realizations for which m(z) < x and c ˆ c . In other max words, even if the realised values of the systematic risk factors lie outside the region of interest, we avoid applying the second stage if the associated rejection constant exceeds the predeﬁned threshold. Risks 2020, 8, 25 15 of 36 4.4. Computing q ˆ ˆ Repeated evaluations of q(x,) are necessary when computing l at the outset of the algorithm, as well as during the second stage of the two-stage algorithm. Recall that in order to compute q(x, z) “exactly” one must numerically solve the equation k (q, z) = x, which requires a non-trivial amount of CPU time. As each evaluation of q is relatively costly, repeated evaluation would, in the absence of any further approximation (over and above that inherent in numerical root-ﬁnding), account for the vast majority of the algorithm’s total run time. In order to reduce the amount of time spent evaluating q we ﬁt a low degree polynomial to the function q(x,) that can be evaluated extremely quickly, considerably reducing total run time. Speciﬁcally, suppose that we must compute q(x, z ) for each of n points z , . . . , z (either the sample n 1 n points from the pilot simulation, or the ﬁrst-stage realisations that did not land in the region of interest). We identify a small set C R that contains each of the n points, construct a mesh of m << n points in C, evaluate q exactly at each mesh point, and then ﬁt a ﬁfth degree polynomial to the ¯ ¯ ¯ resulting data. Letting q(x,) denote the resulting polynomial, we then evaluate q(x, z ), . . . , q(x, z ) 1 n ˆ ˆ instead of q(x, z ), . . . , q(x, z ). If m is substantially smaller than n, then the reduction in CPU time 1 n is considerable. 5. PD-LGD Correlation Framework All of the PD-LGD correlation models listed in the introduction are special cases of the following general framework—an observation that, to the best of our knowledge, has not been made in the literature. The systematic risk factors take the form Z = (Z , Z ), where Z and Z are bivariate D L D L normal with standard normal margins and correlation r . Idiosyncratic risk factors take the form Y = S i (Y , Y ), where Y and Y are bivariate normal with standard normal margins and correlation r . i,D i,L i,D i,L I Associated with each exposure is a default driver X and a loss driver X , deﬁned as follows: i,D i,L X = a Z + 1 a Y , (35) D D i,D i,D X = a Z + 1 a Y . (36) i,L L L i,L The factor loadings a and a are constants taking values in the unit interval, and dictate the relative D L importance of systematic risk versus idiosyncratic risk. The correlation between default drivers of 2 2 distinct exposures is r := a and the correlation between loss drivers of distinct exposures is r := a . D L D L The correlation between the default and potential loss drivers of a particular exposure is: q q 2 2 r := a a r + 1 a 1 a r , D L D L S I D L which can be positive or negative (or zero). Note that if r and r have the same sign then, since both S I factor loadings are positive, r inherits this common sign. D L The realised loss on exposure i is L = D L , where: i i i D = 1 1 fX F (P)g i,D is the default indicator associated with exposure i and L = h(X ) i i,L is called the potential loss (our terminology) associated with exposure i. Here P denotes the common default probability of all exposures and h is some function from R to [0, ` ]. It is useful (but not max necessary) to think of potential loss as L = max(0, 1C ), where C is the value of the collateral i i i pledged to exposure i expressed as a fraction of the loan’s notional value. Risks 2020, 8, 25 16 of 36 Models in this framework are characterised by (i) the correlation structure of the risk factors, speciﬁcally restrictions on the values of r and r , and (ii) the marginal distribution of potential loss. I S For instance: Frye (2000) assumes perfect systematic correlation (r = 1) and zero idiosyncratic correlation (r = 0); Pykhtin (2003) assumes perfect systematic correlation (r = 1) but allows for arbitrary idiosyncratic correlation (r unrestricted); Witzany (2011) allows for arbitrary systematic correlation (r unrestricted) but insists on zero idiosyncratic correlation (r = 0); Miu and Ozdemir (2006) allow for arbitrary systematic correlation (r unrestricted) and arbitrary idiosyncratic correlation (r unrestricted). Note that if r = 1 then the systematic risk factor is effectively one-dimensional. Indeed if r = 1 then Z = (Z, Z) from some standard Gaussian variable Z, and if r = 1 then Z = (Z,Z). S S We refer to the case jr j = 1 as the one-factor case, and the case jr j < 1 as the two-factor case. In the S S one-factor case we use Z, and not Z, to denote the systematic risk factor. The ﬁrst two models listed above are one-factor models, the last two are two-factor models. The marginal distribution of potential loss is determined by the speciﬁcation of the function h. For instance: Frye (2000) speciﬁes h(x) = max(0, 1 a(1 + bx)) for constants a 2 R and b > 0. Potential loss takes values in [0, ¥). Its density has a point mass at zero and is proportional to a Gaussian density on (0, ¥). Since L is not constrained to lie in the unit interval, this speciﬁcation violates the assumptions made in Section 2.3; a+bx Pykhtin (2003) speciﬁes h(x) = max(0, 1 e ) for constants a 2 R and b > 0. Potential loss takes values in [0, 1). Its density has a point mass at zero, and is proportional to a shifted lognormal density over (0, 1); Witzany (2011) and Miu and Ozdemir (2006) both specify h(x) = B (F(x)), where a, b > 0 and a,b B denotes the cdf of the beta distribution with parameters a and b. Potential loss takes values in a,b (0, 1). It is a continuous variable and follows a beta distribution. The sign of r and the nature of the function h (increasing or decreasing) will in general D L determine the sign of the relationship between D and L . If r > 0 then the relationship will be i i D L positive [negative] provided h is decreasing [increasing], and vice versa if r < 0. D L 5.1. Computing m(z) 2 T Here vectors z 2 R take the form z = (z , z ) . In order to obtain an expression for m(z) = D L E[L jZ = z], we begin with the observation that: E[L jZ] = E[L D jZ] = E[L E[D jX , Z]jZ] = E[L P(D = 1jX , Z)jZ] . i i i i i i,L i i i,L Thus, m(z) = h(x ) F(d, m(x , z), v) f(x , a z , 1 a ) dx , (37) L L L L L L where 1 a m(x , z) := a z + r (x a z ) L D D I L L L 1 a and 2 2 v = v(x , z) := (1 a )(1 r ) D I Risks 2020, 8, 25 17 of 36 are the conditional mean and variance of X , respectively, given that (X , Z) = (x , z). In general i,D i,L L m(z) must be evaluated using quadrature, and doing so is straightforward . On average (across parameter values and points z 2 R ) a single evaluation of m() requires approximately one millisecond. In the one-factor case with r = 1 [r = 1] the expression for m(z) = E[L jZ = z] is obtained by S S i plugging z = (z, z) [z = (z,z)] into Equation (37). 5.2. Computing k(q, z) and q(x, z) 2 T Here again, vectors z 2 R take the form z = (z , z ) . In order to derive an expression for k(q, z) D L we begin with the observation that: q L qL qL i i i e = 1(D = 0) + e 1(D > 0) = 1 + (e 1) 1(D > 0) , i i i q L and since k(q, z) = log(E[e jZ = z]), we get that: qh(x ) 2 k(q, z) = log 1 + (e 1) F(d, m(x , z), v) f(x , a z , 1 a ) dx , (38) L L L L L where m(x , z) and v are given in the previous section. In the one-factor case with r = 1 [r = 1] L S S the expression for k(q, z) = log(E[exp(q L )jZ = z]) is obtained by plugging z = (z, z) [z = (z,z)] into Equation (38). As with m(z), k(q, z) must in general be evaluated using quadrature, which is straightforward. The time required for a single evaluation of k(q,) is comparable to that required for a single evaluation of m(). In order to compute q we must solve the equation k (q, z) = x with respect to q. Differentiating Equation (38) we get: qh(x ) 2 ¶k(q, z) h(x ) e F(d, m(x , z), v) f(x , a z , 1 a ) dx L L L L L L 0 R L k (q, z) = = , (39) ¶q exp(k(q, z)) which is straightforward to compute using quadrature. A single evaluation of k (q, z) requires approximately twice as much time as a single evaluation of k(q, z). As the root of k (q, z) = x must be evaluated numerically, evaluating q is much more time consuming than evaluating k or k . Across parameter values and points z 2 R , and using q = 0 as an initial guess, the average time required for a single evaluation of q(x,) is slightly less than one tenth of one second. The right panel of Figure 1 illustrates the relationship between expected losses and the rejection ˆ ˆ constant employed in the second stage, c ˆ = exp(q k(q, z)). We see that c ˆ is essentially a decreasing ˆ ˆ function of m(z), such that c ! 1 as m(z) ! x and c ! ¥ as m(z) ! 0. The left panel of Figure 1 illustrates the graph of the LDA approximation P(L > xjZ = z) exp(Nq(x, z)). The approximation is identically equal to one inside the region of interest, and decays to zero very rapidly outside the region. In other words, most of the variability in the function q(x,) occurs along, and just outside, the boundary of the region of interest. All calculations are carried out using Matlab 2018a on a 2015 MacBook Pro with 6.8 GHz Intel Core i7 processor and 16 GB (1600 MHz) of memory. Numerical integration is performed using the built-in integral function. We use the Matlab function fzero for the root-ﬁnding. Risks 2020, 8, 25 18 of 36 LDA Approximation to Conditional Tail Probability Expected Losses and Rejection Constant 0.8 0.6 0.4 0.2 2 -2 -2.5 -3 -2 -3.5 -4 -4 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 Figure 1. The left panel of this ﬁgure illustrates the relationship between expected losses m(z) and the second-stage rejection constant c ˆ = c ˆ(x, z), in the two-factor model. The right panel illustrates the graph of the LDA approximation of Equation (13). Parameters (randomly selected using the procedure in Section 5.3) in both panels are (P, r , r , r , r , a, b, N) = (0.0063, 0.3964, 0.2794, 0.3356, 0.7599, D L I S 0.6497, 0.5033, 134) and the threshold is x = 0.1575. Mean losses are E[L ] = 0.0029, and the probability that losses exceed the threshold x is on the order of 50 basis points. Points in the left panel were obtained by generating 1000 realizations of the systematic risk factors from their actual distribution (as opposed to the ﬁrst-stage IS distribution) using the indicated parameter values. 5.3. Exploring the Parameter Space The model contains ﬁve parameters, in addition to any parameters associated with the transformation h. We are ultimately interested in how well the proposed algorithms perform across a wide range of different parameter sets. As such, in our numerical experiments we will randomly select a large number of parameter sets according to the procedure described below, and assess the algorithms’ performance for each parameter set. Generate the default probability P uniformly between 0% and 10%, and generate each of the 2 2 correlations r = a and r = a uniformly between 0% and 50%; D L D L In the one-factor model, generate r uniformly on f1, 1g, i.e., r takes on the value 1 or +1 S S with equal probability. If r = 1 we generate r uniformly between 0% and 100%, and if r = 1 S I S we generate r uniformly between100% and 0%. This allows us to control the sign of r , which I D L we must do in order to ensure a positive relationship between default and potential loss. In the two-factor model we randomly generated r uniformly on [1, 1]. If r is positive, randomly S S generate r uniformly on [0, 1], otherwise randomly generate r uniformly on [1, 0]; I I We choose the transformation h() to ensure that (i) potential loss is beta distributed and (ii) there is a positive relationship between default and loss. The paramters a and b of the beta distribution are generated independently from an exponential distribution with unit mean. If r < 0 we D L 1 1 set h(x) = B (F(x)) and if r > 0 we set h(x) = B (F(x)), where B () is the cumulative D L a,b a,b a,b distribution function for the beta distribution with parameters a and b. Note that under these restrictions, in the one-factor model the expected loss function m(z) is monotone decreasing. In order to ensure that we are considering cases of practical interest, we randomise the portfolio size and loss threshold as follows. Generate the number of exposures randomly between 10 and 5000; 1 q In the one-factor model we generate the threshold x by setting x = m(F (10 )), where q is uniformly distributed on [1, 5]. The LPA suggests that 1 q p = P(L > x) P(m(Z) > x) = P(Z < m (x)) = 10 . x N Risks 2020, 8, 25 19 of 36 This means that log( p ), the order of magnitude of the probability of interested, is approximately uniformly distributed on [5,1]. In the two-factor model we set x = m(z ), where z = q q 1 1 (F (q), r F (q)) and q is uniformly distributed on [5,1]. 6. Implementation In this section we discuss our implementation of the algorithm proposed in Section 3 in the general framework outlined in Section 5. As the general framework encompasses many of the PD-LGD correlations that have been proposed in the literature, this section effectively discusses implementation of the proposed algorithm across a wide variety of models that are used in practice. 6.1. Selecting the IS Density for the Systematic Risk Factors The systematic risk factors here are Gaussian. When constructing their IS density we could either shift their means and leave their variances (and correlations) unchanged, or shift their means and adjust their variances (and correlations). Recall that the ultimate goal is to choose an IS density that closely resembles the ideal density f given in Equation (15). As illustrated in Figure 2, the ideal density f tends to be very tightly concentrated about its mean, and adjusting the variance of the systematic risk factors leads to a much better approximation to the ideal density for “typical values” of the ideal density. The left tail of the ideal density is, however, heavier than the variance-adjusted IS density, an issue that can be resolved by trimming large IS weights. Normal Approximation to Optimal Density Normal Approximation to Optimal Density 1.8 2.5 1.6 1.4 1.2 1.5 0.8 0.6 0.4 0.5 0.2 0 0 -5 -4.5 -4 -3.5 -3 -4 -3.5 -3 -2.5 -2 Figure 2. This ﬁgure illustrates f (in fact, the approximation of Equation (40)) for two randomly generated sets of parameters. Each panel superimposes (i) a normal density with the same mean and variance as f (dashed blue line), and (ii) a normal density with the same mean as f and unit variance x x (dash-dot red line). The mean and variance of f are computed via (computationally inefﬁcient) quadrature. The mean and variance of f are computed using quadrature. Parameters in the right panel are (P, r , r , r , r , a, b, N) = (0.02, 0.33, 0.27, 0.96, 1, 2.47, 4.32, 454), and for the left panel they D L I S are (P, r , r , r , r , a, b, N) = (0.03, 0.13, 0.12, 0.85, 1, 1.81, 1.90, 271). In both cases, the transformation D L I S h is taken to be h(x) = B (F(x)). a,b The downside to adjusting the variance of the systematic risk factors is that it can lead to ﬁrst-stage IS weights with inﬁnite variance, but numerical evidence suggests that this issue can be mitigated by In the one-factor model, a tractable approximation to the ideal density can be obtained by using the LDA of Equation (13) to approximate both probabilities appearing in Equation (15). The result is: exp(Nq(x, z)) f(z) f (z) , (40) exp(Nq(x, w)) f(w) dw and the right-hand side of Equation (40) can be approximated via quadrature. As the integrand involves q, the approximation is computationally very slow. Risks 2020, 8, 25 20 of 36 trimming large weights. Indeed, numerical experiments suggest that adjusting variance and trimming large weights leads to substantially more accurate estimators of p . Intuitively, it is more important for the IS density to mimic the behaviour of the ideal density over its “typical range”, as opposed to faithfully representing its tail behaviour. In addition to improving statistical accuracy, adjusting variance has the added beneﬁt of making the second stage of the algorithm more computationally efﬁcient in terms of run time. Indeed, as discussed in more detail in Section 6.3, adjusting variance tends to increase the proportion of ﬁrst-stage simulations that land in the region of interest (thereby reducing the number of times the rejection sampling algorithm must be employed in the second stage) and reduces the average size of the rejection constants employed in the second stage (thereby making the rejection algorithm more effective whenever it must be employed). 6.2. First Stage In this section we explain how to efﬁciently approximate the parameters of the optimal IS density for the systematic risk factors, in both the one- and two-factor models. We also explain how we trim large IS weights, and demonstrate that the resulting bias is negligible. 6.2.1. Computing Parameters in the Two-Factor Model In the two-factor model the systematic risk factors are bivariate Gaussian with zero mean vector and covariance matrix: " # 1 r S = . r 1 The mean vector and covariance matrix that satisfy the criteria of Equation (25) are: m := E[ZjL > x] (41) I S N and S := E[(Z m )(Z m ) jL > x] , (42) I S I S I S N respectively. In order to approximate the suggested mean vector and covariance matrix we use Equation (27) to get: E[exp(Nq(x, Z)) Z] m (43) I S E[exp(Nq(x, Z))] and E[exp(Nq(x, Z)) (Z m )(Z m ) ] I S I S S . (44) I S E[exp(Nq(x, Z))] The expected values appearing on the right-hand sides of Equations (43) and (44) are both amenable to simulation, and we use a small pilot simulation of size M << M to approximate them. In our numerical examples, the size of the pilot simulation is 10% of the sample size that is eventually used to estimate p . Whether or not we adjust the variance of the systematic risk factor, the standard error of the resulting estimator is of the form n/ M, where n depends on the model parameters and is easily estimated via simulation. Using 100 randomly selected parameter sets from the one-factor model, selected according to the procedure described in Section 5.3, we ﬁnd that for 0.03 the one-stage estimator n /n 1.54 p , where n denotes the value of n assuming we only shift the mean of the MS VA MS systematic risk factor and do not adjust its variance and n denotes the value when we do adjust variance. For probabilities VA in the range of interest, then, adjusting the variance of the systematic risk factor leads to an estimator that is nearly four times as efﬁcient, in the sense that the sample size required to achieve a given degree of accuracy (as measured by standard error) is nearly four times larger if we do not adjust variance. As discussed in Appendix B, the natural sufﬁcient statistic here consists of the components of Z plus the components of T T T ¯ ¯ ZZ . As such, in order to satisfy Equation (27) we must ensure that E [Z] = E[ZjL > x] and E [ZZ ] = E[ZZ jL > x], I S N I S N where E denotes mean under the IS distribution. These conditions are clearly equivalent to Equations (41) and (42).. I S Risks 2020, 8, 25 21 of 36 In order to implement the approximation we must ﬁrst simulate the systematic risk factors and then compute q(x, z) for each sample point z. The most natural way to proceed is to (i) sample the systematic risk factors from their actual distribution (bivariate Gaussian with zero mean vector and covariance matrix S) and (ii) numerically solve the equation k (q, z) = x in order to compute compute q(x, z) for each pilot sample point z that lies outside the region of interest. In our experience this leads to unacceptably inefﬁcient estimators, in terms of both (i) statistical accuracy and (ii) computational time. We deal with each issue in turn. As most of the variation in q(x,) occurs just outside the boundary of the region of interest (recall the right panel of Figure 1), we suggest using an IS distribution for the pilot simulation that is centered on the boundary of the region. Speciﬁcally, we suggest using that point on the boundary at which the density of the systematic risk factors attains its maximum value (i.e., the most likely point on the boundary): T 1 z := arg minfz S z : m(z) = xg . (45) The non-linear minimisation problem appearing above is easily and rapidly solved using standard techniques. We used fmincon function in Matlab. As z lies on the boundary of the region of interest, roughly half the pilot sample will lie outside the region. In Section 5.2 we noted that it takes nearly one tenth of one second to numerically solve the equation k (q, z) = x. As such, if we are to compute q exactly (i.e., by numerically solving the indicated equation) for each sample point that lies outside the region of interest, the total time required (in seconds) to estimate the ﬁrst-stage IS parameters will be at least M /20. In our numerical examples we use a pilot sample size of M = 1000, which means that it would take nearly one full minute to compute the ﬁrst-stage IS parameters. This discussion suggests that reducing the number of times we must numerically solve the equation k (q, z) = x could lead to a dramatic reduction in computational time. We suggest ﬁtting a low degree polynomial to the function q(x,), over a small region in R that contains all of the pilot sample points that lie outside the region of interest. Speciﬁcally, we determine the smallest rectangle that contains all of the pilot sample points, and discretize the rectangle using a mesh of n points, equally spaced in each direction. Next, we identify those mesh points that lie outside the region of interest and compute q(x, z) exactly (i.e., by solving k (q, z) = x numerically) for each such point. Finally, we ﬁt a polynomial to the resulting (z, q(x, z)) pairs and call the resulting function q(x,). Numerical evidence indicates at using a ﬁfth-degree polynomial and a mesh with 15 = 225 points leads to a sufﬁciently accurate approximation to q(x,) over the indicated range (the intersection of (i) the smallest rectangle that contains all sample points and (ii) the complement of ¯ ˆ the region of interest). Note that q could be an extremely inaccurate approximation to q outside this range, but that is not a concern because we will never need to evaluate it there. It remains to compute q(x, z) for each of the pilot points z. For those points z that lie inside the region of interest, we set q(x, z) = 0. For those points that lie inside the region, we set q(x, z) = ¯ ¯ ¯ ¯ ¯ q x k(q, z), where q = q(x, z). Evaluating q(x,) requires essentially no computational time (it is a polynomial), and if the mesh size and degree are chosen appropriately the difference between q ¯ ˆ and q is very small. In total, the suggested procedure reduces the number of evaluations of q from M /2 to n /2, for a percentage reduction of n / M . In our numerical examples we use n = 15 and p g p g M = 1000, which corresponds to a reduction of 75% in computational time. To summarise, we estimate the optimal ﬁrst-stage IS parameters as follows. First, we compute z . Second, we draw a random sample of size M from the Gaussian distribution with mean vector z and p x ¯ ˆ covariance matrix S. Third, we construct q(x,), the polynomial approximation to q(x,), as described Risks 2020, 8, 25 22 of 36 in the previous paragraph. Fourth, for those sample points z that lie outside the region of interest we ¯ ˆ compute q(x, z) using q instead q. The estimates of the optimal ﬁrst-stage IS parameters are then: w(Z ) exp(Nq(x, Z ))Z m m m m=1 m ˆ = I S w(Z ) exp(Nq(x, Z )) m m m=1 and å w(Z ) exp(Nq(x, Z ))(Z m ˆ )(Z m ˆ ) m m m m I S I S m=1 S = , I S w(Z ) exp(Nq(x, Z )) m m m=1 where Z , . . . , Z is the random sample and 1 M f(z; 0, S) w(z) = f(z; z , S) is the IS weight associated with shifting the mean of the systematic risk factors from 0 to z . The upper left panel of Figure 3 illustrates a typical situation where the mean of the IS distribution lies “just inside” the region of interest. Region of Interest 4.5 3.5 2.5 1.5 0.5 -2.95 -2.9 -2.85 -2.8 -2.75 Figure 3. This ﬁgure illustrates the locations of (i) the importance sampling (IS) mean used for the pilot simulation and (ii) the IS mean used for the actual simulation, relative to the region of interest. Parameters (randomly selected using the procedure in Section 5.3) in both panels are (P, r , r , r , r , a, b, N) = (0.0063, 0.3964, 0.2794, -0.3356, -0.7599, 0.6497, 0.5033, 134) and the threshold D L I S is x = 0.1575. Mean losses are E[L ] = 0.0029. 6.2.2. Computing Parameters in the One-Factor Model The procedure described in the previous section specialises in the one-factor case as follows. First, under the parameter restrictions outlined in Section 5.3, the expected loss function m(z) is a strictly decreasing function of z. As such, the region of interest is the semi-inﬁnite interval (¥, z ), where z := m (x), and its boundary is the single point z . In general z must be computed numerically, x x x which is straightforward. Second, we draw a random sample of size M from the Gaussian distribution with mean z and unit variance. Third, the polynomial approximation to q is constructed by evaluating q exactly (i.e., by numerically solving the equation k (q, z) = x) at each of n equally-spaced points z in the interval [z , z ], where z and z are the largest and smallest values obtained in the pilot + + simulation, respectively, and then ﬁtting a polynomial to the resulting (z, q(x, z)) pairs. Fourth, we Risks 2020, 8, 25 23 of 36 evaluate q(x, z) for each pilot sample point z as follows—if z lies inside the region of interest we set q(x, z) = 0, otherwise we compute q(x, z) by replacing the exact value q(x, z) with the approximate ¯ ¯ value q(x, z), where q is the polynomial constructed in the previous step. Note that a single evaluation ¯ ˆ of q requires far less computational time than a single evaluation of q. Finally, the approximations to the ﬁrst-stage IS parameters are: w(Z ) exp(Nq(x, Z ))Z m m m m=1 m ˆ = I S w(Z ) exp(Nq(x, Z )) å m m m=1 and w(Z ) exp(Nq(x, Z ))(Z m ˆ ) m m m I S 2 m=1 s = , I S w(Z ) exp(Nq(x, Z )) å m m m=1 where Z , . . . , Z is the random sample and 1 M f(z; 0, 1) w(z) = . f(z; z , 1) is the IS weight associated with shifting the mean of the systematic risk factor from 0 to z . 6.2.3. Trimming Large Weights In the one-factor model the ﬁrst-stage IS weight will have inﬁnite variance whenever s < 0.5 I S (see Remark A1 in Appendix B). In a sample of 100 parameter sets, randomly selected according to the procedure in Section 5.3, the largest realised value of s was 0.38, and the mean and median were 0.11 I S and 0.09, respectively. It appears, then, that the ﬁrst-stage IS weight in the one-factor model will have inﬁnite variance in all cases of practical interest. We trim large weights as described in Section 4.2, using the set: A = fz 2 R : jz m ˆ j Cs g I S I S for some constant C. In the numerical examples that follow we use C = 4, in which case we expect to trim less than 0.01% of the entire sample. Specialising Equation (34) to the present context, we get that an upper bound on the associated bias is given by: exp(Nq(x, z))f(z) dz , (46) which is straightforward (albeit slow) to compute using quadrature. Figure 4 illustrates the relationship between the probability of interest p and the upper bound of Equation (46) for the 100 randomly generated parameter sets, and clearly demonstrates that the bias associated with our trimming procedure is negligible. For instance, for probabilities on the order of 10 the bias is no larger than 10 , or 1% of the quantity of interest. In the two-factor model the ﬁrst-stage IS weight will have inﬁnite variance whenever det(2S S ) < 0. In a random sample of 100 parameter sets, this condition occurred 96 times. As in the I S one-factor model, then, the ﬁrst-stage IS weight in the two-factor model can be expected to have inﬁnite variance in most cases of practical interest. We trim large weights using the set: n o 2 T 1 2 A = z 2 R : (z m ˆ ) S (z m ˆ )j C I S I S I S for some constant C, and use C = 4 in the numerical examples that follow. Risks 2020, 8, 25 24 of 36 -3 -4 -5 -6 -7 -1 -2 -3 -4 -5 10 10 10 10 10 Figure 4. This ﬁgure illustrates the bias introduced by trimming large weights (vertical axis) as a function of the probability of interest (horizontal axis), for 100 randomly generated parameter sets in the one-factor case. For each set, we compute bias (in fact, an upper bound on the bias) by using quadrature to approximate Equation (46) and estimate the probability of interest using the full two-stage algorithm. 6.3. Second Stage The ﬁrst stage of the algorithm consists of (i) computing the ﬁrst-stage IS parameters, (ii) simulating a random sample of size M from the systematic risk factors’ IS distribution, and (iii) computing the associated IS weights, trimming large weights appropriately. Having completed these tasks, the next step is to simulate individual losses in the second stage. In the remainder of this section we let z = (z , z ) denote a generic realisation of the systematic risk factors obtained in the D L ﬁrst stage. 6.3.1. Approximating q Before generating any individual losses ﬁrst construct the polynomial approximation to q, using the same procedure described in Section 6.2.1. The basic idea is to ﬁt a relatively low degree polynomial to the surface of q(x,), over a small region that contains all of the ﬁrst-stage sample points. The values of z obtained in the pilot sample are invariably different from those obtained in the ﬁrst stage, so it is ¯ ˆ essential that the polynomial is reﬁt to account for this fact. In what follows we use q to approximate q whenever the numerical value of q is required, but since the difference between the two is small we do ˆ ¯ not distinguish between the two (i.e., we write q in this document, but use q in our code). 6.3.2. Sampling Individual Losses In this section we describe how to sample individual losses in the two-factor model. The procedure carries over in an obvious way to the one-factor model, so we do not discuss that case explicitly. If z lies inside the region of interest then the second stage is straightforward. For a given exposure i, we ﬁrst simulate the exposure’s idiosyncratic risk factors Y = (Y , Y ), from the bivariate normal i i,D i,L distribution with standard normal margins and correlation r . Next, we set: q q 2 2 (X , X ) = (a z + 1 a Y , a z + 1 a Y ) . i,D i,L D D i,D L L i,L D L Risks 2020, 8, 25 25 of 36 If X > F (P) then the exposure did not default and we set L = 0 and proceed to the next exposure. i,D i Otherwise the exposure did default, in which case we must compute h(X ), set ` = h(x ) and i,L i i,L then proceed to the next exposure. Note that we only evaluate h for defaulted exposures—this is important since evaluating h requires numerical inversion of the beta cdf, which is relatively slow. Having computed the individual losses associated with each exposure, we then compute the average 1 N ¯ ¯ loss ` = N ` and set L (z, `) = 1. i 2 i=1 ˆ ˆ If z lies outside the region of interest we must compute q, k(q) and c, which we do approximately using the polynomial approximation q. We then sample from g ˆ (jz) as follows. First simulate the idiosyncratic risk factors Y = (Y , Y ) from the bivariate normal distribution with standard normal i i,D i,L margins and correlation r . Also generate a random number U, independent of Y . Then set: I i q q 2 2 (X , X ) = (a z + 1 a Y , a z + 1 a Y ) . i,D i,L D D i,D L L i,L D L ˆ ˆ If the exposure did not default we set L = 0, otherwise we compute h and set L = h(X ). Next we i i i,L check whether or not 1 g ˆ (L jz) x i ˆ ˆ U = exp(q(` L )) (47) max c ˆ g(L jz) ˆ ˆ then accept L as a drawing from g ˆ , that is, set L = L and proceed to exposure i. Otherwise, draw i x i i another random number U and set of idiosyncratic factors. Once we have sampled the individual ¯ ¯ losses associated with each exposure we compute the average loss ` = N ` and set L (z, `) = i 2 i=1 ˆ ¯ ˆ ˆ exp(N[q` k(q, z)]), using the polynomial approximation to estimate the value of q. 6.3.3. Efﬁciency of the Second Stage The frequency with which the rejection sampling algorithm must be applied in the second stage is governed by P (m(Z) > x). The left panel of Figure 5 illustrates the empirical distribution of this I S probability across 100 randomly selected parameter sets. The distribution is concentrated towards small values (the median fraction is 27%) but does have a relatively thick right tail (the mean fraction is 35%). In some cases—particularly when the value of the parameter r is close to zero, in which case individual losses are very nearly independent and systematic risk is largely irrelevant—the vast majority of ﬁrst-stage simulations require further IS in the second stage. The efﬁciency of the rejection sampling algorithm, when it must be applied, is governed by the conditional distribution of c ˆ = c ˆ(x, Z) given that m(Z) < x. For each of the 100 parameter sets we estimate E [c ˆ(x, Z)jm(Z) < x], which determines the average size of the rejection constant for a given I S set of parameters, by computing the associated value of c ˆ for each ﬁrst-stage realisation that lies outside the region of interest and then averaging the resulting values. The right panel of Figure 5 illustrates the results, and we note that the mean and median of the data presented there are 1.09 and 1.17, respectively. The ﬁgure clearly indicates that the rejection sampling algorithm can be expected to be quite efﬁcient, whenever it must be applied. The distributions of P (m(Z) < x) and E [c ˆ(x, Z)jm(Z) < x] across parameters depend heavily I S I S on whether or not we adjust the variance of the systematic risk factors in the ﬁrst stage. When we do not adjust variance, the mean and median of P (m(Z) < x) (across 100 randomly selected parameter I S sets) rise to 49% and 45% (as compared to 35% and 27% when we do adjust variance), and the mean and median of E [c ˆ(x, Z)jm(Z) < x] rise to 18.6 and 1.8, respectively (as compared to 1.17 and 1.09 I S when we do adjust variance). Remark 7. If we do not adjust the variance of the systematic risk factors in the ﬁrst stage, then (i) the rejection sampling algorithm must be applied more frequently and (ii) is less efﬁcient whenever it must be applied. As such, adjusting the variance of the systematic risk factors reduces the total time required to implement the two-stage algorithm. Risks 2020, 8, 25 26 of 36 0.45 0.8 0.4 0.7 0.35 0.6 0.3 0.5 0.25 0.4 0.2 0.3 0.15 0.2 0.1 0.1 0.05 0 0 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 Figure 5. This ﬁgure illustrates the variation of P (m(Z) < x) (left panel) and E [c ˆ(x, Z)jm(Z) < x] I S I S (right panel) across model parameters. Recall that the former quantity determines the frequency with which the second-stage rejection sampling algorithm must be applied and the latter quantity determines the efﬁciency of the algorithm when it must be applied. For each of 100 parameter sets, randomly selected according to the procedure described in Section 5.3, we compute the ﬁrst-stage IS parameters and then draw 10,000 realisations of the systematic risk factors from the variance adjusted ﬁrst-stage IS density. The intuition behind this fact is as follows. First recall that the mean of the systematic risk factors tends to lie just inside the region of interest (recall Figure 3). In such cases the effect of reducing the variance of the systematic risk factors is to concentrate the distribution of Z just inside the boundary of the region of interest. Not only will this ensure that more ﬁrst-stage realisations lie inside the region of interest (thereby reducing the fraction of points that require further IS in the second stage), it will also ensure that those realisations that lie outside the region (i.e., for which m(z) < x) do not lie “that far ” outside the region (i.e., that m(z) is not “that much less” than x), which in turn ensures that the typical size of c ˆ is relatively close to one (recall the left panel of Figure 1). 7. Performance Evaluation In this section we investigate the proposed algorithms’ performance in terms of statistical accuracy, computational time, and overall. Unless otherwise mentioned, we use a pilot sample size of M = 1000 to estimate the ﬁrst-stage IS parameters and a sample size of M = 10,000 to estimate the probability of interest ( p ). We use the value C = 4 to trim large ﬁrst-stage IS weights, and a value of c = 10 to x max trim large rejection constants. 7.1. Statistical Accuracy The standard error of any estimator that we consider is of the form n / M for some constant n that depends on the algorithm used and the model parameters. For instance, for the one-stage estimator in the two-factor case we have n = SD (L (Z) 1 ), where SD denotes standard x ¯ 1S 1 fL >xg 1S deviation under the one-stage IS density of Equation (31). Note that in the absence of IS we have 0.5 n = p (1 p ) p as p ! 0. x x x x Figure 6 illustrates the relationship between n and p using 100 randomly selected parameters x x sets, for the two-stage algorithm and in the two-factor case. Importantly, we see that (i) n seems to be a function of p (i.e., it only depends on model parameters through p ) and (ii) for small probabilities x x the functional relationship appears to be of the form n = a p for constants a and b. These features are also present in the case of the one-stage estimator, as well as for both estimators in the one-factor model. The numerical values of a and b are easily estimated using the line of best ﬁt (on the logarithmic scale), and the estimated values for both the one- and two-factor cases are summarised in Table 1. Of particular note is the fact that the value of b is extremely close to one in every case. Risks 2020, 8, 25 27 of 36 Statistical Accuracy of Two-Stage Estimator -1 -2 -3 -4 -5 -1 -2 -3 -4 -5 10 10 10 10 10 Probability of Interest Figure 6. This ﬁgure illustrates the relationship between n and p , where n is the standard deviation x x x of L (Z)L (L , Z)1 under the two-stage IS density of Equation (32), in the two-factor case. 1 2 N fL xg The numerical values of p and n are estimated for each of 100 randomly generated parameters sets, x x according to the procedure described in Section 5.3. Table 1. This table reports ﬁtted values of the relationship n a p for each estimator (one- and two-stage) and each model (one- and two-factor). Values of a and b are obtained by determining the line of best ﬁt on the logarithmic scale (i.e., the line appearing in Figure 6). Note that in the absence of 0.5 IS we would have n = p (1 p ) p . x x x One-Stage Algorithm Two-Stage Algorithm 0.98 0.99 One-Factor Model 0.91 p 0.81 p x x 0.98 0.98 Two-Factor Model 0.98 p 0.81 p x x Of particular interest in the rare event context is an estimator ’s relative error, deﬁned as the ratio of its standard error to the true value of the quantity being estimated. For any of the estimators that we b1 consider, the component of relative error that does not depend on sample size is n / p a p . In the x x absence of IS we have b 1 = 0.5, in which case relative error grows rapidly as p ! 0 (i.e., n ! 0 x x but n / p ! ¥ as p ! 0). By contrast, b 1 for any of our IS estimators, in which case there is weak x x x dependence of relative error on p . The minimum sample size required to ensure that an estimator ’s 2(b1) 2 2 2 2 relative error does not exceed the threshold e is v /( p e) a p e . In the absence of IS we x x have b 0.5, in which case the sample size (and therefore computational burden) required to achieve a given degree of accuracy increases rapidly as p ! 0. By contrast, for all of our IS estimators we have b 1, in which case the minimum sample size (and computational burden) is nearly independent of p . Our ultimate goal is to reduce the computational burden associated with estimating p , in situations where p is small. To see how effective the proposed algorithms are in this regard, note that the sample size required to achieve a given degree of accuracy using the proposed algorithm, relative to that required to achieve the same degree of accuracy in the absence of IS, is approximately 2(b1) 2 2 a p e x 2 2b1 = a p , p e 2 2b1 which does not depend on e. Since a < 1 and b > 0.5 (recall Table 1), we have that a p < p . Standard Error (Scaled by Sample Size) Risks 2020, 8, 25 28 of 36 Remark 8. The relative sample size required to achieve a given degree of accuracy using the proposed algorithm, relative to that required in the absence of IS, is not larger than the probability of interest. For example, if the probability of interest is approximately 1%, then the proposed algorithm requires a sample size that is less than 1% of what would be required in the absence of IS (regardless of the desired degree of accuracy). And if the probability of interest is 0.1%, then the proposed algorithm requires a sample size that is less than 0.1% of what would be required in the absence of IS. In other words, the proposed algorithm is extremely effective at reducing the sample size required to achieve a given degree of accuracy. It is also insightful to compare the efﬁciency of the two-stage estimator, relative to the one-stage estimator. In the one-factor case, the minimum sample size required using the two-stage algorithm, relative to that required using the one-stage algorithm, is approximately: 0.02 2 0.66 p e x 0.02 = 0.80 p . 0.04 0.83 p e As p ranges from 1% to 0.01% the estimated relative sample size ranges from 0.73 to 0.67. In the two-factor case, the relative sample size is approximately 0.69, regardless of the value of p . Remark 9. In both the one- and two-factor models, the two-stage algorithm is more efﬁcient than the one-stage algorithm, in the sense that it requires a smaller sample size in order to achieve a given degree of accuracy. Indeed, in cases of practical interest (probabilities in the range of 1% to 0.01%) the minimum sample size required to achieve a given degree of accuracy using the two-stage algorithm is roughly 70% of what would be required using the one-stage algorithm. 7.2. Computational Time Figure 7 illustrates the relationship between sample size (M) and run time (total time required to estimate p using a particular algorithm), for one randomly selected set of parameters. Across both models and algorithms, the relationship is almost perfectly linear. In the absence of IS the intercept is zero (i.e., run time is directly proportional to sample size), whereas the intercepts are non-zero for the IS algorithms. The non-zero intercepts are due to the overhead associated with (i) computing the ﬁrst-stage IS parameters, which accounts for almost all of the difference between the intercepts of the solid (no IS) and dashed (one-stage IS) intercepts, and (ii) computing the second-stage polynomial approximation to q, which accounts for almost all of the difference between the intercepts of the the dashed (one-stage IS) and dash-dot (two-stage IS) lines. It is also worth noting that a given increase in sample size will have a greater impact on the run times for the IS algorithms than it will on the standard algorithm. This is because we only calculate h(X ) for defaulted exposures (evaluating h() i,L is slow because it requires numerical inversion of the beta distribution function), and the default rate is higher under the IS distribution. Across 100 randomly generated parameter sets, portfolio size (N) is most highly correlated with run time and the relationship is roughly linear. Table 2 reports summary statistics on run times, across algorithms and models. Table 2. This table reports summary statistics—in seconds, and across 100 randomly selected parameter sets—for total run time (ﬁrst three columns), time required to estimate the ﬁrst-stage IS parameters (fourth column) and time required to ﬁt the second-stage polynomial approximation to q (ﬁnal column). Average Run Times No IS One-Stage IS Two-Stage IS m , S q IS IS One Factor 7.3 25.6 33.7 1.5 0.8 Two Factor 7.4 39.0 55.5 14.3 8.9 Risks 2020, 8, 25 29 of 36 Sample Size and Run Time (One-Factor Model) Sample Size and Run Time (Two-Factor Model) 30 50 No IS One-Stage IS No IS Two-Stage IS One-Stage IS Two-Stage IS 15 25 0 0 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 Sample Size (M) Sample Size (M) Figure 7. This ﬁgure illustrates the relationship between sample size (M) and run time (total CPU time required to estimate p by a particular algorithm), using a set of parameters randomly selected according to the procedure described in Section 5.3. For each value of M we use a pilot sample that is 10% as large as the sample that is eventually used to estimate p (i.e., we set M = 0.1M). The left panel x p corresponds to the one-factor model and parameter values are (P, r , r , r , r , a, b) = (0.0827, 0.1000, D L I 0.3629,0.0180,1, 0.6676, 0.8751) and N = 2334. The right panel corresponds to the two-factor model and parameter values are (P, r , r , r , r , a, b) = (0.0241, 0.2322, 0.0343, 0.1650, 0.4135, 0.4056, 0.4942) D L I S and N = 3278. 7.3. Overall Performance Recall that the ultimate goal of this paper is to reduce the computational burden associated with estimating p , when p is small. The computational burden associated with a particular algorithm is a x x function of both its statistical accuracy and total run time. We have seen that the proposed algorithms are substantially more accurate, but require considerably more run time. In this section we demonstrate that the beneﬁt of increased accuracy is well worth the cost of additional run time, by considering the amount of time required by a particular algorithm in order to achieve a given degree of accuracy (as measured by relative error). To begin, let t( M) denote the total run time required by a particular algorithm to estimate p using a sample of size M. As illustrated in Figure 7 we have t( M) c + d M for constants c and d that depend on the underlying model parameters (particularly portfolio size, N) as well as the algorithm being used. In Section 7.1 we saw that the minimum sample size required to ensure that the estimator ’s relative error does not exceed the threshold e isL 2(b1) 2 2 M(e) a p e , for constants a and b depending on the underlying model (one- or two-factor) and algorithm being used. Thus, if T(e) denotes the total CPU time required to ensure that the estimator ’s relative error does not exceed e, we have: 2(b1) 2 2 T(e) c + da p e . (48) Table 3 contains sample calculations for several different values of p and e, using the data appearing in the left panel of Figure 7 to estimate c and d and the values of a and b implicitly reported in Table 1. The results reported in the table are representative of those obtained using different parameter sets. It is clear that the proposed algorithms can substantially reduce the computational burden associated with accurate estimation of small probabilities. For instance, if the probability of interest is on the order of 0.1% then either of the proposed algorithms can achieve 5% accuracy within 2–3 s, as compared to 4 min (80 times longer) in the absence of IS. Run Time (Seconds) Run Time (Seconds) Risks 2020, 8, 25 30 of 36 Table 3. This table reports the time (in seconds) required to achieve a given degree of accuracy (computed using Equation (48)) for several values of p and e, for the parameter values corresponding to the left panel of Figure 7. Note that this is for the one-factor model. Values of c and d are obtained from the lines of best fit appearing in the left panel of Figure 7, values of a and b are obtained from Table 1. No IS One-Stage IS (Two-Stage IS) p p x x 1% 0.1% 0.01% 1% 0.1% 0.01% e e 10% 6 60 600 10% 1.2 (2.3) 1.2 (2.3) 1.3 (2.4) 5% 24 240 2400 5% 1.8 (2.8) 1.9 (2.9) 1.9 (2.9) 1% 600 6000 60,000 1% 20.0 (18.8) 21.8 (19.6) 23.8 (20.4) The two-stage estimator is statistically more accurate (Section 7.1) but computationally more expensive (Section 7.2) than the one-stage estimator. It is important to determine whether or not the beneﬁt of increased accuracy outweighs the cost of increased computational time. Table 3 suggests that, in some cases at least, implementing the second stage is indeed worth the effort, in the sense that it can achieve the same degree of accuracy in less time. Figure 8 illustrates the overall efﬁciency of the proposed algorithms, as a function of the desired degree of accuracy. Speciﬁcally, the left panel illustrates the ratio of (i) the total CPU time required to ensure the standard estimator ’s relative error does not exceed a given threshold to (ii) the total time required by the proposed algorithms, for a randomly selected set of parameter values in the one-factor model. The right panel illustrates the same ratio for a randomly selected set of parameters in the two-factor model. Overall Efficiency of Proposed Algorithms (One-Factor Model) Overall Efficiency of Proposed Algorithms (Two-Factor Model) 3500 400 No IS/One-Stage IS No IS/Two-Stage IS No IS/One-Stage IS No IS/Two-Stage IS 0 0 -1 -2 -3 -1 -2 -3 10 10 10 10 10 10 Desired Degree of Accuracy (Relative Error) Desired Degree of Accuracy (Relative Error) Figure 8. This ﬁgure illustrates the overall efﬁciency of the proposed algorithms. Speciﬁcally, the solid [dashed] line in the left panel illustrates the ratio of (i) the total run time (in seconds) required to ensure that the standard estimate’s relative error does not exceed a given threshold to (ii) the run time required by the one-stage [two-stage] algorithm, in the one-factor model. The right panel corresponds to the two-factor model. Parameter values are the same as in Figure 7 and Table 3. In the one-factor model, it would take hundreds of times longer to obtain an estimate of p whose relative error is less than 10%, and thousands of times longer to obtain an estimate whose relative error is less than 1%. The ﬁgure also suggests that, since it requires less run time to obtain very accurate estimates, the two-stage algorithm is preferable to the one-stage algorithm in the one-factor model. In the two-factor model—where estimating IS parameters and ﬁtting the second-stage polynomial approximation to q is more time consuming—the proposed algorithms are hundreds of times more efﬁcient than the standard algorithm. In addition, it appears that the one-stage algorithm is preferable to the two-stage algorithm in this case. Although the numerical values discussed here are speciﬁc to Relative Time Required Relative Time Required Risks 2020, 8, 25 31 of 36 the parameter set used to produce the ﬁgure, they are representative of other parameter sets. In other words, the behaviour illustrated in Figure 8 is representative of the general framework overall. 8. Concluding Remarks This paper developed an importance sampling (IS) algorithm for estimating large deviation probabilities for the loss on a portfolio of loans. In contrast to existing literature, we allowed loss given default to be stochastic and correlate with the default rate. The proposed algorithm proceeded in two stages. In the ﬁrst stage one generates systematic risk factors from an IS distribution that is designed to increase the rate at which adverse macroeconomic scenarios are generated. In the second stage one checks whether or not the simulated macro environment is sufﬁciently adverse—if it is then no further IS is applied and idiosyncratic risk factors are drawn from their actual (conditional) probability distribution, if it is not then one indirectly applies IS to the conditional distribution of the idiosyncratic risk factors. Numerical evidence indicated that the proposed algorithm could be thousands of times more efﬁcient than algorithms that did not employ any variance reduction techniques, across a wide variety of PD-LGD correlation models that are used in practice. Author Contributions: Both authors contributed equally to all parts of this paper. Both authors have read and agreed to the published version of the manuscript. Funding: This research was funded by NSERC Discovery Grant 371512. Acknowledgments: This work was made possible through the generous ﬁnancial support of the NSERC Discovery Grant program. The authors would also like to thank Agassi Iu for invaluable research assistance. Conﬂicts of Interest: The authors declare no conﬂict of interest. Appendix A. Exponential Tilts and Large Deviations Let X , X , . . . , be independent and identically distributed random variable with common density 1 2 f (x), having bounded support [x , x ], and common mean m = E[X ]. For q 2 R we let min max m(q) = E[exp(q X )] and k(q) = log(m(q)) denote the common moment generating function (mgf) 0 0 and cumulant generating function (cgf) of the X , respectively. Note that m = m (0) = k (0). Appendix A.1. Properties of k(q) Elementary properties of cgfs ensure that k () is a strictly increasing function that maps R onto (x , x ). One implication is that, for ﬁxed t 2 (x , x ), the graph of the function q 7! qt k(q) max max min min is \-shaped. The graph also passes through the origin, and its derivative at zero is t m. If this derivative is positive (i.e., if m < t) then the unique maximum is strictly positive and occurs to the right of the origin. If it is negative (i.e., if m > t) then the unique maximum is strictly positive and occurs to the left of the origin. If it is zero (i.e., if m = t) then the unique maximum of zero is attained at the origin. e e For a given t 2 (x , x ), there is a unique value of q for which k (q) = t. We let q = q(t) denote max min e e e this value of q. Note that q(t) is a strictly increasing function of t and that q(m) = 0. Thus q is positive b b e [negative] whenever t > m [t < m]. An important quantity in what follows is q = q(t) := max(0, q(t)), which can be interpreted as the unique value of q for which k (q) = max(m, t). Note that if t m then b b q = 0, and if t > m then q(t) > 0. Appendix A.2. Legendre Transform of k(q) We let q() denote the Legendre transform of k() over [0, ¥). That is, ˆ ˆ q(t) := max(qt k(q)) = qt k(q) , (A1) q0 ˆ ˆ where q = q(t) was deﬁned in the previous section, and is the (uniquely deﬁned) point at which the function q 7! qt k(q) attains its maximum on [0, ¥). Based on the discussion in the preceding Risks 2020, 8, 25 32 of 36 ˆ ˆ paragraph, we see that q(t) = q(t) = 0 whenever m t, whereas both q(t) and q(t) are strictly positive whenever m < t. The derivative of the transform q is demonstrably equal to: 0 0 0 ˆ ˆ ˆ q (t) = q(t) + q (t) [t k (q(t))] . ˆ ˆ Since q = 0 whenever t m and k (q) = t whenever t > m, the second term above vanishes for all t, and we ﬁnd that: q (t) = q(t) . (A2) Appendix A.3. Exponential Tilts For q 2 R we deﬁne: f (x) := exp(q x k(q)) f (x) . (A3) The density f is called an exponential tilt of f . As the value of the tilt parameter q varies, we obtain an exponential family of densities (exponential families have lots of very useful properties, and this is an easy way of constructing them). If q is positive then the right and left tails of f are heavier and thinner, respectively, than those of f . The opposite is true if q is negative. The larger in magnitude is q, the greater the discrepancy between f and f ; indeed the Kullback–Leibler divergence from f to f is q q qm + k(q), which is a strictly convex function of q that attains its minimum value (of zero) at q = 0. It is readily veriﬁed that k (q) = E [X ], where E denotes expectation with respect to f . This q i q q observation, in combination with the developments in Section A.1, implies that it is always possible to ﬁnd a density of the form (A3) whose mean is t, whatever the t 2 (x , x ). Indeed f is precisely min max ˜ such a density. Under mild conditions, f () can be characterised as that density that most resembles f (in the sense of minimum divergence), among all densities whose mean is x (and are absolutely continuous with respect to f ). Recall that q is the unique value of q for which k (q) = max(t, m). We can therefore interpret f as that density that most resembles f , among all densities whose mean is at least t (and that are absolutely continuous with respect to f ). Note in particular hat the mean of f is max(m, t). The numerical value of q can therefore be interpreted as the degree to which we must deform the density f , in order to produce a density whose mean is at least t. If m t then q = 0 and no adjustment is necessary. If m < t then q > 0 and mass must be transferred from the left tail to the right; the larger the discrepancy between m (the mean of f ) and t (the desired mean), the larger is q. Appendix A.4. Behaviour of X , Conditioned on a Large Deviation 1 N Let f (x) denote the conditional density of X , given that X > t, where X = X . t i N N i i=1 We suppress the dependence of f on N for simplicity. Using Bayes’ rule we get P(X > tjX = x) N i f (x) = f (x) , P(X > t) and since the X are independent, we get tx ¯ ¯ P(X > tjX = x) = P(X > t + ) . N N1 N1 Now, using the large deviation approximation P(X t) exp(N q(t)), we get that P(X > tjX = x) i tx exp((N 1)q(t + ) + Nq(t)) . N1 P(X > t) N Risks 2020, 8, 25 33 of 36 Now if N is large then tx tx tx q(t + ) q(t) + q (t) = q(t) + q , N1 N1 N1 where we have used the fact that q (t) = q(t). Putting everything together we arrive at the approximation P(X > tjX = x) N i ˆ ˆ exp(q x k(q)) , P(X > t) which leads to the approximation ˆ ˆ f (x) exp(q x k(q)) f (x) . (A4) We may thus interpret the conditional density f as that density which most resembles the unconditional density f , but whose mean is at least t. Appendix A.5. Approximate Behaviour of (X , X , . . . , X ), Conditioned on a Large Deviation 1 2 N ˆ ˆ Let f (x) = f (x , . . . , x ) denote the conditional density of (X , . . . , X ), given that X > t. Then t t 1 N 1 N N f (x ) i=1 f (x) = , x 2 A , t N,t where p = P(X > t) and A is the set of those points x 2 [x , x ] whose average value t N N,t min max exceeds t. We seek a density h(x), supported on [x , x ], which minimizes the Kullback-Leibler min max divergence (KLD) of h(x) := h(x ) Õ i i=1 ˆ ˆ from f . In other words, we seek an independent sequence Y , Y , . . . , Y (whose density is h) whose t 1 2 N behaviour most resembles (in a certain sense) the behaviour of X , X , . . . , X , conditioned on the 1 2 N large deviation X > t. ˆ ˆ Now let E denote expectation with respect to the density g. Then the divergence of h from f is g t ˆ ˆ E [log( f (X)/h(X))] = E [log ( f (X )/h(X ))] log( p ) ˆ t ˆ t å i i f f t t i=1 = N E [log ( f (X )/h(X ))] log( p ) ˆ 1 1 t = N E [log ( f (X )/h(X ))] log( p ) f 1 1 = N E [log ( f (X )/ f (X ))] + N E [log ( f (X )/h(X ))] log( p ) f 1 t 1 f t 1 1 t t t Now, the middle term in the above display is the KLD of h from f . As such it is non-negative, and is ˆ ˆ equal to zero if and only if h = f . It follows immediately that the divergence of h from f is minimised t t by setting h = f . Appendix B. Important Exponential Families This appendix considers two important special cases—the Gaussian and t families—of the general setting discussed in Section 2.2. Risks 2020, 8, 25 34 of 36 Appendix B.1. Gaussian Suppose ﬁrst that the Z is Gaussian with mean vector m 2 R and positive deﬁnite covariance matrix S . When specifying the IS distribution, one can either (i) shift the mean of Z but leaves its covariance structure unchanged or (ii) shift its mean and adjust its covariance structure. In general the latter approach will lead to a better approximation of the ideal IS density but more volatile IS weights. If we take the former approach (shifting mean, leaving covariance structure unchanged), the implicit family in which we are embedding f is the Gaussian family with arbitrary mean vector m 2 R and ﬁxed covariance matrix S . To this end, let f (z) = f(z; m , S ) denote the Gaussian density 0 0 0 with mean vector m and covariance matrix S and let f (z) = f(z; m, S ). It remains to identify the 0 0 l 0 natural sufﬁcient statistic and write the natural parameter l in terms of the mean vector m. To this end, note that f (z) 1 1 T T 1 T 1 T 1 = exp (m m )S z m S m + m S m . 0 0 f (z) 2 2 The natural sufﬁcient statistic is therefore S(z) = (z , . . . , z ) , the natural parameter is l(m) = S (m m ) . Note that we can write m(l) = m + S l, so that the natural parameter represents a sort of normalized 0 0 deviation from the actual mean m to the IS mean m. Lastly, we see that the cgf of S(Z) is h i 1 1 T 1 T 1 T T T K(l) = m S m m S m = l m + l S l , l 0 0 l 0 0 0 0 2 2 where we have written m instead of m(l) in the above display. Clearly, we have that both K(l) and K(l) are well-deﬁned for all l 2 R . The implication is that if we shift the mean of Z but leave its covariance structure unchanged, the IS weight will have ﬁnite variance regardless of what IS mean we choose. If we take the former approach (shifting mean, adjusting covariance) the implicit family in which we are embedding f is the Gaussian family with arbitrary mean vector m and arbitrary positive deﬁnite covariance matrix S. In this case we have f (z) = f(z; m, S) and the ratio of f (z) to f (z) is l l T 1 T 1 T 1 1 exp((m S m S )z + z (S S )z K(m, S)) , 0 0 0 where h i T 1 T 1 K(m, S) = m S m m S m + log(det(S) log(det(S ) . 0 0 The natural sufﬁcient statistic therefore consists of the d elements of the vector z plus the d elements of the vector zz . The natural parameter l consists of the elements of the vector 1 1 l := l (m, S) = S m S m 1 1 0 plus the elements of the matrix 1 1 l := l (S) = (S S ) . 2 2 Note that since we have assumed S is positive deﬁnite, we are implicitly assuming that the matrix l is such that the determinant of S l is strictly positive. The natural parameter space is therefore 2 0 unrestricted for l , but restricted (to matrices such that the indicated determinant is strictly positive) for l . 2 Risks 2020, 8, 25 35 of 36 The above relations can be inverted to write m and S in terms of l and l , indeed 1 2 1 1 S = S(l ) = (S 2l ) 2 2 and 1 1 1 m = m(l , l ) = (S 2l ) (l + S m ) . 1 2 2 1 0 0 0 The cgf of the natural sufﬁcient statistic is 1 1 1 1 T 1 T 1 K(l) = K(l , l ) = K(m , S ) = m S m m S m + log(det(S )) log(det(S )) 1 2 l ,l l l ,l 0 l 0 1 2 2 l ,l 1 2 0 0 2 1 2 l 2 2 2 2 It is now clear that K(l) is well deﬁned if and only if the determinant of S(l ) is strictly positive, which we have implicitly assumed to be the case since we have insisted S be positive deﬁnite. It is also clear that K(l) is well-deﬁned if and only if the determinants S(l ) is strictly positive, which will occur if and only if the determinant of (2S S ) is strictly positive. Remark A1. Suppose that f and f are Gaussian densities with respective positive deﬁnite covariance matrices S and S. Further suppose that Z f . Then the variance of f (Z)/ f (Z) is ﬁnite if and only if det(2S 0 l l S ) > 0. In the one-dimensional case d = 1 we write Z = Z. The condition in Remark A1 is satisﬁed 2 2 whenever s > s /2. In other words, if the variance of the IS distribution is too small, relative to actual variance of Z, then the IS weight will have inﬁnite variance. Appendix B.2. Chi-Square Family In preparation for the multivariate t family, we ﬁrst consider the chi-square family. Suppose that Z follows a chi-square distribution with n degrees of freedom, and that the goal is to allow Z to have arbitrary degrees of freedom n > 0 under the IS density. In order to identify the natural sufﬁcient statistic S(z) and natural parameter l = l(n), we let f (z) denote the chi-square density with n degrees of freedom and f (z) the chi-square density with n degrees of freedom. Then 0 l f (z) n n n n n n l 0 0 0 = exp log(z) log(2) + log G log G f (z) 2 2 2 2 from which we see that S(z) = log(z) and l = l(n) = (n n )/2. In addition we see that the cgf of S(z) is K(l) = l log(2) + log (G (l + n /2)) log(G(n /2)) . 0 0 In order that K(l) be will deﬁned, we require n > 0, which is obvious. In order that K(l) is well-deﬁned we require l + be positive, which in turn requires n < 2n . In other words, if the IS degrees of freedom are more than twice the actual degrees of freedom, then the IS weight will have inﬁnite variance. Appendix B.3. t Family The t family is not a regular exponential family, so it does not ﬁt directly into the framework discussed in Section 2.2. That being said, a multivariate t vector can be constructed from a Gaussian vector and an independent chi-square variable. Indeed if Z is Gaussian with mean zero and covariance matrix S , and R is chi-square with n degrees of freedom (independent of Z), then 0 0 Z = m + Z , (A5) is multivariate t with n degrees of freedom, mean m and covariance matrix S . 0 0 0 n 2 0 Risks 2020, 8, 25 36 of 36 In the case that Z is multivariate t, then, we can take our systematic risk factors to be the components of (Z, R). In this case the joint density of the systematic risk factors can be embedded into the parametric family T T f (z ˆ , r) := exp(l S(z ˆ) K(l)) exp(h T(r) L(h)) f (z ˆ) g(r) , (A6) l,h where l is and S are the natural parameter and sufﬁcient statistic for the Gaussian family, h and L are those for the chi-square family, and f and g are the Gaussian and chi-square densities. References Bickel, Peter J., and Kjell A. Doksum. 2001. Mathematical Statistics: Basic Ideas and Selected Topics, 2nd ed. Upper Saddle River: Prentice Hall, Volume 1. Chan, Joshua C.C., and Dirk P. Kroese. 2010. Efﬁcient estimation of large portfolio loss probabilities in t-copula models. European Journal of Operational Research 205: 361–67. Chatterjee, Sourav, and Persi Diaconis. 2018. The sample size required in importance sampling. Annals of Applied Probability 28: 1099–135. [CrossRef] de Wit, Tim. 2016. Collateral Damage—Creating a Credit Loss Model Incorporating a Dependency between Defaults and LGDs. Master ’s thesis, University of Twente, Enschede, The Netherlands. Deng, Shaojie, Kay Giesecke, and Tze Leung Lai. 2012. Sequential importance sampling and resampling for dynamic portfolio credit risk. Operations Research 60: 78–91. [CrossRef] Eckert, Johanna, Kevin Jakob, and Matthias Fischer. 2016. A credit portfolio framework under dependent risk parameters PD, LGD and EAD. Journal of Credit Risk 12: 97–119. [CrossRef] Frye, Jon. 2000. Collateral damage. Risk 13: 91-94. Frye, Jon, and Michael Jacobs Jr. 2012. Credit loss and systematic loss given default. Journal of Credit Risk 8: 109–140. [CrossRef] Glasserman, Paul, and Jingyi Li. 2005. Importance sampling for portfolio credit risk. Management Science 51: 1643–56. [CrossRef] Ionides, Edward L. 2008. Truncated importance sampling. Journal of Computational and Graphical Statistics 17: 295–311. [CrossRef] Jeon, Jong-June, Sunggon Kim, and Yonghee Lee. 2017. Portfolio credit risk model with extremal dependence of defaults and random recovery. Journal of Credit Risk 13: 1–31. [CrossRef] Kupiec, Paul H. 2008. A generalized single common factor model of portfolio credit risk. Journal of Derivatives 15: 25–40. [CrossRef] Miu, Peter, and Bogie Ozdemir. 2006. Basel requirements of downturn loss given default: Modeling and estimating probability of default and loss given default correlations. Journal of Credit Risk 2: 43–68. [CrossRef] Pykhtin, Michael. 2003. Unexpected recovery risk. Risk 16: 74–78. Scott, Alexandre, and Adam Metzler. 2015. A general importance sampling algorithm for estimating portfolio loss probabilities in linear factor models. Insurance: Mathematics and Economics 64: 279–93. Sen, Rahul. 2008. A multi-state Vasicek model for correlated default rate and loss severity. Risk 21: 94–100. Witzany, Jirí. ˇ 2011. A Two-Factor Model for PD and LGD Correlation. Working Paper. Available online: http://dx.doi.org/10.2139/ssrn.1476305 (accessed on 9 March 2020). c 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Risks Multidisciplinary Digital Publishing Institute http://www.deepdyve.com/lp/multidisciplinary-digital-publishing-institute/importance-sampling-in-the-presence-of-pd-lgd-correlation-ottgXrJKOc

Loading next page...

References (19)

A. Scott, A. Metzler (2015)
A General Importance Sampling Algorithm for Estimating Portfolio Loss Probabilities in Linear Factor Models
ERN: Other Econometrics: Econometric & Statistical Methods - Special Topics (Topic)
Paul Kupiec (2008)
A Generalized Single Common Factor Model of Portfolio Credit Risk
, 15
Joshua Chan, Dirk Kroese (2010)
Efficient estimation of large portfolio loss probabilities in t-copula models
Eur. J. Oper. Res., 205
Shaojie Deng, K. Giesecke, T. Lai (2010)
Sequential Importance Sampling and Resampling for Dynamic Portfolio Credit Risk
OPER: Computational Techniques (Topic)
(2003)
Unexpected recovery risk
(2008)
A multi-state Vasicek model for correlated default rate and loss severity
S. Chatterjee, P. Diaconis (2015)
The sample size required in importance sampling
arXiv: Probability
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license
Peter Miu, Bogie Ozdemir (2006)
Basel requirements of downturn loss given default: modeling and estimating probability of default and loss given default correlations
Journal of Credit Risk, 2
Johanna Eckert, K. Jakob, M. Fischer (2016)
A Credit Portfolio Framework Under Dependent Risk Parameters: Probability of Default, Loss Given Default and Exposure at Default
Journal of Credit Risk
P. Glasserman, Jingyi Li (2005)
Importance Sampling for Portfolio Credit Risk
Manag. Sci., 51
Johanna Eckert, K. Jakob, M. Fischer (2015)
A Credit Portfolio Framework under Dependent Risk Parameters PD , LGD and EAD
Jon Frye, Michael Jacobs (2012)
Credit loss and systematic loss given default
Journal of Credit Risk, 8
E. Ionides (2008)
Truncated Importance Sampling
Journal of Computational and Graphical Statistics, 17
D. Hunter (2006)
Basel requirements of downturn loss given default: modeling and estimating probability of default and loss given default correlations
R. Hancock (2014)
Collateral damage
Nature Biotechnology, 32
Jong-June Jeon, Sunggon Kim, Yonghee Lee (2017)
Portfolio Credit Risk Model with Extremal Dependence of Defaults and Random Recovery
Journal of Credit Risk
T. Wit (2016)
Collateral damage : creating a credit loss model incorporating a dependency between PD and LGD
J. Witzany (2011)
A Two-Factor Model for PD and LGD Correlation
European Finance

Publisher: Multidisciplinary Digital Publishing Institute
Copyright: © 1996-2020 MDPI (Basel, Switzerland) unless otherwise stated Disclaimer The statements, opinions and data contained in the journals are solely those of the individual authors and contributors and not of the publisher and the editor(s). Terms and Conditions Privacy Policy
ISSN: 2227-9091
DOI: 10.3390/risks8010025
Publisher site: See Article on Publisher Site

Abstract

risks Article Importance Sampling in the Presence of PD-LGD Correlation 1, 2 Adam Metzler * and Alexandre Scott Department of Mathematics, Wilfrid Laurier University, Waterloo, ON N2L 3C5, Canada Department of Applied Mathematics, University of Western Ontario, London, ON N6A 3K7, Canada; alexandre.scott202@gmail.com * Correspondence: ametzler@wlu.ca Received:20 January 2020; Accepted: 5 March 2020; Published: 10 March 2020 Abstract: This paper seeks to identify computationally efﬁcient importance sampling (IS) algorithms for estimating large deviation probabilities for the loss on a portfolio of loans. Related literature typically assumes that realised losses on defaulted loans can be predicted with certainty, i.e., that loss given default (LGD) is non-random. In practice, however, LGD is impossible to predict and tends to be positively correlated with the default rate and the latter phenomenon is typically referred to as PD-LGD correlation (here PD refers to probability of default, which is often used synonymously with default rate). There is a large literature on modelling stochastic LGD and PD-LGD correlation, but there is a dearth of literature on using importance sampling to estimate large deviation probabilities in those models. Numerical evidence indicates that the proposed algorithms are extremely effective at reducing the computational burden associated with obtaining accurate estimates of large deviation probabilities across a wide variety of PD-LGD correlation models that have been proposed in the literature. Keywords: importance sampling; acceptance-rejection sampling; portfolio credit risk; tail probabilities; large deviation probabilities; stochastic recovery; PD-LGD correlation; credit risk; loss probabilities 1. Introduction This paper seeks to identify computationally efﬁcient importance sampling (IS) algorithms for estimating large deviation probabilities for the loss on a portfolio of loans. Related literature assumes that realised losses on defaulted loans can be predicted with certainty, i.e., that loss given default (LGD) is non-random. In practice, however, LGD is impossible to predict and tends to be positively correlated with the default rate and the latter phenomenon is typically referred to as PD-LGD correlation (here PD refers to probability of default, which is often used synonymously with default rate). There is a large literature on modelling stochastic LGD and PD-LGD correlation, but there is a paucity of literature on using importance sampling to estimate large deviation probabilities in those models. This gap in the literature was brought to our attention by a risk management professional at a large Canadian ﬁnancial institution, and ﬁlling that gap is the ultimate goal of this paper. Problem Formulation and Related Literature Consider a portfolio of N exposures of equal size. Let L , L , . . . , L denote the losses on the 1 2 N individual loans, expressed as a percentage of notional value. The percentage loss on the entire portfolio is: L := L . (1) N å i i=1 Risks 2020, 8, 25; doi:10.3390/risks8010025 www.mdpi.com/journal/risks Risks 2020, 8, 25 2 of 36 We are interested in using IS to estimate large deviation probabilities of the form: p := P(L x) , (2) x N where x >> E[L ] = E[L ] is some large, user-deﬁned, threshold. i N In practice the number of exposures is large (e.g., in the thousands) and prudent risk management requires one to assume that the individual losses are correlated. In practice, then, L is the average of a large number of correlated variables. As such, its probability distribution is highly intractable and Monte Carlo is the method of choice for approximating p . As the probability of interest is typically 3 4 small (e.g., on the order of 10 or 10 ), the computational burden required to obtain an accurate estimate of p using Monte Carlo can be prohibitive. For instance if p is on the order of 10 and x x N is on the order of 1000 then, in the absence of any variance reduction techniques, the sample size required to reduce the estimator ’s relative error to 10% is on the order of one hundred thousand. Since each realisation of L requires simulation of one thousand individual losses, a sample size of 100, 000 requires one to generate one hundred million variables. If the desired degree of accuracy is reduced to 1%, the number of variables that must be generated increases to a staggering 10 billion. Importance sampling (IS) is a variance reduction technique that has the potential to signiﬁcantly reduce the computational burden associated with obtaining accurate estimates of large deviation probabilities. In the present context, effective IS algorithms have been identiﬁed for a variety of popular risk management models, but most are limited to the special case that loss given default (LGD) is non-random. The seminal paper in the area is (Glasserman and Li 2005), other papers include (Chan and Kroese 2010) and (Scott and Metzler 2015). It is well documented empirically, however, that portfolio-level LGD is not only stochastic, but positively correlated with the portfolio-level default rate as seen, for instance, in any of the studies listed in (Kupiec 2008) or (Frye and Jacobs 2012). This phenomenon is typically referred to as PD-LGD correlation. (Miu and Ozdemir 2006) show that ignoring PD-LGD correlation when it is in fact present can lead to material underestimates of portfolio risk measures. There is a large literature on modelling PD-LGD correlation (Frye 2000); (Pykhtin 2003); (Miu and Ozdemir 2006); (Kupiec 2008); (Sen 2008); (Witzany 2011); (de Wit 2016); (Eckert et al. 2016); and others listed in (Frye and Jacobs 2012), but there is a much smaller literature on using IS to estimate large deviation probabilities in such models. To the best of our knowledge only (Deng et al. 2012) and (Jeon et al. 2017) have developed algorithms that allow for PD-LGD correlation (the former paper considers a dynamic intensity-based framework, the latter considers a static model with asymmetric and heavy-tailed risk factors). The present paper contributes to this nascent literature by developing algorithms that can be applied in a wide variety of PD-LGD correlation models that have been proposed in the literature, and are popular in practice. The paper is structured as follows. Section 2 outlines important assumptions, notation, and terminology. Section 3 theoretically motivates the proposed algorithm in a general setting, and Section 4 discusses a few practical issues that arise when implementing the algorithm. Section 5 describes a general framework for PD-LGD correlation modelling that includes, as special cases, many of the models that have been developed in the literature and Section 6 describes how to implement the proposed algorithm in this general framework. Numerical results are presented and discussed in Section 7 and demonstrate that the proposed algorithms are extremely effective at reducing the computational burden required to obtain an accurate estimate of p . Relative error is the preferred measure of accuracy for large deviation probabilities. If pb is an estimator of p , its relative x x error is deﬁned as SD( pb )/ p , where SD denotes standard deviation. x x Risks 2020, 8, 25 3 of 36 2. Assumptions, Notation and Terminology We assume that individual losses are of the form L = L(Z, Y ), where L is some deterministic i i function, Z = (Z , . . . , Z ) is a d-dimensional vector of systematic risk factors that affect all exposures, and Y is a vector of idiosyncratic risk factors that only affect exposure i. We assume that Z, Y , Y , . . . i 1 2 are independent, and that the Y are identically distributed. The primary role of the systematic risk factors is to induce correlation among the individual exposures, and it is common to interpret the realised values of the systematic risk factors as determining the overall macroeconomic environment. It is worth noting that the we do not require the components of Z to be independent of one another, etc. for the components of Y . 2.1. Large Portfolios and the Region of Interest In a large portfolio, the inﬂuence of the idiosyncratic risk factors is negligible. Indeed, since individual losses are conditionally independent, given the realised values of the systematic risk factors, we have the almost sure limit: lim L = m(Z) , (3) N!¥ where m(z) := E[L jZ = z] = E[L jZ = z] . (4) i N Since m(Z) L for large N by Equation (3), the random variable m(Z) is often called the large portfolio approximation (LPA) to L . The LPA is often used to formalise the intuitive notion that, in a large portfolio, all risk is systematic (i.e., idiosyncratic is “diversiﬁed away”). We deﬁne the region of interest as the set: fz 2 R : m(z) xg . (5) The region of interest is “responsible” for large deviations in the sense that: lim P(m(Z) xjL x) = 1 (6) N!¥ for most values of x. Together, Equations (3) and (6) suggest that for large portfolios, it is relatively more important to identify an effective IS distribution for the systematic risk factors, as compared to the idiosyncratic risk factors. 2.2. Systematic Risk Factors We assume that Z is continuous and let f (z) denote its joint density. We assume that f is a member of an exponential family (see Bickel and Doksum 2001 for deﬁnitions and important properties) with d p natural sufﬁcient statistic S : R 7! R . Any other member of the family can be put in the form: f (z) := exp(l S(z) K(l)) f (z) , (7) where K() is the cumulant generating function (cgf) of S(Z) and l 2 R is such that K(l) is well-deﬁned. The parameter l is called the natural parameter of the family in Equation (7). Appendix B embeds the Gaussian and multivariate t families into this general framework. In light of the almost sure limit in Equation (3), we have that L converges to m(Z) in distribution, which implies that Equation (6) is valid for all values of x such that P(m(Z) = x) = 0. If m(Z) is a continuous random variable (which it is in most cases of practical interest) then Equation (6) is satisﬁed for every value of x. Risks 2020, 8, 25 4 of 36 We will eventually be using densities of the form in Equation (7) as IS densities for the systematic risk factors. The associated IS weight is: f (Z) = exp(l S(Z) + K(l)) , (8) f (Z) and it will be important to know when the variance of the IS weight is ﬁnite. The following observation is readily veriﬁed. Remark 1. If Z f , then Equation (8) has ﬁnite variance if and only if both K(l) and K(l) are well deﬁned. A standard result in the theory of exponential families is that: rK(l) = E [S(Z)] , (9) where r denotes gradient and E denotes expectation with respect to the density f . l l 2.3. Individual Losses We assume that L takes values in the unit interval. In general L will have a point mass at zero i i (if it did not, the loan would not be prudent) and the conditional distribution of L , given that L > 0, i i is called the (account-level) LGD distribution. We allow the LGD distribution to be arbitrary in the sense that it could be either discrete or continuous, or a mixture of both. This contrasts with the case of non-random LGD, where the LGD distribution is degenerate at a single point. We let ` 2 (0, 1] max denote the supremum of the support of L . Individual losses will therefore never exceed ` but could max take on values arbitrarily close (and possibly equal) to ` . max Remark 2. Despite the fact that L is not a continuous variable, in what follows we will proceed as if it was and make repeated reference to its “density." This is done without loss of generality, and in an interest of simplifying the presentation and discussion. Nothing in the sequel requires L to be a continuous variable, and everything carries over to the case where it is either discrete or continuous, or has both a discrete and continuous component. For z 2 R we let g(`jz) denote the conditional density of L , given that Z = z. We assume that the support of g(jz) is identical to the unconditional support, in particular it does not depend on the value of z. Note that m(z) is the mean of g(jz). In practice (i.e., for all of the PD-LGD correlation models listed in the introduction) g(jz) is not a member of an established parametric family, and direct simulation from g(jz) using a standard technique such as inverse transform or rejection sampling is not straightforward. Simulation from g(jz) is most easily accomplished by simulating the idiosyncratic risk factors, Y , from their density, say h(y), and then setting L = L(z, Y ). In other words, in order to simulate from g(jz) we make use i i of the fact that L = L(z, Y ) is a drawing from g(jz) whenever Y is a drawing from h(). i i i For q 2 R and z 2 R we let: k(q, z) := log(E[exp(q L )jZ = z]) and ¶k k (q, z) := (q, z) . ¶q Then k(, z) is the conditional cgf of L , given that Z = z, and k (, z) is its ﬁrst derivative. In practice, neither k(, z) nor k (, z) is available in closed form. In the examples we consider later in the paper each can be expressed as a one-dimensional integral, but the numerical values of those integrals must Risks 2020, 8, 25 5 of 36 be approximated using quadrature. This contrasts with the case of non-random LGD, where the conditional cgf can be computed in closed form . d 0 For x 2 (0, ` ) and z 2 R we let q(x, z) denote the unique solution to the equation k (q, z) = max ˆ ˆ ˆ max(x, m(z)). We often suppress dependence on x and z, and simply write q instead of q(x, z). That q is well-deﬁned follows immediately from the developments in Appendix A.1. Based on the discussion there we ﬁnd that q is zero whenever z lies in the region of interest, and is strictly positive otherwise. Remark 3. In practice, the value of q cannot be computed in closed form and must be approximated using a numerical root-ﬁnding algorithm. Since each evaluation of the function k (, z) requires quadrature, computing ˆ ˆ q is straightforward but relatively time consuming. This contrasts with the case of non-random LGD, where q can be computed in closed form at essentially no cost. For z 2 R we let q(, z) denote the Legendre transform of k(, z) over [0, ¥). That is, ˆ ˆ q(x, z) := max(q x k(q, z)) = q x k(q, z) . (10) q0 That q is the uniquely deﬁned point at which the function q 7! q x k(q, z) attains its maximum on [0, ¥) follows from the developments in Appendix A.2. Based on the discussion there, we ﬁnd that both q and q are equal to zero whenever z lies in the region of interest, and that both are strictly positive otherwise. 2.4. Conditional Tail Probabilities Given the realised values of the systematic risk factors, individual losses are independent. Large deviations theory can therefore provide useful insights into the large-N behaviour of the tail probability P(L xjZ = z). For instance, Chernoff’s bound yields the estimate: P(L > xjZ = z) exp(Nq(x, z)) , (11) and Cramér ’s (large deviation) theorem yields the limit: log(P(L > xjZ = z)) lim = q(x, z) . (12) N!¥ N Together these results are often used to justify the approximation: P(L > xjZ = z) exp(Nq(x, z)) , (13) which will be used repeatedly throughout the paper. The approximation in Equation (13) is often called the large deviation approximation (LDA) to the tail probability P(L > xjZ = z). Note that since q(x, z) = 0 whenever m(z) x, the LDA suggests that P(L > xjZ = z) 1 whenever z lies in the region of interest. 2.5. Conditional Densities N d Let L = (L , . . . , L ), noting that L takes values in [0, ` ] . For z 2 R and ` = (` , . . . , ` ) 2 max 1 N 1 N [0, ` ] , we let h (z, `) denote the conditional density of (Z, L), given that L > x. Then h is max x N x given by: f (z) g(` jz) i=1 h (z, `) = 1 , (14) f`2 A g N,x 3 (1R)q In the case of non-random LGD we have k(q, z) = log(1 + (e 1) P(L > 0jZ = z)), where R is the known recovery rate on the exposure. Risks 2020, 8, 25 6 of 36 N 1 where A is the set of points ` 2 [0, ` ] for which N ` > x. N,x max i i=1 We let f (z) denote the conditional density of the systematic risk factors, given that L > x, noting that: P(L > xjZ = z) f (z) = f (z) . (15) P(L x) In the examples we consider the mean of f tends to lie inside, but close to the boundary of, the region of interest. And relative to the unconditional density f , the conditional density f tends to be much more concentrated about its mean. Finally, we let g (`jz) denote the conditional density of an individual loss, given that Z = z and L > x, noting that: x` P(L > x + jZ = z) N1 N1 g (`jz) = g(`jz) . (16) P(L > xjZ = z) If the realised value of z lies inside the region of interest, the conditional density g (jz) tends to resemble the unconditional density g(jz). Intuitively, for such values of z the LDA informs that the event fL > xg is very likely, and conditioning on its occurrence is not overly informative. If the realised value of z does not lie in the region of interest then g (jz) tends to resemble the exponentially tilted version of g(jz) whose mean is exactly x. See Appendix A.3 for more details. Neither h , f , nor g are numerically tractable, but as we will soon see they do serve as useful x x x benchmarks against which to compare candidate IS densities. In addition, it is worth noting here that the representations of Equations (15) and (16) lend themselves to numerical approximation via the LDA in Equation (13). 3. Proposed Algorithm In practice, the most common approach to estimating p via Monte Carlo simulation in this framework is summarised in Algorithm 1 below. Algorithm 1 Standard Monte Carlo Algorithm for Estimating p 1: Simulate M i.i.d. copies of the systematic risk factors. Think of these as different economic scenarios and denote the simulated values by z , . . . , z . 1 M 2: For each scenario m: (a) Simulate the idiosyncratic risk factors for each exposure. Denote the simulated values y , . . . , y . 1,m N,m 1 N (b) Set ` = L(z , y ) for each exposures i, and ` = ` . i,m m i,m m i,m i=1 1 M 3: Return pb = 1 . å ¯ M m=1 f` >xg Algorithm 1 consists of two stages. In the ﬁrst stage one simulates the systematic risk factors, and in the second stage one simulates the idiosyncratic risk factors for each exposure. Mathematically, the ﬁrst stage induces independence among the individual exposures, so that the second stage amounts to simulating a large number of i.i.d. variables. Intuitively, it is useful to think of the ﬁrst stage as determining the prevailing macroeconomic environment, which ﬁxes economy-wide quantities such as default and loss-given-default rates. The second stage of the algorithm overlays idiosyncratic noise on top of economy-wide rates, to arrive at the default and loss-given-default rates for a particular portfolio. Relative error is the preferred measure of accuracy for estimators of rare event probabilities. The relative error of the estimator pb in Algorithm 1 is: 1 1 p p , M x Risks 2020, 8, 25 7 of 36 and the sample size required to ensure the relative error does not exceed some predetermined threshold e is: 1 1 p M(e) = . (17) e p The number of variables that must be generated in order to achieve the desired degree of accuracy e is therefore (N + d) M(e), which grows without bound as p ! 0. For instance if p = 10 , x x 3 2 N = 10 , d = 2, and e = 5 10 then the number of variables that must be generated is approximately four hundred million, which is an enormous computational burden for a modest degree of accuracy. In the next section we discuss general principles for selecting an IS algorithm that can reduce the computational burden required to obtain an accurate estimate of p . 3.1. General Principles For practical reasons, we insist that our IS procedure retains conditional independence of individual losses, given the realised value of the systematic risk factors. This is important because it allows us to reduce the problem of simulating a large number of dependent variables to the (much) more computationally efﬁcient problem of simulating a large number of independent variables. In the ﬁrst stage we simulate the systematic risk factors from the IS density f (z). The IS weight I S associated with this ﬁrst stage is therefore: f (z) L (z) := . f (z) I S In the second stage we simulate the individual losses as i.i.d. drawings from the density g (`jz). The I S IS weight associated with this second stage is: g(` jz) L (z, `) = , 2 Õ g (` jz) I S i i=1 and the IS density from which we sample (Z, L) is therefore of the form: h (z, `) = f (z) g (` jz) . (18) I S I S Õ I S i i=1 The so-described algorithm, with as-yet unspeciﬁed IS densities, is summarised in Algorithm 2. Algorithm 2 IS Algorithm for Estimating p 1: Simulate M i.i.d. copies of the systematic risk factors from the density f (z). Think of these as I S different economic scenarios and denote the simulated values by z , . . . , z . 1 M 2: For each scenario m: (a) Independently simulate ` , ` , . . . ,` from the density g (jz ). 2,m N,m m 1,m I S 1 N (b) Set ` = ` . m å i,m N i=1 3: Return pb = L (z ) L (z , ` ) 1 , where ` = (` , . . . ,` ). å ¯ x 1 m 2 m m m 1,m N,m m=1 f` >xg It is important to note that in the second stage, we will not be simulating individual losses directly from the (conditional) IS density g . Rather, we will simulate the idiosyncratic risk factors Y in such I S i a way as to ensure that for a given value of z, the variable L = L(z, Y ) has the desired density g . i i I S Risks 2020, 8, 25 8 of 36 Focusing on the “indirect" IS density of L , as opposed to “direct" IS density of Y , allows us to identify i i a much more effective second stage algorithm . The estimator pb produced by Algorithm 2 is demonstrably unbiased and its variance is: 2 2 2 E [(L(Z, L) 1 p ) ] = p E [(L (Z, L) 1 1) ] , (19) ¯ ¯ I S x I S x fL >xg x fL >xg N N where E denotes expectation under the IS distribution, L(z, `) := L (z) L (z, `) and I S 1 2 L(z, `) L (z, `) := . Note that L is the ratio of (i) the IS density in Equation (18) to (ii) the conditional density in Equation (14). The estimator ’s squared relative error can then be decomposed as: E [(L (Z, L) 1) 1 ] + [1 P (L > x)] , (20) x ¯ I S I S N fL >xg where P denotes probability under the IS distribution. I S Inspecting Equation (20) we see that an effective IS density should (i) assign a high probability to the event of interest and (ii) should resemble the conditional density in Equation (14) as closely as possible, in the sense that the ratio L should deviate as little as possible from unity. Clearly, an estimator that satisﬁes (ii) should also satisfy (i), since h assigns probability one to the event that L > x. The task now is to identify a density of the form in Equation (18) that resembles the ideal density in Equation (14), in some sense. 3.2. Identifying the Ideal IS Densities Our measure of similarity is Kullback–Leibler divergence (KLD), or divergence for short. See Chatterjee and Diaconis (2018) for a general discussion of the merits of minimum divergence as a criteria for identifying effective IS distributions. We begin by writing: h (z, `) f (z) g ˜ (`jz) x x x = , (21) h (z, `) f (z) g ˜ (`jz) I S I S I S where for ﬁxed z, g(` jz) i=1 g ˜ (`jz) = 1 f`2 A g N,x P(L > xjZ = z) is the joint density of N independent variables having marginal density g(jz), conditioned on their average value exceeding the threshold x, and g ˜ (`jz) = g (` jz) . I S Õ I S i i=1 is the joint density of N independent variables having marginal density g (jz). I S Using Equation (21) it is straightforward to decompose the divergence of h from h as: I S x D(h jjh ) = D( f jj f ) + E [ D(g ˜ (jZ)jjg ˜ (jZ))j L > x] , (22) x x x I S I S I S N where D(xjjh) denotes the divergence of the density x from the density h. The ﬁrst term in Equation (22) is the divergence of f from f , and is therefore minimised by setting f = f . In other words, the best I S x I S x In the earliest stages of this project we focused directly on an IS density for Y and had difﬁculties identifying effective candidates. Risks 2020, 8, 25 9 of 36 possible IS density for the systematic risk factors (according to the criteria of minimum divergence) is the conditional density f . The second term in Equation (22) is the average divergence of g ˜ (jz) I S from g ˜ (jz), averaged over all possible realisations of the systematic risk factors and conditioned on portfolio loss exceeding the threshold. Based on the developments in Appendix A.5, for ﬁxed z 2 R the divergence of g ˜ (jz) from g ˜ (jz) is minimised by setting g (jz) = g (jz). The average I S x I S x divergence in Equation (22) is, therefore, also minimised by setting g (jz) = g (jz) for every z 2 R . I S Remark 4. Among all densities of the form in Equation (18), the one that most resembles the ideal density h (in the sense of minimum divergence) is the density: d N h (z, `) := f (z) g (` jz) , z 2 R , ` 2 [0, ` ] . x x x max Õ i i=1 In other words, h is the best possible IS density (among the class Equation (18) and according to the criteria of minimum divergence) from which to simulate (Z, L). It is worth noting that the IS density h “gets marginal behaviour correct”, in the sense that the marginal distribution of the systematic risk factors, as well as the marginal distribution of an individual loss, is the same under h as it is under the ideal density h . The dependence structure of individual x x losses is different under h and h —this is the price that we must pay for insisting on conditional x x independence (i.e., computational efﬁciency). 3.3. Approximating the Ideal IS Densities Simulating directly from h requires an ability to simulate directly from f and g . Unfortunately, x x x neither f nor g is numerically tractable (witness the unknown quantities in Equations (15) and (16)), x x and it does not appear that either is amenable to direct simulation. Our next task is to identify tractable densities that resemble f and g . x x 3.3.1. Systematic Risk Factors As a tractable approximation to f , we suggest using that member of the parametric family in Equation (7) that most resembles f in the sense of minimum divergence. Using Equations (7) and (15) we get that: f (z) log = l S(z) + K(l) + log (P(L > xjZ = z)) log( p ) , f (z) whence the divergence of f from f is: l x ¯ ¯ ¯ D( f jj f ) = l E[S(Z)jL > x] + K(l) + E[log (P(L > xjZ = z))jL > x] log( p ) . (23) x x l N N N As a cgf, K() is strictly convex. As such, Equation (23) attains its unique minimum at that value of l such that: rK(l) = E[S(Z)jL > x] , (24) which, in light of Equation (9), is equivalent to: E [S(Z)] = E[S(Z)jL > x] . (25) Intuitively, we suggest using that value of the IS parameter l for which the mean of S(Z) under the IS density matches the conditional mean of S(Z), given that portfolio losses exceed the threshold. In what follows we let l denote that suggested value of the IS parameter l, i.e., that value of l that solves Equation (24). Risks 2020, 8, 25 10 of 36 Remark 5. The ﬁrst-stage IS weight associated with the so-described density is: ˆ ˆ L (Z) = exp(l S(Z) + K(l )) . (26) 1 x It is entirely possible—and quite common in the examples we consider in this paper—that K(l ) is not well-deﬁned, in which case Equation (26) has inﬁnite variance under f (recall Remark 1). At ﬁrst glance it might seem absurd to consider IS densities whose associated weights have inﬁnite variance, but as we discuss in Section 4.2 it is straightforward to circumvent this issue by trimming large ﬁrst-stage IS weights . It remains to develop a tractable approximation to the right hand side of Equation (24), so that we can approximate the value of l . To this end we write the natural sufﬁcient statistic as S(z) = (S (z), . . . , S (z)) and note that: E[S (Z) 1 ] ¯ E[S (Z) P(L > xjZ)] i fL >xg i N E[S (Z)jL > x] = = . i N ¯ ¯ P(L > x) E[P(L > xjZ)] N N Next, we use the LDA in Equation (13) to get: E[S (Z) exp(Nq(x, Z))] E[S (Z)jL > x] . (27) i N E[exp(Nq(x, Z))] As it only involves the systematic risk factors (and not the large number of idiosyncratic risk factors), the expectation on the right hand side of Equation (27) is amenable to either quadrature or Monte Carlo simulation. 3.3.2. Individual Losses We encourage the reader unfamiliar with exponential tilts to consult Appendix A.3, before reading the remainder of this section. Our approximation to g (`jz) is obtained by using the LDA of Equation (13) to approximate both conditional probabilities appearing in Equation (16) (see Appendix A.4 for details). The resulting approximation is: ˆ ˆ g ˆ (`jz) := exp(q` k(q, z)) g(`jz) , (28) where we recall that q is deﬁned and discussed in Section 2.3. If the realised values of the systematic risk factors obtained in the ﬁrst stage lie in the region of interest then q = 0 and g ˆ is identical to g. Otherwise, q is strictly positive and g ˆ is the exponentially tilted version of g whose mean is x. Intuitively, we can interpret g ˆ as that density that most resembles (in the sense of minimum divergence) g , among all densities whose mean is at least x, and the numerical value of q as the degree to which the density g(jz) must be deformed, in order to produce a density whose mean is at least x. Remark 6. The mean of Equation (28) is max(m(z), x). The implication is that the event of interest is not a rare event under the proposed IS algorithm. Indeed, E [L ] = E [E [L jZ]] = E [E [L jZ]] = E [max(x, m(Z))] x , I S i I S I S i f g ˆ i f ˆ ˆ l l which implies that lim P (L > x) = 1. N!¥ I S N An alternative to trimming is truncation of large weights; see Ionides (2008) for a general and rigorous treatment of truncated IS. Risks 2020, 8, 25 11 of 36 The second-stage IS weight associated with Equation (28) is: ˆ ˆ ˆ ¯ ˆ L (Z, L) = exp(q L + k(q, z)) = exp(N[q L k(q, Z)]) . 2 i N i=1 ¯ ¯ Since the second stage weight depends only on Z and L , we will often write L (Z, L ) instead of N 2 N L (Z, L). In order to assess the stability of the second-stage IS weight, we note that: ˆ ¯ ˆ ˆ ¯ exp(N[q L k(q, Z)]) = exp(q N[L x]) exp(Nq(x, Z)) . N N ¯ ¯ If Z lies in the region of interest then q = q = 0, whence L (Z, L ) = 1 whatever the value of L . 2 N N ˆ ¯ ¯ Otherwise, both q and q are strictly positive, which implies that L (Z, L ) < 1 whenever L > x. The 2 N N net result of this discussion is that: ¯ ¯ L (Z, L ) 1 whenever L > x . (29) 2 N N The implication is that large, unstable, IS weights in the second stage will never be a problem. If the realised value of z does lie in the region of interest then g ˆ and g are identical, and simulation from g is straightforward. Our ﬁnal task is to determine how to sample from Equation (28) in the case where z does not lie in the region of interest. One approach would be to identify a family of densities fh (y) : z 2 R g such that L = L(z, Y ) is a draw from g ˆ (jz) whenever Y is a draw from h (), but z i i x i z this approach appears to be overly complicated. A simpler approach is to sample from Equation (28) using rejection sampling with g as the proposal density. To this end, we note that for ﬁxed z, the ratio ˆ ˆ of g ˆ to g is exp(q` k(q, z)), which is bounded and strictly increasing on [0, ` ]. The best possible x max (i.e., smallest) rejection constant is therefore: ˆ ˆ c ˆ = c ˆ(x, z) := exp(q` k(q, z)) , (30) max and the algorithm for sampling from g ˆ would proceed as follows. First, sample Y from its actual density and set L = L(z, Y ). Then generate a random number U, uniformly distributed on [0, 1] and i i independent of Y . If, g ˆ (L jz) ˆ ˆ U = exp(q(` L )) , max i c ˆ g(L jz) set L = L and proceed to the next exposure. Otherwise return to the ﬁrst step and sample another i i pair (Y , U). 3.4. Summary and Intuition The proposed algorithm is summarised in Algorithm 3 below. The initial step is to approximate the value of the ﬁrst-stage IS parameter, l . In our numerical examples we use a small pilot simulation (10% of the sample size that we eventually use to estimate p ) and the approximation of Equation (27) in order to estimate l . Having computed l , the ﬁrst stage of the algorithm proceeds by simulating independent realisations of the systematic risk factors from the density f , and computing the associated ﬁrst-stage weights of Equation (26). Recall that we can interpret these realisations as corresponding to different economic scenarios. Intuitively, sampling from f instead of f increases the proportion of adverse scenarios that are generated in the ﬁrst stage. In the examples we consider, f concentrates most of its mass near the boundary of the region of interest, and the effect is to concentrate the distribution of m(Z) near x. In the second stage, one ﬁrst checks whether or not the realised values of the systematic risk factors lie inside the region of interest. If they do then the event of interest is no longer rare and there is no need to apply further IS in the second stage. Otherwise, if we “miss” the region of interest in the Risks 2020, 8, 25 12 of 36 ﬁrst stage, we “correct” this mistake by applying an exponential tilt to the conditional distribution of individual losses. Speciﬁcally, we transfer mass from the left tail of g to the right tail, in order to produce a density whose mean is exactly x. Algorithm 3 Proposed IS Algorithm for Estimating p 1: Compute l using a small pilot simulation. 2: Simulate M i.i.d. copies of the systematic risk factors from f (z) and compute the corresponding ﬁrst-stage IS weights. Denote the realised values of the factors by z , . . . , z and the associated IS 1 M weights by L (z ), . . . , L (z ). 1 1 1 M 3: For each scenario m, determine whether or not z lies in the region of interest (i.e., whether or not m(z ) x). If it does lie in the region, proceed as follows: (a) Simulate the idiosyncratic risk factors for each exposure. Denote the simulated values by y , . . . , y . 1,m N,m 1 N ¯ ¯ (b) Set ` = L(z , y ), ` = å ` and L (z , ` ) = 1. m m 2 m m i,m i,m i,m N i=1 Otherwise, proceed as follows: ˆ ˆ ˆ ˆ ˆ ˆ (a) Compute q = q(x, z ), k = k(q, z ) and c ˆ = exp(q` k). For each exposure i: m m max (i) Simulate the exposure’s idiosyncratic risk factor (denote the realised value by y ˆ ) and i,m set ` = L(z , y ). i,m i,m (ii) Simulate a random number drawn uniformly from the unit interval (denote the realised ˆ ˆ ˆ value by u) and determine whether or not u exp(q(` ` )). If it is, set ` = ` max i,m i,m i,m and proceed to the next exposure. Otherwise, return to step (i). 1 N ¯ ¯ ˆ ¯ ˆ (b) Set ` = å ` and L (z , ` ) = exp(N[q` k]) m 2 m m m i,m N i=1 1 M 4: Return pb = L (z ) L (z , ` ) 1 . å ¯ x 1 m 2 m m m=1 f` >xg M m 4. Practical Considerations In this section we discuss some of the practical issues that arise when implementing the proposed methodology. 4.1. One- and Two-Stage Estimators The rejection sampling procedure employed in the second stage of the proposed algorithm involves repeated evaluation of q, which requires a non-trivial amount of computational time time. In addition, rejection sampling in general requires relatively complicated code. As such, it is worth considering a simpler algorithm that only applies importance sampling in the ﬁrst stage, and is therefore easier to implement and faster to run. In what follows we will distinguish between one- and two-stage IS algorithms. A one-stage algorithm only applies IS in the ﬁrst stage and samples (Z, L) from the IS density: h (z, `) := f (z) g(` jz) . (31) 1S ˆ i i=1 The associated IS weight is L (z) and the one-stage algorithm is summarised in Algorithm 4 below. Note the simplicity of Algorithm 4, relative to Algorithm 3. The two-stage algorithm applies IS in both the ﬁrst stage and the second stage, sampling (Z, L) from the IS density: h (z, `) := f (z) g (` jz) . (32) 2S ˆ x Õ i i=1 Risks 2020, 8, 25 13 of 36 The associated IS weight is L (z) L (z, ` ), and the two-stage algorithm was summarised previously 1 2 N in Algorithm 3. Algorithm 4 Proposed One-Stage IS Algorithm for Estimating p 1: Compute l using a small pilot simulation. 2: Simulate M i.i.d. copies of the systematic risk factors from f (z) and compute the corresponding ﬁrst-stage IS weights. Denote the realised values of the factors by z , . . . , z and the associated IS 1 M weights by L (z ), . . . , L (z ). 1 1 1 M 3: For each scenario m: (a) Simulate the idiosyncratic risk factors for each exposure. Denote the simulated values by y , . . . , y . 1,m N,m 1 N (b) Set ` = L(z , y ) and ` = ` . i,m m i,m m i,m i=1 1 M 4: Return pb = L (z ) 1 . x å m ¯ M m=1 f` >xg Although it is simpler to implement and faster to run, the one-stage algorithm is less accurate than the two-stage algorithm. More precisely, the two-stage estimator never has larger variance than the one-stage estimator. To see this, ﬁrst let E denote expectation under the one-stage IS density 1S h (z, `) given in Equation (31). Then the variance of the one-stage estimator is: 1S 2 2 E [(L (Z) 1 ) ] p 1S 1 x fL xg where M denotes sample size. And if we let E denote expectation under the two-stage IS density 2S h (z, `) given in Equation (32) then the variance of the two-stage estimator is: 2S 2 2 E [(L (Z) L (Z, L ) 1 ) ] p 2S 1 2 N fL xg x In order to compare variances it sufﬁces to compare the second moments appearing above under the actual density h(z, `), and we let E denote expectation with respect to this density. To this end we note that: E [(L (Z) 1 ) ] = E[L (Z) 1 ] ¯ ¯ 1S 1 1 fL xg fL xg N N and ¯ ¯ E [(L (Z) L (Z, L ) 1 ) ] = E[L (Z) L (Z, L )] 1 ] 2S 1 2 N ¯ 1 2 N ¯ fL xg fL xg N N In light of Equation (29) we get that: L (Z, L ) 1 1 1 = 1 , (33) 2 N ¯ ¯ ¯ fL >xg fL >xg fL >xg N N N whence ¯ ¯ E [(L (Z) L (Z, L ) 1 ) ] = E[L (Z) L (Z, L ) 1 ] ¯ ¯ 2S 1 2 N 1 2 N fL xg fL xg N N E[L (Z) 1 ] 1 fL xg = E [(L (Z) 1 ) ] . 1S 1 ¯ fL xg The two-stage estimator will therefore never have larger variance than the the one-stage estimator. 4.2. Large First-Stage Weights In the examples that we consider in this paper, the systematic risk factors are Gaussian. When selecting their IS density, one could either (i) shift their means and leave their variances (and correlations) unchanged or (ii) shift their means and adjust their variances (and correlations). In general Risks 2020, 8, 25 14 of 36 the latter approach will lead to a much better approximation to the ideal density f , but could lead to an IS weight that has inﬁnite variance. By contrast, the former approach will always lead to an IS weight with ﬁnite variance, but could lead to a poor approximation of the ideal density. At ﬁrst glance it might seem absurd to consider IS densities whose weights are so unstable as to have inﬁnite variance, but we have found that adjusting the variances of the systematic risk factors can lead to more effective estimators, in terms of both statistical accuracy and run time (see Section 6.1 for more details), provided one stabilises the resulting IS weights in some way. In the remainder of this section we describe a simple stabilisation technique that leads to a computable upper bound on the associated bias (an alternative would be to stabilize unruly IS weights via truncation, as discussed in Ionides (2008)). Returning now to the general case, suppose that the ﬁrst-stage IS parameter, l , is such that the ﬁrst-stage IS weight, L (Z), has inﬁnite variance. We trim large ﬁrst-stage weights by ﬁxing a set A R such that L () is bounded over A, and discarding those simulations for which Z 2 / A. Speciﬁcally, the last line of Algorithm 3 would be altered to return the trimmed estimate: p = L (z ) L (z, ` ) 1 1 , x 1 m 2 m ¯ å fz 2 Ag f` >xg m m=1 etc. for Algorithm 4. The variance of the so-trimmed estimator is necessarily ﬁnite (recall that ¯ ¯ L (z, `) 1 if ` > x), and its bias is: ¯ ¯ E [L (Z) L (Z, L ) 1 1 ] = E[1 1 ] = E[P(L > xjZ) 1 ] , ¯ ¯ 2S 1 2 N N fL >xg fZ2 / Ag fL >xg fZ2 / Ag fZ2 / Ag N N where we have used the tower property (conditioning on Z) to obtain the last equality. Using Chernoff’s bound in Equation (11) we get that: E[P(L > xjZ) 1 ] E[exp(Nq(x, Z)) 1 ] . (34) fZ2 / Ag fZ2 / Ag As it only depends on the small number of systematic risk factors, and not the large number of idiosyncratic risk factors, the right-hand side of Equation (34) is a tractable upper bound on the bias committed by trimming large (ﬁrst-stage) IS weights. This upper bound can be used to assess whether or not the bias associated with a given set A is acceptable. 4.3. Large Rejection Constants The smaller the c ˆ, the more efﬁcient is the rejection sampling algorithm employed in the second stage. Indeed the average number of proposals that must be generated in order to obtain one realisation from g ˆ is 1/c ˆ. In the examples we consider in this paper, c ˆ is (essentially) a decreasing function m(z), such that c ˆ ! 1 as m(z) ! x and c ˆ ! ¥ as m(z) ! 0 (see Figure 1). The second-stage rejection algorithm is therefore quite efﬁcient when m(z) x and quite inefﬁcient when m(z) 0. Now, the IS density for the ﬁrst-stage risk factors is such that the distribution of m(Z) concentrates most of its mass near x (where c ˆ is a reasonable size), but it is still theoretically possible to obtain a realisation of the systematic risk factors for which m(z) is very small and c ˆ is unacceptably large (e.g., 10 ). In such situations the algorithm effectively grinds to a halt, as one endlessly generates proposed losses that have no realistic chance of being accepted. It is extremely unlikely that one does obtain such a scenario under the ﬁrst-stage IS distribution, but it is still important to protect oneself against this unlikely event. To this end we suggest ﬁxing some maximum acceptable rejection constant c , and only max applying the second stage IS to those ﬁrst-stage realizations for which m(z) < x and c ˆ c . In other max words, even if the realised values of the systematic risk factors lie outside the region of interest, we avoid applying the second stage if the associated rejection constant exceeds the predeﬁned threshold. Risks 2020, 8, 25 15 of 36 4.4. Computing q ˆ ˆ Repeated evaluations of q(x,) are necessary when computing l at the outset of the algorithm, as well as during the second stage of the two-stage algorithm. Recall that in order to compute q(x, z) “exactly” one must numerically solve the equation k (q, z) = x, which requires a non-trivial amount of CPU time. As each evaluation of q is relatively costly, repeated evaluation would, in the absence of any further approximation (over and above that inherent in numerical root-ﬁnding), account for the vast majority of the algorithm’s total run time. In order to reduce the amount of time spent evaluating q we ﬁt a low degree polynomial to the function q(x,) that can be evaluated extremely quickly, considerably reducing total run time. Speciﬁcally, suppose that we must compute q(x, z ) for each of n points z , . . . , z (either the sample n 1 n points from the pilot simulation, or the ﬁrst-stage realisations that did not land in the region of interest). We identify a small set C R that contains each of the n points, construct a mesh of m << n points in C, evaluate q exactly at each mesh point, and then ﬁt a ﬁfth degree polynomial to the ¯ ¯ ¯ resulting data. Letting q(x,) denote the resulting polynomial, we then evaluate q(x, z ), . . . , q(x, z ) 1 n ˆ ˆ instead of q(x, z ), . . . , q(x, z ). If m is substantially smaller than n, then the reduction in CPU time 1 n is considerable. 5. PD-LGD Correlation Framework All of the PD-LGD correlation models listed in the introduction are special cases of the following general framework—an observation that, to the best of our knowledge, has not been made in the literature. The systematic risk factors take the form Z = (Z , Z ), where Z and Z are bivariate D L D L normal with standard normal margins and correlation r . Idiosyncratic risk factors take the form Y = S i (Y , Y ), where Y and Y are bivariate normal with standard normal margins and correlation r . i,D i,L i,D i,L I Associated with each exposure is a default driver X and a loss driver X , deﬁned as follows: i,D i,L X = a Z + 1 a Y , (35) D D i,D i,D X = a Z + 1 a Y . (36) i,L L L i,L The factor loadings a and a are constants taking values in the unit interval, and dictate the relative D L importance of systematic risk versus idiosyncratic risk. The correlation between default drivers of 2 2 distinct exposures is r := a and the correlation between loss drivers of distinct exposures is r := a . D L D L The correlation between the default and potential loss drivers of a particular exposure is: q q 2 2 r := a a r + 1 a 1 a r , D L D L S I D L which can be positive or negative (or zero). Note that if r and r have the same sign then, since both S I factor loadings are positive, r inherits this common sign. D L The realised loss on exposure i is L = D L , where: i i i D = 1 1 fX F (P)g i,D is the default indicator associated with exposure i and L = h(X ) i i,L is called the potential loss (our terminology) associated with exposure i. Here P denotes the common default probability of all exposures and h is some function from R to [0, ` ]. It is useful (but not max necessary) to think of potential loss as L = max(0, 1C ), where C is the value of the collateral i i i pledged to exposure i expressed as a fraction of the loan’s notional value. Risks 2020, 8, 25 16 of 36 Models in this framework are characterised by (i) the correlation structure of the risk factors, speciﬁcally restrictions on the values of r and r , and (ii) the marginal distribution of potential loss. I S For instance: Frye (2000) assumes perfect systematic correlation (r = 1) and zero idiosyncratic correlation (r = 0); Pykhtin (2003) assumes perfect systematic correlation (r = 1) but allows for arbitrary idiosyncratic correlation (r unrestricted); Witzany (2011) allows for arbitrary systematic correlation (r unrestricted) but insists on zero idiosyncratic correlation (r = 0); Miu and Ozdemir (2006) allow for arbitrary systematic correlation (r unrestricted) and arbitrary idiosyncratic correlation (r unrestricted). Note that if r = 1 then the systematic risk factor is effectively one-dimensional. Indeed if r = 1 then Z = (Z, Z) from some standard Gaussian variable Z, and if r = 1 then Z = (Z,Z). S S We refer to the case jr j = 1 as the one-factor case, and the case jr j < 1 as the two-factor case. In the S S one-factor case we use Z, and not Z, to denote the systematic risk factor. The ﬁrst two models listed above are one-factor models, the last two are two-factor models. The marginal distribution of potential loss is determined by the speciﬁcation of the function h. For instance: Frye (2000) speciﬁes h(x) = max(0, 1 a(1 + bx)) for constants a 2 R and b > 0. Potential loss takes values in [0, ¥). Its density has a point mass at zero and is proportional to a Gaussian density on (0, ¥). Since L is not constrained to lie in the unit interval, this speciﬁcation violates the assumptions made in Section 2.3; a+bx Pykhtin (2003) speciﬁes h(x) = max(0, 1 e ) for constants a 2 R and b > 0. Potential loss takes values in [0, 1). Its density has a point mass at zero, and is proportional to a shifted lognormal density over (0, 1); Witzany (2011) and Miu and Ozdemir (2006) both specify h(x) = B (F(x)), where a, b > 0 and a,b B denotes the cdf of the beta distribution with parameters a and b. Potential loss takes values in a,b (0, 1). It is a continuous variable and follows a beta distribution. The sign of r and the nature of the function h (increasing or decreasing) will in general D L determine the sign of the relationship between D and L . If r > 0 then the relationship will be i i D L positive [negative] provided h is decreasing [increasing], and vice versa if r < 0. D L 5.1. Computing m(z) 2 T Here vectors z 2 R take the form z = (z , z ) . In order to obtain an expression for m(z) = D L E[L jZ = z], we begin with the observation that: E[L jZ] = E[L D jZ] = E[L E[D jX , Z]jZ] = E[L P(D = 1jX , Z)jZ] . i i i i i i,L i i i,L Thus, m(z) = h(x ) F(d, m(x , z), v) f(x , a z , 1 a ) dx , (37) L L L L L L where 1 a m(x , z) := a z + r (x a z ) L D D I L L L 1 a and 2 2 v = v(x , z) := (1 a )(1 r ) D I Risks 2020, 8, 25 17 of 36 are the conditional mean and variance of X , respectively, given that (X , Z) = (x , z). In general i,D i,L L m(z) must be evaluated using quadrature, and doing so is straightforward . On average (across parameter values and points z 2 R ) a single evaluation of m() requires approximately one millisecond. In the one-factor case with r = 1 [r = 1] the expression for m(z) = E[L jZ = z] is obtained by S S i plugging z = (z, z) [z = (z,z)] into Equation (37). 5.2. Computing k(q, z) and q(x, z) 2 T Here again, vectors z 2 R take the form z = (z , z ) . In order to derive an expression for k(q, z) D L we begin with the observation that: q L qL qL i i i e = 1(D = 0) + e 1(D > 0) = 1 + (e 1) 1(D > 0) , i i i q L and since k(q, z) = log(E[e jZ = z]), we get that: qh(x ) 2 k(q, z) = log 1 + (e 1) F(d, m(x , z), v) f(x , a z , 1 a ) dx , (38) L L L L L where m(x , z) and v are given in the previous section. In the one-factor case with r = 1 [r = 1] L S S the expression for k(q, z) = log(E[exp(q L )jZ = z]) is obtained by plugging z = (z, z) [z = (z,z)] into Equation (38). As with m(z), k(q, z) must in general be evaluated using quadrature, which is straightforward. The time required for a single evaluation of k(q,) is comparable to that required for a single evaluation of m(). In order to compute q we must solve the equation k (q, z) = x with respect to q. Differentiating Equation (38) we get: qh(x ) 2 ¶k(q, z) h(x ) e F(d, m(x , z), v) f(x , a z , 1 a ) dx L L L L L L 0 R L k (q, z) = = , (39) ¶q exp(k(q, z)) which is straightforward to compute using quadrature. A single evaluation of k (q, z) requires approximately twice as much time as a single evaluation of k(q, z). As the root of k (q, z) = x must be evaluated numerically, evaluating q is much more time consuming than evaluating k or k . Across parameter values and points z 2 R , and using q = 0 as an initial guess, the average time required for a single evaluation of q(x,) is slightly less than one tenth of one second. The right panel of Figure 1 illustrates the relationship between expected losses and the rejection ˆ ˆ constant employed in the second stage, c ˆ = exp(q k(q, z)). We see that c ˆ is essentially a decreasing ˆ ˆ function of m(z), such that c ! 1 as m(z) ! x and c ! ¥ as m(z) ! 0. The left panel of Figure 1 illustrates the graph of the LDA approximation P(L > xjZ = z) exp(Nq(x, z)). The approximation is identically equal to one inside the region of interest, and decays to zero very rapidly outside the region. In other words, most of the variability in the function q(x,) occurs along, and just outside, the boundary of the region of interest. All calculations are carried out using Matlab 2018a on a 2015 MacBook Pro with 6.8 GHz Intel Core i7 processor and 16 GB (1600 MHz) of memory. Numerical integration is performed using the built-in integral function. We use the Matlab function fzero for the root-ﬁnding. Risks 2020, 8, 25 18 of 36 LDA Approximation to Conditional Tail Probability Expected Losses and Rejection Constant 0.8 0.6 0.4 0.2 2 -2 -2.5 -3 -2 -3.5 -4 -4 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 Figure 1. The left panel of this ﬁgure illustrates the relationship between expected losses m(z) and the second-stage rejection constant c ˆ = c ˆ(x, z), in the two-factor model. The right panel illustrates the graph of the LDA approximation of Equation (13). Parameters (randomly selected using the procedure in Section 5.3) in both panels are (P, r , r , r , r , a, b, N) = (0.0063, 0.3964, 0.2794, 0.3356, 0.7599, D L I S 0.6497, 0.5033, 134) and the threshold is x = 0.1575. Mean losses are E[L ] = 0.0029, and the probability that losses exceed the threshold x is on the order of 50 basis points. Points in the left panel were obtained by generating 1000 realizations of the systematic risk factors from their actual distribution (as opposed to the ﬁrst-stage IS distribution) using the indicated parameter values. 5.3. Exploring the Parameter Space The model contains ﬁve parameters, in addition to any parameters associated with the transformation h. We are ultimately interested in how well the proposed algorithms perform across a wide range of different parameter sets. As such, in our numerical experiments we will randomly select a large number of parameter sets according to the procedure described below, and assess the algorithms’ performance for each parameter set. Generate the default probability P uniformly between 0% and 10%, and generate each of the 2 2 correlations r = a and r = a uniformly between 0% and 50%; D L D L In the one-factor model, generate r uniformly on f1, 1g, i.e., r takes on the value 1 or +1 S S with equal probability. If r = 1 we generate r uniformly between 0% and 100%, and if r = 1 S I S we generate r uniformly between100% and 0%. This allows us to control the sign of r , which I D L we must do in order to ensure a positive relationship between default and potential loss. In the two-factor model we randomly generated r uniformly on [1, 1]. If r is positive, randomly S S generate r uniformly on [0, 1], otherwise randomly generate r uniformly on [1, 0]; I I We choose the transformation h() to ensure that (i) potential loss is beta distributed and (ii) there is a positive relationship between default and loss. The paramters a and b of the beta distribution are generated independently from an exponential distribution with unit mean. If r < 0 we D L 1 1 set h(x) = B (F(x)) and if r > 0 we set h(x) = B (F(x)), where B () is the cumulative D L a,b a,b a,b distribution function for the beta distribution with parameters a and b. Note that under these restrictions, in the one-factor model the expected loss function m(z) is monotone decreasing. In order to ensure that we are considering cases of practical interest, we randomise the portfolio size and loss threshold as follows. Generate the number of exposures randomly between 10 and 5000; 1 q In the one-factor model we generate the threshold x by setting x = m(F (10 )), where q is uniformly distributed on [1, 5]. The LPA suggests that 1 q p = P(L > x) P(m(Z) > x) = P(Z < m (x)) = 10 . x N Risks 2020, 8, 25 19 of 36 This means that log( p ), the order of magnitude of the probability of interested, is approximately uniformly distributed on [5,1]. In the two-factor model we set x = m(z ), where z = q q 1 1 (F (q), r F (q)) and q is uniformly distributed on [5,1]. 6. Implementation In this section we discuss our implementation of the algorithm proposed in Section 3 in the general framework outlined in Section 5. As the general framework encompasses many of the PD-LGD correlations that have been proposed in the literature, this section effectively discusses implementation of the proposed algorithm across a wide variety of models that are used in practice. 6.1. Selecting the IS Density for the Systematic Risk Factors The systematic risk factors here are Gaussian. When constructing their IS density we could either shift their means and leave their variances (and correlations) unchanged, or shift their means and adjust their variances (and correlations). Recall that the ultimate goal is to choose an IS density that closely resembles the ideal density f given in Equation (15). As illustrated in Figure 2, the ideal density f tends to be very tightly concentrated about its mean, and adjusting the variance of the systematic risk factors leads to a much better approximation to the ideal density for “typical values” of the ideal density. The left tail of the ideal density is, however, heavier than the variance-adjusted IS density, an issue that can be resolved by trimming large IS weights. Normal Approximation to Optimal Density Normal Approximation to Optimal Density 1.8 2.5 1.6 1.4 1.2 1.5 0.8 0.6 0.4 0.5 0.2 0 0 -5 -4.5 -4 -3.5 -3 -4 -3.5 -3 -2.5 -2 Figure 2. This ﬁgure illustrates f (in fact, the approximation of Equation (40)) for two randomly generated sets of parameters. Each panel superimposes (i) a normal density with the same mean and variance as f (dashed blue line), and (ii) a normal density with the same mean as f and unit variance x x (dash-dot red line). The mean and variance of f are computed via (computationally inefﬁcient) quadrature. The mean and variance of f are computed using quadrature. Parameters in the right panel are (P, r , r , r , r , a, b, N) = (0.02, 0.33, 0.27, 0.96, 1, 2.47, 4.32, 454), and for the left panel they D L I S are (P, r , r , r , r , a, b, N) = (0.03, 0.13, 0.12, 0.85, 1, 1.81, 1.90, 271). In both cases, the transformation D L I S h is taken to be h(x) = B (F(x)). a,b The downside to adjusting the variance of the systematic risk factors is that it can lead to ﬁrst-stage IS weights with inﬁnite variance, but numerical evidence suggests that this issue can be mitigated by In the one-factor model, a tractable approximation to the ideal density can be obtained by using the LDA of Equation (13) to approximate both probabilities appearing in Equation (15). The result is: exp(Nq(x, z)) f(z) f (z) , (40) exp(Nq(x, w)) f(w) dw and the right-hand side of Equation (40) can be approximated via quadrature. As the integrand involves q, the approximation is computationally very slow. Risks 2020, 8, 25 20 of 36 trimming large weights. Indeed, numerical experiments suggest that adjusting variance and trimming large weights leads to substantially more accurate estimators of p . Intuitively, it is more important for the IS density to mimic the behaviour of the ideal density over its “typical range”, as opposed to faithfully representing its tail behaviour. In addition to improving statistical accuracy, adjusting variance has the added beneﬁt of making the second stage of the algorithm more computationally efﬁcient in terms of run time. Indeed, as discussed in more detail in Section 6.3, adjusting variance tends to increase the proportion of ﬁrst-stage simulations that land in the region of interest (thereby reducing the number of times the rejection sampling algorithm must be employed in the second stage) and reduces the average size of the rejection constants employed in the second stage (thereby making the rejection algorithm more effective whenever it must be employed). 6.2. First Stage In this section we explain how to efﬁciently approximate the parameters of the optimal IS density for the systematic risk factors, in both the one- and two-factor models. We also explain how we trim large IS weights, and demonstrate that the resulting bias is negligible. 6.2.1. Computing Parameters in the Two-Factor Model In the two-factor model the systematic risk factors are bivariate Gaussian with zero mean vector and covariance matrix: " # 1 r S = . r 1 The mean vector and covariance matrix that satisfy the criteria of Equation (25) are: m := E[ZjL > x] (41) I S N and S := E[(Z m )(Z m ) jL > x] , (42) I S I S I S N respectively. In order to approximate the suggested mean vector and covariance matrix we use Equation (27) to get: E[exp(Nq(x, Z)) Z] m (43) I S E[exp(Nq(x, Z))] and E[exp(Nq(x, Z)) (Z m )(Z m ) ] I S I S S . (44) I S E[exp(Nq(x, Z))] The expected values appearing on the right-hand sides of Equations (43) and (44) are both amenable to simulation, and we use a small pilot simulation of size M << M to approximate them. In our numerical examples, the size of the pilot simulation is 10% of the sample size that is eventually used to estimate p . Whether or not we adjust the variance of the systematic risk factor, the standard error of the resulting estimator is of the form n/ M, where n depends on the model parameters and is easily estimated via simulation. Using 100 randomly selected parameter sets from the one-factor model, selected according to the procedure described in Section 5.3, we ﬁnd that for 0.03 the one-stage estimator n /n 1.54 p , where n denotes the value of n assuming we only shift the mean of the MS VA MS systematic risk factor and do not adjust its variance and n denotes the value when we do adjust variance. For probabilities VA in the range of interest, then, adjusting the variance of the systematic risk factor leads to an estimator that is nearly four times as efﬁcient, in the sense that the sample size required to achieve a given degree of accuracy (as measured by standard error) is nearly four times larger if we do not adjust variance. As discussed in Appendix B, the natural sufﬁcient statistic here consists of the components of Z plus the components of T T T ¯ ¯ ZZ . As such, in order to satisfy Equation (27) we must ensure that E [Z] = E[ZjL > x] and E [ZZ ] = E[ZZ jL > x], I S N I S N where E denotes mean under the IS distribution. These conditions are clearly equivalent to Equations (41) and (42).. I S Risks 2020, 8, 25 21 of 36 In order to implement the approximation we must ﬁrst simulate the systematic risk factors and then compute q(x, z) for each sample point z. The most natural way to proceed is to (i) sample the systematic risk factors from their actual distribution (bivariate Gaussian with zero mean vector and covariance matrix S) and (ii) numerically solve the equation k (q, z) = x in order to compute compute q(x, z) for each pilot sample point z that lies outside the region of interest. In our experience this leads to unacceptably inefﬁcient estimators, in terms of both (i) statistical accuracy and (ii) computational time. We deal with each issue in turn. As most of the variation in q(x,) occurs just outside the boundary of the region of interest (recall the right panel of Figure 1), we suggest using an IS distribution for the pilot simulation that is centered on the boundary of the region. Speciﬁcally, we suggest using that point on the boundary at which the density of the systematic risk factors attains its maximum value (i.e., the most likely point on the boundary): T 1 z := arg minfz S z : m(z) = xg . (45) The non-linear minimisation problem appearing above is easily and rapidly solved using standard techniques. We used fmincon function in Matlab. As z lies on the boundary of the region of interest, roughly half the pilot sample will lie outside the region. In Section 5.2 we noted that it takes nearly one tenth of one second to numerically solve the equation k (q, z) = x. As such, if we are to compute q exactly (i.e., by numerically solving the indicated equation) for each sample point that lies outside the region of interest, the total time required (in seconds) to estimate the ﬁrst-stage IS parameters will be at least M /20. In our numerical examples we use a pilot sample size of M = 1000, which means that it would take nearly one full minute to compute the ﬁrst-stage IS parameters. This discussion suggests that reducing the number of times we must numerically solve the equation k (q, z) = x could lead to a dramatic reduction in computational time. We suggest ﬁtting a low degree polynomial to the function q(x,), over a small region in R that contains all of the pilot sample points that lie outside the region of interest. Speciﬁcally, we determine the smallest rectangle that contains all of the pilot sample points, and discretize the rectangle using a mesh of n points, equally spaced in each direction. Next, we identify those mesh points that lie outside the region of interest and compute q(x, z) exactly (i.e., by solving k (q, z) = x numerically) for each such point. Finally, we ﬁt a polynomial to the resulting (z, q(x, z)) pairs and call the resulting function q(x,). Numerical evidence indicates at using a ﬁfth-degree polynomial and a mesh with 15 = 225 points leads to a sufﬁciently accurate approximation to q(x,) over the indicated range (the intersection of (i) the smallest rectangle that contains all sample points and (ii) the complement of ¯ ˆ the region of interest). Note that q could be an extremely inaccurate approximation to q outside this range, but that is not a concern because we will never need to evaluate it there. It remains to compute q(x, z) for each of the pilot points z. For those points z that lie inside the region of interest, we set q(x, z) = 0. For those points that lie inside the region, we set q(x, z) = ¯ ¯ ¯ ¯ ¯ q x k(q, z), where q = q(x, z). Evaluating q(x,) requires essentially no computational time (it is a polynomial), and if the mesh size and degree are chosen appropriately the difference between q ¯ ˆ and q is very small. In total, the suggested procedure reduces the number of evaluations of q from M /2 to n /2, for a percentage reduction of n / M . In our numerical examples we use n = 15 and p g p g M = 1000, which corresponds to a reduction of 75% in computational time. To summarise, we estimate the optimal ﬁrst-stage IS parameters as follows. First, we compute z . Second, we draw a random sample of size M from the Gaussian distribution with mean vector z and p x ¯ ˆ covariance matrix S. Third, we construct q(x,), the polynomial approximation to q(x,), as described Risks 2020, 8, 25 22 of 36 in the previous paragraph. Fourth, for those sample points z that lie outside the region of interest we ¯ ˆ compute q(x, z) using q instead q. The estimates of the optimal ﬁrst-stage IS parameters are then: w(Z ) exp(Nq(x, Z ))Z m m m m=1 m ˆ = I S w(Z ) exp(Nq(x, Z )) m m m=1 and å w(Z ) exp(Nq(x, Z ))(Z m ˆ )(Z m ˆ ) m m m m I S I S m=1 S = , I S w(Z ) exp(Nq(x, Z )) m m m=1 where Z , . . . , Z is the random sample and 1 M f(z; 0, S) w(z) = f(z; z , S) is the IS weight associated with shifting the mean of the systematic risk factors from 0 to z . The upper left panel of Figure 3 illustrates a typical situation where the mean of the IS distribution lies “just inside” the region of interest. Region of Interest 4.5 3.5 2.5 1.5 0.5 -2.95 -2.9 -2.85 -2.8 -2.75 Figure 3. This ﬁgure illustrates the locations of (i) the importance sampling (IS) mean used for the pilot simulation and (ii) the IS mean used for the actual simulation, relative to the region of interest. Parameters (randomly selected using the procedure in Section 5.3) in both panels are (P, r , r , r , r , a, b, N) = (0.0063, 0.3964, 0.2794, -0.3356, -0.7599, 0.6497, 0.5033, 134) and the threshold D L I S is x = 0.1575. Mean losses are E[L ] = 0.0029. 6.2.2. Computing Parameters in the One-Factor Model The procedure described in the previous section specialises in the one-factor case as follows. First, under the parameter restrictions outlined in Section 5.3, the expected loss function m(z) is a strictly decreasing function of z. As such, the region of interest is the semi-inﬁnite interval (¥, z ), where z := m (x), and its boundary is the single point z . In general z must be computed numerically, x x x which is straightforward. Second, we draw a random sample of size M from the Gaussian distribution with mean z and unit variance. Third, the polynomial approximation to q is constructed by evaluating q exactly (i.e., by numerically solving the equation k (q, z) = x) at each of n equally-spaced points z in the interval [z , z ], where z and z are the largest and smallest values obtained in the pilot + + simulation, respectively, and then ﬁtting a polynomial to the resulting (z, q(x, z)) pairs. Fourth, we Risks 2020, 8, 25 23 of 36 evaluate q(x, z) for each pilot sample point z as follows—if z lies inside the region of interest we set q(x, z) = 0, otherwise we compute q(x, z) by replacing the exact value q(x, z) with the approximate ¯ ¯ value q(x, z), where q is the polynomial constructed in the previous step. Note that a single evaluation ¯ ˆ of q requires far less computational time than a single evaluation of q. Finally, the approximations to the ﬁrst-stage IS parameters are: w(Z ) exp(Nq(x, Z ))Z m m m m=1 m ˆ = I S w(Z ) exp(Nq(x, Z )) å m m m=1 and w(Z ) exp(Nq(x, Z ))(Z m ˆ ) m m m I S 2 m=1 s = , I S w(Z ) exp(Nq(x, Z )) å m m m=1 where Z , . . . , Z is the random sample and 1 M f(z; 0, 1) w(z) = . f(z; z , 1) is the IS weight associated with shifting the mean of the systematic risk factor from 0 to z . 6.2.3. Trimming Large Weights In the one-factor model the ﬁrst-stage IS weight will have inﬁnite variance whenever s < 0.5 I S (see Remark A1 in Appendix B). In a sample of 100 parameter sets, randomly selected according to the procedure in Section 5.3, the largest realised value of s was 0.38, and the mean and median were 0.11 I S and 0.09, respectively. It appears, then, that the ﬁrst-stage IS weight in the one-factor model will have inﬁnite variance in all cases of practical interest. We trim large weights as described in Section 4.2, using the set: A = fz 2 R : jz m ˆ j Cs g I S I S for some constant C. In the numerical examples that follow we use C = 4, in which case we expect to trim less than 0.01% of the entire sample. Specialising Equation (34) to the present context, we get that an upper bound on the associated bias is given by: exp(Nq(x, z))f(z) dz , (46) which is straightforward (albeit slow) to compute using quadrature. Figure 4 illustrates the relationship between the probability of interest p and the upper bound of Equation (46) for the 100 randomly generated parameter sets, and clearly demonstrates that the bias associated with our trimming procedure is negligible. For instance, for probabilities on the order of 10 the bias is no larger than 10 , or 1% of the quantity of interest. In the two-factor model the ﬁrst-stage IS weight will have inﬁnite variance whenever det(2S S ) < 0. In a random sample of 100 parameter sets, this condition occurred 96 times. As in the I S one-factor model, then, the ﬁrst-stage IS weight in the two-factor model can be expected to have inﬁnite variance in most cases of practical interest. We trim large weights using the set: n o 2 T 1 2 A = z 2 R : (z m ˆ ) S (z m ˆ )j C I S I S I S for some constant C, and use C = 4 in the numerical examples that follow. Risks 2020, 8, 25 24 of 36 -3 -4 -5 -6 -7 -1 -2 -3 -4 -5 10 10 10 10 10 Figure 4. This ﬁgure illustrates the bias introduced by trimming large weights (vertical axis) as a function of the probability of interest (horizontal axis), for 100 randomly generated parameter sets in the one-factor case. For each set, we compute bias (in fact, an upper bound on the bias) by using quadrature to approximate Equation (46) and estimate the probability of interest using the full two-stage algorithm. 6.3. Second Stage The ﬁrst stage of the algorithm consists of (i) computing the ﬁrst-stage IS parameters, (ii) simulating a random sample of size M from the systematic risk factors’ IS distribution, and (iii) computing the associated IS weights, trimming large weights appropriately. Having completed these tasks, the next step is to simulate individual losses in the second stage. In the remainder of this section we let z = (z , z ) denote a generic realisation of the systematic risk factors obtained in the D L ﬁrst stage. 6.3.1. Approximating q Before generating any individual losses ﬁrst construct the polynomial approximation to q, using the same procedure described in Section 6.2.1. The basic idea is to ﬁt a relatively low degree polynomial to the surface of q(x,), over a small region that contains all of the ﬁrst-stage sample points. The values of z obtained in the pilot sample are invariably different from those obtained in the ﬁrst stage, so it is ¯ ˆ essential that the polynomial is reﬁt to account for this fact. In what follows we use q to approximate q whenever the numerical value of q is required, but since the difference between the two is small we do ˆ ¯ not distinguish between the two (i.e., we write q in this document, but use q in our code). 6.3.2. Sampling Individual Losses In this section we describe how to sample individual losses in the two-factor model. The procedure carries over in an obvious way to the one-factor model, so we do not discuss that case explicitly. If z lies inside the region of interest then the second stage is straightforward. For a given exposure i, we ﬁrst simulate the exposure’s idiosyncratic risk factors Y = (Y , Y ), from the bivariate normal i i,D i,L distribution with standard normal margins and correlation r . Next, we set: q q 2 2 (X , X ) = (a z + 1 a Y , a z + 1 a Y ) . i,D i,L D D i,D L L i,L D L Risks 2020, 8, 25 25 of 36 If X > F (P) then the exposure did not default and we set L = 0 and proceed to the next exposure. i,D i Otherwise the exposure did default, in which case we must compute h(X ), set ` = h(x ) and i,L i i,L then proceed to the next exposure. Note that we only evaluate h for defaulted exposures—this is important since evaluating h requires numerical inversion of the beta cdf, which is relatively slow. Having computed the individual losses associated with each exposure, we then compute the average 1 N ¯ ¯ loss ` = N ` and set L (z, `) = 1. i 2 i=1 ˆ ˆ If z lies outside the region of interest we must compute q, k(q) and c, which we do approximately using the polynomial approximation q. We then sample from g ˆ (jz) as follows. First simulate the idiosyncratic risk factors Y = (Y , Y ) from the bivariate normal distribution with standard normal i i,D i,L margins and correlation r . Also generate a random number U, independent of Y . Then set: I i q q 2 2 (X , X ) = (a z + 1 a Y , a z + 1 a Y ) . i,D i,L D D i,D L L i,L D L ˆ ˆ If the exposure did not default we set L = 0, otherwise we compute h and set L = h(X ). Next we i i i,L check whether or not 1 g ˆ (L jz) x i ˆ ˆ U = exp(q(` L )) (47) max c ˆ g(L jz) ˆ ˆ then accept L as a drawing from g ˆ , that is, set L = L and proceed to exposure i. Otherwise, draw i x i i another random number U and set of idiosyncratic factors. Once we have sampled the individual ¯ ¯ losses associated with each exposure we compute the average loss ` = N ` and set L (z, `) = i 2 i=1 ˆ ¯ ˆ ˆ exp(N[q` k(q, z)]), using the polynomial approximation to estimate the value of q. 6.3.3. Efﬁciency of the Second Stage The frequency with which the rejection sampling algorithm must be applied in the second stage is governed by P (m(Z) > x). The left panel of Figure 5 illustrates the empirical distribution of this I S probability across 100 randomly selected parameter sets. The distribution is concentrated towards small values (the median fraction is 27%) but does have a relatively thick right tail (the mean fraction is 35%). In some cases—particularly when the value of the parameter r is close to zero, in which case individual losses are very nearly independent and systematic risk is largely irrelevant—the vast majority of ﬁrst-stage simulations require further IS in the second stage. The efﬁciency of the rejection sampling algorithm, when it must be applied, is governed by the conditional distribution of c ˆ = c ˆ(x, Z) given that m(Z) < x. For each of the 100 parameter sets we estimate E [c ˆ(x, Z)jm(Z) < x], which determines the average size of the rejection constant for a given I S set of parameters, by computing the associated value of c ˆ for each ﬁrst-stage realisation that lies outside the region of interest and then averaging the resulting values. The right panel of Figure 5 illustrates the results, and we note that the mean and median of the data presented there are 1.09 and 1.17, respectively. The ﬁgure clearly indicates that the rejection sampling algorithm can be expected to be quite efﬁcient, whenever it must be applied. The distributions of P (m(Z) < x) and E [c ˆ(x, Z)jm(Z) < x] across parameters depend heavily I S I S on whether or not we adjust the variance of the systematic risk factors in the ﬁrst stage. When we do not adjust variance, the mean and median of P (m(Z) < x) (across 100 randomly selected parameter I S sets) rise to 49% and 45% (as compared to 35% and 27% when we do adjust variance), and the mean and median of E [c ˆ(x, Z)jm(Z) < x] rise to 18.6 and 1.8, respectively (as compared to 1.17 and 1.09 I S when we do adjust variance). Remark 7. If we do not adjust the variance of the systematic risk factors in the ﬁrst stage, then (i) the rejection sampling algorithm must be applied more frequently and (ii) is less efﬁcient whenever it must be applied. As such, adjusting the variance of the systematic risk factors reduces the total time required to implement the two-stage algorithm. Risks 2020, 8, 25 26 of 36 0.45 0.8 0.4 0.7 0.35 0.6 0.3 0.5 0.25 0.4 0.2 0.3 0.15 0.2 0.1 0.1 0.05 0 0 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 Figure 5. This ﬁgure illustrates the variation of P (m(Z) < x) (left panel) and E [c ˆ(x, Z)jm(Z) < x] I S I S (right panel) across model parameters. Recall that the former quantity determines the frequency with which the second-stage rejection sampling algorithm must be applied and the latter quantity determines the efﬁciency of the algorithm when it must be applied. For each of 100 parameter sets, randomly selected according to the procedure described in Section 5.3, we compute the ﬁrst-stage IS parameters and then draw 10,000 realisations of the systematic risk factors from the variance adjusted ﬁrst-stage IS density. The intuition behind this fact is as follows. First recall that the mean of the systematic risk factors tends to lie just inside the region of interest (recall Figure 3). In such cases the effect of reducing the variance of the systematic risk factors is to concentrate the distribution of Z just inside the boundary of the region of interest. Not only will this ensure that more ﬁrst-stage realisations lie inside the region of interest (thereby reducing the fraction of points that require further IS in the second stage), it will also ensure that those realisations that lie outside the region (i.e., for which m(z) < x) do not lie “that far ” outside the region (i.e., that m(z) is not “that much less” than x), which in turn ensures that the typical size of c ˆ is relatively close to one (recall the left panel of Figure 1). 7. Performance Evaluation In this section we investigate the proposed algorithms’ performance in terms of statistical accuracy, computational time, and overall. Unless otherwise mentioned, we use a pilot sample size of M = 1000 to estimate the ﬁrst-stage IS parameters and a sample size of M = 10,000 to estimate the probability of interest ( p ). We use the value C = 4 to trim large ﬁrst-stage IS weights, and a value of c = 10 to x max trim large rejection constants. 7.1. Statistical Accuracy The standard error of any estimator that we consider is of the form n / M for some constant n that depends on the algorithm used and the model parameters. For instance, for the one-stage estimator in the two-factor case we have n = SD (L (Z) 1 ), where SD denotes standard x ¯ 1S 1 fL >xg 1S deviation under the one-stage IS density of Equation (31). Note that in the absence of IS we have 0.5 n = p (1 p ) p as p ! 0. x x x x Figure 6 illustrates the relationship between n and p using 100 randomly selected parameters x x sets, for the two-stage algorithm and in the two-factor case. Importantly, we see that (i) n seems to be a function of p (i.e., it only depends on model parameters through p ) and (ii) for small probabilities x x the functional relationship appears to be of the form n = a p for constants a and b. These features are also present in the case of the one-stage estimator, as well as for both estimators in the one-factor model. The numerical values of a and b are easily estimated using the line of best ﬁt (on the logarithmic scale), and the estimated values for both the one- and two-factor cases are summarised in Table 1. Of particular note is the fact that the value of b is extremely close to one in every case. Risks 2020, 8, 25 27 of 36 Statistical Accuracy of Two-Stage Estimator -1 -2 -3 -4 -5 -1 -2 -3 -4 -5 10 10 10 10 10 Probability of Interest Figure 6. This ﬁgure illustrates the relationship between n and p , where n is the standard deviation x x x of L (Z)L (L , Z)1 under the two-stage IS density of Equation (32), in the two-factor case. 1 2 N fL xg The numerical values of p and n are estimated for each of 100 randomly generated parameters sets, x x according to the procedure described in Section 5.3. Table 1. This table reports ﬁtted values of the relationship n a p for each estimator (one- and two-stage) and each model (one- and two-factor). Values of a and b are obtained by determining the line of best ﬁt on the logarithmic scale (i.e., the line appearing in Figure 6). Note that in the absence of 0.5 IS we would have n = p (1 p ) p . x x x One-Stage Algorithm Two-Stage Algorithm 0.98 0.99 One-Factor Model 0.91 p 0.81 p x x 0.98 0.98 Two-Factor Model 0.98 p 0.81 p x x Of particular interest in the rare event context is an estimator ’s relative error, deﬁned as the ratio of its standard error to the true value of the quantity being estimated. For any of the estimators that we b1 consider, the component of relative error that does not depend on sample size is n / p a p . In the x x absence of IS we have b 1 = 0.5, in which case relative error grows rapidly as p ! 0 (i.e., n ! 0 x x but n / p ! ¥ as p ! 0). By contrast, b 1 for any of our IS estimators, in which case there is weak x x x dependence of relative error on p . The minimum sample size required to ensure that an estimator ’s 2(b1) 2 2 2 2 relative error does not exceed the threshold e is v /( p e) a p e . In the absence of IS we x x have b 0.5, in which case the sample size (and therefore computational burden) required to achieve a given degree of accuracy increases rapidly as p ! 0. By contrast, for all of our IS estimators we have b 1, in which case the minimum sample size (and computational burden) is nearly independent of p . Our ultimate goal is to reduce the computational burden associated with estimating p , in situations where p is small. To see how effective the proposed algorithms are in this regard, note that the sample size required to achieve a given degree of accuracy using the proposed algorithm, relative to that required to achieve the same degree of accuracy in the absence of IS, is approximately 2(b1) 2 2 a p e x 2 2b1 = a p , p e 2 2b1 which does not depend on e. Since a < 1 and b > 0.5 (recall Table 1), we have that a p < p . Standard Error (Scaled by Sample Size) Risks 2020, 8, 25 28 of 36 Remark 8. The relative sample size required to achieve a given degree of accuracy using the proposed algorithm, relative to that required in the absence of IS, is not larger than the probability of interest. For example, if the probability of interest is approximately 1%, then the proposed algorithm requires a sample size that is less than 1% of what would be required in the absence of IS (regardless of the desired degree of accuracy). And if the probability of interest is 0.1%, then the proposed algorithm requires a sample size that is less than 0.1% of what would be required in the absence of IS. In other words, the proposed algorithm is extremely effective at reducing the sample size required to achieve a given degree of accuracy. It is also insightful to compare the efﬁciency of the two-stage estimator, relative to the one-stage estimator. In the one-factor case, the minimum sample size required using the two-stage algorithm, relative to that required using the one-stage algorithm, is approximately: 0.02 2 0.66 p e x 0.02 = 0.80 p . 0.04 0.83 p e As p ranges from 1% to 0.01% the estimated relative sample size ranges from 0.73 to 0.67. In the two-factor case, the relative sample size is approximately 0.69, regardless of the value of p . Remark 9. In both the one- and two-factor models, the two-stage algorithm is more efﬁcient than the one-stage algorithm, in the sense that it requires a smaller sample size in order to achieve a given degree of accuracy. Indeed, in cases of practical interest (probabilities in the range of 1% to 0.01%) the minimum sample size required to achieve a given degree of accuracy using the two-stage algorithm is roughly 70% of what would be required using the one-stage algorithm. 7.2. Computational Time Figure 7 illustrates the relationship between sample size (M) and run time (total time required to estimate p using a particular algorithm), for one randomly selected set of parameters. Across both models and algorithms, the relationship is almost perfectly linear. In the absence of IS the intercept is zero (i.e., run time is directly proportional to sample size), whereas the intercepts are non-zero for the IS algorithms. The non-zero intercepts are due to the overhead associated with (i) computing the ﬁrst-stage IS parameters, which accounts for almost all of the difference between the intercepts of the solid (no IS) and dashed (one-stage IS) intercepts, and (ii) computing the second-stage polynomial approximation to q, which accounts for almost all of the difference between the intercepts of the the dashed (one-stage IS) and dash-dot (two-stage IS) lines. It is also worth noting that a given increase in sample size will have a greater impact on the run times for the IS algorithms than it will on the standard algorithm. This is because we only calculate h(X ) for defaulted exposures (evaluating h() i,L is slow because it requires numerical inversion of the beta distribution function), and the default rate is higher under the IS distribution. Across 100 randomly generated parameter sets, portfolio size (N) is most highly correlated with run time and the relationship is roughly linear. Table 2 reports summary statistics on run times, across algorithms and models. Table 2. This table reports summary statistics—in seconds, and across 100 randomly selected parameter sets—for total run time (ﬁrst three columns), time required to estimate the ﬁrst-stage IS parameters (fourth column) and time required to ﬁt the second-stage polynomial approximation to q (ﬁnal column). Average Run Times No IS One-Stage IS Two-Stage IS m , S q IS IS One Factor 7.3 25.6 33.7 1.5 0.8 Two Factor 7.4 39.0 55.5 14.3 8.9 Risks 2020, 8, 25 29 of 36 Sample Size and Run Time (One-Factor Model) Sample Size and Run Time (Two-Factor Model) 30 50 No IS One-Stage IS No IS Two-Stage IS One-Stage IS Two-Stage IS 15 25 0 0 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 Sample Size (M) Sample Size (M) Figure 7. This ﬁgure illustrates the relationship between sample size (M) and run time (total CPU time required to estimate p by a particular algorithm), using a set of parameters randomly selected according to the procedure described in Section 5.3. For each value of M we use a pilot sample that is 10% as large as the sample that is eventually used to estimate p (i.e., we set M = 0.1M). The left panel x p corresponds to the one-factor model and parameter values are (P, r , r , r , r , a, b) = (0.0827, 0.1000, D L I 0.3629,0.0180,1, 0.6676, 0.8751) and N = 2334. The right panel corresponds to the two-factor model and parameter values are (P, r , r , r , r , a, b) = (0.0241, 0.2322, 0.0343, 0.1650, 0.4135, 0.4056, 0.4942) D L I S and N = 3278. 7.3. Overall Performance Recall that the ultimate goal of this paper is to reduce the computational burden associated with estimating p , when p is small. The computational burden associated with a particular algorithm is a x x function of both its statistical accuracy and total run time. We have seen that the proposed algorithms are substantially more accurate, but require considerably more run time. In this section we demonstrate that the beneﬁt of increased accuracy is well worth the cost of additional run time, by considering the amount of time required by a particular algorithm in order to achieve a given degree of accuracy (as measured by relative error). To begin, let t( M) denote the total run time required by a particular algorithm to estimate p using a sample of size M. As illustrated in Figure 7 we have t( M) c + d M for constants c and d that depend on the underlying model parameters (particularly portfolio size, N) as well as the algorithm being used. In Section 7.1 we saw that the minimum sample size required to ensure that the estimator ’s relative error does not exceed the threshold e isL 2(b1) 2 2 M(e) a p e , for constants a and b depending on the underlying model (one- or two-factor) and algorithm being used. Thus, if T(e) denotes the total CPU time required to ensure that the estimator ’s relative error does not exceed e, we have: 2(b1) 2 2 T(e) c + da p e . (48) Table 3 contains sample calculations for several different values of p and e, using the data appearing in the left panel of Figure 7 to estimate c and d and the values of a and b implicitly reported in Table 1. The results reported in the table are representative of those obtained using different parameter sets. It is clear that the proposed algorithms can substantially reduce the computational burden associated with accurate estimation of small probabilities. For instance, if the probability of interest is on the order of 0.1% then either of the proposed algorithms can achieve 5% accuracy within 2–3 s, as compared to 4 min (80 times longer) in the absence of IS. Run Time (Seconds) Run Time (Seconds) Risks 2020, 8, 25 30 of 36 Table 3. This table reports the time (in seconds) required to achieve a given degree of accuracy (computed using Equation (48)) for several values of p and e, for the parameter values corresponding to the left panel of Figure 7. Note that this is for the one-factor model. Values of c and d are obtained from the lines of best fit appearing in the left panel of Figure 7, values of a and b are obtained from Table 1. No IS One-Stage IS (Two-Stage IS) p p x x 1% 0.1% 0.01% 1% 0.1% 0.01% e e 10% 6 60 600 10% 1.2 (2.3) 1.2 (2.3) 1.3 (2.4) 5% 24 240 2400 5% 1.8 (2.8) 1.9 (2.9) 1.9 (2.9) 1% 600 6000 60,000 1% 20.0 (18.8) 21.8 (19.6) 23.8 (20.4) The two-stage estimator is statistically more accurate (Section 7.1) but computationally more expensive (Section 7.2) than the one-stage estimator. It is important to determine whether or not the beneﬁt of increased accuracy outweighs the cost of increased computational time. Table 3 suggests that, in some cases at least, implementing the second stage is indeed worth the effort, in the sense that it can achieve the same degree of accuracy in less time. Figure 8 illustrates the overall efﬁciency of the proposed algorithms, as a function of the desired degree of accuracy. Speciﬁcally, the left panel illustrates the ratio of (i) the total CPU time required to ensure the standard estimator ’s relative error does not exceed a given threshold to (ii) the total time required by the proposed algorithms, for a randomly selected set of parameter values in the one-factor model. The right panel illustrates the same ratio for a randomly selected set of parameters in the two-factor model. Overall Efficiency of Proposed Algorithms (One-Factor Model) Overall Efficiency of Proposed Algorithms (Two-Factor Model) 3500 400 No IS/One-Stage IS No IS/Two-Stage IS No IS/One-Stage IS No IS/Two-Stage IS 0 0 -1 -2 -3 -1 -2 -3 10 10 10 10 10 10 Desired Degree of Accuracy (Relative Error) Desired Degree of Accuracy (Relative Error) Figure 8. This ﬁgure illustrates the overall efﬁciency of the proposed algorithms. Speciﬁcally, the solid [dashed] line in the left panel illustrates the ratio of (i) the total run time (in seconds) required to ensure that the standard estimate’s relative error does not exceed a given threshold to (ii) the run time required by the one-stage [two-stage] algorithm, in the one-factor model. The right panel corresponds to the two-factor model. Parameter values are the same as in Figure 7 and Table 3. In the one-factor model, it would take hundreds of times longer to obtain an estimate of p whose relative error is less than 10%, and thousands of times longer to obtain an estimate whose relative error is less than 1%. The ﬁgure also suggests that, since it requires less run time to obtain very accurate estimates, the two-stage algorithm is preferable to the one-stage algorithm in the one-factor model. In the two-factor model—where estimating IS parameters and ﬁtting the second-stage polynomial approximation to q is more time consuming—the proposed algorithms are hundreds of times more efﬁcient than the standard algorithm. In addition, it appears that the one-stage algorithm is preferable to the two-stage algorithm in this case. Although the numerical values discussed here are speciﬁc to Relative Time Required Relative Time Required Risks 2020, 8, 25 31 of 36 the parameter set used to produce the ﬁgure, they are representative of other parameter sets. In other words, the behaviour illustrated in Figure 8 is representative of the general framework overall. 8. Concluding Remarks This paper developed an importance sampling (IS) algorithm for estimating large deviation probabilities for the loss on a portfolio of loans. In contrast to existing literature, we allowed loss given default to be stochastic and correlate with the default rate. The proposed algorithm proceeded in two stages. In the ﬁrst stage one generates systematic risk factors from an IS distribution that is designed to increase the rate at which adverse macroeconomic scenarios are generated. In the second stage one checks whether or not the simulated macro environment is sufﬁciently adverse—if it is then no further IS is applied and idiosyncratic risk factors are drawn from their actual (conditional) probability distribution, if it is not then one indirectly applies IS to the conditional distribution of the idiosyncratic risk factors. Numerical evidence indicated that the proposed algorithm could be thousands of times more efﬁcient than algorithms that did not employ any variance reduction techniques, across a wide variety of PD-LGD correlation models that are used in practice. Author Contributions: Both authors contributed equally to all parts of this paper. Both authors have read and agreed to the published version of the manuscript. Funding: This research was funded by NSERC Discovery Grant 371512. Acknowledgments: This work was made possible through the generous ﬁnancial support of the NSERC Discovery Grant program. The authors would also like to thank Agassi Iu for invaluable research assistance. Conﬂicts of Interest: The authors declare no conﬂict of interest. Appendix A. Exponential Tilts and Large Deviations Let X , X , . . . , be independent and identically distributed random variable with common density 1 2 f (x), having bounded support [x , x ], and common mean m = E[X ]. For q 2 R we let min max m(q) = E[exp(q X )] and k(q) = log(m(q)) denote the common moment generating function (mgf) 0 0 and cumulant generating function (cgf) of the X , respectively. Note that m = m (0) = k (0). Appendix A.1. Properties of k(q) Elementary properties of cgfs ensure that k () is a strictly increasing function that maps R onto (x , x ). One implication is that, for ﬁxed t 2 (x , x ), the graph of the function q 7! qt k(q) max max min min is \-shaped. The graph also passes through the origin, and its derivative at zero is t m. If this derivative is positive (i.e., if m < t) then the unique maximum is strictly positive and occurs to the right of the origin. If it is negative (i.e., if m > t) then the unique maximum is strictly positive and occurs to the left of the origin. If it is zero (i.e., if m = t) then the unique maximum of zero is attained at the origin. e e For a given t 2 (x , x ), there is a unique value of q for which k (q) = t. We let q = q(t) denote max min e e e this value of q. Note that q(t) is a strictly increasing function of t and that q(m) = 0. Thus q is positive b b e [negative] whenever t > m [t < m]. An important quantity in what follows is q = q(t) := max(0, q(t)), which can be interpreted as the unique value of q for which k (q) = max(m, t). Note that if t m then b b q = 0, and if t > m then q(t) > 0. Appendix A.2. Legendre Transform of k(q) We let q() denote the Legendre transform of k() over [0, ¥). That is, ˆ ˆ q(t) := max(qt k(q)) = qt k(q) , (A1) q0 ˆ ˆ where q = q(t) was deﬁned in the previous section, and is the (uniquely deﬁned) point at which the function q 7! qt k(q) attains its maximum on [0, ¥). Based on the discussion in the preceding Risks 2020, 8, 25 32 of 36 ˆ ˆ paragraph, we see that q(t) = q(t) = 0 whenever m t, whereas both q(t) and q(t) are strictly positive whenever m < t. The derivative of the transform q is demonstrably equal to: 0 0 0 ˆ ˆ ˆ q (t) = q(t) + q (t) [t k (q(t))] . ˆ ˆ Since q = 0 whenever t m and k (q) = t whenever t > m, the second term above vanishes for all t, and we ﬁnd that: q (t) = q(t) . (A2) Appendix A.3. Exponential Tilts For q 2 R we deﬁne: f (x) := exp(q x k(q)) f (x) . (A3) The density f is called an exponential tilt of f . As the value of the tilt parameter q varies, we obtain an exponential family of densities (exponential families have lots of very useful properties, and this is an easy way of constructing them). If q is positive then the right and left tails of f are heavier and thinner, respectively, than those of f . The opposite is true if q is negative. The larger in magnitude is q, the greater the discrepancy between f and f ; indeed the Kullback–Leibler divergence from f to f is q q qm + k(q), which is a strictly convex function of q that attains its minimum value (of zero) at q = 0. It is readily veriﬁed that k (q) = E [X ], where E denotes expectation with respect to f . This q i q q observation, in combination with the developments in Section A.1, implies that it is always possible to ﬁnd a density of the form (A3) whose mean is t, whatever the t 2 (x , x ). Indeed f is precisely min max ˜ such a density. Under mild conditions, f () can be characterised as that density that most resembles f (in the sense of minimum divergence), among all densities whose mean is x (and are absolutely continuous with respect to f ). Recall that q is the unique value of q for which k (q) = max(t, m). We can therefore interpret f as that density that most resembles f , among all densities whose mean is at least t (and that are absolutely continuous with respect to f ). Note in particular hat the mean of f is max(m, t). The numerical value of q can therefore be interpreted as the degree to which we must deform the density f , in order to produce a density whose mean is at least t. If m t then q = 0 and no adjustment is necessary. If m < t then q > 0 and mass must be transferred from the left tail to the right; the larger the discrepancy between m (the mean of f ) and t (the desired mean), the larger is q. Appendix A.4. Behaviour of X , Conditioned on a Large Deviation 1 N Let f (x) denote the conditional density of X , given that X > t, where X = X . t i N N i i=1 We suppress the dependence of f on N for simplicity. Using Bayes’ rule we get P(X > tjX = x) N i f (x) = f (x) , P(X > t) and since the X are independent, we get tx ¯ ¯ P(X > tjX = x) = P(X > t + ) . N N1 N1 Now, using the large deviation approximation P(X t) exp(N q(t)), we get that P(X > tjX = x) i tx exp((N 1)q(t + ) + Nq(t)) . N1 P(X > t) N Risks 2020, 8, 25 33 of 36 Now if N is large then tx tx tx q(t + ) q(t) + q (t) = q(t) + q , N1 N1 N1 where we have used the fact that q (t) = q(t). Putting everything together we arrive at the approximation P(X > tjX = x) N i ˆ ˆ exp(q x k(q)) , P(X > t) which leads to the approximation ˆ ˆ f (x) exp(q x k(q)) f (x) . (A4) We may thus interpret the conditional density f as that density which most resembles the unconditional density f , but whose mean is at least t. Appendix A.5. Approximate Behaviour of (X , X , . . . , X ), Conditioned on a Large Deviation 1 2 N ˆ ˆ Let f (x) = f (x , . . . , x ) denote the conditional density of (X , . . . , X ), given that X > t. Then t t 1 N 1 N N f (x ) i=1 f (x) = , x 2 A , t N,t where p = P(X > t) and A is the set of those points x 2 [x , x ] whose average value t N N,t min max exceeds t. We seek a density h(x), supported on [x , x ], which minimizes the Kullback-Leibler min max divergence (KLD) of h(x) := h(x ) Õ i i=1 ˆ ˆ from f . In other words, we seek an independent sequence Y , Y , . . . , Y (whose density is h) whose t 1 2 N behaviour most resembles (in a certain sense) the behaviour of X , X , . . . , X , conditioned on the 1 2 N large deviation X > t. ˆ ˆ Now let E denote expectation with respect to the density g. Then the divergence of h from f is g t ˆ ˆ E [log( f (X)/h(X))] = E [log ( f (X )/h(X ))] log( p ) ˆ t ˆ t å i i f f t t i=1 = N E [log ( f (X )/h(X ))] log( p ) ˆ 1 1 t = N E [log ( f (X )/h(X ))] log( p ) f 1 1 = N E [log ( f (X )/ f (X ))] + N E [log ( f (X )/h(X ))] log( p ) f 1 t 1 f t 1 1 t t t Now, the middle term in the above display is the KLD of h from f . As such it is non-negative, and is ˆ ˆ equal to zero if and only if h = f . It follows immediately that the divergence of h from f is minimised t t by setting h = f . Appendix B. Important Exponential Families This appendix considers two important special cases—the Gaussian and t families—of the general setting discussed in Section 2.2. Risks 2020, 8, 25 34 of 36 Appendix B.1. Gaussian Suppose ﬁrst that the Z is Gaussian with mean vector m 2 R and positive deﬁnite covariance matrix S . When specifying the IS distribution, one can either (i) shift the mean of Z but leaves its covariance structure unchanged or (ii) shift its mean and adjust its covariance structure. In general the latter approach will lead to a better approximation of the ideal IS density but more volatile IS weights. If we take the former approach (shifting mean, leaving covariance structure unchanged), the implicit family in which we are embedding f is the Gaussian family with arbitrary mean vector m 2 R and ﬁxed covariance matrix S . To this end, let f (z) = f(z; m , S ) denote the Gaussian density 0 0 0 with mean vector m and covariance matrix S and let f (z) = f(z; m, S ). It remains to identify the 0 0 l 0 natural sufﬁcient statistic and write the natural parameter l in terms of the mean vector m. To this end, note that f (z) 1 1 T T 1 T 1 T 1 = exp (m m )S z m S m + m S m . 0 0 f (z) 2 2 The natural sufﬁcient statistic is therefore S(z) = (z , . . . , z ) , the natural parameter is l(m) = S (m m ) . Note that we can write m(l) = m + S l, so that the natural parameter represents a sort of normalized 0 0 deviation from the actual mean m to the IS mean m. Lastly, we see that the cgf of S(Z) is h i 1 1 T 1 T 1 T T T K(l) = m S m m S m = l m + l S l , l 0 0 l 0 0 0 0 2 2 where we have written m instead of m(l) in the above display. Clearly, we have that both K(l) and K(l) are well-deﬁned for all l 2 R . The implication is that if we shift the mean of Z but leave its covariance structure unchanged, the IS weight will have ﬁnite variance regardless of what IS mean we choose. If we take the former approach (shifting mean, adjusting covariance) the implicit family in which we are embedding f is the Gaussian family with arbitrary mean vector m and arbitrary positive deﬁnite covariance matrix S. In this case we have f (z) = f(z; m, S) and the ratio of f (z) to f (z) is l l T 1 T 1 T 1 1 exp((m S m S )z + z (S S )z K(m, S)) , 0 0 0 where h i T 1 T 1 K(m, S) = m S m m S m + log(det(S) log(det(S ) . 0 0 The natural sufﬁcient statistic therefore consists of the d elements of the vector z plus the d elements of the vector zz . The natural parameter l consists of the elements of the vector 1 1 l := l (m, S) = S m S m 1 1 0 plus the elements of the matrix 1 1 l := l (S) = (S S ) . 2 2 Note that since we have assumed S is positive deﬁnite, we are implicitly assuming that the matrix l is such that the determinant of S l is strictly positive. The natural parameter space is therefore 2 0 unrestricted for l , but restricted (to matrices such that the indicated determinant is strictly positive) for l . 2 Risks 2020, 8, 25 35 of 36 The above relations can be inverted to write m and S in terms of l and l , indeed 1 2 1 1 S = S(l ) = (S 2l ) 2 2 and 1 1 1 m = m(l , l ) = (S 2l ) (l + S m ) . 1 2 2 1 0 0 0 The cgf of the natural sufﬁcient statistic is 1 1 1 1 T 1 T 1 K(l) = K(l , l ) = K(m , S ) = m S m m S m + log(det(S )) log(det(S )) 1 2 l ,l l l ,l 0 l 0 1 2 2 l ,l 1 2 0 0 2 1 2 l 2 2 2 2 It is now clear that K(l) is well deﬁned if and only if the determinant of S(l ) is strictly positive, which we have implicitly assumed to be the case since we have insisted S be positive deﬁnite. It is also clear that K(l) is well-deﬁned if and only if the determinants S(l ) is strictly positive, which will occur if and only if the determinant of (2S S ) is strictly positive. Remark A1. Suppose that f and f are Gaussian densities with respective positive deﬁnite covariance matrices S and S. Further suppose that Z f . Then the variance of f (Z)/ f (Z) is ﬁnite if and only if det(2S 0 l l S ) > 0. In the one-dimensional case d = 1 we write Z = Z. The condition in Remark A1 is satisﬁed 2 2 whenever s > s /2. In other words, if the variance of the IS distribution is too small, relative to actual variance of Z, then the IS weight will have inﬁnite variance. Appendix B.2. Chi-Square Family In preparation for the multivariate t family, we ﬁrst consider the chi-square family. Suppose that Z follows a chi-square distribution with n degrees of freedom, and that the goal is to allow Z to have arbitrary degrees of freedom n > 0 under the IS density. In order to identify the natural sufﬁcient statistic S(z) and natural parameter l = l(n), we let f (z) denote the chi-square density with n degrees of freedom and f (z) the chi-square density with n degrees of freedom. Then 0 l f (z) n n n n n n l 0 0 0 = exp log(z) log(2) + log G log G f (z) 2 2 2 2 from which we see that S(z) = log(z) and l = l(n) = (n n )/2. In addition we see that the cgf of S(z) is K(l) = l log(2) + log (G (l + n /2)) log(G(n /2)) . 0 0 In order that K(l) be will deﬁned, we require n > 0, which is obvious. In order that K(l) is well-deﬁned we require l + be positive, which in turn requires n < 2n . In other words, if the IS degrees of freedom are more than twice the actual degrees of freedom, then the IS weight will have inﬁnite variance. Appendix B.3. t Family The t family is not a regular exponential family, so it does not ﬁt directly into the framework discussed in Section 2.2. That being said, a multivariate t vector can be constructed from a Gaussian vector and an independent chi-square variable. Indeed if Z is Gaussian with mean zero and covariance matrix S , and R is chi-square with n degrees of freedom (independent of Z), then 0 0 Z = m + Z , (A5) is multivariate t with n degrees of freedom, mean m and covariance matrix S . 0 0 0 n 2 0 Risks 2020, 8, 25 36 of 36 In the case that Z is multivariate t, then, we can take our systematic risk factors to be the components of (Z, R). In this case the joint density of the systematic risk factors can be embedded into the parametric family T T f (z ˆ , r) := exp(l S(z ˆ) K(l)) exp(h T(r) L(h)) f (z ˆ) g(r) , (A6) l,h where l is and S are the natural parameter and sufﬁcient statistic for the Gaussian family, h and L are those for the chi-square family, and f and g are the Gaussian and chi-square densities. References Bickel, Peter J., and Kjell A. Doksum. 2001. Mathematical Statistics: Basic Ideas and Selected Topics, 2nd ed. Upper Saddle River: Prentice Hall, Volume 1. Chan, Joshua C.C., and Dirk P. Kroese. 2010. Efﬁcient estimation of large portfolio loss probabilities in t-copula models. European Journal of Operational Research 205: 361–67. Chatterjee, Sourav, and Persi Diaconis. 2018. The sample size required in importance sampling. Annals of Applied Probability 28: 1099–135. [CrossRef] de Wit, Tim. 2016. Collateral Damage—Creating a Credit Loss Model Incorporating a Dependency between Defaults and LGDs. Master ’s thesis, University of Twente, Enschede, The Netherlands. Deng, Shaojie, Kay Giesecke, and Tze Leung Lai. 2012. Sequential importance sampling and resampling for dynamic portfolio credit risk. Operations Research 60: 78–91. [CrossRef] Eckert, Johanna, Kevin Jakob, and Matthias Fischer. 2016. A credit portfolio framework under dependent risk parameters PD, LGD and EAD. Journal of Credit Risk 12: 97–119. [CrossRef] Frye, Jon. 2000. Collateral damage. Risk 13: 91-94. Frye, Jon, and Michael Jacobs Jr. 2012. Credit loss and systematic loss given default. Journal of Credit Risk 8: 109–140. [CrossRef] Glasserman, Paul, and Jingyi Li. 2005. Importance sampling for portfolio credit risk. Management Science 51: 1643–56. [CrossRef] Ionides, Edward L. 2008. Truncated importance sampling. Journal of Computational and Graphical Statistics 17: 295–311. [CrossRef] Jeon, Jong-June, Sunggon Kim, and Yonghee Lee. 2017. Portfolio credit risk model with extremal dependence of defaults and random recovery. Journal of Credit Risk 13: 1–31. [CrossRef] Kupiec, Paul H. 2008. A generalized single common factor model of portfolio credit risk. Journal of Derivatives 15: 25–40. [CrossRef] Miu, Peter, and Bogie Ozdemir. 2006. Basel requirements of downturn loss given default: Modeling and estimating probability of default and loss given default correlations. Journal of Credit Risk 2: 43–68. [CrossRef] Pykhtin, Michael. 2003. Unexpected recovery risk. Risk 16: 74–78. Scott, Alexandre, and Adam Metzler. 2015. A general importance sampling algorithm for estimating portfolio loss probabilities in linear factor models. Insurance: Mathematics and Economics 64: 279–93. Sen, Rahul. 2008. A multi-state Vasicek model for correlated default rate and loss severity. Risk 21: 94–100. Witzany, Jirí. ˇ 2011. A Two-Factor Model for PD and LGD Correlation. Working Paper. Available online: http://dx.doi.org/10.2139/ssrn.1476305 (accessed on 9 March 2020). c 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Journal

Risks – Multidisciplinary Digital Publishing Institute

Published: Mar 10, 2020

Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Importance Sampling in the Presence of PD-LGD Correlation

Importance Sampling in the Presence of PD-LGD Correlation

Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Importance Sampling in the Presence of PD-LGD Correlation

Importance Sampling in the Presence of PD-LGD Correlation

References (19)

Abstract

Journal

Recommended Articles

There are no references for this article.

Our policy towards the use of cookies