Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Importance Sampling in the Presence of PD-LGD Correlation

Importance Sampling in the Presence of PD-LGD Correlation risks Article Importance Sampling in the Presence of PD-LGD Correlation 1, 2 Adam Metzler * and Alexandre Scott Department of Mathematics, Wilfrid Laurier University, Waterloo, ON N2L 3C5, Canada Department of Applied Mathematics, University of Western Ontario, London, ON N6A 3K7, Canada; alexandre.scott202@gmail.com * Correspondence: ametzler@wlu.ca Received:20 January 2020; Accepted: 5 March 2020; Published: 10 March 2020 Abstract: This paper seeks to identify computationally efficient importance sampling (IS) algorithms for estimating large deviation probabilities for the loss on a portfolio of loans. Related literature typically assumes that realised losses on defaulted loans can be predicted with certainty, i.e., that loss given default (LGD) is non-random. In practice, however, LGD is impossible to predict and tends to be positively correlated with the default rate and the latter phenomenon is typically referred to as PD-LGD correlation (here PD refers to probability of default, which is often used synonymously with default rate). There is a large literature on modelling stochastic LGD and PD-LGD correlation, but there is a dearth of literature on using importance sampling to estimate large deviation probabilities in those models. Numerical evidence indicates that the proposed algorithms are extremely effective at reducing the computational burden associated with obtaining accurate estimates of large deviation probabilities across a wide variety of PD-LGD correlation models that have been proposed in the literature. Keywords: importance sampling; acceptance-rejection sampling; portfolio credit risk; tail probabilities; large deviation probabilities; stochastic recovery; PD-LGD correlation; credit risk; loss probabilities 1. Introduction This paper seeks to identify computationally efficient importance sampling (IS) algorithms for estimating large deviation probabilities for the loss on a portfolio of loans. Related literature assumes that realised losses on defaulted loans can be predicted with certainty, i.e., that loss given default (LGD) is non-random. In practice, however, LGD is impossible to predict and tends to be positively correlated with the default rate and the latter phenomenon is typically referred to as PD-LGD correlation (here PD refers to probability of default, which is often used synonymously with default rate). There is a large literature on modelling stochastic LGD and PD-LGD correlation, but there is a paucity of literature on using importance sampling to estimate large deviation probabilities in those models. This gap in the literature was brought to our attention by a risk management professional at a large Canadian financial institution, and filling that gap is the ultimate goal of this paper. Problem Formulation and Related Literature Consider a portfolio of N exposures of equal size. Let L , L , . . . , L denote the losses on the 1 2 N individual loans, expressed as a percentage of notional value. The percentage loss on the entire portfolio is: L := L . (1) N å i i=1 Risks 2020, 8, 25; doi:10.3390/risks8010025 www.mdpi.com/journal/risks Risks 2020, 8, 25 2 of 36 We are interested in using IS to estimate large deviation probabilities of the form: p := P(L  x) , (2) x N where x >> E[L ] = E[L ] is some large, user-defined, threshold. i N In practice the number of exposures is large (e.g., in the thousands) and prudent risk management requires one to assume that the individual losses are correlated. In practice, then, L is the average of a large number of correlated variables. As such, its probability distribution is highly intractable and Monte Carlo is the method of choice for approximating p . As the probability of interest is typically 3 4 small (e.g., on the order of 10 or 10 ), the computational burden required to obtain an accurate estimate of p using Monte Carlo can be prohibitive. For instance if p is on the order of 10 and x x N is on the order of 1000 then, in the absence of any variance reduction techniques, the sample size required to reduce the estimator ’s relative error to 10% is on the order of one hundred thousand. Since each realisation of L requires simulation of one thousand individual losses, a sample size of 100, 000 requires one to generate one hundred million variables. If the desired degree of accuracy is reduced to 1%, the number of variables that must be generated increases to a staggering 10 billion. Importance sampling (IS) is a variance reduction technique that has the potential to significantly reduce the computational burden associated with obtaining accurate estimates of large deviation probabilities. In the present context, effective IS algorithms have been identified for a variety of popular risk management models, but most are limited to the special case that loss given default (LGD) is non-random. The seminal paper in the area is (Glasserman and Li 2005), other papers include (Chan and Kroese 2010) and (Scott and Metzler 2015). It is well documented empirically, however, that portfolio-level LGD is not only stochastic, but positively correlated with the portfolio-level default rate as seen, for instance, in any of the studies listed in (Kupiec 2008) or (Frye and Jacobs 2012). This phenomenon is typically referred to as PD-LGD correlation. (Miu and Ozdemir 2006) show that ignoring PD-LGD correlation when it is in fact present can lead to material underestimates of portfolio risk measures. There is a large literature on modelling PD-LGD correlation (Frye 2000); (Pykhtin 2003); (Miu and Ozdemir 2006); (Kupiec 2008); (Sen 2008); (Witzany 2011); (de Wit 2016); (Eckert et al. 2016); and others listed in (Frye and Jacobs 2012), but there is a much smaller literature on using IS to estimate large deviation probabilities in such models. To the best of our knowledge only (Deng et al. 2012) and (Jeon et al. 2017) have developed algorithms that allow for PD-LGD correlation (the former paper considers a dynamic intensity-based framework, the latter considers a static model with asymmetric and heavy-tailed risk factors). The present paper contributes to this nascent literature by developing algorithms that can be applied in a wide variety of PD-LGD correlation models that have been proposed in the literature, and are popular in practice. The paper is structured as follows. Section 2 outlines important assumptions, notation, and terminology. Section 3 theoretically motivates the proposed algorithm in a general setting, and Section 4 discusses a few practical issues that arise when implementing the algorithm. Section 5 describes a general framework for PD-LGD correlation modelling that includes, as special cases, many of the models that have been developed in the literature and Section 6 describes how to implement the proposed algorithm in this general framework. Numerical results are presented and discussed in Section 7 and demonstrate that the proposed algorithms are extremely effective at reducing the computational burden required to obtain an accurate estimate of p . Relative error is the preferred measure of accuracy for large deviation probabilities. If pb is an estimator of p , its relative x x error is defined as SD( pb )/ p , where SD denotes standard deviation. x x Risks 2020, 8, 25 3 of 36 2. Assumptions, Notation and Terminology We assume that individual losses are of the form L = L(Z, Y ), where L is some deterministic i i function, Z = (Z , . . . , Z ) is a d-dimensional vector of systematic risk factors that affect all exposures, and Y is a vector of idiosyncratic risk factors that only affect exposure i. We assume that Z, Y , Y , . . . i 1 2 are independent, and that the Y are identically distributed. The primary role of the systematic risk factors is to induce correlation among the individual exposures, and it is common to interpret the realised values of the systematic risk factors as determining the overall macroeconomic environment. It is worth noting that the we do not require the components of Z to be independent of one another, etc. for the components of Y . 2.1. Large Portfolios and the Region of Interest In a large portfolio, the influence of the idiosyncratic risk factors is negligible. Indeed, since individual losses are conditionally independent, given the realised values of the systematic risk factors, we have the almost sure limit: lim L = m(Z) , (3) N!¥ where m(z) := E[L jZ = z] = E[L jZ = z] . (4) i N Since m(Z)  L for large N by Equation (3), the random variable m(Z) is often called the large portfolio approximation (LPA) to L . The LPA is often used to formalise the intuitive notion that, in a large portfolio, all risk is systematic (i.e., idiosyncratic is “diversified away”). We define the region of interest as the set: fz 2 R : m(z)  xg . (5) The region of interest is “responsible” for large deviations in the sense that: lim P(m(Z)  xjL  x) = 1 (6) N!¥ for most values of x. Together, Equations (3) and (6) suggest that for large portfolios, it is relatively more important to identify an effective IS distribution for the systematic risk factors, as compared to the idiosyncratic risk factors. 2.2. Systematic Risk Factors We assume that Z is continuous and let f (z) denote its joint density. We assume that f is a member of an exponential family (see Bickel and Doksum 2001 for definitions and important properties) with d p natural sufficient statistic S : R 7! R . Any other member of the family can be put in the form: f (z) := exp(l S(z) K(l)) f (z) , (7) where K() is the cumulant generating function (cgf) of S(Z) and l 2 R is such that K(l) is well-defined. The parameter l is called the natural parameter of the family in Equation (7). Appendix B embeds the Gaussian and multivariate t families into this general framework. In light of the almost sure limit in Equation (3), we have that L converges to m(Z) in distribution, which implies that Equation (6) is valid for all values of x such that P(m(Z) = x) = 0. If m(Z) is a continuous random variable (which it is in most cases of practical interest) then Equation (6) is satisfied for every value of x. Risks 2020, 8, 25 4 of 36 We will eventually be using densities of the form in Equation (7) as IS densities for the systematic risk factors. The associated IS weight is: f (Z) = exp(l S(Z) + K(l)) , (8) f (Z) and it will be important to know when the variance of the IS weight is finite. The following observation is readily verified. Remark 1. If Z  f , then Equation (8) has finite variance if and only if both K(l) and K(l) are well defined. A standard result in the theory of exponential families is that: rK(l) = E [S(Z)] , (9) where r denotes gradient and E denotes expectation with respect to the density f . l l 2.3. Individual Losses We assume that L takes values in the unit interval. In general L will have a point mass at zero i i (if it did not, the loan would not be prudent) and the conditional distribution of L , given that L > 0, i i is called the (account-level) LGD distribution. We allow the LGD distribution to be arbitrary in the sense that it could be either discrete or continuous, or a mixture of both. This contrasts with the case of non-random LGD, where the LGD distribution is degenerate at a single point. We let ` 2 (0, 1] max denote the supremum of the support of L . Individual losses will therefore never exceed ` but could max take on values arbitrarily close (and possibly equal) to ` . max Remark 2. Despite the fact that L is not a continuous variable, in what follows we will proceed as if it was and make repeated reference to its “density." This is done without loss of generality, and in an interest of simplifying the presentation and discussion. Nothing in the sequel requires L to be a continuous variable, and everything carries over to the case where it is either discrete or continuous, or has both a discrete and continuous component. For z 2 R we let g(`jz) denote the conditional density of L , given that Z = z. We assume that the support of g(jz) is identical to the unconditional support, in particular it does not depend on the value of z. Note that m(z) is the mean of g(jz). In practice (i.e., for all of the PD-LGD correlation models listed in the introduction) g(jz) is not a member of an established parametric family, and direct simulation from g(jz) using a standard technique such as inverse transform or rejection sampling is not straightforward. Simulation from g(jz) is most easily accomplished by simulating the idiosyncratic risk factors, Y , from their density, say h(y), and then setting L = L(z, Y ). In other words, in order to simulate from g(jz) we make use i i of the fact that L = L(z, Y ) is a drawing from g(jz) whenever Y is a drawing from h(). i i i For q 2 R and z 2 R we let: k(q, z) := log(E[exp(q L )jZ = z]) and ¶k k (q, z) := (q, z) . ¶q Then k(, z) is the conditional cgf of L , given that Z = z, and k (, z) is its first derivative. In practice, neither k(, z) nor k (, z) is available in closed form. In the examples we consider later in the paper each can be expressed as a one-dimensional integral, but the numerical values of those integrals must Risks 2020, 8, 25 5 of 36 be approximated using quadrature. This contrasts with the case of non-random LGD, where the conditional cgf can be computed in closed form . d 0 For x 2 (0, ` ) and z 2 R we let q(x, z) denote the unique solution to the equation k (q, z) = max ˆ ˆ ˆ max(x, m(z)). We often suppress dependence on x and z, and simply write q instead of q(x, z). That q is well-defined follows immediately from the developments in Appendix A.1. Based on the discussion there we find that q is zero whenever z lies in the region of interest, and is strictly positive otherwise. Remark 3. In practice, the value of q cannot be computed in closed form and must be approximated using a numerical root-finding algorithm. Since each evaluation of the function k (, z) requires quadrature, computing ˆ ˆ q is straightforward but relatively time consuming. This contrasts with the case of non-random LGD, where q can be computed in closed form at essentially no cost. For z 2 R we let q(, z) denote the Legendre transform of k(, z) over [0, ¥). That is, ˆ ˆ q(x, z) := max(q x k(q, z)) = q x k(q, z) . (10) q0 That q is the uniquely defined point at which the function q 7! q x k(q, z) attains its maximum on [0, ¥) follows from the developments in Appendix A.2. Based on the discussion there, we find that both q and q are equal to zero whenever z lies in the region of interest, and that both are strictly positive otherwise. 2.4. Conditional Tail Probabilities Given the realised values of the systematic risk factors, individual losses are independent. Large deviations theory can therefore provide useful insights into the large-N behaviour of the tail probability P(L  xjZ = z). For instance, Chernoff’s bound yields the estimate: P(L > xjZ = z)  exp(Nq(x, z)) , (11) and Cramér ’s (large deviation) theorem yields the limit: log(P(L > xjZ = z)) lim = q(x, z) . (12) N!¥ N Together these results are often used to justify the approximation: P(L > xjZ = z)  exp(Nq(x, z)) , (13) which will be used repeatedly throughout the paper. The approximation in Equation (13) is often called the large deviation approximation (LDA) to the tail probability P(L > xjZ = z). Note that since q(x, z) = 0 whenever m(z)  x, the LDA suggests that P(L > xjZ = z)  1 whenever z lies in the region of interest. 2.5. Conditional Densities N d Let L = (L , . . . , L ), noting that L takes values in [0, ` ] . For z 2 R and ` = (` , . . . , ` ) 2 max 1 N 1 N [0, ` ] , we let h (z, `) denote the conditional density of (Z, L), given that L > x. Then h is max x N x given by: f (z) g(` jz) i=1 h (z, `) =  1 , (14) f`2 A g N,x 3 (1R)q In the case of non-random LGD we have k(q, z) = log(1 + (e 1) P(L > 0jZ = z)), where R is the known recovery rate on the exposure. Risks 2020, 8, 25 6 of 36 N 1 where A is the set of points ` 2 [0, ` ] for which N ` > x. N,x max i i=1 We let f (z) denote the conditional density of the systematic risk factors, given that L > x, noting that: P(L > xjZ = z) f (z) =  f (z) . (15) P(L  x) In the examples we consider the mean of f tends to lie inside, but close to the boundary of, the region of interest. And relative to the unconditional density f , the conditional density f tends to be much more concentrated about its mean. Finally, we let g (`jz) denote the conditional density of an individual loss, given that Z = z and L > x, noting that: x` P(L > x + jZ = z) N1 N1 g (`jz) =  g(`jz) . (16) P(L > xjZ = z) If the realised value of z lies inside the region of interest, the conditional density g (jz) tends to resemble the unconditional density g(jz). Intuitively, for such values of z the LDA informs that the event fL > xg is very likely, and conditioning on its occurrence is not overly informative. If the realised value of z does not lie in the region of interest then g (jz) tends to resemble the exponentially tilted version of g(jz) whose mean is exactly x. See Appendix A.3 for more details. Neither h , f , nor g are numerically tractable, but as we will soon see they do serve as useful x x x benchmarks against which to compare candidate IS densities. In addition, it is worth noting here that the representations of Equations (15) and (16) lend themselves to numerical approximation via the LDA in Equation (13). 3. Proposed Algorithm In practice, the most common approach to estimating p via Monte Carlo simulation in this framework is summarised in Algorithm 1 below. Algorithm 1 Standard Monte Carlo Algorithm for Estimating p 1: Simulate M i.i.d. copies of the systematic risk factors. Think of these as different economic scenarios and denote the simulated values by z , . . . , z . 1 M 2: For each scenario m: (a) Simulate the idiosyncratic risk factors for each exposure. Denote the simulated values y , . . . , y . 1,m N,m 1 N (b) Set ` = L(z , y ) for each exposures i, and ` = ` . i,m m i,m m i,m i=1 1 M 3: Return pb = 1 . å ¯ M m=1 f` >xg Algorithm 1 consists of two stages. In the first stage one simulates the systematic risk factors, and in the second stage one simulates the idiosyncratic risk factors for each exposure. Mathematically, the first stage induces independence among the individual exposures, so that the second stage amounts to simulating a large number of i.i.d. variables. Intuitively, it is useful to think of the first stage as determining the prevailing macroeconomic environment, which fixes economy-wide quantities such as default and loss-given-default rates. The second stage of the algorithm overlays idiosyncratic noise on top of economy-wide rates, to arrive at the default and loss-given-default rates for a particular portfolio. Relative error is the preferred measure of accuracy for estimators of rare event probabilities. The relative error of the estimator pb in Algorithm 1 is: 1 1 p p , M x Risks 2020, 8, 25 7 of 36 and the sample size required to ensure the relative error does not exceed some predetermined threshold e is: 1 1 p M(e) = . (17) e p The number of variables that must be generated in order to achieve the desired degree of accuracy e is therefore (N + d) M(e), which grows without bound as p ! 0. For instance if p = 10 , x x 3 2 N = 10 , d = 2, and e = 5 10 then the number of variables that must be generated is approximately four hundred million, which is an enormous computational burden for a modest degree of accuracy. In the next section we discuss general principles for selecting an IS algorithm that can reduce the computational burden required to obtain an accurate estimate of p . 3.1. General Principles For practical reasons, we insist that our IS procedure retains conditional independence of individual losses, given the realised value of the systematic risk factors. This is important because it allows us to reduce the problem of simulating a large number of dependent variables to the (much) more computationally efficient problem of simulating a large number of independent variables. In the first stage we simulate the systematic risk factors from the IS density f (z). The IS weight I S associated with this first stage is therefore: f (z) L (z) := . f (z) I S In the second stage we simulate the individual losses as i.i.d. drawings from the density g (`jz). The I S IS weight associated with this second stage is: g(` jz) L (z, `) = , 2 Õ g (` jz) I S i i=1 and the IS density from which we sample (Z, L) is therefore of the form: h (z, `) = f (z) g (` jz) . (18) I S I S Õ I S i i=1 The so-described algorithm, with as-yet unspecified IS densities, is summarised in Algorithm 2. Algorithm 2 IS Algorithm for Estimating p 1: Simulate M i.i.d. copies of the systematic risk factors from the density f (z). Think of these as I S different economic scenarios and denote the simulated values by z , . . . , z . 1 M 2: For each scenario m: (a) Independently simulate ` , ` , . . . ,` from the density g (jz ). 2,m N,m m 1,m I S 1 N (b) Set ` = ` . m å i,m N i=1 3: Return pb = L (z ) L (z , ` ) 1 , where ` = (` , . . . ,` ). å ¯ x 1 m 2 m m m 1,m N,m m=1 f` >xg It is important to note that in the second stage, we will not be simulating individual losses directly from the (conditional) IS density g . Rather, we will simulate the idiosyncratic risk factors Y in such I S i a way as to ensure that for a given value of z, the variable L = L(z, Y ) has the desired density g . i i I S Risks 2020, 8, 25 8 of 36 Focusing on the “indirect" IS density of L , as opposed to “direct" IS density of Y , allows us to identify i i a much more effective second stage algorithm . The estimator pb produced by Algorithm 2 is demonstrably unbiased and its variance is: 2 2 2 E [(L(Z, L) 1 p ) ] = p  E [(L (Z, L) 1 1) ] , (19) ¯ ¯ I S x I S x fL >xg x fL >xg N N where E denotes expectation under the IS distribution, L(z, `) := L (z) L (z, `) and I S 1 2 L(z, `) L (z, `) := . Note that L is the ratio of (i) the IS density in Equation (18) to (ii) the conditional density in Equation (14). The estimator ’s squared relative error can then be decomposed as: E [(L (Z, L) 1)  1 ] + [1 P (L > x)] , (20) x ¯ I S I S N fL >xg where P denotes probability under the IS distribution. I S Inspecting Equation (20) we see that an effective IS density should (i) assign a high probability to the event of interest and (ii) should resemble the conditional density in Equation (14) as closely as possible, in the sense that the ratio L should deviate as little as possible from unity. Clearly, an estimator that satisfies (ii) should also satisfy (i), since h assigns probability one to the event that L > x. The task now is to identify a density of the form in Equation (18) that resembles the ideal density in Equation (14), in some sense. 3.2. Identifying the Ideal IS Densities Our measure of similarity is Kullback–Leibler divergence (KLD), or divergence for short. See Chatterjee and Diaconis (2018) for a general discussion of the merits of minimum divergence as a criteria for identifying effective IS distributions. We begin by writing: h (z, `) f (z) g ˜ (`jz) x x x =  , (21) h (z, `) f (z) g ˜ (`jz) I S I S I S where for fixed z, g(` jz) i=1 g ˜ (`jz) =  1 f`2 A g N,x P(L > xjZ = z) is the joint density of N independent variables having marginal density g(jz), conditioned on their average value exceeding the threshold x, and g ˜ (`jz) = g (` jz) . I S Õ I S i i=1 is the joint density of N independent variables having marginal density g (jz). I S Using Equation (21) it is straightforward to decompose the divergence of h from h as: I S x D(h jjh ) = D( f jj f ) + E [ D(g ˜ (jZ)jjg ˜ (jZ))j L > x] , (22) x x x I S I S I S N where D(xjjh) denotes the divergence of the density x from the density h. The first term in Equation (22) is the divergence of f from f , and is therefore minimised by setting f = f . In other words, the best I S x I S x In the earliest stages of this project we focused directly on an IS density for Y and had difficulties identifying effective candidates. Risks 2020, 8, 25 9 of 36 possible IS density for the systematic risk factors (according to the criteria of minimum divergence) is the conditional density f . The second term in Equation (22) is the average divergence of g ˜ (jz) I S from g ˜ (jz), averaged over all possible realisations of the systematic risk factors and conditioned on portfolio loss exceeding the threshold. Based on the developments in Appendix A.5, for fixed z 2 R the divergence of g ˜ (jz) from g ˜ (jz) is minimised by setting g (jz) = g (jz). The average I S x I S x divergence in Equation (22) is, therefore, also minimised by setting g (jz) = g (jz) for every z 2 R . I S Remark 4. Among all densities of the form in Equation (18), the one that most resembles the ideal density h (in the sense of minimum divergence) is the density: d N h (z, `) := f (z) g (` jz) , z 2 R , ` 2 [0, ` ] . x x x max Õ i i=1 In other words, h is the best possible IS density (among the class Equation (18) and according to the criteria of minimum divergence) from which to simulate (Z, L). It is worth noting that the IS density h “gets marginal behaviour correct”, in the sense that the marginal distribution of the systematic risk factors, as well as the marginal distribution of an individual loss, is the same under h as it is under the ideal density h . The dependence structure of individual x x losses is different under h and h —this is the price that we must pay for insisting on conditional x x independence (i.e., computational efficiency). 3.3. Approximating the Ideal IS Densities Simulating directly from h requires an ability to simulate directly from f and g . Unfortunately, x x x neither f nor g is numerically tractable (witness the unknown quantities in Equations (15) and (16)), x x and it does not appear that either is amenable to direct simulation. Our next task is to identify tractable densities that resemble f and g . x x 3.3.1. Systematic Risk Factors As a tractable approximation to f , we suggest using that member of the parametric family in Equation (7) that most resembles f in the sense of minimum divergence. Using Equations (7) and (15) we get that: f (z) log = l S(z) + K(l) + log (P(L > xjZ = z)) log( p ) , f (z) whence the divergence of f from f is: l x ¯ ¯ ¯ D( f jj f ) = l E[S(Z)jL > x] + K(l) + E[log (P(L > xjZ = z))jL > x] log( p ) . (23) x x l N N N As a cgf, K() is strictly convex. As such, Equation (23) attains its unique minimum at that value of l such that: rK(l) = E[S(Z)jL > x] , (24) which, in light of Equation (9), is equivalent to: E [S(Z)] = E[S(Z)jL > x] . (25) Intuitively, we suggest using that value of the IS parameter l for which the mean of S(Z) under the IS density matches the conditional mean of S(Z), given that portfolio losses exceed the threshold. In what follows we let l denote that suggested value of the IS parameter l, i.e., that value of l that solves Equation (24). Risks 2020, 8, 25 10 of 36 Remark 5. The first-stage IS weight associated with the so-described density is: ˆ ˆ L (Z) = exp(l S(Z) + K(l )) . (26) 1 x It is entirely possible—and quite common in the examples we consider in this paper—that K(l ) is not well-defined, in which case Equation (26) has infinite variance under f (recall Remark 1). At first glance it might seem absurd to consider IS densities whose associated weights have infinite variance, but as we discuss in Section 4.2 it is straightforward to circumvent this issue by trimming large first-stage IS weights . It remains to develop a tractable approximation to the right hand side of Equation (24), so that we can approximate the value of l . To this end we write the natural sufficient statistic as S(z) = (S (z), . . . , S (z)) and note that: E[S (Z) 1 ] ¯ E[S (Z) P(L > xjZ)] i fL >xg i N E[S (Z)jL > x] = = . i N ¯ ¯ P(L > x) E[P(L > xjZ)] N N Next, we use the LDA in Equation (13) to get: E[S (Z) exp(Nq(x, Z))] E[S (Z)jL > x]  . (27) i N E[exp(Nq(x, Z))] As it only involves the systematic risk factors (and not the large number of idiosyncratic risk factors), the expectation on the right hand side of Equation (27) is amenable to either quadrature or Monte Carlo simulation. 3.3.2. Individual Losses We encourage the reader unfamiliar with exponential tilts to consult Appendix A.3, before reading the remainder of this section. Our approximation to g (`jz) is obtained by using the LDA of Equation (13) to approximate both conditional probabilities appearing in Equation (16) (see Appendix A.4 for details). The resulting approximation is: ˆ ˆ g ˆ (`jz) := exp(q` k(q, z)) g(`jz) , (28) where we recall that q is defined and discussed in Section 2.3. If the realised values of the systematic risk factors obtained in the first stage lie in the region of interest then q = 0 and g ˆ is identical to g. Otherwise, q is strictly positive and g ˆ is the exponentially tilted version of g whose mean is x. Intuitively, we can interpret g ˆ as that density that most resembles (in the sense of minimum divergence) g , among all densities whose mean is at least x, and the numerical value of q as the degree to which the density g(jz) must be deformed, in order to produce a density whose mean is at least x. Remark 6. The mean of Equation (28) is max(m(z), x). The implication is that the event of interest is not a rare event under the proposed IS algorithm. Indeed, E [L ] = E [E [L jZ]] = E [E [L jZ]] = E [max(x, m(Z))]  x , I S i I S I S i f g ˆ i f ˆ ˆ l l which implies that lim P (L > x) = 1. N!¥ I S N An alternative to trimming is truncation of large weights; see Ionides (2008) for a general and rigorous treatment of truncated IS. Risks 2020, 8, 25 11 of 36 The second-stage IS weight associated with Equation (28) is: ˆ ˆ ˆ ¯ ˆ L (Z, L) = exp(q L + k(q, z)) = exp(N[q L k(q, Z)]) . 2 i N i=1 ¯ ¯ Since the second stage weight depends only on Z and L , we will often write L (Z, L ) instead of N 2 N L (Z, L). In order to assess the stability of the second-stage IS weight, we note that: ˆ ¯ ˆ ˆ ¯ exp(N[q L k(q, Z)]) = exp(q N[L x]) exp(Nq(x, Z)) . N N ¯ ¯ If Z lies in the region of interest then q = q = 0, whence L (Z, L ) = 1 whatever the value of L . 2 N N ˆ ¯ ¯ Otherwise, both q and q are strictly positive, which implies that L (Z, L ) < 1 whenever L > x. The 2 N N net result of this discussion is that: ¯ ¯ L (Z, L )  1 whenever L > x . (29) 2 N N The implication is that large, unstable, IS weights in the second stage will never be a problem. If the realised value of z does lie in the region of interest then g ˆ and g are identical, and simulation from g is straightforward. Our final task is to determine how to sample from Equation (28) in the case where z does not lie in the region of interest. One approach would be to identify a family of densities fh (y) : z 2 R g such that L = L(z, Y ) is a draw from g ˆ (jz) whenever Y is a draw from h (), but z i i x i z this approach appears to be overly complicated. A simpler approach is to sample from Equation (28) using rejection sampling with g as the proposal density. To this end, we note that for fixed z, the ratio ˆ ˆ of g ˆ to g is exp(q` k(q, z)), which is bounded and strictly increasing on [0, ` ]. The best possible x max (i.e., smallest) rejection constant is therefore: ˆ ˆ c ˆ = c ˆ(x, z) := exp(q` k(q, z)) , (30) max and the algorithm for sampling from g ˆ would proceed as follows. First, sample Y from its actual density and set L = L(z, Y ). Then generate a random number U, uniformly distributed on [0, 1] and i i independent of Y . If, g ˆ (L jz) ˆ ˆ U  = exp(q(` L )) , max i c ˆ g(L jz) set L = L and proceed to the next exposure. Otherwise return to the first step and sample another i i pair (Y , U). 3.4. Summary and Intuition The proposed algorithm is summarised in Algorithm 3 below. The initial step is to approximate the value of the first-stage IS parameter, l . In our numerical examples we use a small pilot simulation (10% of the sample size that we eventually use to estimate p ) and the approximation of Equation (27) in order to estimate l . Having computed l , the first stage of the algorithm proceeds by simulating independent realisations of the systematic risk factors from the density f , and computing the associated first-stage weights of Equation (26). Recall that we can interpret these realisations as corresponding to different economic scenarios. Intuitively, sampling from f instead of f increases the proportion of adverse scenarios that are generated in the first stage. In the examples we consider, f concentrates most of its mass near the boundary of the region of interest, and the effect is to concentrate the distribution of m(Z) near x. In the second stage, one first checks whether or not the realised values of the systematic risk factors lie inside the region of interest. If they do then the event of interest is no longer rare and there is no need to apply further IS in the second stage. Otherwise, if we “miss” the region of interest in the Risks 2020, 8, 25 12 of 36 first stage, we “correct” this mistake by applying an exponential tilt to the conditional distribution of individual losses. Specifically, we transfer mass from the left tail of g to the right tail, in order to produce a density whose mean is exactly x. Algorithm 3 Proposed IS Algorithm for Estimating p 1: Compute l using a small pilot simulation. 2: Simulate M i.i.d. copies of the systematic risk factors from f (z) and compute the corresponding first-stage IS weights. Denote the realised values of the factors by z , . . . , z and the associated IS 1 M weights by L (z ), . . . , L (z ). 1 1 1 M 3: For each scenario m, determine whether or not z lies in the region of interest (i.e., whether or not m(z )  x). If it does lie in the region, proceed as follows: (a) Simulate the idiosyncratic risk factors for each exposure. Denote the simulated values by y , . . . , y . 1,m N,m 1 N ¯ ¯ (b) Set ` = L(z , y ), ` = å ` and L (z , ` ) = 1. m m 2 m m i,m i,m i,m N i=1 Otherwise, proceed as follows: ˆ ˆ ˆ ˆ ˆ ˆ (a) Compute q = q(x, z ), k = k(q, z ) and c ˆ = exp(q` k). For each exposure i: m m max (i) Simulate the exposure’s idiosyncratic risk factor (denote the realised value by y ˆ ) and i,m set ` = L(z , y ). i,m i,m (ii) Simulate a random number drawn uniformly from the unit interval (denote the realised ˆ ˆ ˆ value by u) and determine whether or not u  exp(q(` ` )). If it is, set ` = ` max i,m i,m i,m and proceed to the next exposure. Otherwise, return to step (i). 1 N ¯ ¯ ˆ ¯ ˆ (b) Set ` = å ` and L (z , ` ) = exp(N[q` k]) m 2 m m m i,m N i=1 1 M 4: Return pb = L (z ) L (z , ` ) 1 . å ¯ x 1 m 2 m m m=1 f` >xg M m 4. Practical Considerations In this section we discuss some of the practical issues that arise when implementing the proposed methodology. 4.1. One- and Two-Stage Estimators The rejection sampling procedure employed in the second stage of the proposed algorithm involves repeated evaluation of q, which requires a non-trivial amount of computational time time. In addition, rejection sampling in general requires relatively complicated code. As such, it is worth considering a simpler algorithm that only applies importance sampling in the first stage, and is therefore easier to implement and faster to run. In what follows we will distinguish between one- and two-stage IS algorithms. A one-stage algorithm only applies IS in the first stage and samples (Z, L) from the IS density: h (z, `) := f (z) g(` jz) . (31) 1S ˆ i i=1 The associated IS weight is L (z) and the one-stage algorithm is summarised in Algorithm 4 below. Note the simplicity of Algorithm 4, relative to Algorithm 3. The two-stage algorithm applies IS in both the first stage and the second stage, sampling (Z, L) from the IS density: h (z, `) := f (z) g (` jz) . (32) 2S ˆ x Õ i i=1 Risks 2020, 8, 25 13 of 36 The associated IS weight is L (z) L (z, ` ), and the two-stage algorithm was summarised previously 1 2 N in Algorithm 3. Algorithm 4 Proposed One-Stage IS Algorithm for Estimating p 1: Compute l using a small pilot simulation. 2: Simulate M i.i.d. copies of the systematic risk factors from f (z) and compute the corresponding first-stage IS weights. Denote the realised values of the factors by z , . . . , z and the associated IS 1 M weights by L (z ), . . . , L (z ). 1 1 1 M 3: For each scenario m: (a) Simulate the idiosyncratic risk factors for each exposure. Denote the simulated values by y , . . . , y . 1,m N,m 1 N (b) Set ` = L(z , y ) and ` = ` . i,m m i,m m i,m i=1 1 M 4: Return pb = L (z ) 1 . x å m ¯ M m=1 f` >xg Although it is simpler to implement and faster to run, the one-stage algorithm is less accurate than the two-stage algorithm. More precisely, the two-stage estimator never has larger variance than the one-stage estimator. To see this, first let E denote expectation under the one-stage IS density 1S h (z, `) given in Equation (31). Then the variance of the one-stage estimator is: 1S 2 2 E [(L (Z) 1 ) ] p 1S 1 x fL xg where M denotes sample size. And if we let E denote expectation under the two-stage IS density 2S h (z, `) given in Equation (32) then the variance of the two-stage estimator is: 2S 2 2 E [(L (Z) L (Z, L ) 1 ) ] p 2S 1 2 N fL xg x In order to compare variances it suffices to compare the second moments appearing above under the actual density h(z, `), and we let E denote expectation with respect to this density. To this end we note that: E [(L (Z) 1 ) ] = E[L (Z) 1 ] ¯ ¯ 1S 1 1 fL xg fL xg N N and ¯ ¯ E [(L (Z) L (Z, L ) 1 ) ] = E[L (Z) L (Z, L )] 1 ] 2S 1 2 N ¯ 1 2 N ¯ fL xg fL xg N N In light of Equation (29) we get that: L (Z, L ) 1  1 1 = 1 , (33) 2 N ¯ ¯ ¯ fL >xg fL >xg fL >xg N N N whence ¯ ¯ E [(L (Z) L (Z, L ) 1 ) ] = E[L (Z) L (Z, L ) 1 ] ¯ ¯ 2S 1 2 N 1 2 N fL xg fL xg N N E[L (Z) 1 ] 1 fL xg = E [(L (Z) 1 ) ] . 1S 1 ¯ fL xg The two-stage estimator will therefore never have larger variance than the the one-stage estimator. 4.2. Large First-Stage Weights In the examples that we consider in this paper, the systematic risk factors are Gaussian. When selecting their IS density, one could either (i) shift their means and leave their variances (and correlations) unchanged or (ii) shift their means and adjust their variances (and correlations). In general Risks 2020, 8, 25 14 of 36 the latter approach will lead to a much better approximation to the ideal density f , but could lead to an IS weight that has infinite variance. By contrast, the former approach will always lead to an IS weight with finite variance, but could lead to a poor approximation of the ideal density. At first glance it might seem absurd to consider IS densities whose weights are so unstable as to have infinite variance, but we have found that adjusting the variances of the systematic risk factors can lead to more effective estimators, in terms of both statistical accuracy and run time (see Section 6.1 for more details), provided one stabilises the resulting IS weights in some way. In the remainder of this section we describe a simple stabilisation technique that leads to a computable upper bound on the associated bias (an alternative would be to stabilize unruly IS weights via truncation, as discussed in Ionides (2008)). Returning now to the general case, suppose that the first-stage IS parameter, l , is such that the first-stage IS weight, L (Z), has infinite variance. We trim large first-stage weights by fixing a set A  R such that L () is bounded over A, and discarding those simulations for which Z 2 / A. Specifically, the last line of Algorithm 3 would be altered to return the trimmed estimate: p = L (z ) L (z, ` ) 1  1 , x 1 m 2 m ¯ å fz 2 Ag f` >xg m m=1 etc. for Algorithm 4. The variance of the so-trimmed estimator is necessarily finite (recall that ¯ ¯ L (z, `)  1 if ` > x), and its bias is: ¯ ¯ E [L (Z) L (Z, L ) 1  1 ] = E[1  1 ] = E[P(L > xjZ) 1 ] , ¯ ¯ 2S 1 2 N N fL >xg fZ2 / Ag fL >xg fZ2 / Ag fZ2 / Ag N N where we have used the tower property (conditioning on Z) to obtain the last equality. Using Chernoff’s bound in Equation (11) we get that: E[P(L > xjZ) 1 ]  E[exp(Nq(x, Z)) 1 ] . (34) fZ2 / Ag fZ2 / Ag As it only depends on the small number of systematic risk factors, and not the large number of idiosyncratic risk factors, the right-hand side of Equation (34) is a tractable upper bound on the bias committed by trimming large (first-stage) IS weights. This upper bound can be used to assess whether or not the bias associated with a given set A is acceptable. 4.3. Large Rejection Constants The smaller the c ˆ, the more efficient is the rejection sampling algorithm employed in the second stage. Indeed the average number of proposals that must be generated in order to obtain one realisation from g ˆ is 1/c ˆ. In the examples we consider in this paper, c ˆ is (essentially) a decreasing function m(z), such that c ˆ ! 1 as m(z) ! x and c ˆ ! ¥ as m(z) ! 0 (see Figure 1). The second-stage rejection algorithm is therefore quite efficient when m(z)  x and quite inefficient when m(z)  0. Now, the IS density for the first-stage risk factors is such that the distribution of m(Z) concentrates most of its mass near x (where c ˆ is a reasonable size), but it is still theoretically possible to obtain a realisation of the systematic risk factors for which m(z) is very small and c ˆ is unacceptably large (e.g., 10 ). In such situations the algorithm effectively grinds to a halt, as one endlessly generates proposed losses that have no realistic chance of being accepted. It is extremely unlikely that one does obtain such a scenario under the first-stage IS distribution, but it is still important to protect oneself against this unlikely event. To this end we suggest fixing some maximum acceptable rejection constant c , and only max applying the second stage IS to those first-stage realizations for which m(z) < x and c ˆ  c . In other max words, even if the realised values of the systematic risk factors lie outside the region of interest, we avoid applying the second stage if the associated rejection constant exceeds the predefined threshold. Risks 2020, 8, 25 15 of 36 4.4. Computing q ˆ ˆ Repeated evaluations of q(x,) are necessary when computing l at the outset of the algorithm, as well as during the second stage of the two-stage algorithm. Recall that in order to compute q(x, z) “exactly” one must numerically solve the equation k (q, z) = x, which requires a non-trivial amount of CPU time. As each evaluation of q is relatively costly, repeated evaluation would, in the absence of any further approximation (over and above that inherent in numerical root-finding), account for the vast majority of the algorithm’s total run time. In order to reduce the amount of time spent evaluating q we fit a low degree polynomial to the function q(x,) that can be evaluated extremely quickly, considerably reducing total run time. Specifically, suppose that we must compute q(x, z ) for each of n points z , . . . , z (either the sample n 1 n points from the pilot simulation, or the first-stage realisations that did not land in the region of interest). We identify a small set C  R that contains each of the n points, construct a mesh of m << n points in C, evaluate q exactly at each mesh point, and then fit a fifth degree polynomial to the ¯ ¯ ¯ resulting data. Letting q(x,) denote the resulting polynomial, we then evaluate q(x, z ), . . . , q(x, z ) 1 n ˆ ˆ instead of q(x, z ), . . . , q(x, z ). If m is substantially smaller than n, then the reduction in CPU time 1 n is considerable. 5. PD-LGD Correlation Framework All of the PD-LGD correlation models listed in the introduction are special cases of the following general framework—an observation that, to the best of our knowledge, has not been made in the literature. The systematic risk factors take the form Z = (Z , Z ), where Z and Z are bivariate D L D L normal with standard normal margins and correlation r . Idiosyncratic risk factors take the form Y = S i (Y , Y ), where Y and Y are bivariate normal with standard normal margins and correlation r . i,D i,L i,D i,L I Associated with each exposure is a default driver X and a loss driver X , defined as follows: i,D i,L X = a Z + 1 a Y , (35) D D i,D i,D X = a Z + 1 a Y . (36) i,L L L i,L The factor loadings a and a are constants taking values in the unit interval, and dictate the relative D L importance of systematic risk versus idiosyncratic risk. The correlation between default drivers of 2 2 distinct exposures is r := a and the correlation between loss drivers of distinct exposures is r := a . D L D L The correlation between the default and potential loss drivers of a particular exposure is: q q 2 2 r := a a r + 1 a 1 a r , D L D L S I D L which can be positive or negative (or zero). Note that if r and r have the same sign then, since both S I factor loadings are positive, r inherits this common sign. D L The realised loss on exposure i is L = D L , where: i i i D = 1 1 fX F (P)g i,D is the default indicator associated with exposure i and L = h(X ) i i,L is called the potential loss (our terminology) associated with exposure i. Here P denotes the common default probability of all exposures and h is some function from R to [0, ` ]. It is useful (but not max necessary) to think of potential loss as L = max(0, 1C ), where C is the value of the collateral i i i pledged to exposure i expressed as a fraction of the loan’s notional value. Risks 2020, 8, 25 16 of 36 Models in this framework are characterised by (i) the correlation structure of the risk factors, specifically restrictions on the values of r and r , and (ii) the marginal distribution of potential loss. I S For instance: Frye (2000) assumes perfect systematic correlation (r = 1) and zero idiosyncratic correlation (r = 0); Pykhtin (2003) assumes perfect systematic correlation (r = 1) but allows for arbitrary idiosyncratic correlation (r unrestricted); Witzany (2011) allows for arbitrary systematic correlation (r unrestricted) but insists on zero idiosyncratic correlation (r = 0); Miu and Ozdemir (2006) allow for arbitrary systematic correlation (r unrestricted) and arbitrary idiosyncratic correlation (r unrestricted). Note that if r = 1 then the systematic risk factor is effectively one-dimensional. Indeed if r = 1 then Z = (Z, Z) from some standard Gaussian variable Z, and if r = 1 then Z = (Z,Z). S S We refer to the case jr j = 1 as the one-factor case, and the case jr j < 1 as the two-factor case. In the S S one-factor case we use Z, and not Z, to denote the systematic risk factor. The first two models listed above are one-factor models, the last two are two-factor models. The marginal distribution of potential loss is determined by the specification of the function h. For instance: Frye (2000) specifies h(x) = max(0, 1 a(1 + bx)) for constants a 2 R and b > 0. Potential loss takes values in [0, ¥). Its density has a point mass at zero and is proportional to a Gaussian density on (0, ¥). Since L is not constrained to lie in the unit interval, this specification violates the assumptions made in Section 2.3; a+bx Pykhtin (2003) specifies h(x) = max(0, 1 e ) for constants a 2 R and b > 0. Potential loss takes values in [0, 1). Its density has a point mass at zero, and is proportional to a shifted lognormal density over (0, 1); Witzany (2011) and Miu and Ozdemir (2006) both specify h(x) = B (F(x)), where a, b > 0 and a,b B denotes the cdf of the beta distribution with parameters a and b. Potential loss takes values in a,b (0, 1). It is a continuous variable and follows a beta distribution. The sign of r and the nature of the function h (increasing or decreasing) will in general D L determine the sign of the relationship between D and L . If r > 0 then the relationship will be i i D L positive [negative] provided h is decreasing [increasing], and vice versa if r < 0. D L 5.1. Computing m(z) 2 T Here vectors z 2 R take the form z = (z , z ) . In order to obtain an expression for m(z) = D L E[L jZ = z], we begin with the observation that: E[L jZ] = E[L D jZ] = E[L E[D jX , Z]jZ] = E[L P(D = 1jX , Z)jZ] . i i i i i i,L i i i,L Thus, m(z) = h(x ) F(d, m(x , z), v) f(x , a z , 1 a ) dx , (37) L L L L L L where 1 a m(x , z) := a z + r   (x a z ) L D D I L L L 1 a and 2 2 v = v(x , z) := (1 a )(1 r ) D I Risks 2020, 8, 25 17 of 36 are the conditional mean and variance of X , respectively, given that (X , Z) = (x , z). In general i,D i,L L m(z) must be evaluated using quadrature, and doing so is straightforward . On average (across parameter values and points z 2 R ) a single evaluation of m() requires approximately one millisecond. In the one-factor case with r = 1 [r = 1] the expression for m(z) = E[L jZ = z] is obtained by S S i plugging z = (z, z) [z = (z,z)] into Equation (37). 5.2. Computing k(q, z) and q(x, z) 2 T Here again, vectors z 2 R take the form z = (z , z ) . In order to derive an expression for k(q, z) D L we begin with the observation that: q L qL qL i i i e = 1(D = 0) + e 1(D > 0) = 1 + (e 1) 1(D > 0) , i i i q L and since k(q, z) = log(E[e jZ = z]), we get that: qh(x ) 2 k(q, z) = log 1 + (e 1) F(d, m(x , z), v) f(x , a z , 1 a ) dx , (38) L L L L L where m(x , z) and v are given in the previous section. In the one-factor case with r = 1 [r = 1] L S S the expression for k(q, z) = log(E[exp(q L )jZ = z]) is obtained by plugging z = (z, z) [z = (z,z)] into Equation (38). As with m(z), k(q, z) must in general be evaluated using quadrature, which is straightforward. The time required for a single evaluation of k(q,) is comparable to that required for a single evaluation of m(). In order to compute q we must solve the equation k (q, z) = x with respect to q. Differentiating Equation (38) we get: qh(x ) 2 ¶k(q, z) h(x ) e  F(d, m(x , z), v) f(x , a z , 1 a ) dx L L L L L L 0 R L k (q, z) = = , (39) ¶q exp(k(q, z)) which is straightforward to compute using quadrature. A single evaluation of k (q, z) requires approximately twice as much time as a single evaluation of k(q, z). As the root of k (q, z) = x must be evaluated numerically, evaluating q is much more time consuming than evaluating k or k . Across parameter values and points z 2 R , and using q = 0 as an initial guess, the average time required for a single evaluation of q(x,) is slightly less than one tenth of one second. The right panel of Figure 1 illustrates the relationship between expected losses and the rejection ˆ ˆ constant employed in the second stage, c ˆ = exp(q k(q, z)). We see that c ˆ is essentially a decreasing ˆ ˆ function of m(z), such that c ! 1 as m(z) ! x and c ! ¥ as m(z) ! 0. The left panel of Figure 1 illustrates the graph of the LDA approximation P(L > xjZ = z)  exp(Nq(x, z)). The approximation is identically equal to one inside the region of interest, and decays to zero very rapidly outside the region. In other words, most of the variability in the function q(x,) occurs along, and just outside, the boundary of the region of interest. All calculations are carried out using Matlab 2018a on a 2015 MacBook Pro with 6.8 GHz Intel Core i7 processor and 16 GB (1600 MHz) of memory. Numerical integration is performed using the built-in integral function. We use the Matlab function fzero for the root-finding. Risks 2020, 8, 25 18 of 36 LDA Approximation to Conditional Tail Probability Expected Losses and Rejection Constant 0.8 0.6 0.4 0.2 2 -2 -2.5 -3 -2 -3.5 -4 -4 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 Figure 1. The left panel of this figure illustrates the relationship between expected losses m(z) and the second-stage rejection constant c ˆ = c ˆ(x, z), in the two-factor model. The right panel illustrates the graph of the LDA approximation of Equation (13). Parameters (randomly selected using the procedure in Section 5.3) in both panels are (P, r , r , r , r , a, b, N) = (0.0063, 0.3964, 0.2794, 0.3356, 0.7599, D L I S 0.6497, 0.5033, 134) and the threshold is x = 0.1575. Mean losses are E[L ] = 0.0029, and the probability that losses exceed the threshold x is on the order of 50 basis points. Points in the left panel were obtained by generating 1000 realizations of the systematic risk factors from their actual distribution (as opposed to the first-stage IS distribution) using the indicated parameter values. 5.3. Exploring the Parameter Space The model contains five parameters, in addition to any parameters associated with the transformation h. We are ultimately interested in how well the proposed algorithms perform across a wide range of different parameter sets. As such, in our numerical experiments we will randomly select a large number of parameter sets according to the procedure described below, and assess the algorithms’ performance for each parameter set. Generate the default probability P uniformly between 0% and 10%, and generate each of the 2 2 correlations r = a and r = a uniformly between 0% and 50%; D L D L In the one-factor model, generate r uniformly on f1, 1g, i.e., r takes on the value 1 or +1 S S with equal probability. If r = 1 we generate r uniformly between 0% and 100%, and if r = 1 S I S we generate r uniformly between100% and 0%. This allows us to control the sign of r , which I D L we must do in order to ensure a positive relationship between default and potential loss. In the two-factor model we randomly generated r uniformly on [1, 1]. If r is positive, randomly S S generate r uniformly on [0, 1], otherwise randomly generate r uniformly on [1, 0]; I I We choose the transformation h() to ensure that (i) potential loss is beta distributed and (ii) there is a positive relationship between default and loss. The paramters a and b of the beta distribution are generated independently from an exponential distribution with unit mean. If r < 0 we D L 1 1 set h(x) = B (F(x)) and if r > 0 we set h(x) = B (F(x)), where B () is the cumulative D L a,b a,b a,b distribution function for the beta distribution with parameters a and b. Note that under these restrictions, in the one-factor model the expected loss function m(z) is monotone decreasing. In order to ensure that we are considering cases of practical interest, we randomise the portfolio size and loss threshold as follows. Generate the number of exposures randomly between 10 and 5000; 1 q In the one-factor model we generate the threshold x by setting x = m(F (10 )), where q is uniformly distributed on [1, 5]. The LPA suggests that 1 q p = P(L > x)  P(m(Z) > x) = P(Z < m (x)) = 10 . x N Risks 2020, 8, 25 19 of 36 This means that log( p ), the order of magnitude of the probability of interested, is approximately uniformly distributed on [5,1]. In the two-factor model we set x = m(z ), where z = q q 1 1 (F (q), r F (q)) and q is uniformly distributed on [5,1]. 6. Implementation In this section we discuss our implementation of the algorithm proposed in Section 3 in the general framework outlined in Section 5. As the general framework encompasses many of the PD-LGD correlations that have been proposed in the literature, this section effectively discusses implementation of the proposed algorithm across a wide variety of models that are used in practice. 6.1. Selecting the IS Density for the Systematic Risk Factors The systematic risk factors here are Gaussian. When constructing their IS density we could either shift their means and leave their variances (and correlations) unchanged, or shift their means and adjust their variances (and correlations). Recall that the ultimate goal is to choose an IS density that closely resembles the ideal density f given in Equation (15). As illustrated in Figure 2, the ideal density f tends to be very tightly concentrated about its mean, and adjusting the variance of the systematic risk factors leads to a much better approximation to the ideal density for “typical values” of the ideal density. The left tail of the ideal density is, however, heavier than the variance-adjusted IS density, an issue that can be resolved by trimming large IS weights. Normal Approximation to Optimal Density Normal Approximation to Optimal Density 1.8 2.5 1.6 1.4 1.2 1.5 0.8 0.6 0.4 0.5 0.2 0 0 -5 -4.5 -4 -3.5 -3 -4 -3.5 -3 -2.5 -2 Figure 2. This figure illustrates f (in fact, the approximation of Equation (40)) for two randomly generated sets of parameters. Each panel superimposes (i) a normal density with the same mean and variance as f (dashed blue line), and (ii) a normal density with the same mean as f and unit variance x x (dash-dot red line). The mean and variance of f are computed via (computationally inefficient) quadrature. The mean and variance of f are computed using quadrature. Parameters in the right panel are (P, r , r , r , r , a, b, N) = (0.02, 0.33, 0.27, 0.96, 1, 2.47, 4.32, 454), and for the left panel they D L I S are (P, r , r , r , r , a, b, N) = (0.03, 0.13, 0.12, 0.85, 1, 1.81, 1.90, 271). In both cases, the transformation D L I S h is taken to be h(x) = B (F(x)). a,b The downside to adjusting the variance of the systematic risk factors is that it can lead to first-stage IS weights with infinite variance, but numerical evidence suggests that this issue can be mitigated by In the one-factor model, a tractable approximation to the ideal density can be obtained by using the LDA of Equation (13) to approximate both probabilities appearing in Equation (15). The result is: exp(Nq(x, z)) f(z) f (z)  , (40) exp(Nq(x, w)) f(w) dw and the right-hand side of Equation (40) can be approximated via quadrature. As the integrand involves q, the approximation is computationally very slow. Risks 2020, 8, 25 20 of 36 trimming large weights. Indeed, numerical experiments suggest that adjusting variance and trimming large weights leads to substantially more accurate estimators of p . Intuitively, it is more important for the IS density to mimic the behaviour of the ideal density over its “typical range”, as opposed to faithfully representing its tail behaviour. In addition to improving statistical accuracy, adjusting variance has the added benefit of making the second stage of the algorithm more computationally efficient in terms of run time. Indeed, as discussed in more detail in Section 6.3, adjusting variance tends to increase the proportion of first-stage simulations that land in the region of interest (thereby reducing the number of times the rejection sampling algorithm must be employed in the second stage) and reduces the average size of the rejection constants employed in the second stage (thereby making the rejection algorithm more effective whenever it must be employed). 6.2. First Stage In this section we explain how to efficiently approximate the parameters of the optimal IS density for the systematic risk factors, in both the one- and two-factor models. We also explain how we trim large IS weights, and demonstrate that the resulting bias is negligible. 6.2.1. Computing Parameters in the Two-Factor Model In the two-factor model the systematic risk factors are bivariate Gaussian with zero mean vector and covariance matrix: " # 1 r S = . r 1 The mean vector and covariance matrix that satisfy the criteria of Equation (25) are: m := E[ZjL > x] (41) I S N and S := E[(Z m )(Z m ) jL > x] , (42) I S I S I S N respectively. In order to approximate the suggested mean vector and covariance matrix we use Equation (27) to get: E[exp(Nq(x, Z)) Z] m  (43) I S E[exp(Nq(x, Z))] and E[exp(Nq(x, Z)) (Z m )(Z m ) ] I S I S S  . (44) I S E[exp(Nq(x, Z))] The expected values appearing on the right-hand sides of Equations (43) and (44) are both amenable to simulation, and we use a small pilot simulation of size M << M to approximate them. In our numerical examples, the size of the pilot simulation is 10% of the sample size that is eventually used to estimate p . Whether or not we adjust the variance of the systematic risk factor, the standard error of the resulting estimator is of the form n/ M, where n depends on the model parameters and is easily estimated via simulation. Using 100 randomly selected parameter sets from the one-factor model, selected according to the procedure described in Section 5.3, we find that for 0.03 the one-stage estimator n /n  1.54 p , where n denotes the value of n assuming we only shift the mean of the MS VA MS systematic risk factor and do not adjust its variance and n denotes the value when we do adjust variance. For probabilities VA in the range of interest, then, adjusting the variance of the systematic risk factor leads to an estimator that is nearly four times as efficient, in the sense that the sample size required to achieve a given degree of accuracy (as measured by standard error) is nearly four times larger if we do not adjust variance. As discussed in Appendix B, the natural sufficient statistic here consists of the components of Z plus the components of T T T ¯ ¯ ZZ . As such, in order to satisfy Equation (27) we must ensure that E [Z] = E[ZjL > x] and E [ZZ ] = E[ZZ jL > x], I S N I S N where E denotes mean under the IS distribution. These conditions are clearly equivalent to Equations (41) and (42).. I S Risks 2020, 8, 25 21 of 36 In order to implement the approximation we must first simulate the systematic risk factors and then compute q(x, z) for each sample point z. The most natural way to proceed is to (i) sample the systematic risk factors from their actual distribution (bivariate Gaussian with zero mean vector and covariance matrix S) and (ii) numerically solve the equation k (q, z) = x in order to compute compute q(x, z) for each pilot sample point z that lies outside the region of interest. In our experience this leads to unacceptably inefficient estimators, in terms of both (i) statistical accuracy and (ii) computational time. We deal with each issue in turn. As most of the variation in q(x,) occurs just outside the boundary of the region of interest (recall the right panel of Figure 1), we suggest using an IS distribution for the pilot simulation that is centered on the boundary of the region. Specifically, we suggest using that point on the boundary at which the density of the systematic risk factors attains its maximum value (i.e., the most likely point on the boundary): T 1 z := arg minfz S z : m(z) = xg . (45) The non-linear minimisation problem appearing above is easily and rapidly solved using standard techniques. We used fmincon function in Matlab. As z lies on the boundary of the region of interest, roughly half the pilot sample will lie outside the region. In Section 5.2 we noted that it takes nearly one tenth of one second to numerically solve the equation k (q, z) = x. As such, if we are to compute q exactly (i.e., by numerically solving the indicated equation) for each sample point that lies outside the region of interest, the total time required (in seconds) to estimate the first-stage IS parameters will be at least M /20. In our numerical examples we use a pilot sample size of M = 1000, which means that it would take nearly one full minute to compute the first-stage IS parameters. This discussion suggests that reducing the number of times we must numerically solve the equation k (q, z) = x could lead to a dramatic reduction in computational time. We suggest fitting a low degree polynomial to the function q(x,), over a small region in R that contains all of the pilot sample points that lie outside the region of interest. Specifically, we determine the smallest rectangle that contains all of the pilot sample points, and discretize the rectangle using a mesh of n points, equally spaced in each direction. Next, we identify those mesh points that lie outside the region of interest and compute q(x, z) exactly (i.e., by solving k (q, z) = x numerically) for each such point. Finally, we fit a polynomial to the resulting (z, q(x, z)) pairs and call the resulting function q(x,). Numerical evidence indicates at using a fifth-degree polynomial and a mesh with 15 = 225 points leads to a sufficiently accurate approximation to q(x,) over the indicated range (the intersection of (i) the smallest rectangle that contains all sample points and (ii) the complement of ¯ ˆ the region of interest). Note that q could be an extremely inaccurate approximation to q outside this range, but that is not a concern because we will never need to evaluate it there. It remains to compute q(x, z) for each of the pilot points z. For those points z that lie inside the region of interest, we set q(x, z) = 0. For those points that lie inside the region, we set q(x, z) = ¯ ¯ ¯ ¯ ¯ q x k(q, z), where q = q(x, z). Evaluating q(x,) requires essentially no computational time (it is a polynomial), and if the mesh size and degree are chosen appropriately the difference between q ¯ ˆ and q is very small. In total, the suggested procedure reduces the number of evaluations of q from M /2 to n /2, for a percentage reduction of n / M . In our numerical examples we use n = 15 and p g p g M = 1000, which corresponds to a reduction of 75% in computational time. To summarise, we estimate the optimal first-stage IS parameters as follows. First, we compute z . Second, we draw a random sample of size M from the Gaussian distribution with mean vector z and p x ¯ ˆ covariance matrix S. Third, we construct q(x,), the polynomial approximation to q(x,), as described Risks 2020, 8, 25 22 of 36 in the previous paragraph. Fourth, for those sample points z that lie outside the region of interest we ¯ ˆ compute q(x, z) using q instead q. The estimates of the optimal first-stage IS parameters are then: w(Z ) exp(Nq(x, Z ))Z m m m m=1 m ˆ = I S w(Z ) exp(Nq(x, Z )) m m m=1 and å w(Z ) exp(Nq(x, Z ))(Z m ˆ )(Z m ˆ ) m m m m I S I S m=1 S = , I S w(Z ) exp(Nq(x, Z )) m m m=1 where Z , . . . , Z is the random sample and 1 M f(z; 0, S) w(z) = f(z; z , S) is the IS weight associated with shifting the mean of the systematic risk factors from 0 to z . The upper left panel of Figure 3 illustrates a typical situation where the mean of the IS distribution lies “just inside” the region of interest. Region of Interest 4.5 3.5 2.5 1.5 0.5 -2.95 -2.9 -2.85 -2.8 -2.75 Figure 3. This figure illustrates the locations of (i) the importance sampling (IS) mean used for the pilot simulation and (ii) the IS mean used for the actual simulation, relative to the region of interest. Parameters (randomly selected using the procedure in Section 5.3) in both panels are (P, r , r , r , r , a, b, N) = (0.0063, 0.3964, 0.2794, -0.3356, -0.7599, 0.6497, 0.5033, 134) and the threshold D L I S is x = 0.1575. Mean losses are E[L ] = 0.0029. 6.2.2. Computing Parameters in the One-Factor Model The procedure described in the previous section specialises in the one-factor case as follows. First, under the parameter restrictions outlined in Section 5.3, the expected loss function m(z) is a strictly decreasing function of z. As such, the region of interest is the semi-infinite interval (¥, z ), where z := m (x), and its boundary is the single point z . In general z must be computed numerically, x x x which is straightforward. Second, we draw a random sample of size M from the Gaussian distribution with mean z and unit variance. Third, the polynomial approximation to q is constructed by evaluating q exactly (i.e., by numerically solving the equation k (q, z) = x) at each of n equally-spaced points z in the interval [z , z ], where z and z are the largest and smallest values obtained in the pilot + + simulation, respectively, and then fitting a polynomial to the resulting (z, q(x, z)) pairs. Fourth, we Risks 2020, 8, 25 23 of 36 evaluate q(x, z) for each pilot sample point z as follows—if z lies inside the region of interest we set q(x, z) = 0, otherwise we compute q(x, z) by replacing the exact value q(x, z) with the approximate ¯ ¯ value q(x, z), where q is the polynomial constructed in the previous step. Note that a single evaluation ¯ ˆ of q requires far less computational time than a single evaluation of q. Finally, the approximations to the first-stage IS parameters are: w(Z ) exp(Nq(x, Z ))Z m m m m=1 m ˆ = I S w(Z ) exp(Nq(x, Z )) å m m m=1 and w(Z ) exp(Nq(x, Z ))(Z m ˆ ) m m m I S 2 m=1 s = , I S w(Z ) exp(Nq(x, Z )) å m m m=1 where Z , . . . , Z is the random sample and 1 M f(z; 0, 1) w(z) = . f(z; z , 1) is the IS weight associated with shifting the mean of the systematic risk factor from 0 to z . 6.2.3. Trimming Large Weights In the one-factor model the first-stage IS weight will have infinite variance whenever s < 0.5 I S (see Remark A1 in Appendix B). In a sample of 100 parameter sets, randomly selected according to the procedure in Section 5.3, the largest realised value of s was 0.38, and the mean and median were 0.11 I S and 0.09, respectively. It appears, then, that the first-stage IS weight in the one-factor model will have infinite variance in all cases of practical interest. We trim large weights as described in Section 4.2, using the set: A = fz 2 R : jz m ˆ j  Cs g I S I S for some constant C. In the numerical examples that follow we use C = 4, in which case we expect to trim less than 0.01% of the entire sample. Specialising Equation (34) to the present context, we get that an upper bound on the associated bias is given by: exp(Nq(x, z))f(z) dz , (46) which is straightforward (albeit slow) to compute using quadrature. Figure 4 illustrates the relationship between the probability of interest p and the upper bound of Equation (46) for the 100 randomly generated parameter sets, and clearly demonstrates that the bias associated with our trimming procedure is negligible. For instance, for probabilities on the order of 10 the bias is no larger than 10 , or 1% of the quantity of interest. In the two-factor model the first-stage IS weight will have infinite variance whenever det(2S S ) < 0. In a random sample of 100 parameter sets, this condition occurred 96 times. As in the I S one-factor model, then, the first-stage IS weight in the two-factor model can be expected to have infinite variance in most cases of practical interest. We trim large weights using the set: n o 2 T 1 2 A = z 2 R : (z m ˆ ) S (z m ˆ )j  C I S I S I S for some constant C, and use C = 4 in the numerical examples that follow. Risks 2020, 8, 25 24 of 36 -3 -4 -5 -6 -7 -1 -2 -3 -4 -5 10 10 10 10 10 Figure 4. This figure illustrates the bias introduced by trimming large weights (vertical axis) as a function of the probability of interest (horizontal axis), for 100 randomly generated parameter sets in the one-factor case. For each set, we compute bias (in fact, an upper bound on the bias) by using quadrature to approximate Equation (46) and estimate the probability of interest using the full two-stage algorithm. 6.3. Second Stage The first stage of the algorithm consists of (i) computing the first-stage IS parameters, (ii) simulating a random sample of size M from the systematic risk factors’ IS distribution, and (iii) computing the associated IS weights, trimming large weights appropriately. Having completed these tasks, the next step is to simulate individual losses in the second stage. In the remainder of this section we let z = (z , z ) denote a generic realisation of the systematic risk factors obtained in the D L first stage. 6.3.1. Approximating q Before generating any individual losses first construct the polynomial approximation to q, using the same procedure described in Section 6.2.1. The basic idea is to fit a relatively low degree polynomial to the surface of q(x,), over a small region that contains all of the first-stage sample points. The values of z obtained in the pilot sample are invariably different from those obtained in the first stage, so it is ¯ ˆ essential that the polynomial is refit to account for this fact. In what follows we use q to approximate q whenever the numerical value of q is required, but since the difference between the two is small we do ˆ ¯ not distinguish between the two (i.e., we write q in this document, but use q in our code). 6.3.2. Sampling Individual Losses In this section we describe how to sample individual losses in the two-factor model. The procedure carries over in an obvious way to the one-factor model, so we do not discuss that case explicitly. If z lies inside the region of interest then the second stage is straightforward. For a given exposure i, we first simulate the exposure’s idiosyncratic risk factors Y = (Y , Y ), from the bivariate normal i i,D i,L distribution with standard normal margins and correlation r . Next, we set: q q 2 2 (X , X ) = (a z + 1 a Y , a z + 1 a Y ) . i,D i,L D D i,D L L i,L D L Risks 2020, 8, 25 25 of 36 If X > F (P) then the exposure did not default and we set L = 0 and proceed to the next exposure. i,D i Otherwise the exposure did default, in which case we must compute h(X ), set ` = h(x ) and i,L i i,L then proceed to the next exposure. Note that we only evaluate h for defaulted exposures—this is important since evaluating h requires numerical inversion of the beta cdf, which is relatively slow. Having computed the individual losses associated with each exposure, we then compute the average 1 N ¯ ¯ loss ` = N ` and set L (z, `) = 1. i 2 i=1 ˆ ˆ If z lies outside the region of interest we must compute q, k(q) and c, which we do approximately using the polynomial approximation q. We then sample from g ˆ (jz) as follows. First simulate the idiosyncratic risk factors Y = (Y , Y ) from the bivariate normal distribution with standard normal i i,D i,L margins and correlation r . Also generate a random number U, independent of Y . Then set: I i q q 2 2 (X , X ) = (a z + 1 a Y , a z + 1 a Y ) . i,D i,L D D i,D L L i,L D L ˆ ˆ If the exposure did not default we set L = 0, otherwise we compute h and set L = h(X ). Next we i i i,L check whether or not 1 g ˆ (L jz) x i ˆ ˆ U   = exp(q(` L )) (47) max c ˆ g(L jz) ˆ ˆ then accept L as a drawing from g ˆ , that is, set L = L and proceed to exposure i. Otherwise, draw i x i i another random number U and set of idiosyncratic factors. Once we have sampled the individual ¯ ¯ losses associated with each exposure we compute the average loss ` = N ` and set L (z, `) = i 2 i=1 ˆ ¯ ˆ ˆ exp(N[q` k(q, z)]), using the polynomial approximation to estimate the value of q. 6.3.3. Efficiency of the Second Stage The frequency with which the rejection sampling algorithm must be applied in the second stage is governed by P (m(Z) > x). The left panel of Figure 5 illustrates the empirical distribution of this I S probability across 100 randomly selected parameter sets. The distribution is concentrated towards small values (the median fraction is 27%) but does have a relatively thick right tail (the mean fraction is 35%). In some cases—particularly when the value of the parameter r is close to zero, in which case individual losses are very nearly independent and systematic risk is largely irrelevant—the vast majority of first-stage simulations require further IS in the second stage. The efficiency of the rejection sampling algorithm, when it must be applied, is governed by the conditional distribution of c ˆ = c ˆ(x, Z) given that m(Z) < x. For each of the 100 parameter sets we estimate E [c ˆ(x, Z)jm(Z) < x], which determines the average size of the rejection constant for a given I S set of parameters, by computing the associated value of c ˆ for each first-stage realisation that lies outside the region of interest and then averaging the resulting values. The right panel of Figure 5 illustrates the results, and we note that the mean and median of the data presented there are 1.09 and 1.17, respectively. The figure clearly indicates that the rejection sampling algorithm can be expected to be quite efficient, whenever it must be applied. The distributions of P (m(Z) < x) and E [c ˆ(x, Z)jm(Z) < x] across parameters depend heavily I S I S on whether or not we adjust the variance of the systematic risk factors in the first stage. When we do not adjust variance, the mean and median of P (m(Z) < x) (across 100 randomly selected parameter I S sets) rise to 49% and 45% (as compared to 35% and 27% when we do adjust variance), and the mean and median of E [c ˆ(x, Z)jm(Z) < x] rise to 18.6 and 1.8, respectively (as compared to 1.17 and 1.09 I S when we do adjust variance). Remark 7. If we do not adjust the variance of the systematic risk factors in the first stage, then (i) the rejection sampling algorithm must be applied more frequently and (ii) is less efficient whenever it must be applied. As such, adjusting the variance of the systematic risk factors reduces the total time required to implement the two-stage algorithm. Risks 2020, 8, 25 26 of 36 0.45 0.8 0.4 0.7 0.35 0.6 0.3 0.5 0.25 0.4 0.2 0.3 0.15 0.2 0.1 0.1 0.05 0 0 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 Figure 5. This figure illustrates the variation of P (m(Z) < x) (left panel) and E [c ˆ(x, Z)jm(Z) < x] I S I S (right panel) across model parameters. Recall that the former quantity determines the frequency with which the second-stage rejection sampling algorithm must be applied and the latter quantity determines the efficiency of the algorithm when it must be applied. For each of 100 parameter sets, randomly selected according to the procedure described in Section 5.3, we compute the first-stage IS parameters and then draw 10,000 realisations of the systematic risk factors from the variance adjusted first-stage IS density. The intuition behind this fact is as follows. First recall that the mean of the systematic risk factors tends to lie just inside the region of interest (recall Figure 3). In such cases the effect of reducing the variance of the systematic risk factors is to concentrate the distribution of Z just inside the boundary of the region of interest. Not only will this ensure that more first-stage realisations lie inside the region of interest (thereby reducing the fraction of points that require further IS in the second stage), it will also ensure that those realisations that lie outside the region (i.e., for which m(z) < x) do not lie “that far ” outside the region (i.e., that m(z) is not “that much less” than x), which in turn ensures that the typical size of c ˆ is relatively close to one (recall the left panel of Figure 1). 7. Performance Evaluation In this section we investigate the proposed algorithms’ performance in terms of statistical accuracy, computational time, and overall. Unless otherwise mentioned, we use a pilot sample size of M = 1000 to estimate the first-stage IS parameters and a sample size of M = 10,000 to estimate the probability of interest ( p ). We use the value C = 4 to trim large first-stage IS weights, and a value of c = 10 to x max trim large rejection constants. 7.1. Statistical Accuracy The standard error of any estimator that we consider is of the form n / M for some constant n that depends on the algorithm used and the model parameters. For instance, for the one-stage estimator in the two-factor case we have n = SD (L (Z) 1 ), where SD denotes standard x ¯ 1S 1 fL >xg 1S deviation under the one-stage IS density of Equation (31). Note that in the absence of IS we have 0.5 n = p (1 p )  p as p ! 0. x x x x Figure 6 illustrates the relationship between n and p using 100 randomly selected parameters x x sets, for the two-stage algorithm and in the two-factor case. Importantly, we see that (i) n seems to be a function of p (i.e., it only depends on model parameters through p ) and (ii) for small probabilities x x the functional relationship appears to be of the form n = a p for constants a and b. These features are also present in the case of the one-stage estimator, as well as for both estimators in the one-factor model. The numerical values of a and b are easily estimated using the line of best fit (on the logarithmic scale), and the estimated values for both the one- and two-factor cases are summarised in Table 1. Of particular note is the fact that the value of b is extremely close to one in every case. Risks 2020, 8, 25 27 of 36 Statistical Accuracy of Two-Stage Estimator -1 -2 -3 -4 -5 -1 -2 -3 -4 -5 10 10 10 10 10 Probability of Interest Figure 6. This figure illustrates the relationship between n and p , where n is the standard deviation x x x of L (Z)L (L , Z)1 under the two-stage IS density of Equation (32), in the two-factor case. 1 2 N fL xg The numerical values of p and n are estimated for each of 100 randomly generated parameters sets, x x according to the procedure described in Section 5.3. Table 1. This table reports fitted values of the relationship n  a p for each estimator (one- and two-stage) and each model (one- and two-factor). Values of a and b are obtained by determining the line of best fit on the logarithmic scale (i.e., the line appearing in Figure 6). Note that in the absence of 0.5 IS we would have n = p (1 p )  p . x x x One-Stage Algorithm Two-Stage Algorithm 0.98 0.99 One-Factor Model 0.91 p 0.81 p x x 0.98 0.98 Two-Factor Model 0.98 p 0.81 p x x Of particular interest in the rare event context is an estimator ’s relative error, defined as the ratio of its standard error to the true value of the quantity being estimated. For any of the estimators that we b1 consider, the component of relative error that does not depend on sample size is n / p  a p . In the x x absence of IS we have b 1 = 0.5, in which case relative error grows rapidly as p ! 0 (i.e., n ! 0 x x but n / p ! ¥ as p ! 0). By contrast, b  1 for any of our IS estimators, in which case there is weak x x x dependence of relative error on p . The minimum sample size required to ensure that an estimator ’s 2(b1) 2 2 2 2 relative error does not exceed the threshold e is v /( p e)  a p e . In the absence of IS we x x have b  0.5, in which case the sample size (and therefore computational burden) required to achieve a given degree of accuracy increases rapidly as p ! 0. By contrast, for all of our IS estimators we have b  1, in which case the minimum sample size (and computational burden) is nearly independent of p . Our ultimate goal is to reduce the computational burden associated with estimating p , in situations where p is small. To see how effective the proposed algorithms are in this regard, note that the sample size required to achieve a given degree of accuracy using the proposed algorithm, relative to that required to achieve the same degree of accuracy in the absence of IS, is approximately 2(b1) 2 2 a p e x 2 2b1 = a p , p e 2 2b1 which does not depend on e. Since a < 1 and b > 0.5 (recall Table 1), we have that a p < p . Standard Error (Scaled by Sample Size) Risks 2020, 8, 25 28 of 36 Remark 8. The relative sample size required to achieve a given degree of accuracy using the proposed algorithm, relative to that required in the absence of IS, is not larger than the probability of interest. For example, if the probability of interest is approximately 1%, then the proposed algorithm requires a sample size that is less than 1% of what would be required in the absence of IS (regardless of the desired degree of accuracy). And if the probability of interest is 0.1%, then the proposed algorithm requires a sample size that is less than 0.1% of what would be required in the absence of IS. In other words, the proposed algorithm is extremely effective at reducing the sample size required to achieve a given degree of accuracy. It is also insightful to compare the efficiency of the two-stage estimator, relative to the one-stage estimator. In the one-factor case, the minimum sample size required using the two-stage algorithm, relative to that required using the one-stage algorithm, is approximately: 0.02 2 0.66 p e x 0.02 = 0.80 p . 0.04 0.83 p e As p ranges from 1% to 0.01% the estimated relative sample size ranges from 0.73 to 0.67. In the two-factor case, the relative sample size is approximately 0.69, regardless of the value of p . Remark 9. In both the one- and two-factor models, the two-stage algorithm is more efficient than the one-stage algorithm, in the sense that it requires a smaller sample size in order to achieve a given degree of accuracy. Indeed, in cases of practical interest (probabilities in the range of 1% to 0.01%) the minimum sample size required to achieve a given degree of accuracy using the two-stage algorithm is roughly 70% of what would be required using the one-stage algorithm. 7.2. Computational Time Figure 7 illustrates the relationship between sample size (M) and run time (total time required to estimate p using a particular algorithm), for one randomly selected set of parameters. Across both models and algorithms, the relationship is almost perfectly linear. In the absence of IS the intercept is zero (i.e., run time is directly proportional to sample size), whereas the intercepts are non-zero for the IS algorithms. The non-zero intercepts are due to the overhead associated with (i) computing the first-stage IS parameters, which accounts for almost all of the difference between the intercepts of the solid (no IS) and dashed (one-stage IS) intercepts, and (ii) computing the second-stage polynomial approximation to q, which accounts for almost all of the difference between the intercepts of the the dashed (one-stage IS) and dash-dot (two-stage IS) lines. It is also worth noting that a given increase in sample size will have a greater impact on the run times for the IS algorithms than it will on the standard algorithm. This is because we only calculate h(X ) for defaulted exposures (evaluating h() i,L is slow because it requires numerical inversion of the beta distribution function), and the default rate is higher under the IS distribution. Across 100 randomly generated parameter sets, portfolio size (N) is most highly correlated with run time and the relationship is roughly linear. Table 2 reports summary statistics on run times, across algorithms and models. Table 2. This table reports summary statistics—in seconds, and across 100 randomly selected parameter sets—for total run time (first three columns), time required to estimate the first-stage IS parameters (fourth column) and time required to fit the second-stage polynomial approximation to q (final column). Average Run Times No IS One-Stage IS Two-Stage IS m , S q IS IS One Factor 7.3 25.6 33.7 1.5 0.8 Two Factor 7.4 39.0 55.5 14.3 8.9 Risks 2020, 8, 25 29 of 36 Sample Size and Run Time (One-Factor Model) Sample Size and Run Time (Two-Factor Model) 30 50 No IS One-Stage IS No IS Two-Stage IS One-Stage IS Two-Stage IS 15 25 0 0 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 Sample Size (M) Sample Size (M) Figure 7. This figure illustrates the relationship between sample size (M) and run time (total CPU time required to estimate p by a particular algorithm), using a set of parameters randomly selected according to the procedure described in Section 5.3. For each value of M we use a pilot sample that is 10% as large as the sample that is eventually used to estimate p (i.e., we set M = 0.1M). The left panel x p corresponds to the one-factor model and parameter values are (P, r , r , r , r , a, b) = (0.0827, 0.1000, D L I 0.3629,0.0180,1, 0.6676, 0.8751) and N = 2334. The right panel corresponds to the two-factor model and parameter values are (P, r , r , r , r , a, b) = (0.0241, 0.2322, 0.0343, 0.1650, 0.4135, 0.4056, 0.4942) D L I S and N = 3278. 7.3. Overall Performance Recall that the ultimate goal of this paper is to reduce the computational burden associated with estimating p , when p is small. The computational burden associated with a particular algorithm is a x x function of both its statistical accuracy and total run time. We have seen that the proposed algorithms are substantially more accurate, but require considerably more run time. In this section we demonstrate that the benefit of increased accuracy is well worth the cost of additional run time, by considering the amount of time required by a particular algorithm in order to achieve a given degree of accuracy (as measured by relative error). To begin, let t( M) denote the total run time required by a particular algorithm to estimate p using a sample of size M. As illustrated in Figure 7 we have t( M)  c + d M for constants c and d that depend on the underlying model parameters (particularly portfolio size, N) as well as the algorithm being used. In Section 7.1 we saw that the minimum sample size required to ensure that the estimator ’s relative error does not exceed the threshold e isL 2(b1) 2 2 M(e)  a p e , for constants a and b depending on the underlying model (one- or two-factor) and algorithm being used. Thus, if T(e) denotes the total CPU time required to ensure that the estimator ’s relative error does not exceed e, we have: 2(b1) 2 2 T(e)  c + da p e . (48) Table 3 contains sample calculations for several different values of p and e, using the data appearing in the left panel of Figure 7 to estimate c and d and the values of a and b implicitly reported in Table 1. The results reported in the table are representative of those obtained using different parameter sets. It is clear that the proposed algorithms can substantially reduce the computational burden associated with accurate estimation of small probabilities. For instance, if the probability of interest is on the order of 0.1% then either of the proposed algorithms can achieve 5% accuracy within 2–3 s, as compared to 4 min (80 times longer) in the absence of IS. Run Time (Seconds) Run Time (Seconds) Risks 2020, 8, 25 30 of 36 Table 3. This table reports the time (in seconds) required to achieve a given degree of accuracy (computed using Equation (48)) for several values of p and e, for the parameter values corresponding to the left panel of Figure 7. Note that this is for the one-factor model. Values of c and d are obtained from the lines of best fit appearing in the left panel of Figure 7, values of a and b are obtained from Table 1. No IS One-Stage IS (Two-Stage IS) p p x x 1% 0.1% 0.01% 1% 0.1% 0.01% e e 10% 6 60 600 10% 1.2 (2.3) 1.2 (2.3) 1.3 (2.4) 5% 24 240 2400 5% 1.8 (2.8) 1.9 (2.9) 1.9 (2.9) 1% 600 6000 60,000 1% 20.0 (18.8) 21.8 (19.6) 23.8 (20.4) The two-stage estimator is statistically more accurate (Section 7.1) but computationally more expensive (Section 7.2) than the one-stage estimator. It is important to determine whether or not the benefit of increased accuracy outweighs the cost of increased computational time. Table 3 suggests that, in some cases at least, implementing the second stage is indeed worth the effort, in the sense that it can achieve the same degree of accuracy in less time. Figure 8 illustrates the overall efficiency of the proposed algorithms, as a function of the desired degree of accuracy. Specifically, the left panel illustrates the ratio of (i) the total CPU time required to ensure the standard estimator ’s relative error does not exceed a given threshold to (ii) the total time required by the proposed algorithms, for a randomly selected set of parameter values in the one-factor model. The right panel illustrates the same ratio for a randomly selected set of parameters in the two-factor model. Overall Efficiency of Proposed Algorithms (One-Factor Model) Overall Efficiency of Proposed Algorithms (Two-Factor Model) 3500 400 No IS/One-Stage IS No IS/Two-Stage IS No IS/One-Stage IS No IS/Two-Stage IS 0 0 -1 -2 -3 -1 -2 -3 10 10 10 10 10 10 Desired Degree of Accuracy (Relative Error) Desired Degree of Accuracy (Relative Error) Figure 8. This figure illustrates the overall efficiency of the proposed algorithms. Specifically, the solid [dashed] line in the left panel illustrates the ratio of (i) the total run time (in seconds) required to ensure that the standard estimate’s relative error does not exceed a given threshold to (ii) the run time required by the one-stage [two-stage] algorithm, in the one-factor model. The right panel corresponds to the two-factor model. Parameter values are the same as in Figure 7 and Table 3. In the one-factor model, it would take hundreds of times longer to obtain an estimate of p whose relative error is less than 10%, and thousands of times longer to obtain an estimate whose relative error is less than 1%. The figure also suggests that, since it requires less run time to obtain very accurate estimates, the two-stage algorithm is preferable to the one-stage algorithm in the one-factor model. In the two-factor model—where estimating IS parameters and fitting the second-stage polynomial approximation to q is more time consuming—the proposed algorithms are hundreds of times more efficient than the standard algorithm. In addition, it appears that the one-stage algorithm is preferable to the two-stage algorithm in this case. Although the numerical values discussed here are specific to Relative Time Required Relative Time Required Risks 2020, 8, 25 31 of 36 the parameter set used to produce the figure, they are representative of other parameter sets. In other words, the behaviour illustrated in Figure 8 is representative of the general framework overall. 8. Concluding Remarks This paper developed an importance sampling (IS) algorithm for estimating large deviation probabilities for the loss on a portfolio of loans. In contrast to existing literature, we allowed loss given default to be stochastic and correlate with the default rate. The proposed algorithm proceeded in two stages. In the first stage one generates systematic risk factors from an IS distribution that is designed to increase the rate at which adverse macroeconomic scenarios are generated. In the second stage one checks whether or not the simulated macro environment is sufficiently adverse—if it is then no further IS is applied and idiosyncratic risk factors are drawn from their actual (conditional) probability distribution, if it is not then one indirectly applies IS to the conditional distribution of the idiosyncratic risk factors. Numerical evidence indicated that the proposed algorithm could be thousands of times more efficient than algorithms that did not employ any variance reduction techniques, across a wide variety of PD-LGD correlation models that are used in practice. Author Contributions: Both authors contributed equally to all parts of this paper. Both authors have read and agreed to the published version of the manuscript. Funding: This research was funded by NSERC Discovery Grant 371512. Acknowledgments: This work was made possible through the generous financial support of the NSERC Discovery Grant program. The authors would also like to thank Agassi Iu for invaluable research assistance. Conflicts of Interest: The authors declare no conflict of interest. Appendix A. Exponential Tilts and Large Deviations Let X , X , . . . , be independent and identically distributed random variable with common density 1 2 f (x), having bounded support [x , x ], and common mean m = E[X ]. For q 2 R we let min max m(q) = E[exp(q X )] and k(q) = log(m(q)) denote the common moment generating function (mgf) 0 0 and cumulant generating function (cgf) of the X , respectively. Note that m = m (0) = k (0). Appendix A.1. Properties of k(q) Elementary properties of cgfs ensure that k () is a strictly increasing function that maps R onto (x , x ). One implication is that, for fixed t 2 (x , x ), the graph of the function q 7! qt k(q) max max min min is \-shaped. The graph also passes through the origin, and its derivative at zero is t m. If this derivative is positive (i.e., if m < t) then the unique maximum is strictly positive and occurs to the right of the origin. If it is negative (i.e., if m > t) then the unique maximum is strictly positive and occurs to the left of the origin. If it is zero (i.e., if m = t) then the unique maximum of zero is attained at the origin. e e For a given t 2 (x , x ), there is a unique value of q for which k (q) = t. We let q = q(t) denote max min e e e this value of q. Note that q(t) is a strictly increasing function of t and that q(m) = 0. Thus q is positive b b e [negative] whenever t > m [t < m]. An important quantity in what follows is q = q(t) := max(0, q(t)), which can be interpreted as the unique value of q for which k (q) = max(m, t). Note that if t  m then b b q = 0, and if t > m then q(t) > 0. Appendix A.2. Legendre Transform of k(q) We let q() denote the Legendre transform of k() over [0, ¥). That is, ˆ ˆ q(t) := max(qt k(q)) = qt k(q) , (A1) q0 ˆ ˆ where q = q(t) was defined in the previous section, and is the (uniquely defined) point at which the function q 7! qt k(q) attains its maximum on [0, ¥). Based on the discussion in the preceding Risks 2020, 8, 25 32 of 36 ˆ ˆ paragraph, we see that q(t) = q(t) = 0 whenever m  t, whereas both q(t) and q(t) are strictly positive whenever m < t. The derivative of the transform q is demonstrably equal to: 0 0 0 ˆ ˆ ˆ q (t) = q(t) + q (t) [t k (q(t))] . ˆ ˆ Since q = 0 whenever t  m and k (q) = t whenever t > m, the second term above vanishes for all t, and we find that: q (t) = q(t) . (A2) Appendix A.3. Exponential Tilts For q 2 R we define: f (x) := exp(q x k(q)) f (x) . (A3) The density f is called an exponential tilt of f . As the value of the tilt parameter q varies, we obtain an exponential family of densities (exponential families have lots of very useful properties, and this is an easy way of constructing them). If q is positive then the right and left tails of f are heavier and thinner, respectively, than those of f . The opposite is true if q is negative. The larger in magnitude is q, the greater the discrepancy between f and f ; indeed the Kullback–Leibler divergence from f to f is q q qm + k(q), which is a strictly convex function of q that attains its minimum value (of zero) at q = 0. It is readily verified that k (q) = E [X ], where E denotes expectation with respect to f . This q i q q observation, in combination with the developments in Section A.1, implies that it is always possible to find a density of the form (A3) whose mean is t, whatever the t 2 (x , x ). Indeed f is precisely min max ˜ such a density. Under mild conditions, f () can be characterised as that density that most resembles f (in the sense of minimum divergence), among all densities whose mean is x (and are absolutely continuous with respect to f ). Recall that q is the unique value of q for which k (q) = max(t, m). We can therefore interpret f as that density that most resembles f , among all densities whose mean is at least t (and that are absolutely continuous with respect to f ). Note in particular hat the mean of f is max(m, t). The numerical value of q can therefore be interpreted as the degree to which we must deform the density f , in order to produce a density whose mean is at least t. If m  t then q = 0 and no adjustment is necessary. If m < t then q > 0 and mass must be transferred from the left tail to the right; the larger the discrepancy between m (the mean of f ) and t (the desired mean), the larger is q. Appendix A.4. Behaviour of X , Conditioned on a Large Deviation 1 N Let f (x) denote the conditional density of X , given that X > t, where X = X . t i N N i i=1 We suppress the dependence of f on N for simplicity. Using Bayes’ rule we get P(X > tjX = x) N i f (x) =  f (x) , P(X > t) and since the X are independent, we get tx ¯ ¯ P(X > tjX = x) = P(X > t + ) . N N1 N1 Now, using the large deviation approximation P(X  t)  exp(N q(t)), we get that P(X > tjX = x) i tx exp((N 1)q(t + ) + Nq(t)) . N1 P(X > t) N Risks 2020, 8, 25 33 of 36 Now if N is large then tx tx tx q(t + )  q(t) + q (t) = q(t) + q , N1 N1 N1 where we have used the fact that q (t) = q(t). Putting everything together we arrive at the approximation P(X > tjX = x) N i ˆ ˆ exp(q x k(q)) , P(X > t) which leads to the approximation ˆ ˆ f (x)  exp(q x k(q)) f (x) . (A4) We may thus interpret the conditional density f as that density which most resembles the unconditional density f , but whose mean is at least t. Appendix A.5. Approximate Behaviour of (X , X , . . . , X ), Conditioned on a Large Deviation 1 2 N ˆ ˆ Let f (x) = f (x , . . . , x ) denote the conditional density of (X , . . . , X ), given that X > t. Then t t 1 N 1 N N f (x ) i=1 f (x) = , x 2 A , t N,t where p = P(X > t) and A is the set of those points x 2 [x , x ] whose average value t N N,t min max exceeds t. We seek a density h(x), supported on [x , x ], which minimizes the Kullback-Leibler min max divergence (KLD) of h(x) := h(x ) Õ i i=1 ˆ ˆ from f . In other words, we seek an independent sequence Y , Y , . . . , Y (whose density is h) whose t 1 2 N behaviour most resembles (in a certain sense) the behaviour of X , X , . . . , X , conditioned on the 1 2 N large deviation X > t. ˆ ˆ Now let E denote expectation with respect to the density g. Then the divergence of h from f is g t ˆ ˆ E [log( f (X)/h(X))] = E [log ( f (X )/h(X ))] log( p ) ˆ t ˆ t å i i f f t t i=1 = N E [log ( f (X )/h(X ))] log( p ) ˆ 1 1 t = N E [log ( f (X )/h(X ))] log( p ) f 1 1 = N E [log ( f (X )/ f (X ))] + N E [log ( f (X )/h(X ))] log( p ) f 1 t 1 f t 1 1 t t t Now, the middle term in the above display is the KLD of h from f . As such it is non-negative, and is ˆ ˆ equal to zero if and only if h = f . It follows immediately that the divergence of h from f is minimised t t by setting h = f . Appendix B. Important Exponential Families This appendix considers two important special cases—the Gaussian and t families—of the general setting discussed in Section 2.2. Risks 2020, 8, 25 34 of 36 Appendix B.1. Gaussian Suppose first that the Z is Gaussian with mean vector m 2 R and positive definite covariance matrix S . When specifying the IS distribution, one can either (i) shift the mean of Z but leaves its covariance structure unchanged or (ii) shift its mean and adjust its covariance structure. In general the latter approach will lead to a better approximation of the ideal IS density but more volatile IS weights. If we take the former approach (shifting mean, leaving covariance structure unchanged), the implicit family in which we are embedding f is the Gaussian family with arbitrary mean vector m 2 R and fixed covariance matrix S . To this end, let f (z) = f(z; m , S ) denote the Gaussian density 0 0 0 with mean vector m and covariance matrix S and let f (z) = f(z; m, S ). It remains to identify the 0 0 l 0 natural sufficient statistic and write the natural parameter l in terms of the mean vector m. To this end, note that f (z) 1 1 T T 1 T 1 T 1 = exp (m m )S z m S m + m S m . 0 0 f (z) 2 2 The natural sufficient statistic is therefore S(z) = (z , . . . , z ) , the natural parameter is l(m) = S (m m ) . Note that we can write m(l) = m + S l, so that the natural parameter represents a sort of normalized 0 0 deviation from the actual mean m to the IS mean m. Lastly, we see that the cgf of S(Z) is h i 1 1 T 1 T 1 T T T K(l) = m S m m S m = l m + l S l , l 0 0 l 0 0 0 0 2 2 where we have written m instead of m(l) in the above display. Clearly, we have that both K(l) and K(l) are well-defined for all l 2 R . The implication is that if we shift the mean of Z but leave its covariance structure unchanged, the IS weight will have finite variance regardless of what IS mean we choose. If we take the former approach (shifting mean, adjusting covariance) the implicit family in which we are embedding f is the Gaussian family with arbitrary mean vector m and arbitrary positive definite covariance matrix S. In this case we have f (z) = f(z; m, S) and the ratio of f (z) to f (z) is l l T 1 T 1 T 1 1 exp((m S m S )z + z (S S )z K(m, S)) , 0 0 0 where h i T 1 T 1 K(m, S) = m S m m S m + log(det(S) log(det(S ) . 0 0 The natural sufficient statistic therefore consists of the d elements of the vector z plus the d elements of the vector zz . The natural parameter l consists of the elements of the vector 1 1 l := l (m, S) = S m S m 1 1 0 plus the elements of the matrix 1 1 l := l (S) = (S S ) . 2 2 Note that since we have assumed S is positive definite, we are implicitly assuming that the matrix l is such that the determinant of S l is strictly positive. The natural parameter space is therefore 2 0 unrestricted for l , but restricted (to matrices such that the indicated determinant is strictly positive) for l . 2 Risks 2020, 8, 25 35 of 36 The above relations can be inverted to write m and S in terms of l and l , indeed 1 2 1 1 S = S(l ) = (S 2l ) 2 2 and 1 1 1 m = m(l , l ) = (S 2l ) (l + S m ) . 1 2 2 1 0 0 0 The cgf of the natural sufficient statistic is 1 1 1 1 T 1 T 1 K(l) = K(l , l ) = K(m , S ) = m S m m S m + log(det(S )) log(det(S )) 1 2 l ,l l l ,l 0 l 0 1 2 2 l ,l 1 2 0 0 2 1 2 l 2 2 2 2 It is now clear that K(l) is well defined if and only if the determinant of S(l ) is strictly positive, which we have implicitly assumed to be the case since we have insisted S be positive definite. It is also clear that K(l) is well-defined if and only if the determinants S(l ) is strictly positive, which will occur if and only if the determinant of (2S S ) is strictly positive. Remark A1. Suppose that f and f are Gaussian densities with respective positive definite covariance matrices S and S. Further suppose that Z  f . Then the variance of f (Z)/ f (Z) is finite if and only if det(2S 0 l l S ) > 0. In the one-dimensional case d = 1 we write Z = Z. The condition in Remark A1 is satisfied 2 2 whenever s > s /2. In other words, if the variance of the IS distribution is too small, relative to actual variance of Z, then the IS weight will have infinite variance. Appendix B.2. Chi-Square Family In preparation for the multivariate t family, we first consider the chi-square family. Suppose that Z follows a chi-square distribution with n degrees of freedom, and that the goal is to allow Z to have arbitrary degrees of freedom n > 0 under the IS density. In order to identify the natural sufficient statistic S(z) and natural parameter l = l(n), we let f (z) denote the chi-square density with n degrees of freedom and f (z) the chi-square density with n degrees of freedom. Then 0 l f (z) n n n n n n l 0 0 0 = exp log(z) log(2) + log G log G f (z) 2 2 2 2 from which we see that S(z) = log(z) and l = l(n) = (n n )/2. In addition we see that the cgf of S(z) is K(l) = l log(2) + log (G (l + n /2)) log(G(n /2)) . 0 0 In order that K(l) be will defined, we require n > 0, which is obvious. In order that K(l) is well-defined we require l + be positive, which in turn requires n < 2n . In other words, if the IS degrees of freedom are more than twice the actual degrees of freedom, then the IS weight will have infinite variance. Appendix B.3. t Family The t family is not a regular exponential family, so it does not fit directly into the framework discussed in Section 2.2. That being said, a multivariate t vector can be constructed from a Gaussian vector and an independent chi-square variable. Indeed if Z is Gaussian with mean zero and covariance matrix S , and R is chi-square with n degrees of freedom (independent of Z), then 0 0 Z = m +  Z , (A5) is multivariate t with n degrees of freedom, mean m and covariance matrix S . 0 0 0 n 2 0 Risks 2020, 8, 25 36 of 36 In the case that Z is multivariate t, then, we can take our systematic risk factors to be the components of (Z, R). In this case the joint density of the systematic risk factors can be embedded into the parametric family T T f (z ˆ , r) := exp(l S(z ˆ) K(l)) exp(h T(r) L(h)) f (z ˆ) g(r) , (A6) l,h where l is and S are the natural parameter and sufficient statistic for the Gaussian family, h and L are those for the chi-square family, and f and g are the Gaussian and chi-square densities. References Bickel, Peter J., and Kjell A. Doksum. 2001. Mathematical Statistics: Basic Ideas and Selected Topics, 2nd ed. Upper Saddle River: Prentice Hall, Volume 1. Chan, Joshua C.C., and Dirk P. Kroese. 2010. Efficient estimation of large portfolio loss probabilities in t-copula models. European Journal of Operational Research 205: 361–67. Chatterjee, Sourav, and Persi Diaconis. 2018. The sample size required in importance sampling. Annals of Applied Probability 28: 1099–135. [CrossRef] de Wit, Tim. 2016. Collateral Damage—Creating a Credit Loss Model Incorporating a Dependency between Defaults and LGDs. Master ’s thesis, University of Twente, Enschede, The Netherlands. Deng, Shaojie, Kay Giesecke, and Tze Leung Lai. 2012. Sequential importance sampling and resampling for dynamic portfolio credit risk. Operations Research 60: 78–91. [CrossRef] Eckert, Johanna, Kevin Jakob, and Matthias Fischer. 2016. A credit portfolio framework under dependent risk parameters PD, LGD and EAD. Journal of Credit Risk 12: 97–119. [CrossRef] Frye, Jon. 2000. Collateral damage. Risk 13: 91-94. Frye, Jon, and Michael Jacobs Jr. 2012. Credit loss and systematic loss given default. Journal of Credit Risk 8: 109–140. [CrossRef] Glasserman, Paul, and Jingyi Li. 2005. Importance sampling for portfolio credit risk. Management Science 51: 1643–56. [CrossRef] Ionides, Edward L. 2008. Truncated importance sampling. Journal of Computational and Graphical Statistics 17: 295–311. [CrossRef] Jeon, Jong-June, Sunggon Kim, and Yonghee Lee. 2017. Portfolio credit risk model with extremal dependence of defaults and random recovery. Journal of Credit Risk 13: 1–31. [CrossRef] Kupiec, Paul H. 2008. A generalized single common factor model of portfolio credit risk. Journal of Derivatives 15: 25–40. [CrossRef] Miu, Peter, and Bogie Ozdemir. 2006. Basel requirements of downturn loss given default: Modeling and estimating probability of default and loss given default correlations. Journal of Credit Risk 2: 43–68. [CrossRef] Pykhtin, Michael. 2003. Unexpected recovery risk. Risk 16: 74–78. Scott, Alexandre, and Adam Metzler. 2015. A general importance sampling algorithm for estimating portfolio loss probabilities in linear factor models. Insurance: Mathematics and Economics 64: 279–93. Sen, Rahul. 2008. A multi-state Vasicek model for correlated default rate and loss severity. Risk 21: 94–100. Witzany, Jirí. ˇ 2011. A Two-Factor Model for PD and LGD Correlation. Working Paper. Available online: http://dx.doi.org/10.2139/ssrn.1476305 (accessed on 9 March 2020). c 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Risks Multidisciplinary Digital Publishing Institute

Importance Sampling in the Presence of PD-LGD Correlation

Risks , Volume 8 (1) – Mar 10, 2020

Loading next page...
 
/lp/multidisciplinary-digital-publishing-institute/importance-sampling-in-the-presence-of-pd-lgd-correlation-ottgXrJKOc

References (19)

Publisher
Multidisciplinary Digital Publishing Institute
Copyright
© 1996-2020 MDPI (Basel, Switzerland) unless otherwise stated Disclaimer The statements, opinions and data contained in the journals are solely those of the individual authors and contributors and not of the publisher and the editor(s). Terms and Conditions Privacy Policy
ISSN
2227-9091
DOI
10.3390/risks8010025
Publisher site
See Article on Publisher Site

Abstract

risks Article Importance Sampling in the Presence of PD-LGD Correlation 1, 2 Adam Metzler * and Alexandre Scott Department of Mathematics, Wilfrid Laurier University, Waterloo, ON N2L 3C5, Canada Department of Applied Mathematics, University of Western Ontario, London, ON N6A 3K7, Canada; alexandre.scott202@gmail.com * Correspondence: ametzler@wlu.ca Received:20 January 2020; Accepted: 5 March 2020; Published: 10 March 2020 Abstract: This paper seeks to identify computationally efficient importance sampling (IS) algorithms for estimating large deviation probabilities for the loss on a portfolio of loans. Related literature typically assumes that realised losses on defaulted loans can be predicted with certainty, i.e., that loss given default (LGD) is non-random. In practice, however, LGD is impossible to predict and tends to be positively correlated with the default rate and the latter phenomenon is typically referred to as PD-LGD correlation (here PD refers to probability of default, which is often used synonymously with default rate). There is a large literature on modelling stochastic LGD and PD-LGD correlation, but there is a dearth of literature on using importance sampling to estimate large deviation probabilities in those models. Numerical evidence indicates that the proposed algorithms are extremely effective at reducing the computational burden associated with obtaining accurate estimates of large deviation probabilities across a wide variety of PD-LGD correlation models that have been proposed in the literature. Keywords: importance sampling; acceptance-rejection sampling; portfolio credit risk; tail probabilities; large deviation probabilities; stochastic recovery; PD-LGD correlation; credit risk; loss probabilities 1. Introduction This paper seeks to identify computationally efficient importance sampling (IS) algorithms for estimating large deviation probabilities for the loss on a portfolio of loans. Related literature assumes that realised losses on defaulted loans can be predicted with certainty, i.e., that loss given default (LGD) is non-random. In practice, however, LGD is impossible to predict and tends to be positively correlated with the default rate and the latter phenomenon is typically referred to as PD-LGD correlation (here PD refers to probability of default, which is often used synonymously with default rate). There is a large literature on modelling stochastic LGD and PD-LGD correlation, but there is a paucity of literature on using importance sampling to estimate large deviation probabilities in those models. This gap in the literature was brought to our attention by a risk management professional at a large Canadian financial institution, and filling that gap is the ultimate goal of this paper. Problem Formulation and Related Literature Consider a portfolio of N exposures of equal size. Let L , L , . . . , L denote the losses on the 1 2 N individual loans, expressed as a percentage of notional value. The percentage loss on the entire portfolio is: L := L . (1) N å i i=1 Risks 2020, 8, 25; doi:10.3390/risks8010025 www.mdpi.com/journal/risks Risks 2020, 8, 25 2 of 36 We are interested in using IS to estimate large deviation probabilities of the form: p := P(L  x) , (2) x N where x >> E[L ] = E[L ] is some large, user-defined, threshold. i N In practice the number of exposures is large (e.g., in the thousands) and prudent risk management requires one to assume that the individual losses are correlated. In practice, then, L is the average of a large number of correlated variables. As such, its probability distribution is highly intractable and Monte Carlo is the method of choice for approximating p . As the probability of interest is typically 3 4 small (e.g., on the order of 10 or 10 ), the computational burden required to obtain an accurate estimate of p using Monte Carlo can be prohibitive. For instance if p is on the order of 10 and x x N is on the order of 1000 then, in the absence of any variance reduction techniques, the sample size required to reduce the estimator ’s relative error to 10% is on the order of one hundred thousand. Since each realisation of L requires simulation of one thousand individual losses, a sample size of 100, 000 requires one to generate one hundred million variables. If the desired degree of accuracy is reduced to 1%, the number of variables that must be generated increases to a staggering 10 billion. Importance sampling (IS) is a variance reduction technique that has the potential to significantly reduce the computational burden associated with obtaining accurate estimates of large deviation probabilities. In the present context, effective IS algorithms have been identified for a variety of popular risk management models, but most are limited to the special case that loss given default (LGD) is non-random. The seminal paper in the area is (Glasserman and Li 2005), other papers include (Chan and Kroese 2010) and (Scott and Metzler 2015). It is well documented empirically, however, that portfolio-level LGD is not only stochastic, but positively correlated with the portfolio-level default rate as seen, for instance, in any of the studies listed in (Kupiec 2008) or (Frye and Jacobs 2012). This phenomenon is typically referred to as PD-LGD correlation. (Miu and Ozdemir 2006) show that ignoring PD-LGD correlation when it is in fact present can lead to material underestimates of portfolio risk measures. There is a large literature on modelling PD-LGD correlation (Frye 2000); (Pykhtin 2003); (Miu and Ozdemir 2006); (Kupiec 2008); (Sen 2008); (Witzany 2011); (de Wit 2016); (Eckert et al. 2016); and others listed in (Frye and Jacobs 2012), but there is a much smaller literature on using IS to estimate large deviation probabilities in such models. To the best of our knowledge only (Deng et al. 2012) and (Jeon et al. 2017) have developed algorithms that allow for PD-LGD correlation (the former paper considers a dynamic intensity-based framework, the latter considers a static model with asymmetric and heavy-tailed risk factors). The present paper contributes to this nascent literature by developing algorithms that can be applied in a wide variety of PD-LGD correlation models that have been proposed in the literature, and are popular in practice. The paper is structured as follows. Section 2 outlines important assumptions, notation, and terminology. Section 3 theoretically motivates the proposed algorithm in a general setting, and Section 4 discusses a few practical issues that arise when implementing the algorithm. Section 5 describes a general framework for PD-LGD correlation modelling that includes, as special cases, many of the models that have been developed in the literature and Section 6 describes how to implement the proposed algorithm in this general framework. Numerical results are presented and discussed in Section 7 and demonstrate that the proposed algorithms are extremely effective at reducing the computational burden required to obtain an accurate estimate of p . Relative error is the preferred measure of accuracy for large deviation probabilities. If pb is an estimator of p , its relative x x error is defined as SD( pb )/ p , where SD denotes standard deviation. x x Risks 2020, 8, 25 3 of 36 2. Assumptions, Notation and Terminology We assume that individual losses are of the form L = L(Z, Y ), where L is some deterministic i i function, Z = (Z , . . . , Z ) is a d-dimensional vector of systematic risk factors that affect all exposures, and Y is a vector of idiosyncratic risk factors that only affect exposure i. We assume that Z, Y , Y , . . . i 1 2 are independent, and that the Y are identically distributed. The primary role of the systematic risk factors is to induce correlation among the individual exposures, and it is common to interpret the realised values of the systematic risk factors as determining the overall macroeconomic environment. It is worth noting that the we do not require the components of Z to be independent of one another, etc. for the components of Y . 2.1. Large Portfolios and the Region of Interest In a large portfolio, the influence of the idiosyncratic risk factors is negligible. Indeed, since individual losses are conditionally independent, given the realised values of the systematic risk factors, we have the almost sure limit: lim L = m(Z) , (3) N!¥ where m(z) := E[L jZ = z] = E[L jZ = z] . (4) i N Since m(Z)  L for large N by Equation (3), the random variable m(Z) is often called the large portfolio approximation (LPA) to L . The LPA is often used to formalise the intuitive notion that, in a large portfolio, all risk is systematic (i.e., idiosyncratic is “diversified away”). We define the region of interest as the set: fz 2 R : m(z)  xg . (5) The region of interest is “responsible” for large deviations in the sense that: lim P(m(Z)  xjL  x) = 1 (6) N!¥ for most values of x. Together, Equations (3) and (6) suggest that for large portfolios, it is relatively more important to identify an effective IS distribution for the systematic risk factors, as compared to the idiosyncratic risk factors. 2.2. Systematic Risk Factors We assume that Z is continuous and let f (z) denote its joint density. We assume that f is a member of an exponential family (see Bickel and Doksum 2001 for definitions and important properties) with d p natural sufficient statistic S : R 7! R . Any other member of the family can be put in the form: f (z) := exp(l S(z) K(l)) f (z) , (7) where K() is the cumulant generating function (cgf) of S(Z) and l 2 R is such that K(l) is well-defined. The parameter l is called the natural parameter of the family in Equation (7). Appendix B embeds the Gaussian and multivariate t families into this general framework. In light of the almost sure limit in Equation (3), we have that L converges to m(Z) in distribution, which implies that Equation (6) is valid for all values of x such that P(m(Z) = x) = 0. If m(Z) is a continuous random variable (which it is in most cases of practical interest) then Equation (6) is satisfied for every value of x. Risks 2020, 8, 25 4 of 36 We will eventually be using densities of the form in Equation (7) as IS densities for the systematic risk factors. The associated IS weight is: f (Z) = exp(l S(Z) + K(l)) , (8) f (Z) and it will be important to know when the variance of the IS weight is finite. The following observation is readily verified. Remark 1. If Z  f , then Equation (8) has finite variance if and only if both K(l) and K(l) are well defined. A standard result in the theory of exponential families is that: rK(l) = E [S(Z)] , (9) where r denotes gradient and E denotes expectation with respect to the density f . l l 2.3. Individual Losses We assume that L takes values in the unit interval. In general L will have a point mass at zero i i (if it did not, the loan would not be prudent) and the conditional distribution of L , given that L > 0, i i is called the (account-level) LGD distribution. We allow the LGD distribution to be arbitrary in the sense that it could be either discrete or continuous, or a mixture of both. This contrasts with the case of non-random LGD, where the LGD distribution is degenerate at a single point. We let ` 2 (0, 1] max denote the supremum of the support of L . Individual losses will therefore never exceed ` but could max take on values arbitrarily close (and possibly equal) to ` . max Remark 2. Despite the fact that L is not a continuous variable, in what follows we will proceed as if it was and make repeated reference to its “density." This is done without loss of generality, and in an interest of simplifying the presentation and discussion. Nothing in the sequel requires L to be a continuous variable, and everything carries over to the case where it is either discrete or continuous, or has both a discrete and continuous component. For z 2 R we let g(`jz) denote the conditional density of L , given that Z = z. We assume that the support of g(jz) is identical to the unconditional support, in particular it does not depend on the value of z. Note that m(z) is the mean of g(jz). In practice (i.e., for all of the PD-LGD correlation models listed in the introduction) g(jz) is not a member of an established parametric family, and direct simulation from g(jz) using a standard technique such as inverse transform or rejection sampling is not straightforward. Simulation from g(jz) is most easily accomplished by simulating the idiosyncratic risk factors, Y , from their density, say h(y), and then setting L = L(z, Y ). In other words, in order to simulate from g(jz) we make use i i of the fact that L = L(z, Y ) is a drawing from g(jz) whenever Y is a drawing from h(). i i i For q 2 R and z 2 R we let: k(q, z) := log(E[exp(q L )jZ = z]) and ¶k k (q, z) := (q, z) . ¶q Then k(, z) is the conditional cgf of L , given that Z = z, and k (, z) is its first derivative. In practice, neither k(, z) nor k (, z) is available in closed form. In the examples we consider later in the paper each can be expressed as a one-dimensional integral, but the numerical values of those integrals must Risks 2020, 8, 25 5 of 36 be approximated using quadrature. This contrasts with the case of non-random LGD, where the conditional cgf can be computed in closed form . d 0 For x 2 (0, ` ) and z 2 R we let q(x, z) denote the unique solution to the equation k (q, z) = max ˆ ˆ ˆ max(x, m(z)). We often suppress dependence on x and z, and simply write q instead of q(x, z). That q is well-defined follows immediately from the developments in Appendix A.1. Based on the discussion there we find that q is zero whenever z lies in the region of interest, and is strictly positive otherwise. Remark 3. In practice, the value of q cannot be computed in closed form and must be approximated using a numerical root-finding algorithm. Since each evaluation of the function k (, z) requires quadrature, computing ˆ ˆ q is straightforward but relatively time consuming. This contrasts with the case of non-random LGD, where q can be computed in closed form at essentially no cost. For z 2 R we let q(, z) denote the Legendre transform of k(, z) over [0, ¥). That is, ˆ ˆ q(x, z) := max(q x k(q, z)) = q x k(q, z) . (10) q0 That q is the uniquely defined point at which the function q 7! q x k(q, z) attains its maximum on [0, ¥) follows from the developments in Appendix A.2. Based on the discussion there, we find that both q and q are equal to zero whenever z lies in the region of interest, and that both are strictly positive otherwise. 2.4. Conditional Tail Probabilities Given the realised values of the systematic risk factors, individual losses are independent. Large deviations theory can therefore provide useful insights into the large-N behaviour of the tail probability P(L  xjZ = z). For instance, Chernoff’s bound yields the estimate: P(L > xjZ = z)  exp(Nq(x, z)) , (11) and Cramér ’s (large deviation) theorem yields the limit: log(P(L > xjZ = z)) lim = q(x, z) . (12) N!¥ N Together these results are often used to justify the approximation: P(L > xjZ = z)  exp(Nq(x, z)) , (13) which will be used repeatedly throughout the paper. The approximation in Equation (13) is often called the large deviation approximation (LDA) to the tail probability P(L > xjZ = z). Note that since q(x, z) = 0 whenever m(z)  x, the LDA suggests that P(L > xjZ = z)  1 whenever z lies in the region of interest. 2.5. Conditional Densities N d Let L = (L , . . . , L ), noting that L takes values in [0, ` ] . For z 2 R and ` = (` , . . . , ` ) 2 max 1 N 1 N [0, ` ] , we let h (z, `) denote the conditional density of (Z, L), given that L > x. Then h is max x N x given by: f (z) g(` jz) i=1 h (z, `) =  1 , (14) f`2 A g N,x 3 (1R)q In the case of non-random LGD we have k(q, z) = log(1 + (e 1) P(L > 0jZ = z)), where R is the known recovery rate on the exposure. Risks 2020, 8, 25 6 of 36 N 1 where A is the set of points ` 2 [0, ` ] for which N ` > x. N,x max i i=1 We let f (z) denote the conditional density of the systematic risk factors, given that L > x, noting that: P(L > xjZ = z) f (z) =  f (z) . (15) P(L  x) In the examples we consider the mean of f tends to lie inside, but close to the boundary of, the region of interest. And relative to the unconditional density f , the conditional density f tends to be much more concentrated about its mean. Finally, we let g (`jz) denote the conditional density of an individual loss, given that Z = z and L > x, noting that: x` P(L > x + jZ = z) N1 N1 g (`jz) =  g(`jz) . (16) P(L > xjZ = z) If the realised value of z lies inside the region of interest, the conditional density g (jz) tends to resemble the unconditional density g(jz). Intuitively, for such values of z the LDA informs that the event fL > xg is very likely, and conditioning on its occurrence is not overly informative. If the realised value of z does not lie in the region of interest then g (jz) tends to resemble the exponentially tilted version of g(jz) whose mean is exactly x. See Appendix A.3 for more details. Neither h , f , nor g are numerically tractable, but as we will soon see they do serve as useful x x x benchmarks against which to compare candidate IS densities. In addition, it is worth noting here that the representations of Equations (15) and (16) lend themselves to numerical approximation via the LDA in Equation (13). 3. Proposed Algorithm In practice, the most common approach to estimating p via Monte Carlo simulation in this framework is summarised in Algorithm 1 below. Algorithm 1 Standard Monte Carlo Algorithm for Estimating p 1: Simulate M i.i.d. copies of the systematic risk factors. Think of these as different economic scenarios and denote the simulated values by z , . . . , z . 1 M 2: For each scenario m: (a) Simulate the idiosyncratic risk factors for each exposure. Denote the simulated values y , . . . , y . 1,m N,m 1 N (b) Set ` = L(z , y ) for each exposures i, and ` = ` . i,m m i,m m i,m i=1 1 M 3: Return pb = 1 . å ¯ M m=1 f` >xg Algorithm 1 consists of two stages. In the first stage one simulates the systematic risk factors, and in the second stage one simulates the idiosyncratic risk factors for each exposure. Mathematically, the first stage induces independence among the individual exposures, so that the second stage amounts to simulating a large number of i.i.d. variables. Intuitively, it is useful to think of the first stage as determining the prevailing macroeconomic environment, which fixes economy-wide quantities such as default and loss-given-default rates. The second stage of the algorithm overlays idiosyncratic noise on top of economy-wide rates, to arrive at the default and loss-given-default rates for a particular portfolio. Relative error is the preferred measure of accuracy for estimators of rare event probabilities. The relative error of the estimator pb in Algorithm 1 is: 1 1 p p , M x Risks 2020, 8, 25 7 of 36 and the sample size required to ensure the relative error does not exceed some predetermined threshold e is: 1 1 p M(e) = . (17) e p The number of variables that must be generated in order to achieve the desired degree of accuracy e is therefore (N + d) M(e), which grows without bound as p ! 0. For instance if p = 10 , x x 3 2 N = 10 , d = 2, and e = 5 10 then the number of variables that must be generated is approximately four hundred million, which is an enormous computational burden for a modest degree of accuracy. In the next section we discuss general principles for selecting an IS algorithm that can reduce the computational burden required to obtain an accurate estimate of p . 3.1. General Principles For practical reasons, we insist that our IS procedure retains conditional independence of individual losses, given the realised value of the systematic risk factors. This is important because it allows us to reduce the problem of simulating a large number of dependent variables to the (much) more computationally efficient problem of simulating a large number of independent variables. In the first stage we simulate the systematic risk factors from the IS density f (z). The IS weight I S associated with this first stage is therefore: f (z) L (z) := . f (z) I S In the second stage we simulate the individual losses as i.i.d. drawings from the density g (`jz). The I S IS weight associated with this second stage is: g(` jz) L (z, `) = , 2 Õ g (` jz) I S i i=1 and the IS density from which we sample (Z, L) is therefore of the form: h (z, `) = f (z) g (` jz) . (18) I S I S Õ I S i i=1 The so-described algorithm, with as-yet unspecified IS densities, is summarised in Algorithm 2. Algorithm 2 IS Algorithm for Estimating p 1: Simulate M i.i.d. copies of the systematic risk factors from the density f (z). Think of these as I S different economic scenarios and denote the simulated values by z , . . . , z . 1 M 2: For each scenario m: (a) Independently simulate ` , ` , . . . ,` from the density g (jz ). 2,m N,m m 1,m I S 1 N (b) Set ` = ` . m å i,m N i=1 3: Return pb = L (z ) L (z , ` ) 1 , where ` = (` , . . . ,` ). å ¯ x 1 m 2 m m m 1,m N,m m=1 f` >xg It is important to note that in the second stage, we will not be simulating individual losses directly from the (conditional) IS density g . Rather, we will simulate the idiosyncratic risk factors Y in such I S i a way as to ensure that for a given value of z, the variable L = L(z, Y ) has the desired density g . i i I S Risks 2020, 8, 25 8 of 36 Focusing on the “indirect" IS density of L , as opposed to “direct" IS density of Y , allows us to identify i i a much more effective second stage algorithm . The estimator pb produced by Algorithm 2 is demonstrably unbiased and its variance is: 2 2 2 E [(L(Z, L) 1 p ) ] = p  E [(L (Z, L) 1 1) ] , (19) ¯ ¯ I S x I S x fL >xg x fL >xg N N where E denotes expectation under the IS distribution, L(z, `) := L (z) L (z, `) and I S 1 2 L(z, `) L (z, `) := . Note that L is the ratio of (i) the IS density in Equation (18) to (ii) the conditional density in Equation (14). The estimator ’s squared relative error can then be decomposed as: E [(L (Z, L) 1)  1 ] + [1 P (L > x)] , (20) x ¯ I S I S N fL >xg where P denotes probability under the IS distribution. I S Inspecting Equation (20) we see that an effective IS density should (i) assign a high probability to the event of interest and (ii) should resemble the conditional density in Equation (14) as closely as possible, in the sense that the ratio L should deviate as little as possible from unity. Clearly, an estimator that satisfies (ii) should also satisfy (i), since h assigns probability one to the event that L > x. The task now is to identify a density of the form in Equation (18) that resembles the ideal density in Equation (14), in some sense. 3.2. Identifying the Ideal IS Densities Our measure of similarity is Kullback–Leibler divergence (KLD), or divergence for short. See Chatterjee and Diaconis (2018) for a general discussion of the merits of minimum divergence as a criteria for identifying effective IS distributions. We begin by writing: h (z, `) f (z) g ˜ (`jz) x x x =  , (21) h (z, `) f (z) g ˜ (`jz) I S I S I S where for fixed z, g(` jz) i=1 g ˜ (`jz) =  1 f`2 A g N,x P(L > xjZ = z) is the joint density of N independent variables having marginal density g(jz), conditioned on their average value exceeding the threshold x, and g ˜ (`jz) = g (` jz) . I S Õ I S i i=1 is the joint density of N independent variables having marginal density g (jz). I S Using Equation (21) it is straightforward to decompose the divergence of h from h as: I S x D(h jjh ) = D( f jj f ) + E [ D(g ˜ (jZ)jjg ˜ (jZ))j L > x] , (22) x x x I S I S I S N where D(xjjh) denotes the divergence of the density x from the density h. The first term in Equation (22) is the divergence of f from f , and is therefore minimised by setting f = f . In other words, the best I S x I S x In the earliest stages of this project we focused directly on an IS density for Y and had difficulties identifying effective candidates. Risks 2020, 8, 25 9 of 36 possible IS density for the systematic risk factors (according to the criteria of minimum divergence) is the conditional density f . The second term in Equation (22) is the average divergence of g ˜ (jz) I S from g ˜ (jz), averaged over all possible realisations of the systematic risk factors and conditioned on portfolio loss exceeding the threshold. Based on the developments in Appendix A.5, for fixed z 2 R the divergence of g ˜ (jz) from g ˜ (jz) is minimised by setting g (jz) = g (jz). The average I S x I S x divergence in Equation (22) is, therefore, also minimised by setting g (jz) = g (jz) for every z 2 R . I S Remark 4. Among all densities of the form in Equation (18), the one that most resembles the ideal density h (in the sense of minimum divergence) is the density: d N h (z, `) := f (z) g (` jz) , z 2 R , ` 2 [0, ` ] . x x x max Õ i i=1 In other words, h is the best possible IS density (among the class Equation (18) and according to the criteria of minimum divergence) from which to simulate (Z, L). It is worth noting that the IS density h “gets marginal behaviour correct”, in the sense that the marginal distribution of the systematic risk factors, as well as the marginal distribution of an individual loss, is the same under h as it is under the ideal density h . The dependence structure of individual x x losses is different under h and h —this is the price that we must pay for insisting on conditional x x independence (i.e., computational efficiency). 3.3. Approximating the Ideal IS Densities Simulating directly from h requires an ability to simulate directly from f and g . Unfortunately, x x x neither f nor g is numerically tractable (witness the unknown quantities in Equations (15) and (16)), x x and it does not appear that either is amenable to direct simulation. Our next task is to identify tractable densities that resemble f and g . x x 3.3.1. Systematic Risk Factors As a tractable approximation to f , we suggest using that member of the parametric family in Equation (7) that most resembles f in the sense of minimum divergence. Using Equations (7) and (15) we get that: f (z) log = l S(z) + K(l) + log (P(L > xjZ = z)) log( p ) , f (z) whence the divergence of f from f is: l x ¯ ¯ ¯ D( f jj f ) = l E[S(Z)jL > x] + K(l) + E[log (P(L > xjZ = z))jL > x] log( p ) . (23) x x l N N N As a cgf, K() is strictly convex. As such, Equation (23) attains its unique minimum at that value of l such that: rK(l) = E[S(Z)jL > x] , (24) which, in light of Equation (9), is equivalent to: E [S(Z)] = E[S(Z)jL > x] . (25) Intuitively, we suggest using that value of the IS parameter l for which the mean of S(Z) under the IS density matches the conditional mean of S(Z), given that portfolio losses exceed the threshold. In what follows we let l denote that suggested value of the IS parameter l, i.e., that value of l that solves Equation (24). Risks 2020, 8, 25 10 of 36 Remark 5. The first-stage IS weight associated with the so-described density is: ˆ ˆ L (Z) = exp(l S(Z) + K(l )) . (26) 1 x It is entirely possible—and quite common in the examples we consider in this paper—that K(l ) is not well-defined, in which case Equation (26) has infinite variance under f (recall Remark 1). At first glance it might seem absurd to consider IS densities whose associated weights have infinite variance, but as we discuss in Section 4.2 it is straightforward to circumvent this issue by trimming large first-stage IS weights . It remains to develop a tractable approximation to the right hand side of Equation (24), so that we can approximate the value of l . To this end we write the natural sufficient statistic as S(z) = (S (z), . . . , S (z)) and note that: E[S (Z) 1 ] ¯ E[S (Z) P(L > xjZ)] i fL >xg i N E[S (Z)jL > x] = = . i N ¯ ¯ P(L > x) E[P(L > xjZ)] N N Next, we use the LDA in Equation (13) to get: E[S (Z) exp(Nq(x, Z))] E[S (Z)jL > x]  . (27) i N E[exp(Nq(x, Z))] As it only involves the systematic risk factors (and not the large number of idiosyncratic risk factors), the expectation on the right hand side of Equation (27) is amenable to either quadrature or Monte Carlo simulation. 3.3.2. Individual Losses We encourage the reader unfamiliar with exponential tilts to consult Appendix A.3, before reading the remainder of this section. Our approximation to g (`jz) is obtained by using the LDA of Equation (13) to approximate both conditional probabilities appearing in Equation (16) (see Appendix A.4 for details). The resulting approximation is: ˆ ˆ g ˆ (`jz) := exp(q` k(q, z)) g(`jz) , (28) where we recall that q is defined and discussed in Section 2.3. If the realised values of the systematic risk factors obtained in the first stage lie in the region of interest then q = 0 and g ˆ is identical to g. Otherwise, q is strictly positive and g ˆ is the exponentially tilted version of g whose mean is x. Intuitively, we can interpret g ˆ as that density that most resembles (in the sense of minimum divergence) g , among all densities whose mean is at least x, and the numerical value of q as the degree to which the density g(jz) must be deformed, in order to produce a density whose mean is at least x. Remark 6. The mean of Equation (28) is max(m(z), x). The implication is that the event of interest is not a rare event under the proposed IS algorithm. Indeed, E [L ] = E [E [L jZ]] = E [E [L jZ]] = E [max(x, m(Z))]  x , I S i I S I S i f g ˆ i f ˆ ˆ l l which implies that lim P (L > x) = 1. N!¥ I S N An alternative to trimming is truncation of large weights; see Ionides (2008) for a general and rigorous treatment of truncated IS. Risks 2020, 8, 25 11 of 36 The second-stage IS weight associated with Equation (28) is: ˆ ˆ ˆ ¯ ˆ L (Z, L) = exp(q L + k(q, z)) = exp(N[q L k(q, Z)]) . 2 i N i=1 ¯ ¯ Since the second stage weight depends only on Z and L , we will often write L (Z, L ) instead of N 2 N L (Z, L). In order to assess the stability of the second-stage IS weight, we note that: ˆ ¯ ˆ ˆ ¯ exp(N[q L k(q, Z)]) = exp(q N[L x]) exp(Nq(x, Z)) . N N ¯ ¯ If Z lies in the region of interest then q = q = 0, whence L (Z, L ) = 1 whatever the value of L . 2 N N ˆ ¯ ¯ Otherwise, both q and q are strictly positive, which implies that L (Z, L ) < 1 whenever L > x. The 2 N N net result of this discussion is that: ¯ ¯ L (Z, L )  1 whenever L > x . (29) 2 N N The implication is that large, unstable, IS weights in the second stage will never be a problem. If the realised value of z does lie in the region of interest then g ˆ and g are identical, and simulation from g is straightforward. Our final task is to determine how to sample from Equation (28) in the case where z does not lie in the region of interest. One approach would be to identify a family of densities fh (y) : z 2 R g such that L = L(z, Y ) is a draw from g ˆ (jz) whenever Y is a draw from h (), but z i i x i z this approach appears to be overly complicated. A simpler approach is to sample from Equation (28) using rejection sampling with g as the proposal density. To this end, we note that for fixed z, the ratio ˆ ˆ of g ˆ to g is exp(q` k(q, z)), which is bounded and strictly increasing on [0, ` ]. The best possible x max (i.e., smallest) rejection constant is therefore: ˆ ˆ c ˆ = c ˆ(x, z) := exp(q` k(q, z)) , (30) max and the algorithm for sampling from g ˆ would proceed as follows. First, sample Y from its actual density and set L = L(z, Y ). Then generate a random number U, uniformly distributed on [0, 1] and i i independent of Y . If, g ˆ (L jz) ˆ ˆ U  = exp(q(` L )) , max i c ˆ g(L jz) set L = L and proceed to the next exposure. Otherwise return to the first step and sample another i i pair (Y , U). 3.4. Summary and Intuition The proposed algorithm is summarised in Algorithm 3 below. The initial step is to approximate the value of the first-stage IS parameter, l . In our numerical examples we use a small pilot simulation (10% of the sample size that we eventually use to estimate p ) and the approximation of Equation (27) in order to estimate l . Having computed l , the first stage of the algorithm proceeds by simulating independent realisations of the systematic risk factors from the density f , and computing the associated first-stage weights of Equation (26). Recall that we can interpret these realisations as corresponding to different economic scenarios. Intuitively, sampling from f instead of f increases the proportion of adverse scenarios that are generated in the first stage. In the examples we consider, f concentrates most of its mass near the boundary of the region of interest, and the effect is to concentrate the distribution of m(Z) near x. In the second stage, one first checks whether or not the realised values of the systematic risk factors lie inside the region of interest. If they do then the event of interest is no longer rare and there is no need to apply further IS in the second stage. Otherwise, if we “miss” the region of interest in the Risks 2020, 8, 25 12 of 36 first stage, we “correct” this mistake by applying an exponential tilt to the conditional distribution of individual losses. Specifically, we transfer mass from the left tail of g to the right tail, in order to produce a density whose mean is exactly x. Algorithm 3 Proposed IS Algorithm for Estimating p 1: Compute l using a small pilot simulation. 2: Simulate M i.i.d. copies of the systematic risk factors from f (z) and compute the corresponding first-stage IS weights. Denote the realised values of the factors by z , . . . , z and the associated IS 1 M weights by L (z ), . . . , L (z ). 1 1 1 M 3: For each scenario m, determine whether or not z lies in the region of interest (i.e., whether or not m(z )  x). If it does lie in the region, proceed as follows: (a) Simulate the idiosyncratic risk factors for each exposure. Denote the simulated values by y , . . . , y . 1,m N,m 1 N ¯ ¯ (b) Set ` = L(z , y ), ` = å ` and L (z , ` ) = 1. m m 2 m m i,m i,m i,m N i=1 Otherwise, proceed as follows: ˆ ˆ ˆ ˆ ˆ ˆ (a) Compute q = q(x, z ), k = k(q, z ) and c ˆ = exp(q` k). For each exposure i: m m max (i) Simulate the exposure’s idiosyncratic risk factor (denote the realised value by y ˆ ) and i,m set ` = L(z , y ). i,m i,m (ii) Simulate a random number drawn uniformly from the unit interval (denote the realised ˆ ˆ ˆ value by u) and determine whether or not u  exp(q(` ` )). If it is, set ` = ` max i,m i,m i,m and proceed to the next exposure. Otherwise, return to step (i). 1 N ¯ ¯ ˆ ¯ ˆ (b) Set ` = å ` and L (z , ` ) = exp(N[q` k]) m 2 m m m i,m N i=1 1 M 4: Return pb = L (z ) L (z , ` ) 1 . å ¯ x 1 m 2 m m m=1 f` >xg M m 4. Practical Considerations In this section we discuss some of the practical issues that arise when implementing the proposed methodology. 4.1. One- and Two-Stage Estimators The rejection sampling procedure employed in the second stage of the proposed algorithm involves repeated evaluation of q, which requires a non-trivial amount of computational time time. In addition, rejection sampling in general requires relatively complicated code. As such, it is worth considering a simpler algorithm that only applies importance sampling in the first stage, and is therefore easier to implement and faster to run. In what follows we will distinguish between one- and two-stage IS algorithms. A one-stage algorithm only applies IS in the first stage and samples (Z, L) from the IS density: h (z, `) := f (z) g(` jz) . (31) 1S ˆ i i=1 The associated IS weight is L (z) and the one-stage algorithm is summarised in Algorithm 4 below. Note the simplicity of Algorithm 4, relative to Algorithm 3. The two-stage algorithm applies IS in both the first stage and the second stage, sampling (Z, L) from the IS density: h (z, `) := f (z) g (` jz) . (32) 2S ˆ x Õ i i=1 Risks 2020, 8, 25 13 of 36 The associated IS weight is L (z) L (z, ` ), and the two-stage algorithm was summarised previously 1 2 N in Algorithm 3. Algorithm 4 Proposed One-Stage IS Algorithm for Estimating p 1: Compute l using a small pilot simulation. 2: Simulate M i.i.d. copies of the systematic risk factors from f (z) and compute the corresponding first-stage IS weights. Denote the realised values of the factors by z , . . . , z and the associated IS 1 M weights by L (z ), . . . , L (z ). 1 1 1 M 3: For each scenario m: (a) Simulate the idiosyncratic risk factors for each exposure. Denote the simulated values by y , . . . , y . 1,m N,m 1 N (b) Set ` = L(z , y ) and ` = ` . i,m m i,m m i,m i=1 1 M 4: Return pb = L (z ) 1 . x å m ¯ M m=1 f` >xg Although it is simpler to implement and faster to run, the one-stage algorithm is less accurate than the two-stage algorithm. More precisely, the two-stage estimator never has larger variance than the one-stage estimator. To see this, first let E denote expectation under the one-stage IS density 1S h (z, `) given in Equation (31). Then the variance of the one-stage estimator is: 1S 2 2 E [(L (Z) 1 ) ] p 1S 1 x fL xg where M denotes sample size. And if we let E denote expectation under the two-stage IS density 2S h (z, `) given in Equation (32) then the variance of the two-stage estimator is: 2S 2 2 E [(L (Z) L (Z, L ) 1 ) ] p 2S 1 2 N fL xg x In order to compare variances it suffices to compare the second moments appearing above under the actual density h(z, `), and we let E denote expectation with respect to this density. To this end we note that: E [(L (Z) 1 ) ] = E[L (Z) 1 ] ¯ ¯ 1S 1 1 fL xg fL xg N N and ¯ ¯ E [(L (Z) L (Z, L ) 1 ) ] = E[L (Z) L (Z, L )] 1 ] 2S 1 2 N ¯ 1 2 N ¯ fL xg fL xg N N In light of Equation (29) we get that: L (Z, L ) 1  1 1 = 1 , (33) 2 N ¯ ¯ ¯ fL >xg fL >xg fL >xg N N N whence ¯ ¯ E [(L (Z) L (Z, L ) 1 ) ] = E[L (Z) L (Z, L ) 1 ] ¯ ¯ 2S 1 2 N 1 2 N fL xg fL xg N N E[L (Z) 1 ] 1 fL xg = E [(L (Z) 1 ) ] . 1S 1 ¯ fL xg The two-stage estimator will therefore never have larger variance than the the one-stage estimator. 4.2. Large First-Stage Weights In the examples that we consider in this paper, the systematic risk factors are Gaussian. When selecting their IS density, one could either (i) shift their means and leave their variances (and correlations) unchanged or (ii) shift their means and adjust their variances (and correlations). In general Risks 2020, 8, 25 14 of 36 the latter approach will lead to a much better approximation to the ideal density f , but could lead to an IS weight that has infinite variance. By contrast, the former approach will always lead to an IS weight with finite variance, but could lead to a poor approximation of the ideal density. At first glance it might seem absurd to consider IS densities whose weights are so unstable as to have infinite variance, but we have found that adjusting the variances of the systematic risk factors can lead to more effective estimators, in terms of both statistical accuracy and run time (see Section 6.1 for more details), provided one stabilises the resulting IS weights in some way. In the remainder of this section we describe a simple stabilisation technique that leads to a computable upper bound on the associated bias (an alternative would be to stabilize unruly IS weights via truncation, as discussed in Ionides (2008)). Returning now to the general case, suppose that the first-stage IS parameter, l , is such that the first-stage IS weight, L (Z), has infinite variance. We trim large first-stage weights by fixing a set A  R such that L () is bounded over A, and discarding those simulations for which Z 2 / A. Specifically, the last line of Algorithm 3 would be altered to return the trimmed estimate: p = L (z ) L (z, ` ) 1  1 , x 1 m 2 m ¯ å fz 2 Ag f` >xg m m=1 etc. for Algorithm 4. The variance of the so-trimmed estimator is necessarily finite (recall that ¯ ¯ L (z, `)  1 if ` > x), and its bias is: ¯ ¯ E [L (Z) L (Z, L ) 1  1 ] = E[1  1 ] = E[P(L > xjZ) 1 ] , ¯ ¯ 2S 1 2 N N fL >xg fZ2 / Ag fL >xg fZ2 / Ag fZ2 / Ag N N where we have used the tower property (conditioning on Z) to obtain the last equality. Using Chernoff’s bound in Equation (11) we get that: E[P(L > xjZ) 1 ]  E[exp(Nq(x, Z)) 1 ] . (34) fZ2 / Ag fZ2 / Ag As it only depends on the small number of systematic risk factors, and not the large number of idiosyncratic risk factors, the right-hand side of Equation (34) is a tractable upper bound on the bias committed by trimming large (first-stage) IS weights. This upper bound can be used to assess whether or not the bias associated with a given set A is acceptable. 4.3. Large Rejection Constants The smaller the c ˆ, the more efficient is the rejection sampling algorithm employed in the second stage. Indeed the average number of proposals that must be generated in order to obtain one realisation from g ˆ is 1/c ˆ. In the examples we consider in this paper, c ˆ is (essentially) a decreasing function m(z), such that c ˆ ! 1 as m(z) ! x and c ˆ ! ¥ as m(z) ! 0 (see Figure 1). The second-stage rejection algorithm is therefore quite efficient when m(z)  x and quite inefficient when m(z)  0. Now, the IS density for the first-stage risk factors is such that the distribution of m(Z) concentrates most of its mass near x (where c ˆ is a reasonable size), but it is still theoretically possible to obtain a realisation of the systematic risk factors for which m(z) is very small and c ˆ is unacceptably large (e.g., 10 ). In such situations the algorithm effectively grinds to a halt, as one endlessly generates proposed losses that have no realistic chance of being accepted. It is extremely unlikely that one does obtain such a scenario under the first-stage IS distribution, but it is still important to protect oneself against this unlikely event. To this end we suggest fixing some maximum acceptable rejection constant c , and only max applying the second stage IS to those first-stage realizations for which m(z) < x and c ˆ  c . In other max words, even if the realised values of the systematic risk factors lie outside the region of interest, we avoid applying the second stage if the associated rejection constant exceeds the predefined threshold. Risks 2020, 8, 25 15 of 36 4.4. Computing q ˆ ˆ Repeated evaluations of q(x,) are necessary when computing l at the outset of the algorithm, as well as during the second stage of the two-stage algorithm. Recall that in order to compute q(x, z) “exactly” one must numerically solve the equation k (q, z) = x, which requires a non-trivial amount of CPU time. As each evaluation of q is relatively costly, repeated evaluation would, in the absence of any further approximation (over and above that inherent in numerical root-finding), account for the vast majority of the algorithm’s total run time. In order to reduce the amount of time spent evaluating q we fit a low degree polynomial to the function q(x,) that can be evaluated extremely quickly, considerably reducing total run time. Specifically, suppose that we must compute q(x, z ) for each of n points z , . . . , z (either the sample n 1 n points from the pilot simulation, or the first-stage realisations that did not land in the region of interest). We identify a small set C  R that contains each of the n points, construct a mesh of m << n points in C, evaluate q exactly at each mesh point, and then fit a fifth degree polynomial to the ¯ ¯ ¯ resulting data. Letting q(x,) denote the resulting polynomial, we then evaluate q(x, z ), . . . , q(x, z ) 1 n ˆ ˆ instead of q(x, z ), . . . , q(x, z ). If m is substantially smaller than n, then the reduction in CPU time 1 n is considerable. 5. PD-LGD Correlation Framework All of the PD-LGD correlation models listed in the introduction are special cases of the following general framework—an observation that, to the best of our knowledge, has not been made in the literature. The systematic risk factors take the form Z = (Z , Z ), where Z and Z are bivariate D L D L normal with standard normal margins and correlation r . Idiosyncratic risk factors take the form Y = S i (Y , Y ), where Y and Y are bivariate normal with standard normal margins and correlation r . i,D i,L i,D i,L I Associated with each exposure is a default driver X and a loss driver X , defined as follows: i,D i,L X = a Z + 1 a Y , (35) D D i,D i,D X = a Z + 1 a Y . (36) i,L L L i,L The factor loadings a and a are constants taking values in the unit interval, and dictate the relative D L importance of systematic risk versus idiosyncratic risk. The correlation between default drivers of 2 2 distinct exposures is r := a and the correlation between loss drivers of distinct exposures is r := a . D L D L The correlation between the default and potential loss drivers of a particular exposure is: q q 2 2 r := a a r + 1 a 1 a r , D L D L S I D L which can be positive or negative (or zero). Note that if r and r have the same sign then, since both S I factor loadings are positive, r inherits this common sign. D L The realised loss on exposure i is L = D L , where: i i i D = 1 1 fX F (P)g i,D is the default indicator associated with exposure i and L = h(X ) i i,L is called the potential loss (our terminology) associated with exposure i. Here P denotes the common default probability of all exposures and h is some function from R to [0, ` ]. It is useful (but not max necessary) to think of potential loss as L = max(0, 1C ), where C is the value of the collateral i i i pledged to exposure i expressed as a fraction of the loan’s notional value. Risks 2020, 8, 25 16 of 36 Models in this framework are characterised by (i) the correlation structure of the risk factors, specifically restrictions on the values of r and r , and (ii) the marginal distribution of potential loss. I S For instance: Frye (2000) assumes perfect systematic correlation (r = 1) and zero idiosyncratic correlation (r = 0); Pykhtin (2003) assumes perfect systematic correlation (r = 1) but allows for arbitrary idiosyncratic correlation (r unrestricted); Witzany (2011) allows for arbitrary systematic correlation (r unrestricted) but insists on zero idiosyncratic correlation (r = 0); Miu and Ozdemir (2006) allow for arbitrary systematic correlation (r unrestricted) and arbitrary idiosyncratic correlation (r unrestricted). Note that if r = 1 then the systematic risk factor is effectively one-dimensional. Indeed if r = 1 then Z = (Z, Z) from some standard Gaussian variable Z, and if r = 1 then Z = (Z,Z). S S We refer to the case jr j = 1 as the one-factor case, and the case jr j < 1 as the two-factor case. In the S S one-factor case we use Z, and not Z, to denote the systematic risk factor. The first two models listed above are one-factor models, the last two are two-factor models. The marginal distribution of potential loss is determined by the specification of the function h. For instance: Frye (2000) specifies h(x) = max(0, 1 a(1 + bx)) for constants a 2 R and b > 0. Potential loss takes values in [0, ¥). Its density has a point mass at zero and is proportional to a Gaussian density on (0, ¥). Since L is not constrained to lie in the unit interval, this specification violates the assumptions made in Section 2.3; a+bx Pykhtin (2003) specifies h(x) = max(0, 1 e ) for constants a 2 R and b > 0. Potential loss takes values in [0, 1). Its density has a point mass at zero, and is proportional to a shifted lognormal density over (0, 1); Witzany (2011) and Miu and Ozdemir (2006) both specify h(x) = B (F(x)), where a, b > 0 and a,b B denotes the cdf of the beta distribution with parameters a and b. Potential loss takes values in a,b (0, 1). It is a continuous variable and follows a beta distribution. The sign of r and the nature of the function h (increasing or decreasing) will in general D L determine the sign of the relationship between D and L . If r > 0 then the relationship will be i i D L positive [negative] provided h is decreasing [increasing], and vice versa if r < 0. D L 5.1. Computing m(z) 2 T Here vectors z 2 R take the form z = (z , z ) . In order to obtain an expression for m(z) = D L E[L jZ = z], we begin with the observation that: E[L jZ] = E[L D jZ] = E[L E[D jX , Z]jZ] = E[L P(D = 1jX , Z)jZ] . i i i i i i,L i i i,L Thus, m(z) = h(x ) F(d, m(x , z), v) f(x , a z , 1 a ) dx , (37) L L L L L L where 1 a m(x , z) := a z + r   (x a z ) L D D I L L L 1 a and 2 2 v = v(x , z) := (1 a )(1 r ) D I Risks 2020, 8, 25 17 of 36 are the conditional mean and variance of X , respectively, given that (X , Z) = (x , z). In general i,D i,L L m(z) must be evaluated using quadrature, and doing so is straightforward . On average (across parameter values and points z 2 R ) a single evaluation of m() requires approximately one millisecond. In the one-factor case with r = 1 [r = 1] the expression for m(z) = E[L jZ = z] is obtained by S S i plugging z = (z, z) [z = (z,z)] into Equation (37). 5.2. Computing k(q, z) and q(x, z) 2 T Here again, vectors z 2 R take the form z = (z , z ) . In order to derive an expression for k(q, z) D L we begin with the observation that: q L qL qL i i i e = 1(D = 0) + e 1(D > 0) = 1 + (e 1) 1(D > 0) , i i i q L and since k(q, z) = log(E[e jZ = z]), we get that: qh(x ) 2 k(q, z) = log 1 + (e 1) F(d, m(x , z), v) f(x , a z , 1 a ) dx , (38) L L L L L where m(x , z) and v are given in the previous section. In the one-factor case with r = 1 [r = 1] L S S the expression for k(q, z) = log(E[exp(q L )jZ = z]) is obtained by plugging z = (z, z) [z = (z,z)] into Equation (38). As with m(z), k(q, z) must in general be evaluated using quadrature, which is straightforward. The time required for a single evaluation of k(q,) is comparable to that required for a single evaluation of m(). In order to compute q we must solve the equation k (q, z) = x with respect to q. Differentiating Equation (38) we get: qh(x ) 2 ¶k(q, z) h(x ) e  F(d, m(x , z), v) f(x , a z , 1 a ) dx L L L L L L 0 R L k (q, z) = = , (39) ¶q exp(k(q, z)) which is straightforward to compute using quadrature. A single evaluation of k (q, z) requires approximately twice as much time as a single evaluation of k(q, z). As the root of k (q, z) = x must be evaluated numerically, evaluating q is much more time consuming than evaluating k or k . Across parameter values and points z 2 R , and using q = 0 as an initial guess, the average time required for a single evaluation of q(x,) is slightly less than one tenth of one second. The right panel of Figure 1 illustrates the relationship between expected losses and the rejection ˆ ˆ constant employed in the second stage, c ˆ = exp(q k(q, z)). We see that c ˆ is essentially a decreasing ˆ ˆ function of m(z), such that c ! 1 as m(z) ! x and c ! ¥ as m(z) ! 0. The left panel of Figure 1 illustrates the graph of the LDA approximation P(L > xjZ = z)  exp(Nq(x, z)). The approximation is identically equal to one inside the region of interest, and decays to zero very rapidly outside the region. In other words, most of the variability in the function q(x,) occurs along, and just outside, the boundary of the region of interest. All calculations are carried out using Matlab 2018a on a 2015 MacBook Pro with 6.8 GHz Intel Core i7 processor and 16 GB (1600 MHz) of memory. Numerical integration is performed using the built-in integral function. We use the Matlab function fzero for the root-finding. Risks 2020, 8, 25 18 of 36 LDA Approximation to Conditional Tail Probability Expected Losses and Rejection Constant 0.8 0.6 0.4 0.2 2 -2 -2.5 -3 -2 -3.5 -4 -4 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 Figure 1. The left panel of this figure illustrates the relationship between expected losses m(z) and the second-stage rejection constant c ˆ = c ˆ(x, z), in the two-factor model. The right panel illustrates the graph of the LDA approximation of Equation (13). Parameters (randomly selected using the procedure in Section 5.3) in both panels are (P, r , r , r , r , a, b, N) = (0.0063, 0.3964, 0.2794, 0.3356, 0.7599, D L I S 0.6497, 0.5033, 134) and the threshold is x = 0.1575. Mean losses are E[L ] = 0.0029, and the probability that losses exceed the threshold x is on the order of 50 basis points. Points in the left panel were obtained by generating 1000 realizations of the systematic risk factors from their actual distribution (as opposed to the first-stage IS distribution) using the indicated parameter values. 5.3. Exploring the Parameter Space The model contains five parameters, in addition to any parameters associated with the transformation h. We are ultimately interested in how well the proposed algorithms perform across a wide range of different parameter sets. As such, in our numerical experiments we will randomly select a large number of parameter sets according to the procedure described below, and assess the algorithms’ performance for each parameter set. Generate the default probability P uniformly between 0% and 10%, and generate each of the 2 2 correlations r = a and r = a uniformly between 0% and 50%; D L D L In the one-factor model, generate r uniformly on f1, 1g, i.e., r takes on the value 1 or +1 S S with equal probability. If r = 1 we generate r uniformly between 0% and 100%, and if r = 1 S I S we generate r uniformly between100% and 0%. This allows us to control the sign of r , which I D L we must do in order to ensure a positive relationship between default and potential loss. In the two-factor model we randomly generated r uniformly on [1, 1]. If r is positive, randomly S S generate r uniformly on [0, 1], otherwise randomly generate r uniformly on [1, 0]; I I We choose the transformation h() to ensure that (i) potential loss is beta distributed and (ii) there is a positive relationship between default and loss. The paramters a and b of the beta distribution are generated independently from an exponential distribution with unit mean. If r < 0 we D L 1 1 set h(x) = B (F(x)) and if r > 0 we set h(x) = B (F(x)), where B () is the cumulative D L a,b a,b a,b distribution function for the beta distribution with parameters a and b. Note that under these restrictions, in the one-factor model the expected loss function m(z) is monotone decreasing. In order to ensure that we are considering cases of practical interest, we randomise the portfolio size and loss threshold as follows. Generate the number of exposures randomly between 10 and 5000; 1 q In the one-factor model we generate the threshold x by setting x = m(F (10 )), where q is uniformly distributed on [1, 5]. The LPA suggests that 1 q p = P(L > x)  P(m(Z) > x) = P(Z < m (x)) = 10 . x N Risks 2020, 8, 25 19 of 36 This means that log( p ), the order of magnitude of the probability of interested, is approximately uniformly distributed on [5,1]. In the two-factor model we set x = m(z ), where z = q q 1 1 (F (q), r F (q)) and q is uniformly distributed on [5,1]. 6. Implementation In this section we discuss our implementation of the algorithm proposed in Section 3 in the general framework outlined in Section 5. As the general framework encompasses many of the PD-LGD correlations that have been proposed in the literature, this section effectively discusses implementation of the proposed algorithm across a wide variety of models that are used in practice. 6.1. Selecting the IS Density for the Systematic Risk Factors The systematic risk factors here are Gaussian. When constructing their IS density we could either shift their means and leave their variances (and correlations) unchanged, or shift their means and adjust their variances (and correlations). Recall that the ultimate goal is to choose an IS density that closely resembles the ideal density f given in Equation (15). As illustrated in Figure 2, the ideal density f tends to be very tightly concentrated about its mean, and adjusting the variance of the systematic risk factors leads to a much better approximation to the ideal density for “typical values” of the ideal density. The left tail of the ideal density is, however, heavier than the variance-adjusted IS density, an issue that can be resolved by trimming large IS weights. Normal Approximation to Optimal Density Normal Approximation to Optimal Density 1.8 2.5 1.6 1.4 1.2 1.5 0.8 0.6 0.4 0.5 0.2 0 0 -5 -4.5 -4 -3.5 -3 -4 -3.5 -3 -2.5 -2 Figure 2. This figure illustrates f (in fact, the approximation of Equation (40)) for two randomly generated sets of parameters. Each panel superimposes (i) a normal density with the same mean and variance as f (dashed blue line), and (ii) a normal density with the same mean as f and unit variance x x (dash-dot red line). The mean and variance of f are computed via (computationally inefficient) quadrature. The mean and variance of f are computed using quadrature. Parameters in the right panel are (P, r , r , r , r , a, b, N) = (0.02, 0.33, 0.27, 0.96, 1, 2.47, 4.32, 454), and for the left panel they D L I S are (P, r , r , r , r , a, b, N) = (0.03, 0.13, 0.12, 0.85, 1, 1.81, 1.90, 271). In both cases, the transformation D L I S h is taken to be h(x) = B (F(x)). a,b The downside to adjusting the variance of the systematic risk factors is that it can lead to first-stage IS weights with infinite variance, but numerical evidence suggests that this issue can be mitigated by In the one-factor model, a tractable approximation to the ideal density can be obtained by using the LDA of Equation (13) to approximate both probabilities appearing in Equation (15). The result is: exp(Nq(x, z)) f(z) f (z)  , (40) exp(Nq(x, w)) f(w) dw and the right-hand side of Equation (40) can be approximated via quadrature. As the integrand involves q, the approximation is computationally very slow. Risks 2020, 8, 25 20 of 36 trimming large weights. Indeed, numerical experiments suggest that adjusting variance and trimming large weights leads to substantially more accurate estimators of p . Intuitively, it is more important for the IS density to mimic the behaviour of the ideal density over its “typical range”, as opposed to faithfully representing its tail behaviour. In addition to improving statistical accuracy, adjusting variance has the added benefit of making the second stage of the algorithm more computationally efficient in terms of run time. Indeed, as discussed in more detail in Section 6.3, adjusting variance tends to increase the proportion of first-stage simulations that land in the region of interest (thereby reducing the number of times the rejection sampling algorithm must be employed in the second stage) and reduces the average size of the rejection constants employed in the second stage (thereby making the rejection algorithm more effective whenever it must be employed). 6.2. First Stage In this section we explain how to efficiently approximate the parameters of the optimal IS density for the systematic risk factors, in both the one- and two-factor models. We also explain how we trim large IS weights, and demonstrate that the resulting bias is negligible. 6.2.1. Computing Parameters in the Two-Factor Model In the two-factor model the systematic risk factors are bivariate Gaussian with zero mean vector and covariance matrix: " # 1 r S = . r 1 The mean vector and covariance matrix that satisfy the criteria of Equation (25) are: m := E[ZjL > x] (41) I S N and S := E[(Z m )(Z m ) jL > x] , (42) I S I S I S N respectively. In order to approximate the suggested mean vector and covariance matrix we use Equation (27) to get: E[exp(Nq(x, Z)) Z] m  (43) I S E[exp(Nq(x, Z))] and E[exp(Nq(x, Z)) (Z m )(Z m ) ] I S I S S  . (44) I S E[exp(Nq(x, Z))] The expected values appearing on the right-hand sides of Equations (43) and (44) are both amenable to simulation, and we use a small pilot simulation of size M << M to approximate them. In our numerical examples, the size of the pilot simulation is 10% of the sample size that is eventually used to estimate p . Whether or not we adjust the variance of the systematic risk factor, the standard error of the resulting estimator is of the form n/ M, where n depends on the model parameters and is easily estimated via simulation. Using 100 randomly selected parameter sets from the one-factor model, selected according to the procedure described in Section 5.3, we find that for 0.03 the one-stage estimator n /n  1.54 p , where n denotes the value of n assuming we only shift the mean of the MS VA MS systematic risk factor and do not adjust its variance and n denotes the value when we do adjust variance. For probabilities VA in the range of interest, then, adjusting the variance of the systematic risk factor leads to an estimator that is nearly four times as efficient, in the sense that the sample size required to achieve a given degree of accuracy (as measured by standard error) is nearly four times larger if we do not adjust variance. As discussed in Appendix B, the natural sufficient statistic here consists of the components of Z plus the components of T T T ¯ ¯ ZZ . As such, in order to satisfy Equation (27) we must ensure that E [Z] = E[ZjL > x] and E [ZZ ] = E[ZZ jL > x], I S N I S N where E denotes mean under the IS distribution. These conditions are clearly equivalent to Equations (41) and (42).. I S Risks 2020, 8, 25 21 of 36 In order to implement the approximation we must first simulate the systematic risk factors and then compute q(x, z) for each sample point z. The most natural way to proceed is to (i) sample the systematic risk factors from their actual distribution (bivariate Gaussian with zero mean vector and covariance matrix S) and (ii) numerically solve the equation k (q, z) = x in order to compute compute q(x, z) for each pilot sample point z that lies outside the region of interest. In our experience this leads to unacceptably inefficient estimators, in terms of both (i) statistical accuracy and (ii) computational time. We deal with each issue in turn. As most of the variation in q(x,) occurs just outside the boundary of the region of interest (recall the right panel of Figure 1), we suggest using an IS distribution for the pilot simulation that is centered on the boundary of the region. Specifically, we suggest using that point on the boundary at which the density of the systematic risk factors attains its maximum value (i.e., the most likely point on the boundary): T 1 z := arg minfz S z : m(z) = xg . (45) The non-linear minimisation problem appearing above is easily and rapidly solved using standard techniques. We used fmincon function in Matlab. As z lies on the boundary of the region of interest, roughly half the pilot sample will lie outside the region. In Section 5.2 we noted that it takes nearly one tenth of one second to numerically solve the equation k (q, z) = x. As such, if we are to compute q exactly (i.e., by numerically solving the indicated equation) for each sample point that lies outside the region of interest, the total time required (in seconds) to estimate the first-stage IS parameters will be at least M /20. In our numerical examples we use a pilot sample size of M = 1000, which means that it would take nearly one full minute to compute the first-stage IS parameters. This discussion suggests that reducing the number of times we must numerically solve the equation k (q, z) = x could lead to a dramatic reduction in computational time. We suggest fitting a low degree polynomial to the function q(x,), over a small region in R that contains all of the pilot sample points that lie outside the region of interest. Specifically, we determine the smallest rectangle that contains all of the pilot sample points, and discretize the rectangle using a mesh of n points, equally spaced in each direction. Next, we identify those mesh points that lie outside the region of interest and compute q(x, z) exactly (i.e., by solving k (q, z) = x numerically) for each such point. Finally, we fit a polynomial to the resulting (z, q(x, z)) pairs and call the resulting function q(x,). Numerical evidence indicates at using a fifth-degree polynomial and a mesh with 15 = 225 points leads to a sufficiently accurate approximation to q(x,) over the indicated range (the intersection of (i) the smallest rectangle that contains all sample points and (ii) the complement of ¯ ˆ the region of interest). Note that q could be an extremely inaccurate approximation to q outside this range, but that is not a concern because we will never need to evaluate it there. It remains to compute q(x, z) for each of the pilot points z. For those points z that lie inside the region of interest, we set q(x, z) = 0. For those points that lie inside the region, we set q(x, z) = ¯ ¯ ¯ ¯ ¯ q x k(q, z), where q = q(x, z). Evaluating q(x,) requires essentially no computational time (it is a polynomial), and if the mesh size and degree are chosen appropriately the difference between q ¯ ˆ and q is very small. In total, the suggested procedure reduces the number of evaluations of q from M /2 to n /2, for a percentage reduction of n / M . In our numerical examples we use n = 15 and p g p g M = 1000, which corresponds to a reduction of 75% in computational time. To summarise, we estimate the optimal first-stage IS parameters as follows. First, we compute z . Second, we draw a random sample of size M from the Gaussian distribution with mean vector z and p x ¯ ˆ covariance matrix S. Third, we construct q(x,), the polynomial approximation to q(x,), as described Risks 2020, 8, 25 22 of 36 in the previous paragraph. Fourth, for those sample points z that lie outside the region of interest we ¯ ˆ compute q(x, z) using q instead q. The estimates of the optimal first-stage IS parameters are then: w(Z ) exp(Nq(x, Z ))Z m m m m=1 m ˆ = I S w(Z ) exp(Nq(x, Z )) m m m=1 and å w(Z ) exp(Nq(x, Z ))(Z m ˆ )(Z m ˆ ) m m m m I S I S m=1 S = , I S w(Z ) exp(Nq(x, Z )) m m m=1 where Z , . . . , Z is the random sample and 1 M f(z; 0, S) w(z) = f(z; z , S) is the IS weight associated with shifting the mean of the systematic risk factors from 0 to z . The upper left panel of Figure 3 illustrates a typical situation where the mean of the IS distribution lies “just inside” the region of interest. Region of Interest 4.5 3.5 2.5 1.5 0.5 -2.95 -2.9 -2.85 -2.8 -2.75 Figure 3. This figure illustrates the locations of (i) the importance sampling (IS) mean used for the pilot simulation and (ii) the IS mean used for the actual simulation, relative to the region of interest. Parameters (randomly selected using the procedure in Section 5.3) in both panels are (P, r , r , r , r , a, b, N) = (0.0063, 0.3964, 0.2794, -0.3356, -0.7599, 0.6497, 0.5033, 134) and the threshold D L I S is x = 0.1575. Mean losses are E[L ] = 0.0029. 6.2.2. Computing Parameters in the One-Factor Model The procedure described in the previous section specialises in the one-factor case as follows. First, under the parameter restrictions outlined in Section 5.3, the expected loss function m(z) is a strictly decreasing function of z. As such, the region of interest is the semi-infinite interval (¥, z ), where z := m (x), and its boundary is the single point z . In general z must be computed numerically, x x x which is straightforward. Second, we draw a random sample of size M from the Gaussian distribution with mean z and unit variance. Third, the polynomial approximation to q is constructed by evaluating q exactly (i.e., by numerically solving the equation k (q, z) = x) at each of n equally-spaced points z in the interval [z , z ], where z and z are the largest and smallest values obtained in the pilot + + simulation, respectively, and then fitting a polynomial to the resulting (z, q(x, z)) pairs. Fourth, we Risks 2020, 8, 25 23 of 36 evaluate q(x, z) for each pilot sample point z as follows—if z lies inside the region of interest we set q(x, z) = 0, otherwise we compute q(x, z) by replacing the exact value q(x, z) with the approximate ¯ ¯ value q(x, z), where q is the polynomial constructed in the previous step. Note that a single evaluation ¯ ˆ of q requires far less computational time than a single evaluation of q. Finally, the approximations to the first-stage IS parameters are: w(Z ) exp(Nq(x, Z ))Z m m m m=1 m ˆ = I S w(Z ) exp(Nq(x, Z )) å m m m=1 and w(Z ) exp(Nq(x, Z ))(Z m ˆ ) m m m I S 2 m=1 s = , I S w(Z ) exp(Nq(x, Z )) å m m m=1 where Z , . . . , Z is the random sample and 1 M f(z; 0, 1) w(z) = . f(z; z , 1) is the IS weight associated with shifting the mean of the systematic risk factor from 0 to z . 6.2.3. Trimming Large Weights In the one-factor model the first-stage IS weight will have infinite variance whenever s < 0.5 I S (see Remark A1 in Appendix B). In a sample of 100 parameter sets, randomly selected according to the procedure in Section 5.3, the largest realised value of s was 0.38, and the mean and median were 0.11 I S and 0.09, respectively. It appears, then, that the first-stage IS weight in the one-factor model will have infinite variance in all cases of practical interest. We trim large weights as described in Section 4.2, using the set: A = fz 2 R : jz m ˆ j  Cs g I S I S for some constant C. In the numerical examples that follow we use C = 4, in which case we expect to trim less than 0.01% of the entire sample. Specialising Equation (34) to the present context, we get that an upper bound on the associated bias is given by: exp(Nq(x, z))f(z) dz , (46) which is straightforward (albeit slow) to compute using quadrature. Figure 4 illustrates the relationship between the probability of interest p and the upper bound of Equation (46) for the 100 randomly generated parameter sets, and clearly demonstrates that the bias associated with our trimming procedure is negligible. For instance, for probabilities on the order of 10 the bias is no larger than 10 , or 1% of the quantity of interest. In the two-factor model the first-stage IS weight will have infinite variance whenever det(2S S ) < 0. In a random sample of 100 parameter sets, this condition occurred 96 times. As in the I S one-factor model, then, the first-stage IS weight in the two-factor model can be expected to have infinite variance in most cases of practical interest. We trim large weights using the set: n o 2 T 1 2 A = z 2 R : (z m ˆ ) S (z m ˆ )j  C I S I S I S for some constant C, and use C = 4 in the numerical examples that follow. Risks 2020, 8, 25 24 of 36 -3 -4 -5 -6 -7 -1 -2 -3 -4 -5 10 10 10 10 10 Figure 4. This figure illustrates the bias introduced by trimming large weights (vertical axis) as a function of the probability of interest (horizontal axis), for 100 randomly generated parameter sets in the one-factor case. For each set, we compute bias (in fact, an upper bound on the bias) by using quadrature to approximate Equation (46) and estimate the probability of interest using the full two-stage algorithm. 6.3. Second Stage The first stage of the algorithm consists of (i) computing the first-stage IS parameters, (ii) simulating a random sample of size M from the systematic risk factors’ IS distribution, and (iii) computing the associated IS weights, trimming large weights appropriately. Having completed these tasks, the next step is to simulate individual losses in the second stage. In the remainder of this section we let z = (z , z ) denote a generic realisation of the systematic risk factors obtained in the D L first stage. 6.3.1. Approximating q Before generating any individual losses first construct the polynomial approximation to q, using the same procedure described in Section 6.2.1. The basic idea is to fit a relatively low degree polynomial to the surface of q(x,), over a small region that contains all of the first-stage sample points. The values of z obtained in the pilot sample are invariably different from those obtained in the first stage, so it is ¯ ˆ essential that the polynomial is refit to account for this fact. In what follows we use q to approximate q whenever the numerical value of q is required, but since the difference between the two is small we do ˆ ¯ not distinguish between the two (i.e., we write q in this document, but use q in our code). 6.3.2. Sampling Individual Losses In this section we describe how to sample individual losses in the two-factor model. The procedure carries over in an obvious way to the one-factor model, so we do not discuss that case explicitly. If z lies inside the region of interest then the second stage is straightforward. For a given exposure i, we first simulate the exposure’s idiosyncratic risk factors Y = (Y , Y ), from the bivariate normal i i,D i,L distribution with standard normal margins and correlation r . Next, we set: q q 2 2 (X , X ) = (a z + 1 a Y , a z + 1 a Y ) . i,D i,L D D i,D L L i,L D L Risks 2020, 8, 25 25 of 36 If X > F (P) then the exposure did not default and we set L = 0 and proceed to the next exposure. i,D i Otherwise the exposure did default, in which case we must compute h(X ), set ` = h(x ) and i,L i i,L then proceed to the next exposure. Note that we only evaluate h for defaulted exposures—this is important since evaluating h requires numerical inversion of the beta cdf, which is relatively slow. Having computed the individual losses associated with each exposure, we then compute the average 1 N ¯ ¯ loss ` = N ` and set L (z, `) = 1. i 2 i=1 ˆ ˆ If z lies outside the region of interest we must compute q, k(q) and c, which we do approximately using the polynomial approximation q. We then sample from g ˆ (jz) as follows. First simulate the idiosyncratic risk factors Y = (Y , Y ) from the bivariate normal distribution with standard normal i i,D i,L margins and correlation r . Also generate a random number U, independent of Y . Then set: I i q q 2 2 (X , X ) = (a z + 1 a Y , a z + 1 a Y ) . i,D i,L D D i,D L L i,L D L ˆ ˆ If the exposure did not default we set L = 0, otherwise we compute h and set L = h(X ). Next we i i i,L check whether or not 1 g ˆ (L jz) x i ˆ ˆ U   = exp(q(` L )) (47) max c ˆ g(L jz) ˆ ˆ then accept L as a drawing from g ˆ , that is, set L = L and proceed to exposure i. Otherwise, draw i x i i another random number U and set of idiosyncratic factors. Once we have sampled the individual ¯ ¯ losses associated with each exposure we compute the average loss ` = N ` and set L (z, `) = i 2 i=1 ˆ ¯ ˆ ˆ exp(N[q` k(q, z)]), using the polynomial approximation to estimate the value of q. 6.3.3. Efficiency of the Second Stage The frequency with which the rejection sampling algorithm must be applied in the second stage is governed by P (m(Z) > x). The left panel of Figure 5 illustrates the empirical distribution of this I S probability across 100 randomly selected parameter sets. The distribution is concentrated towards small values (the median fraction is 27%) but does have a relatively thick right tail (the mean fraction is 35%). In some cases—particularly when the value of the parameter r is close to zero, in which case individual losses are very nearly independent and systematic risk is largely irrelevant—the vast majority of first-stage simulations require further IS in the second stage. The efficiency of the rejection sampling algorithm, when it must be applied, is governed by the conditional distribution of c ˆ = c ˆ(x, Z) given that m(Z) < x. For each of the 100 parameter sets we estimate E [c ˆ(x, Z)jm(Z) < x], which determines the average size of the rejection constant for a given I S set of parameters, by computing the associated value of c ˆ for each first-stage realisation that lies outside the region of interest and then averaging the resulting values. The right panel of Figure 5 illustrates the results, and we note that the mean and median of the data presented there are 1.09 and 1.17, respectively. The figure clearly indicates that the rejection sampling algorithm can be expected to be quite efficient, whenever it must be applied. The distributions of P (m(Z) < x) and E [c ˆ(x, Z)jm(Z) < x] across parameters depend heavily I S I S on whether or not we adjust the variance of the systematic risk factors in the first stage. When we do not adjust variance, the mean and median of P (m(Z) < x) (across 100 randomly selected parameter I S sets) rise to 49% and 45% (as compared to 35% and 27% when we do adjust variance), and the mean and median of E [c ˆ(x, Z)jm(Z) < x] rise to 18.6 and 1.8, respectively (as compared to 1.17 and 1.09 I S when we do adjust variance). Remark 7. If we do not adjust the variance of the systematic risk factors in the first stage, then (i) the rejection sampling algorithm must be applied more frequently and (ii) is less efficient whenever it must be applied. As such, adjusting the variance of the systematic risk factors reduces the total time required to implement the two-stage algorithm. Risks 2020, 8, 25 26 of 36 0.45 0.8 0.4 0.7 0.35 0.6 0.3 0.5 0.25 0.4 0.2 0.3 0.15 0.2 0.1 0.1 0.05 0 0 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1 1.2 1.4 1.6 1.8 2 2.2 2.4 2.6 Figure 5. This figure illustrates the variation of P (m(Z) < x) (left panel) and E [c ˆ(x, Z)jm(Z) < x] I S I S (right panel) across model parameters. Recall that the former quantity determines the frequency with which the second-stage rejection sampling algorithm must be applied and the latter quantity determines the efficiency of the algorithm when it must be applied. For each of 100 parameter sets, randomly selected according to the procedure described in Section 5.3, we compute the first-stage IS parameters and then draw 10,000 realisations of the systematic risk factors from the variance adjusted first-stage IS density. The intuition behind this fact is as follows. First recall that the mean of the systematic risk factors tends to lie just inside the region of interest (recall Figure 3). In such cases the effect of reducing the variance of the systematic risk factors is to concentrate the distribution of Z just inside the boundary of the region of interest. Not only will this ensure that more first-stage realisations lie inside the region of interest (thereby reducing the fraction of points that require further IS in the second stage), it will also ensure that those realisations that lie outside the region (i.e., for which m(z) < x) do not lie “that far ” outside the region (i.e., that m(z) is not “that much less” than x), which in turn ensures that the typical size of c ˆ is relatively close to one (recall the left panel of Figure 1). 7. Performance Evaluation In this section we investigate the proposed algorithms’ performance in terms of statistical accuracy, computational time, and overall. Unless otherwise mentioned, we use a pilot sample size of M = 1000 to estimate the first-stage IS parameters and a sample size of M = 10,000 to estimate the probability of interest ( p ). We use the value C = 4 to trim large first-stage IS weights, and a value of c = 10 to x max trim large rejection constants. 7.1. Statistical Accuracy The standard error of any estimator that we consider is of the form n / M for some constant n that depends on the algorithm used and the model parameters. For instance, for the one-stage estimator in the two-factor case we have n = SD (L (Z) 1 ), where SD denotes standard x ¯ 1S 1 fL >xg 1S deviation under the one-stage IS density of Equation (31). Note that in the absence of IS we have 0.5 n = p (1 p )  p as p ! 0. x x x x Figure 6 illustrates the relationship between n and p using 100 randomly selected parameters x x sets, for the two-stage algorithm and in the two-factor case. Importantly, we see that (i) n seems to be a function of p (i.e., it only depends on model parameters through p ) and (ii) for small probabilities x x the functional relationship appears to be of the form n = a p for constants a and b. These features are also present in the case of the one-stage estimator, as well as for both estimators in the one-factor model. The numerical values of a and b are easily estimated using the line of best fit (on the logarithmic scale), and the estimated values for both the one- and two-factor cases are summarised in Table 1. Of particular note is the fact that the value of b is extremely close to one in every case. Risks 2020, 8, 25 27 of 36 Statistical Accuracy of Two-Stage Estimator -1 -2 -3 -4 -5 -1 -2 -3 -4 -5 10 10 10 10 10 Probability of Interest Figure 6. This figure illustrates the relationship between n and p , where n is the standard deviation x x x of L (Z)L (L , Z)1 under the two-stage IS density of Equation (32), in the two-factor case. 1 2 N fL xg The numerical values of p and n are estimated for each of 100 randomly generated parameters sets, x x according to the procedure described in Section 5.3. Table 1. This table reports fitted values of the relationship n  a p for each estimator (one- and two-stage) and each model (one- and two-factor). Values of a and b are obtained by determining the line of best fit on the logarithmic scale (i.e., the line appearing in Figure 6). Note that in the absence of 0.5 IS we would have n = p (1 p )  p . x x x One-Stage Algorithm Two-Stage Algorithm 0.98 0.99 One-Factor Model 0.91 p 0.81 p x x 0.98 0.98 Two-Factor Model 0.98 p 0.81 p x x Of particular interest in the rare event context is an estimator ’s relative error, defined as the ratio of its standard error to the true value of the quantity being estimated. For any of the estimators that we b1 consider, the component of relative error that does not depend on sample size is n / p  a p . In the x x absence of IS we have b 1 = 0.5, in which case relative error grows rapidly as p ! 0 (i.e., n ! 0 x x but n / p ! ¥ as p ! 0). By contrast, b  1 for any of our IS estimators, in which case there is weak x x x dependence of relative error on p . The minimum sample size required to ensure that an estimator ’s 2(b1) 2 2 2 2 relative error does not exceed the threshold e is v /( p e)  a p e . In the absence of IS we x x have b  0.5, in which case the sample size (and therefore computational burden) required to achieve a given degree of accuracy increases rapidly as p ! 0. By contrast, for all of our IS estimators we have b  1, in which case the minimum sample size (and computational burden) is nearly independent of p . Our ultimate goal is to reduce the computational burden associated with estimating p , in situations where p is small. To see how effective the proposed algorithms are in this regard, note that the sample size required to achieve a given degree of accuracy using the proposed algorithm, relative to that required to achieve the same degree of accuracy in the absence of IS, is approximately 2(b1) 2 2 a p e x 2 2b1 = a p , p e 2 2b1 which does not depend on e. Since a < 1 and b > 0.5 (recall Table 1), we have that a p < p . Standard Error (Scaled by Sample Size) Risks 2020, 8, 25 28 of 36 Remark 8. The relative sample size required to achieve a given degree of accuracy using the proposed algorithm, relative to that required in the absence of IS, is not larger than the probability of interest. For example, if the probability of interest is approximately 1%, then the proposed algorithm requires a sample size that is less than 1% of what would be required in the absence of IS (regardless of the desired degree of accuracy). And if the probability of interest is 0.1%, then the proposed algorithm requires a sample size that is less than 0.1% of what would be required in the absence of IS. In other words, the proposed algorithm is extremely effective at reducing the sample size required to achieve a given degree of accuracy. It is also insightful to compare the efficiency of the two-stage estimator, relative to the one-stage estimator. In the one-factor case, the minimum sample size required using the two-stage algorithm, relative to that required using the one-stage algorithm, is approximately: 0.02 2 0.66 p e x 0.02 = 0.80 p . 0.04 0.83 p e As p ranges from 1% to 0.01% the estimated relative sample size ranges from 0.73 to 0.67. In the two-factor case, the relative sample size is approximately 0.69, regardless of the value of p . Remark 9. In both the one- and two-factor models, the two-stage algorithm is more efficient than the one-stage algorithm, in the sense that it requires a smaller sample size in order to achieve a given degree of accuracy. Indeed, in cases of practical interest (probabilities in the range of 1% to 0.01%) the minimum sample size required to achieve a given degree of accuracy using the two-stage algorithm is roughly 70% of what would be required using the one-stage algorithm. 7.2. Computational Time Figure 7 illustrates the relationship between sample size (M) and run time (total time required to estimate p using a particular algorithm), for one randomly selected set of parameters. Across both models and algorithms, the relationship is almost perfectly linear. In the absence of IS the intercept is zero (i.e., run time is directly proportional to sample size), whereas the intercepts are non-zero for the IS algorithms. The non-zero intercepts are due to the overhead associated with (i) computing the first-stage IS parameters, which accounts for almost all of the difference between the intercepts of the solid (no IS) and dashed (one-stage IS) intercepts, and (ii) computing the second-stage polynomial approximation to q, which accounts for almost all of the difference between the intercepts of the the dashed (one-stage IS) and dash-dot (two-stage IS) lines. It is also worth noting that a given increase in sample size will have a greater impact on the run times for the IS algorithms than it will on the standard algorithm. This is because we only calculate h(X ) for defaulted exposures (evaluating h() i,L is slow because it requires numerical inversion of the beta distribution function), and the default rate is higher under the IS distribution. Across 100 randomly generated parameter sets, portfolio size (N) is most highly correlated with run time and the relationship is roughly linear. Table 2 reports summary statistics on run times, across algorithms and models. Table 2. This table reports summary statistics—in seconds, and across 100 randomly selected parameter sets—for total run time (first three columns), time required to estimate the first-stage IS parameters (fourth column) and time required to fit the second-stage polynomial approximation to q (final column). Average Run Times No IS One-Stage IS Two-Stage IS m , S q IS IS One Factor 7.3 25.6 33.7 1.5 0.8 Two Factor 7.4 39.0 55.5 14.3 8.9 Risks 2020, 8, 25 29 of 36 Sample Size and Run Time (One-Factor Model) Sample Size and Run Time (Two-Factor Model) 30 50 No IS One-Stage IS No IS Two-Stage IS One-Stage IS Two-Stage IS 15 25 0 0 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 Sample Size (M) Sample Size (M) Figure 7. This figure illustrates the relationship between sample size (M) and run time (total CPU time required to estimate p by a particular algorithm), using a set of parameters randomly selected according to the procedure described in Section 5.3. For each value of M we use a pilot sample that is 10% as large as the sample that is eventually used to estimate p (i.e., we set M = 0.1M). The left panel x p corresponds to the one-factor model and parameter values are (P, r , r , r , r , a, b) = (0.0827, 0.1000, D L I 0.3629,0.0180,1, 0.6676, 0.8751) and N = 2334. The right panel corresponds to the two-factor model and parameter values are (P, r , r , r , r , a, b) = (0.0241, 0.2322, 0.0343, 0.1650, 0.4135, 0.4056, 0.4942) D L I S and N = 3278. 7.3. Overall Performance Recall that the ultimate goal of this paper is to reduce the computational burden associated with estimating p , when p is small. The computational burden associated with a particular algorithm is a x x function of both its statistical accuracy and total run time. We have seen that the proposed algorithms are substantially more accurate, but require considerably more run time. In this section we demonstrate that the benefit of increased accuracy is well worth the cost of additional run time, by considering the amount of time required by a particular algorithm in order to achieve a given degree of accuracy (as measured by relative error). To begin, let t( M) denote the total run time required by a particular algorithm to estimate p using a sample of size M. As illustrated in Figure 7 we have t( M)  c + d M for constants c and d that depend on the underlying model parameters (particularly portfolio size, N) as well as the algorithm being used. In Section 7.1 we saw that the minimum sample size required to ensure that the estimator ’s relative error does not exceed the threshold e isL 2(b1) 2 2 M(e)  a p e , for constants a and b depending on the underlying model (one- or two-factor) and algorithm being used. Thus, if T(e) denotes the total CPU time required to ensure that the estimator ’s relative error does not exceed e, we have: 2(b1) 2 2 T(e)  c + da p e . (48) Table 3 contains sample calculations for several different values of p and e, using the data appearing in the left panel of Figure 7 to estimate c and d and the values of a and b implicitly reported in Table 1. The results reported in the table are representative of those obtained using different parameter sets. It is clear that the proposed algorithms can substantially reduce the computational burden associated with accurate estimation of small probabilities. For instance, if the probability of interest is on the order of 0.1% then either of the proposed algorithms can achieve 5% accuracy within 2–3 s, as compared to 4 min (80 times longer) in the absence of IS. Run Time (Seconds) Run Time (Seconds) Risks 2020, 8, 25 30 of 36 Table 3. This table reports the time (in seconds) required to achieve a given degree of accuracy (computed using Equation (48)) for several values of p and e, for the parameter values corresponding to the left panel of Figure 7. Note that this is for the one-factor model. Values of c and d are obtained from the lines of best fit appearing in the left panel of Figure 7, values of a and b are obtained from Table 1. No IS One-Stage IS (Two-Stage IS) p p x x 1% 0.1% 0.01% 1% 0.1% 0.01% e e 10% 6 60 600 10% 1.2 (2.3) 1.2 (2.3) 1.3 (2.4) 5% 24 240 2400 5% 1.8 (2.8) 1.9 (2.9) 1.9 (2.9) 1% 600 6000 60,000 1% 20.0 (18.8) 21.8 (19.6) 23.8 (20.4) The two-stage estimator is statistically more accurate (Section 7.1) but computationally more expensive (Section 7.2) than the one-stage estimator. It is important to determine whether or not the benefit of increased accuracy outweighs the cost of increased computational time. Table 3 suggests that, in some cases at least, implementing the second stage is indeed worth the effort, in the sense that it can achieve the same degree of accuracy in less time. Figure 8 illustrates the overall efficiency of the proposed algorithms, as a function of the desired degree of accuracy. Specifically, the left panel illustrates the ratio of (i) the total CPU time required to ensure the standard estimator ’s relative error does not exceed a given threshold to (ii) the total time required by the proposed algorithms, for a randomly selected set of parameter values in the one-factor model. The right panel illustrates the same ratio for a randomly selected set of parameters in the two-factor model. Overall Efficiency of Proposed Algorithms (One-Factor Model) Overall Efficiency of Proposed Algorithms (Two-Factor Model) 3500 400 No IS/One-Stage IS No IS/Two-Stage IS No IS/One-Stage IS No IS/Two-Stage IS 0 0 -1 -2 -3 -1 -2 -3 10 10 10 10 10 10 Desired Degree of Accuracy (Relative Error) Desired Degree of Accuracy (Relative Error) Figure 8. This figure illustrates the overall efficiency of the proposed algorithms. Specifically, the solid [dashed] line in the left panel illustrates the ratio of (i) the total run time (in seconds) required to ensure that the standard estimate’s relative error does not exceed a given threshold to (ii) the run time required by the one-stage [two-stage] algorithm, in the one-factor model. The right panel corresponds to the two-factor model. Parameter values are the same as in Figure 7 and Table 3. In the one-factor model, it would take hundreds of times longer to obtain an estimate of p whose relative error is less than 10%, and thousands of times longer to obtain an estimate whose relative error is less than 1%. The figure also suggests that, since it requires less run time to obtain very accurate estimates, the two-stage algorithm is preferable to the one-stage algorithm in the one-factor model. In the two-factor model—where estimating IS parameters and fitting the second-stage polynomial approximation to q is more time consuming—the proposed algorithms are hundreds of times more efficient than the standard algorithm. In addition, it appears that the one-stage algorithm is preferable to the two-stage algorithm in this case. Although the numerical values discussed here are specific to Relative Time Required Relative Time Required Risks 2020, 8, 25 31 of 36 the parameter set used to produce the figure, they are representative of other parameter sets. In other words, the behaviour illustrated in Figure 8 is representative of the general framework overall. 8. Concluding Remarks This paper developed an importance sampling (IS) algorithm for estimating large deviation probabilities for the loss on a portfolio of loans. In contrast to existing literature, we allowed loss given default to be stochastic and correlate with the default rate. The proposed algorithm proceeded in two stages. In the first stage one generates systematic risk factors from an IS distribution that is designed to increase the rate at which adverse macroeconomic scenarios are generated. In the second stage one checks whether or not the simulated macro environment is sufficiently adverse—if it is then no further IS is applied and idiosyncratic risk factors are drawn from their actual (conditional) probability distribution, if it is not then one indirectly applies IS to the conditional distribution of the idiosyncratic risk factors. Numerical evidence indicated that the proposed algorithm could be thousands of times more efficient than algorithms that did not employ any variance reduction techniques, across a wide variety of PD-LGD correlation models that are used in practice. Author Contributions: Both authors contributed equally to all parts of this paper. Both authors have read and agreed to the published version of the manuscript. Funding: This research was funded by NSERC Discovery Grant 371512. Acknowledgments: This work was made possible through the generous financial support of the NSERC Discovery Grant program. The authors would also like to thank Agassi Iu for invaluable research assistance. Conflicts of Interest: The authors declare no conflict of interest. Appendix A. Exponential Tilts and Large Deviations Let X , X , . . . , be independent and identically distributed random variable with common density 1 2 f (x), having bounded support [x , x ], and common mean m = E[X ]. For q 2 R we let min max m(q) = E[exp(q X )] and k(q) = log(m(q)) denote the common moment generating function (mgf) 0 0 and cumulant generating function (cgf) of the X , respectively. Note that m = m (0) = k (0). Appendix A.1. Properties of k(q) Elementary properties of cgfs ensure that k () is a strictly increasing function that maps R onto (x , x ). One implication is that, for fixed t 2 (x , x ), the graph of the function q 7! qt k(q) max max min min is \-shaped. The graph also passes through the origin, and its derivative at zero is t m. If this derivative is positive (i.e., if m < t) then the unique maximum is strictly positive and occurs to the right of the origin. If it is negative (i.e., if m > t) then the unique maximum is strictly positive and occurs to the left of the origin. If it is zero (i.e., if m = t) then the unique maximum of zero is attained at the origin. e e For a given t 2 (x , x ), there is a unique value of q for which k (q) = t. We let q = q(t) denote max min e e e this value of q. Note that q(t) is a strictly increasing function of t and that q(m) = 0. Thus q is positive b b e [negative] whenever t > m [t < m]. An important quantity in what follows is q = q(t) := max(0, q(t)), which can be interpreted as the unique value of q for which k (q) = max(m, t). Note that if t  m then b b q = 0, and if t > m then q(t) > 0. Appendix A.2. Legendre Transform of k(q) We let q() denote the Legendre transform of k() over [0, ¥). That is, ˆ ˆ q(t) := max(qt k(q)) = qt k(q) , (A1) q0 ˆ ˆ where q = q(t) was defined in the previous section, and is the (uniquely defined) point at which the function q 7! qt k(q) attains its maximum on [0, ¥). Based on the discussion in the preceding Risks 2020, 8, 25 32 of 36 ˆ ˆ paragraph, we see that q(t) = q(t) = 0 whenever m  t, whereas both q(t) and q(t) are strictly positive whenever m < t. The derivative of the transform q is demonstrably equal to: 0 0 0 ˆ ˆ ˆ q (t) = q(t) + q (t) [t k (q(t))] . ˆ ˆ Since q = 0 whenever t  m and k (q) = t whenever t > m, the second term above vanishes for all t, and we find that: q (t) = q(t) . (A2) Appendix A.3. Exponential Tilts For q 2 R we define: f (x) := exp(q x k(q)) f (x) . (A3) The density f is called an exponential tilt of f . As the value of the tilt parameter q varies, we obtain an exponential family of densities (exponential families have lots of very useful properties, and this is an easy way of constructing them). If q is positive then the right and left tails of f are heavier and thinner, respectively, than those of f . The opposite is true if q is negative. The larger in magnitude is q, the greater the discrepancy between f and f ; indeed the Kullback–Leibler divergence from f to f is q q qm + k(q), which is a strictly convex function of q that attains its minimum value (of zero) at q = 0. It is readily verified that k (q) = E [X ], where E denotes expectation with respect to f . This q i q q observation, in combination with the developments in Section A.1, implies that it is always possible to find a density of the form (A3) whose mean is t, whatever the t 2 (x , x ). Indeed f is precisely min max ˜ such a density. Under mild conditions, f () can be characterised as that density that most resembles f (in the sense of minimum divergence), among all densities whose mean is x (and are absolutely continuous with respect to f ). Recall that q is the unique value of q for which k (q) = max(t, m). We can therefore interpret f as that density that most resembles f , among all densities whose mean is at least t (and that are absolutely continuous with respect to f ). Note in particular hat the mean of f is max(m, t). The numerical value of q can therefore be interpreted as the degree to which we must deform the density f , in order to produce a density whose mean is at least t. If m  t then q = 0 and no adjustment is necessary. If m < t then q > 0 and mass must be transferred from the left tail to the right; the larger the discrepancy between m (the mean of f ) and t (the desired mean), the larger is q. Appendix A.4. Behaviour of X , Conditioned on a Large Deviation 1 N Let f (x) denote the conditional density of X , given that X > t, where X = X . t i N N i i=1 We suppress the dependence of f on N for simplicity. Using Bayes’ rule we get P(X > tjX = x) N i f (x) =  f (x) , P(X > t) and since the X are independent, we get tx ¯ ¯ P(X > tjX = x) = P(X > t + ) . N N1 N1 Now, using the large deviation approximation P(X  t)  exp(N q(t)), we get that P(X > tjX = x) i tx exp((N 1)q(t + ) + Nq(t)) . N1 P(X > t) N Risks 2020, 8, 25 33 of 36 Now if N is large then tx tx tx q(t + )  q(t) + q (t) = q(t) + q , N1 N1 N1 where we have used the fact that q (t) = q(t). Putting everything together we arrive at the approximation P(X > tjX = x) N i ˆ ˆ exp(q x k(q)) , P(X > t) which leads to the approximation ˆ ˆ f (x)  exp(q x k(q)) f (x) . (A4) We may thus interpret the conditional density f as that density which most resembles the unconditional density f , but whose mean is at least t. Appendix A.5. Approximate Behaviour of (X , X , . . . , X ), Conditioned on a Large Deviation 1 2 N ˆ ˆ Let f (x) = f (x , . . . , x ) denote the conditional density of (X , . . . , X ), given that X > t. Then t t 1 N 1 N N f (x ) i=1 f (x) = , x 2 A , t N,t where p = P(X > t) and A is the set of those points x 2 [x , x ] whose average value t N N,t min max exceeds t. We seek a density h(x), supported on [x , x ], which minimizes the Kullback-Leibler min max divergence (KLD) of h(x) := h(x ) Õ i i=1 ˆ ˆ from f . In other words, we seek an independent sequence Y , Y , . . . , Y (whose density is h) whose t 1 2 N behaviour most resembles (in a certain sense) the behaviour of X , X , . . . , X , conditioned on the 1 2 N large deviation X > t. ˆ ˆ Now let E denote expectation with respect to the density g. Then the divergence of h from f is g t ˆ ˆ E [log( f (X)/h(X))] = E [log ( f (X )/h(X ))] log( p ) ˆ t ˆ t å i i f f t t i=1 = N E [log ( f (X )/h(X ))] log( p ) ˆ 1 1 t = N E [log ( f (X )/h(X ))] log( p ) f 1 1 = N E [log ( f (X )/ f (X ))] + N E [log ( f (X )/h(X ))] log( p ) f 1 t 1 f t 1 1 t t t Now, the middle term in the above display is the KLD of h from f . As such it is non-negative, and is ˆ ˆ equal to zero if and only if h = f . It follows immediately that the divergence of h from f is minimised t t by setting h = f . Appendix B. Important Exponential Families This appendix considers two important special cases—the Gaussian and t families—of the general setting discussed in Section 2.2. Risks 2020, 8, 25 34 of 36 Appendix B.1. Gaussian Suppose first that the Z is Gaussian with mean vector m 2 R and positive definite covariance matrix S . When specifying the IS distribution, one can either (i) shift the mean of Z but leaves its covariance structure unchanged or (ii) shift its mean and adjust its covariance structure. In general the latter approach will lead to a better approximation of the ideal IS density but more volatile IS weights. If we take the former approach (shifting mean, leaving covariance structure unchanged), the implicit family in which we are embedding f is the Gaussian family with arbitrary mean vector m 2 R and fixed covariance matrix S . To this end, let f (z) = f(z; m , S ) denote the Gaussian density 0 0 0 with mean vector m and covariance matrix S and let f (z) = f(z; m, S ). It remains to identify the 0 0 l 0 natural sufficient statistic and write the natural parameter l in terms of the mean vector m. To this end, note that f (z) 1 1 T T 1 T 1 T 1 = exp (m m )S z m S m + m S m . 0 0 f (z) 2 2 The natural sufficient statistic is therefore S(z) = (z , . . . , z ) , the natural parameter is l(m) = S (m m ) . Note that we can write m(l) = m + S l, so that the natural parameter represents a sort of normalized 0 0 deviation from the actual mean m to the IS mean m. Lastly, we see that the cgf of S(Z) is h i 1 1 T 1 T 1 T T T K(l) = m S m m S m = l m + l S l , l 0 0 l 0 0 0 0 2 2 where we have written m instead of m(l) in the above display. Clearly, we have that both K(l) and K(l) are well-defined for all l 2 R . The implication is that if we shift the mean of Z but leave its covariance structure unchanged, the IS weight will have finite variance regardless of what IS mean we choose. If we take the former approach (shifting mean, adjusting covariance) the implicit family in which we are embedding f is the Gaussian family with arbitrary mean vector m and arbitrary positive definite covariance matrix S. In this case we have f (z) = f(z; m, S) and the ratio of f (z) to f (z) is l l T 1 T 1 T 1 1 exp((m S m S )z + z (S S )z K(m, S)) , 0 0 0 where h i T 1 T 1 K(m, S) = m S m m S m + log(det(S) log(det(S ) . 0 0 The natural sufficient statistic therefore consists of the d elements of the vector z plus the d elements of the vector zz . The natural parameter l consists of the elements of the vector 1 1 l := l (m, S) = S m S m 1 1 0 plus the elements of the matrix 1 1 l := l (S) = (S S ) . 2 2 Note that since we have assumed S is positive definite, we are implicitly assuming that the matrix l is such that the determinant of S l is strictly positive. The natural parameter space is therefore 2 0 unrestricted for l , but restricted (to matrices such that the indicated determinant is strictly positive) for l . 2 Risks 2020, 8, 25 35 of 36 The above relations can be inverted to write m and S in terms of l and l , indeed 1 2 1 1 S = S(l ) = (S 2l ) 2 2 and 1 1 1 m = m(l , l ) = (S 2l ) (l + S m ) . 1 2 2 1 0 0 0 The cgf of the natural sufficient statistic is 1 1 1 1 T 1 T 1 K(l) = K(l , l ) = K(m , S ) = m S m m S m + log(det(S )) log(det(S )) 1 2 l ,l l l ,l 0 l 0 1 2 2 l ,l 1 2 0 0 2 1 2 l 2 2 2 2 It is now clear that K(l) is well defined if and only if the determinant of S(l ) is strictly positive, which we have implicitly assumed to be the case since we have insisted S be positive definite. It is also clear that K(l) is well-defined if and only if the determinants S(l ) is strictly positive, which will occur if and only if the determinant of (2S S ) is strictly positive. Remark A1. Suppose that f and f are Gaussian densities with respective positive definite covariance matrices S and S. Further suppose that Z  f . Then the variance of f (Z)/ f (Z) is finite if and only if det(2S 0 l l S ) > 0. In the one-dimensional case d = 1 we write Z = Z. The condition in Remark A1 is satisfied 2 2 whenever s > s /2. In other words, if the variance of the IS distribution is too small, relative to actual variance of Z, then the IS weight will have infinite variance. Appendix B.2. Chi-Square Family In preparation for the multivariate t family, we first consider the chi-square family. Suppose that Z follows a chi-square distribution with n degrees of freedom, and that the goal is to allow Z to have arbitrary degrees of freedom n > 0 under the IS density. In order to identify the natural sufficient statistic S(z) and natural parameter l = l(n), we let f (z) denote the chi-square density with n degrees of freedom and f (z) the chi-square density with n degrees of freedom. Then 0 l f (z) n n n n n n l 0 0 0 = exp log(z) log(2) + log G log G f (z) 2 2 2 2 from which we see that S(z) = log(z) and l = l(n) = (n n )/2. In addition we see that the cgf of S(z) is K(l) = l log(2) + log (G (l + n /2)) log(G(n /2)) . 0 0 In order that K(l) be will defined, we require n > 0, which is obvious. In order that K(l) is well-defined we require l + be positive, which in turn requires n < 2n . In other words, if the IS degrees of freedom are more than twice the actual degrees of freedom, then the IS weight will have infinite variance. Appendix B.3. t Family The t family is not a regular exponential family, so it does not fit directly into the framework discussed in Section 2.2. That being said, a multivariate t vector can be constructed from a Gaussian vector and an independent chi-square variable. Indeed if Z is Gaussian with mean zero and covariance matrix S , and R is chi-square with n degrees of freedom (independent of Z), then 0 0 Z = m +  Z , (A5) is multivariate t with n degrees of freedom, mean m and covariance matrix S . 0 0 0 n 2 0 Risks 2020, 8, 25 36 of 36 In the case that Z is multivariate t, then, we can take our systematic risk factors to be the components of (Z, R). In this case the joint density of the systematic risk factors can be embedded into the parametric family T T f (z ˆ , r) := exp(l S(z ˆ) K(l)) exp(h T(r) L(h)) f (z ˆ) g(r) , (A6) l,h where l is and S are the natural parameter and sufficient statistic for the Gaussian family, h and L are those for the chi-square family, and f and g are the Gaussian and chi-square densities. References Bickel, Peter J., and Kjell A. Doksum. 2001. Mathematical Statistics: Basic Ideas and Selected Topics, 2nd ed. Upper Saddle River: Prentice Hall, Volume 1. Chan, Joshua C.C., and Dirk P. Kroese. 2010. Efficient estimation of large portfolio loss probabilities in t-copula models. European Journal of Operational Research 205: 361–67. Chatterjee, Sourav, and Persi Diaconis. 2018. The sample size required in importance sampling. Annals of Applied Probability 28: 1099–135. [CrossRef] de Wit, Tim. 2016. Collateral Damage—Creating a Credit Loss Model Incorporating a Dependency between Defaults and LGDs. Master ’s thesis, University of Twente, Enschede, The Netherlands. Deng, Shaojie, Kay Giesecke, and Tze Leung Lai. 2012. Sequential importance sampling and resampling for dynamic portfolio credit risk. Operations Research 60: 78–91. [CrossRef] Eckert, Johanna, Kevin Jakob, and Matthias Fischer. 2016. A credit portfolio framework under dependent risk parameters PD, LGD and EAD. Journal of Credit Risk 12: 97–119. [CrossRef] Frye, Jon. 2000. Collateral damage. Risk 13: 91-94. Frye, Jon, and Michael Jacobs Jr. 2012. Credit loss and systematic loss given default. Journal of Credit Risk 8: 109–140. [CrossRef] Glasserman, Paul, and Jingyi Li. 2005. Importance sampling for portfolio credit risk. Management Science 51: 1643–56. [CrossRef] Ionides, Edward L. 2008. Truncated importance sampling. Journal of Computational and Graphical Statistics 17: 295–311. [CrossRef] Jeon, Jong-June, Sunggon Kim, and Yonghee Lee. 2017. Portfolio credit risk model with extremal dependence of defaults and random recovery. Journal of Credit Risk 13: 1–31. [CrossRef] Kupiec, Paul H. 2008. A generalized single common factor model of portfolio credit risk. Journal of Derivatives 15: 25–40. [CrossRef] Miu, Peter, and Bogie Ozdemir. 2006. Basel requirements of downturn loss given default: Modeling and estimating probability of default and loss given default correlations. Journal of Credit Risk 2: 43–68. [CrossRef] Pykhtin, Michael. 2003. Unexpected recovery risk. Risk 16: 74–78. Scott, Alexandre, and Adam Metzler. 2015. A general importance sampling algorithm for estimating portfolio loss probabilities in linear factor models. Insurance: Mathematics and Economics 64: 279–93. Sen, Rahul. 2008. A multi-state Vasicek model for correlated default rate and loss severity. Risk 21: 94–100. Witzany, Jirí. ˇ 2011. A Two-Factor Model for PD and LGD Correlation. Working Paper. Available online: http://dx.doi.org/10.2139/ssrn.1476305 (accessed on 9 March 2020). c 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Journal

RisksMultidisciplinary Digital Publishing Institute

Published: Mar 10, 2020

There are no references for this article.