Dependence Modeling, Volume 11 (1): 1 – Jan 1, 2023

/lp/de-gruyter/joint-lifetime-modeling-with-matrix-distributions-vJlEZur1zM

- Publisher
- de Gruyter
- Copyright
- © 2023 the author(s), published by De Gruyter
- ISSN
- 2300-2298
- eISSN
- 2300-2298
- DOI
- 10.1515/demo-2022-0153

1 Introduction

When studying insurance products on multiple lives, it is natural to assume that individuals who are exposed to very similar life conditions may have somewhat correlated lifetimes. This is especially true for married couples, since, once married, the spouses typically share a similar lifestyle to a large extent. Indeed, the simplistic assumption of independence of lifetimes of partners has been shown to be inappropriate in various papers. For example, Frees et al. [15] used a bivariate Frank copula model to assess the effect of dependency between husband and wife on insurance annuities, illustrating their approach on a by now classical data set of a large insurer. The same data were used in Carriere [10], where multiple bivariate copula models were studied. The Linear-Mixing Frailty copula was found to be the best-suited model to describe the data. Shemyakin and Youn [22] introduced a general conditional Bayesian copula for the joint last survivor analysis. They allow the entry age of each spouse to have a selection effect on his or her own mortality, as well as on the other spouse's mortality. For that same data set, Luciano et al. [19] captured the dependence between survival times of spouses by an Archimedean copula, whose marginals were estimated according to a stochastic intensity approach. In a similar spirit, Dufresne et al. [13] allow the Archimedean copula parameter to depend on the age difference of partners at issue of the policy, to describe the dependence of the remaining lifetime of a couple. In the study by Gobbi et al. [16], extended Marshall-Olkin models were employed for that same data set, where the continuous copula approach is extended by allowing for fatal events that affect both marginal lives.

In this article, we propose an alternative to copula-based methods for the modeling of joint remaining lifetimes in a couple based on multivariate phase-type distributions. Phase-type (PH) distributions are interesting candidates since they extend favorable properties of exponential random variables to scenarios where the latter alone would not be appropriate. In particular, the denseness of PH distributions among all distributions on the positive half-line in the sense of weak convergence, which extends to the multivariate setup, is a major advantage when one wants to approximate a distribution. For more details on PH distributions, we refer readers to Bladt and Nielsen [8]. As opposed to copula-based methods, a PH distribution can give rise to a natural interpretation when used to approximate lifetime distributions. One can view the path of a Markov jump process as the life of an individual, which goes through several different states (for instance, biological markers) before reaching the inevitable absorption (death) state. Following this interpretation, acyclic PH distributions have been the first choice for modeling the aging process of a human life, since they only allow forward transitions or direct exits to the absorption state. This characteristic makes them an appropriate tool for describing lifetimes ended by natural aging or accidents. In the study by Lin and Liu [18], a PH distribution with the Coxian structure was used to explain the physical aging process of marginal lifetimes. This approach was extended in the study by Asmussen et al. [5] to generalized Coxian distributions for the purpose of pricing equity-linked products. The first contribution to lift the PH approach to bivariate lifetime models was Ji et al. [17], where a Markovian multistate model and a semi-Markov model are used to describe the dependence between the lifetimes of husbands and wives. Spreeuw and Owadally [23] also use a Markovian multistate model, with more attention given to how to tie the bereavement effect to forces of mortality. Moutanabbir and Abdelrahman [20] then used a bivariate Sarmanov distribution with PH marginals to model joint lifetimes. Both articles focused on the pricing of multiple-life insurance contracts.

Recently, Albrecher et al. [2] introduced time-inhomogeneous PH (IPH) distributions for the purpose of lifetime modeling, which leads to a considerable reduction in the number of phases needed for a satisfactory fit of given data, since the introduced inhomogeneity can accommodate nonexponential shapes more efficiently than an augmentation of the phase dimension. In particular, [2] applied regression on the intensity functions of the IPH distributions to associate lifetimes of different cohorts and populations.

In this article, we propose a different route for using available information in the data set, namely, to incorporate multinomial logistic regressions in the estimation procedure of multivariate PH distributions. In particular, the regression is applied to the initial distribution vectors of each IPH component, which adapts an approach presented in Bladt and Yslas [9] to the multivariate case. The resulting dependence structure allows for explicit formulas alongside an intuitive "aging" interpretation and, beyond the theoretical contribution, for a satisfactory fit to the bivariate spouses' lifetime data.

The remainder of this article is structured as follows. Section 2 introduces the class of multivariate PH distributions that we will use to describe joint lifetimes of couples; we also provide some additional properties. In Section 3, an estimation method for this multivariate PH distribution is introduced, which allows for right censoring and covariate information. Section 4 then applies and illustrates the procedure on the classical spouses' lifetime data set from [15] and interprets the results.
Section 5 concludes this article.

2 Multivariate phase-type distributions

We first recall the mPH class, which was introduced in the study by Bladt [7].

2.1 mPH distributions

Let $\{J_t^{(i)}\}_{t\ge 0}$, $i=1,\ldots,d$, denote separate homogeneous Markov pure-jump processes on the common state space $E=\{1,\ldots,p,p+1\}$, with states $1,\ldots,p$ being transient and $p+1$ absorbing. Defining transition probabilities as
\[
p_{jl}^{(i)}(s,t)=\mathbb{P}\left(J_t^{(i)}=l \mid J_s^{(i)}=j\right),\quad 1\le j,l\le p+1,\ 1\le i\le d,
\]
we may write
\[
\boldsymbol{P}_i(s,t)=\exp\left(\boldsymbol{\Lambda}_i(t-s)\right)=
\begin{pmatrix}
\exp\left(\boldsymbol{T}_i(t-s)\right) & \boldsymbol{e}-\exp\left(\boldsymbol{T}_i(t-s)\right)\boldsymbol{e}\\
\boldsymbol{0} & 1
\end{pmatrix}
\in\mathbb{R}^{(p+1)\times(p+1)},
\]
for $s<t$, $1\le i\le d$, where the $\boldsymbol{\Lambda}_i$ are intensity matrices. In the following, we write $\boldsymbol{e}_k$ for the $k$-th canonical basis vector in $\mathbb{R}^p$, $\boldsymbol{e}=\sum_{j=1}^{p}\boldsymbol{e}_j$, and
\[
\boldsymbol{T}_i=\left\{t_{ks}^{(i)}\right\}_{k,s=1,\ldots,p},\quad
\boldsymbol{t}_i=-\boldsymbol{T}_i\boldsymbol{e}=\left(t_1^{(i)},\ldots,t_p^{(i)}\right)^{\mathsf{T}},\ k=1,\ldots,p.
\]
The crucial property of this class of PH distributions is its dependence structure. Concretely, the assumption is that all jump processes start in the same state at time $t=0$, but proceed independently thereafter until absorption.
That is, dependence is introduced solely through the shared initial state, which leads to a particularly tractable yet flexible model class. More formally,
\[
J_0^{(i)}=J_0^{(l)},\quad
\left\{J_t^{(i)}\right\}_{t\ge 0}\ \perp\!\!\!\perp_{J_0^{(1)}}\ \left\{J_t^{(l)}\right\}_{t\ge 0,\,l\ne i},\quad
\forall\, i,l\in\{1,\ldots,d\}. \tag{2.1}
\]
We will use $J_0:=J_0^{(i)}$ to simplify notation. Let $\mathbb{P}(J_0=j)=\pi_j$, $j=1,\ldots,p$, and let $\boldsymbol{\pi}=(\pi_1,\ldots,\pi_p)$ denote the distribution vector of the shared initial state. The random variables
\[
X_i=\inf\left\{t>0: J_t^{(i)}=p+1\right\},\quad i=1,\ldots,d, \tag{2.2}
\]
are then all univariate PH distributed. We say that the random vector $X=(X_1,\ldots,X_d)\in\mathbb{R}_+^d$ has a multivariate phase-type distribution (mPH) if each marginal variable $X_i$, $i=1,2,\ldots,d$, is given by (2.2) and pairwise dependence is defined by (2.1).
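The construction in (2.1) and (2.2) can be mimicked by simulation: draw one shared initial state from $\boldsymbol{\pi}$, then let each marginal jump process run independently with its own sub-intensity matrix until absorption. The following Python sketch illustrates this under stated assumptions; the function names, the 0-based state indexing, and the numerical parameters are hypothetical and not taken from the article.

```python
import random

def simulate_absorption_time(start, T, rng):
    """Run one jump process with sub-intensity matrix T from `start` until absorption."""
    p = len(T)
    state, clock = start, 0.0
    while True:
        # Exponential holding time with rate -t_kk in the current state.
        clock += rng.expovariate(-T[state][state])
        # Exit rate to the absorbing state: t_k = -(T e)_k (clipped against rounding).
        exit_rate = max(0.0, -sum(T[state]))
        weights = [T[state][s] if s != state else 0.0 for s in range(p)] + [exit_rate]
        nxt = rng.choices(range(p + 1), weights=weights)[0]
        if nxt == p:  # index p plays the role of the absorbing state p+1
            return clock
        state = nxt

def simulate_mph(pi, T_list, rng):
    """Draw one vector (X_1, ..., X_d): shared initial state, then independent evolution."""
    start = rng.choices(range(len(pi)), weights=pi)[0]
    return [simulate_absorption_time(start, T, rng) for T in T_list]

# Illustrative parameters (made up): p = 2 phases, d = 2 marginals.
T1 = [[-1.0, 0.5], [0.0, -2.0]]
T2 = [[-3.0, 1.0], [0.0, -0.5]]
sample = simulate_mph([0.5, 0.5], [T1, T2], random.Random(1))
```

Because the only coupling is the common starting state, repeated calls to `simulate_mph` produce dependent coordinates whenever the marginal absorption-time distributions differ across starting states.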
We use the notation
\[
X\sim\text{mPH}(\boldsymbol{\pi},\mathcal{T}),\quad\text{with}\quad \mathcal{T}=\{\boldsymbol{T}_1,\ldots,\boldsymbol{T}_d\}.
\]
The joint cumulative distribution function of $X$ is given by
\[
\begin{aligned}
F_X(x) &= \mathbb{P}(X_1\le x_1, X_2\le x_2,\ldots,X_d\le x_d)\\
&= \sum_{j=1}^{p}\mathbb{P}(X_1\le x_1, X_2\le x_2,\ldots,X_d\le x_d \mid J_0=j)\,\mathbb{P}(J_0=j)\\
&= \sum_{j=1}^{p}\pi_j\prod_{i=1}^{d}\left(1-\boldsymbol{e}_j^{\mathsf{T}}\exp(\boldsymbol{T}_i x_i)\boldsymbol{e}\right),\quad x\in\mathbb{R}_+^d.
\end{aligned}
\]
Furthermore, the survival function is
\[
S_X(x)=\mathbb{P}(X_1>x_1, X_2>x_2,\ldots,X_d>x_d)=\sum_{j=1}^{p}\pi_j\prod_{i=1}^{d}\boldsymbol{e}_j^{\mathsf{T}}\exp(\boldsymbol{T}_i x_i)\boldsymbol{e},
\]
and the probability density function is given by
\[
f_X(x)=\sum_{j=1}^{p}\pi_j\prod_{i=1}^{d}\boldsymbol{e}_j^{\mathsf{T}}\exp(\boldsymbol{T}_i x_i)\boldsymbol{t}_i.
\]
For more details, compare with [7].

2.2 mIPH distributions

The particular focus in this article will now be on an inhomogeneous extension of the mPH distribution (briefly mentioned in [7, Section 6.1]).
When considering time-inhomogeneous Markov pure-jump processes on the common state space $E$, it follows from Albrecher and Bladt [1] that the transition matrices are modified to
\[
\boldsymbol{P}(s,t)=\prod_{s}^{t}\left(\boldsymbol{I}+\boldsymbol{\Lambda}(u)\,\mathrm{d}u\right)
:=\boldsymbol{I}+\sum_{k=1}^{\infty}\int_{s}^{t}\int_{s}^{u_k}\cdots\int_{s}^{u_2}\boldsymbol{\Lambda}(u_1)\cdots\boldsymbol{\Lambda}(u_k)\,\mathrm{d}u_1\cdots\mathrm{d}u_k,
\]
with intensity matrix
\[
\boldsymbol{\Lambda}(t)=
\begin{pmatrix}
\boldsymbol{T}(t) & \boldsymbol{t}(t)\\
\boldsymbol{0} & 0
\end{pmatrix}
\in\mathbb{R}^{(p+1)\times(p+1)},\quad t\ge 0.
\]
The random variables
\[
Y_i=\inf\left\{t>0: J_t^{(i)}=p+1\right\},\quad i=1,\ldots,d,
\]
then follow univariate inhomogeneous phase-type (IPH) distributions; readers can compare with [1] for more details.

Here, we focus on the particularly tractable case $\boldsymbol{T}_i(t)=\lambda_i(t)\boldsymbol{T}_i$. A random vector $Y=(Y_1,\ldots,Y_d)$ is said to have an inhomogeneous multivariate PH (mIPH) distribution if all marginals follow IPH distributions, and the dependence structure is defined by (2.1).
We write
\[
Y\sim\text{mIPH}(\boldsymbol{\pi},\mathcal{T},\mathcal{L}),\quad\text{where}\ \mathcal{T}=\{\boldsymbol{T}_1,\ldots,\boldsymbol{T}_d\},\quad \mathcal{L}=\{\lambda_1,\ldots,\lambda_d\}.
\]
With
\[
g_i^{-1}(y):=\int_{0}^{y}\lambda_i(u)\,\mathrm{d}u,\quad i=1,\ldots,d,
\]
the cumulative distribution function, survival function, and density of $Y$ are given by
\[
\begin{aligned}
F_Y(y) &= \sum_{j=1}^{p}\pi_j\prod_{i=1}^{d}\left(1-\boldsymbol{e}_j^{\mathsf{T}}\exp\left(\boldsymbol{T}_i g_i^{-1}(y_i)\right)\boldsymbol{e}\right),\quad y\in\mathbb{R}_+^d,\\
S_Y(y) &= \sum_{j=1}^{p}\pi_j\prod_{i=1}^{d}\boldsymbol{e}_j^{\mathsf{T}}\exp\left(\boldsymbol{T}_i g_i^{-1}(y_i)\right)\boldsymbol{e},\quad y\in\mathbb{R}_+^d,
\end{aligned}
\]
and
\[
f_Y(y)=\sum_{j=1}^{p}\pi_j\prod_{i=1}^{d}\boldsymbol{e}_j^{\mathsf{T}}\exp\left(\boldsymbol{T}_i g_i^{-1}(y_i)\right)\boldsymbol{t}_i\,\lambda_i(y_i),\quad y\in\mathbb{R}_+^d,
\]
respectively.
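As a numerical illustration of the formula for $S_Y$, the following Python sketch evaluates the joint survival function for given $\boldsymbol{\pi}$, matrices $\boldsymbol{T}_i$, and cumulative intensities $g_i^{-1}$. It is a minimal sketch under assumptions: the function names are hypothetical, states are indexed from 0, and the pure-Python matrix exponential (a truncated Taylor series with scaling and squaring) is only a simple stand-in for a library routine.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mat_exp(m, terms=25):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(v) for v in row) for row in m)
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scaled = [[v / 2.0 ** squarings for v in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def joint_survival(pi, T_list, cum_intensities, y):
    """S_Y(y) = sum_j pi_j prod_i e_j^T exp(T_i g_i^{-1}(y_i)) e."""
    total = 0.0
    for j, pj in enumerate(pi):
        prod = 1.0
        for T, G, yi in zip(T_list, cum_intensities, y):
            x = G(yi)  # g_i^{-1}(y_i), the cumulative intensity at y_i
            E = mat_exp([[v * x for v in row] for row in T])
            prod *= sum(E[j])  # row sum equals e_j^T exp(T_i x) e
        total += pj * prod
    return total

# Illustrative inputs (made up): lambda_1 = 1 (homogeneous) and lambda_2(u) = 2u.
T1 = [[-1.0, 1.0], [0.0, -1.0]]
T2 = [[-0.5, 0.2], [0.0, -0.3]]
value = joint_survival([0.6, 0.4], [T1, T2], [lambda u: u, lambda u: u * u], [1.0, 2.0])
```

Choosing every cumulative intensity as the identity recovers the homogeneous survival function $S_X$ of Section 2.1.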
Note that one can view each IPH random variable as a transformation of a PH random variable (and correspondingly as the absorption time of a time-transformed, formerly time-homogeneous Markov jump process), with $X\sim\text{PH}(\boldsymbol{\pi},\boldsymbol{T})$ and $g(X)\sim\text{IPH}(\boldsymbol{\pi},\boldsymbol{T},\lambda)$.

The construction of $\text{mIPH}(\boldsymbol{\pi},\mathcal{T},\mathcal{L})$ allows different sub-intensity matrices and inhomogeneity functions for each marginal, as long as they share the same state space. This leads to considerable model flexibility. In particular, when compared to the homogeneous case, time-inhomogeneity allows for substantially smaller state spaces for appropriate fits of data with potentially nonexponential tails (compare with [1]), and the mIPH class inherits this feature.

When we condition an mIPH distribution on one or more marginals, we obtain another mIPH distribution with a new initial distribution vector and smaller dimension: Let $Y\sim\text{mIPH}(\boldsymbol{\pi},\mathcal{T},\mathcal{L})$ and condition on the value of $Y_l$, $l\le d$.
The conditional density is
\[
f_{Y\mid Y_l}(y\mid y_l)=\sum_{j=1}^{p}\frac{\pi_j\,\boldsymbol{e}_j^{\mathsf{T}}\exp\left(\boldsymbol{T}_l g_l^{-1}(y_l)\right)\boldsymbol{t}_l\,\lambda_l(y_l)}{\boldsymbol{\pi}\exp\left(\boldsymbol{T}_l g_l^{-1}(y_l)\right)\boldsymbol{t}_l\,\lambda_l(y_l)}\prod_{i\ne l}\boldsymbol{e}_j^{\mathsf{T}}\exp\left(\boldsymbol{T}_i g_i^{-1}(y_i)\right)\boldsymbol{t}_i\,\lambda_i(y_i).
\]
That is,
\[
Y\mid Y_l=y_l\ \sim\ \text{mIPH}\left(\boldsymbol{\alpha},\mathcal{T}\setminus\boldsymbol{T}_l,\mathcal{L}\setminus\lambda_l\right), \tag{2.3}
\]
with initial distribution vector
\[
\boldsymbol{\alpha}=\left\{\pi_j\times\frac{\boldsymbol{e}_j^{\mathsf{T}}\exp\left(\boldsymbol{T}_l g_l^{-1}(y_l)\right)\boldsymbol{t}_l\,\lambda_l(y_l)}{\boldsymbol{\pi}\exp\left(\boldsymbol{T}_l g_l^{-1}(y_l)\right)\boldsymbol{t}_l\,\lambda_l(y_l)}\right\}_{j=1,\ldots,p}.
\]
The same reasoning can be applied to obtain
\[
Y\mid Y_l\ge y_l\ \sim\ \text{mIPH}\left(\boldsymbol{\nu},\mathcal{T}\setminus\boldsymbol{T}_l,\mathcal{L}\setminus\lambda_l\right), \tag{2.4}
\]
with
\[
\boldsymbol{\nu}=\left\{\pi_j\times\frac{\boldsymbol{e}_j^{\mathsf{T}}\exp\left(\boldsymbol{T}_l g_l^{-1}(y_l)\right)\boldsymbol{e}}{\boldsymbol{\pi}\exp\left(\boldsymbol{T}_l g_l^{-1}(y_l)\right)\boldsymbol{e}}\right\}_{j=1,\ldots,p}.
\]

Remark 2.1 One might be tempted to argue that Assumption (2.1) necessarily leads to positive dependence of the resulting random variables (in our case lifetimes). However, sharing the initial state is not sufficient to obtain positive dependence, as the different intensity matrices may introduce counter-effects. For instance, after starting in the same state, we could have a very small expected holding time and direct absorption for one marginal, while the second has to pass through the entire state space before absorption happens, leading to a very large survival time. This behavior could very well be reversed when starting in another (but common) state. Consequently, certain combinations of individual intensity matrices may give rise to negative dependence as well.

3 Parameter estimation for right-censored data and covariate information

In the following, we first introduce the components needed to estimate the parameters of mIPH distributions when right-censored data are present. Second, we present how to estimate initial distribution vectors considering covariate information. Finally, we propose an adapted expectation-maximization (EM) algorithm, which we name the ERMI algorithm.

3.1 EM algorithm for right-censored data

Taking inspiration from Asmussen et al. [6] and Olsson [21], we now derive the conditional expectations needed in the EM algorithm for mPH distributions, where absorption times are allowed to be right-censored. Since the eventually targeted mIPH distributions are transformed mPH distributions, after transformation of the data, the E-step and M-step of the algorithm are the same as for the time-homogeneous case.

Let $\boldsymbol{X}=(X_1,\ldots,X_d)$ be the collection of random variables we are interested in.
Let $\boldsymbol{x}=\left(x_1^{(m)},\ldots,x_d^{(m)}\right)$ be the observations of absorption times assumed to be generated from $\boldsymbol{X}\sim\text{mPH}(\boldsymbol{\pi},\mathcal{T})$, where $x_i^{(m)}\in\mathbb{R}_+^n$ for $i=1,\ldots,d$. We assume that the censoring mechanism is independent of the size of the random variables. The marginals $X_i^{(m)}=\min\left(x_i^{(m)},R_i^{(m)}\right)$ follow $\text{PH}(\boldsymbol{\pi},\boldsymbol{T}_i)$ distributions, where $R_i^{(m)}$ is a random censoring point for the $m$-th observation. The realizations of the random right-censoring indicators are collected in $\boldsymbol{\Delta}=\left(\delta_1^{(m)},\ldots,\delta_d^{(m)}\right)$, where the elements $\delta_i^{(m)}\in\mathbb{R}_+^n$, $i=1,\ldots,d$, are equal to 1 if the absorption time $x_i^{(m)}$ is fully observed and 0 if it is right-censored, that is, if $x_i^{(m)}\ge R_i^{(m)}$.

The sample $\boldsymbol{X}$ is associated with the latent sample paths $\left\{J_t^{(i,m)}\right\}_{t\ge 0}$, $i=1,\ldots,d$, $m=1,\ldots,n$, which are not observed. To address this issue, we make the following definitions.
Let
\[
\begin{aligned}
B_k &= \sum_{i=1}^{d}\sum_{m=1}^{n}1\left\{J_0^{(i,m)}=k\right\},\quad k=1,\ldots,p,\\
N_{ks}^{(i)} &= \sum_{m=1}^{n}\sum_{t\ge 0}1\left\{J_{t-}^{(i,m)}=k,\ J_t^{(i,m)}=s\right\},\quad k,s=1,\ldots,p,\ i=1,\ldots,d,\\
N_k^{(i)} &= \sum_{m=1}^{n}\sum_{t\ge 0}1\left\{J_{t-}^{(i,m)}=k,\ J_t^{(i,m)}=p+1\right\},\quad k=1,\ldots,p,\ i=1,\ldots,d,\\
Z_k^{(i)} &= \sum_{m=1}^{n}\int_{0}^{\infty}1\left\{J_t^{(i,m)}=k\right\}\mathrm{d}t,\quad k=1,\ldots,p,\ i=1,\ldots,d.
\end{aligned}
\]
Here, $B_k$ is the number of times marginal jump processes start in state $k$, $N_{ks}^{(i)}$ is the number of transitions from state $k$ to $s$ for jump process $i$, and $N_k^{(i)}$ is the number of absorptions from state $k$ for jump process $i$. Finally, $Z_k^{(i)}$ is the time spent in state $k$ before absorption of jump process $i$. These statistics are not observable, but are sufficient to describe the dynamics of the underlying Markov process. Moreover, they are essential to construct an effective EM-like algorithm.
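To make these four statistics concrete, the following Python sketch computes them for fully observed (uncensored) paths. The path representation is a hypothetical one chosen for illustration: each path is a list of `(state, sojourn_time)` pairs in visiting order, with 0-based states, and absorption is assumed to occur right after the last pair.

```python
def sufficient_statistics(paths_by_marginal, p):
    """Return (B, N, N_abs, Z) for paths_by_marginal[i][m], the path of marginal i, observation m."""
    d = len(paths_by_marginal)
    B = [0] * p                                          # B_k: starts in state k
    N = [[[0] * p for _ in range(p)] for _ in range(d)]  # N[i][k][s]: jumps k -> s
    N_abs = [[0] * p for _ in range(d)]                  # N_abs[i][k]: absorptions from k
    Z = [[0.0] * p for _ in range(d)]                    # Z[i][k]: total time spent in k
    for i, paths in enumerate(paths_by_marginal):
        for path in paths:
            B[path[0][0]] += 1
            # Pair each visit with the next one; None marks absorption.
            for (state, sojourn), nxt in zip(path, path[1:] + [None]):
                Z[i][state] += sojourn
                if nxt is None:
                    N_abs[i][state] += 1
                else:
                    N[i][state][nxt[0]] += 1
    return B, N, N_abs, Z

# One observation (n = 1) of a bivariate model (d = 2) with p = 2 phases (made-up paths).
paths = [
    [[(0, 1.5), (1, 0.5)]],  # marginal 1: state 0 for 1.5, state 1 for 0.5, then absorbed
    [[(0, 2.0)]],            # marginal 2: absorbed directly from state 0 after 2.0
]
B, N, N_abs, Z = sufficient_statistics(paths, p=2)
```

In practice these paths are latent, which is why the EM algorithm works with conditional expectations of the statistics rather than with the statistics themselves.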
Then, the completely observed likelihood can be expressed using the sufficient statistics defined earlier, as
\[
\mathcal{L}_c(\boldsymbol{\pi},\mathcal{T};\boldsymbol{x})=
\left(\prod_{k=1}^{p}\pi_k^{B_k}\right)
\left(\prod_{i=1}^{d}\prod_{k=1}^{p}\prod_{s\ne k}\left(t_{ks}^{(i)}\right)^{N_{ks}^{(i)}}e^{-t_{ks}^{(i)}Z_k^{(i)}}\right)
\left(\prod_{i=1}^{d}\prod_{k=1}^{p}\left(t_k^{(i)}\right)^{N_k^{(i)}}e^{-t_k^{(i)}Z_k^{(i)}}\right), \tag{3.1}
\]
which is seen to conveniently fall into the exponential family of distributions and thus has explicit maximum likelihood estimators. The factors on the right-hand side of (3.1) can be interpreted as follows: The first product gives the probability of starting in each transient state. The second consists of the likelihood of transitions between transient states, associated with the respective sojourn times, so that the absorbing state is not considered in this product. Finally, the last product gives the likelihood of transitions from transient states to the absorbing state, along with the time spent in the respective states before absorption.

With these assumptions, the derivation of the conditional expectations of $B_k$, $Z_k^{(i)}$, $N_{ks}^{(i)}$, and $N_k^{(i)}$, for $k,s=1,\ldots,p$ and $i=1,\ldots,d$, is analogous to the fully uncensored case (see [7]). The only difference from the fully observed case is that marginals may have right-censored absorption times.
To see how this affects the expectation step of the EM algorithm, we give a detailed derivation of E(Bk∣X=x){\mathbb{E}}\left({B}_{k}\hspace{0.33em}| \hspace{0.33em}{\boldsymbol{X}}={\boldsymbol{x}}).For the mm-th row of Y{\boldsymbol{Y}}, let iun(m){i}_{un}^{\left(m)}denote the collection of indices of marginals that are uncensored and similarly irc(m){i}_{rc}^{\left(m)}for right-censored marginals. Naturally iun(m)+irc(m)=d{i}_{un}^{\left(m)}+{i}_{rc}^{\left(m)}=d. Then, the conditional expectation of Bk{B}_{k}under right-censoring is E(Bk∣X=x)=∑i=1d∑m=1nE(1{J0(i,m)=k}∣X=x)=d×∑m=1nP(J0(m)=k∣X=x)=d×∑m=1nP(J0(m)=k)P(Xj∈dxj(m),Xl≥xl(m);j∈iun(m),l∈irc(m)∣J0(m)=k)P(Xj∈dxj(m),Xl≥xl(m);j∈iun(m),l∈irc(m))=d×∑m=1nπk(m)∏j∈iun(m)ekTexp(Tjxj(m))tj∏l∈irc(m)ekTexp(Tlxl(m))e∑s=1pπs(m)∏j∈iun(m)esTexp(Tjxj(m))tj∏l∈irc(m)esTexp(Tlxl(m))e,\begin{array}{rcl}{\mathbb{E}}\left({B}_{k}\hspace{0.33em}| \hspace{0.33em}{\boldsymbol{X}}={\boldsymbol{x}})& =& \mathop{\displaystyle \sum }\limits_{i=1}^{d}\mathop{\displaystyle \sum }\limits_{m=1}^{n}{\mathbb{E}}\left(1\left\{{J}_{0}^{\left(i,m)}=k\right\}\hspace{0.33em}| \hspace{0.33em}{\boldsymbol{X}}={\boldsymbol{x}})\\ & =& d\times \mathop{\displaystyle \sum }\limits_{m=1}^{n}{\mathbb{P}}\left({J}_{0}^{\left(m)}=k\hspace{0.33em}| \hspace{0.33em}{\boldsymbol{X}}={\boldsymbol{x}})\\ & =& d\times \mathop{\displaystyle \sum }\limits_{m=1}^{n}\frac{{\mathbb{P}}\left({J}_{0}^{\left(m)}=k){\mathbb{P}}\left({X}_{j}\in d{x}_{j}^{\left(m)},{X}_{l}\ge {x}_{l}^{\left(m)};j\in {i}_{un}^{\left(m)},l\in {i}_{rc}^{\left(m)}\hspace{0.33em}| \hspace{0.33em}{J}_{0}^{\left(m)}=k)}{{\mathbb{P}}\left({X}_{j}\in d{x}_{j}^{\left(m)},{X}_{l}\ge {x}_{l}^{\left(m)};j\in {i}_{un}^{\left(m)},l\in {i}_{rc}^{\left(m)})}\\ & =& d\times \mathop{\displaystyle \sum }\limits_{m=1}^{n}\frac{{\pi }_{k}^{\left(m)}{\displaystyle \prod }_{j\in {i}_{un}^{\left(m)}}{{{\boldsymbol{e}}}_{k}}^{{\mathsf{T}}}\exp 
\left({{\boldsymbol{T}}}_{j}{x}_{j}^{\left(m)}){{\boldsymbol{t}}}_{j}{\displaystyle \prod }_{l\in {i}_{rc}^{\left(m)}}{{{\boldsymbol{e}}}_{k}}^{{\mathsf{T}}}\exp \left({{\boldsymbol{T}}}_{l}{x}_{l}^{\left(m)}){\boldsymbol{e}}}{{\displaystyle \sum }_{s=1}^{p}{\pi }_{s}^{\left(m)}{\displaystyle \prod }_{j\in {i}_{un}^{\left(m)}}{{{\boldsymbol{e}}}_{s}}^{{\mathsf{T}}}\exp \left({{\boldsymbol{T}}}_{j}{x}_{j}^{\left(m)}){{\boldsymbol{t}}}_{j}{\displaystyle \prod }_{l\in {i}_{rc}^{\left(m)}}{{{\boldsymbol{e}}}_{s}}^{{\mathsf{T}}}\exp \left({{\boldsymbol{T}}}_{l}{x}_{l}^{\left(m)}){\boldsymbol{e}}},\end{array}where we see a mix of marginal densities and survival functions appearing in both the numerator and denominator.Note that we introduce the notation J0(i,m){J}_{0}^{\left(i,m)}since it is possible to observe mmdifferent mIPH distributions which only differ in terms of initial distribution vectors, as explained in Section 3.2. With this in mind, J0(m){J}_{0}^{\left(m)}indicates the starting state of the Markov jump process of the mm-th observation, which is coupled with initial distribution vector π(m){{\boldsymbol{\pi }}}^{\left(m)}.This expectation can also be expressed, using the Δ{\boldsymbol{\Delta }}notation, as follows: E(Bk∣X=x)=d×∑m=1nπk(m)∏i=1d(ek⊤exp(Tixi(m))ti)δi(m)(ek⊤exp(Tixi(m))e)1−δi(m)∑j=1pπj(m)∏i=1d(ej⊤exp(Tixi(m))ti)δi(m)(ej⊤exp(Tixi(m))e)1−δi(m),{\mathbb{E}}\left({B}_{k}\hspace{0.33em}| \hspace{0.33em}{\boldsymbol{X}}={\boldsymbol{x}})=d\times \mathop{\sum }\limits_{m=1}^{n}\frac{{\pi }_{k}^{\left(m)}{\prod }_{i=1}^{d}{({{\boldsymbol{e}}}_{k}^{\top }\exp ({{\boldsymbol{T}}}_{i}{x}_{i}^{\left(m)}){{\boldsymbol{t}}}_{i})}^{{\delta }_{i}^{\left(m)}}{({{\boldsymbol{e}}}_{k}^{\top }\exp ({{\boldsymbol{T}}}_{i}{x}_{i}^{\left(m)}){\boldsymbol{e}})}^{1-{\delta }_{i}^{\left(m)}}}{{\sum }_{j=1}^{p}{\pi }_{j}^{\left(m)}{\prod }_{i=1}^{d}{({{\boldsymbol{e}}}_{j}^{\top }\exp ({{\boldsymbol{T}}}_{i}{x}_{i}^{\left(m)}){{\boldsymbol{t}}}_{i})}^{{\delta 
}_{i}^{\left(m)}}{({{\boldsymbol{e}}}_{j}^{\top }\exp ({{\boldsymbol{T}}}_{i}{x}_{i}^{\left(m)}){\boldsymbol{e}})}^{1-{\delta }_{i}^{\left(m)}}},and we shall use this style in the following. The other needed conditional expectations are obtained in a similar way, reading E(Zk(i)∣X=x)=∑m=1n∑j=1pπj(m)∏l≠i(ej⊤exp(Tlxl(m))tl)δl(m)(ej⊤exp(Tlxl(m))e)1−δl(m)∑j=1pπj(m)∏i=1d(ej⊤exp(Tixi(m))ti)δi(m)(ej⊤exp(Tixi(m))e)1−δi(m)×ek⊤∫0xi(m)exp(Ti(xi(m)−t))tiej⊤exp(Tit)dtekδi(m)×ek⊤∫0xi(m)exp(Ti(xi(m)−t))eej⊤exp(Tit)dtek1−δi(m),\begin{array}{rcl}{\mathbb{E}}\left({Z}_{k}^{\left(i)}\hspace{0.33em}| \hspace{0.33em}{\boldsymbol{X}}={\boldsymbol{x}})& =& \mathop{\displaystyle \sum }\limits_{m=1}^{n}\frac{{\displaystyle \sum }_{j=1}^{p}{\pi }_{j}^{\left(m)}{\displaystyle \prod }_{l\ne i}{({{\boldsymbol{e}}}_{j}^{\top }\exp ({{\boldsymbol{T}}}_{l}{x}_{l}^{\left(m)}){{\boldsymbol{t}}}_{l})}^{{\delta }_{l}^{\left(m)}}{({{\boldsymbol{e}}}_{j}^{\top }\exp ({{\boldsymbol{T}}}_{l}{x}_{l}^{\left(m)}){\boldsymbol{e}})}^{1-{\delta }_{l}^{\left(m)}}}{{\displaystyle \sum }_{j=1}^{p}{\pi }_{j}^{\left(m)}{\displaystyle \prod }_{i=1}^{d}{({{\boldsymbol{e}}}_{j}^{\top }\exp ({{\boldsymbol{T}}}_{i}{x}_{i}^{\left(m)}){{\boldsymbol{t}}}_{i})}^{{\delta }_{i}^{\left(m)}}{({{\boldsymbol{e}}}_{j}^{\top }\exp ({{\boldsymbol{T}}}_{i}{x}_{i}^{\left(m)}){\boldsymbol{e}})}^{1-{\delta }_{i}^{\left(m)}}}\\ & & \times {\left[{{\boldsymbol{e}}}_{k}^{\top }\underset{0}{\overset{{x}_{i}^{\left(m)}}{\displaystyle \int }}\exp ({{\boldsymbol{T}}}_{i}\left({x}_{i}^{\left(m)}-t)){{\boldsymbol{t}}}_{i}{{\boldsymbol{e}}}_{j}^{\top }\exp \left({{\boldsymbol{T}}}_{i}t){\rm{d}}t{{\boldsymbol{e}}}_{k}\right]}^{{\delta }_{i}^{\left(m)}}\\ & & \times {\left[{{\boldsymbol{e}}}_{k}^{\top }\underset{0}{\overset{{x}_{i}^{\left(m)}}{\displaystyle \int }}\exp ({{\boldsymbol{T}}}_{i}\left({x}_{i}^{\left(m)}-t)){\boldsymbol{e}}{{\boldsymbol{e}}}_{j}^{\top }\exp \left({{\boldsymbol{T}}}_{i}t){\rm{d}}t{{\boldsymbol{e}}}_{k}\right]}^{1-{\delta 
}_{i}^{\left(m)}},\end{array}E(Nks(i)∣X=x)=tks(i)×∑m=1n∑j=1pπj(m)∏l≠i(ej⊤exp(Tlxl(m))tl)δl(m)(ej⊤exp(Tlxl(m))e)1−δl(m)∑j=1pπj(m)∏i=1d(ej⊤exp(Tixi(m))ti)δi(m)(ej⊤exp(Tixi(m))e)1−δi(m)×es⊤∫0xi(m)exp(Ti(xi(m)−t))tiej⊤exp(Tit)dtekδi(m)×es⊤∫0xi(m)exp(Ti(xi(m)−t))eej⊤exp(Tit)dtek1−δi(m),\begin{array}{rcl}{\mathbb{E}}\left({N}_{ks}^{\left(i)}\hspace{0.33em}| \hspace{0.33em}{\boldsymbol{X}}={\boldsymbol{x}})& =& {t}_{ks}^{\left(i)}\times \mathop{\displaystyle \sum }\limits_{m=1}^{n}\frac{{\displaystyle \sum }_{j=1}^{p}{\pi }_{j}^{\left(m)}{\displaystyle \prod }_{l\ne i}{({{\boldsymbol{e}}}_{j}^{\top }\exp ({{\boldsymbol{T}}}_{l}{x}_{l}^{\left(m)}){{\boldsymbol{t}}}_{l})}^{{\delta }_{l}^{\left(m)}}{({{\boldsymbol{e}}}_{j}^{\top }\exp ({{\boldsymbol{T}}}_{l}{x}_{l}^{\left(m)}){\boldsymbol{e}})}^{1-{\delta }_{l}^{\left(m)}}}{{\displaystyle \sum }_{j=1}^{p}{\pi }_{j}^{\left(m)}{\displaystyle \prod }_{i=1}^{d}{({{\boldsymbol{e}}}_{j}^{\top }\exp ({{\boldsymbol{T}}}_{i}{x}_{i}^{\left(m)}){{\boldsymbol{t}}}_{i})}^{{\delta }_{i}^{\left(m)}}{({{\boldsymbol{e}}}_{j}^{\top }\exp ({{\boldsymbol{T}}}_{i}{x}_{i}^{\left(m)}){\boldsymbol{e}})}^{1-{\delta }_{i}^{\left(m)}}}\\ & & \times {\left[{{\boldsymbol{e}}}_{s}^{\top }\underset{0}{\overset{{x}_{i}^{\left(m)}}{\displaystyle \int }}\exp ({{\boldsymbol{T}}}_{i}\left({x}_{i}^{\left(m)}-t)){{\boldsymbol{t}}}_{i}{{\boldsymbol{e}}}_{j}^{\top }\exp \left({{\boldsymbol{T}}}_{i}t){\rm{d}}t{{\boldsymbol{e}}}_{k}\right]}^{{\delta }_{i}^{\left(m)}}\\ & & \times {\left[{{\boldsymbol{e}}}_{s}^{\top }\underset{0}{\overset{{x}_{i}^{\left(m)}}{\displaystyle \int }}\exp ({{\boldsymbol{T}}}_{i}\left({x}_{i}^{\left(m)}-t)){\boldsymbol{e}}{{\boldsymbol{e}}}_{j}^{\top }\exp \left({{\boldsymbol{T}}}_{i}t){\rm{d}}t{{\boldsymbol{e}}}_{k}\right]}^{1-{\delta }_{i}^{\left(m)}},\end{array}and finally, 
$$\begin{array}{rcl}
\mathbb{E}(N_k^{(i)} \mid \boldsymbol{X}=\boldsymbol{x}) &=& t_k^{(i)} \times \displaystyle\sum_{m=1}^{n}\sum_{j=1}^{p}\pi_j^{(m)}\,\boldsymbol{e}_j^{\top}\exp(\boldsymbol{T}_i x_i^{(m)})\boldsymbol{e}_k\,\delta_i^{(m)} \\
&& \times \dfrac{\prod_{l\ne i}(\boldsymbol{e}_j^{\top}\exp(\boldsymbol{T}_l x_l^{(m)})\boldsymbol{t}_l)^{\delta_l^{(m)}}(\boldsymbol{e}_j^{\top}\exp(\boldsymbol{T}_l x_l^{(m)})\boldsymbol{e})^{1-\delta_l^{(m)}}}{\sum_{j=1}^{p}\pi_j^{(m)}\prod_{i=1}^{d}(\boldsymbol{e}_j^{\top}\exp(\boldsymbol{T}_i x_i^{(m)})\boldsymbol{t}_i)^{\delta_i^{(m)}}(\boldsymbol{e}_j^{\top}\exp(\boldsymbol{T}_i x_i^{(m)})\boldsymbol{e})^{1-\delta_i^{(m)}}}.
\end{array}$$

3.2 Initial distribution vectors

Adapting an idea developed in [9], we apply the regression component to the initial distribution, i.e., we estimate a “personalized” initial distribution vector as a function of covariates, which increases the flexibility of the model. To that end, we use multinomial logistic regressions, where we consider the initial probabilities as response variables that depend on covariate information found in ${\boldsymbol{A}^{(m)}}^{\mathsf{T}} \in \mathbb{R}^{g}$, with $g$ being the number of explanatory variables, $m=1,\ldots,n$, and regression coefficients in $\boldsymbol{\gamma} \in \mathbb{R}^{p\times g}$. 
Concretely, the initial distribution probabilities are then given as follows:
$$\pi_k^{(m)} = \frac{\exp(\boldsymbol{A}^{(m)}\gamma_k)}{\sum_{j=1}^{p}\exp(\boldsymbol{A}^{(m)}\gamma_j)},$$
with $\gamma_k \in \mathbb{R}^{g}$ for $k=1,\ldots,p$.

In every iteration of the expectation-maximization (EM) algorithm to be described later, we use the conditional expectation of the number of times that the underlying process starts in a specific state as weights for the regression coefficients in $\boldsymbol{\gamma}$. Let us consider the information carried by $\mathbb{E}(B_k \mid \boldsymbol{X}=\boldsymbol{x})$ separately. For a row of observations $m$, let $B_k^{(m)}$ be the number of times that the marginal jump process $\{J_t^{(m)}\}_{t\ge 0}$ starts in state $k$. The conditional expectation is given by
$$\mathbb{E}(B_k^{(m)} \mid \boldsymbol{X}=\boldsymbol{x}) = d \times \frac{\pi_k^{(m)}\prod_{i=1}^{d}(\boldsymbol{e}_k^{\top}\exp(\boldsymbol{T}_i x_i^{(m)})\boldsymbol{t}_i)^{\delta_i^{(m)}}(\boldsymbol{e}_k^{\top}\exp(\boldsymbol{T}_i x_i^{(m)})\boldsymbol{e})^{1-\delta_i^{(m)}}}{\sum_{j=1}^{p}\pi_j^{(m)}\prod_{i=1}^{d}(\boldsymbol{e}_j^{\top}\exp(\boldsymbol{T}_i x_i^{(m)})\boldsymbol{t}_i)^{\delta_i^{(m)}}(\boldsymbol{e}_j^{\top}\exp(\boldsymbol{T}_i x_i^{(m)})\boldsymbol{e})^{1-\delta_i^{(m)}}},$$
for $k=1,\ldots,p$. 
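The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the toy coefficient values, and the per-state likelihood factors `state_lik` (standing in for the products over $i$ that involve matrix exponentials) are all hypothetical. The zero first row of `gamma` is an assumed reference-category convention.

```python
import math

def initial_probs(a, gamma):
    """Softmax initial distribution: pi_k = exp(a . gamma_k) / sum_j exp(a . gamma_j)."""
    scores = [sum(ai * gi for ai, gi in zip(a, g_k)) for g_k in gamma]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def start_state_weights(pi, state_lik, d):
    """E(B_k | data) = d * pi_k * L_k / sum_j pi_j * L_j for one observation row.

    state_lik[k] is assumed to hold the precomputed likelihood of the row
    given that all d marginal processes start in state k.
    """
    joint = [p * l for p, l in zip(pi, state_lik)]
    total = sum(joint)
    return [d * j / total for j in joint]

# Hypothetical toy input: p = 3 states, covariates (1, age_man/100, age_woman/100).
a = [1.0, 0.63, 0.63]
gamma = [[0.0, 0.0, 0.0], [0.5, -1.0, 0.2], [-0.3, 0.8, 0.4]]
pi = initial_probs(a, gamma)
weights = start_state_weights(pi, state_lik=[0.02, 0.05, 0.01], d=2)
```

The weights for one row always sum to $d$, because each of the $d$ marginal processes starts exactly once and all of them start in the same state.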
We then solve the optimization problem
$$\hat{\boldsymbol{\gamma}} = \mathop{\mathrm{argmax}}_{\boldsymbol{\gamma}} \sum_{m=1}^{n}\sum_{k=1}^{p}\mathbb{E}(B_k^{(m)} \mid \boldsymbol{X}=\boldsymbol{x})\,\log\left(\pi_k^{(m)}(\boldsymbol{A}^{(m)};\boldsymbol{\gamma})\right)$$
and set
$$\hat{\pi}_k^{(m)} = \pi_k^{(m)}(\boldsymbol{A}^{(m)};\hat{\boldsymbol{\gamma}}) = \frac{\exp(\boldsymbol{A}^{(m)}\hat{\gamma}_k)}{\sum_{j=1}^{p}\exp(\boldsymbol{A}^{(m)}\hat{\gamma}_j)} \tag{3.2}$$
in every iteration. The initial distribution hence depends on covariate information. Recall that all marginal processes $\{J_t^{(i)}\}_{t\ge 0}$ are assumed to start in the same state (drawn from the initial distribution with probabilities (3.2)), but afterward transition independently to other states according to their specific sub-intensity matrices (and the latter do not depend on covariate information).

3.3 ERMI algorithm

Consider now a multivariate sample of right-censored absorption times $\boldsymbol{y} = (y_1^{(m)},\ldots,y_d^{(m)})$, which we assume to originate from $Y^{(m)} \sim \text{mIPH}(\boldsymbol{\pi}^{(m)},\mathcal{T},\mathcal{L})$. The associated inhomogeneity functions depend on parameters $\beta_i$, $i=1,\ldots,d$, and the right-censoring indicators are collected in $\boldsymbol{\Delta}$. The resulting EM algorithm with covariate information is depicted in Algorithm 1. As in the study by Albrecher et al. [4], we first take care of time-inhomogeneity. 
By using the relation
$$g_i^{-1}(y_i^{(m)}) = \int_0^{y_i^{(m)}}\lambda_i(u;\beta_i)\,\mathrm{d}u,\quad i=1,\ldots,d,$$
we obtain a time-homogeneous random sample $(x_1^{(m)},\ldots,x_d^{(m)})$, for which we know how to evaluate conditional expectations of sufficient statistics (E step). By using these expectations (as given above), we estimate the marginal sub-intensity matrices (M step), while the initial distribution vectors are predicted by multinomial logistic regressions (R step). Once we have estimated both, we need to find optimal inhomogeneity parameters $\beta_i$, $i=1,\ldots,d$, that maximize the joint likelihood of the time-inhomogeneous sample (I step). Concretely, we solve
$$\hat{\boldsymbol{\beta}} = \mathop{\mathrm{argmax}}_{\boldsymbol{\beta}} \sum_{m=1}^{n}\log\left(\sum_{j=1}^{p}\hat{\pi}_j^{(m)}\prod_{i=1}^{d}\left(\boldsymbol{e}_j^{\top}\exp(\hat{\boldsymbol{T}}_i x_i^{(m)})\hat{\boldsymbol{t}}_i\,\lambda_i(y_i^{(m)},\beta_i)\right)^{\delta_i^{(m)}}\left(\boldsymbol{e}_j^{\top}\exp(\hat{\boldsymbol{T}}_i x_i^{(m)})\boldsymbol{e}\right)^{1-\delta_i^{(m)}}\right),$$
where $x_i^{(m)} = g_i^{-1}(y_i^{(m)})$. We repeat the procedure until a stopping rule is satisfied, and finally obtain the estimated distribution mIPH$(\hat{\boldsymbol{\pi}},\hat{\mathcal{T}},\hat{\mathcal{L}})$. 
Here, $\hat{\boldsymbol{\pi}}$ is a matrix, where each row is a distribution vector $\hat{\boldsymbol{\pi}}^{(m)}$, which is shared by marginals with the same covariates.

In contrast to copula-based methods, this approach does not separate the estimation of marginals and multivariate parameters. This may be considered preferable as the implied multivariate distribution has a natural and causal interpretation and is intimately connected to the marginal behavior of the risks, whereas choosing a concrete copula family on given (and possibly already fitted) marginal risks is often a somewhat more arbitrary choice for the modeling of multivariate phenomena.

Algorithm 1. Adapted expectation maximization (ERMI) algorithm for mIPH distributions

Input: Observed absorption times $\boldsymbol{y} \in \mathbb{R}_{+}^{n\times d}$, right-censoring indicators $\boldsymbol{\Delta} \in \mathbb{R}^{n\times d}$ and arbitrary initial parameters for $(\boldsymbol{\pi},\mathcal{T},\mathcal{L})$.

(1) For each marginal, transform the data into $x_i^{(m)} = g_i^{-1}(y_i^{(m)};\beta_i)$, $i=1,2,\ldots,d$ and $m=1,2,\ldots,n$.

(2) E-step: Calculate
    $\mathbb{E}(B_k^{(m)} \mid \boldsymbol{Y}=\boldsymbol{y})$, $k=1,\ldots,p$, $m=1,\ldots,n$
    $\mathbb{E}(Z_k^{(i)} \mid \boldsymbol{Y}=\boldsymbol{y})$, $k=1,\ldots,p$, $i=1,\ldots,d$
    $\mathbb{E}(N_{ks}^{(i)} \mid \boldsymbol{Y}=\boldsymbol{y})$, $k,s=1,\ldots,p$, $i=1,\ldots,d$
    $\mathbb{E}(N_k^{(i)} \mid \boldsymbol{Y}=\boldsymbol{y})$, $k=1,\ldots,p$, $i=1,\ldots,d$

(3) R-step: Perform a multinomial regression with weights given by $\mathbb{E}(B_k^{(m)} \mid \boldsymbol{Y}=\boldsymbol{y})$ and predict $\hat{\pi}_k^{(m)}$ for $k=1,\ldots,p$ and $m=1,\ldots,n$.

(4) M-step: Let
    $\hat{t}_{ks}^{(i)} = \dfrac{\mathbb{E}(N_{ks}^{(i)} \mid \boldsymbol{Y}=\boldsymbol{y})}{\mathbb{E}(Z_k^{(i)} \mid \boldsymbol{Y}=\boldsymbol{y})}$, $k,s=1,\ldots,p$, $i=1,\ldots,d$
    $\hat{t}_k^{(i)} = \dfrac{\mathbb{E}(N_k^{(i)} \mid \boldsymbol{Y}=\boldsymbol{y})}{\mathbb{E}(Z_k^{(i)} \mid \boldsymbol{Y}=\boldsymbol{y})}$, $k=1,\ldots,p$, $i=1,\ldots,d$
    $\hat{t}_{kk}^{(i)} = -\sum_{s\ne k}\hat{t}_{ks}^{(i)} - \hat{t}_k^{(i)}$, $k=1,\ldots,p$, $i=1,\ldots,d$
    Let
    $$\hat{\boldsymbol{\pi}} = (\hat{\pi}_1^{(m)},\ldots,\hat{\pi}_p^{(m)}),\quad \hat{\boldsymbol{T}}_i = \{\hat{t}_{ks}^{(i)}\}_{k,s=1,2,\ldots,p},\quad\text{and}\quad \hat{\boldsymbol{t}}_i = \begin{pmatrix}\hat{t}_1^{(i)}\\ \vdots\\ \hat{t}_p^{(i)}\end{pmatrix}.$$

(5) I-step: Compute
    $$\begin{array}{rcl}\hat{\boldsymbol{\beta}} &=& \mathop{\mathrm{argmax}}_{\boldsymbol{\beta}}\displaystyle\sum_{m=1}^{n}\log\left(f_{Y}(\boldsymbol{y}^{(m)};\hat{\boldsymbol{\pi}},\hat{\mathcal{T}},\boldsymbol{\beta},\boldsymbol{\Delta})\right)\\ &=& \mathop{\mathrm{argmax}}_{\boldsymbol{\beta}}\displaystyle\sum_{m=1}^{n}\log\left(\sum_{j=1}^{p}\hat{\pi}_j^{(m)}\prod_{i=1}^{d}\left(\boldsymbol{e}_j^{\top}\exp(\hat{\boldsymbol{T}}_i x_i^{(m)})\hat{\boldsymbol{t}}_i\,\lambda_i(y_i^{(m)},\beta_i)\right)^{\delta_i^{(m)}}\left(\boldsymbol{e}_j^{\top}\exp(\hat{\boldsymbol{T}}_i x_i^{(m)})\boldsymbol{e}\right)^{1-\delta_i^{(m)}}\right)\end{array}$$

(6) Assign $\boldsymbol{\pi} = \hat{\boldsymbol{\pi}}$, $\boldsymbol{T}_i = \hat{\boldsymbol{T}}_i$ and $\beta_i = \hat{\beta}_i$, then repeat from Step 1 until a stopping rule is satisfied.

Output: Fitted representations $(\hat{\boldsymbol{\pi}},\hat{\mathcal{T}},\hat{\mathcal{L}})$, for $m=1,\ldots,n$.

4 Modeling joint excess lifetimes of couples

In this section, we present an application of Algorithm 1 to the well-known data set of joint lives used in [15]. In addition to survival times, this data set provides information on individuals’ ages at issue of an insurance policy. This leads to left-truncated data in this particular case, and one might indeed consider the different entry ages with a left-truncated likelihood in the estimation process, which is, however, quite inefficient. Instead, we propose here to use entry age as covariate information and use multinomial logistic regression to deal with the different ages at issue. Entry ages were also considered as relevant factors for the dependence modeling in [22]. 
Initial distribution vectors obtained via regression will then incorporate the fact that an old couple is expected to survive for a shorter time than a young one and that the bereavement effect may be different for different age dynamics in a couple. In PH terms, when using an acyclic distribution, the older the couple, the larger the starting probabilities should be for states closer to the absorbing state, so that fewer states will be visited until absorption. Passing through fewer states translates into less time spent in the state space before exiting, which results in shorter remaining lifetimes.

4.1 Description of the data set

The data set at hand provides information about 14,947 insurance products on joint lives, which were observed from December 29, 1988, until December 31, 1993. We consider January 1, 1994, as the right-censoring limit. For the purpose of this article, we only consider birthdays, sexes (and potential death dates) of policyholders, given that we are only interested in mortality, i.e., we do not make use of the monetary details of each contract.

After removing same-sex couples and multiple entries (due to several contracts of the same couple), we compute the remaining lifetime that any person lived from the start of the observation period until the right-censoring date of January 1, 1994, given that they are at least 40 years old at the issue of the policy. Note that the terms “remaining” and “excess” are used interchangeably in the following. Doing so leads to 8,834 different joint excess survival times, with 155 cases where both individuals died, 1,057 where only one individual died, and 7,622 where neither died. Consequently, less than 2% of joint remaining survival times are fully observed. Of the 1,057 cases where only one person in the couple died, the man died in 820 and the woman in the remaining 237. 
Hereafter, we refer to the start of the observation period as the “issue of the policy,” although the actual issue date may be later than December 29, 1988.

To prepare the data for the multinomial regression, we construct the covariate matrix $\boldsymbol{A} = \begin{pmatrix}\boldsymbol{1} & \mathrm{age}_{y_1} & \mathrm{age}_{y_2} & \mathrm{interact}\end{pmatrix}$, $\boldsymbol{A} \in \mathbb{R}^{n\times 4}$. The column vector $\mathrm{age}_{y_1}$ is the collection of all ages of men at issue of a policy, while $\mathrm{age}_{y_2}$ contains all ages of women. We also consider an interaction term, $\mathrm{interact}$, which collects the element-wise product of the two ages in each couple. Finally, to perform multinomial regressions via neural networks (R package nnet), we divide the data by 100, such that an absorption time of 0.01 given by estimated distributions actually corresponds to 1 year.

4.2 Fitting the mIPH distribution: marginal behavior

We assume that the remaining lifetimes in a couple, after issue of an insurance policy, follow mIPH distributions. We use the information previously discussed as covariates to link starting probabilities to ages of individuals in couples, to reflect different aging dynamics in our model.

Let $Y_i^{(m)} = g_i(X_i^{(m)}) = \log(\beta_i X_i^{(m)} + 1)/\beta_i$ be the IPH-distributed marginal remaining lifetimes, with $X_i^{(m)} = (\exp(\beta_i Y_i^{(m)}) - 1)/\beta_i \sim \text{PH}(\boldsymbol{\pi}^{(m)},\boldsymbol{T}_i)$ and $\beta_i > 0$, where $i=1$ corresponds to men and $i=2$ to women. 
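The transformation between the inhomogeneous time scale $Y$ and the homogeneous time scale $X$ can be sketched as follows. The function names are hypothetical. The intensity $\lambda(y;\beta)=\exp(\beta y)$ is not stated explicitly here but follows from differentiating $g^{-1}(y)=(\exp(\beta y)-1)/\beta$, in line with the relation $g^{-1}(y)=\int_0^y\lambda(u;\beta)\,\mathrm{d}u$ of Section 3.3.

```python
import math

def to_homogeneous(y, beta):
    """x = g^{-1}(y) = (exp(beta * y) - 1) / beta, valid for beta > 0."""
    return math.expm1(beta * y) / beta

def to_inhomogeneous(x, beta):
    """y = g(x) = log(beta * x + 1) / beta, the inverse of to_homogeneous."""
    return math.log1p(beta * x) / beta

def gompertz_intensity(y, beta):
    """lambda(y; beta) = exp(beta * y), the derivative of g^{-1} in y."""
    return math.exp(beta * y)

# Example on the rescaled time axis (data divided by 100): 0.22 stands for 22 years.
# The beta value is only an illustrative choice of the same order as the fitted ones.
beta = 43.101
x = to_homogeneous(0.22, beta)
y_back = to_inhomogeneous(x, beta)
```

Using `expm1` and `log1p` keeps the round trip accurate when `beta * y` is small, which matters because times are rescaled to units of 100 years.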
Then, the distribution with covariate information we work with is given by marginal Matrix-Gompertz distributions, and we consider general Coxian sub-intensity matrices $\boldsymbol{T}_i$ of dimension $p=10$ (cf. [2]). The order $p$ was found by fitting models with various choices until a satisfactory approximation of the marginals was reached. Finding the optimal dimension $p$ is still an open problem, and the literature on solving the identifiability issue of PH distributions is quite sparse (see, for instance, [3,14]).

According to the dependence structure defined in (2.1), after both underlying processes $\{J_t^{(1)}\}_{t\ge 0}$ and $\{J_t^{(2)}\}_{t\ge 0}$ start in the same state, they evolve independently until absorption. With a general Coxian sub-intensity matrix, each marginal jump process is only allowed to transition to the next state or directly to the absorbing state. This stochastic structure has a nice interpretation in terms of aging. Indeed, we can think of forward transitions as natural aging steps, given that each time the process jumps to a new state, the time of absorption gets closer. Moreover, premature exits can be interpreted as deaths due to causes not related to aging. Indeed, given that exit rates are positive in each state, the absorption of a process may be caused by a transition to State $p+1$ from a state smaller than $p$. Finally, letting the underlying processes start in different states allows for heterogeneity of health statuses among individuals of the same age (this philosophy was already underlying the construction in [18], but with the present inhomogeneity, far fewer states are needed to describe the data satisfactorily).

Algorithm 1 is now applied for 1,000 iterations. 
In principle, this estimation procedure provides $n = 8{,}834$ different initial distribution vectors, one for each couple depending on the ages at issue. Figure 1 depicts the age combinations in the data, with a majority of couples having a small age difference at policy issue. For illustration purposes, we depict below the estimated initial distributions for four different age combinations (age man, age woman): (63, 63), (68, 63), (63, 68) and (73, 63), which were chosen arbitrarily, but to represent different aging dynamics at issue of the policy. Let $\boldsymbol{Y}^{c} = (Y_1^{c},Y_2^{c}) \sim \text{mIPH}(\hat{\boldsymbol{\pi}}^{c},\hat{\mathcal{T}},\hat{\mathcal{L}})$ denote the bivariate excess lifetimes of these four couples, where $c=1,2,3,4$, $\hat{\mathcal{T}} = \{\hat{\boldsymbol{T}}_1,\hat{\boldsymbol{T}}_2\}$ and $\hat{\mathcal{L}} = \{\lambda_1(\cdot,\hat{\beta}_1),\lambda_2(\cdot,\hat{\beta}_2)\}$. 
The common estimated sex-specific sub-intensity matrices are as follows:
$$\hat{\boldsymbol{T}}_1 = \begin{pmatrix}
-0.049 & 1.7/10^{7} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & -3.662 & 2.877 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & -1.8/10^{7} & 1.8/10^{7} & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & -1.9/10^{4} & 1.9/10^{4} & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & -0.611 & 0.611 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & -0.002 & 0.002 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & -9.778 & 5.73 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.36 & 0.225 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -1.852 & 1.099\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.023
\end{pmatrix} \tag{4.1}$$
for men and
$$\hat{\boldsymbol{T}}_2 = \begin{pmatrix}
-0.196 & 0.196 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & -0.291 & 0.291 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & -0.763 & 0.763 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & -2.8/10^{8} & 2.8/10^{8} & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & -0.001 & 0.001 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & -0.003 & 0.003 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & -3.182 & 1.165 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.172 & 2/10^{7} & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -0.008 & 2.3/10^{10}\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & -3/10^{6}
\end{pmatrix} \tag{4.2}$$
for women. 
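To read the Coxian structure off these matrices, it can help to compute the exit rates, i.e., the rates of jumping directly to the absorbing state, using the standard PH identity $\boldsymbol{t} = -\boldsymbol{T}\boldsymbol{e}$. The sketch below does this for the men's matrix; the helper name is hypothetical, and entries written as $a/10^{b}$ are assumed to mean $a \times 10^{-b}$.

```python
def coxian_exit_rates(diag, forward):
    """Exit rate of state k in a Coxian matrix: t_k = -T_kk - T_{k,k+1}.

    diag holds the p diagonal entries, forward the p - 1 superdiagonal entries.
    """
    p = len(diag)
    rates = []
    for k in range(p):
        step = forward[k] if k < p - 1 else 0.0  # last state has no forward jump
        rates.append(-diag[k] - step)
    return rates

# Diagonal and superdiagonal of the fitted matrix for men, copied from (4.1).
diag_men = [-0.049, -3.662, -1.8e-7, -1.9e-4, -0.611,
            -0.002, -9.778, -0.36, -1.852, -0.023]
forward_men = [1.7e-7, 2.877, 1.8e-7, 1.9e-4, 0.611,
               0.002, 5.73, 0.225, 1.099]
exit_men = coxian_exit_rates(diag_men, forward_men)
```

At the reported precision, several states (for example States 3 to 6 for men) have an exit rate of zero, so absorption from them is only possible after moving forward.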
The optimal inhomogeneity parameters are $\hat{\beta}_1 = 43.101$ and $\hat{\beta}_2 = 47.474$, and Table 1 presents the estimated coefficients of the last multinomial regression performed in Algorithm 1. Except for States 8 and 9, all coefficients in $\hat{\boldsymbol{\gamma}}$ are significant. Note that having nonsignificant coefficients for States 8 and 9 does not necessarily indicate that the state space is too large. One has to separate the PH interpretation of the state space from the regression framework. Table 1 indicates that the covariates used in the regression are not significant in explaining the starting probabilities of States 8 and 9. Nevertheless, both states have their role in the description of the random variables we are interested in.

Figure 1. Ages at issue of policies, for all couples. $x$ and $y$ correspond to $\mathrm{age}_{y_1}$ and $\mathrm{age}_{y_2}$, respectively.

Table 1. Coefficients of multinomial regression with associated standard errors in brackets

\begin{tabular}{lrrrr}
State & Intercept & $\mathrm{age}_{y_1}$ & $\mathrm{age}_{y_2}$ & $\mathrm{age}_{y_1}\cdot\mathrm{age}_{y_2}$ \\
2 & $-20.963^{***}$ & $43.733^{***}$ & $43.021^{***}$ & $-84.049^{***}$ \\
 & (7.157) & (11.238) & (12.400) & (19.311) \\
3 & $24.826^{***}$ & $-24.630^{**}$ & $-39.256^{***}$ & $38.453^{**}$ \\
 & (6.677) & (10.476) & (11.639) & (18.073) \\
4 & $-51.036^{***}$ & $57.442^{***}$ & $90.062^{***}$ & $-104.233^{***}$ \\
 & (9.894) & (15.348) & (15.820) & (24.374) \\
5 & $-42.469^{***}$ & $56.804^{***}$ & $70.273^{***}$ & $-89.556^{***}$ \\
 & (6.873) & (10.638) & (11.443) & (17.559) \\
6 & $14.850^{*}$ & $-41.377^{***}$ & $-39.438^{***}$ & $89.687^{***}$ \\
 & (8.004) & (12.150) & (12.824) & (19.317) \\
7 & $54.157^{***}$ & $-97.618^{***}$ & $-98.445^{***}$ & $173.553^{***}$ \\
 & (6.617) & (10.263) & (11.010) & (16.817) \\
8 & $-14.363$ & $-5.608$ & $22.990$ & $8.794$ \\
 & (9.419) & (14.387) & (14.885) & (22.584) \\
9 & $-11.589$ & $12.732$ & $-4.892$ & $18.559$ \\
 & (7.224) & (11.046) & (11.889) & (18.050) \\
10 & $21.474^{***}$ & $-31.068^{***}$ & $-54.907^{***}$ & $84.080^{***}$ \\
 & (6.504) & (10.054) & (10.892) & (16.670) \\
\end{tabular}

For a two-sided statistical test, the symbols $^{***}$, $^{**}$ and $^{*}$ correspond to significance levels of 1, 5, and 10%, respectively.

Initial distribution vector estimates for these four couples are as follows:
$$\begin{array}{rcl}
\hat{\boldsymbol{\pi}}^{1} &=& \begin{pmatrix}0.0526 & 0.0734 & 0.0448 & 0.0886 & 0.4065 & 0.0330 & 0.0326 & 0.0569 & 0.1077 & 0.1039\end{pmatrix},\\
\hat{\boldsymbol{\pi}}^{2} &=& \begin{pmatrix}0.0356 & 0.0313 & 0.0297 & 0.0398 & 0.2805 & 0.0476 & 0.0396 & 0.0384 & 0.2472 & 0.2102\end{pmatrix},\\
\hat{\boldsymbol{\pi}}^{3} &=& \begin{pmatrix}0.0285 & 0.0242 & 0.0114 & 0.1625 & 0.4399 & 0.0419 & 0.0304 & 0.1282 & 0.0819 & 0.0510\end{pmatrix},\\
\hat{\boldsymbol{\pi}}^{4} &=& \begin{pmatrix}0.0172 & 0.0095 & 0.0140 & 0.0127 & 0.1378 & 0.0489 & 0.0343 & 0.0184 & 0.4041 & 0.3030\end{pmatrix}.
\end{array}$$
At first glance, it might seem odd that survival times of spouses with a large age difference start in the same state of the distribution, but personalized starting probabilities and sex-specific transition intensities account for this. For example, consider Couple 4. 
If both excess survival times start in State 1, we see that the man’s underlying jump process is much more likely to reach the absorbing state directly from State 1, while the woman’s will at least advance to State 7 before absorption becomes possible. Thus, marginal intensities mixed with age-dependent initial distribution vectors compensate for the initialization in a shared state. In Figure 2, we depict the resulting marginal densities of the fitted remaining lifetime distributions for each of the four couples. As expected, the densities for women (solid black lines) allocate more mass to larger values than their male counterparts, and for older individuals, there is more probability mass for shorter survival times. Comparing Couples 1 and 4, we see that the density of the 63-year-old man has a major mode at $y_1 = 22$, while for the 73-year-old man, the major mode is at $y_1 = 17$. Note that although the women in Couples 1 and 4 are the same age (63), their densities have substantially different modes. This is due to the fact that the estimation of the marginal distributions is not separated from the estimation of the joint distributions, and the age of their spouse at the time of policy issue evidently plays an important role for the distribution of the remaining lifetime, seen at the time of policy issue.

Figure 2. Marginal densities of remaining lifetimes (in years). (a) Couple 1. (b) Couple 2. (c) Couple 3. (d) Couple 4.

The multimodality we observe in all marginal densities may be due to having different cohorts in the data. We decided not to manipulate the data set further, to avoid restricting our analysis to specific couples. Although spouses with a small age difference can be thought of as belonging to the same cohort, we would also need to restrict the ages at issue to instances where enough data points are available for a meaningful estimation procedure (in particular uncensored data points). 
Doing so would lead to analyzing only couples where both spouses are aged around 65 years.

To assess the precision of our estimated marginals $Y_i^{c} \sim \text{IPH}(\hat{\boldsymbol{\pi}}^{c},\hat{\boldsymbol{T}}_i,\hat{\beta}_i)$, $i=1,2$ and $c=1,2,3,4$, we use the conditional Kaplan–Meier (K–M in the following) estimator, also known as the Beran estimator. For a sample $X_1,X_2,\ldots,X_n$ and covariate matrix $\boldsymbol{A}$, the conditional K–M estimator we use is
$$\hat{\mathbb{P}}(X\le t \mid \boldsymbol{A}=\boldsymbol{a}) = \hat{F}_X(t \mid \boldsymbol{a}) = 1 - \prod_{i:\,X_{i:n}\le t}\left(1 - \frac{\delta_{i:n}\,K\left(\frac{\boldsymbol{a}-\boldsymbol{A}_{i:n}}{b_n}\right)}{\sum_{j=i}^{n}K\left(\frac{\boldsymbol{a}-\boldsymbol{A}_{j:n}}{b_n}\right)}\right), \tag{4.3}$$
where $X_{i:n}$ are the order statistics of the sample, $\delta_{i:n}$ are the respective right-censoring indicators, $K(\cdot)$ is a kernel function, and $b_n$ is a bandwidth sequence. In our instance, the kernel function is a multivariate Gaussian density and $b_n = 0.001$. For more details on the conditional K–M estimator, we refer the reader to the study by Dabrowska [12]. Figures 3 and 4 compare the survival probabilities obtained by the conditional K–M estimators with those of the fitted distributions. One sees that in all cases the fit is in fact quite satisfactory.

Figure 3. Conditional K–M estimators vs the fitted distribution. (a) Couple 1. (b) Couple 2.

Figure 4. Conditional K–M estimators vs the fitted distribution. (a) Couple 3. (b) Couple 4.

4.3 Joint behavior

Let us now consider the resulting bivariate densities. In Figure 5, we find the bivariate densities for the four specified couples. 
The joint density of Couple 1 shows that large differences in their survival times are quite unlikely (the majority of the joint mass being located close to the identity line). Also, the most likely survival times are close to 23 years. There is also a considerable probability mass above the identity line, where the woman survives longer than the man. For instance, we have $\bar{F}_{\boldsymbol{Y}^{1}}(12,30) = 32\%$, while $\bar{F}_{\boldsymbol{Y}^{1}}(30,12) = 11.79\%$.

Figure 5. Contour plots of the bivariate lifetime densities for the four couples. (a) Couple 1, (b) Couple 2, (c) Couple 3, and (d) Couple 4.

For Couple 2, the situation is different (top right panel of Figure 5): with the man already being 68 years old and the woman 63 years old, the remaining lifetimes are shorter, with the major mode of the joint distribution being located near $(y_1,y_2) = (23,24)$, the second largest close to $(y_1,y_2) = (17,35)$, and the next ones in the neighborhood of $(y_1,y_2) = (17,19)$ and $(y_1,y_2) = (9,19)$, respectively. There is now a much higher probability for the husband to die sooner.

The joint density for Couple 3 resembles that of Couple 1, with survival times beyond 40 years being more unlikely. Despite the man being the same age as the man in Couple 1, there is a larger probability for the couple to have survival times close to $(y_1,y_2) = (29,45)$ than for Couple 1.

Finally, the joint density of Couple 4 is close to that of Couple 2. 
The spike around $(y_1,y_2) = (17,35)$ is more pronounced than the analog of Couple 2, while the spike close to $(y_1,y_2) = (23,24)$ is less important than its counterpart in Couple 2.

One may be tempted to conclude from these four distributions that for the same age, having a younger partner leads to longer survival times, which would signal that the bereavement effect is weaker when spouses have larger differences in age at issue of a policy.

4.4 Dependence measures

Let us now explore some dependence measures for these four exemplary couples. In addition to Kendall’s tau and Spearman’s rho, we also consider three measures of time-dependent association, which were analyzed in the study by Luciano et al. [19].

The mIPH distributions we consider all have the same copula as the corresponding mPH distributions with equal representation, since matrix-Gompertz distributions are monotone increasing transformations of PH distributions. At the same time, for the mPH class, we have explicit expressions for Kendall’s tau and Spearman’s rho. 
The pairwise Kendall’s tau of marginals $X_k$ and $X_l$ is given as follows:

$$\tau_{X_k,X_l}=4\sum_{i=1}^{p}\sum_{j=1}^{p}\pi_i\pi_j\,(\boldsymbol{e}_i^{\mathsf{T}}\otimes\boldsymbol{e}_j^{\mathsf{T}})[-\boldsymbol{T}_k\oplus\boldsymbol{T}_k]^{-1}(\boldsymbol{e}\otimes\boldsymbol{t}_k)\,(\boldsymbol{e}_i^{\mathsf{T}}\otimes\boldsymbol{e}_j^{\mathsf{T}})[-\boldsymbol{T}_l\oplus\boldsymbol{T}_l]^{-1}(\boldsymbol{e}\otimes\boldsymbol{t}_l)-1,$$

while Spearman’s rank correlation is given as follows:

$$\rho^{S}_{X_k,X_l}=12\sum_{j=1}^{p}\pi_j\bigl(1-(\boldsymbol{\pi}\otimes\boldsymbol{e}_j^{\mathsf{T}})[-\boldsymbol{T}_k\oplus\boldsymbol{T}_k]^{-1}(\boldsymbol{e}\otimes\boldsymbol{t}_k)\bigr)\bigl(1-(\boldsymbol{\pi}\otimes\boldsymbol{e}_j^{\mathsf{T}})[-\boldsymbol{T}_l\oplus\boldsymbol{T}_l]^{-1}(\boldsymbol{e}\otimes\boldsymbol{t}_l)\bigr)-3;$$

compare with [7].
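The two closed-form expressions above lend themselves to a short numerical sketch. The following Python code is only an illustration under stated assumptions, not the authors' implementation: it reads the bracket $[-\boldsymbol{T}\oplus\boldsymbol{T}]$ as $-(\boldsymbol{T}\oplus\boldsymbol{T})$, takes the exit-rate vector as $\boldsymbol{t}=-\boldsymbol{T}\boldsymbol{e}$, and uses invented toy parameters rather than the fitted values of this article. The function names `kronecker_sum`, `exit_order_matrix`, `kendall_tau`, and `spearman_rho` are hypothetical.

```python
import numpy as np


def kronecker_sum(A, B):
    """Kronecker sum A (+) B = A (x) I + I (x) B."""
    return np.kron(A, np.eye(B.shape[0])) + np.kron(np.eye(A.shape[0]), B)


def exit_order_matrix(T):
    """Matrix Q with Q[i, j] = (e_i^T (x) e_j^T) [-(T (+) T)]^{-1} (e (x) t).

    Assumption: t = -T e is the exit-rate vector of the sub-intensity matrix T.
    """
    p = T.shape[0]
    e = np.ones(p)
    t = -T @ e
    # Solve the linear system instead of forming the inverse explicitly.
    v = np.linalg.solve(-kronecker_sum(T, T), np.kron(e, t))
    return v.reshape(p, p)


def kendall_tau(pi, T_k, T_l):
    """Kendall's tau: 4 * sum_{i,j} pi_i pi_j Q_k[i,j] Q_l[i,j] - 1."""
    Q_k, Q_l = exit_order_matrix(T_k), exit_order_matrix(T_l)
    return 4.0 * pi @ (Q_k * Q_l) @ pi - 1.0


def spearman_rho(pi, T_k, T_l):
    """Spearman's rho: 12 * sum_j pi_j (1 - (pi Q_k)_j)(1 - (pi Q_l)_j) - 3."""
    a_k = 1.0 - pi @ exit_order_matrix(T_k)
    a_l = 1.0 - pi @ exit_order_matrix(T_l)
    return 12.0 * np.sum(pi * a_k * a_l) - 3.0


# Toy example (hypothetical values): both marginals are "slow" when started
# in state 0 and "fast" when started in state 1, so concordance is positive.
pi = np.array([0.5, 0.5])
T_k = np.array([[-1.0, 0.0], [0.0, -10.0]])
T_l = np.array([[-2.0, 0.0], [0.0, -20.0]])

tau = kendall_tau(pi, T_k, T_l)
rho = spearman_rho(pi, T_k, T_l)
print(f"tau = {tau:.4f}, rho = {rho:.4f}")
```

As a sanity check on this reading of the formulas, a single-phase representation ($p=1$) yields two independent exponential marginals, for which both measures evaluate to zero.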
By calculating these quantities for the four specified couples, we obtain

$$\begin{aligned}\tau_{Y_1^1,Y_2^1}&=0.3104, &\tau_{Y_1^2,Y_2^2}&=0.2562, &\tau_{Y_1^3,Y_2^3}&=0.4367, &\tau_{Y_1^4,Y_2^4}&=0.2139,\\ \rho^{S}_{Y_1^1,Y_2^1}&=0.4526, &\rho^{S}_{Y_1^2,Y_2^2}&=0.3938, &\rho^{S}_{Y_1^3,Y_2^3}&=0.6144, &\rho^{S}_{Y_1^4,Y_2^4}&=0.3381.\end{aligned}$$

All couples manifest positive concordance, which, on top of Figure 5, is additional evidence that the lifetimes of individuals in a couple are correlated. We find the strongest concordance in couples where women are older than their husbands. Concretely, Couple 3 has the highest Kendall’s tau and Spearman’s rho values, followed by Couple 1. Couples 2 and 4 have lower and similar corresponding values.

As a first measure of time-dependent association, we consider in Figure 6

$$\Psi_1(y_1,y_2)=S(y_1,y_2)/(S_{Y_1}(y_1)\,S_{Y_2}(y_2)).$$

Most of the resulting values are greater than 1, indicating positive dependence. However, for values of $y_2$ close to 30, Couples 1–3 exhibit $\Psi_1(y_1,y_2)<1$. This suggests negative dependence for remaining lifetimes when women survive at least 30 years and men at least 20 years. One can see that, roughly, after values $y_1=29$ and $y_2=39$, the ratios $\Psi_1(y_1,y_2)$ remain constant.
This happens since for very large survival times $y_1,y_2$, marginal survival probabilities change very little, and this change is absorbed by $S(y_1,y_2)$.

Figure 6. $\Psi_1(y_1,y_2)$ for the four couples. (a) Couple 1, (b) Couple 2, (c) Couple 3, and (d) Couple 4.

Next, we depict in Figure 7 the measures

$$\begin{aligned}\Psi_2^1(0,y_2)&=\mathbb{E}(Y_1\mid Y_2\ge y_2)/\mathbb{E}(Y_1),\\ \Psi_2^2(y_1,0)&=\mathbb{E}(Y_2\mid Y_1\ge y_1)/\mathbb{E}(Y_2)\end{aligned}$$

for all four couples, which give the relative change of conditional expectations of spouses’ excess lifetimes, given that the partner survives at least $y_2$ (respectively $y_1$) years. We see that the latter increase throughout with $y_2$ (respectively $y_1$). In general, the ratios $\Psi_2^i(\cdot,\cdot)$, $i=1,2$, are close in value when $y_1,y_2\le 20$, meaning that the survival of a spouse has a similar effect on the remaining lifetime of the partner for the first 20 years. After that, the relative lifetime improvement increases much faster for women, i.e., their expected lifetime improvement is then more sensitive to the survival of the partner than vice versa. As in Figure 6, both ratios remain constant after spouses’ survival times of $y_1=29$ for women and $y_2=39$ for men.

Figure 7. $\Psi_2^1(0,y)$ and $\Psi_2^2(x,0)$ for the four couples.
(a) Couple 1, (b) Couple 2, (c) Couple 3, and (d) Couple 4.

The last measure of time-dependent association we consider here is the cross-ratio

$$\mathrm{CR}(y_1,y_2)=S(y_1,y_2)\,\frac{\frac{\mathrm{d}^2}{\mathrm{d}y_1\,\mathrm{d}y_2}S(y_1,y_2)}{\frac{\mathrm{d}}{\mathrm{d}y_1}S_{Y_1}(y_1)\,\frac{\mathrm{d}}{\mathrm{d}y_2}S_{Y_2}(y_2)},$$

originally introduced by Clayton [11], which gives the relative increase of the force of mortality of an individual immediately after death of the partner. The quantity relevant in our model is $\mathrm{CR}(u,u)$, and Figure 8 depicts the resulting values for our model. In the study by Luciano et al. [19], the resulting curves were monotone increasing in $u$, as a result of the imposed copula assumption on the joint lifetimes. In contrast, in the present setup, these curves are not monotone increasing in $u$. One may interpret this as follows: in the present approach, the a priori dependence assumptions are less specified, and the data have a stronger impact on the resulting shape of $\mathrm{CR}(u,u)$ than in a specified copula model, where the data merely influence the value of the dependence parameter. One can see from Figure 8 that the cross-ratios exceed 1 for $u\le 29$, so the survivor’s force of mortality is increased immediately after the death of the spouse, showing a bereavement effect (or broken-heart syndrome), but not a monotone one (with the magnitude varying across age combinations of the couple). This bereavement effect seems to largely disappear for survival times beyond 29 years, and for this reason we do not plot values $u>29$; in that range, survival probabilities are very low anyway, and there are too few data points to draw strong conclusions.

Figure 8. $\mathrm{CR}(u,u)$ for the four couples.
(a) Couple 1, (b) Couple 2, (c) Couple 3, and (d) Couple 4.

Ultimately, as in many other situations, it may depend on the number of available data points whether one prefers a flexible dependence structure in the fitting or a prespecified copula family with possibly attractive stylized features, especially for extrapolated conclusions in regions with few data points. In this discussion, one may still appreciate the immediate causal interpretation of the mIPH model in terms of a common aging mechanism.

4.5 Life expectancies

Let us finally also use the model fit to obtain some insight into expected remaining lifetimes in the couple. By using Property (2.4), we are able to derive expected survival times for an individual, conditional on the survival time of their partner. Moreover, by using the optimal regression coefficients found by Algorithm 1 (Table 1), we can study how marginal expectations vary with ages in a couple. By letting the man’s and woman’s age vary from 60 to 100, we obtain for each age combination a distinct mIPH distribution for the random vector $\boldsymbol{Y}$. Given these distributions, we summarize in Figure 9 how the expected remaining lifetime at issue changes as a function of the male’s and female’s age at that point in time. For both man and woman, the marginal expected survival times decrease when both individuals in the couple grow older. For any specific age, both men and women have larger marginal expectations as their spouses become younger. That marginal expectation varies more for women, as one can see from the steepness and range of the women’s curve in Figure 9. Another way to interpret this is that men’s expected survival times are affected less by the age of their partner, compared to women.

Figure 9. Marginal expected excess survival times as a function of men’s age $y_1$ and women’s age $y_2$ at issue.
(a) Man and (b) woman.

Besides age, conditioning also on the survival time of the spouse affects the initial distribution vector, leading to different expected values. Figures 10 and 11 present marginal expected survival times, conditional on a spouse survival of at least 10 and 15 years counted from issue of the policy, respectively. The respective change in the men’s remaining life expectation is notable by mere visual comparison, whereas for women, it is less pronounced. From Figure 10, we see that the expectation increases for almost all ages, compared with Figure 9. For men, one notices that the shape of the curve is changed for ages $65\le y_1\le 75$. Moreover, for older ages $y_1$ and $y_2$, the curve is now slightly steeper than before. In the women’s case, marginal expectations for a woman aged 60 or 100 with a husband aged 60 are now equivalent, and no other major change can be easily observed. The marginal expectation for men whose spouses survive at least 15 years is very different from its unconditional counterpart. Inspecting Figure 11, one can see a sizable change of the curve. Men with spouses of age $y_2\in(80,100)$ are now expected to survive much longer than before, while once again for women, we only have a rather minor twist of the curve. In all figures, we see that the women’s curves do not show major change. Still, they suggest that women are more sensitive to both age and survival of their partner.

Figure 10. Marginal expected excess survival times, conditional on spouse survival during 10 years. (a) Man and (b) woman.

Figure 11. Marginal expected excess survival times, conditional on spouse survival during 15 years. (a) Man and (b) woman.

5 Conclusion

In this article, we introduce the mIPH class, study some of its properties, and develop an estimation procedure that allows for right-censored observations and covariate information.
In particular, we use this framework to propose a bivariate matrix-Gompertz distribution for the modeling of excess joint lifetimes of couples. By adapting a corresponding expectation-maximization (EM) algorithm, we estimate sub-intensity matrices, inhomogeneity functions, and different initial distribution vectors without separating joint features from marginals. Initial probabilities are assumed to be linked to spouses’ ages at the issue of an insurance policy. Employing multinomial logistic regressions to predict the latter, tailor-made bivariate distributions are produced that reflect distinct aging dynamics and dependence structures. The resulting mIPH distributions showcase strong positive concordance for remaining lifetimes of spouses, particularly when the difference in age at the issue of the policy is small.

The results and illustrations given in this article demonstrate the accuracy and the flexibility of the mIPH class, which may also be employed in areas beyond the present lifetime setup, including applications in nonlife insurance. The mIPH class may be considered a valid alternative to copula-based methods, particularly when one wants to estimate marginal and multivariate properties at the same time, and has sufficiently many data points available to keep the pre-imposed dependence assumptions (and structure) minimal. In addition, modeling with members of the mIPH class allows for an immediate causal interpretation of the resulting model in terms of common aging through stages.

Dependence Modeling – de Gruyter

**Published:** Jan 1, 2023

**Keywords:** mortality modeling; multivariate PH distributions; censoring; EM algorithm; 62N02
