Principal surrogates in context of high vaccine efficacy

Introduction

An important factor influencing the duration and complexity of clinical trials is the choice of the endpoint used to assess vaccine efficacy. It would be extremely convenient to replace a late, costly or rare true endpoint by an immunological surrogate that can be measured earlier, more cheaply, or more frequently. However, from a regulatory perspective, a surrogate endpoint (sometimes called a Surrogate of Protection or a Correlate of Protection) is not considered acceptable for the determination of efficacy unless it has been validated, i.e. shown to predict clinical benefit. Prentice (1989) introduced a formal definition of surrogacy based on the concept of mediation in a single-trial setting. Although appealing, Prentice's definition and criteria have been criticized on several grounds: (i) the assumption that the surrogate explains 100% of the vaccine efficacy (VE) is too restrictive; (ii) the approach can be susceptible to post-randomization selection bias; (iii) the approach requires that the immune response not be constant in the control group; etc. (Burzykowski, Molenberghs, and Buyse 2005; Frangakis and Rubin 2002). In subsequent decades, many statistical methods have been proposed for the evaluation of surrogate endpoints, most of them framed within the causal inference (Follmann 2006; Frangakis and Rubin 2002; Gilbert, Qin, and Self 2008) and meta-analytic paradigms (Buyse et al. 2000; Daniels and Hughes 1997; Gail et al. 2000).

Although not common, vaccines with very high efficacy are documented in the literature (Black et al. 2000; Lin et al. 2001; Mitra et al. 2016; Phua et al. 2012; Prymula et al. 2014; Wei et al. 2016). These include the Salmonella Typhi Vi conjugate vaccine (Mitra et al. 2016) and the combined measles-mumps-rubella-varicella vaccine (Prymula et al. 2014). Assessing Correlates of Protection (CoPs) in the context of high VE using classical statistical methods is problematic.
Indeed, the very small number of cases/infections in the vaccinated groups can cause considerable problems for such statistical models. There is therefore a need to evaluate the statistical methods for CoP assessment in the context of highly efficacious vaccines. Callegaro and Tibaldi (2019) showed that the validation of a surrogate endpoint using the Prentice criteria and meta-analytic frameworks (by randomized subgroups in a single-trial setting) can be problematic in the case of high VE because few events are available in the vaccine group. The aim of this paper is to evaluate the performance of the causal framework, specifically the Principal Surrogate approach (Follmann 2006; Gilbert, Qin, and Self 2008), in the case of high VE.

Methods

The Prentice criteria

The following notation is used throughout the manuscript: $Y_i$ and $S_i$ are random variables denoting the observed (binary) clinical endpoint and the surrogate endpoint for subject $i=1,\dots,n$, and $Z_i$ is a binary treatment indicator ($Z=1$ for the treatment group and $Z=0$ for the control group).

Key concepts, including the hypothesis-testing approach to the validation of surrogate endpoints using randomised clinical trial data, were introduced by Prentice (1989). Prentice's four criteria for the validation of a surrogate endpoint can be evaluated using the following four models, referred to below as Prentice models 1 to 4 in the order given:

\begin{align*}
\text{logit}\left(P\left(Y_i=1\right)\right) &= \mu_T+\beta Z_i,\\
S_i &= \mu_S+\alpha Z_i+\varepsilon_{S_i},\\
\text{logit}\left(P\left(Y_i=1\right)\right) &= \mu+\gamma S_i,\\
\text{logit}\left(P\left(Y_i=1\right)\right) &= \tilde{\mu}_T+\beta_S Z_i+\gamma_Z S_i.
\end{align*}

In this paper we will mainly focus on criterion 4.
This criterion is met if the null hypothesis $H_{01}$: $\gamma_Z=0$ is rejected (p-value(s) $<\alpha$) and the null hypothesis $H_{02}$: $\beta_S=0$ is not rejected (p-value(z) $\ge\alpha$). Note that when this criterion is met ($\beta_S=0$ and $\gamma_Z\ne 0$), model 4 reduces to model 3. The significance level $\alpha$ is not adjusted for multiplicity because the null hypothesis is the intersection of two null hypotheses.

Principal surrogate framework

Many causal inference approaches have been published in the literature. In what follows, we describe the Vaccine Efficacy Framework of Follmann and Gilbert (Follmann 2006; Gilbert, Qin, and Self 2008). Since $S_i$ can be affected by treatment, there are two naturally occurring counterfactual values of $S_i$: $S_i(1)$ under treatment and $S_i(0)$ under control. The observed (binary) clinical endpoint is denoted by $Y_i$, and its counterfactual values are $Y_i(1)$ under treatment and $Y_i(0)$ under control. Criteria for $S$ to be a good surrogate are based on risk estimands that condition on the potential surrogate responses (Gilbert, Qin, and Self 2008):

\begin{align*}
\text{risk}_{(1)}\left(s(1),s(0)\right) &= \Pr\left(Y(1)=1 \mid S(1)=s(1), S(0)=s(0)\right)\\
\text{risk}_{(0)}\left(s(1),s(0)\right) &= \Pr\left(Y(0)=1 \mid S(1)=s(1), S(0)=s(0)\right)
\end{align*}

A contrast between $\text{risk}_{(1)}(s(1),s(0))$ and $\text{risk}_{(0)}(s(1),s(0))$ is a causal effect on the clinical endpoint.
A classical contrast used in vaccines is the Vaccine Efficacy (VE):

$$\text{VE}\left(s(1),s(0)\right)=1-\frac{\Pr\left(Y(1)=1 \mid S(1)=s(1), S(0)=s(0)\right)}{\Pr\left(Y(0)=1 \mid S(1)=s(1), S(0)=s(0)\right)}$$

A Principal Surrogate (PS) is a biomarker satisfying two conditions: average causal necessity,

$$\text{VE}\left(s(1),s(0)\right)=0\quad \text{for all}\quad s(1)=s(0),$$

and Wide Effect Modification (WEM), which means that

$$\text{VE}\left(s(1),s(0)\right)\quad \text{is increasing in}\quad s(1)-s(0).$$

WEM is similar in spirit to the Individual Causal Association (ICA) (Alonso et al. 2015), which is the correlation between the individual causal effect on the endpoint and the individual causal effect on the surrogate.

In this paper we focus only on WEM. Recent work (Gabriel and Follmann 2016; Gabriel and Gilbert 2014; Gilbert, Qin, and Self 2008; Huang and Gilbert 2011; Wolfson and Gilbert 2010) suggests that the WEM criterion is of primary importance for a biomarker to be a PS. Furthermore, Alonso et al. (2015) showed that the average causal necessity definition may be extremely restrictive.

Estimating VE

Assumptions A1–A3 (A1: stable unit treatment value assumption; A2: ignorable treatment assignment; A3: equal individual clinical risk up to the time of surrogate measurement) imply that $\text{risk}_{(Z)}(s(1),s(0))$ would be identified if we knew the potential outcomes $S_i(Z)$ of subjects assigned the opposite treatment $1-Z$ (Wolfson and Gilbert 2010):

\begin{align*}
\text{risk}_{(1)}\left(s(1),s(0)\right) &= \Pr\left(Y=1 \mid Z=1, S(1)=s(1), S(0)=s(0)\right)\\
\text{risk}_{(0)}\left(s(1),s(0)\right) &= \Pr\left(Y=1 \mid Z=0, S(1)=s(1), S(0)=s(0)\right).
\end{align*}

It follows that it is necessary to impute (or integrate out) the missing potential biomarkers.
The risk can be modeled using the following logistic model:

$$\text{logit}\left(P\left(Y_i=1 \mid z_i, s_{1i}, s_{0i}\right)\right)=\beta_0+\beta_z z_i+\beta_{s(1)} s_{1i}+\beta_{s(1)z} s_{1i} z_i+\beta_{s(0)} s_{0i}+\beta_{s(0)z} s_{0i} z_i.$$

The model can be simplified in the case of a Constant Biomarker ($S_i(0)=c$):

$$\text{logit}\left(P\left(Y_i=1 \mid z_i, s_{1i}\right)\right)=\beta_0+\beta_z z_i+\beta_{s(1)} s_{1i}+\beta_{s(1)z} s_{1i} z_i, \qquad (1)$$

where the VE curve

$$\text{VE}\left(s(1)\right)=1-\frac{\Pr\left(Y=1 \mid Z=1, S(1)=s(1)\right)}{\Pr\left(Y=1 \mid Z=0, S(1)=s(1)\right)}$$

is used. The constant biomarker assumption is reasonable when subjects have been selected to have no meaningful exposure to the pathogen, so that $S(0)=0$. Examples include HIV (Follmann 2006) and varicella vaccine trials (Chan et al. 2002). This assumption is also reasonable for populations exposed to the pathogen when the biomarker $S_i$ is the log10 Fold-Increase from baseline ($\text{FI}_i$), which is the difference between the log10 post ($A_i$) and the log10 baseline ($B_i$) values ($\text{FI}_i=A_i-B_i$).

Missing values imputation/integration

The key challenge in estimating these risk estimands is that they condition on counterfactual values that are not observable. This involves integrating out (or imputing) the missing values based on some model, under some set of assumptions and/or trial augmentations. Gilbert, Qin, and Self (2008) and Follmann (2006) proposed using estimated maximum likelihood followed by bootstrap.
Huang, Gilbert, and Wolfson (2013) suggested a pseudoscore estimation procedure that has a closed-form variance estimator. Miao et al. (2013) used a multiple imputation approach. In this paper we fit model (1) using the method implemented in the R package pseval (Sachs and Gabriel 2016) with a Baseline Immunogenicity Predictor (BIP): the parameters are estimated by estimated maximum likelihood (the missing information is integrated out) and the variance is estimated by bootstrap. R code is provided in the Appendix. This approach is similar in spirit to the method used in Follmann (2006).

Results

Simulations of Callegaro and Tibaldi (2019)

To evaluate the impact of high vaccine efficacy on PS validation, we repeated the simulations of Callegaro and Tibaldi (2019). The Dunning regression model (Dunning 2006) was used to simulate the data in an ideal CoP setting, where the treatment effect is fully explained by the post values ($A_i$):

$$P\left(Y_i=1 \mid \pi, A_i\right)=\pi\,\frac{e^{\mu+\gamma A_i}}{1+e^{\mu+\gamma A_i}}. \qquad (2)$$

Here, $\pi$ can be interpreted as the probability of being exposed to the disease. This model corresponds to the classical logistic model when all subjects are exposed ($\pi=1$).

Simulations were run using the following parameter values: total sample size $n=5{,}000$, 1:1 randomization, $\pi=0.1$, $\mu=8.3$, $\gamma=\log(1-0.95)$; the post-vaccination immune response is normally distributed, $A \mid Z=0 \sim N(3, 0.2)$ in the placebo group and $A \mid Z=1 \sim N(3+\Delta, 0.2)$ in the vaccine group, where $\Delta=0.33, 0.75, 1, 1.5$. The immune response at baseline is generated as $B \sim N(3, 0.2)$, with a correlation between $A$ and $B$ of 0.90 in the placebo group and 0.50 in the vaccine group (0.2 is the variance of the normal distribution).
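The data-generating mechanism described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' R code: the names `simulate_trial` and `crude_ve` are hypothetical, the correlated baseline is built from a shared standard normal draw (one assumed way to obtain the stated correlations), and 1:1 allocation is obtained by shuffling an equal number of 0/1 labels.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def simulate_trial(delta, n=5000, pi=0.1, mu=8.3, gamma=math.log(1 - 0.95),
                   var=0.2, rho_placebo=0.90, rho_vaccine=0.50, seed=0):
    """Simulate one trial under the scaled logistic (Dunning) model, Eq. (2).

    Returns a list of tuples (z, b, a, y): arm, baseline, post value, event.
    """
    rng = random.Random(seed)
    sd = math.sqrt(var)
    arms = [0, 1] * (n // 2)          # balanced 1:1 allocation (assumption)
    rng.shuffle(arms)
    rows = []
    for z in arms:
        rho = rho_vaccine if z == 1 else rho_placebo
        e_b = rng.gauss(0.0, 1.0)
        # e_a has unit variance and correlation rho with e_b
        e_a = rho * e_b + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        b = 3.0 + sd * e_b
        a = 3.0 + delta * z + sd * e_a
        y = int(rng.random() < pi * expit(mu + gamma * a))
        rows.append((z, b, a, y))
    return rows


def crude_ve(rows):
    """Crude vaccine efficacy: 1 - attack rate (vaccine) / attack rate (placebo)."""
    events, sizes = [0, 0], [0, 0]
    for z, _b, _a, y in rows:
        events[z] += y
        sizes[z] += 1
    return 1.0 - (events[1] / sizes[1]) / (events[0] / sizes[0])
```

With a very large simulated sample and Δ=1.5, the crude VE from this sketch should land in the mid-nineties (in percent), which is the order of magnitude of the highest-VE scenario considered here.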
For each scenario, 1,000 clinical trials were simulated.

We fit Prentice model 4 to the simulated data with the Fold-Increase ($S_i=\text{FI}_i=A_i-B_i$) as surrogate, adjusting for the baseline ($B_i$), using logistic regression,

$$\text{logit}\left(P\left(Y_i=1\right)\right)=\tilde{\mu}_T+\beta_S Z_i+\gamma_Z \text{FI}_i+\gamma_B B_i,$$

and the scaled logistic model (Dunning 2006),

$$P\left(Y_i=1\right)=\pi\,\frac{e^{\tilde{\mu}_T+\beta_S Z_i+\gamma_Z \text{FI}_i+\gamma_B B_i}}{1+e^{\tilde{\mu}_T+\beta_S Z_i+\gamma_Z \text{FI}_i+\gamma_B B_i}}.$$

Note that this model is consistent with the model used to generate the data (Eq. (2)), with a slightly different parametrization. The power to meet Prentice criterion 4 (PC4) was measured as the proportion of simulated trials with p-value(s) $=2\Phi\left(-\left|\hat{\gamma}_Z/\sqrt{\mathrm{Var}(\hat{\gamma}_Z)}\right|\right)<\alpha$ and p-value(z) $=2\Phi\left(-\left|\hat{\beta}_S/\sqrt{\mathrm{Var}(\hat{\beta}_S)}\right|\right)\ge\alpha$, with $\alpha=0.05$.

Furthermore, we applied the Principal Surrogate approach to the vaccine-induced fold-increase ($S(1)_i=\text{FI}(1)_i$), where the missing information is integrated out using the baseline surrogate measurement ($B_i$). The power of the WEM approach was measured as the proportion of simulated trials with a significant Wald statistic for the $s(1)z$ coefficient of model (1) (p-value($s(1)z$) $<\alpha$, $\alpha=0.05$). The Appendix contains the R code used to apply the PS approach.

Table 1 shows that the power of both PC4 and WEM decreases as the VE increases. This is because there is less information (fewer events) as the VE increases.
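The two-sided Wald p-values and the PC4 decision rule defined above can be illustrated with a short Python sketch. The function names `wald_p_value` and `meets_pc4` are hypothetical; the inputs are the model 4 estimates and standard errors for FI and Z that the case study reports in Table 8, used here only as a worked example.

```python
from statistics import NormalDist


def wald_p_value(estimate, std_error):
    """Two-sided Wald p-value: 2 * Phi(-|estimate / std_error|)."""
    z = estimate / std_error
    return 2.0 * NormalDist().cdf(-abs(z))


def meets_pc4(p_surrogate, p_treatment, alpha=0.05):
    """PC4 is met when the surrogate effect is significant and the
    treatment effect adjusted for the surrogate is not."""
    return p_surrogate < alpha and p_treatment >= alpha


# Model 4 estimates from the case study (Table 8): FI and Z rows.
p_s = wald_p_value(-1.205, 0.514)   # surrogate effect, gamma_Z
p_z = wald_p_value(-1.644, 0.933)   # treatment effect, beta_S
print(round(p_s, 3), round(p_z, 3), meets_pc4(p_s, p_z))
```

Under these inputs the two p-values should agree with the values reported in the case study (0.019 and 0.078) up to rounding, so PC4 is met at the 5% level in that example.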
Note that the power of the Prentice approach is higher than in Callegaro and Tibaldi (2019) because of the inclusion of the baseline surrogate as a covariate. The simulation results suggest similar power for the PC4 and WEM approaches.

Table 1: Simulation results of data generated using scaled logit Prentice model 3.

Δ       $\hat{\text{VE}}$   PC4 logistic   PC4 scaled logistic   WEM
0.33    0.41                0.94           0.96                  0.92
0.75    0.75                0.93           0.96                  0.92
1.00    0.87                0.89           0.95                  0.90
1.50    0.96                0.80           0.88                  0.73

Power (α=0.05) to assess Prentice criterion 4 (PC4) using the logistic and scaled logistic models, and power to assess the Wide Effect Modification (WEM) of a Principal Surrogate using the logistic model (p-value of the interaction s(1)z).

The performance of the two approaches depends on the correlation between A and B: the larger the correlation, the more informative the covariate B. To assess the role of the correlation, we replicated Table 1 with a smaller correlation between A and B (Cor(A,B)=0.5 in both the placebo and the vaccine group). The simulation results are shown in Table 2. When the correlation is smaller (i.e. when the covariate B is less informative), there is a greater loss of power at high VE for both approaches, especially for the PS approach.
These results are aligned with the simulation results of Callegaro and Tibaldi (2019), which showed a similar loss of power for the Prentice method without covariates.

Table 2: Simulation results of data generated using scaled logit Prentice model 3 with smaller correlation between A and B (Cor(A,B)=0.5 in the placebo and in the vaccine group).

Δ       $\hat{\text{VE}}$   PC4 logistic   PC4 scaled logistic   WEM
0.33    0.41                0.94           0.96                  0.98
0.75    0.75                0.89           0.96                  0.97
1.00    0.86                0.79           0.95                  0.92
1.50    0.96                0.69           0.95                  0.62

Power (α=0.05) to assess Prentice criterion 4 (PC4) using the logistic and scaled logistic models, and power to assess the Wide Effect Modification (WEM) of a Principal Surrogate using the logistic model (p-value of the interaction s(1)z).

Simulations with constant biomarker under placebo

In the previous simulations the Fold-Increase was not constant in the placebo group (it was normally distributed). To evaluate the performance of the Prentice and PS approaches in the case of a constant biomarker under placebo, which mimics vaccine trials in a naive population, we simulated data using the model described above. However, in the inferential models we replaced FI by FI*, which is constant in the placebo group. FI* is defined as

$$\text{FI}^{*}=\begin{cases}\text{FI} & \text{if } \text{FI}>c\\ c & \text{if } \text{FI}\le c\end{cases}$$

where c is the 99% quantile of the distribution of FI in the placebo group.

Table 3 shows some loss of power of the PS approach when the VE increases. Even though the use of the Prentice framework is not justified in this context, Table 3 also shows the results for Prentice criterion 4 (PC4 logistic model). Results from the PC4 scaled logistic model are not shown because the model did not converge.
We observe a dramatic loss of power of Prentice criterion 4 when the VE is high.

Table 3: Simulation results with constant biomarker (inferential models do not agree with the data generating mechanism).

Δ       $\hat{\text{VE}}$   PC4 logistic   WEM
0.33    0.41                0.80           0.67
0.75    0.75                0.45           0.85
1.00    0.87                0.49           0.84
1.50    0.96                0.37           0.69

Power (α=0.05) to assess Prentice criterion 4 (PC4) using the logistic model and power to assess the Wide Effect Modification (WEM) of a Principal Surrogate using the logistic model (p-value of the interaction s(1)z).

Note that Table 3 shows simulation results where the inferential models do not agree with the data generating mechanism, so it represents a situation of model misspecification.

To disentangle the problem of model misspecification from the constant biomarker problem, we generated additional constant biomarker data using a model consistent with the "inferential" model used to fit the data. We simulated data using the following Dunning regression model:

$$P\left(Y_i=1 \mid \pi, \text{FI}_i^{*}, B_i\right)=\pi\,\frac{e^{\mu+\gamma \text{FI}_i^{*}+\gamma_B B_i}}{1+e^{\mu+\gamma \text{FI}_i^{*}+\gamma_B B_i}}.$$

Here, π=0.1 and the other parameters are chosen to mimic the Table 1 data: Δ=(0.33, 0.75, 1, 1.47), μ=(8.66, 9.45, 9.82, 9.41), γ=(−5.39, −5.15, −4.8, −4.45) and γB=(−2.31, −2.63, −2.79, −2.66).

Table 4 shows that the loss of power of the Prentice approach seen in Table 3 was mainly due to model misspecification.
In fact, Table 4 shows a higher power of the PC4 logistic model than Table 3 when the VE is large.

Table 4: Simulation results with constant biomarker (inferential models agree with the data generating mechanism).

Δ       $\hat{\text{VE}}$   PC4 logistic   WEM
0.33    0.29                0.96           0.73
0.75    0.62                0.94           0.85
1.00    0.78                0.96           0.84
1.47    0.94                0.79           0.80

Power (α=0.05) to assess Prentice criterion 4 (PC4) using the logistic model and power to assess the Wide Effect Modification (WEM) of a Principal Surrogate using the logistic model (p-value of the interaction s(1)z).

Simulations with low/moderate VE

For comparison, we considered simulations with low VE. We simulated data as described above with μ1=E(A|Z=1)=3, 3.075, 3.15, 3.23, corresponding to an estimated VE of about 0%, 10%, 20% and 30%, respectively. Note that Prentice criterion 1 will not be met in this situation. For simplicity, we focused only on Prentice criterion 4. Table 5 shows that both approaches (PC4 and WEM) are powerful in the case of low/moderate VE. Prentice criterion 4 seems to be slightly more powerful than PS.

Table 5: Simulation results with small/moderate VE.

Δ       $\hat{\text{VE}}$   PC4 logistic   PC4 scaled logistic   WEM
0.000   −0.01               0.95           0.96                  0.92
0.075   0.09                0.95           0.96                  0.91
0.150   0.20                0.95           0.97                  0.91
0.250   0.31                0.95           0.97                  0.93

Power (α=0.05) to assess Prentice criterion 4 (PC4) using the logistic and scaled logistic models, and power to assess the Wide Effect Modification (WEM) of a Principal Surrogate using the logistic model (p-value of the interaction s(1)z).

Simulations using random intercept logistic (correlated potential outcomes)

Finally, we generated data in a different way, more aligned with the causal inference setting (potential outcomes).
We generated correlated post-vaccination values (A(0), A(1)) using a bivariate normal distribution

$$\begin{pmatrix} A(0)\\ A(1)\end{pmatrix}\sim N\left[\begin{pmatrix} 3\\ 3+\Delta\end{pmatrix},\begin{pmatrix} 0.2 & 0.1\\ 0.1 & 0.2\end{pmatrix}\right]$$

with Δ=(0.33, 0.75, 1.1, 1.6). The mean and the variance of the baseline are the same as those of the post-dose surrogate in the placebo group. The correlation between baseline and post values is 90% in the placebo group and 50% in the vaccinated group. We generated the correlated clinical outcomes using a logistic model with an individual random intercept ($b_i$):

$$\text{logit}\left(P\left(Y(z)_i=1 \mid A(z)_i, b_i\right)\right)=\mu+A(z)_i\gamma+b_i.$$

The variables Y(0) and Y(1) are conditionally independent given b but unconditionally (averaged over b) correlated. The extent of the correlation depends on the variance of the random effect (var(b)).

We generated a bridge-distributed random intercept (using the R package bridgedist; Swihart 2016) such that the resulting marginal distribution follows a logistic regression model (Wang and Louis 2003). In fact, the marginal logistic regression model is logit(P(Y(z)=1|A(z)))=μ/c + A(z)γ/c for z=0, 1, with $c=\sqrt{1+3\,\mathrm{var}(b)/\pi^{2}}$. We simulated data with the following parameters: var(b)=10 (scale=0.5), μ=3.6 and γ=−3.8.
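As a small worked example of the attenuation formula above, the following Python sketch computes c and the implied marginal coefficients for the stated parameter values. The function names are hypothetical, and reading 1/c as the quoted "scale" value is an interpretation, not something the text states.

```python
import math


def attenuation_factor(var_b):
    """c = sqrt(1 + 3 * var(b) / pi^2) for a bridge-distributed random intercept."""
    return math.sqrt(1.0 + 3.0 * var_b / math.pi ** 2)


def marginal_coefficients(mu, gamma, var_b):
    """Marginal (population-averaged) intercept and slope: mu / c and gamma / c."""
    c = attenuation_factor(var_b)
    return mu / c, gamma / c


c = attenuation_factor(10.0)
mu_marginal, gamma_marginal = marginal_coefficients(3.6, -3.8, 10.0)
print(c, 1.0 / c, mu_marginal, gamma_marginal)
```

For var(b)=10 the factor c is close to 2, so the marginal intercept and slope are roughly half of the conditional ones, and 1/c is close to 0.5.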
In this way, p0=P(Y=1|Z=0)=0.05 and the estimated VE is about 0.45, 0.75, 0.85 and 0.95.

Table 6 shows that Prentice criterion 4 is more powerful than WEM.

Table 6: Simulation results with random intercept (bridge) logistic regression.

Δ       $\hat{\text{VE}}$   PC4 logistic   WEM
0.33    0.46                0.95           0.85
0.75    0.75                0.94           0.84
1.10    0.87                0.91           0.77
1.60    0.95                0.89           0.74

Power (α=0.05) to assess Prentice criterion 4 (PC4) using the logistic model and power to assess the Wide Effect Modification (WEM) of a Principal Surrogate using the logistic model (p-value of the interaction s(1)z).

The Prentice framework is more powerful than PS for several reasons: (i) PS tests for an interaction, which is less powerful than a test for a main effect; (ii) the covariate S (the observed surrogate in vaccinated and placebo subjects) has a greater range in Prentice model 4 than the covariate S(1) in the PS model, and it is easier to estimate a slope for a covariate with a larger range. Figure 1 illustrates these differences.

Figure 1: Simulated trial with n=5,000 per arm under the scenario with Δ=1.6. The true probability of infection is graphed as a function of the observed A values and of the A(1) values, respectively. The top panel shows the data used for the Prentice criteria, while the bottom panel shows the data used to test WEM. Red denotes the placebo group and blue denotes the vaccine group. The events are shown at the top and the non-events at the bottom of the graph.

Case study: analysis of a simulated data-set with large VE

In this section we analyze one simulated dataset from the scenario with the largest VE in Table 1. The sample size is n=5,000, with 1:1 randomization. The numbers of events observed in the vaccine and placebo groups are 3 and 90, with an estimated VE of 96% (95% CI, 89–98%). Figure 2 shows that the vaccine and placebo groups had similar log10 titer distributions at baseline, while there is only a small overlap between the distributions post vaccination.
Antibody responses clearly increased from baseline to post-dose in vaccine recipients but not in placebo recipients.

Figure 2: Distribution of the surrogate endpoint: baseline, post and fold-increase (post-baseline).

Figure 3 shows the Spearman correlation between baseline and post (left panel) and between baseline and Fold-Increase (right panel).

Figure 3: Correlation between baseline and post (left panel) and between baseline and fold-increase (right panel).

Prentice framework

First, we examine the interaction between the surrogate and the treatment. Table 7 shows that there is no interaction (p-value=0.49).

Table 7: Logistic model with interaction between treatment group and surrogate.

              Estimate   Std. error   z value   p-value
(Intercept)   0.833      0.717        1.161     0.245
Z             −0.576     1.696        −0.339    0.734
FI            −1.060     0.555        −1.909    0.056
B             −1.434     0.257        −5.583    0.000
group:FI      −0.966     1.401        −0.690    0.490

Second, we assess the four Prentice criteria. Table 8 shows that all criteria are met. In particular, the last 4 rows show the results related to criterion 4.
We can see that the effect of the surrogate is significant (p-value(s)=0.019), while the treatment effect is not significant but is close to 5% (p-value(z)=0.078).

Table 8: Prentice criteria: logistic and linear models.

Criterion   Variable      Estimate   Std. error   z value    p-value
1           (Intercept)   0.433      0.698        0.620      0.535
1           Z             −3.448     0.588        −5.865     0.000
1           B             −1.293     0.249        −5.196     0.000
2           (Intercept)   0.956      0.031        30.734     0.000
2           Z             1.472      0.009        163.653    0.000
2           B             −0.317     0.010        −31.166    0.000
3           (Intercept)   1.092      0.702        1.556      0.120
3           FI            −1.983     0.285        −6.970     0.000
3           B             −1.545     0.250        −6.176     0.000
4           (Intercept)   0.825      0.717        1.150      0.250
4           Z             −1.644     0.933        −1.763     0.078
4           FI            −1.205     0.514        −2.345     0.019
4           B             −1.432     0.257        −5.574     0.000

Slightly better results are obtained if the Dunning model is used (see Table 9).

Table 9: Prentice criterion 4 using the Dunning model.

Variable      Estimate   Std. error   z value   p-value
(Intercept)   8.528      3.442        2.478     0.013
FI            −2.662     1.131        −2.353    0.019
Z             −0.620     1.305        −0.475    0.635
B             −2.978     0.963        −3.092    0.002
logit(pi)     −2.386     0.370        −6.440    0.000

In summary, there is suggestive though not strong evidence that the Fold-Increase is a Statistical Surrogate.

Principal surrogate framework

Table 10 shows the results from the R package pseval with 50 bootstrap replicates (R code is provided in the Appendix). The interaction between the treatment group and FI(1) (the test for wide effect modification) is borderline (p-value=0.053).

Table 10: Principal surrogate evaluation.

              Estimate   Boot se   Lower CL 2.5%   Upper CL 97.5%   p-value
(Intercept)   −7.81      1.146     −10.35          −6.147           9.13e−12
FI(1)         2.64       0.573     1.79            3.891            4.18e−6
Z             2.90       2.846     −3.66           6.593            3.08e−1
FI(1):Z       −3.98      2.053     −8.11           −0.157           5.28e−2

Figure 4 shows the estimated VE curve for Fold-Increase.
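To make the link between model (1) and the VE curve concrete, the following Python sketch plugs the point estimates of Table 10 into the logistic risk model and evaluates VE(s(1)) = 1 − risk(Z=1, s(1)) / risk(Z=0, s(1)). It is an illustration only: the names `BETA`, `risk` and `ve_curve` are hypothetical, bootstrap uncertainty is ignored, and the result is not claimed to match the plotted curve exactly.

```python
import math

# Point estimates from Table 10 (model (1)); the key names are illustrative.
BETA = {"intercept": -7.81, "s1": 2.64, "z": 2.90, "s1_z": -3.98}


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def risk(z, s1, beta=BETA):
    """Model (1): P(Y=1 | Z=z, S(1)=s1) on the probability scale."""
    eta = (beta["intercept"] + beta["z"] * z
           + beta["s1"] * s1 + beta["s1_z"] * s1 * z)
    return expit(eta)


def ve_curve(s1, beta=BETA):
    """VE(s1) = 1 - risk under vaccine / risk under placebo."""
    return 1.0 - risk(1, s1, beta) / risk(0, s1, beta)


for s1 in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(s1, ve_curve(s1))
```

With these point estimates the curve increases with FI(1) and is negative when there is no rise (FI(1)=0), which is the qualitative pattern described for Figure 4.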
The estimated VE curve is an increasing function of FI(1); however, there is large variability for small values of FI(1), and the estimated VE is negative for vaccine recipients with no rise.

Figure 4: Estimated vaccine efficacy curve across levels of vaccine-induced fold-increase from baseline to post-vaccination, with 95% confidence intervals (dashed lines).

In summary, there is suggestive though not strong evidence that the Fold-Increase is a Principal Surrogate.

Discussion

Although not common, vaccines with very high efficacy (95% or above) are documented in the literature (Black et al. 2000; Lin et al. 2001; Mitra et al. 2016; Phua et al. 2012; Prymula et al. 2014; Wei et al. 2016). These trials raise the problem of assessing CoPs when only a small number of cases/infections is available in the vaccinated groups.

Callegaro and Tibaldi (2019) showed that the validation of a surrogate endpoint using the Prentice criteria and meta-analytic frameworks (by randomized subgroups in a single-trial setting) can be problematic in the case of high VE. In this paper, we evaluate the performance of the causal framework, specifically the Principal Surrogate (PS) approach (Follmann 2006; Gilbert, Qin, and Self 2008), in the case of high VE.

First, we replicated the simulation study of Callegaro and Tibaldi (2019), where the clinical outcome was simulated using Prentice model 3 (assuming full mediation) and the Dunning model (Dunning 2006). These simulation results show that adjusting for important covariates (such as the baseline surrogate) considerably improves the power of the Prentice approach in the case of high VE, even if the model is misspecified. Furthermore, these simulation results show similar power for the Prentice and PS frameworks. The power of both approaches decreases as the VE grows.

Second, we slightly changed the Callegaro and Tibaldi scenario to consider the case of a constant biomarker under placebo and the case of small/moderate VE.
Simulation results show that (i) PS is more powerful than Prentice in the case of a constant biomarker when the inferential model is misspecified, whereas otherwise Prentice is more powerful; (ii) the Prentice criterion 4 and PS frameworks are both powerful when the VE is small (see Table 5). However, in this case Prentice criterion 1 is not met, so the two approaches give different conclusions.

Finally, we simulated correlated potential outcome data using a bivariate (random intercept) logistic regression. In this case the Prentice framework is more powerful than the PS approach. This can be due to the following reasons: (i) Prentice model 4 corresponds to the model used to generate the data, so there is no lack of fit in the Prentice framework; (ii) PS tests for an interaction, which is less powerful than a test for a main effect; (iii) the covariate S (the observed surrogate in vaccinated and placebo subjects) has a greater range in Prentice model 4 than the covariate S(1) in the PS model, and it is easier to estimate a slope for a covariate with a larger range (see Figure 1); (iv) principal stratification has to impute S(1) for placebo participants, which increases the variability of the estimates relative to knowing S(1). In contrast, S is known for all participants when assessing the Prentice criterion.

For computational reasons, we performed a relatively small number of iterations (1,000). A larger number of iterations can be considered in the future using multiple processors. The computationally intensive part is the bootstrap of the PS approach. As an example, 200 re-samplings on the case study required 14 min. To mitigate the computational load, it may be useful in the future to derive asymptotic formulas approximating the bootstrap approach.

It is important to highlight that the power comparison between the two approaches should be interpreted with care.
In fact, the two approaches measure two different things: the Prentice framework evaluates whether the surrogate is a "statistical surrogate", while the PS framework evaluates whether the surrogate is a "principal surrogate" (see Gilbert et al. (2015) for more details).

For illustration, we analyzed one data-set simulated with full mediation (Dunning model 3) and with high VE ($\hat{\text{VE}}=96\%$). The results showed suggestive though not strong evidence that the FI is a Statistical Surrogate (Prentice criteria) or a PS. These results are due to the lack of power of these approaches in the case of high VE. An interesting topic for future research is the implementation of the two approaches in a Bayesian framework with weakly informative priors (WIP). In fact, Callegaro and Tibaldi (2019) showed that WIP can considerably increase the power of the meta-analytical approach in the case of high VE.

In conclusion, we evaluated by simulation the impact of high VE on the PS approach. As for the Prentice framework, we showed that the power decreases as the VE grows. It follows that it can be challenging to validate a principal surrogate (and a statistical surrogate) when rare infections are observed in the vaccinated groups.

Statistical Communications in Infectious Diseases, de Gruyter

Publisher
de Gruyter
Copyright
© 2021 Andrea Callegaro et al., published by De Gruyter, Berlin/Boston
ISSN
1948-4690
eISSN
1948-4690
DOI
10.1515/scid-2020-0003
Abstract

IntroductionAn important factor influencing the duration and complexity of clinical trials is the choice of the endpoint used to assess vaccine efficacy. It would be extremely convenient to replace a late, costly or rare true endpoint by an immunological surrogate, which is measured earlier, cheaper, or more frequently. However, from a regulatory perspective, a surrogate endpoint (called sometimes Surrogate of protection, or Correlate of Protection) is not considered acceptable for the determination of efficacy, unless it has been validated, i.e. shown to predict clinical benefit. Prentice (1989) introduced a formal definition of surrogacy based on the concept of mediation in a single-trial setting. Although appealing, Prentice’s definition and criteria received criticism, such as (i) the assumption that the surrogate explains 100% of the VE is too restrictive; (ii) the approach can be susceptible to post-randomization selection bias; (iii) the immune response cannot be constant in the control group; etc. (Burzykowski, Molenberghs, and Buyse 2005; Frangakis and Rubin 2002). In subsequent decades, many statistical methods have been proposed for the evaluation of surrogate endpoints, most of them framed within the causal inference (Follmann 2006; Frangakis and Rubin 2002; Gilbert, Qin, and Self 2008) and meta-analytic paradigms (Buyse et al. 2000; Daniels and Hughes 1997; Gail et al. 2000).Although not common, vaccines with very high efficacy are documented in the literature (Black et al. 2000; Lin et al. 2001; Mitra et al. 2016; Phua et al. 2012; Prymula et al. 2014; Wei et al. 2016). These include the salmonella typhi vi conjugate (Mitra et al. 2016), or the combined measles-mumps-rubella-varicella immunisation (Prymula et al. 2014). Assessing Correlate of Protections (CoPs) in the context of high Vaccine Efficacy (VE) using classical statistical methods is problematic. 
Indeed, the very small number of cases/infections in the vaccinated groups can cause considerable problems for such statistical models. There is therefore a need to evaluate the statistical methods for CoP assessment in the context of high-efficacy vaccines. Callegaro and Tibaldi (2019) showed that the validation of a surrogate endpoint using the Prentice criteria and meta-analytic frameworks (by randomized subgroups in a single-trial setting) can be problematic in case of high VE because of the rare events available in the vaccine group. The aim of this paper is to evaluate the performance of the causal framework, specifically the Principal Surrogate approach (Follmann 2006; Gilbert, Qin, and Self 2008), in case of high VE.

Methods

The Prentice criteria

The following notation will be used throughout the manuscript: Yi and Si are random variables denoting the observed clinical (binary) endpoint and the surrogate endpoint for subject i=1, …, n, and Zi is a binary treatment indicator (Z=1 for treatment and Z=0 for control group).

Key concepts, including the hypothesis-testing approach to the validation of substitute endpoints using randomised clinical trial data, were introduced by Prentice (1989). Prentice’s four criteria for the validation of a surrogate endpoint can be evaluated using the following four models:

\begin{align*}
\text{logit}\left(P\left(Y_i=1\right)\right) &= \mu_T + \beta Z_i, \\
S_i &= \mu_S + \alpha Z_i + \varepsilon_{S_i}, \\
\text{logit}\left(P\left(Y_i=1\right)\right) &= \mu + \gamma S_i, \\
\text{logit}\left(P\left(Y_i=1\right)\right) &= \tilde{\mu}_T + \beta_S Z_i + \gamma_Z S_i.
\end{align*}

In this paper we will mainly focus on criterion 4.
This criterion is met if the null hypothesis H01: γZ=0 is rejected (p-value(s) < α) and the null hypothesis H02: βS=0 is not rejected (p-value(z) ≥ α). Note that when this criterion is met (βS=0 and γZ≠0), model 4 reduces to model 3. The significance level α is not adjusted for multiplicity because the null hypothesis is the intersection of two null hypotheses.

Principal surrogate framework

Many causal inference approaches have been published in the literature. In what follows, we describe the Vaccine Efficacy framework of Follmann and Gilbert (Follmann 2006; Gilbert, Qin, and Self 2008). Since Si can be affected by treatment, there are two naturally occurring counterfactual values of Si: Si(1) under treatment and Si(0) under control. The observed (binary) clinical endpoint is denoted by Yi, and its counterfactual values are Yi(1) under treatment and Yi(0) under control. Criteria for S to be a good surrogate are based on risk estimands that condition on the potential surrogate responses (Gilbert, Qin, and Self 2008):

\begin{align*}
\text{risk}_{(1)}\left(s(1),s(0)\right) &= \Pr\left(Y(1)=1 \mid S(1)=s(1), S(0)=s(0)\right) \\
\text{risk}_{(0)}\left(s(1),s(0)\right) &= \Pr\left(Y(0)=1 \mid S(1)=s(1), S(0)=s(0)\right)
\end{align*}

A contrast in risk(1)(s(1), s(0)) and risk(0)(s(1), s(0)) is a causal effect on the clinical endpoint.
A classical contrast used in vaccines is the Vaccine Efficacy (VE):

$$\text{VE}\left(s(1),s(0)\right) = 1 - \frac{\Pr\left(Y(1)=1 \mid S(1)=s(1), S(0)=s(0)\right)}{\Pr\left(Y(0)=1 \mid S(1)=s(1), S(0)=s(0)\right)}$$

A Principal Surrogate (PS) is a biomarker satisfying two conditions: causal necessity,

$$\text{VE}\left(s(1),s(0)\right) = 0 \quad \text{for all} \quad s(1)=s(0),$$

and Wide Effect Modification (WEM), which means that

$$\text{VE}\left(s(1),s(0)\right) \quad \text{increasing in} \quad s(1)-s(0).$$

WEM is similar in spirit to the Individual Causal Association (ICA) (Alonso et al. 2015), which is the correlation between the individual causal effect on the endpoint and on the surrogate.

In this paper we only focus on WEM. In fact, recent work (Gabriel and Follmann 2016; Gabriel and Gilbert 2014; Gilbert, Qin, and Self 2008; Huang and Gilbert 2011; Wolfson and Gilbert 2010) suggests that the WEM criterion is of primary importance for a biomarker to be a PS. Furthermore, Alonso et al.
(2015) showed that the average causal necessity definition may be extremely restrictive.

Estimating VE

Assumptions A1–A3 (A1: Stable unit treatment value assumption; A2: Ignorable treatment assignments; A3: Equal individual clinical risk up to the time of surrogate measurements) imply that risk(Z)(s(1), s(0)) would be identified if we knew the potential outcomes Si(Z) of subjects assigned the opposite treatment 1 − Z (Wolfson and Gilbert 2010):

\begin{align*}
\text{risk}_{(1)}\left(s(1),s(0)\right) &= \Pr\left(Y=1 \mid Z=1, S(1)=s(1), S(0)=s(0)\right) \\
\text{risk}_{(0)}\left(s(1),s(0)\right) &= \Pr\left(Y=1 \mid Z=0, S(1)=s(1), S(0)=s(0)\right).
\end{align*}

It follows that it is necessary to impute (or integrate out) the missing potential biomarkers.
The risk can be modeled using the following logistic model:

$$\text{logit}\left(P\left(Y_i=1 \mid z_i, s_{1i}, s_{0i}\right)\right) = \beta_0 + \beta_z z_i + \beta_{s(1)} s_{1i} + \beta_{s(1)z} s_{1i} z_i + \beta_{s(0)} s_{0i} + \beta_{s(0)z} s_{0i} z_i.$$

The model can be simplified in case of a Constant Biomarker (Si(0) = c):

$$\text{logit}\left(P\left(Y_i=1 \mid z_i, s_{1i}\right)\right) = \beta_0 + \beta_z z_i + \beta_{s(1)} s_{1i} + \beta_{s(1)z} s_{1i} z_i, \tag{1}$$

with the corresponding VE curve

$$\text{VE}\left(s(1)\right) = 1 - \frac{\Pr\left(Y=1 \mid Z=1, S(1)=s(1)\right)}{\Pr\left(Y=1 \mid Z=0, S(1)=s(1)\right)}.$$

The constant biomarker assumption is reasonable when subjects have been selected to have no meaningful exposure to the pathogen, so that S(0) = 0. Examples include HIV (Follmann 2006) or varicella vaccine trials (Chan et al. 2002). This assumption is also reasonable for populations exposed to the pathogen when the biomarker Si is the log10 Fold-Increase from baseline (FIi), which is the difference between the log10 post (Ai) and the log10 baseline (Bi) values (FIi = Ai − Bi).

Missing values imputation/integration

The key challenge in estimating these risk estimands is solving the problem of conditioning on counterfactual values that are not observable. This involves integrating out (or imputing) missing values based on some models, and under some set of assumptions and/or trial augmentations. Gilbert, Qin, and Self (2008) and Follmann (2006) proposed to use estimated maximum likelihood followed by the bootstrap.
Huang, Gilbert, and Wolfson (2013) suggested a pseudoscore estimation procedure that has a closed-form variance estimator. Miao et al. (2013) used a multiple imputation approach. In this paper we fit model (1) using the method implemented in the R package pseval (Sachs and Gabriel 2016) with a Baseline Immunogenicity Predictor (BIP): parameters are estimated by estimated maximum likelihood (missing information is integrated out) and the variance is estimated by bootstrap. R code is provided in the Appendix. This approach is similar in spirit to the method used in Follmann (2006).

Results

Simulations of Callegaro and Tibaldi (2019)

To evaluate the impact of high vaccine efficacy on the PS validation, we repeated the simulations of Callegaro and Tibaldi (2019). The Dunning regression model (Dunning 2006) was used to simulate the data in an ideal CoP setting, where the treatment effect is fully explained by the post values (Ai), as follows:

$$P\left(Y_i=1 \mid \pi, A_i\right) = \pi \frac{e^{\mu+\gamma A_i}}{1+e^{\mu+\gamma A_i}}. \tag{2}$$

Here, π can be interpreted as the probability of being exposed to the disease. This model corresponds to the classical logistic model when all subjects are exposed (π=1).

Simulations were run using the following parameter assumptions: total sample size n=5,000, 1:1 randomization, π=0.1, μ=8.3, γ=log(1−0.95); the immune response post vaccination is normally distributed, A|Z=0 ∼ N(3, 0.2) in the placebo group and A|Z=1 ∼ N(3 + Δ, 0.2) in the vaccine group, where Δ=0.33, 0.75, 1, 1.5. The value of the immune response at baseline is generated as B ∼ N(3, 0.2), with correlation between A and B of 0.90 in the placebo group and 0.50 in the vaccine group (0.2 is the variance of the normal distribution).
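The data-generating mechanism just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' simulation code (which is in R): the function names `dunning_risk`, `simulate_trial` and `crude_ve` are hypothetical, the arms are allocated by alternation rather than by random draw, and A is generated conditionally on B so that the two have the stated variance and correlation.

```python
import math
import random

PI_EXPOSURE = 0.1            # pi: probability of being exposed
MU = 8.3                     # intercept of Eq. (2)
GAMMA = math.log(1 - 0.95)   # slope of Eq. (2)
VARIANCE = 0.2               # variance of A and of B


def dunning_risk(a, pi=PI_EXPOSURE, mu=MU, gamma=GAMMA):
    """Scaled logistic (Dunning) risk of Eq. (2) at post value a."""
    return pi / (1.0 + math.exp(-(mu + gamma * a)))


def simulate_trial(delta, n=5000, rho_placebo=0.90, rho_vaccine=0.50, seed=None):
    """Return a list of (z, baseline, post, fold_increase, y) tuples."""
    rng = random.Random(seed)
    sd = math.sqrt(VARIANCE)
    rows = []
    for j in range(n):
        z = j % 2  # assumption: alternating allocation stands in for 1:1 randomization
        rho = rho_vaccine if z else rho_placebo
        b = rng.gauss(3.0, sd)
        # A given B: mean 3 + delta*z, variance 0.2, correlation rho with B
        noise = rng.gauss(0.0, sd)
        a = 3.0 + delta * z + rho * (b - 3.0) + math.sqrt(1.0 - rho ** 2) * noise
        y = 1 if rng.random() < dunning_risk(a) else 0
        rows.append((z, b, a, a - b, y))
    return rows


def crude_ve(rows):
    """1 minus the ratio of attack rates (vaccine over placebo)."""
    cases, counts = [0, 0], [0, 0]
    for z, _, _, _, y in rows:
        counts[z] += 1
        cases[z] += y
    return 1.0 - (cases[1] / counts[1]) / (cases[0] / counts[0])
```

Calling `crude_ve(simulate_trial(1.5, seed=1))` gives one draw of the estimated VE for the highest-efficacy scenario; repeating over many seeds would approximate the simulation design described in the text.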
For each scenario, 1,000 clinical trials were simulated.

We fit Prentice model 4 on the simulated data with Fold-Increase (Si = FIi = Ai − Bi) as surrogate, adjusting for the baseline (Bi), using logistic regression

$$\text{logit}\left(P\left(Y_i=1\right)\right) = \tilde{\mu}_T + \beta_S Z_i + \gamma_Z \text{FI}_i + \gamma_B B_i$$

and the scaled logistic model (Dunning 2006)

$$P\left(Y_i=1\right) = \pi \frac{e^{\tilde{\mu}_T + \beta_S Z_i + \gamma_Z \text{FI}_i + \gamma_B B_i}}{1+e^{\tilde{\mu}_T + \beta_S Z_i + \gamma_Z \text{FI}_i + \gamma_B B_i}}.$$

Note that this model is consistent with the model used to generate the data (Eq. (2)), with a slightly different parametrization. The power to meet Prentice criterion 4 (PC4) was measured as the proportion of simulated trials with p-value(s) $= 2\Phi\left(-\left\vert \hat{\gamma}_Z / \sqrt{\mathrm{Var}\left(\hat{\gamma}_Z\right)} \right\vert\right) < \alpha$ and p-value(z) $= 2\Phi\left(-\left\vert \hat{\beta}_S / \sqrt{\mathrm{Var}\left(\hat{\beta}_S\right)} \right\vert\right) \ge \alpha$, with α=0.05.

Furthermore, we applied the Principal Surrogate approach to the vaccine-induced fold-increase (S(1)i = FI(1)i), where missing information is integrated out using the baseline surrogate measurement (Bi). The power of the WEM approach was measured as the proportion of simulated trials with a significant Wald statistic for the s(1)z coefficient of model (1) (p-value(s(1)z) < α, α=0.05). The Appendix contains the R code used to apply the PS approach.

Table 1 shows that the power of both PC4 and WEM decreases when the VE increases. This is because there is less information (fewer events) as the VE increases.
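The PC4 decision rule and the power calculation described above can be written compactly. The following Python sketch is illustrative only: the function names are hypothetical, and each trial is assumed to be summarized by the estimate and standard error of γZ and βS from an already fitted model 4.

```python
import math


def wald_p_value(estimate, std_error):
    """Two-sided Wald p-value, 2 * Phi(-|estimate / std_error|)."""
    z = abs(estimate / std_error)
    # 2 * Phi(-z) is identical to erfc(z / sqrt(2))
    return math.erfc(z / math.sqrt(2.0))


def meets_pc4(gamma_z, se_gamma_z, beta_s, se_beta_s, alpha=0.05):
    """PC4 holds when the surrogate effect is significant
    and the residual treatment effect is not."""
    p_s = wald_p_value(gamma_z, se_gamma_z)
    p_z = wald_p_value(beta_s, se_beta_s)
    return p_s < alpha and p_z >= alpha


def pc4_power(trials, alpha=0.05):
    """Proportion of trials meeting PC4; each trial is a tuple
    (gamma_z, se_gamma_z, beta_s, se_beta_s)."""
    hits = sum(meets_pc4(*trial, alpha=alpha) for trial in trials)
    return hits / len(trials)
```

As a check against the case study reported later in the paper, the criterion 4 estimates in Table 8 (FI: −1.205 with standard error 0.514; Z: −1.644 with standard error 0.933) give p-values of approximately 0.019 and 0.078, so `meets_pc4` returns `True` for that data set.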
Note that the power of the Prentice approach is higher than in Callegaro and Tibaldi (2019) because of the inclusion of the baseline surrogate as covariate. Simulation results suggest similar power for the PC4 and WEM approaches.

Table 1: Simulation results of data generated using scaled logit Prentice model 3.

| Δ | $\hat{\text{VE}}$ | PC4 logistic | PC4 scaled logistic | WEM |
|---|---|---|---|---|
| 0.33 | 0.41 | 0.94 | 0.96 | 0.92 |
| 0.75 | 0.75 | 0.93 | 0.96 | 0.92 |
| 1.00 | 0.87 | 0.89 | 0.95 | 0.90 |
| 1.50 | 0.96 | 0.80 | 0.88 | 0.73 |

Power (α=0.05) to assess Prentice criterion 4 (PC4) using logistic and scaled logistic model and power to assess the Wide Effect Modification (WEM) of a Principal Surrogate using the logistic model (p-value of the interaction s(1)z).

The performance of the two approaches depends on the correlation between A and B: the larger the correlation, the more informative the covariate B. To assess the role of the correlation on the results, we replicated Table 1 with a smaller correlation between A and B (Cor(A,B)=0.5 in the placebo and in the vaccine group). Simulation results are shown in Table 2. We can see that when the correlation is smaller (i.e. when the covariate B is less informative) there is a greater loss of power for high VE for both approaches, especially for the PS approach.
These results are aligned with the simulation results of Callegaro and Tibaldi (2019), showing a similar loss of power of the Prentice method without covariates.

Table 2: Simulation results of data generated using scaled logit Prentice model 3 with smaller correlation between A and B (cor(A,B)=0.5 in placebo and in the vaccine group).

| Δ | $\hat{\text{VE}}$ | PC4 logistic | PC4 scaled logistic | WEM |
|---|---|---|---|---|
| 0.33 | 0.41 | 0.94 | 0.96 | 0.98 |
| 0.75 | 0.75 | 0.89 | 0.96 | 0.97 |
| 1.00 | 0.86 | 0.79 | 0.95 | 0.92 |
| 1.50 | 0.96 | 0.69 | 0.95 | 0.62 |

Power (α=0.05) to assess Prentice criterion 4 (PC4) using logistic and scaled logistic model and power to assess the Wide Effect Modification (WEM) of a Principal Surrogate using the logistic model (p-value of the interaction s(1)z).

Simulations with constant biomarker under placebo

In the previous simulations the Fold-Increase was not constant in the placebo group (it was normally distributed). To evaluate the performance of the Prentice and PS approaches in case of a constant biomarker under placebo, which mimics vaccine trials in a naive population, we simulated data using the model described above. However, in the inferential models, we replaced FI by FI*, which is constant in placebo. FI* is defined as

$$\text{FI}^{\ast} = \begin{cases} \text{FI} & \text{if } \text{FI} > c \\ c & \text{if } \text{FI} \le c \end{cases}$$

where c is the 99% quantile of the distribution of FI in placebo.

Table 3 shows some loss of power of the PS approach when the VE increases. Even if the use of the Prentice framework is not justified in this context, Table 3 also shows the results of Prentice criterion 4 (PC4 logistic model). Results from the PC4 scaled logistic model are not shown because the model did not converge.
We observe a dramatic loss of power of Prentice criterion 4 when the VE is high.

Table 3: Simulation results with constant biomarker (inferential models do not agree with the data generating mechanism).

| Δ | $\hat{\text{VE}}$ | PC4 logistic | WEM |
|---|---|---|---|
| 0.33 | 0.41 | 0.80 | 0.67 |
| 0.75 | 0.75 | 0.45 | 0.85 |
| 1.00 | 0.87 | 0.49 | 0.84 |
| 1.50 | 0.96 | 0.37 | 0.69 |

Power (α=0.05) to assess Prentice criterion 4 (PC4) using logistic model and power to assess the Wide Effect Modification (WEM) of a Principal Surrogate using the logistic model (p-value of the interaction s(1)z).

Note that Table 3 shows simulation results where the inferential models do not agree with the data generating mechanism, so it represents a situation of model misspecification.

To disentangle the problem of model misspecification from the constant biomarker problem, we generated additional constant biomarker data using a model consistent with the “inferential” model used to fit the data. We simulated data using the following Dunning regression model:

$$P\left(Y_i=1 \mid \pi, \text{FI}_i^{\ast}, B_i\right) = \pi \frac{e^{\mu + \gamma \text{FI}_i^{\ast} + \gamma_B B_i}}{1+e^{\mu + \gamma \text{FI}_i^{\ast} + \gamma_B B_i}}.$$

Here, π = 0.1 and the other parameters are chosen to mimic Table 1 data: Δ=0.33, 0.75, 1, 1.47, μ=(8.66, 9.45, 9.82, 9.41), γ=(−5.39, −5.15, −4.8, −4.45) and γB=(−2.31, −2.63, −2.79, −2.66).

Table 4 shows that the loss of power of the Prentice approach shown in Table 3 was mainly due to model misspecification.
In fact, Table 4 shows a relatively higher power of PC4 logistic than Table 3 when VE is large.

Table 4: Simulation results with constant biomarker (inferential models agree with the data generating mechanism).

| Δ | $\hat{\text{VE}}$ | PC4 logistic | WEM |
|---|---|---|---|
| 0.33 | 0.29 | 0.96 | 0.73 |
| 0.75 | 0.62 | 0.94 | 0.85 |
| 1.00 | 0.78 | 0.96 | 0.84 |
| 1.47 | 0.94 | 0.79 | 0.80 |

Power (α=0.05) to assess Prentice criterion 4 (PC4) using logistic model and power to assess the Wide Effect Modification (WEM) of a Principal Surrogate using the logistic model (p-value of the interaction s(1)z).

Simulations with low/moderate VE

For comparison, we considered simulations with low VE. We simulated data as described above with μ1=E(A|Z=1)=3, 3.075, 3.15, 3.23, corresponding to estimated VE of about 0%, 10%, 20% and 30%, respectively. Note that Prentice criterion 1 will not be met in this situation. For simplicity, we focused only on Prentice criterion 4. Table 5 shows that both approaches (PC4 and WEM) are powerful in the case of low/moderate VE. Prentice criterion 4 seems to be slightly more powerful than PS.

Table 5: Simulation results with small/moderate VE.

| Δ | $\hat{\text{VE}}$ | PC4 logistic | PC4 scaled logistic | WEM |
|---|---|---|---|---|
| 0.000 | −0.01 | 0.95 | 0.96 | 0.92 |
| 0.075 | 0.09 | 0.95 | 0.96 | 0.91 |
| 0.150 | 0.20 | 0.95 | 0.97 | 0.91 |
| 0.250 | 0.31 | 0.95 | 0.97 | 0.93 |

Power (α=0.05) to assess Prentice criterion 4 (PC4) using logistic and scaled logistic model and power to assess the Wide Effect Modification (WEM) of a Principal Surrogate using the logistic model (p-value of the interaction s(1)z).

Simulations using random intercept logistic (correlated potential outcomes)

Finally, we generated data in a different way, more aligned with the causal inference setting (potential outcomes).
We generated correlated post-vaccination values (A(0), A(1)) using a bivariate normal distribution

$$\begin{pmatrix} A(0) \\ A(1) \end{pmatrix} \sim N\left[\begin{pmatrix} 3 \\ 3+\Delta \end{pmatrix}, \begin{pmatrix} 0.2 & 0.1 \\ 0.1 & 0.2 \end{pmatrix}\right]$$

with Δ=(0.33, 0.75, 1.1, 1.6). The mean and the variance of the baseline are the same as those of the post-dose surrogate in placebo. The correlation between baseline and post is 90% in placebo and 50% in vaccinated subjects. We generated the correlated clinical outcomes using a logistic model with individual random intercept (bi)

$$\text{logit}\left(P\left(Y(z)_i=1 \mid A(z)_i, b_i\right)\right) = \mu + A(z)_i \gamma + b_i.$$

The variables Y(0), Y(1) are conditionally independent given b but unconditionally (averaged over b) correlated. The extent of correlation depends on the variance of the random effect (var(b)).

We generated a bridge-distributed random intercept (using the R package bridgedist; Swihart 2016) such that the resulting marginal distribution follows a logistic regression model (Wang and Louis 2003). In fact, the marginal logistic regression model is logit(P(Y(z)=1|A(z)))=μ/c + A(z)γ/c for z=0, 1, with $c=\sqrt{1+3\mathrm{var}\left(b\right)/{\pi }^{2}}$. We simulated data with the following parameters: var(b)=10 (scale=0.5), μ=3.6 and γ=−3.8.
In this way, p0=P(Y=1|Z=0)=0.05 and the estimated VE is about 0.45, 0.75, 0.85 and 0.95.

Table 6 shows that Prentice criterion 4 is more powerful than WEM.

Table 6: Simulation results with random intercept (bridge) logistic regression.

| Δ | $\hat{\text{VE}}$ | PC4 logistic | WEM |
|---|---|---|---|
| 0.33 | 0.46 | 0.95 | 0.85 |
| 0.75 | 0.75 | 0.94 | 0.84 |
| 1.10 | 0.87 | 0.91 | 0.77 |
| 1.60 | 0.95 | 0.89 | 0.74 |

Power (α=0.05) to assess Prentice criterion 4 (PC4) using logistic model and power to assess the Wide Effect Modification (WEM) of a Principal Surrogate using the logistic model (p-value of the interaction s(1)z).

The Prentice framework is more powerful than PS for several reasons: (i) PS tests for an interaction, which is less powerful than a test for the main effect; (ii) the covariate S (observed surrogate in vaccinated and placebo) has a greater range in Prentice model 4 than the covariate S(1) in the PS model. It is easier to estimate a slope for a covariate with a bigger range. Figure 1 illustrates these differences.

Figure 1: Simulated trial with n=5,000 per arm under the scenario with Δ=1.6. The true probability of infection is graphed as a function of observed As and A(1)s respectively. The top panel is the data used for the Prentice criteria while the bottom panel is used to test WEM. Red denotes the placebo group while blue denotes the vaccine group. The events are shown at the top and the non-events at the bottom of the graph.

Case study: analysis of a simulated data-set with large VE

In this section we analyze one simulated dataset from the scenario with the largest VE of Table 1. The sample size is n=5,000, with 1:1 randomization. The numbers of events observed in the two groups are 3 and 90, with an estimated VE of 96% (95% CI, 89–98%). Figure 2 shows that the vaccine and placebo groups had similar log10 titer distributions at baseline, while there is a small overlap in distributions post vaccination.
Antibody responses clearly increased from baseline to post-dose in vaccine recipients but not in placebo recipients.

Figure 2: Distribution of the surrogate endpoint: baseline, post and fold-increase (post-baseline).

Figure 3 shows the Spearman correlation between baseline and post (left panel) and between baseline and Fold-Increase (right panel).

Figure 3: Correlation between baseline and post (left panel) and between baseline and fold-increase (right panel).

Prentice framework

First we examine the interaction between the surrogate and the treatment. Table 7 shows that there is no significant interaction (p-value=0.49).

Table 7: Logistic model with interaction between treatment group and surrogate.

| | Estimate | Std. error | z Value | p-Value |
|---|---|---|---|---|
| (Intercept) | 0.833 | 0.717 | 1.161 | 0.245 |
| Z | −0.576 | 1.696 | −0.339 | 0.734 |
| FI | −1.060 | 0.555 | −1.909 | 0.056 |
| B | −1.434 | 0.257 | −5.583 | 0.000 |
| group:FI | −0.966 | 1.401 | −0.690 | 0.490 |

Secondly, we assess the four Prentice criteria. Table 8 shows that all criteria are met. In particular, the last four rows show the results related to criterion 4.
We can see that the effect of the surrogate is significant (p-value(s)=0.019), while the treatment effect is not significant but is close to the 5% level (p-value(z) = 0.078).

Table 8: Prentice criteria: logistic and linear models.

| Criterion | Variable | Estimate | Std error | z Value | p-Value |
|---|---|---|---|---|---|
| 1 | (Intercept) | 0.433 | 0.698 | 0.620 | 0.535 |
| 1 | Z | −3.448 | 0.588 | −5.865 | 0.000 |
| 1 | B | −1.293 | 0.249 | −5.196 | 0.000 |
| 2 | (Intercept) | 0.956 | 0.031 | 30.734 | 0.000 |
| 2 | Z | 1.472 | 0.009 | 163.653 | 0.000 |
| 2 | B | −0.317 | 0.010 | −31.166 | 0.000 |
| 3 | (Intercept) | 1.092 | 0.702 | 1.556 | 0.120 |
| 3 | FI | −1.983 | 0.285 | −6.970 | 0.000 |
| 3 | B | −1.545 | 0.250 | −6.176 | 0.000 |
| 4 | (Intercept) | 0.825 | 0.717 | 1.150 | 0.250 |
| 4 | Z | −1.644 | 0.933 | −1.763 | 0.078 |
| 4 | FI | −1.205 | 0.514 | −2.345 | 0.019 |
| 4 | B | −1.432 | 0.257 | −5.574 | 0.000 |

Slightly better results are obtained if the Dunning model is used (see Table 9).

Table 9: Prentice criterion 4 using Dunning model.

| Variable | Estimate | Std error | z Value | p-Value |
|---|---|---|---|---|
| (Intercept) | 8.528 | 3.442 | 2.478 | 0.013 |
| FI | −2.662 | 1.131 | −2.353 | 0.019 |
| Z | −0.620 | 1.305 | −0.475 | 0.635 |
| B | −2.978 | 0.963 | −3.092 | 0.002 |
| logit(pi) | −2.386 | 0.370 | −6.440 | 0.000 |

In summary, there is suggestive though not strong evidence that the Fold-Increase is a Statistical Surrogate.

Principal surrogate framework

Table 10 shows the results from the R package pseval with 50 bootstrap replicates (R code is provided in the Appendix). We can see that the interaction between the treatment group and FI(1) (test for wide effect modification) is borderline (p-value=0.053).

Table 10: Principal surrogate evaluation.

| | Estimate | Boot se | Lower CL 2.5% | Upper CL 97.5% | p-Value |
|---|---|---|---|---|---|
| (Intercept) | −7.81 | 1.146 | −10.35 | −6.147 | 9.13e−12 |
| FI(1) | 2.64 | 0.573 | 1.79 | 3.891 | 4.18e−6 |
| Z | 2.90 | 2.846 | −3.66 | 6.593 | 3.08e−1 |
| FI(1):Z | −3.98 | 2.053 | −8.11 | −0.157 | 5.28e−2 |

Figure 4 shows the estimated VE curve for Fold-Increase.
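The shape of such a curve can be sketched from point estimates alone by plugging the Table 10 coefficients into model (1) and the VE(s(1)) formula. The Python below is a rough illustration, not the pseval implementation: the function names are hypothetical, only point estimates are used, and the bootstrap uncertainty shown in the figure is ignored.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def ve_curve(s1, b0, bz, bs1, bs1z):
    """VE(s(1)) = 1 - risk under vaccine / risk under placebo, from model (1)."""
    risk_vaccine = expit(b0 + bz + (bs1 + bs1z) * s1)  # z = 1
    risk_placebo = expit(b0 + bs1 * s1)                # z = 0
    return 1.0 - risk_vaccine / risk_placebo


# Point estimates copied from Table 10: (Intercept), Z, FI(1), FI(1):Z
TABLE_10 = dict(b0=-7.81, bz=2.90, bs1=2.64, bs1z=-3.98)

for s1 in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(f"FI(1) = {s1:.1f}   VE = {ve_curve(s1, **TABLE_10):.3f}")
```

Under these point estimates the curve rises with FI(1) and is negative at FI(1) = 0, in line with the description of Figure 4.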
The estimated VE curve is an increasing function of FI(1); however, we can see large variability for small values of FI(1) and negative VEs for vaccine recipients with no rise.

Figure 4: Estimated vaccine efficacy curve across levels of vaccine-induced fold-increase from baseline to post-vaccination, with 95% confidence intervals (dashed lines).

In summary, there is suggestive though not strong evidence that the Fold-Increase is a Principal Surrogate.

Discussion

Although not common, vaccines with very high efficacy (95% or above) are documented in the literature (Black et al. 2000; Lin et al. 2001; Mitra et al. 2016; Phua et al. 2012; Prymula et al. 2014; Wei et al. 2016). These trials raise the problem of assessing CoPs when only a small number of cases/infections are available in the vaccinated groups.

Callegaro and Tibaldi (2019) showed that the validation of a surrogate endpoint using the Prentice criteria and meta-analytic frameworks (by randomized subgroups in a single-trial setting) can be problematic in case of high VE. In this paper, we evaluate the performance of the causal framework, specifically the Principal Surrogate (PS) approach (Follmann 2006; Gilbert, Qin, and Self 2008), in case of high VE.

First, we replicated the simulation study of Callegaro and Tibaldi (2019), where the clinical outcome was simulated using Prentice model 3 (assuming full mediation) and using the Dunning model (Dunning 2006). These simulation results show that adjustment for important covariates (such as the baseline surrogate) considerably improves the power of the Prentice approach (even if the model is misspecified) in case of high VE. Furthermore, these simulation results show similar power of the Prentice and PS frameworks. The power of both approaches decreases when VE grows.

Second, we slightly changed the Callegaro and Tibaldi scenario to consider the case of constant biomarker under placebo and the case of small/moderate VE.
Simulation results show that (i) PS is more powerful than Prentice in case of constant biomarker when the inferential model is misspecified, otherwise Prentice is more powerful; (ii) Prentice criterion 4 and the PS framework are powerful when the VE is small (see Table 5). However, in this case Prentice criterion 1 is not met, so the two approaches give different conclusions.

Finally, we simulated correlated potential outcome data using a bivariate (random intercept) logistic regression. In this case the Prentice framework is more powerful than the PS approach. This can be due to the following reasons: (i) Prentice model 4 corresponds to the model used to generate the data and so there is no lack of fit in the Prentice framework; (ii) PS tests for an interaction, which is less powerful than a test for the main effect; (iii) the covariate S (observed surrogate in vaccinated and placebo) has a greater range in Prentice model 4 than the covariate S(1) in the PS model, and it is easier to estimate a slope for a covariate with a bigger range (see Figure 1); (iv) principal stratification has to impute S(1) for placebo participants, which increases the variability of estimates relative to knowing S(1). In contrast, S is known for all participants in the Prentice criterion.

For computational reasons, we performed a relatively small number of iterations (1,000). A larger number of iterations can be considered in the future using multiple processors. The computationally intensive step is the bootstrap of the PS approach. As an example, 200 resamples on the case study required 14 min. To mitigate the computational load, it may be useful in the future to derive asymptotic formulas approximating the bootstrap approach.

It is important to highlight that the power comparison between the two approaches should be interpreted with care.
In fact, the two approaches measure two different things: the Prentice framework evaluates whether the surrogate is a “statistical surrogate”, while the PS approach evaluates whether the surrogate is a “principal surrogate” (see Gilbert et al. (2015) for more details).

For illustration, we analyzed one data-set simulated with full mediation (Dunning model 3) and with high VE ($\hat{\text{VE}}=96\%$). Results showed suggestive though not strong evidence that the FI is a Statistical Surrogate (Prentice criteria) or a PS. These results are due to the lack of power of these approaches in case of high VE. An interesting topic for future research is the implementation of the two approaches in a Bayesian framework with weakly informative priors (WIP). In fact, Callegaro and Tibaldi (2019) showed that WIP can considerably increase the power of the meta-analytical approach in case of high VE.

In conclusion, we evaluated by simulation the impact of high VE on the PS approach. Similarly to the Prentice framework, we showed that the power decreases when the VE grows. It follows that it can be challenging to validate a principal surrogate (and a statistical surrogate) when rare infections are observed in the vaccinated groups.

Journal

Statistical Communications in Infectious Diseases, de Gruyter

Published: Jan 1, 2021

Keywords: causal inference; high vaccine efficacy; principal surrogate; surrogate endpoint; vaccine clinical trial
