
Item Response Theory Modeling of the International Prostate Symptom Score in Patients with Lower Urinary Tract Symptoms Associated with Benign Prostatic Hyperplasia

The AAPS Journal (2020) 22:115
DOI: 10.1208/s12248-020-00500-w
Research Article

Yassine Kamal Lyauk (1,2,3,4), Daniël M. Jonker (1), Trine Meldgaard Lund (2), Andrew C. Hooker (3), and Mats O. Karlsson (3)

Received 29 June 2020; accepted 12 August 2020

Abstract. Item response theory (IRT) was used to characterize the time course of lower urinary tract symptoms due to benign prostatic hyperplasia (BPH-LUTS), as measured by item-level International Prostate Symptom Scores (IPSS). The Fisher information content of the IPSS items was determined, and the power to detect a drug effect using the IRT approach was examined. Data from 403 patients with moderate-to-severe BPH-LUTS in a placebo-controlled phase II trial studying the effect of degarelix over 6 months were used for modeling. Three pharmacometric models were developed: a model for total IPSS, a unidimensional IRT model, and a bidimensional IRT model, the latter separating voiding and storage items. In all models, the population-level time course of BPH-LUTS was described by initial improvement followed by worsening. In the unidimensional IRT model, the combined information content of the IPSS voiding items represented 72% of the total information content, indicating that the voiding subscore may be more sensitive to changes in BPH-LUTS than the storage subscore. The pharmacometric models showed considerably higher power to detect a drug effect than both a cross-sectional and a while-on-treatment analysis of covariance. Compared with the sample size required to detect a drug effect at 80% power with the total IPSS model, reductions of 5.9% and 11.7% were obtained with the unidimensional and bidimensional IPSS IRT models, respectively.
Pharmacometric IRT analysis of the IPSS within BPH-LUTS may increase the precision and efficiency of treatment effect assessment, albeit to a more limited extent than in applications in other therapeutic areas.

KEY WORDS: item response theory; BPH; LUTS; International Prostate Symptom Score; pharmacometrics.

1 Translational Medicine, Ferring Pharmaceuticals A/S, Kay Fiskers Plads 11, 2300, Copenhagen, Denmark.
2 Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark.
3 Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden.
4 To whom correspondence should be addressed. (e-mail: yassinekamallyauk@gmail.com; ysl@ferring.com; yassine.lyauk@sund.ku.dk; yassine.lyauk@farmbio.uu.se)

Electronic supplementary material: The online version of this article (https://doi.org/10.1208/s12248-020-00500-w) contains supplementary material, which is available to authorized users.

1550-7416/20/0000-0001/0 © 2020 The Author(s)

INTRODUCTION

Benign prostatic hyperplasia (BPH) is a common condition in the aging male and is estimated to affect 50% of males by age 60 years and 90% by age 85 years (1,2). The clinical manifestations of BPH are known as lower urinary tract symptoms (LUTS) and are characterized by an increased sensation of incomplete bladder emptying following urination, increased urination frequency, intermittency, urgency to urinate, weakness of the urinary stream, straining to start urination, and nocturia. LUTS are associated with adverse health effects such as significantly diminished quality of life and depression, as well as impairment in activities of daily living (3–5). In approximately 10% of patients, the condition may lead to severe complications such as acute urinary retention, urosepsis, and kidney failure (2,6). The severity of BPH-LUTS is commonly measured by the International Prostate Symptom Score (IPSS; also known as the American Urological Association score) (7), which consists of seven questions describing the severity of each clinical manifestation of LUTS. The IPSS questionnaire is considered the gold-standard measure for assessing BPH-LUTS, and it is widely used in the clinic, as a primary or secondary endpoint in clinical trials, and in urology research (8).

Pairwise cross-sectional testing based on the mean change from baseline in the summary score is the traditional pre-specified analysis for clinical trials using scale measures as the primary efficacy endpoint. However, analysis of clinical trial data through longitudinal pharmacometric modeling has been shown to increase the power to detect a drug effect compared with pairwise testing (9–11). Furthermore, an extension of longitudinal pharmacometric modeling specific to multiple-item questionnaire data (9), which utilizes concepts derived from item response theory (IRT), has shown the potential for increased assessment precision in several therapeutic areas (namely, Alzheimer's disease, Parkinson's disease, multiple sclerosis, and depression) (9,12–14). Moreover, the methodology has shown an increase in the power to detect a drug effect compared with longitudinal pharmacometric analysis of summary score data (9,15). Briefly, IRT quantifies the relationship between an individual's intrinsic trait (e.g., disability) and the probability of answering a questionnaire item (e.g., an IPSS item) in a particular way (16,17). By preserving the information contained within responses to individual items, it is possible to estimate an individual's latent disability, how well items discriminate between individuals with differing levels of latent disability, and the location of item responses along the disability scale.

The GnRH receptor antagonist degarelix, approved for the treatment of advanced prostate cancer (Firmagon®), was investigated as an alternative medical approach for the treatment of moderate-to-severe BPH-LUTS in patients without prostate cancer. Because degarelix forms a depot upon administration and thus functions as a slow-release formulation, treatment with degarelix was envisioned to achieve greater compliance and effectiveness than currently approved treatments requiring daily administration. The degarelix doses tested within BPH-LUTS were substantially lower than the approved doses used for treating prostate cancer (a loading dose of 240 mg followed by maintenance doses of 80 mg) to avoid eliciting prolonged testosterone suppression in patients.

To date, only one publication describes longitudinal model-based analysis of the total IPSS (18), and longitudinal pharmacometric IRT modeling has not been applied to the analysis of the IPSS within BPH-LUTS. Using data from 403 patients in a phase II trial investigating the treatment of moderate-to-severe BPH-LUTS with degarelix over 6 months, we set out to (i) characterize the internal characteristics of the IPSS through IRT analysis of the item-level data, (ii) use the obtained IRT information to develop pharmacometric IRT models describing the time course of underlying BPH-LUTS, and (iii) compare the power to detect a drug effect of pharmacometric IRT IPSS modeling with that of cross-sectional testing and of longitudinal modeling, respectively, based on the total IPSS.

METHODS

Data

The IPSS is a seven-item questionnaire in which each item is scored from 0 to 5, yielding a composite IPSS ranging from 0 to 35. Item scores reflect symptom frequency (not at all, less than 1 time in 5, less than half the time, about half the time, more than half the time, and almost always), except for the nocturia item, where they correspond to categorized counts (0 to ≥ 5 awakenings).

Ferring Pharmaceuticals A/S trial CS36 (NCT00947882) was a phase II, double-blind, parallel-group, dose-finding study evaluating the efficacy and safety of degarelix over 6 months. Following a wash-out period, 403 patients were randomized to a single subcutaneous injection of 10, 20, or 30 mg degarelix (40 mg/mL solution) or placebo; patients were required to have an IPSS ≥ 13 at screening, 2 weeks prior to dosing at the baseline visit. The primary endpoint was the mean change from baseline in IPSS compared with placebo 3 months after dosing. Visits were planned at 2 weeks and at 1, 2, 3, 4, 5, and 6 months after dosing. Rich pharmacokinetic sampling (n = 15) was performed in 43 patients, while sparse pharmacokinetic sampling (n = 2) was performed in 240 patients. An interim trial analysis was planned for 3 months post-dosing in order to stop the trial early if the primary endpoint was not met. Trial CS36 was conducted in accordance with the Declaration of Helsinki and Good Clinical Practice.

Item Response Theory Modeling

The score for each of the seven IPSS items may range from 0 to 5. The relationship between disability and the probability (P) of a patient answering a score of at least k was therefore modeled through a graded response model (19):

$$P(Y_{ij} \ge k) = \frac{e^{a_j(\psi_i - b_{j,k})}}{1 + e^{a_j(\psi_i - b_{j,k})}}$$

where $Y_{ij}$ represents the score of patient i on item j, $a_j$ the slope (discrimination) parameter of item j, $\psi_i$ the unobserved disability of patient i, and $b_{j,k}$ the difficulty parameter of item j for score k. For an item with a maximum score of 5, the probability of each individual score was then obtained from these cumulative probabilities as follows:

$$P(Y_{ij} = 0) = 1 - P(Y_{ij} \ge 1)$$
$$P(Y_{ij} = k) = P(Y_{ij} \ge k) - P(Y_{ij} \ge k + 1), \quad k = 1, \dots, 4$$
$$P(Y_{ij} = 5) = P(Y_{ij} \ge 5)$$

Item characteristic curves (ICCs) were estimated as fixed effects by treating the IPSS measurements from each of a patient's study visits as originating from a separate individual (in this work referred to as the IDVIS approach). Disability was estimated as a random effect, and its distribution was fixed to a standard normal distribution (mean 0 and variance 1) at baseline. Post-baseline shift parameters were included to allow for a different mean and variance of disability post-baseline (where disability is likely to have changed compared with baseline due to placebo and/or drug effects). A similar ICC estimation approach has been reported previously in the literature (13,14,20,21).

Factor analysis (FA) is an established statistical method (22) for assessing item patterns and informing the item structure of IRT models (23). The procedure aims to explain the interrelationship between many observed variables by way of a few latent variables and is based on analysis of the between-item correlation matrix. It may be used to identify the number of questionnaire domains and which items correspond to each of them (exploratory FA) or to investigate the item patterns with a pre-specified number of factors (confirmatory FA). Lastly, it may also inform whether the assumption of only one general dimension for all items is supported (24). In the current work, a unidimensional IRT model was first fit to the CS36 data, and the adequacy of the unidimensionality assumption was assessed based on the item factor loadings. The factor loadings indicate an item's correlation with the factor, where higher absolute values suggest closer association. Following development of the unidimensional IRT model, confirmatory FA with two dimensions (a minimum of three items per dimension is needed to preserve model identification) and varimax orthogonal rotation (25) was used to inform the item structure of a bidimensional IRT model. In the developed IRT ICC models, residual correlation between items was also assessed, using residuals calculated as follows:

$$RES_{ij} = DV_{ij} - E_{ij}$$
$$E_{ij} = P(1) \cdot 1 + P(2) \cdot 2 + P(3) \cdot 3 + P(4) \cdot 4 + P(5) \cdot 5$$

with $DV_{ij}$ being the observed score of the ith individual on the jth IPSS item and $E_{ij}$ the corresponding probability-weighted prediction based on the IRT-derived ICCs and the individual disability estimates.

Pharmacometric Implementation of Item Response Theory

Following the IRT ICC estimation step, the resulting knowledge was incorporated into a pharmacometric framework. First, the original individual assignment was reconciled with the data (i.e., longitudinal observations were restored for each patient), and IRT-derived latent disability estimates were modeled longitudinally as the dependent variable. Uncertainty in the empirical Bayes estimates (EBEs) of latent disability was taken into account through an additional additive residual error model term, similar to the IPPSE (individual PK parameters with standard errors) approach in sequential PK/PD modeling (26) (here named the PSI-IPPSE approach). Schindler et al. previously proposed a similar approach (20), but without standard errors. Finally, the IRT ICC estimation model and the final longitudinal latent disability model from the PSI-IPPSE step were combined into a single model to allow translation of latent disability to observed IPSS at the item and summary level. In this combined model, the impact of re-estimating only the longitudinal parameters, as well as of simultaneously estimating the ICCs and longitudinal parameters, was examined.

Calculation of Fisher Information Content

To investigate which IPSS items carry the most information (i.e., the signal-to-noise ratio in determining patients' latent disability) and where on the disability scale they are most informative, the Fisher information content of each IPSS item was calculated as the negative expectation of the second derivative of the log-likelihood, using the unidimensional IRT ICC estimation model. The information functions were visualized to illustrate the sensitivity of each IPSS item over the full disability range. Individual items were ranked according to the amount of information they contained relative to the total information, based on each item's area under the information curve within this study's estimated disability range. Information content assessment was performed in the context of unidimensional IRT modeling. This allows for an overall perspective across all IPSS items, whereas in the multidimensional IRT framework it is only feasible within each separate dimension.

Structural Longitudinal Modeling

A similar approach to longitudinal model development was undertaken for underlying disability in the context of IRT and for observed total IPSS. First, data from patients randomized to the placebo group were modeled. Here, different structural models were tested to best describe the time course of the placebo effect, such as linear, bi-linear, power, exponential, Weibull, Gompertz, and inverse Bateman models. The addition of a linear drift parameter (27) to describe worsening or continued improvement was tested for all of these models. Subsequently, data from patients assigned to degarelix treatment were added to the data set to describe the drug effect. In this step, we investigated models describing degarelix treatment effects as present or absent, independent of the administered dose, as well as dose-response models (linear and Emax). An offset treatment effect was investigated, as well as onset treatment effects describing time delays in reaching the full response (linear, exponential, and slope-intercept models). Normally and log-normally distributed between-subject variability was investigated for all parameters. For the total IPSS model, additive, proportional, and combined error models were investigated to describe residual variability.

Covariate Analysis

Investigated baseline covariates consisted of demographics (age, weight, and body mass index), physiological disease-specific measures (total prostate volume, serum testosterone, prostate-specific antigen, average flow rate, flow time including time to maximum flow, maximum urine flow, post-void residual volume, voiding time, and voiding volume), validated disease-specific patient-reported outcomes (quality of life (QoL) score and BPH Impact Index (BII) score), and study site region (North America or Europe). Baseline IPSS was tested as a covariate on the drug effect parameter during longitudinal IPSS modeling. Lastly, individual degarelix area under the curve (AUC0–∞) estimates, derived by applying a previously developed population pharmacokinetic model (28) to the CS36 trial pharmacokinetic data, were investigated as a predictor of treatment effect variability, both as a continuous value and binned by quartile.

Covariate analysis was performed by way of a stepwise search at a significance level of 0.01 in the forward inclusion step and 0.001 in the backward elimination step. Linear relationships were investigated for covariates. A multiplicative covariate model (Eq. 1) was used to test continuous covariates on parameters, except for parameters liable to assume a typical value (θ) of zero (e.g., baseline disability in longitudinal IRT modeling), where an additive covariate model was used (Eq. 2):

$$\text{Parameter} = \theta_{\text{Parameter}} \cdot \left(1 + \theta_{\text{Covariate}} \cdot (\text{Covariate} - \text{Covariate}_{\text{median}})\right) \qquad (1)$$

$$\text{Parameter} = \theta_{\text{Parameter}} + \theta_{\text{Covariate}} \cdot (\text{Covariate} - \text{Covariate}_{\text{median}}) \qquad (2)$$

Model Evaluation and Diagnostics

Non-covariate-related model selection was based on several criteria. For hierarchical (nested) models, a difference in objective function value (OFV) corresponding to a significance level of 0.05, assuming a χ² distribution, was considered statistically significant, while for non-nested models, the difference in Akaike information criterion (AIC) was used. Moreover, model stability based on successful convergence of the minimization and covariance steps, parameter precision as assessed through NONMEM's relative standard error estimates, and graphical diagnostics were also considered during model selection.

Visual predictive checks (VPCs) based on 200 samples, of both the longitudinal IPSS and the change in IPSS from baseline stratified by treatment arm, were used to assess how adequately the models characterized the observed IPSS data.

In the IRT analyses, the goodness of fit of the ICCs was assessed using a novel sampling-based, cross-validated generalized additive model (GAM) cubic spline smooth, which builds upon the commonly used GAM smooth diagnostic (21). As with all pharmacometric model diagnostics, EBE-based visual representations may be misleading due to η-shrinkage (29). In this particular diagnostic, EBE shrinkage can cause an adequate model to appear inadequate, in particular at extreme disability values. To counteract the potential effects of η-shrinkage of disability EBEs on the GAM smooth diagnostic, an approach was developed that samples randomly from the individual posterior η distributions, using the uncertainty estimate of the EBEs (Fisher information-assessed variance, or conditional variance) from the final ICC estimation model. Two hundred η samples were drawn randomly, assuming normal distributions with the individual posterior η estimate as mean and the individual Fisher information-assessed η variance as variance. Disability estimates were subsequently calculated for each generated η, respecting the baseline or post-baseline IDVIS origin of η and using the estimated fixed-effect post-baseline shift parameters. As in the traditional IRT GAM diagnostic, GAM smooths were applied to the data (one for each unique item-difficulty category combination). To adjust for the difference between the number of sampling-generated and the number of actual study-derived disability estimates, the 95% confidence interval of the GAM smooths was adjusted by multiplying the computed standard error by the square root of the number of generated η samples.

To diagnose the final longitudinal IRT model, VPCs were generated for both item-level IPSS observations and summary IPSS scores using 2000 Monte Carlo simulations.

Power Calculations

A stochastic simulation and estimation (SSE) procedure with 1000 samples was used to assess the power to detect a drug effect at a 5% significance level, and the sample size needed to reach 80% power. Of the two developed longitudinal IRT models (unidimensional and bidimensional), the model with the lowest AIC was chosen as the simulation model. For simplicity, the Monte Carlo simulations assumed no missing individual IPSS item scores and no drop-out over the 6-month period. Power curves were generated by estimating the power of the models at four different sample sizes, which were informed by an initial exploratory Monte Carlo Mapped Power (MCMP) (30) procedure. In the pharmacometric models, the actual type I error level and the corresponding empirically derived ΔOFV were estimated by simulating 1000 trials with no drug effect at each sample size, similar to Wählby et al. (31).

The power of two different analysis of covariance (ANCOVA) tests was determined using the same simulated data sets on which the power of the pharmacometric models was estimated. Both analyses included treatment as a factor and baseline summary IPSS as a covariate. The first ANCOVA used cross-sectional data, considering only the change from baseline at 3 months post-dose, which was the landmark time point in the CS36 trial. This type of analysis is commonly pre-specified as the main analysis of clinical trials. In the second ANCOVA, the average summary IPSS change from baseline during the entire treatment period was used as the dependent variable, which is known as the "while on treatment" (WOT) strategy/estimand (32). At each sample size, power was determined as the proportion of analyses that identified a statistically significant (p < 0.05) treatment effect.

Software

The Laplacian method in NONMEM version 7.4.3 (33) was used for IRT ICC estimation and final longitudinal IRT modeling, while the first-order conditional estimation method with interaction was used for longitudinal IPSS modeling as well as for intermediate longitudinal IRT modeling of the disability EBEs. The mirt R package (34) version 1.32.0 was used to obtain initial estimates for the ICCs and to perform factor analysis as well as multidimensional IRT model exploration. ICC diagnostics were obtained using R version 4.6.0. Simulation-based model diagnostics for the longitudinal models were obtained using Perl-Speaks-NONMEM (PsN) (35) version 4.9.0.

RESULTS

Table I shows the subject characteristics at baseline. In total, 3117 summary IPSS and 21,836 item-level IPSS responses from 403 patients were available for analysis. The distribution of responses is shown in Supplemental Fig. S1. Of the 403 randomized patients, 369 completed the 6-month treatment period. Figure 1 shows the mean summary IPSS time course in each trial arm as well as the distribution of responses for each IPSS item. A marked drop in total IPSS was observed in all treatment arms following dosing, and the distribution of item-level IPSS responses at the three key trial visits (baseline, the landmark time point, and end of trial) was similar in the placebo arm and the pooled treatment arms. Figure 1 shows no apparent dose-response for the effect of degarelix on the IPSS.

Fig. 1. The mean International Prostate Symptom Score (IPSS) in each CS36 trial arm along with the standard error of the mean at each visit. The distribution of item-level IPSS at the baseline visit, landmark time point (3 months post-dose), and end of trial (6 months post-dose) is shown for the placebo arm as well as the pooled degarelix dose arms.

Table I. Baseline Demographic and International Prostate Symptom Score (IPSS) Characteristics in Clinical Trial CS36

| Variable (median [range] unless stated) | Placebo | Degarelix 10 mg | Degarelix 20 mg | Degarelix 30 mg |
|---|---|---|---|---|
| Number of patients | 98 | 101 | 99 | 105 |
| Age in years | 65.0 [50.0, 86.0] | 65.0 [50.0, 81.0] | 66.0 [52.0, 82.0] | 65.0 [50.0, 87.0] |
| Body weight in kg | 86.4 [60.0, 128.0] | 87.0 [54.1, 126.2] | 85.0 [57.0, 141.2] | 84.0 [55.0, 183.8] |
| Body mass index in kg/m² | 28.5 [20.1, 40.2] | 27.8 [18.9, 40.5] | 27.7 [21.4, 38.9] | 27.7 [19.8, 58.1] |
| Total IPSS | 18.0 [13.0, 33.0] | 18.0 [11.0, 33.0] | 19.0 [13.0, 33.0] | 19.0 [13.0, 35.0] |
| IPSS storage subscore | 8.0 [3.0, 15.0] | 8.0 [3.0, 15.0] | 8.0 [4.0, 15.0] | 8.0 [2.0, 15.0] |
| IPSS voiding subscore | 10.0 [4.0, 20.0] | 11.0 [0.0, 20.0] | 11.0 [3.0, 20.0] | 11.0 [4.0, 20.0] |
| Quality of life score | 4.0 [2.0, 6.0] | 4.0 [1.0, 6.0] | 4.0 [2.0, 6.0] | 4.0 [3.0, 6.0] |
| BPH Impact Index score | 7.0 [0.0, 13.0] | 7.0 [0.0, 12.0] | 7.0 [0.0, 12.0] | 7.0 [0.0, 12.0] |
| Voided volume in mL | 175.5 [77.0, 466.0] | 188.1 [125.0, 632.0] | 185.0 [57.0, 505.0] | 186.0 [106.4, 484.0] |
| Voiding time in s | 37.0 [19.0, 121.0] | 40.0 [21.0, 128.0] | 42.0 [15.0, 112.0] | 39.0 [20.6, 344.5] |
| Post-void residual volume in mL | 39.1 [0.0, 230.0] | 50.5 [0.0, 246.6] | 45.0 [0.0, 189.0] | 56.3 [0.0, 999.0] |
| Average flow rate in mL/s | 5.0 [2.6, 10.4] | 5.0 [2.6, 9.5] | 5.3 [2.7, 10.6] | 5.0 [2.3, 8.5] |
| Maximum urine flow in mL/s | 10.0 [4.6, 16.4] | 10.0 [4.4, 19.2] | 10.0 [5.4, 50.0] | 9.9 [5.1, 16.0] |
| Flow time including time to maximum flow in s | 33.0 [18.0, 113.0] | 36.0 [20.0, 120.0] | 37.4 [13.0, 101.0] | 37.0 [20.6, 100.4] |
| Total prostate volume in mL | 39.1 [16.8, 102.0] | 38.4 [14.2, 128.0] | 38.3 [17.0, 155.7] | 36.1 [9.8, 135.9] |
| Prostate-specific antigen in ng/mL | 2.0 [0.2, 9.6] | 1.8 [0.1, 9.0] | 2.3 [0.3, 9.6] | 1.8 [0.3, 7.8] |
| Serum testosterone in ng/mL | 4.1 [1.0, 10.2] | 4.3 [0.2, 13.6] | 4.3 [2.0, 8.0] | 4.3 [0.6, 12.2] |
| Region North America, N (%) | 57 (58.2) | 60 (59.4) | 60 (60.6) | 63 (60.0) |
| Region Europe, N (%) | 41 (41.8) | 41 (40.6) | 39 (39.4) | 42 (40.0) |

Item Response Theory Analysis

The unidimensional IRT model had high (> 0.6) item factor loadings except for the nocturia item, which had a modest factor loading of 0.39, suggesting that the unidimensionality assumption was adequate. Factor analysis with two dimensions identified the items relating to voiding symptoms (the emptying, intermittency, weak stream, and straining IPSS items) and to storage symptoms (the frequency, urgency, and nocturia IPSS items) as belonging to separate dimensions, informing the development of a bidimensional IRT model (item factor loading values are shown in Supplemental Table S1).

Unidimensional Item Characteristic Curve Estimation Model

In the unidimensional IRT ICC estimation model, 44 parameters (35 difficulty parameters, 7 discrimination parameters, and 2 post-baseline disability shift parameters) were estimated with low uncertainty to characterize the ICCs (Table II). The incomplete emptying IPSS item had the highest discrimination parameter value (1.38); i.e., it is the most sensitive to changes in disability around the difficulty parameter of each score. The nocturia item had the lowest discrimination parameter value (0.49), indicating that a large increase in disability gives a relatively small increase in the probability of a higher score. The ICCs of each IPSS item are illustrated in Fig. 2 and show expected scores larger than zero for individuals with low disability (< −4) for all items, most notably for the frequency, weak stream, and nocturia items. For the nocturia item, individuals with a low disability estimate are predominantly expected to score higher than 0, indicating that the vast majority of patients will answer that they get up to urinate at least once every night.

Both the traditional cross-validated cubic spline GAM smooth and its sampling-based extension indicated that the estimated ICCs described the data adequately (Fig. 3). Model agreement was better with the sampling-based GAM smooth than with the traditional method, although typical η-shrinkage (SD-based) was low (9.6%) and individual shrinkage varied little (95% CI 9.6% to 9.9%, range 6.3% to 42.0%).

Total IPSS values spanning the entire scale were observed in the CS36 data, and a high correlation (r = 0.95) with estimated IRT disability was observed (Fig. 4a). However, for a given summary IPSS value, there exists a wide range of underlying disability, most evident for moderate BPH-LUTS (8 ≤ IPSS ≤ 19). Moreover, Fig. 4b illustrates that the minimal detectable decrease (MDD) of three IPSS points (36,37) corresponds to a wide range of decreases in latent disability. In turn, there is a notable overlap between these disability improvements and those corresponding to observed improvements below the MDD (−3 < ΔIPSS < 0), no observed change (ΔIPSS = 0), and, to a small extent, observed worsening (ΔIPSS > 0). Lastly, the threshold commonly used to determine clinical progression (ΔIPSS ≥ 4) (37–40) corresponds to no change or increases in underlying disability.

As shown in Table III, the most informative IPSS item was incomplete emptying (23.8% of total information), closely followed by intermittency (20.8% of total information). These items can determine patients' disability more precisely than the other IPSS items. The nocturia item was found to contain the least information (3.4%), in line with this item having the lowest discrimination parameter value (Table II). Of note, the IPSS voiding items (incomplete emptying, intermittency, weak stream, and straining) combined carried 72% of the total information, while the IPSS storage items (frequency, urgency, and nocturia) combined contained only 28% of the total information. A visual representation of the Fisher information curves for each item is shown in Supplemental Fig. S2.

Bidimensional Item Characteristic Curve Estimation Model

In the bidimensional IRT ICC estimation model, 47 parameters were estimated with low uncertainty (35 difficulty parameters, 7 discrimination parameters, two sets of post-baseline disability shift parameters, and a correlation term between the latent variables), using Cholesky decomposition to estimate the correlation between the latent variables (whose variances were fixed to 1). The bidimensional ICC estimation model had a 407.5-point lower OFV than the unidimensional ICC estimation model; its IRT parameter estimates and ICCs are presented in Table II and visually represented in Supplemental Figs. S3 and S4, respectively. The estimated ICCs adequately described the data, as shown in Supplemental Figs. S5 and S6. Typical η-shrinkage was 10% (individual shrinkage 95% CI 9.8% to 10%, range 6.9% to 38.6%) and 13% (individual shrinkage 95% CI 13.6% to 13.8%, range 9.8% to 38.8%) in the voiding and storage dimension, respectively.

The residual correlation between items in the two developed IRT ICC estimation models is shown in Supplemental Figs. S7 and S8.

Longitudinal Models

Three longitudinal models were developed: a total score model, a unidimensional IRT model, and a bidimensional IRT model. All three models adequately described the data, as illustrated by VPCs (Supplemental Figs. S9, S10, S11, S12, and S13). The time course of IPSS in the summary score model and of latent disability in the unidimensional IRT model was described according to

$$\text{IPSS or Disability} = \text{Baseline} + \text{Placebo} + \text{Drug}$$

where Baseline is the estimated baseline, Drug is the offset degarelix treatment effect, and Placebo is the placebo effect, described by

$$\text{Placebo} = P_{max}\left(1 - e^{-\frac{\ln 2}{T_{prog}} \cdot \text{Time}}\right) + \text{Drift} \cdot \text{Time}$$

where Pmax is the maximal placebo effect, Tprog is the half-life to reach Pmax, and Drift describes worsening or continued improvement. In the bidimensional IRT model, the placebo effect in each dimension was described using a Weibull function

$$\text{Placebo} = P_{max}\left(1 - e^{-\left(\frac{\ln 2}{T_{prog}} \cdot \text{Time}\right)^{WEI}}\right) + \text{Drift} \cdot \text{Time}$$

where WEI is the Weibull exponent. Separate offset drug effects were estimated on each of the two latent variable scales.
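The structural model for the placebo and drug effects can be written in a few lines of Python. This is a minimal sketch rather than the authors' NONMEM implementation: the function names are our own, the parameter values are hypothetical (a negative Pmax combined with a positive Drift reproduces the initial improvement followed by worsening described in the Abstract), time is assumed to be in months, and the offset drug effect is assumed to apply only after dosing.

```python
import math


def placebo_effect(time, pmax, tprog, drift, wei=1.0):
    """Placebo = Pmax * (1 - exp(-((ln 2 / Tprog) * Time) ** WEI)) + Drift * Time.

    wei = 1 gives the exponential form used for the total IPSS and
    unidimensional IRT models; wei != 1 gives the Weibull form.
    """
    scaled_time = (math.log(2) / tprog) * time
    return pmax * (1.0 - math.exp(-(scaled_time ** wei))) + drift * time


def predicted_score(time, baseline, pmax, tprog, drift, drug=0.0, wei=1.0):
    """IPSS (or latent disability) = Baseline + Placebo + Drug (offset effect)."""
    # Assumption: the offset drug effect applies after dosing, so the
    # pre-dose baseline visit (time 0) is unaffected.
    drug_effect = drug if time > 0 else 0.0
    return baseline + placebo_effect(time, pmax, tprog, drift, wei) + drug_effect


if __name__ == "__main__":
    # Hypothetical parameters on the total IPSS scale, not estimates from the study.
    params = dict(baseline=18.0, pmax=-5.0, tprog=0.5, drift=0.4)
    for month in [0, 0.5, 1, 2, 3, 4, 5, 6]:
        placebo_arm = predicted_score(month, **params)
        treated_arm = predicted_score(month, **params, drug=-1.5)
        print(f"month {month}: placebo {placebo_arm:.2f}, treated {treated_arm:.2f}")
```

Setting `wei=1.0` recovers the exponential placebo model, so one function covers both structural forms; in that case Tprog is exactly the time at which half of Pmax has been reached.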
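The graded response model and the probability-weighted expected score used for the item residuals ($RES_{ij} = DV_{ij} - E_{ij}$) in the Item Response Theory Modeling section can be sketched directly from those equations. The sketch assumes the five thresholds of an item are supplied as already ordered values $b_{j,1} < \dots < b_{j,5}$; how the Table II estimates map onto such ordered thresholds is not spelled out here, so the item below is hypothetical and the helper names are our own.

```python
import math


def prob_at_least(psi, a, b_k):
    """P(Y >= k) = exp(a * (psi - b_k)) / (1 + exp(a * (psi - b_k))), in logistic form."""
    return 1.0 / (1.0 + math.exp(-a * (psi - b_k)))


def category_probs(psi, a, thresholds):
    """Return [P(Y = 0), ..., P(Y = K)] for one item with ordered thresholds b_1 < ... < b_K."""
    # cumulative[k] = P(Y >= k), padded with P(Y >= 0) = 1 and P(Y >= K + 1) = 0
    cumulative = [1.0] + [prob_at_least(psi, a, b_k) for b_k in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


def expected_score(psi, a, thresholds):
    """E_ij = sum_k k * P(Y_ij = k); the k = 0 term contributes nothing."""
    return sum(k * p for k, p in enumerate(category_probs(psi, a, thresholds)))


def item_residual(observed, psi, a, thresholds):
    """RES_ij = DV_ij - E_ij."""
    return observed - expected_score(psi, a, thresholds)


if __name__ == "__main__":
    # Hypothetical item: discrimination 1.2 and five ordered thresholds.
    a_item, b_item = 1.2, [-2.0, -0.5, 0.5, 1.5, 2.5]
    for psi in (-2.0, 0.0, 2.0):
        probs = category_probs(psi, a_item, b_item)
        print(psi, [round(p, 3) for p in probs], round(expected_score(psi, a_item, b_item), 2))
    print("residual for an observed score of 3 at psi = 0:", item_residual(3, 0.0, a_item, b_item))
```

Because the category probabilities are differences of neighboring cumulative probabilities, they sum to one by construction, and the expected score rises monotonically with disability.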
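For a graded response item, the Fisher information described in the Calculation of Fisher Information Content section has the closed form $I(\psi) = \sum_k (\partial P_k/\partial\psi)^2 / P_k$, which equals the negative expected second derivative of the log-likelihood because the category probabilities sum to one. The sketch below, with hypothetical items and helper names of our own, also ranks items by their share of the summed area under the information curves; the integration range of −6 to 6 is an assumption standing in for the study's estimated disability range.

```python
import math


def cumulative_probs(psi, a, thresholds):
    """[P(Y >= 0), P(Y >= 1), ..., P(Y >= K), P(Y >= K + 1)] under the graded response model."""
    inner = [1.0 / (1.0 + math.exp(-a * (psi - b_k))) for b_k in thresholds]
    return [1.0] + inner + [0.0]


def item_information(psi, a, thresholds):
    """Fisher information I(psi) = sum_k (dP_k / dpsi)^2 / P_k for one graded-response item."""
    cumulative = cumulative_probs(psi, a, thresholds)
    # d/dpsi of a logistic P* is a * P* * (1 - P*); this is 0 for the padded 1.0 and 0.0 entries.
    slopes = [a * c * (1.0 - c) for c in cumulative]
    info = 0.0
    for k in range(len(thresholds) + 1):
        p_k = cumulative[k] - cumulative[k + 1]
        dp_k = slopes[k] - slopes[k + 1]
        if p_k > 0.0:
            info += dp_k ** 2 / p_k
    return info


def information_area(a, thresholds, lower, upper, n_points=1201):
    """Trapezoidal area under the item information curve on [lower, upper]."""
    step = (upper - lower) / (n_points - 1)
    values = [item_information(lower + i * step, a, thresholds) for i in range(n_points)]
    return step * (sum(values) - 0.5 * (values[0] + values[-1]))


def relative_information(items, lower=-6.0, upper=6.0):
    """Each item's share of the summed information area over the chosen disability range."""
    areas = {name: information_area(a, b, lower, upper) for name, (a, b) in items.items()}
    total = sum(areas.values())
    return {name: area / total for name, area in areas.items()}


if __name__ == "__main__":
    # Hypothetical items: identical thresholds, different discriminations.
    shared_b = [-2.0, -0.5, 0.5, 1.5, 2.5]
    demo_items = {"sharp item": (1.4, shared_b), "flat item": (0.5, shared_b)}
    for name, share in relative_information(demo_items).items():
        print(f"{name}: {100 * share:.1f}% of total information")
```

Because the two items share thresholds, the difference in their shares is driven only by the discrimination parameter, which mirrors the link in the Results between the nocturia item's low discrimination (0.49) and its small information share (3.4%).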
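The sampling-based GAM diagnostic from the Model Evaluation and Diagnostics section replaces each disability EBE with draws from that individual's approximate posterior. The sketch below covers only the sampling and the standard-error inflation, not the GAM fit. It assumes that baseline disability equals η (since baseline disability was fixed to N(0, 1)) and that post-baseline disability is shift mean + √(shift variance) × η; the exact parameterization is not given in this section. The shift values are the unidimensional estimates in Table II, while the EBE, its variance, and the function names are hypothetical.

```python
import math
import random


def sample_disability(eta_hat, eta_variance, post_baseline, shift_mean, shift_variance,
                      n_samples=200, rng=None):
    """Draw eta ~ N(eta_hat, eta_variance) and map each draw to the disability scale.

    Assumption: baseline disability is eta itself, while post-baseline
    disability is shift_mean + sqrt(shift_variance) * eta.
    """
    rng = rng or random.Random()
    sd = math.sqrt(eta_variance)
    draws = [rng.gauss(eta_hat, sd) for _ in range(n_samples)]
    if not post_baseline:
        return draws
    return [shift_mean + math.sqrt(shift_variance) * eta for eta in draws]


def inflate_standard_error(se_from_smooth, n_samples=200):
    """Widen the GAM smooth SE, since each observation now contributes n_samples points."""
    return se_from_smooth * math.sqrt(n_samples)


if __name__ == "__main__":
    rng = random.Random(2020)
    # Hypothetical EBE (0.4) and conditional variance (0.1) for one post-baseline record.
    samples = sample_disability(eta_hat=0.4, eta_variance=0.1, post_baseline=True,
                                shift_mean=-1.38, shift_variance=2.22, rng=rng)
    print(len(samples), sum(samples) / len(samples))
    print(inflate_standard_error(0.02))
```

Multiplying by the square root of the number of draws undoes the artificial narrowing of the confidence band that comes from treating 200 correlated draws per record as independent observations.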
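Equations 1 and 2 in the Covariate Analysis section differ in how a covariate acts when the typical value is close to zero. In the sketch below, which uses hypothetical values and function names of our own, the multiplicative form (Eq. 1) cannot move a parameter whose typical value is zero. This is why the additive form (Eq. 2) was used for parameters such as baseline disability.

```python
def multiplicative_covariate(theta_parameter, theta_covariate, covariate, covariate_median):
    """Eq. 1: theta_P * (1 + theta_C * (covariate - median))."""
    return theta_parameter * (1.0 + theta_covariate * (covariate - covariate_median))


def additive_covariate(theta_parameter, theta_covariate, covariate, covariate_median):
    """Eq. 2: theta_P + theta_C * (covariate - median); suited to typical values near zero."""
    return theta_parameter + theta_covariate * (covariate - covariate_median)


if __name__ == "__main__":
    # Hypothetical covariate slope of 0.02 per unit; the individual is 10 units above the median.
    print(multiplicative_covariate(5.0, 0.02, 48.0, 38.0))  # typical value scaled by (1 + 0.02 * 10)
    print(multiplicative_covariate(0.0, 0.02, 48.0, 38.0))  # a zero typical value cannot be shifted
    print(additive_covariate(0.0, 0.02, 48.0, 38.0))        # additive form still shifts it by 0.02 * 10
```

Centering on the median in both forms means that an individual with a median covariate value keeps the typical parameter value exactly.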
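The Power Calculations section calibrates the ΔOFV cutoff on simulations without a drug effect before counting significant results in simulations with one. The sketch below shows only that bookkeeping. The model refits of a real SSE run are replaced by hypothetical ΔOFV draws (χ²-like with 1 degree of freedom under no effect, shifted when an effect is present), so the numbers it prints are illustrative and are not results from the study. The function names are our own.

```python
import math
import random


def empirical_cutoff(null_delta_ofv, alpha=0.05):
    """(1 - alpha) empirical quantile of the dOFV values from no-drug-effect simulations."""
    ordered = sorted(null_delta_ofv)
    index = math.ceil((1.0 - alpha) * len(ordered)) - 1
    return ordered[min(index, len(ordered) - 1)]


def empirical_power(alt_delta_ofv, cutoff):
    """Proportion of simulated trials whose dOFV exceeds the calibrated cutoff."""
    return sum(d > cutoff for d in alt_delta_ofv) / len(alt_delta_ofv)


if __name__ == "__main__":
    rng = random.Random(11)
    n_trials = 1000
    # Stand-ins for refitted models: under H0 a 1-df dOFV behaves roughly like
    # chi-square(1) = Z**2; with a drug effect it is shifted (noncentral).
    null_fits = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n_trials)]
    drug_fits = [(rng.gauss(0.0, 1.0) + 2.5) ** 2 for _ in range(n_trials)]
    cutoff = empirical_cutoff(null_fits)
    print(f"empirical dOFV cutoff: {cutoff:.2f} (nominal chi-square(1) value: 3.84)")
    print(f"estimated power: {empirical_power(drug_fits, cutoff):.2f}")
```

Repeating this at several sample sizes and interpolating where the power crosses 0.8 gives the kind of power curve described in the Methods; the ANCOVA arms would instead count simulated trials with p < 0.05.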
Item Characteristic Curve (ICC) Parameter Estimates in the (a) Unidimensional and (b) Bidimensional Item Response Theory (IRT) models ab Unidimensional model Bidimensional model Parameter Estimate Relative standard error (%) Estimate Relative standard error (%) IRT ICC parameters a 1.38 7.0 1.6 7.6 b − 4.09 5.9 − 3.4 7.2 1,1 b 1.82 7.4 1.56 8.1 1,2 b 1.68 6.7 1.44 7.4 1,3 b 1.41 6.8 1.2 7.6 1,4 b 1.27 8.0 1.09 8.5 1,5 a 0.98 7.0 1.4 8.5 b − 5.39 6.0 − 4.83 7.4 2,1 b 2.64 7.5 2.24 8.3 2,2 b 2.04 6.7 1.8 7.8 2,3 b 1.49 7.1 1.3 8.2 2,4 b 1.55 7.8 1.3 8.2 2,5 a 1.29 7.7 1.68 8.2 b − 3.77 6.0 − 3.03 7.4 3,1 b 1.8 7.4 1.48 8.0 3,2 b 1.6 7.1 1.32 7.7 3,3 b 1.08 7.5 0.88 8.0 3,4 b 1.34 8.1 1.1 8.4 3,5 a 0.92 6.7 1.16 8.0 b − 3.86 5.6 − 3.65 7.3 4,1 b 2.09 6.8 1.88 8.1 4,2 b 1.68 6.6 1.55 7.7 4,3 b 1.22 7.2 1.12 8.0 4,4 b 1.42 7.7 1.27 8.7 4,5 a 1.09 7.2 1.36 7.7 b − 5.11 6.3 − 4.16 7.3 5,1 b 2.31 7.8 1.9 8.3 5,2 b 1.69 7.0 1.4 7.7 5,3 b 1.32 7.1 1.09 7.7 5,4 b 1.12 7.5 0.93 8.1 5,5 a 0.95 7.8 1.25 8.2 b − 3.1 6.1 − 2.46 7.5 6,1 b 1.72 7.7 1.38 8.2 6,2 b 1.68 7.5 1.35 8.1 6,3 b 1.67 9.8 1.34 8.3 6,4 b 1.67 8.4 1.34 10.1 6,5 a 0.49 8.4 0.601 8.5 b − 7.89 7.5 − 6.93 7.7 7,1 b 5.19 8.7 4.4 8.5 7,2 b 3.52 8.1 3.04 8.2 7,3 b 2.44 8.9 2.09 8.9 7,4 b 2.1 10.5 1.77 10.2 7,5 Post-baseline shift parameters Mean latent variable dimension 1 − 1.38 6.1 − 1.07 8.8 Variance latent variable dimension 1 2.22 6.4 1.61 7.3 Mean latent variable dimension 2 - - − 1.40 8.5 Variance latent variable dimension 2 – 2.4 7.4 Correlation between dimensions - - 69.1 3.6 a is the discrimination parameter for item i; b is the difficulty parameter for item i and category k. In the bidimensional model, dimension 1 i i,k (voiding) consists of items 1, 3, 5, and 6 while dimension 2 (storage) includes items 2, 4, and 7. 
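The structural model above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' NONMEM code: the function names are hypothetical, the drug offset is added as a constant whenever the caller flags active treatment, and Drift defaults to zero because a typical Drift value for the total IPSS model is not reported in this excerpt. The illustrative numbers (Baseline 19.6, Pmax −4.12, Tprog 15.3, Drug −1.98) are the typical total IPSS estimates from Table IV; time is in whatever unit Tprog was estimated in, which the excerpt does not state.

```python
import math


def placebo_exponential(time, pmax, tprog, drift=0.0):
    """Placebo = Pmax * (1 - exp(-ln(2) / Tprog * Time)) + Drift * Time."""
    return pmax * (1.0 - math.exp(-math.log(2.0) / tprog * time)) + drift * time


def placebo_weibull(time, pmax, tprog, wei, drift=0.0):
    """Placebo = Pmax * (1 - exp(-(ln(2) / Tprog * Time) ** WEI)) + Drift * Time."""
    return pmax * (1.0 - math.exp(-((math.log(2.0) / tprog * time) ** wei))) + drift * time


def predicted_response(time, baseline, pmax, tprog, drug, on_drug, drift=0.0, wei=None):
    """IPSS or latent disability = Baseline + Placebo + Drug (offset only while on drug)."""
    if wei is None:
        placebo = placebo_exponential(time, pmax, tprog, drift)
    else:
        placebo = placebo_weibull(time, pmax, tprog, wei, drift)
    drug_offset = drug if on_drug else 0.0
    return baseline + placebo + drug_offset


if __name__ == "__main__":
    # Typical total IPSS estimates (Table IV); Drift is assumed to be 0 here.
    for t in (0.0, 15.3, 60.0):
        placebo_arm = predicted_response(t, 19.6, -4.12, 15.3, -1.98, on_drug=False)
        active_arm = predicted_response(t, 19.6, -4.12, 15.3, -1.98, on_drug=t > 0)
        print(f"t={t:5.1f}  placebo arm={placebo_arm:6.2f}  degarelix arm={active_arm:6.2f}")
```

With WEI = 1 the Weibull form reduces to the exponential one. For WEI ≠ 1 (e.g., the common estimate of 1.53 in Table V), Tprog is no longer exactly the time at which half of Pmax is reached, since at Time = Tprog the placebo fraction is 1 − exp(−ln(2)^WEI) rather than 0.5.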
Table II notes (continued): At baseline, the latent variable(s) was fixed to N(0, 1), while the mean and variance of the latent variable(s) were estimated for post-baseline data (IDVIS approach). Item #1: "Incomplete Emptying"; Item #2: "Frequency"; Item #3: "Intermittency"; Item #4: "Urgency"; Item #5: "Weak Stream"; Item #6: "Straining"; Item #7: "Nocturia"

Fig. 2. Item characteristic curves for each International Prostate Symptom Score item in the unidimensional item response theory model

Final longitudinal model parameter estimates for the total IPSS and unidimensional IRT model, along with their precision, are shown in Table IV. The lowest OFV and best goodness of fit were achieved by specifying log-normally distributed inter-individual variability (IIV) for Baseline_IPSS and Tprog_IPSS and normally distributed IIV for Pmax_IPSS and Drift_IPSS. In longitudinal latent disability modeling, log-normal IIV was specified for Tprog_Disability, while normal distributions were specified for Baseline_Disability, Pmax_Disability, and Drift_Disability. The typical value of Drift_Disability was fixed to zero, and no significant changes in OFV were observed by doing so. The addition of IIV on Drug was not feasible in either longitudinal IPSS or latent disability modeling, as it yielded no significant OFV decrease and a variance close to zero, indicating that placebo and drug effect variability could not be distinguished in the current data. Incorporation of the offset drug effect into the total IPSS model, unidimensional IRT model, and bidimensional IRT model gave an OFV reduction of 22.1 (df = 1), 20.3 (df = 1), and 42.5 (df = 2), respectively, compared with the respective models without an estimated drug effect. No dose-response or exposure-response relationship, using AUC0–∞ as the exposure metric, was observed on the IPSS or the latent disability scale.

In the longitudinal total IPSS and unidimensional IRT models, covariates were tested on the Baseline, Pmax, and Drug parameters. Significant covariates (p < 0.001) on Baseline in both models consisted of the baseline BII score, baseline QoL score, and study region, while baseline QoL score was included on Pmax_IPSS (Table IV). Due to the long runtime of the longitudinal full ICC model, covariates were identified using the longitudinal PSI-IPPSE approach and were subsequently incorporated into the full longitudinal ICC model. Re-estimation of the longitudinal parameters in the latter yielded an OFV decrease of approximately 130 points, and substantially better fit was observed in the VPCs of the item-level and summary-level IPSS (data not shown). Simultaneous re-estimation of ICCs and longitudinal parameters (estimates shown in Supplemental Table S2) yielded an OFV decrease of 11 points compared with the fixed-ICC longitudinal unidimensional IRT model. This was deemed insignificant, and hence, the longitudinal unidimensional IRT model with fixed ICCs and estimated longitudinal parameters was kept as the final model. In the latter, covariate relationships found to be significant using the PSI-IPPSE method underwent an additional backward elimination step (p < 0.001) to confirm their significance. All covariates remained statistically significant in the full model. Lastly, Box-Cox transformation of the Baseline and Drift IIV distributions in both models resulted in significant drops in OFV. However, in longitudinal unidimensional IRT modeling, the Drift Box-Cox shape parameter had a high relative standard error (> 400%) and was therefore ultimately not included as part of the final model.

During longitudinal bidimensional IRT modeling, high correlation (≥ 96%) was observed between the Tprog IIV and Pmax IIV components for each dimension, which affected model stability. These IIV parameters were hence collapsed into a single common parameter across the two dimensions. For reasons of model stability, the typical value of the Weibull exponent was also estimated to be the same in both dimensions. As for the unidimensional IRT model, longitudinal parameters were re-estimated in the final longitudinal bidimensional IRT model. The final model minimized successfully, and its parameter estimates are shown in Table V. It was not possible to obtain parameter precision estimates, include covariates, or simultaneously estimate ICCs and longitudinal parameters due to convergence and stability issues. The final bidimensional longitudinal IRT model adequately described both summary- and item-level data (Supplemental Figs. S12 and S13, respectively).

Fig. 3. The International Prostate Symptom Score (IPSS) item characteristic curve fits in the unidimensional item response theory model for the cumulative probabilities (red lines) along with cross-validated cubic spline generalized additive model (GAM) smooth (green area) and η sampling-based cross-validated cubic spline GAM smooth using 200 samples (blue area)

Fig. 4. a Observed International Prostate Symptom Scores (IPSS) vs. item response theory disability estimates from the unidimensional item response theory model based on 3117 separate measurements from 403 patients over the 6-month trial period. b Observed change from baseline in International Prostate Symptom Scores (IPSS) vs. change from baseline of item response theory disability from the unidimensional item response theory model in 403 patients over the 6-month trial period. MDD, minimally detectable difference

Table III.
Fisher Information Content Ranking of International Prostate Symptom Score (IPSS) Items Based on the Unidimensional Item Response Theory Model

IPSS item | Item subscore category | % of total Fisher information | Cumulative % of total
Q1: Incomplete Emptying | Voiding | 23.8 | 23.8
Q3: Intermittency | Voiding | 20.8 | 44.6
Q5: Weak Stream | Voiding | 15.4 | 60
Q2: Frequency | Storage | 13.1 | 73.1
Q6: Straining | Voiding | 11.8 | 84.9
Q4: Urgency | Storage | 11.6 | 96.5
Q7: Nocturia | Storage | 3.4 | 99.9

Power of Testing and Model-Based Methods

The bidimensional IRT model was used as the simulation model in the SSE procedure, as it provided a lower AIC value (59,086.3) than the unidimensional IRT model (AIC value of 61,622.6). The resulting power curves are shown in Fig. 5. The pharmacometric models all provided considerably higher power to detect a drug effect compared with the cross-sectional ANCOVA as well as the WOT ANCOVA. The unidimensional IRT model yielded slightly higher power (approximately N = 113 to reach 80% power) than the total IPSS model (approximately N = 120 to reach 80% power). An additional SSE procedure, using the unidimensional IRT model as the simulation model, confirmed this finding (data not shown). The bidimensional IRT model provided the highest power to detect a drug effect, requiring a total trial sample of approximately N = 106 to reach 80% power, compared with the total IPSS and unidimensional IRT models. The type I error of each model under each sample size and empirically derived OFV cut-off in the SSE procedure is presented in Supplemental Table S3. Only model runs that minimized successfully were used in the calculation of power (on average ~80% of full-reduced bidimensional model pairs and ~90% of unidimensional and total IPSS model pairs, respectively).

Fig. 5. Power curves for the pharmacometric models obtained using a type I error corrected stochastic simulation and estimation procedure.

Table IV. Longitudinal model parameter estimates. IPSS: summary International Prostate Symptom Score; IRT: item response theory. Relative standard errors were obtained in NONMEM

Parameter | IPSS model: value | RSE (%) | Unidimensional IRT model: value | RSE (%)
Baseline | 19.6 | 1.7 | 0.0283 | 146.3
Pmax (maximal placebo response) | −4.12 | 9.9 | −1.03 | 10.9
Tprog (placebo half-life) | 15.3 | 18.8 | 12.3 | 20.5
Drug effect | −1.98 | 19.2 | −0.542 | 20.3
Baseline Box-Cox shape | 1.87 | 41.7 | 0.373 | 25.4
Drift Box-Cox shape | 39.3 | 47.6 | – | –
Covariates
Baseline QoL on Pmax | 0.208 | 13.2 | – | –
Baseline BII on Baseline | 0.0211 | 19.6 | 0.121 | 17.9
Baseline QoL on Baseline | 0.0873 | 12.7 | 0.325 | 17.4
Region on Baseline | −0.0803 | 26 | −0.338 | 24.1
Interindividual variability (IIV)
IIV Baseline | 13.7% | 8.3 | 75.9% | 7.7
IIV Pmax | 121.7% | 15.4 | 128.5% | 15.4
IIV Drift | 1.8% | 19.4 | 0.7% | 8.8
IIV Tprog | 90.6% | 12 | 52.4% | 9.9
IIV Baseline-Pmax correlation | – | – | 1.7% | –
IIV Baseline-Drift correlation | – | – | 9.2% | –
IIV Pmax-Drift correlation | 43.1% | – | 34% | –
Residual error
Proportional residual error | 10.9% | 8.9 | – | –
Additive residual error | 189.2% | 6.7 | – | –

Table V. Parameter estimates for the longitudinal bidimensional item response theory model

Parameter | Value
Baseline_V (voiding scale) | −0.0251
Baseline_S (storage scale) | −0.0667
Pmax (maximal placebo response, voiding scale) | −0.75
Pmax (maximal placebo response, storage scale) | −0.845
Tprog (placebo half-life, voiding scale) | 12.9
Tprog (placebo half-life, storage scale) | 13.4
Weibull shape parameter (common for both scales) | 1.53
Drug effect, voiding scale | −0.488
Drug effect, storage scale | −0.749
Interindividual variability (IIV)
IIV Baseline_V (voiding scale) | 97.3%
IIV Baseline_S (storage scale) | 128.8%
IIV Baseline_V–Baseline_S correlation | 26%
IIV Pmax (common for both scales) | 145.6%
IIV Tprog (common for both scales) | 61.1%
IIV Drift (common for both scales) | 0.6%
IIV Pmax-Drift correlation | 40%

DISCUSSION

Item Response Theory Analysis

The current paper presents the first reported IRT analyses of the IPSS and the first longitudinal pharmacometric IRT model within BPH-LUTS. Both a unidimensional and a bidimensional IPSS IRT model were developed based on factor analyses, the latter further confirming previous findings (41,42).

In the unidimensional IRT model, the vast majority of the total information content was contained in IPSS voiding items, and this finding is supported by a principal component analysis showing total IPSS being predicted by improvement in voiding symptoms rather than storage symptoms (43). Subscore analysis, i.e., distinguishing treatment effects on the IPSS voiding and storage subscores in addition to the total IPSS, is routinely performed as a secondary statistical analysis of clinical trials within BPH-LUTS, although its clinical meaningfulness has not been established (42,44,45). The current results suggest that the IPSS voiding subscore is more sensitive in assessing a patient's BPH-LUTS in comparison with the storage subscore and may therefore also be better suited for detecting symptomatic drug effects. It should be noted, however, that the most favorable signal-to-noise ratio will be obtained by using all available data and acknowledging the information contribution of individual items, as opposed to considering the composite (sub)score(s), as exemplified by pharmacometric IRT in Parkinson's disease (15).
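The item-level information argument can be made concrete with a small sketch of Samejima's graded response model (refs. 19 and 24), in which an item's Fisher information at disability θ is I(θ) = Σ_k (dP_k/dθ)² / P_k over its response categories k. Several assumptions are made here and should be read loudly: the cumulative probabilities are taken as P(Y ≥ k | θ) = 1 / (1 + exp(−a_i(θ − threshold_k))); b_{i,1} is treated as the lowest threshold and b_{i,2} to b_{i,5} as positive increments (read from the sign pattern in Table II, not stated in this excerpt); and information is averaged over θ ~ N(0, 1), the baseline disability distribution. The paper's exact aggregation method is not described here, so the printed percentages are not claimed to reproduce Table III. All names (UNIDIMENSIONAL_ICC, grm_item_information, information_shares) are hypothetical.

```python
import math

# Unidimensional ICC estimates from Table II: item -> (a_i, [b_i1, ..., b_i5]).
# ASSUMPTION: b_i1 is the lowest threshold and b_i2..b_i5 are positive increments
# that accumulate into the higher thresholds (inferred from the sign pattern).
UNIDIMENSIONAL_ICC = {
    1: (1.38, [-4.09, 1.82, 1.68, 1.41, 1.27]),  # incomplete emptying
    2: (0.98, [-5.39, 2.64, 2.04, 1.49, 1.55]),  # frequency
    3: (1.29, [-3.77, 1.80, 1.60, 1.08, 1.34]),  # intermittency
    4: (0.92, [-3.86, 2.09, 1.68, 1.22, 1.42]),  # urgency
    5: (1.09, [-5.11, 2.31, 1.69, 1.32, 1.12]),  # weak stream
    6: (0.95, [-3.10, 1.72, 1.68, 1.67, 1.67]),  # straining
    7: (0.49, [-7.89, 5.19, 3.52, 2.44, 2.10]),  # nocturia
}
VOIDING_ITEMS = {1, 3, 5, 6}


def logistic(x):
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def cumulative_thresholds(b):
    """Convert [first threshold, increment, increment, ...] to absolute thresholds."""
    thresholds = [b[0]]
    for step in b[1:]:
        thresholds.append(thresholds[-1] + step)
    return thresholds


def grm_item_information(theta, a, thresholds):
    """Fisher information of one graded-response item at latent disability theta."""
    # Cumulative probabilities P(Y >= k | theta), padded with 1 (k = 0) and 0 (k = K + 1).
    cum = [1.0] + [logistic(a * (theta - t)) for t in thresholds] + [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]  # P(Y = k | theta)
        # d/dtheta of P(Y = k | theta), using dP*/dtheta = a * P* * (1 - P*)
        dp_k = a * (cum[k] * (1.0 - cum[k]) - cum[k + 1] * (1.0 - cum[k + 1]))
        if p_k > 0.0:
            info += dp_k * dp_k / p_k
    return info


def information_shares(icc, lo=-6.0, hi=6.0, n=1200):
    """Share of total information per item, averaged over theta ~ N(0, 1) (assumed)."""
    step = (hi - lo) / n
    thresholds = {item: cumulative_thresholds(b) for item, (_, b) in icc.items()}
    totals = {item: 0.0 for item in icc}
    for j in range(n):
        theta = lo + (j + 0.5) * step  # midpoint rule on [lo, hi]
        weight = math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi) * step
        for item, (a, _) in icc.items():
            totals[item] += weight * grm_item_information(theta, a, thresholds[item])
    grand_total = sum(totals.values())
    return {item: value / grand_total for item, value in totals.items()}


if __name__ == "__main__":
    shares = information_shares(UNIDIMENSIONAL_ICC)
    for item, share in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"Q{item}: {100 * share:5.1f}% of total information")
    voiding = sum(shares[item] for item in VOIDING_ITEMS)
    print(f"Voiding items combined: {100 * voiding:.1f}%")
```

The sketch is mainly meant to show why a low discrimination such as a_7 = 0.49 for nocturia leaves little information anywhere on the disability range: a graded item's information can never exceed a_i²/3 under the logistic model, so items with larger a_i dominate the total regardless of where their thresholds fall.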
Fig. 5 (continued). One thousand simulated data sets from the bidimensional item response theory model at sample sizes of 33, 66, 99, and 137 patients were used for model estimation with the respective full (with a drug effect parameter) and reduced (without a drug effect parameter) models. Vertical lines indicate the 95% confidence interval for the calculated power estimates

The incomplete emptying item was found to be the most informative. This item has previously been found to be associated with worsening of both voiding and storage symptoms (46). Incomplete emptying had the highest discrimination parameter value (1.38) in the unidimensional IRT model; however, compared with other reported unidimensional IRT analyses in different therapeutic areas, this is relatively low (e.g., the highest discrimination parameter value was 3.35 in the ADAS-cog IRT analysis (9) and 3.5 in the EDSS IRT analysis (12)). This may indicate that BPH-LUTS is a diffuse and heterogeneous disease and, consequently, that IPSS items have difficulty discriminating between different levels of disability.

The nocturia item was found to be the least informative, and several reports in the literature support this. Firstly, the item may not be sufficiently specific to BPH-LUTS; the primary cause of adult nocturnal polyuria has been attributed to the age-related decline in nocturnal secretion of antidiuretic hormone (47,48), as opposed to being a direct consequence of BPH. The nocturia item was also the least specific in Japanese men with BPH, and a similar explanation was proposed (49). Secondly, nocturia may be unspecific to urologic conditions in general. Significant correlation between IPSS nocturia and items 5 and 6 describing nocturia in the 8-item overactive bladder questionnaire (OAB-8) has been established (50); an IRT analysis of the OAB-8 in both men and women showed the two items describing nocturia to have the relatively lowest discrimination parameter values (51) (ratio to the highest discrimination parameter estimate was 0.35, 0.40, and 0.42 for IPSS nocturia, OAB-8 item 5, and OAB-8 item 6, respectively). It should be emphasized that nocturia and urgency symptoms appear to be the most bothersome symptoms to patients suffering from LUTS (52,53). Lower information content does not entail that the corresponding symptom is not bothersome from a patient perspective; it indicates that the frequency of observed scores varies less across patients with highly different disease severity compared with other items. The item is therefore less sensitive in assessing the overall condition and less useful for distinguishing between patients. The bother of each BPH-LUTS symptom is expected to vary between patients, yet this is not captured by the IPSS; this diagnostic limitation (54) is addressed by other questionnaires, e.g., the Danish Prostate Symptom Score (55) and the International Continence Society Questionnaire Male LUTS questionnaire (56).

Based on comparison between IRT disability and total IPSS, the MDD of IPSS ≤ −3 for classifying patients as experiencing clinically significant improvement (36,37) and IPSS ≥ 4 for determining clinical progression of BPH-LUTS (37–40) is supported. However, since there is extensive overlap between changes in latent disability at the observed MDD and below it (decreases of less than three total IPSS points and, to a certain extent, increases in total IPSS), using only the change in total IPSS to evaluate response may overlook many patients who benefit from treatment. The same reasoning applies to patients who experience worsening of their symptoms.

Discussion regarding the developed sampling-based GAM smooth methodology for evaluating ICCs is presented in the Supplemental Discussion.

Longitudinal Modeling

In both the longitudinal total IPSS and IRT models, a model describing treatment as present or absent best described the treatment effect, although three different drug doses (10 mg, 20 mg, and 30 mg) were included in the analyzed trial. The lack of observed dose-response and exposure-response relationships may be explained by the narrow dose range studied in the current trial. Including at least four active doses spanning an at least 10-fold range has previously been emphasized to characterize dose-exposure-response adequately (57). In the current trial, the width of the dose range was restricted due to the expectation of an increase in the incidence of prolonged testosterone suppression at higher doses of degarelix. Further discussion regarding longitudinal modeling and covariate analysis results is presented in the Supplemental Discussion.

The longitudinal bidimensional IRT model allowed for estimation of a differential drug effect on voiding and storage IPSS symptoms, while preserving item-level information. This approach may be more in line with the different effects of therapy on the primary pathophysiologies behind voiding and storage symptoms (58,59). Limitations of the pharmacometric bidimensional model included the lack of longitudinal parameter precision estimates and the inability to include covariates. This can be attributed to the increased model complexity due to the presence of several latent variables, and other longitudinal pharmacometric multidimensional IRT models have reported similar issues (13,14). More advanced and computationally intensive methods for assessing parameter uncertainty (e.g., a non-parametric bootstrap) may be used to obtain parameter precision, but were beyond the scope of the current work. Item- and summary-level VPCs were therefore the primary basis for concluding adequate model fit and predictive performance. If longitudinal model stability and covariate identification are of primary interest, the longitudinal unidimensional IRT model may be a better-suited alternative. The unidimensional approach may also be advantageous for more straightforward translation between changes in the summary IPSS and IRT-estimated disability. From a psychometric standpoint, both the unidimensional and bidimensional IPSS IRT approaches are valid (41).

Power

The longitudinal model-based analyses showed considerably higher power to detect a drug effect compared with the cross-sectional ANCOVA using only data from the visit 3 months post-dose. The higher power of longitudinal pharmacometric modeling compared with cross-sectional testing is not a novel finding and has previously been reported in several other therapeutic areas (9–11), yet a comparison with a WOT estimand-based test has, to our knowledge, not been presented previously. These findings are discussed further in the Supplemental Discussion.

A modest increase in power to detect a drug effect was observed with the unidimensional IRT model compared with the total IPSS model, and this finding was unexpected given that other longitudinal IRT applications have shown greater increases in power compared with longitudinal summary score modeling (9,15). Studies have shown that the larger the number of items in a questionnaire, the higher the power of IRT (60,61), and this may explain the similar power of the summary IPSS model and the unidimensional IRT model in the current study, compared with analyses of questionnaires with a higher number of items. Furthermore, the heterogeneity in the item discrimination parameter values has been shown to affect the power of IRT compared with summary score modeling (62). For instance, for the 8-item Expanded Disability Status Scale (EDSS) in multiple sclerosis, pharmacometric IRT analysis showed a larger power increase over summary score modeling (63) than in the current study, which may be explained by the higher variability between discrimination parameter estimates of EDSS items (66% CV) compared with IPSS items (29% CV) (12). In the current work, the bidimensional pharmacometric IRT model was used for simulation of the data on which the power to detect a drug effect was estimated for the unidimensional IRT and total IPSS models, respectively. A sensitivity analysis specifying the unidimensional IRT model as the simulation model was performed and confirmed the currently reported power difference between the pharmacometric unidimensional IRT model and the total IPSS model (data not shown).

A higher power to detect a drug effect was observed with the longitudinal bidimensional IRT model compared with the unidimensional IRT model. This may be due to the differences in ICCs and disability scale of the multidimensional model compared with the unidimensional model, which, in turn, give a more precise discernment of the drug effect. Given a questionnaire where multidimensionality is substantiated, we hypothesize that the difference in power to detect a drug effect compared with a unidimensional IRT model may increase as the correlation between latent variables decreases, as this would gradually increase the difference in ICCs and disability scale. This is the first investigation of the impact of IRT dimensionality on the power to detect a drug effect, and it hence warrants further investigation. For example, the original application of pharmacometric IRT based on the ADAS-cog scale (9) investigated the power of a unidimensional IRT model; based on findings suggesting that the ADAS-cog is multidimensional (64), it may also be of interest to assess the power of a multidimensional pharmacometric ADAS-cog IRT model.

A limitation of the current as well as previous pharmacometric IRT studies (9,15,63) was that simulation model bias was present in the power calculations: the pharmacometric IRT model used for simulation of data was also used to estimate power and may therefore have favored the pharmacometric IRT approaches. Other approaches, such as developing longitudinal ordered categorical models for each item and simulating data from these, were considered. However, it is not clear whether the IPSS ICCs would be preserved or require re-estimation based on data simulated in this way, and whether meaningful comparison with previously reported reductions in sample size would be feasible.

The current findings may serve to more precisely assess patients' underlying BPH-LUTS by utilizing the available item-level IPSS responses instead of considering only the sum of these scores. Furthermore, they may inform more efficient clinical development of BPH-LUTS treatments, although the gain in power to detect a drug effect was found to be lower compared with previously reported applications with different scales describing different neurological conditions (9,15,63). IRT focuses on quantifying the information of questionnaires in specific patient populations; since the modeled data spanned the entire range of total IPSS (i.e., from the lowest to the highest possible disease severity), the presented results may be extended to the analysis of the IPSS in other clinical trials including similar patients with moderate-to-severe BPH-LUTS, regardless of treatment and its effect size. The current study emphasizes the importance of quantifying the increase in power to detect a drug effect with pharmacometric IRT modeling when applied to different measurement scales, as it may differ to a great extent depending on the internal characteristics of the latter. Knowledge regarding the size of the increase in the power to detect a drug effect may be essential in informing a drug developer's decision to implement the more complex IRT methodology. For completeness, it should be noted that pharmacometric modeling of longitudinal data is not the current standard for detecting drug effects in clinical trials. Further research regarding, e.g., its general alignment with traditional statistical analyses, the adequacy of its underlying assumptions, its type I error control, and its pre-specification (65–67), is needed before it may be regarded as the primary analysis method and thereby dictate the sample size of clinical trials.

The IRT methodology may be implemented in all clinical trials where composite scores are used to assess treatment efficacy, i.e., from proof-of-concept phase II to confirmatory phase III trials. However, the shift from using "observed total score" to "underlying disease" as the estimand summary measure (32) may represent a substantial paradigm shift and may therefore require framework developments supervised by regulators. An example could be the development of standardized item banks based on a large number of item-level patient responses from many trials. This would inform precise ICCs and thereby allow for precise and, most importantly, consistent estimation of latent disability across different clinical trials. The merit and practical utility of IRT in increasing the efficiency of clinical development programs appear to be already recognized within the US Food and Drug Administration (68).

CONCLUSION

Pharmacometric models were developed based on item-level and summary-level IPSS, respectively, to describe the time course of underlying disability and total IPSS in patients with moderate-to-severe BPH-LUTS in a clinical trial setting. IRT analysis revealed that the voiding IPSS items combined contained the majority of the information content, which may have implications for the analysis of IPSS subscores. The unidimensional IRT model showed slightly higher power to detect a drug effect compared with the composite score model, while the bidimensional IRT model further increased the power. Taking the multidimensional nature of the IPSS into account in a pharmacometric IRT framework may hence allow for more precise quantification of drug effects and optimization of statistical power.

ACKNOWLEDGMENTS

The authors would like to thank Sebastian Ueckert and Leticia Arrington for their valuable input during the research. This work was funded jointly by the Danish Innovation Fund (grant number 5189-00064b), Ferring Pharmaceuticals A/S, and the Swedish Research Council Grant 2018-03317.

AUTHOR CONTRIBUTIONS

Y.K.L. wrote the manuscript and analyzed the data. Y.K.L., D.M.J., T.M.L., A.C.H., and M.O.K. designed the research. D.M.J., T.M.L., A.C.H., and M.O.K. reviewed the manuscript.

FUNDING INFORMATION

Open access funding provided by Uppsala University.

COMPLIANCE WITH ETHICAL STANDARDS

Conflict of Interest Y.K.L. and D.M.J. are employees of Ferring Pharmaceuticals A/S. All other authors declare that they have no conflicts of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

REFERENCES

1. Berry SJ, Coffey DS, Walsh PC, Ewing LL. The development of human benign prostatic hyperplasia with age. J Urol. 1984;132(3):474–9.
2. Medina JJ, Parra RO, Moore RG. Benign prostatic hyperplasia (the aging prostate). Med Clin North Am. 1999;83(5):1213–29.
3. Parsons JK, Mougey J, Lambert L, Wilt TJ, Fink HA, Garzotto M, et al. Lower urinary tract symptoms increase the risk of falls in older men. BJU Int. 2009;104(1):63–8.
4. Calais Da Silva F, Marquis P, Deschaseaux P, Gineste JL, Cauquil J, Patrick DL. Relative importance of sexuality and quality of life in patients with prostatic symptoms. Results of an international study. Eur Urol. 1997;31(3):272–80.
5. Taylor BC, Wilt TJ, Fink HA, Lambert LC, Marshall LM, Hoffman AR, et al. Prevalence, severity, and health correlates of lower urinary tract symptoms among older men: the MrOS study. Urology. 2006;68(4):804–9.
6. Jacobsen SJ, Jacobson DJ, Girman CJ, Roberts RO, Rhodes T, Guess HA, et al. Natural history of prostatism: risk factors for acute urinary retention. J Urol. 1997;158(2):481–7.
7. Barry MJ, Fowler FJ, O'Leary MP, Bruskewitz RC, Holtgrewe HL, Mebust WK, et al. The American Urological Association symptom index for benign prostatic hyperplasia. The Measurement Committee of the American Urological Association. J Urol. 1992;148(5):1549–57; discussion 1564.
8. Griffith JW. Self-report measurement of lower urinary tract symptoms: a commentary on the literature since 2011. Curr Urol Rep. 2012;13(6):420–6.
9. Ueckert S, Plan EL, Ito K, Karlsson MO, Corrigan B, Hooker AC. Improved utilization of ADAS-cog assessment data through item response theory based pharmacometric modeling. Pharm Res. 2014;31(8):2152–65.
10. Karlsson KE, Vong C, Bergstrand M, Jonsson EN, Karlsson MO. Comparisons of analysis methods for proof-of-concept trials. CPT Pharmacomet Syst Pharmacol. 2013;2(1):e23.
11. Nelander K, Hamrénn B, Johansson S, Åstrand M. Longitudinal dose-response modelling as primary analysis of a clinical study. PAGE 2016; III-33.
12. Novakovic AM, Krekels EHJ, Munafo A, Ueckert S, Karlsson MO. Application of item response theory to modeling of expanded disability status scale in multiple sclerosis. AAPS J. 2017;19(1):172–9.
13. Krekels E, Novakovic AM, Vermeulen AM, Friberg LE, Karlsson MO. Item response theory to quantify longitudinal placebo and paliperidone effects on PANSS scores in schizophrenia. CPT Pharmacomet Syst Pharmacol. 2017;6(8):543–51.
14. Gottipati G, Karlsson MO, Plan EL. Modeling a composite score in Parkinson's disease using item response theory. AAPS J. 2017;19(3):837–45.
15. Buatois S, Retout S, Frey N, Ueckert S. Item response theory as an efficient tool to describe a heterogeneous clinical rating scale in de novo idiopathic Parkinson's disease patients. Pharm Res. 2017;34(10):2109–18.
16. Baker FB. The basics of item response theory. Second Edition [Internet]. For full text: http://ericae; 2001 [cited 2019 May 23]. Available from: https://eric.ed.gov/?id=ED458219
17. DeMars C. Item response theory. Oxford, New York: Oxford University Press; 2010. 144 p. (Understanding Statistics).
18. D'Agate S. Development of a drug-disease model describing individual IPSS trajectories in BPH patients: implication of disease progression and covariate factors on long term treatment response. PAGE 2018; III-77.
19. Samejima F. Estimation of latent ability using a response pattern of graded scores. Psychometrika. 1969;34(1):1–97.
20. Schindler E, Friberg LE, Lum BL, Wang B, Quartino A, Li C, et al. A pharmacometric analysis of patient-reported outcomes in breast cancer patients through item response theory. Pharm Res. 2018;35(6):122.
21. Ueckert S. Modeling composite assessment data using item response theory. CPT Pharmacomet Syst Pharmacol. 2018;7(4):205–18.
22. Thurstone LL. Multiple factor analysis. Psychol Rev. 1931;38(5):406–27.
23. De Ayala RJ, Hertzog MA. The assessment of dimensionality for use in item response theory. Multivar Behav Res. 1991;26(4):765–92.
24. Samejima F. Graded response model. In: van der Linden WJ, Hambleton RK, eds. Handbook of modern item response theory. New York: Springer; 1997:85–100.
25. Kaiser HF. The varimax criterion for analytic rotation in factor analysis. Psychometrika. 1958;23(3):187–200.
26. Lacroix BD, Friberg LE, Karlsson MO. Evaluation of IPPSE, an alternative method for sequential population PKPD analysis. J Pharmacokinet Pharmacodyn. 2012;39(2):177–93.
27. Pilla Reddy V, Kozielska M, Johnson M, Vermeulen A, de Greef R, Liu J, et al. Structural models describing placebo treatment effects in schizophrenia and other neuropsychiatric disorders. Clin Pharmacokinet. 2011;50(7):429–50.
28. Tornøe CW, Agersø H, Nielsen HA, Madsen H, Jonsson EN. Population pharmacokinetic modeling of a subcutaneous depot for GnRH antagonist degarelix. Pharm Res. 2004;21(4):574–84.
29. Savic RM, Karlsson MO. Importance of shrinkage in empirical Bayes estimates for diagnostics: problems and solutions. AAPS J. 2009;11(3):558–69.
30. Vong C, Bergstrand M, Nyberg J, Karlsson MO. Rapid sample size calculations for a defined likelihood ratio test-based power in mixed-effects models. AAPS J. 2012;14(2):176–86.
31. Wählby U, Bouw MR, Jonsson EN, Karlsson MO. Assessment of type I error rates for the statistical sub-model in NONMEM. J Pharmacokinet Pharmacodyn. 2002;29(3):251–69.
32. International Conference on Harmonisation E9(R1) addendum: statistical principles for clinical trials - estimands and sensitivity analysis in clinical trials <https://www.ema.europa.eu/en/documents/scientific-guideline/ich-e9-r1-addendum-estimands-sensitivity-analysis-clinical-trials-guideline-statistical-principles_en.pdf> (2020). Accessed March 11, 2020.
33. Beal SL, Sheiner LB, Boeckmann A. NONMEM user's guides. Ellicott City; 2009.
34. Chalmers RP. mirt: a multidimensional item response theory package for the R environment. J Stat Softw. 2012;48(1):1–29.
35. Keizer RJ, Zandvliet AS, Beijnen JH, Schellens JHM, Huitema ADR. Performance of methods for handling missing categorical covariate data in population pharmacokinetic analyses. AAPS J. 2012;14(3):601–11.
36. Barry MJ, Williford WO, Chang Y, Machi M, Jones KM, Walker-Corkery E, et al. Benign prostatic hyperplasia specific health status measures in clinical research: how much change in the American Urological Association symptom index and the benign prostatic hyperplasia impact index is perceptible to patients? J Urol. 1995;154(5):1770–4.
37. Barry MJ, Cantor A, Roehrborn CG, CAMUS Study Group. Relationships among participant international prostate symptom score, benign prostatic hyperplasia impact index changes and global ratings of change in a trial of phytotherapy in men with lower urinary tract symptoms. J Urol. 2013;189(3):987–92.
38. McConnell JD, Roehrborn CG, Bautista OM, Andriole GL, Dixon CM, Kusek JW, et al. The long-term effect of doxazosin, finasteride, and combination therapy on the clinical progression of benign prostatic hyperplasia. N Engl J Med. 2003;349(25):2387–98.
39. Roehrborn CG, Siami P, Barkin J, Damião R, Major-Walker K, Nandy I, et al. The effects of combination therapy with dutasteride and tamsulosin on clinical outcomes in men with symptomatic benign prostatic hyperplasia: 4-year results from the CombAT study. Eur Urol. 2010;57(1):123–31.
40. Tacklind J, Fink HA, Macdonald R, Rutks I, Wilt TJ. Finasteride for benign prostatic hyperplasia. Cochrane Database Syst Rev. 2010;10:CD006015.
41. Welch G, Kawachi I, Barry MJ, Giovannucci E, Colditz GA, Willett WC. Distinction between symptoms of voiding and filling in benign prostatic hyperplasia: findings from the health
53. Everaert K, Anderson P, Wood R, Andersson FL, Holm-Larsen T. Nocturia is more bothersome than daytime LUTS: results from an observational, real-life practice database including 8659 European and American LUTS patients. Int J Clin Pract. 2018;72(6):e13091.
54. Gratzke C, Bachmann A, Descazeaud A, Drake MJ, Madersbacher S, Mamoulakis C, et al. EAU guidelines on the assessment of non-neurogenic male lower urinary tract symptoms including benign prostatic obstruction. Eur Urol. 2015;67(6):1099–109.
55. Schou J, Poulsen AL, Nordling J. The value of a new symptom score (DAN-PSS) in diagnosing uro-dynamic infravesical obstruction in BPH. Scand J Urol Nephrol. 1993;27(4):489–92.
56. Donovan JL, Peters TJ, Abrams P, Brookes ST, de la Rosette JJ, Schäfer W. Scoring the short form ICSmaleSF questionnaire. International Continence Society. J Urol. 2000;164(6):1948–55.
57. European Medicines Agency. Report from dose finding workshop <https://www.ema.europa.eu/en/documents/report/report-european-medicines-agency/european-federation-pharmaceutical-industries-associations-workshop-importance-dose-finding-dose_en.pdf> (2015). Accessed May 31st, 2020.
58. Caine M. The present role of alpha-adrenergic blockers in the treatment of benign prostatic hypertrophy. J Urol. 1986;136(1):1–4.
59. Andersson K-E. Storage and voiding symptoms: pathophysiologic aspects. Urology. 2003;62(5):3–10.
60. Holman R, Glas CAW, de Haan RJ. Power analysis in randomized clinical trials based on item response theory.
professionals follow-up study. Urology. 1998;51(3):422–7. Control Clin Trials. 2003;24(4):390–410. 42. Barry MJ, Williford WO, Fowler FJ, Jones KM, Lepor H. 61. Doostfatemeh M, Taghi Ayatollah SM, Jafari P. Power and Filling and voiding symptoms in the American Urological sample size calculations in clinical trials with patient-reported Association symptom index: the value of their distinction in a outcomes under equal and unequal group sizes based on graded Veterans Affairs randomized trial of medical therapy in men response model: a simulation study. Value Health. with a clinical diagnosis of benign prostatic hyperplasia. J Urol. 2016;19(5):639–47. 2000;164(5):1559–64. 62. Schindler E, Friberg LE, Karlsson MO. PAGE 2015 II-01 43. Yokoyama O, Ozeki A, Suzuki N, Murakami M. Early improvement Comparison of item response theory and classical test theory of storage or voiding symptoms by tadalafil predicts treatment for power/sample size for questionnaire data with various outcomes in patients with lower urinary tract symptoms from benign degrees of variability in items’ discrimination parameters. 2015. prostatic hyperplasia. Int J Urol. 2018;25(3):240–5. 63. Novakovic AM. Longitudinal models for quantifying disease 44. US Food and Drug Administration. Guidance for the non- and therapeutic response in multiple sclerosis. Uppsala: Acta clinical and clinical investigation of devices used for the Universitatis Upsaliensis; 2017. treatment of benign prostatic hyperplasia (BPH) (2010). 64. Verma N, Markey MK. Item response analysis of Alzheimer’s <https://www.fda.gov/regulatory-information/search-fda-guid- disease assessment scale. Conf Proc Annu Int Conf IEEE Eng ance-documents/guidance-non-clinical-and-clinical-investiga- Med Biol Soc IEEE Eng Med Biol Soc Annu Conf. tion-devices-used-treatment-benign-prostatic-hyperplasia> 2014;2014:2476–9. Accessed March 20, 2020. 65. Bieth B. et al Population Approach Group Europe (PAGE) 45. 
Montorsi F, Henkel T, Geboers A, Mirone V, Arrosagaray P, Model-based analyses for pivotal decisions, with an application Morrill B, et al. Effect of dutasteride, tamsulosin and the to equivalence testing for biosimilars Abstr 2343 (2012). combination on patient-reported quality of life and treatment 66. Musuamba F, Manolis E, Holford N, Cheung S, Friberg L, satisfaction in men with moderate-to-severe benign prostatic Ogungbenro K, et al. Advanced methods for dose and regimen hyperplasia: 4-year data from the CombAT study. Int J Clin finding during drug development: summary of the EMA/EFPIA Pract. 2010;64(8):1042–51. workshop on dose finding (London 4–5 December 2014). CPT 46. Lee JY, Lee DH, Lee H, Bang WJ, Hah YS, Cho KS. Clinical Pharmacomet Syst Pharmacol. 2017 Jul;6(7):418–29. implications of a feeling of incomplete emptying with little post- 67. Marshall S, Madabushi R, Manolis E, Krudys K, Staab A, void residue in men with lower urinary tract symptoms. Dykstra K, et al. Model-informed drug discovery and develop- Neurourol Urodyn. 2014;33(7):1123–7. ment: current industry good practice and regulatory expecta- 47. Asplund R. The nocturnal polyuria syndrome (NPS). Gen tions and future perspectives. CPT Pharmacomet Syst Pharmacol. 1995 Oct;26(6):1203–9. Pharmacol. 2019;8(2):87–96. 48. Miller M. Nocturnal polyuria in older people: pathophysiology 68. Younis, I. Clinical trial database analyses to inform regulatory and clinical implications. J Am Geriatr Soc. 2000;48(10):1321–9. guidances: improving the efficiency of schizophrenia clinical 49. Homma Y, Yamaguchi T, Kondo Y, Horie S, Takahashi S, trials. The International Society for CNS Clinical trials and Kitamura T. Significance of nocturia in the international methodology (ISCTM) 14th Annual Scientific Meeting <https:// prostate symptom score for benign prostatic hyperplasia. J Urol. isctm.org/public_access/Feb2018/Presentations/S2-Younis.pdf> 2002;167(1):172–6. (2018) Accessed July 15th, 2020. 50. 
Trafford Crump R, Sehgal A, Wright I, Carlson K, Baverstock R. From prostate health to overactive bladder: developing a crosswalk for the IPSS to OAB-V8. Urology. 2019;125:73–8. Publisher’s Note Springer Nature remains neutral with regard 51. Peterson AC, Sehgal A, Crump RT, Baverstock R, Sutherland JM, Carlson K. Evaluating the 8-item overactive bladder to jurisdictional claims in published maps and institutional questionnaire (OAB-v8) using item response theory. Neurourol affiliations. Urodyn. 2018;37(3):1095–100. 52. Agarwal A, Eryuzlu LN, Cartwright R, Thorlund K, Tammela TLJ, Guyatt GH, et al. What is the most bothersome lower urinary tract symptom? Individual- and population-level perspectives for both men and women. Eur Urol. 2014;65(6):1211–7. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png "The AAPS Journal" Springer Journals
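The Methods describe a graded response model for the IPSS items and rank items by their Fisher information. The short Python sketch below shows what those two calculations involve. It is a hedged illustration, not the authors' NONMEM implementation. The five thresholds are hypothetical. The two discrimination values (1.38 and 0.49) are borrowed from the incomplete emptying and nocturia estimates quoted in the Results. Item information uses the standard graded-response identity I(ψ) = Σ_k P_k′(ψ)² / P_k(ψ), which equals the negative expected second derivative of the log-likelihood. All function names are illustrative.

```python
import math

# Hypothetical illustration of the graded response model (GRM) for one IPSS item.
# Thresholds used below are assumptions for demonstration, not fitted CS36 values.


def cumulative_probs(psi, a, b):
    """Return [P(Y>=0), P(Y>=1), ..., P(Y>=K), P(Y>=K+1)] for one item.

    psi: latent disability; a: discrimination; b: K ordered difficulty
    parameters b_1 < ... < b_K. P(Y>=0) = 1 and P(Y>=K+1) = 0 by definition.
    """
    inner = [1.0 / (1.0 + math.exp(-a * (psi - bk))) for bk in b]
    return [1.0] + inner + [0.0]


def category_probs(psi, a, b):
    """P(Y = k) = P(Y >= k) - P(Y >= k + 1) for k = 0..K."""
    cum = cumulative_probs(psi, a, b)
    return [cum[k] - cum[k + 1] for k in range(len(b) + 1)]


def expected_score(psi, a, b):
    """Model-expected item score, E = sum_k k * P(Y = k)."""
    return sum(k * p for k, p in enumerate(category_probs(psi, a, b)))


def item_information(psi, a, b):
    """Fisher information of one GRM item at disability psi.

    Uses I(psi) = sum_k P_k'(psi)^2 / P_k(psi). The derivative of each logistic
    cumulative curve is a * P*(1 - P*), which is 0 at the fixed bounds 1 and 0.
    """
    cum = cumulative_probs(psi, a, b)
    dcum = [a * c * (1.0 - c) for c in cum]
    info = 0.0
    for k in range(len(b) + 1):
        p = cum[k] - cum[k + 1]
        dp = dcum[k] - dcum[k + 1]
        if p > 0.0:
            info += dp * dp / p
    return info


def information_area(a, b, lo=-4.0, hi=4.0, n=801):
    """Trapezoidal area under the item information curve on [lo, hi]."""
    step = (hi - lo) / (n - 1)
    vals = [item_information(lo + i * step, a, b) for i in range(n)]
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


if __name__ == "__main__":
    thresholds = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical b_{j,1..5}
    items = {"high discrimination (a = 1.38)": 1.38,
             "low discrimination (a = 0.49)": 0.49}
    areas = {name: information_area(a, thresholds) for name, a in items.items()}
    total = sum(areas.values())
    for name, a in items.items():
        probs = category_probs(0.0, a, thresholds)
        print(name)
        print("  P(Y=k) at psi=0:", [round(p, 3) for p in probs])
        print("  expected score at psi=0:", round(expected_score(0.0, a, thresholds), 3))
        print("  share of combined information:", round(areas[name] / total, 3))
```

Both items share the same hypothetical thresholds, so the more discriminating item accounts for most of the combined area under the information curve on [−4, 4]. This mirrors, qualitatively, why a low-discrimination item contributes little information. The ranking reported in the article rests on fitted, item-specific thresholds, which this sketch does not reproduce.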


Publisher: Springer Journals
Copyright: © The Author(s) 2020
eISSN: 1550-7416
DOI: 10.1208/s12248-020-00500-w

Abstract

The AAPS Journal (2020) 22:115
DOI: 10.1208/s12248-020-00500-w
Research Article

Item Response Theory Modeling of the International Prostate Symptom Score in Patients with Lower Urinary Tract Symptoms Associated with Benign Prostatic Hyperplasia

Yassine Kamal Lyauk (1,2,3,4), Daniël M. Jonker (1), Trine Meldgaard Lund (2), Andrew C. Hooker (3), and Mats O. Karlsson (3)

Received 29 June 2020; accepted 12 August 2020

Abstract. Item response theory (IRT) was used to characterize the time course of lower urinary tract symptoms due to benign prostatic hyperplasia (BPH-LUTS) measured by item-level International Prostate Symptom Scores (IPSS). The Fisher information content of IPSS items was determined and the power to detect a drug effect using the IRT approach was examined. Data from 403 patients with moderate-to-severe BPH-LUTS in a placebo-controlled phase II trial studying the effect of degarelix over 6 months were used for modeling. Three pharmacometric models were developed: a model for total IPSS, a unidimensional IRT model, and a bidimensional IRT model, the latter separating voiding and storage items. The population-level time course of BPH-LUTS in all models was described by initial improvement followed by worsening. In the unidimensional IRT model, the combined information content of IPSS voiding items represented 72% of the total information content, indicating that the voiding subscore may be more sensitive to changes in BPH-LUTS compared with the storage subscore. The pharmacometric models showed considerably higher power to detect a drug effect compared with both a cross-sectional and a while-on-treatment analysis of covariance. Compared with the sample size required to detect a drug effect at 80% power with the total IPSS model, a reduction of 5.9% and 11.7% was obtained with the unidimensional and bidimensional IPSS IRT model, respectively.
Pharmacometric IRT analysis of the IPSS within BPH-LUTS may increase the precision and efficiency of treatment effect assessment, albeit to a more limited extent compared with applications in other therapeutic areas.

KEY WORDS: item response theory; BPH; LUTS; International Prostate Symptom Score; pharmacometrics.

Electronic supplementary material: The online version of this article (https://doi.org/10.1208/s12248-020-00500-w) contains supplementary material, which is available to authorized users.

1 Translational Medicine, Ferring Pharmaceuticals A/S, Kay Fiskers Plads 11, 2300, Copenhagen, Denmark.
2 Department of Drug Design and Pharmacology, University of Copenhagen, Copenhagen, Denmark.
3 Department of Pharmaceutical Biosciences, Uppsala University, Uppsala, Sweden.
4 To whom correspondence should be addressed. (e-mail: yassinekamallyauk@gmail.com; ysl@ferring.com; yassine.lyauk@sund.ku.dk; yassine.lyauk@farmbio.uu.se)

1550-7416/20/0000-0001/0 © 2020 The Author(s)

INTRODUCTION

Benign prostatic hyperplasia (BPH) is a common condition in the aging male and is estimated to affect 50% of males by age 60 years and 90% by age 85 years (1,2). The clinical manifestations of BPH are known as lower urinary tract symptoms (LUTS). They are characterized by an increased sensation of incomplete bladder emptying following urination, urination frequency, urination intermittency, urgency to urinate, weakness of the urinary stream, straining to start urination, and nocturia. LUTS are associated with adverse health effects such as significantly diminished quality of life and depression, as well as impairment in activities of daily living (3–5). In approximately 10% of patients, the condition may lead to severe complications such as acute urinary retention, urosepsis, and kidney failure (2,6). The severity of BPH-LUTS is commonly measured by the International Prostate Symptom Score (IPSS), also known as the American Urological Association score (7). The IPSS consists of seven questions describing the severity of each of the clinical manifestations of LUTS. The IPSS questionnaire is considered the gold standard measure for assessing BPH-LUTS. It is widely used in the clinic, as a primary or secondary endpoint in clinical trials, and in urology research (8).

Pairwise cross-sectional testing of the mean change from baseline in the summary score is the traditional pre-specified analysis for clinical trials that use scale measures as the primary efficacy endpoint. However, analysis of clinical trial data through longitudinal pharmacometric modeling has been shown to increase the power to detect a drug effect compared with pairwise testing (9–11). Furthermore, an extension of longitudinal pharmacometric modeling specific to multiple-item questionnaire data (9), which utilizes concepts derived from item response theory (IRT), has identified the potential for increased assessment precision in several therapeutic areas, namely Alzheimer's disease, Parkinson's disease, multiple sclerosis, and depression (9,12–14). Moreover, the methodology has shown an increase in the power to detect a drug effect compared with longitudinal pharmacometric analysis of summary score data (9,15). Briefly, IRT quantifies the relationship between an individual's intrinsic trait (e.g., disability) and the probability of answering a questionnaire item (e.g., an IPSS item) in a particular way (16,17). Because the information contained within responses to individual items is preserved, it is possible to estimate three things: an individual's latent disability, how well items discriminate between individuals with differing latent disability, and the location of item responses along the disability scale.

The GnRH receptor antagonist degarelix, approved for the treatment of advanced prostate cancer (Firmagon®), was investigated as an alternative medical approach for the treatment of moderate-to-severe BPH-LUTS in patients without prostate cancer. Degarelix forms a depot upon administration and thus functions as a slow-release formulation. Treatment with degarelix was therefore envisioned to achieve greater compliance and effectiveness compared with currently approved treatments requiring daily administration. The degarelix doses tested in BPH-LUTS were substantially lower than the approved doses used for treating prostate cancer (a loading dose of 240 mg followed by maintenance doses of 80 mg), to avoid eliciting prolonged testosterone suppression in patients.

To date, only one publication describes longitudinal model-based analysis of the total IPSS (18), and longitudinal pharmacometric IRT modeling has not been applied to the analysis of the IPSS within BPH-LUTS. We used data from 403 patients in a phase II trial investigating the treatment of moderate-to-severe BPH-LUTS with degarelix over 6 months. With these data, we set out to (i) characterize the internal properties of the IPSS through IRT analysis of the item-level data, (ii) use the obtained IRT information to develop pharmacometric IRT models describing the time course of underlying BPH-LUTS, and (iii) examine the power to detect a drug effect with pharmacometric IRT IPSS modeling compared with cross-sectional testing and with longitudinal modeling based on total IPSS.

METHODS

Data

The IPSS is a seven-item questionnaire in which each item is scored from 0 to 5, yielding a composite IPSS ranging from 0 to 35. Item scores reflect symptom frequency (not at all, less than 1 time in 5, less than half the time, about half the time, more than half the time, and almost always). The exception is the nocturia item, whose scores correspond to categorized counts (0 to ≥ 5 awakenings).

Ferring Pharmaceuticals A/S trial CS36 (NCT00947882) was a phase II, double-blind, parallel-group, dose-finding study evaluating the efficacy and safety of degarelix over 6 months. Following a wash-out period, 403 patients were randomized to placebo or to a single subcutaneous injection of 10, 20, or 30 mg degarelix (40 mg/mL solution). Patients were required to have an IPSS ≥ 13 at screening, 2 weeks prior to dosing at the baseline visit. The primary endpoint was the mean change from baseline in IPSS compared with placebo 3 months after dosing. Visits were planned at 2 weeks and at 1, 2, 3, 4, 5, and 6 months after dosing. Rich pharmacokinetic sampling (n = 15) was performed in 43 patients, and sparse pharmacokinetic sampling (n = 2) in 240 patients. An interim trial analysis was planned for 3 months post-dosing in order to stop the trial early if the primary endpoint was not met. Trial CS36 was conducted in accordance with the Declaration of Helsinki and Good Clinical Practice.

Item Response Theory Modeling

The score for each of the seven IPSS items may range from 0 to 5. The relationship between disability and the probability ($P$) of a patient answering a score of at least $k$ was therefore modeled through a graded response model (19):

$$P(Y_{ij} \ge k) = \frac{e^{a_j(\psi_i - b_{j,k})}}{1 + e^{a_j(\psi_i - b_{j,k})}}$$

where $Y_{ij}$ represents the score of patient $i$ on item $j$, $a_j$ the slope (discrimination) parameter of item $j$, $\psi_i$ the unobserved disability of patient $i$, and $b_{j,k}$ the difficulty parameter of item $j$ for score $k$. For an item with a maximum score of 5, the probability of each score was obtained from the cumulative probabilities as follows:

$$P(Y_{ij} = 0) = 1 - P(Y_{ij} \ge 1)$$
$$P(Y_{ij} = k) = P(Y_{ij} \ge k) - P(Y_{ij} \ge k + 1), \quad k = 1, \dots, 4$$
$$P(Y_{ij} = 5) = P(Y_{ij} \ge 5)$$

Item characteristic curves (ICCs) were estimated as fixed effects by treating the IPSS measurements from each patient's study visit as originating from a separate individual (referred to in this work as the IDVIS approach). Disability was estimated as a random effect, and its distribution was fixed to a standard normal distribution (mean 0, variance 1) at baseline. Post-baseline shift parameters allowed disability to have a different mean and variance after baseline, because placebo and/or drug effects are likely to have changed disability relative to baseline. A similar ICC estimation approach has been reported previously in the literature (13,14,20,21).

Factor analysis (FA) is an established statistical method (22) for assessing item patterns and informing the item structure of IRT models (23). The procedure aims to explain the interrelationships between many observed variables by way of a few latent variables, and it is based on analysis of the between-item correlation matrix. It may be used to identify the number of questionnaire domains and which items correspond to each of them (exploratory FA), or to investigate item patterns with a pre-specified number of factors (confirmatory FA). It may also inform whether the assumption of only one general dimension for all items is supported (24). In the current work, a unidimensional IRT model was first fit to the CS36 data, and the adequacy of the unidimensionality assumption was assessed based on the item factor loadings. A factor loading indicates an item's correlation with the factor, and higher absolute values suggest closer association. Following development of the unidimensional IRT model, confirmatory FA with two dimensions and varimax orthogonal rotation (25) was used to inform the item structure of a bidimensional IRT model (a minimum of three items per dimension is needed to preserve model identification). Residual correlation between items was also assessed in the developed IRT ICC models, with residuals calculated as follows:

$$\mathrm{RES}_{ij} = \mathrm{DV}_{ij} - E_{ij}$$
$$E_{ij} = \sum_{k=1}^{5} k \cdot P(Y_{ij} = k)$$

where $\mathrm{DV}_{ij}$ is the observed score of the $i$th individual on the $j$th IPSS item and $E_{ij}$ is the corresponding probability-weighted prediction based on the IRT-derived ICCs and individual disability estimates.

Pharmacometric Implementation of Item Response Theory

Following the IRT ICC estimation step, the resulting knowledge was incorporated into a pharmacometric framework in two steps.

First, the original individual assignment was restored in the data (i.e., longitudinal observations were reassembled for each patient), and IRT-derived latent disability estimates were modeled longitudinally as the dependent variable. Uncertainty in the empirical Bayes estimates (EBEs) of latent disability was taken into account through an additional additive residual error term. This is similar to the IPPSE (individual PK parameters with standard errors) approach in sequential PK/PD modeling (26), and we here name it the PSI-IPPSE approach. Schindler et al. previously proposed a similar approach (20) but without standard errors.

Second, the IRT ICC estimation model and the final longitudinal latent disability model from the PSI-IPPSE step were combined into a single model. This allows latent disability to be translated into observed IPSS at the item and summary level. For this combined model, two strategies were examined: re-estimating only the longitudinal parameters, and estimating ICCs and longitudinal parameters simultaneously.

Calculation of Fisher Information Content

Fisher information was used to investigate which IPSS items carry the most information (i.e., the signal-to-noise ratio in determining patients' latent disability) and where on the disability scale they are most informative. The Fisher information content of each IPSS item was calculated as the negative expectation of the second derivative of the log-likelihood using the unidimensional IRT ICC estimation model. The information functions were visualized to illustrate the sensitivity of each IPSS item over the full disability range. Each item's area under the information curve was calculated within this study's estimated disability range, and items were ranked by this area relative to the total information. Information content assessment was performed in the context of unidimensional IRT modeling. This gives an overall perspective across all IPSS items, whereas in the multidimensional IRT framework it is only feasible within each separate dimension.

Structural Longitudinal Modeling

A similar approach to longitudinal model development was undertaken for underlying disability in the context of IRT and for observed total IPSS. First, data from patients randomized to the placebo group were modeled. Here, different structural models were tested to describe the time course of the placebo effect: linear, bi-linear, power, exponential, Weibull, Gompertz, and inverse Bateman models. The addition of a linear drift parameter (27) to describe worsening or continued improvement was tested for all of these models. Subsequently, data from patients assigned to degarelix treatment were added to the data set to describe the drug effect. In this step, we investigated models describing degarelix treatment effects as present or absent, independent of the administered dose, as well as dose-response models (linear and Emax). An offset treatment effect was investigated, as were onset treatment effects (linear, exponential, and slope-intercept models) describing time delays in reaching the full response. Normally and log-normally distributed between-subject variability was investigated for all parameters. For the total IPSS model, additive, proportional, and combined error models were investigated to describe residual variability.

Covariate Analysis

Investigated baseline covariates consisted of the following:
- demographics (age, weight, and body mass index);
- physiological disease-specific measures (total prostate volume, serum testosterone, prostate-specific antigen, average flow rate, flow time including time to maximum flow, maximum urine flow, post-void residual volume, voiding time, and voiding volume);
- validated disease-specific patient-reported outcomes (quality of life (QoL) score and BPH Impact Index (BII) score);
- study site region (North America or Europe).

Baseline IPSS was tested as a covariate on the drug effect parameter during longitudinal IPSS modeling. Lastly, individual degarelix area under the curve (AUC0–∞) estimates were investigated as a predictor of treatment effect variability, both as a continuous value and binned by quartile. These estimates were derived by applying a previously developed population pharmacokinetic model (28) to the CS36 trial pharmacokinetic data.

Covariate analysis was performed by way of a stepwise search at a significance level of 0.01 in the forward inclusion step and 0.001 in the backward elimination step. Linear relationships were investigated for covariates. A multiplicative covariate model (Eq. 1) was used to test continuous covariates on parameters. The exception was parameters liable to assume a typical value ($\theta$) of zero (e.g., baseline disability in longitudinal IRT modeling), for which an additive covariate model was used (Eq. 2):

$$\text{Parameter} = \theta_{\text{Parameter}} \cdot \left(1 + \theta_{\text{Covariate}} \cdot \left(\text{Covariate} - \text{Covariate}_{\text{median}}\right)\right) \qquad (1)$$

$$\text{Parameter} = \theta_{\text{Parameter}} + \theta_{\text{Covariate}} \cdot \left(\text{Covariate} - \text{Covariate}_{\text{median}}\right) \qquad (2)$$

Model Evaluation and Diagnostics

Model selection not related to covariates was based on several criteria. For hierarchical models, a difference in objective function value (OFV) corresponding to a significance level of 0.05 was considered statistically significant, assuming a χ² distribution. For non-nested models, the difference in Akaike information criterion (AIC) was used. Model stability (convergence of the minimization and covariance steps), parameter precision (NONMEM's relative standard error estimate), and graphical diagnostics were also considered during model selection.

Visual predictive checks (VPCs) with 200 samples were used to assess how adequately the models characterized the observed IPSS data. The VPCs covered the longitudinal IPSS and the change in IPSS from baseline, stratified by treatment arm.

In the IRT analyses, the goodness of fit of ICCs was assessed using a novel sampling-based, cross-validated generalized additive model (GAM) cubic spline smooth, which builds upon the commonly used GAM smooth diagnostic (21). As for all pharmacometric model diagnostics, EBE-based visual representations may be misleading due to η-shrinkage (29). In this particular diagnostic, EBE shrinkage can make an adequate model appear inadequate, in particular at extreme disability values. To counteract this effect of η-shrinkage of disability EBEs on the GAM smooth diagnostic, an approach was developed that samples randomly from each individual's posterior η distribution. The spread of that distribution was taken from the EBE uncertainty estimate of the final ICC estimation model (Fisher-information-assessed variance, or conditional variance). Two hundred η samples were drawn randomly from normal distributions with mean equal to the individual posterior η estimate and variance equal to the individual Fisher-information-assessed η variance. Disability estimates were then calculated for each generated η using the estimated fixed-effect post-baseline shift parameters, respecting whether η originated from a baseline or a post-baseline IDVIS record. As in the traditional IRT GAM diagnostic, GAM smooths were applied to the data, one for each unique item–difficulty category combination. There were many more sampling-generated than actual study-derived disability estimates. To adjust for this difference, the 95% confidence interval of the GAM smooths was widened by multiplying the computed standard error by the square root of the number of generated η samples.

To diagnose the final longitudinal IRT model, VPCs were generated for both item-level IPSS observations and summary IPSS scores using 2000 Monte Carlo simulations.

Power Calculations

A stochastic simulation and estimation (SSE) procedure with 1000 samples was used to assess the power to detect a drug effect at a 5% level of significance, with 80% as the target power. Of the two developed longitudinal IRT models (unidimensional and bidimensional), the one with the lowest AIC was chosen as the simulation model. For simplicity, the Monte Carlo simulations assumed no missing individual IPSS item scores and no drop-out over the 6-month period. Power curves were generated by estimating the power of the models at four different sample sizes, which were informed by an initial exploratory Monte Carlo Mapped Power (MCMP) procedure (30). For the pharmacometric models, the actual type I error level and the corresponding empirically derived ΔOFV were estimated by simulating 1000 trials with no drug effect at each sample size, similar to Wählby et al. (31).

The power of two different analysis of covariance (ANCOVA) tests was determined using the same simulated data sets on which the power of the pharmacometric models was estimated. Both analyses included treatment as a factor and baseline summary IPSS as a covariate. The first ANCOVA used cross-sectional data, considering only the change from baseline at 3 months post-dose, which was the landmark time point in the CS36 trial. This type of analysis is commonly pre-specified as the main analysis of clinical trials. In the second ANCOVA, the dependent variable was the average summary IPSS change from baseline over the entire treatment period, which is known as the "while on treatment" (WOT) strategy/estimand (32). At each sample size, power was determined as the proportion of analyses that identified a statistically significant (p < 0.05) treatment effect.

Software

The Laplacian method in NONMEM version 7.4.3 (33) was used for IRT ICC estimation and for final longitudinal IRT modeling. The first-order conditional estimation method with interaction was used for longitudinal IPSS modeling and for intermediate longitudinal IRT modeling of disability EBEs. The mirt R package (34), version 1.32.0, was used to obtain initial estimates for the ICCs, to perform factor analysis, and to explore multidimensional IRT models. ICC diagnostics were obtained using R version 3.6.0. Simulation-based model diagnostics for the longitudinal models were obtained using Perl-speaks-NONMEM (PsN) (35) version 4.9.0.

RESULTS

Table I shows the subject characteristics at baseline. In total, 3117 summary IPSS and 21,836 item-level IPSS responses from 403 patients were available for analysis. The distribution of responses is shown in Supplemental Fig. S1. Of the 403 randomized patients, 369 completed the 6-month treatment period. Figure 1 shows the mean summary IPSS time course in each trial arm as well as the distribution of responses for each IPSS item. A marked drop in total IPSS was observed in all treatment arms following dosing. The distribution of item-level IPSS responses at the three key trial visits (baseline, the landmark time point, and end-of-trial) was similar in the placebo arm and the pooled treatment arms. Figure 1 shows no apparent dose-response for the effect of degarelix on the IPSS.

Table I. Baseline Demographic and International Prostate Symptom Score (IPSS) Characteristics in Clinical Trial CS36
Values are median [range] unless otherwise indicated.

Variable | Placebo | Degarelix 10 mg | Degarelix 20 mg | Degarelix 30 mg
Number of patients | 98 | 101 | 99 | 105
Age (years) | 65.0 [50.0, 86.0] | 65.0 [50.0, 81.0] | 66.0 [52.0, 82.0] | 65.0 [50.0, 87.0]
Body weight (kg) | 86.4 [60.0, 128.0] | 87.0 [54.1, 126.2] | 85.0 [57.0, 141.2] | 84.0 [55.0, 183.8]
Body mass index (kg/m²) | 28.5 [20.1, 40.2] | 27.8 [18.9, 40.5] | 27.7 [21.4, 38.9] | 27.7 [19.8, 58.1]
Total IPSS | 18.0 [13.0, 33.0] | 18.0 [11.0, 33.0] | 19.0 [13.0, 33.0] | 19.0 [13.0, 35.0]
IPSS storage subscore | 8.0 [3.0, 15.0] | 8.0 [3.0, 15.0] | 8.0 [4.0, 15.0] | 8.0 [2.0, 15.0]
IPSS voiding subscore | 10.0 [4.0, 20.0] | 11.0 [0.0, 20.0] | 11.0 [3.0, 20.0] | 11.0 [4.0, 20.0]
Quality of life score | 4.0 [2.0, 6.0] | 4.0 [1.0, 6.0] | 4.0 [2.0, 6.0] | 4.0 [3.0, 6.0]
BPH Impact Index score | 7.0 [0.0, 13.0] | 7.0 [0.0, 12.0] | 7.0 [0.0, 12.0] | 7.0 [0.0, 12.0]
Voided volume (mL) | 175.5 [77.0, 466.0] | 188.1 [125.0, 632.0] | 185.0 [57.0, 505.0] | 186.0 [106.4, 484.0]
Voiding time (s) | 37.0 [19.0, 121.0] | 40.0 [21.0, 128.0] | 42.0 [15.0, 112.0] | 39.0 [20.6, 344.5]
Post-void residual volume (mL) | 39.1 [0.0, 230.0] | 50.5 [0.0, 246.6] | 45.0 [0.0, 189.0] | 56.3 [0.0, 999.0]
Average flow rate (mL/s) | 5.0 [2.6, 10.4] | 5.0 [2.6, 9.5] | 5.3 [2.7, 10.6] | 5.0 [2.3, 8.5]
Maximum urine flow (mL/s) | 10.0 [4.6, 16.4] | 10.0 [4.4, 19.2] | 10.0 [5.4, 50.0] | 9.9 [5.1, 16.0]
Flow time including time to maximum flow (s) | 33.0 [18.0, 113.0] | 36.0 [20.0, 120.0] | 37.4 [13.0, 101.0] | 37.0 [20.6, 100.4]
Total prostate volume (mL) | 39.1 [16.8, 102.0] | 38.4 [14.2, 128.0] | 38.3 [17.0, 155.7] | 36.1 [9.8, 135.9]
Prostate-specific antigen (ng/mL) | 2.0 [0.2, 9.6] | 1.8 [0.1, 9.0] | 2.3 [0.3, 9.6] | 1.8 [0.3, 7.8]
Serum testosterone (ng/mL) | 4.1 [1.0, 10.2] | 4.3 [0.2, 13.6] | 4.3 [2.0, 8.0] | 4.3 [0.6, 12.2]
Region North America, N (%) | 57 (58.2) | 60 (59.4) | 60 (60.6) | 63 (60.0)
Region Europe, N (%) | 41 (41.8) | 41 (40.6) | 39 (39.4) | 42 (40.0)

Both the traditional cross-validated cubic spline GAM smooth and its sampling-based extension indicated that the estimated ICCs described the data adequately (Fig. 3). Model agreement was better with the sampling-based GAM smooth than with the traditional method, even though typical η-shrinkage was low (SD-based, 9.6%). Individual shrinkage also varied little (95% CI 9.6% to 9.9%, range 6.3% to 42.0%).

Total IPSS values spanning the entire scale were observed in the CS36 data, and total IPSS was highly correlated (r = 0.95) with estimated IRT disability (Fig. 4a). However, a given summary IPSS value corresponds to a wide range of underlying disability, most evidently for moderate BPH-LUTS (8 ≤ IPSS ≤ 19). Moreover, Fig. 4b illustrates that the minimal detectable decrease (MDD) of three IPSS points (36,37) corresponds to a wide range of decreases in latent disability. These disability improvements overlap notably with those corresponding to observed improvements below the MDD (−3 < ΔIPSS < 0) and to no observed change (ΔIPSS = 0). To a small extent, they also overlap with those corresponding to observed worsening (ΔIPSS > 0). Lastly, the threshold commonly used to determine clinical progression (ΔIPSS ≥ 4) (37–40) corresponds to no change or to increases in underlying disability.

As shown in Table III, the most informative IPSS item was incomplete emptying (23.8% of total information), closely followed by intermittency (20.8% of total information). These items can determine patients' disability more precisely than the other IPSS items. The nocturia item contained the least information (3.4%), in line with this item having the lowest discrimination parameter value (Table II).

Item Response Theory Analysis

The unidimensional IRT model had high (> 0.6) item factor loadings except for the nocturia item, which had a modest factor loading of 0.39, suggesting that the unidimensionality assumption was adequate. Factor analysis with two dimensions placed the voiding items (emptying, intermittency, weak stream, and straining) and the storage items (frequency, urgency, and nocturia) in separate dimensions. This informed the development of a bidimensional IRT model (item factor loading values are shown in Supplemental Table S1).

Unidimensional Item Characteristic Curve Estimation Model

In the unidimensional IRT ICC estimation model, 44 parameters (35 difficulty parameters, 7 discrimination parameters, and 2 post-baseline shift disability parameters) were estimated with low uncertainty to characterize the ICCs (Table II). The incomplete emptying IPSS item had the highest discrimination parameter value (1.38); i.e., it is the most sensitive to changes in disability around the difficulty parameter of each score. The nocturia item had the lowest discrimination parameter value (0.49), indicating that a large increase in disability gives a relatively small increase in the probability of a higher score. The ICCs of each IPSS item are illustrated in Fig. 2 and show expected scores larger than zero for individuals with low disability (< −4) for all items, most notably for the frequency, weak stream, and nocturia
Of note, the IPSS voiding items (incomplete items. For the nocturia item, individuals with a low disability emptying, intermittency, weak stream, and straining) com- estimate are predominantly expected to score higher than 0, bined carried 72% of the total information while IPSS storage indicating that the vast majority of patients will answer that items (frequency, urgency, and nocturia) combined only they get up to urinate at least once every night. 115 Page 6 of 15 The AAPS Journal (2020) 22:115 Fig. 1. The mean International Prostate Symptom Score (IPSS) in each CS36 trial arm along with the standard error of the mean at each visit. The distribution of item-level IPSS at the baseline visit, landmark time point (3 months post-dose), and end of trial (6 months post-dose) is shown for the placebo arm as well as the pooled degarelix dose arms contained 28% of the total information. A visual representa- IRT model. All three developed models adequately described tion of the Fisher information curves for each item is shown in the data as illustrated by VPCs (Supplemental Figs. S9, S10, Supplemental Fig. S2. S11, S12, and S13). The time course of IPSS and latent disability in the summary score and unidimensional IRT model, respectively, Bidimensional Item Characteristic Curve Estimation Model were described according to In the bidimensional IRT ICC estimation model, 47 IPSS or Disability ¼ Baseline þ Placebo þ Drug parameters were estimated with low uncertainty (35 difficulty parameters, 7 discrimination parameters, two sets of post- baseline shift disability parameters, and a correlation term where Baseline is the estimated baseline, Drug is the offset between latent variables) using Cholesky decomposition (to degarelix treatment effect, and Placebo is the placebo effect estimate the correlation between the latent variables fixed to described by 1). 
The bidimensional ICC estimation model had a 407.5 lower OFV than the unidimensional ICC estimation model, lnðÞ 2 − Time Tprog Placebo ¼ Pmax 1−e þ Drift Time and its IRT parameter estimates and ICCs are presented in Table II and visually represented in Supplemental Figs. S3 and S4, respectively. Estimated ICCs adequately described where Pmax is the maximal placebo effect, Tprog is the half- the data as shown in Supplemental Figs. S5 and S6. Typical η- shrinkage was 10% (individual shrinkage 95% CI 9.8% to life to reach Pmax, and Drift describes worsening or 10%, range 6.9% to 38.6%) and 13% (individual shrinkage continued improvement.In the bidimensional IRT model, 95% CI 13.6% to 13.8%, range 9.8% to 38.8%) in the voiding the placebo effect in each dimension was described using a and storage dimension, respectively. Weibull function The residual correlation between items in the two WEI respective developed IRT ICC estimation models is shown lnðÞ 2 − *Time ðÞ Tprog Placebo ¼ Pmax 1−e þ Drift Time in Supplemental Figs. S7 and S8. Longitudinal Models where WEI is the Weibull exponent. Separate offset drug effects were estimated on each of the two latent variable Three longitudinal models were developed: a total score scales. model, a unidimensional IRT model, and a bidimensional The AAPS Journal (2020) 22:115 115 Page 7 of 15 Table II. 
Item Characteristic Curve (ICC) Parameter Estimates in the (a) Unidimensional and (b) Bidimensional Item Response Theory (IRT) Models

| Parameter | (a) Unidimensional: Estimate | (a) Unidimensional: RSE (%) | (b) Bidimensional: Estimate | (b) Bidimensional: RSE (%) |
|---|---|---|---|---|
| IRT ICC parameters | | | | |
| a_1 | 1.38 | 7.0 | 1.6 | 7.6 |
| b_1,1 | −4.09 | 5.9 | −3.4 | 7.2 |
| b_1,2 | 1.82 | 7.4 | 1.56 | 8.1 |
| b_1,3 | 1.68 | 6.7 | 1.44 | 7.4 |
| b_1,4 | 1.41 | 6.8 | 1.2 | 7.6 |
| b_1,5 | 1.27 | 8.0 | 1.09 | 8.5 |
| a_2 | 0.98 | 7.0 | 1.4 | 8.5 |
| b_2,1 | −5.39 | 6.0 | −4.83 | 7.4 |
| b_2,2 | 2.64 | 7.5 | 2.24 | 8.3 |
| b_2,3 | 2.04 | 6.7 | 1.8 | 7.8 |
| b_2,4 | 1.49 | 7.1 | 1.3 | 8.2 |
| b_2,5 | 1.55 | 7.8 | 1.3 | 8.2 |
| a_3 | 1.29 | 7.7 | 1.68 | 8.2 |
| b_3,1 | −3.77 | 6.0 | −3.03 | 7.4 |
| b_3,2 | 1.8 | 7.4 | 1.48 | 8.0 |
| b_3,3 | 1.6 | 7.1 | 1.32 | 7.7 |
| b_3,4 | 1.08 | 7.5 | 0.88 | 8.0 |
| b_3,5 | 1.34 | 8.1 | 1.1 | 8.4 |
| a_4 | 0.92 | 6.7 | 1.16 | 8.0 |
| b_4,1 | −3.86 | 5.6 | −3.65 | 7.3 |
| b_4,2 | 2.09 | 6.8 | 1.88 | 8.1 |
| b_4,3 | 1.68 | 6.6 | 1.55 | 7.7 |
| b_4,4 | 1.22 | 7.2 | 1.12 | 8.0 |
| b_4,5 | 1.42 | 7.7 | 1.27 | 8.7 |
| a_5 | 1.09 | 7.2 | 1.36 | 7.7 |
| b_5,1 | −5.11 | 6.3 | −4.16 | 7.3 |
| b_5,2 | 2.31 | 7.8 | 1.9 | 8.3 |
| b_5,3 | 1.69 | 7.0 | 1.4 | 7.7 |
| b_5,4 | 1.32 | 7.1 | 1.09 | 7.7 |
| b_5,5 | 1.12 | 7.5 | 0.93 | 8.1 |
| a_6 | 0.95 | 7.8 | 1.25 | 8.2 |
| b_6,1 | −3.1 | 6.1 | −2.46 | 7.5 |
| b_6,2 | 1.72 | 7.7 | 1.38 | 8.2 |
| b_6,3 | 1.68 | 7.5 | 1.35 | 8.1 |
| b_6,4 | 1.67 | 9.8 | 1.34 | 8.3 |
| b_6,5 | 1.67 | 8.4 | 1.34 | 10.1 |
| a_7 | 0.49 | 8.4 | 0.601 | 8.5 |
| b_7,1 | −7.89 | 7.5 | −6.93 | 7.7 |
| b_7,2 | 5.19 | 8.7 | 4.4 | 8.5 |
| b_7,3 | 3.52 | 8.1 | 3.04 | 8.2 |
| b_7,4 | 2.44 | 8.9 | 2.09 | 8.9 |
| b_7,5 | 2.1 | 10.5 | 1.77 | 10.2 |
| Post-baseline shift parameters | | | | |
| Mean latent variable dimension 1 | −1.38 | 6.1 | −1.07 | 8.8 |
| Variance latent variable dimension 1 | 2.22 | 6.4 | 1.61 | 7.3 |
| Mean latent variable dimension 2 | – | – | −1.40 | 8.5 |
| Variance latent variable dimension 2 | – | – | 2.4 | 7.4 |
| Correlation between dimensions | – | – | 69.1 | 3.6 |

a_i is the discrimination parameter for item i; b_i,k is the difficulty parameter for item i and category k. In the bidimensional model, dimension 1 (voiding) consists of items 1, 3, 5, and 6, while dimension 2 (storage) includes items 2, 4, and 7.
At baseline, the latent variable(s) was fixed to N(0, 1), while the mean and variance of the latent variable(s) were estimated for post-baseline data (IDVIS approach). Item #1: "Incomplete Emptying"; Item #2: "Frequency"; Item #3: "Intermittency"; Item #4: "Urgency"; Item #5: "Weak Stream"; Item #6: "Straining"; Item #7: "Nocturia".

Fig. 2. Item characteristic curves for each International Prostate Symptom Score item in the unidimensional item response theory model

Final longitudinal model parameter estimates for the total IPSS and unidimensional IRT model, along with their precision, are shown in Table IV. The lowest OFV and best goodness of fit were achieved by specifying log-normally distributed inter-individual variability (IIV) for Baseline_IPSS and Tprog_IPSS and normally distributed IIV for Pmax_IPSS and Drift_IPSS. In longitudinal latent disability modeling, log-normal IIV was specified for Tprog_Disability, while normal distributions were specified for Baseline_Disability, Pmax_Disability, and Drift_Disability. The typical value of Drift_Disability was fixed to zero, and no significant change in OFV was observed by doing so. Adding IIV on Drug was not feasible in either longitudinal IPSS or latent disability modeling, as it yielded no significant OFV decrease and a variance close to zero, indicating that placebo and drug effect variability could not be distinguished in the current data. Incorporating the offset drug effect into the total IPSS model, unidimensional IRT model, and bidimensional IRT model gave OFV reductions of 22.1 (df = 1), 20.3 (df = 1), and 42.5 (df = 2), respectively, compared with the corresponding models without an estimated drug effect. No dose-response or exposure-response (using AUC0-∞ as the exposure metric) was observed on either the IPSS or the latent disability scale.

In the longitudinal total IPSS and unidimensional IRT models, covariates were tested on the Baseline, Pmax, and Drug parameters. Significant covariates (p < 0.001) on Baseline in both models were the baseline BII score, baseline QoL score, and study region, while baseline QoL score was included on Pmax_IPSS (Table IV). Because of the long runtime of the longitudinal full ICC model, covariates were identified using the longitudinal PSI-IPPSE approach and subsequently incorporated into the full longitudinal ICC model. Re-estimation of the longitudinal parameters in the latter yielded an OFV decrease of approximately 130 points, and a substantially better fit was observed in the VPCs of the item-level and summary-level IPSS (data not shown). Simultaneous re-estimation of ICCs and longitudinal parameters (estimates shown in Supplemental Table S2) yielded an OFV decrease of 11 points compared with the fixed-ICC longitudinal unidimensional IRT model. This was deemed insignificant, and hence the longitudinal unidimensional IRT model with fixed ICCs and estimated longitudinal parameters was kept as the final model. In this final model, covariate relationships found to be significant using the PSI-IPPSE method underwent an additional backward elimination step (p < 0.001) to confirm their significance. All covariates remained statistically significant in the full model. Lastly, Box-Cox transformation of the Baseline and Drift IIV distributions resulted in significant drops in OFV in both models. However, in longitudinal unidimensional IRT modeling, the Box-Cox shape parameter for Drift had a high relative standard error (> 400%) and was therefore not included in the final model.

During longitudinal bidimensional IRT modeling, high correlation (≥ 96%) was observed between the Tprog IIV and Pmax IIV components of each dimension, which affected model stability. These IIV parameters were therefore collapsed into a single common parameter across the two dimensions. For model stability, a single typical value of the Weibull exponent was also estimated for both dimensions. As for the unidimensional IRT model, longitudinal parameters were re-estimated in the final longitudinal bidimensional IRT model. The final model minimized successfully, and its parameter estimates are shown in Table V. It was not possible to obtain parameter precision estimates, include covariates, or simultaneously estimate ICCs and longitudinal parameters because of convergence and stability issues. The final bidimensional longitudinal IRT model adequately described both summary-level and item-level data (Supplemental Figs. S12 and S13, respectively).

Fig. 3. The International Prostate Symptom Score (IPSS) item characteristic curve fits in the unidimensional item response theory model for the cumulative probabilities (red lines), along with the cross-validated cubic spline generalized additive model (GAM) smooth (green area) and the η sampling-based cross-validated cubic spline GAM smooth using 200 samples (blue area)

Fig. 4. a Observed International Prostate Symptom Scores (IPSS) vs. item response theory disability estimates from the unidimensional item response theory model, based on 3117 separate measurements from 403 patients over the 6-month trial period. b Observed change from baseline in IPSS vs. change from baseline in item response theory disability from the unidimensional item response theory model in 403 patients over the 6-month trial period. MDD, minimally detectable difference

Table III.
Fisher Information Content Ranking of International Prostate Symptom Score (IPSS) Items Based on the Unidimensional Item Response Theory Model

| IPSS item | Item subscore category | % of total Fisher information | Cumulative % of total |
|---|---|---|---|
| Q1: Incomplete Emptying | Voiding | 23.8 | 23.8 |
| Q3: Intermittency | Voiding | 20.8 | 44.6 |
| Q5: Weak Stream | Voiding | 15.4 | 60.0 |
| Q2: Frequency | Storage | 13.1 | 73.1 |
| Q6: Straining | Voiding | 11.8 | 84.9 |
| Q4: Urgency | Storage | 11.6 | 96.5 |
| Q7: Nocturia | Storage | 3.4 | 99.9 |

Power of Testing and Model-Based Methods

The bidimensional IRT model was used as the simulation model in the SSE procedure because it had a lower AIC value (59,086.3) than the unidimensional IRT model (61,622.6). The resulting power curves are shown in Fig. 5. The pharmacometric models all provided considerably higher power to detect a drug effect than both the cross-sectional ANCOVA and the WOT ANCOVA. The unidimensional IRT model yielded slightly higher power (approximately N = 113 to reach 80% power) than the total IPSS model (approximately N = 120 to reach 80% power). An additional SSE procedure using the unidimensional IRT model as the simulation model confirmed this finding (data not shown). The bidimensional IRT model provided the highest power to detect a drug effect, reaching 80% power with a total trial sample of approximately N = 106, fewer patients than required by the total IPSS and unidimensional IRT models. The type I error of each model under each sample size and empirically derived OFV cut-off in the SSE procedure is presented in Supplemental Table S3. Only model runs that minimized successfully were used in the calculation of power (on average ~ 80% of full-reduced bidimensional model pairs and ~ 90% of unidimensional and total IPSS model pairs).

Table IV. Longitudinal model parameter estimates. IPSS: summary International Prostate Symptom Score, IRT: Item response theory.
Relative standard errors were obtained in NONMEM.

| Parameter | IPSS model: Value | IPSS model: RSE (%) | Unidimensional IRT model: Value | Unidimensional IRT model: RSE (%) |
|---|---|---|---|---|
| Baseline | 19.6 | 1.7 | 0.0283 | 146.3 |
| Pmax (maximal placebo response) | −4.12 | 9.9 | −1.03 | 10.9 |
| Tprog (placebo half-life) | 15.3 | 18.8 | 12.3 | 20.5 |
| Drug effect | −1.98 | 19.2 | −0.542 | 20.3 |
| Baseline Box-Cox shape | 1.87 | 41.7 | 0.373 | 25.4 |
| Drift Box-Cox shape | 39.3 | 47.6 | – | – |
| Covariates | | | | |
| Baseline QoL on Pmax | 0.208 | 13.2 | – | – |
| Baseline BII on Baseline | 0.0211 | 19.6 | 0.121 | 17.9 |
| Baseline QoL on Baseline | 0.0873 | 12.7 | 0.325 | 17.4 |
| Region on Baseline | −0.0803 | 26 | −0.338 | 24.1 |
| Interindividual variability (IIV) | | | | |
| IIV Baseline | 13.7% | 8.3 | 75.9% | 7.7 |
| IIV Pmax | 121.7% | 15.4 | 128.5% | 15.4 |
| IIV Drift | 1.8% | 19.4 | 0.7% | 8.8 |
| IIV Tprog | 90.6% | 12 | 52.4% | 9.9 |
| IIV Baseline–Pmax correlation | – | – | 1.7% | |
| IIV Baseline–Drift correlation | – | – | 9.2% | |
| IIV Pmax–Drift correlation | 43.1% | | 34% | |
| Residual error | | | | |
| Proportional residual error | 10.9% | 8.9 | – | – |
| Additive residual error | 189.2% | 6.7 | – | – |

Table V. Parameter estimates for the longitudinal bidimensional item response theory model

| Parameter | Value |
|---|---|
| Baseline_V (voiding scale) | −0.0251 |
| Baseline_S (storage scale) | −0.0667 |
| Pmax (maximal placebo response, voiding scale) | −0.75 |
| Pmax (maximal placebo response, storage scale) | −0.845 |
| Tprog (placebo half-life, voiding scale) | 12.9 |
| Tprog (placebo half-life, storage scale) | 13.4 |
| Weibull shape parameter (common for both scales) | 1.53 |
| Drug effect, voiding scale | −0.488 |
| Drug effect, storage scale | −0.749 |
| Interindividual variability (IIV) | |
| IIV Baseline (voiding scale) | 97.3% |
| IIV Baseline (storage scale) | 128.8% |
| IIV Baseline_V–Baseline_S correlation | 26% |
| IIV Pmax (common for both scales) | 145.6% |
| IIV Tprog (common for both scales) | 61.1% |
| IIV Drift (common for both scales) | 0.6% |
| IIV Pmax–Drift correlation | 40% |

DISCUSSION

Item Response Theory Analysis

The current paper presents the first reported IRT analyses of the IPSS and the first longitudinal pharmacometric IRT model within BPH-LUTS. Both a unidimensional and a bidimensional IPSS IRT model were developed based on factor analyses, the latter further confirming previous findings (41,42).

In the unidimensional IRT model, most of the total information content (72%) was contained in the IPSS voiding items; this finding is supported by a principal component analysis showing that total IPSS is predicted by improvement in voiding rather than storage symptoms (43). Subscore analysis, i.e., distinguishing treatment effects on the IPSS voiding and storage subscores in addition to the total IPSS, is routinely performed as a secondary statistical analysis of clinical trials within BPH-LUTS, although its clinical meaningfulness has not been established (42,44,45). The current results suggest that the IPSS voiding subscore is more sensitive than the storage subscore in assessing a patient's BPH-LUTS and may therefore also be better suited for detecting symptomatic drug effects. It should be noted, however, that the most favorable signal-to-noise ratio will be obtained by using all available data and acknowledging the information contribution of individual items, as opposed to considering the composite (sub)score(s), as exemplified by pharmacometric IRT in Parkinson's disease (15).

Fig. 5. Power curves for the pharmacometric models obtained using a type I error-corrected stochastic simulation and estimation procedure.
One thousand simulated data sets from the bidimensional item response theory model at sample sizes of 33, 66, 99, and 137 patients were used for model estimation with the respective full (with a drug effect parameter) and reduced (without a drug effect parameter) models. Vertical lines indicate the 95% confidence interval for the calculated power estimates

The incomplete emptying item was found to be the most informative. This item has previously been found to be associated with worsening of both voiding and storage symptoms (46). Incomplete emptying had the highest discrimination parameter value (1.38) in the unidimensional IRT model; however, compared with other reported unidimensional IRT analyses in different therapeutic areas, this is relatively low (e.g., the highest discrimination parameter value was 3.35 in the ADAS-cog IRT analysis (9) and 3.5 in the EDSS IRT analysis (12)). This may indicate that BPH-LUTS is a diffuse and heterogeneous disease and, consequently, that IPSS items have difficulty discriminating between different levels of disability.

The nocturia item was found to be the least informative, and several reports in the literature support this. Firstly, the item may not be sufficiently specific to BPH-LUTS; the primary cause of adult nocturnal polyuria has been attributed to the age-related decline in nocturnal secretion of antidiuretic hormone (47,48) rather than to a direct consequence of BPH. The nocturia item was also the least specific in Japanese men with BPH, and a similar explanation was proposed (49). Secondly, nocturia may be nonspecific to urologic conditions in general. A significant correlation between IPSS nocturia and items 5 and 6 describing nocturia in the 8-item overactive bladder questionnaire (OAB-8) has been established (50); an IRT analysis of the OAB-8 in both men and women showed the two items describing nocturia to have the relatively lowest discrimination parameter values (51) (the ratio to the highest discrimination parameter estimate was 0.35, 0.40, and 0.42 for IPSS nocturia, OAB-8 item 5, and OAB-8 item 6, respectively). It should be emphasized that nocturia and urgency appear to be the most bothersome symptoms for patients suffering from LUTS (52,53). Lower information content does not imply that the corresponding symptom is not bothersome from a patient perspective; it indicates that the frequency of observed scores varies less across patients with very different disease severity than it does for other items. The item is therefore less sensitive in assessing the overall condition and less useful for distinguishing between patients. The bother of each BPH-LUTS symptom is expected to vary between patients, yet this is not captured by the IPSS; this diagnostic limitation (54) is addressed by other questionnaires, e.g., the Danish Prostate Symptom Score (55) and the International Continence Society Questionnaire Male LUTS questionnaire (56).

The comparison between IRT disability and total IPSS supports the MDD of ΔIPSS ≤ −3 for classifying patients as experiencing clinically significant improvement (36,37) and the threshold of ΔIPSS ≥ 4 for determining clinical progression of BPH-LUTS (37–40). However, given the extensive overlap between changes in latent disability at the observed MDD and below it (decreases smaller than three total IPSS points and, to a certain extent, increases in total IPSS), using only the change in total IPSS to evaluate response may overlook many patients who benefit from treatment. The same reasoning applies to patients who experience worsening of their symptoms.

Discussion regarding the developed sampling-based GAM smooth methodology for evaluating ICCs is presented in the Supplemental Discussion.

Longitudinal Modeling

In both the longitudinal total IPSS and IRT models, a model describing treatment as present or absent best described the treatment effect, although three different drug doses (10 mg, 20 mg, and 30 mg) were included in the analyzed trial. The lack of observed dose-response and exposure-response relationships may be explained by the narrow dose range studied in the current trial. Including at least four active doses spanning at least a 10-fold range has previously been emphasized as necessary to characterize dose-exposure-response adequately (57). In the current trial, the width of the dose range was restricted because an increased incidence of prolonged testosterone suppression was expected at higher doses of degarelix. Further discussion of the longitudinal modeling and covariate analysis results is presented in the Supplemental Discussion.

The longitudinal bidimensional IRT model allowed for estimation of a differential drug effect on voiding and storage IPSS symptoms while preserving item-level information. This approach may be more in line with the different effects of therapy on the primary pathophysiologies behind voiding and storage symptoms (58,59). Limitations of the pharmacometric bidimensional model included the lack of longitudinal parameter precision estimates and the inability to include covariates. This can be attributed to the increased model complexity due to the presence of several latent variables, and other longitudinal pharmacometric multidimensional IRT models have reported similar issues (13,14). More advanced and computationally intensive methods for assessing parameter uncertainty (e.g., a non-parametric bootstrap) may be used to obtain parameter precision, but were beyond the scope of the current work. Item- and summary-level VPCs were therefore the primary basis for concluding adequate model fit and predictive performance. If longitudinal model stability and covariate identification are of primary interest, the longitudinal unidimensional IRT model may be a better-suited alternative. The unidimensional approach may also be advantageous for a more straightforward translation between changes in the summary IPSS and IRT-estimated disability. From a psychometric standpoint, both the unidimensional and bidimensional IPSS IRT approaches are valid (41).

Power

The longitudinal model-based analyses showed considerably higher power to detect a drug effect than the cross-sectional ANCOVA using only data from the visit 3 months post-dose. The higher power of longitudinal pharmacometric modeling compared with cross-sectional testing is not a novel finding and has previously been reported in several other therapeutic areas (9–11), yet comparison with a WOT estimand-based test has, to our knowledge, not been presented previously. These findings are discussed further in the Supplemental Discussion.

A modest increase in power to detect a drug effect was observed with unidimensional IRT modeling compared with the total IPSS model. This finding was unexpected, given that other longitudinal IRT applications have shown greater increases in power compared with longitudinal summary score modeling (9,15). Studies have shown that the larger the number of items in a questionnaire, the higher the power of IRT (60,61), which may explain why the summary IPSS model and the unidimensional IRT model had similar power in the current study, unlike analyses of questionnaires with more items. Furthermore, the heterogeneity in the item discrimination parameter values has been shown to affect the power of IRT compared with summary score modeling (62). For instance, for the 8-item Expanded Disability Status Scale (EDSS) in multiple sclerosis, pharmacometric IRT analysis showed a larger power increase over summary score modeling (63) than in the current study, which may be explained by the higher variability between discrimination parameter estimates of EDSS items (66% CV) compared with IPSS items (29% CV) (12). In the current work, the bidimensional pharmacometric IRT model was used to simulate the data on which the power to detect a drug effect was estimated for the unidimensional IRT and total IPSS models. A sensitivity analysis specifying the unidimensional IRT model as the simulation model was performed and confirmed the reported power difference between the pharmacometric unidimensional IRT model and the total IPSS model (data not shown).

A higher power to detect a drug effect was observed with the longitudinal bidimensional IRT model than with the unidimensional IRT model. This may be due to the differences in ICCs and disability scale of the multidimensional model compared with the unidimensional model, which, in turn, allow a more precise discernment of the drug effect. For a questionnaire where multidimensionality is substantiated, we hypothesize that the gain in power to detect a drug effect over a unidimensional IRT model may increase as the correlation between latent variables decreases, because this would gradually increase the difference in ICCs and disability scale. This is the first investigation of the impact of IRT dimensionality on the power to detect a drug effect, and the topic hence warrants further investigation. For example, the original application of pharmacometric IRT based on the ADAS-cog scale (9) investigated the power of a unidimensional IRT model; given findings suggesting that the ADAS-cog is multidimensional (64), it may also be of interest to assess the power of a multidimensional pharmacometric ADAS-cog IRT model.

A limitation of the current as well as previous pharmacometric IRT studies (9,15,63) was that simulation model bias was present in the power calculations: the pharmacometric IRT model used to simulate data was also used to estimate power and may therefore have favored the pharmacometric IRT approaches. Other approaches, such as developing longitudinal ordered categorical models for each item and simulating data from these, were considered. However, it is not clear whether the IPSS ICCs would be preserved or would require re-estimation on data simulated in this way, nor whether a meaningful comparison with previously reported reductions in sample size would then be feasible.

The current findings may help assess patients' underlying BPH-LUTS more precisely by utilizing the available item-level IPSS responses instead of considering only the sum of these scores. Furthermore, they may inform more efficient clinical development of BPH-LUTS treatments, although the gain in power to detect a drug effect was found to be lower than in previously reported applications with different scales describing different neurological conditions (9,15,63). IRT focuses on quantifying the information of questionnaires in specific patient populations; since the modeled data spanned the entire range of total IPSS (i.e., from the lowest to the highest possible disease severity), the presented results may be extended to the analysis of the IPSS in other clinical trials including similar patients with moderate-to-severe BPH-LUTS, regardless of treatment and its effect size.

The current study emphasizes the importance of quantifying the increase in power to detect a drug effect with pharmacometric IRT modeling when it is applied to different measurement scales, as this increase may differ greatly depending on the internal characteristics of the scale. Knowing the size of the increase in power may be essential for informing a drug developer's decision to implement the more complex IRT methodology. For completeness, it should be noted that pharmacometric modeling of longitudinal data is not the current standard for detecting drug effects in clinical trials. Further research regarding, e.g., its general alignment with traditional statistical analyses, the adequacy of its underlying assumptions, its type I error control, and its pre-specification (65–67) is needed before it may be regarded as the primary analysis method and thereby dictate the sample size of clinical trials.

The IRT methodology may be implemented in all clinical trials where composite scores are used to assess treatment efficacy, i.e., from proof-of-concept phase II to confirmatory phase III trials. However, the shift from using "observed total score" to "underlying disease" as the estimand summary measure (32) may represent a substantial paradigm shift and may therefore require framework developments supervised by regulators. An example could be the development of standardized item banks based on a large number of item-level patient responses from many trials. This would inform precise ICCs and thereby allow for precise and, most importantly, consistent estimation of latent disability across different clinical trials. The merit and practical utility of IRT in increasing the efficiency of clinical development programs appear to be already recognized within the US Food and Drug Administration (68).

CONCLUSION

Pharmacometric models were developed based on item-level and summary-level IPSS, respectively, to describe the time course of underlying disability and total IPSS in patients with moderate-to-severe BPH-LUTS in a clinical trial setting. IRT analysis revealed that the voiding IPSS items combined contained the majority of the information content, which may have implications for the analysis of IPSS subscores. The unidimensional IRT model showed slightly higher power to detect a drug effect than the composite score model, while the bidimensional IRT model further increased the power. Taking the multidimensional nature of the IPSS into account in a pharmacometric IRT framework may hence allow for more precise quantification of drug effects and optimization of statistical power.

ACKNOWLEDGMENTS

The authors would like to thank Sebastian Ueckert and Leticia Arrington for their valuable input during the research. This work was funded jointly by the Danish Innovation Fund (grant number 5189-00064b), Ferring Pharmaceuticals A/S, and the Swedish Research Council Grant 2018-03317.

AUTHOR CONTRIBUTIONS

Y.K.L. wrote the manuscript and analyzed the data. Y.K.L., D.M.J., T.M.L., A.C.H., and M.O.K. designed the research. D.M.J., T.M.L., A.C.H., and M.O.K. reviewed the manuscript.

FUNDING INFORMATION

Open access funding provided by Uppsala University.

COMPLIANCE WITH ETHICAL STANDARDS

Conflict of Interest Y.K.L. and D.M.J. are employees of Ferring Pharmaceuticals A/S. All other authors declare that they have no conflicts of interest.

12. Novakovic AM, Krekels EHJ, Munafo A, Ueckert S, Karlsson MO. Application of item response theory to modeling of expanded disability status scale in multiple sclerosis. AAPS J. 2017;19(1):172–9.
13. Krekels E, Novakovic AM, Vermeulen AM, Friberg LE, Karlsson MO. Item response theory to quantify longitudinal placebo and paliperidone effects on PANSS scores in schizophrenia. CPT Pharmacomet Syst Pharmacol. 2017;6(8):543–51.
14. Gottipati G, Karlsson MO, Plan EL. Modeling a composite score in Parkinson's disease using item response theory. AAPS J. 2017;19(3):837–45.
15. Buatois S, Retout S, Frey N, Ueckert S. Item response theory as an efficient tool to describe a heterogeneous clinical rating scale in de novo idiopathic Parkinson's disease patients. Pharm Res. 2017;34(10):2109–18.
16. Baker FB. The basics of item response theory. Second Edition [Internet]. For full text: http://ericae; 2001 [cited 2019 May 23]. Available from: https://eric.ed.gov/?id=ED458219
17. DeMars C. Item response theory. Oxford, New York: Oxford University Press; 2010. 144 p. (Understanding Statistics).
18. D'Agate S. PAGE 2018 III-77. Development of a drug-disease model describing individual IPSS trajectories in BPH patients: implication of disease progression and covariate factors on long term treatment response.
19. Samejima F. Estimation of latent ability using a response pattern of graded scores. Psychometrika. 1969;34(1):1–97.
20. Schindler E, Friberg LE, Lum BL, Wang B, Quartino A, Li C, et al. A pharmacometric analysis of patient-reported outcomes in breast cancer patients through item response theory. Pharm Res. 2018;35(6):122.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission
directly from the copyright holder. To view a copy of this 21. Ueckert S. Modeling composite assessment data using item licence, visit http://creativecommons.org/licenses/by/4.0/. response theory. CPT Pharmacomet Syst Pharmacol. 2018;7(4):205–18. 22. Thurstone LL. Multiple factor analysis. Psychol Rev. 1931;38(5):406–27. REFERENCES 23. De Ayala RJ, Hertzog MA. The assessment of dimensionality for use in item response theory. Multivar Behav Res. 1991;26(4):765–92. 24. Samejima F. Graded response model. In: van der Linden WJ, 1. Berry SJ, Coffey DS, Walsh PC, Ewing LL. The development of Hambleton RK, eds. Handbook of modern item response human benign prostatic hyperplasia with age. J Urol. theory. New York: Springer; 1997:85–100. 1984;132(3):474–9. 25. Kaiser HF. The varimax criterion for analytic rotation in factor 2. Medina JJ, Parra RO, Moore RG. Benign prostatic hyperplasia analysis. Psychometrika. 1958;23(3):187–200. (the aging prostate). Med Clin North Am. 1999;83(5):1213–29. 26. Lacroix BD, Friberg LE, Karlsson MO. Evaluation of IPPSE, 3. Parsons JK, Mougey J, Lambert L, Wilt TJ, Fink HA, Garzotto an alternative method for sequential population PKPD analysis. M, et al. Lower urinary tract symptoms increase the risk of falls J Pharmacokinet Pharmacodyn. 2012 Apr;39(2):177–93. in older men. BJU Int. 2009;104(1):63–8. 27. Pilla Reddy V, Kozielska M, Johnson M, Vermeulen A, de 4. Calais Da Silva F, Marquis P, Deschaseaux P, Gineste JL, Greef R, Liu J, et al. Structural models describing placebo Cauquil J, Patrick DL. Relative importance of sexuality and treatment effects in schizophrenia and other neuropsychiatric quality of life in patients with prostatic symptoms. Results of an disorders. Clin Pharmacokinet. 2011;50(7):429–50. international study. Eur Urol 1997;31(3):272–280. 28. Tornøe CW, Agersø H, Nielsen HA, Madsen H, Jonsson EN. 5. 
Taylor BC, Wilt TJ, Fink HA, Lambert LC, Marshall LM, Population pharmacokinetic modeling of a subcutaneous depot Hoffman AR, et al. Prevalence, severity, and health correlates for GnRH antagonist degarelix. Pharm Res. 2004 of lower urinary tract symptoms among older men: the MrOS Apr;21(4):574–84. study. Urology. 2006 Oct;68(4):804–9. 29. Savic RM, Karlsson MO. Importance of shrinkage in empirical 6. Jacobsen SJ, Jacobson DJ, Girman CJ, Roberts RO, Rhodes T, Bayes estimates for diagnostics: problems and solutions. AAPS Guess HA, et al. Natural history of prostatism: risk factors for J. 2009;11(3):558–69. acute urinary retention. J Urol. 1997;158(2):481–7. 30. Vong C, Bergstrand M, Nyberg J, Karlsson MO. Rapid sample 7. Barry MJ, Fowler FJ, O’Leary MP, Bruskewitz RC, Holtgrewe size calculations for a defined likelihood ratio test-based power HL, Mebust WK, et al. The American Urological Association in mixed-effects models. AAPS J. 2012;14(2):176–86. symptom index for benign prostatic hyperplasia. The Measure- 31. Wählby U, Bouw MR, Jonsson EN, Karlsson MO. Assessment ment Committee of the American Urological Association. J of type I error rates for the statistical sub-model in NONMEM. Urol. 1992;148(5):1549–57 discussion 1564. J Pharmacokinet Pharmacodyn. 2002;29(3):251–69. 8. Griffith JW. Self-report measurement of lower urinary tract 32. International Conference on Harmonisation E9(R1) addendum: symptoms: a commentary on the literature since 2011. Curr Urol statistical principles for clinical trials - estimands and sensitivity Rep. 2012;13(6):420–6. analysis in clinical trials < https://www.ema.europa.eu/en/docu- 9. Ueckert S, Plan EL, Ito K, Karlsson MO, Corrigan B, Hooker ments/scientific-guideline/ich-e9-r1-addendum-estimands-sensi- AC. Improved utilization of ADAS-cog assessment data tivity-analysis-clinical-trials-guideline-statistical- through item response theory based pharmacometric modeling. principles_en.pdf> (2020). Accessed March 11, 2020. Pharm Res. 
2014;31(8):2152–65. 33. Beal SL, Sheiner LB, Boeckmann A. NONMEM user’s guides 10. Karlsson KE, Vong C, Bergstrand M, Jonsson EN, Karlsson Ellicott City. 2009. MO. Comparisons of analysis methods for proof-of-concept 34. Chalmers RP. mirt: a multidimensional item response theory trials. CPT Pharmacomet Syst Pharmacol. 2013;2(1):e23. package for the R environment. J Stat Softw. 2012;48(1):1–29. 11. Nelander, Karin, Hamrénn, B, Johansson, S, Åstrand, M. PAGE 35. Keizer RJ, Zandvliet AS, Beijnen JH, Schellens JHM, Huitema 2016 III-33 Longitudinal dose-response modelling as primary ADR. Performance of methods for handling missing categorical analysis of a clinical study. The AAPS Journal (2020) 22:115 115 Page 15 of 15 covariate data in population pharmacokinetic analyses. AAPS J. 53. Everaert K, Anderson P, Wood R, Andersson FL, Holm-Larsen 2012;14(3):601–11. T. Nocturia is more bothersome than daytime LUTS: results 36. Barry MJ, Williford WO, Chang Y, Machi M, Jones KM, from an observational, real-life practice database including 8659 Walker-Corkery E, et al. Benign prostatic hyperplasia specific European and American LUTS patients. Int J Clin Pract. health status measures in clinical research: how much change in 2018;72(6):e13091. the American Urological Association symptom index and the 54. Gratzke C, Bachmann A, Descazeaud A, Drake MJ, benign prostatic hyperplasia impact index is perceptible to Madersbacher S, Mamoulakis C, et al. EAU guidelines on the patients? J Urol. 1995;154(5):1770–4. assessment of non-neurogenic male lower urinary tract symp- 37. Barry MJ, Cantor A, Roehrborn CG, CAMUS Study Group. toms including benign prostatic obstruction. Eur Urol. Relationships among participant international prostate symp- 2015;67(6):1099–109. tom score, benign prostatic hyperplasia impact index changes 55. Schou J, Poulsen AL, Nordling J. 
The value of a new symptom and global ratings of change in a trial of phytotherapy in men score (DAN-PSS) in diagnosing uro-dynamic infravesical ob- with lower urinary tract symptoms. J Urol. 2013;189(3):987–92. struction in BPH. Scand J Urol Nephrol. 1993;27(4):489–92. 38. McConnell JD, Roehrborn CG, Bautista OM, Andriole GL, Dixon 56. Donovan JL, Peters TJ, Abrams P, Brookes ST, de aa Rosette CM, Kusek JW, et al. The long-term effect of doxazosin, finasteride, JJ, Schäfer W. Scoring the short form ICSmaleSF questionnaire. and combination therapy on the clinical progression of benign International Continence Society J Urol. 2000;164(6):1948–55. prostatic hyperplasia. N Engl J Med. 2003;349(25):2387–98. 57. European Medicines Agency. Report from dose finding work- 39. Roehrborn CG, Siami P, Barkin J, Damião R, Major-Walker K, shop <https://www.ema.europa.eu/en/documents/report/report- Nandy I, et al. The effects of combination therapy with european-medicines-agency/european-federation-pharmaceuti- dutasteride and tamsulosin on clinical outcomes in men with cal-industries-associations-workshop-importance-dose-finding- symptomatic benign prostatic hyperplasia: 4-year results from dose_en.pdf> (2015). Accessed May 31st, 2020. the CombAT study. Eur Urol. 2010 Jan 1;57(1):123–31. 58. Caine M. The present role of alpha-adrenergic blockers in the 40. Tacklind J, Fink HA, Macdonald R, Rutks I, Wilt TJ. treatment of benign prostatic hypertrophy. J Urol. Finasteride for benign prostatic hyperplasia. Cochrane Data- 1986;136:1):1–4. base Syst Rev. 2010;10:CD006015. 59. Andersson K-E. Storage and voiding symptoms: pathophysio- 41. Welch G, Kawachi I, Barry MJ, Giovannucci E, Colditz GA, logic aspects. Urology. 2003;62(5):3–10. Willett WC. Distinction between symptoms of voiding and 60. Holman R, Glas CAW, de Haan RJ. Power analysis in filling in benign prostatic hyperplasia: findings from the health randomized clinical trials based on item response theory. 
professionals follow-up study. Urology. 1998;51(3):422–7. Control Clin Trials. 2003;24(4):390–410. 42. Barry MJ, Williford WO, Fowler FJ, Jones KM, Lepor H. 61. Doostfatemeh M, Taghi Ayatollah SM, Jafari P. Power and Filling and voiding symptoms in the American Urological sample size calculations in clinical trials with patient-reported Association symptom index: the value of their distinction in a outcomes under equal and unequal group sizes based on graded Veterans Affairs randomized trial of medical therapy in men response model: a simulation study. Value Health. with a clinical diagnosis of benign prostatic hyperplasia. J Urol. 2016;19(5):639–47. 2000;164(5):1559–64. 62. Schindler E, Friberg LE, Karlsson MO. PAGE 2015 II-01 43. Yokoyama O, Ozeki A, Suzuki N, Murakami M. Early improvement Comparison of item response theory and classical test theory of storage or voiding symptoms by tadalafil predicts treatment for power/sample size for questionnaire data with various outcomes in patients with lower urinary tract symptoms from benign degrees of variability in items’ discrimination parameters. 2015. prostatic hyperplasia. Int J Urol. 2018;25(3):240–5. 63. Novakovic AM. Longitudinal models for quantifying disease 44. US Food and Drug Administration. Guidance for the non- and therapeutic response in multiple sclerosis. Uppsala: Acta clinical and clinical investigation of devices used for the Universitatis Upsaliensis; 2017. treatment of benign prostatic hyperplasia (BPH) (2010). 64. Verma N, Markey MK. Item response analysis of Alzheimer’s <https://www.fda.gov/regulatory-information/search-fda-guid- disease assessment scale. Conf Proc Annu Int Conf IEEE Eng ance-documents/guidance-non-clinical-and-clinical-investiga- Med Biol Soc IEEE Eng Med Biol Soc Annu Conf. tion-devices-used-treatment-benign-prostatic-hyperplasia> 2014;2014:2476–9. Accessed March 20, 2020. 65. Bieth B. et al Population Approach Group Europe (PAGE) 45. 
Montorsi F, Henkel T, Geboers A, Mirone V, Arrosagaray P, Model-based analyses for pivotal decisions, with an application Morrill B, et al. Effect of dutasteride, tamsulosin and the to equivalence testing for biosimilars Abstr 2343 (2012). combination on patient-reported quality of life and treatment 66. Musuamba F, Manolis E, Holford N, Cheung S, Friberg L, satisfaction in men with moderate-to-severe benign prostatic Ogungbenro K, et al. Advanced methods for dose and regimen hyperplasia: 4-year data from the CombAT study. Int J Clin finding during drug development: summary of the EMA/EFPIA Pract. 2010;64(8):1042–51. workshop on dose finding (London 4–5 December 2014). CPT 46. Lee JY, Lee DH, Lee H, Bang WJ, Hah YS, Cho KS. Clinical Pharmacomet Syst Pharmacol. 2017 Jul;6(7):418–29. implications of a feeling of incomplete emptying with little post- 67. Marshall S, Madabushi R, Manolis E, Krudys K, Staab A, void residue in men with lower urinary tract symptoms. Dykstra K, et al. Model-informed drug discovery and develop- Neurourol Urodyn. 2014;33(7):1123–7. ment: current industry good practice and regulatory expecta- 47. Asplund R. The nocturnal polyuria syndrome (NPS). Gen tions and future perspectives. CPT Pharmacomet Syst Pharmacol. 1995 Oct;26(6):1203–9. Pharmacol. 2019;8(2):87–96. 48. Miller M. Nocturnal polyuria in older people: pathophysiology 68. Younis, I. Clinical trial database analyses to inform regulatory and clinical implications. J Am Geriatr Soc. 2000;48(10):1321–9. guidances: improving the efficiency of schizophrenia clinical 49. Homma Y, Yamaguchi T, Kondo Y, Horie S, Takahashi S, trials. The International Society for CNS Clinical trials and Kitamura T. Significance of nocturia in the international methodology (ISCTM) 14th Annual Scientific Meeting <https:// prostate symptom score for benign prostatic hyperplasia. J Urol. isctm.org/public_access/Feb2018/Presentations/S2-Younis.pdf> 2002;167(1):172–6. (2018) Accessed July 15th, 2020. 50. 
Trafford Crump R, Sehgal A, Wright I, Carlson K, Baverstock R. From prostate health to overactive bladder: developing a crosswalk for the IPSS to OAB-V8. Urology. 2019;125:73–8. Publisher’s Note Springer Nature remains neutral with regard 51. Peterson AC, Sehgal A, Crump RT, Baverstock R, Sutherland JM, Carlson K. Evaluating the 8-item overactive bladder to jurisdictional claims in published maps and institutional questionnaire (OAB-v8) using item response theory. Neurourol affiliations. Urodyn. 2018;37(3):1095–100. 52. Agarwal A, Eryuzlu LN, Cartwright R, Thorlund K, Tammela TLJ, Guyatt GH, et al. What is the most bothersome lower urinary tract symptom? Individual- and population-level perspectives for both men and women. Eur Urol. 2014;65(6):1211–7.

Journal: The AAPS Journal (Springer Journals)

Published: Aug 27, 2020
