Detecting Multiple Change Points Using Adaptive Regression Splines with Application to Neural Recordings

Time series, as frequently the case in neuroscience, are rarely stationary, but often exhibit abrupt changes due to attractor transitions or bifurcations in the dynamical systems producing them. A plethora of methods for detecting such change points in time series statistics have been developed over the years, in addition to test criteria to evaluate their significance. Issues to consider when developing change point analysis methods include computational demands, difficulties arising from either limited amount of data or a large number of covariates, and arriving at statistical tests with sufficient power to detect as many changes as contained in potentially high-dimensional time series. Here, a general method called Paired Adaptive Regressors for Cumulative Sum is developed for detecting multiple change points in the mean of multivariate time series. The method's advantages over alternative approaches are demonstrated through a series of simulation experiments. This is followed by a real data application to neural recordings from rat medial prefrontal cortex during learning. Finally, the method's flexibility to incorporate useful features from state-of-the-art change point detection techniques is discussed, along with potential drawbacks and suggestions to remedy them.

Keywords: change point, cumulative sum, adaptive regression splines, nonstationary, bootstrap test, block-permutation, behaviour, spike counts

arXiv:1802.03627v3 [stat.ME] 3 Sep 2018

1 Introduction

Stationary data are the exception rather than the rule in many areas of science (Aston & Kirch, 2012; Elsner, Niu, & Jagger, 2004; Z. Fan, Dror, Mildorf, Piana, & Shaw, 2015; Gärtner, Duvarci, Roeper, & Schneider, 2017; Latimer, Yates, Meister, Huk, & Pillow, 2015; Paillard, 1998; Shah, Lam, Ng, & Murphy, 2007; Stock & Watson, 2014).
Time series statistics often change, sometimes abruptly, due to transitions in the underlying system dynamics, adaptive processes or external factors. In neuroscience, both behavioural time series (Durstewitz, Vittoz, Floresco, & Seamans, 2010; Powell & Redish, 2016; A. C. Smith et al., 2004) and their neural correlates (Durstewitz et al., 2010; Gärtner et al., 2017; Latimer et al., 2015; Powell & Redish, 2016; Roitman & Shadlen, 2002) exhibit strongly nonstationary features which relate to important cognitive processes such as learning (Durstewitz et al., 2010; Powell & Redish, 2016; A. C. Smith et al., 2004) and perceptual decision making (Hanks & Summerfield, 2017; Latimer et al., 2015; Roitman & Shadlen, 2002). As such, identifying nonstationary features in behavioural and neural time series becomes necessary, both for interpreting the data in relation to the potential influences generating those features, and for removing those features from the data in order to perform statistical analyses that assume stationary observations (Hamilton, 1994; Shumway & Stoffer, 2010). Abrupt jumps in time series statistics form one important class of nonstationary events. These are often caused by bifurcations, which, in turn, may occur with gradual changes in parameters of the underlying system (Strogatz, 2001). Consequently, they are of wide interest to both statistical data analysis and the study of dynamical systems, and are commonly referred to as change points (CP; Chen & Gupta, 2012).

Detecting CPs has a long and varied history in statistics, and we will not attempt to exhaustively survey the different approaches, including regression models (Brown, Durbin, & Evans, 1975; Quandt, 1958), Bayesian techniques (Chernoff & Zacks, 1964) and cumulative sum (CUSUM) statistics (Basseville, 1988; Page, 1954), to name but a few, within the limited scope of this article.
Instead, we refer the reader to the excellent reviews on the topic (Aminikhanghahi & Cook, 2017; Bhattacharya, 1994; Chen & Gupta, 2012) and focus on the offline CUSUM class of methods (Hinkley, 1971a) to which PARCS belongs (as opposed to sequential CUSUM methods, Page, 1954, that locate a CP online, while the time series is evolving), specifically methods that aim at detecting CPs in the mean of the time series. CUSUM-based methods are powerful, easy to implement, and are backed up by an extensive literature, theoretical results and various extensions to multiple CPs and multivariate scenarios, making them an ideal starting point. These methods assume that the time series is piecewise stationary in the statistic under consideration (e.g., piecewise constant mean) and rely on a cumulative sum transformation of the time series. Commonly, at-most-one-change (AMOC) is identified by maximum-type statistics (Kirch, 2007) at the extremum of the curve resulting from that transformation (Antoch, Hušková, & Veraverbeke, 1995; Basseville, 1988).

Extending the CUSUM method to multiple CPs usually involves repetitive partitioning of the time series upon each detection (binary segmentation methods; Bai, 1997; Cho & Fryzlewicz, 2015; Fryzlewicz, 2014; Olshen, Venkatraman, Lucito, & Wigler, 2004; Scott & Knott, 1974). This segmentation procedure, however, may hamper detection in later iterations as the reduction in number of observations depletes statistical power exponentially fast as more CPs are to be retrieved. In this article, we develop the PARCS (Paired Adaptive Regressors for Cumulative Sum) method which offers a straightforward extension that leverages the full time series in order to detect multiple CPs, thus providing a new solution to this issue.
PARCS rests on the fact that a CUSUM transformation of the data relates to computing an integral transformation of the piecewise constant mean time series model, resulting in a piecewise linear mean function that bends at potential CPs and could be approximated by adaptive regression spline methods (Friedman, 1991; Friedman & Silverman, 1989; Stone, Hansen, Kooperberg, & Truong, 1997). Namely, rather than attempting to approximate the discontinuous time series mean directly (Efron, Hastie, Johnstone, & Tibshirani, 2004; Vert & Bleakley, 2010), the PARCS model is an approximation to the continuous CUSUM-transformed time series by a piecewise linear function. The bending points of the PARCS model are each defined by a pair of non-overlapping piecewise linear regression splines that are first selected by a two-stage iterative procedure.

The PARCS model is further refined by a nonparametric CP significance test based on bootstraps (Antoch & Hušková, 2001; Dümbgen, 1991; Hušková, 2004; Kirch, 2007; Matteson & James, 2014). While analytically derived parametric tests may usually be preferable over bootstrap-based tests due to better convergence and coverage of the tails, in the current CP setting closed form expressions for parametric tests are hard to come by and are usually replaced by approximations (Gombay & Horváth, 1996; Horváth, 1997). In this case, tests based on bootstraps are preferable since they are known to converge faster to the limit distribution of the test statistic (often they are also not as conservative as parametric approximations for datasets of a relatively small size; Antoch & Hušková, 2001; Csörgő & Horváth, 1997; Kirch, 2007).
In order to accommodate the possibility of temporally dependent noise in the data (Antoch, Hušková, & Prášková, 1997; Horváth, 1997; Picard, 1985), model selection is carried out by a nonparametric block-permutation bootstrap procedure (Davison & Hinkley, 1997; Hušková & Slabý, 2001; Kirch, 2007) developed specifically for PARCS, which relies on a test statistic that quantifies the amount of bending at each candidate CP. Since model estimation is based on linear regression, PARCS is also effortlessly extended to spatially independent, multivariate time series.

The article is structured as follows. Section 2.1 introduces the CUSUM method for AMOC detection. We then develop the PARCS method, presenting in Section 2.2 the procedure for inferring a nested model that allows for significance testing of multiple CPs, followed in Section 2.3 by an outline of the nonparametric permutation test procedure for refining the PARCS model further. Results in Section 3 illustrate that PARCS improves on several issues inherent in classical methods for change point analysis. In Section 3.1, we compare the PARCS approach to the CUSUM method in detecting a single CP, followed in Section 3.2 by a comparison with standard binary segmentation in detecting multiple CPs. We also demonstrate in Section 3.3 that PARCS is successful in detecting CPs in spatially independent, multivariate time series. We then present in Section 3.4 an example from the neurosciences, in which neural and behavioural CPs are compared during operant rule-switching learning (Durstewitz et al., 2010). Finally, we discuss in Section 4 the PARCS approach in relation to other state-of-the-art CP detection methods, along with drawbacks and potential extensions.

2 Methods

This section outlines the CUSUM method and the PARCS extension to multiple CPs, in addition to a nonparametric permutation technique to test for the statistical significance of CPs as identified by PARCS.
For generality, the formulation assumes temporally dependent observations in the time series, independent observations being a special case.

2.1 CUSUM: Cumulative Sum of Differences to the Mean

A class of methods for identifying a single CP in the mean relies on computing a CUSUM transformation of the time series $x = \{x_t\}_{1:T}$. A useful formulation that allows for dependent observations in the time series is given by the moving average (MA) step model (Antoch et al., 1997; Horváth, 1997; Kirch, 2007; Lombard & Hart, 1994),

$$x_t = b + w\,\mathbb{1}_{t-c} + \sum_{\tau \ge 0} \theta_\tau \epsilon_{t-\tau}; \quad \theta_0 = 1; \quad \epsilon_t \sim \mathcal{N}(0, \sigma^2), \tag{1}$$

where a jump in the time series mean from baseline $b$ to $b + w$ occurs after time step $c$, the change point. The step parameter or weight $w$ is positive (negative) when the time series mean increases (decreases) following $c$. The largest integer $\tau$ such that noise coefficient $\theta_\tau \neq 0$ defines a finite order $q$ of the MA process, which is 0 for temporally independent observations. We will assume that the MA process is stationary, which will always be the case if it is finite, with $\epsilon_t$ independent and identically distributed (i.i.d.) random variables (for an infinite process, points $x_t$ for $t \le 0$ may be considered unobserved, and coefficients $\theta_\tau$ have to fulfil certain conditions to make the process stationary, as given, for instance, in Shumway & Stoffer, 2010). The Gaussian noise assumption in the MA process can be relaxed, as long as the noise process has zero mean and finite, constant variance (see Antoch et al., 1997; Horváth, 1997; Kirch, 2007; Lombard & Hart, 1994, for theoretical results on the more general form of dependent noise). The discrete Heaviside step function, $\mathbb{1}_{t-c}$, is defined by,

$$\mathbb{1}_i = \begin{cases} 1 & \text{if } i > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Identifying the presence of a CP requires testing the null hypothesis, $H_0 : w = 0$, against the alternative, $H_1 : w \neq 0$ (Antoch et al., 1995; Lombard & Hart, 1994). This begins by inferring the time of the step according to a CP locator statistic.
A typical offline CP locator statistic is the maximum point of the weighted absolute cumulative sum of differences to the mean (Antoch & Hušková, 2001; Horváth, 1997),

$$\hat{c} = \arg\max_{0<t<T} \left( \frac{T}{t(T-t)} \right)^{\gamma} \left| \sum_{\tau=1}^{t} \left( x_\tau - \langle x \rangle \right) \right|, \tag{2}$$

where $\langle x \rangle$ is the arithmetic mean of the time series (see Figure 1A). The first term on the right-hand side corrects for bias toward the centre, where more centrally-located points are down-weighted by an amount controlled by parameter $\gamma \in [0, 0.5]$. Other CUSUM-based locator statistics exist with different bias-correcting terms and cumulative sum transformations (Antoch et al., 1997; Bhattacharya, 1994; Jirak, 2012; Kirch, 2007). As outlined in the Discussion, PARCS may be modified to include such bias-correcting terms as well. However, as we will demonstrate, PARCS can significantly reduce centre bias even without recourse to such a term. To show this, we will mostly deal with the generic case, $\gamma = 0$, when comparing PARCS to the CUSUM transformation as defined in Eq. 2. This has the added advantage of avoiding having to select an optimal power $\gamma$ or an optimal weight factor, a choice that usually depends on prior assumptions on the CP's potential location (Bhattacharya, 1994). As such, and unless stated otherwise, the term CUSUM transformation will henceforth refer to the cumulative sum of differences to the mean,

$$y_t \triangleq \sum_{\tau=1}^{t} \left( x_\tau - \langle x \rangle \right), \tag{3}$$

where the maximum value,

$$S = \max_{0<t<T} \left| \sum_{\tau=1}^{t} \left( x_\tau - \langle x \rangle \right) \right| = \max_{0<t<T} |y_t|, \tag{4}$$

defines a test statistic by which it is decided whether to reject the null hypothesis.

Figure 1: Paired Adaptive Regressors for Cumulative Sum; (A,B) time series $x$ with (A) one or (B) two step changes and their corresponding CUSUM transformation $y$; (C) fitting $y$ by a piecewise linear model $\hat{y}$ using two pairs of regressors $h_1$ and $h_2$; (D) the PARCS model fit $\hat{y}$ to the CUSUM transformation $y$ of a time series $x$, returning estimates of multiple CPs, $\hat{c}_1$ and $\hat{c}_2$.

Given potentially dependent observations, $q > 0$, as defined by the model in Eq.
1, nonparametric bootstrap testing proceeds by block-permutation (Davison & Hinkley, 1997; Hušková, 2004; Kirch, 2007), such that temporal dependence in the data is preserved (see Section 3.2). The candidate CP $\hat{c}$ is identified according to Eq. 2 and its associated test statistic $S$ is computed by Eq. 4. Estimates $\hat{b}$ and $\hat{w}$ are retrieved from the arithmetic means of $x$ before and after $\hat{c}$ using the model in Eq. 1. By subtracting $\hat{w}\,\mathbb{1}_{t-\hat{c}}$ from the time series $x$ we arrive at a time series $x_0$ that provides an estimate of the null distribution. The stationary time series $x_0$ is split into $n$ blocks of size $k$, chosen such that temporal dependencies are mostly preserved in the permuted time series (Davison & Hinkley, 1997). One way to do so is to select the block size to be larger than the order of the underlying MA process, $q + 1$ (since the autocorrelation function of an MA($q$) process cuts off at order $q$; Davison & Hinkley, 1997). This requires identifying the order $q$ which can be determined from the $H_0$-conform time series $x_0$ by inspecting its autocorrelation function (J. Fan & Yao, 2003) for different time lags $\tau$. The autocorrelation function's asymptotic distribution (Kendall & Stuart, 1983),

$$\mathrm{acorr}(x_0, \tau) \sim \mathcal{N}\bigl( -1/(T-\tau),\; +1/(T-\tau) \bigr), \tag{5}$$

provides a test statistic for deciding the largest time lag $q$ at which to reject the null hypothesis $H_0 : \mathrm{acorr}(x_0, q) = 0$, given some preset significance level $\alpha \in [0, 1]$. The resulting blocks are randomly permuted and each permutation is CUSUM-transformed according to Eq. 3 to compute an $H_0$-conform sample $S_0$ of the test statistic $S$ in Eq. 4 (note that we do not know the true step parameter $w$ or the true CP $c$, of course, such that this procedure will only yield an estimate of the $H_0$ distribution). A sufficiently large number $B$ of permutations results in samples $S_0$ of an $H_0$-conform empirical distribution function (EDF) $F(S_0) \triangleq \frac{1}{B} \sum_{i=1}^{B} \mathbb{1}_{S_0 - S_i}$ that weighs every sample $S_i$ equally.
The candidate CP $\hat{c}$ is detected when the test statistic $S$ as computed from the original time series $x$ satisfies $S \ge F^{-1}(1 - \alpha)$, where $\alpha$ is a preset significance level and $F^{-1}(1 - \alpha)$ the inverse of the EDF, defined as the $(1 - \alpha)B$-th largest value out of $B$ permutations (Davison & Hinkley, 1997; Durstewitz, 2017).

2.2 PARCS: Paired Adaptive Regressors for Cumulative Sum

The PARCS method for estimating multiple CPs rests on the fact that the integral of a piecewise constant function is piecewise linear. The AMOC model as defined in Eq. 1 assumes a piecewise stationary MA process, consisting of two segments with constant mean. A process consisting of $M + 1$ segments generalises Eq. 1 to data containing at-most-$M$-change,

$$x_t = b + \sum_{m=1}^{M} w_m\,\mathbb{1}_{t-c_m} + \sum_{\tau \ge 0} \theta_\tau \epsilon_{t-\tau}; \quad \theta_0 = 1; \quad \epsilon_t \sim \mathcal{N}(0, \sigma^2). \tag{6}$$

The CUSUM transformation $y = \{y_t\}_{1:T}$ of this process as given by Eq. 3 corresponds to the numerical integration of a piecewise stationary process $x - \langle x \rangle$. That is, $y$ is approximately (due to the noise) piecewise linear (exactly piecewise linear in the mean; see Figure 1B).
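The piecewise linearity of the CUSUM transformation can be checked numerically. The following is a minimal sketch, assuming white noise ($q = 0$) and illustrative values for $T$, the CPs and the step weights; the helper names `cusum` and `step_mean` are hypothetical and not part of the original method description.

```python
import numpy as np

def cusum(x):
    """CUSUM transformation (Eq. 3): cumulative sum of differences to the mean."""
    x = np.asarray(x, dtype=float)
    return np.cumsum(x - x.mean())

def step_mean(T, b, cps, weights):
    """Noise-free piecewise constant mean of Eq. 6 (jump w_m after time step c_m)."""
    t = np.arange(1, T + 1)
    mean = np.full(T, float(b))
    for c, w in zip(cps, weights):
        mean += w * (t > c)
    return mean

# Illustrative (assumed) parameters: two CPs in a series of length 100.
T, cps, weights = 100, (30, 70), (1.0, -2.0)
mean = step_mean(T, b=0.0, cps=cps, weights=weights)
y_mean = cusum(mean)

# The slope of y between consecutive time steps changes only where the mean jumps.
slopes = np.diff(y_mean)
bends = np.flatnonzero(~np.isclose(np.diff(slopes), 0.0)) + 2  # 1-based time steps
print(bends)

# With additive white Gaussian noise, y is only approximately piecewise linear.
rng = np.random.default_rng(0)
x = mean + rng.normal(0.0, 0.5, size=T)
y = cusum(x)
```

In the noise-free case the slope of `y_mean` changes only at the two CPs, which is the property the spline fit exploits.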
If points $\{c_m\}_{1:M}$ at which $y$ bends were known, the latter can be fitted by a weighted sum of local piecewise linear basis functions or splines, centred at the knots $\{c_m\}_{1:M}$,

$$h^{+}_{t,c_m} = \begin{cases} t - c_m & \text{if } t > c_m, \\ 0 & \text{otherwise,} \end{cases} \quad \text{and} \quad h^{-}_{t,c_m} = \begin{cases} c_m - t & \text{if } t < c_m, \\ 0 & \text{otherwise.} \end{cases}$$

This fit corresponds to modelling the expected value of $y$, conditioned on spline pair set $H = \{h^{\pm}_{c_m}\}_{1:M}$, resulting in model inference,

$$E[y_t \mid H] \approx \hat{y}_{t,M} = \hat{\beta}_0 + \sum_{m=1}^{M} \hat{\beta}^{+}_m h^{+}_{t,c_m} + \sum_{m=1}^{M} \hat{\beta}^{-}_m h^{-}_{t,c_m},$$

which is a simple regression problem that can be solved by estimating the intercept $\hat{\beta}_0$ and coefficients $\hat{\beta}^{\pm}_m$ that minimise the mean-square-error,

$$\mathrm{mse}(y, \hat{y}_M) = \frac{1}{T} \sum_{t=1}^{T} \left( y_t - \hat{y}_{t,M} \right)^2. \tag{7}$$

However, in the multiple CP detection setting (assuming $M$ is known), optimal knot placement is not known a priori, but can be inferred by adaptively adjusting knot locations (Friedman, 1991; Friedman & Silverman, 1989; Stone et al., 1997) to maximally satisfy the goodness-of-fit criterion in Eq. 7. In other words, and as shown in Figures 1C,D, the problem of identifying multiple CPs is replaced by the equivalent problem of inferring the order-$M$ PARCS model (or PARCS$_M$ model),

$$\hat{y}_M = \hat{\beta}_0 + \sum_{m=1}^{M} \hat{\beta}^{+}_m h^{+}_{\hat{c}_m} + \sum_{m=1}^{M} \hat{\beta}^{-}_m h^{-}_{\hat{c}_m}, \tag{8}$$

with associated $M$-tuple $\hat{c} \triangleq (\hat{c}_m)_{1:M}$ that best fits the CUSUM transformation of the time series. Regression coefficients in model 8 are real numbers, while knots in the present time series context are positive integers, excluding the first and last time steps, $\hat{c}_m \in \{2, 3, \ldots, T - 1\}$.

Fitting the PARCS model is based on a forward/backward spline selection strategy (P. L. Smith, 1982) with added CP ranking stage and proceeds as outlined in Algorithm 1. Starting with an empty PARCS model, containing only the intercept $\hat{\beta}_0$, a forward sweep increases model complexity to a forward upper bound order $L > M$ by adding at each iteration the spline pair $h^{\pm}_c$, not yet contained in the model, that decreases residual mean-square-error the most.
A reasonable heuristic for setting $L$ is 2 to 3 times $M$ (assuming $M$ is known or given some liberal guess). This is followed by a backward pruning iteration, in which the spline pair whose removal increases residual mean-square-error the least is dropped from the model. Pruning removes those knots that were added at the beginning of the forward phase which became redundant as the model was refined by later additions (Friedman & Silverman, 1989). This stage continues until the number of knots reaches the preset final upper bound of model complexity $M$, i.e., $L - M$ knots are pruned. Knots are then sorted in descending order according to the amount of explained variance. The ranking iteration returns a nested model by pruning the PARCS$_M$ model further, down to the PARCS$_1$ model. The first knot to be pruned, reducing the number of knots to $M - 1$, explains the least variance and is placed last as $\hat{c}_M$ in the $M$-tuple $\hat{c}$. The last knot to be pruned explains the most variance and is placed first as $\hat{c}_1$. Note that regression coefficients are re-estimated every time a knot is added to or removed from the PARCS model.

Algorithm 1: Procedure for inferring the PARCS$_M$ model with forward/backward spline selection (first/second loop) and CP ranking (third loop). Regression coefficients are computed by least squares estimation, conditioned on the set of knot locations of predefined size $M$ that minimises mean-square-error. Final knot locations are specified by eliminating spurious knots through block-permutation bootstrapping as described in Section 2.3.

Input: $L$, $M$ and $y$
Output: $\hat{c} \triangleq (\hat{c}_m)_{1:M}$ and $\hat{y}_M$
$\hat{c}, H \leftarrow \emptyset$
for $m \leftarrow 1$ to $L$ do  // forward stage
    $\hat{c}_m \leftarrow \arg\min_{1<c<T} \mathrm{mse}(y, \hat{y}_m)$, with $\hat{y}_m$ fitted on $H \cup h^{\pm}_c$ and $h^{\pm}_c \notin H$
    $H \leftarrow H \cup h^{\pm}_{\hat{c}_m}$
for $m \leftarrow L$ to $M + 1$ do  // pruning stage
    $\hat{c}_m \leftarrow \arg\min \mathrm{mse}(y, \hat{y}_{m-1})$, with $\hat{y}_{m-1}$ fitted on $H \setminus h^{\pm}_{\hat{c}}$
    $\hat{c} \leftarrow \hat{c} \setminus \hat{c}_m$ and $H \leftarrow H \setminus h^{\pm}_{\hat{c}_m}$
$\hat{y}_M \leftarrow \hat{\beta}_0 + \sum_{m=1}^{M} \hat{\beta}^{+}_m h^{+}_{\hat{c}_m} + \sum_{m=1}^{M} \hat{\beta}^{-}_m h^{-}_{\hat{c}_m}$
for $m \leftarrow M$ to $1$ do  // ranking stage
    $\hat{c}_m \leftarrow \arg\min \mathrm{mse}(y, \hat{y}_{m-1})$, with $\hat{y}_{m-1}$ fitted on $H \setminus h^{\pm}_{\hat{c}}$
    $H \leftarrow H \setminus h^{\pm}_{\hat{c}_m}$
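The three stages of the knot selection procedure (forward, pruning, ranking) can be illustrated with a short Python sketch. This is not the authors' implementation: the function names `spline_pair`, `fit` and `parcs_knots` are hypothetical, the search is a plain exhaustive loop over candidate knots $c \in \{2, \ldots, T-1\}$, and `numpy.linalg.lstsq` is assumed as the least-squares solver (it returns a minimum-norm solution when the paired columns are collinear).

```python
import numpy as np

def spline_pair(t, c):
    """Paired hinge regressors h+ and h- centred at knot c."""
    return np.maximum(t - c, 0.0), np.maximum(c - t, 0.0)

def fit(y, t, knots):
    """Least-squares fit of an intercept plus one spline pair per knot.

    Returns (mse, coefficients); coefficients are re-estimated on every call.
    """
    cols = [np.ones_like(t)]
    for c in knots:
        cols.extend(spline_pair(t, c))
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((y - X @ coef) ** 2), coef

def parcs_knots(y, L, M):
    """Forward selection up to L knots, pruning down to M, then ranking.

    Returns the M knots ordered from most to least explained variance.
    """
    T = len(y)
    t = np.arange(1, T + 1, dtype=float)
    knots = []
    for _ in range(L):  # forward stage: add the knot that lowers the mse the most
        candidates = [c for c in range(2, T) if c not in knots]
        best = min(candidates, key=lambda c: fit(y, t, knots + [c])[0])
        knots.append(best)
    while len(knots) > M:  # pruning stage: drop the knot whose removal costs least
        drop = min(knots, key=lambda c: fit(y, t, [k for k in knots if k != c])[0])
        knots.remove(drop)
    ranked, remaining = [], list(knots)
    while remaining:  # ranking stage: knots pruned first are ranked last
        drop = min(remaining,
                   key=lambda c: fit(y, t, [k for k in remaining if k != c])[0])
        remaining.remove(drop)
        ranked.insert(0, drop)
    return ranked

# Illustrative (assumed) example: noise-free mean with jumps after t = 30 and t = 70.
t = np.arange(1, 101)
x = 1.0 * (t > 30) - 2.0 * (t > 70)
y = np.cumsum(x - x.mean())
print(parcs_knots(y, L=6, M=2))
```

The sketch returns only ranked knot locations; significance testing of the returned knots is a separate step.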
The model can be effortlessly extended to the multiple response setting in the case of spatially independent time series (extension to a nondiagonal MA covariance matrix, Stone et al., 1997, will be considered elsewhere). Given $N$ independent, piecewise stationary MA processes with common CPs $\{c_m\}_{1:M}$,

$$x_{t,n} = b_n + \sum_{m=1}^{M} w_{mn}\,\mathbb{1}_{t-c_m} + \sum_{\tau \ge 0} \theta_\tau \epsilon_{t-\tau,n}; \quad \theta_0 = 1; \quad \epsilon_{t,n} \sim \mathcal{N}(0, \sigma^2), \tag{9}$$

where $n = 1, \ldots, N$, the corresponding multivariate CUSUM transformation $y = \{y_n\}_{1:N}$ is fitted by the multiple response, PARCS$_M$ model, conditioned on common spline pairs,

$$E[y_{t,n} \mid H] \approx \hat{y}_{t,M,n} = \hat{\beta}_{0n} + \sum_{m=1}^{M} \hat{\beta}^{+}_{mn} h^{+}_{t,\hat{c}_m} + \sum_{m=1}^{M} \hat{\beta}^{-}_{mn} h^{-}_{t,\hat{c}_m},$$

using Algorithm 1. Returning CPs that are common to all variables $x_n$ is done by using the goodness-of-fit criterion in Eq. 7, averaged over all responses $y_n$.

2.3 PARCS Model Selection by Block-Permutation Bootstrap

The piecewise linear PARCS formulation, Eq. 8, of the CUSUM transformation in Eq. 3 bends at the CPs. Due to the presence of noise in the original time series $x$, some noise realisations may appear as slight bends in the CUSUM-transformed time series, leading PARCS to return false CPs. As such, the amount of bending at knot $\hat{c}_m$ can be used as a test statistic for bootstrap significance testing that can refine the PARCS model further. No bending indicates either a constant fit, $\hat{\beta}^{+}_m = \hat{\beta}^{-}_m = 0$, or a smooth linear fit, $\hat{\beta}^{+}_m = -\hat{\beta}^{-}_m$ (see also Figure 1C). Thus, a suitable test statistic that quantifies the amount of bending at $\hat{c}_m$ is given by,

$$S = \left| \hat{\beta}^{+}_m + \hat{\beta}^{-}_m \right|, \tag{10}$$

where for multivariate time series, the test statistic is the average over all time series.

Before describing the block-permutation bootstrap method for PARCS, we outline a procedure for identifying the order $q$ of the MA noise process, provided as pseudocode in Algorithm 2. First, an $H_0$-conform time series $x_0 = \{x_{t,0}\}_{1:T}$ is computed by regressing out the PARCS model $\hat{y}$ of Eq.
8 from the CUSUM-transformed time series $y$ and then inverting the CUSUM transformation. This is followed by inspecting the autocorrelation function of $x_0$ for different time lags $\tau$. The largest time lag at which the null hypothesis $H_0 : \mathrm{acorr}(x_0, q) = 0$ is rejected, given some preset significance level $\alpha \in [0, 1]$ is then returned as the order $q$, given some predefined upper bound of MA order, $Q$.

Algorithm 2: Identifying the order $q$ of the MA process, given some upper bound $Q$. The $H_0$-conform time series $x_0$ is estimated before entering the loop. The loop increases the autocorrelation time lag and exits when the autocorrelation of $x_0$ is not significantly different from 0 anymore.

Input: $y$, $\hat{c}$, $Q$ and $\alpha$
Output: $q \le Q$
$q \leftarrow Q$
$y_0 \leftarrow y - \left( \hat{\beta}_0 + \sum_{m=1}^{M} \hat{\beta}^{+}_m h^{+}_{\hat{c}_m} + \sum_{m=1}^{M} \hat{\beta}^{-}_m h^{-}_{\hat{c}_m} \right)$
$x_{t,0} \leftarrow y_{t,0} - y_{t-1,0} + \langle x \rangle$ for $t = 1, \ldots, T$, where $y_{0,0} = 0$
for $\tau \leftarrow 1$ to $Q$ do
    $F \leftarrow \mathrm{CDF}\bigl[ \mathcal{N}\bigl( -1/(T-\tau),\; +1/(T-\tau) \bigr) \bigr]$
    if $\mathrm{acorr}(x_0, \tau) \in \bigl[ F^{-1}(\alpha/2),\; F^{-1}(1 - \alpha/2) \bigr]$ then
        $q \leftarrow \tau - 1$
        break

Given the $M$-tuple CP set $\hat{c}$ returned by Algorithm 1 and an estimate of the dependent normal noise order $q$ by Algorithm 2, a block-permutation bootstrap test returns the subset $\hat{\varsigma}$ of significant CPs, as outlined in Algorithm 3. First, an $H_0$-conform time series $x_0$ is computed. For each CP $\hat{c}_m \in \hat{c}$, starting with the one ranked highest, all CP-splines already deemed significant by the bootstrap test are regressed out of $y$. A PARCS model with the remaining knots, including $\hat{c}_m$, is estimated and the test statistic $S$, evaluated at $\hat{c}_m$ according to Eq. 10, is computed. Knot $\hat{c}_m$ is tested for significance against an $H_0$-conform EDF, estimated through block-permutation bootstrapping: A total of $B$ bootstrap samples is generated from the $H_0$-conform series $x_0$ by randomly permuting blocks of size $k = q + 1$. For each of these $i = 1 \ldots B$ bootstrap samples test statistic $S$ is evaluated at knot location $\hat{c}_m$, yielding an EDF $F(S_0)$ which assigns equal probability $1/B$ to each bootstrapped $S_i$.
A significant $\hat{c}_m$ is then added to $\hat{\varsigma}$, or rejected as false discovery otherwise. The procedure repeats for the CP next in the rank order. Similar to Algorithm 1, regression coefficients are re-estimated every time a knot is added to or removed from the PARCS model.

Algorithm 3: Block-permutation bootstrap procedure for PARCS, given block size $k$. The $H_0$-conform time series $x_0$ is estimated before entering the loop. The loop iterates over the rank-ordered CPs to test for each CP's significance.

Input: $y$, $\hat{c}$, $k$, $B$ and $\alpha$
Output: $\hat{\varsigma} \subseteq \hat{c}$
$\hat{\varsigma} \leftarrow \emptyset$
$y_0 \leftarrow y - \left( \hat{\beta}_0 + \sum_{m=1}^{M} \hat{\beta}^{+}_m h^{+}_{\hat{c}_m} + \sum_{m=1}^{M} \hat{\beta}^{-}_m h^{-}_{\hat{c}_m} \right)$
$x_{t,0} \leftarrow y_{t,0} - y_{t-1,0} + \langle x \rangle$ for $t = 1, \ldots, T$, where $y_{0,0} = 0$
for $m \leftarrow 1$ to $M$ do
    $y_{\hat{c} \setminus \hat{\varsigma}} \leftarrow y - \left( \hat{\beta}_0 + \sum_{j=1}^{|\hat{\varsigma}|} \hat{\beta}^{+}_j h^{+}_{\hat{\varsigma}_j} + \sum_{j=1}^{|\hat{\varsigma}|} \hat{\beta}^{-}_j h^{-}_{\hat{\varsigma}_j} \right)$
    $\hat{y}_{\hat{c} \setminus \hat{\varsigma}} \leftarrow \hat{\beta}_0 + \sum_{j=m}^{M} \hat{\beta}^{+}_j h^{+}_{\hat{c}_j} + \sum_{j=m}^{M} \hat{\beta}^{-}_j h^{-}_{\hat{c}_j}$
    $S \leftarrow \left| \hat{\beta}^{+}_m + \hat{\beta}^{-}_m \right|$
    $F(S_0) \leftarrow \mathrm{BlockPermutationBootstrap}(x_0, \hat{c}_m, k, B)$
    if $S \ge F^{-1}(1 - \alpha)$ then
        $\hat{\varsigma} \leftarrow \hat{\varsigma} \cup \hat{c}_m$

3 Results

We first evaluate the PARCS method on synthetic data in single and multiple CP detection settings, followed by a real data example on detecting behavioural and neural change points during rule learning.

3.1 Alleviating CUSUM Bias in AMOC Detection

We first compare the CUSUM method for detecting a single CP to the PARCS approach in order to evaluate the effect of each method on the centre bias in CP detection. Both white and MA Gaussian noise are considered. We also compare PARCS to the CUSUM locator statistic of Eq. 2 with $\gamma = 0.5$ (the maximum likelihood estimator of CP location under the assumption of i.i.d. Gaussian noise) and identify conditions under which one method is preferable over the other.

Univariate time series of length $T = 100$ are simulated according to the step model in Eq. 1 with different levels of white Gaussian noise, $\sigma \in \{0.4, 0.5, \ldots, 1.0\}$, and different ground truth CP locations, $c \in \{20, 30, \ldots, 80\}$. Baseline is set to $b = 0$ and step parameter to $w = 1$. Note that in the step model with white Gaussian noise, increasing $\sigma$ is equivalent to reducing $w$.
A single CP was identified by using the CUSUM method and estimating the PARCS model, both followed by bootstrap significance testing with $B = 10000$ permutations, nominal significance level $\alpha = 0.05$, and blocks of size $k = 1$ (since noise is independent in this example). Each parameter configuration was repeated for 1000 noise realisations.

We compare bias in CP detection toward the centre of the time series in both the CUSUM and PARCS methods. We measure this centre bias by $cb = (2 \cdot \mathbb{1}_{c - T/2} - 1)(c - \hat{c})$, which is positive when estimate $\hat{c}$ falls onto the side located toward the centre from $c$, and is negative otherwise. As expected given the choice $\gamma = 0$ in Eq. 2, the CUSUM method shows a strong centre bias which increases for lower signal-to-noise ratio and more peripheral CPs (see Figure 2A). The CUSUM method's power decreases for harder parameter settings (higher $\sigma$ and more peripheral $c$) in that true CPs are missed in more of the realisations. The PARCS method results in a significant reduction in centre bias but does not eliminate it completely, and yields more misses relative to CUSUM if both are run at the same nominal $\alpha$ level (see Figure 2B). Summary comparison between the two methods is shown in Figure 2C for two exemplary CP locations, $c \in \{20, 60\}$. To fully appreciate the source of CUSUM centre bias and its reduction by PARCS, time series realisations with the two hardest parameter settings ($\sigma = 1.0$ and $c \in \{20, 80\}$) are considered in Figure 2D, which compares the distribution of $cb$ in the 81% of realisations in which both CUSUM and PARCS returned a CP. The histograms show a strongly skewed, heavy-tailed distribution for CUSUM, compared to a more symmetric distribution around 0 for PARCS, indicating only little bias. Most of the centre bias in PARCS is accounted for by outliers. This is illustrated by excluding outliers in the boxplots, which show a median of 1 time step centre bias in PARCS against median centre bias of 4 time steps in the case of CUSUM.
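As a small worked illustration of the centre bias measure defined above, the sign convention can be written as a helper function. The name `centre_bias` and the example values are assumptions made here for illustration only.

```python
def centre_bias(c, c_hat, T):
    """Centre bias cb = (2 * 1[c > T/2] - 1) * (c - c_hat).

    Positive when the estimate c_hat lies on the side of the true CP c
    that faces the centre T/2 of the series, negative otherwise.
    """
    sign = 1 if c > T / 2 else -1
    return sign * (c - c_hat)

# Illustrative (assumed) values for a series of length T = 100.
print(centre_bias(c=20, c_hat=24, T=100))  # estimate shifted toward the centre
print(centre_bias(c=80, c_hat=76, T=100))  # estimate shifted toward the centre
print(centre_bias(c=20, c_hat=18, T=100))  # estimate shifted toward the periphery
```

The first two calls give a positive value because both estimates are displaced toward $T/2$; the third gives a negative value.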
Note that measuring centre bias as defined above does not differentiate between biased detections and false discoveries where, in extreme cases, a CP may be detected beyond the middle point $T/2$ of the time series, corresponding to centre bias greater than $|c - T/2|$. However, this scenario rarely occurred in the simulation results reported here.

While PARCS reduces centre bias, the simulation results above indicate that it behaves more conservatively than CUSUM at the same nominal $\alpha$ level. In principle, type II error rates (missed CPs) may be reduced by adjusting the $\alpha$ level, at the same time producing more false discoveries. In order to assess how well the nominal significance level $\alpha$ agrees with the empirical type I error rate (false discoveries), 1000 white Gaussian noise realisations of length $T = 100$ are simulated with $\sigma = 1.0$. Conclusions drawn from this analysis are largely the same for larger signal-to-noise ratio (results not shown). A single CP was extracted using the CUSUM method and estimating the PARCS model on these time series conforming to the null hypothesis $H_0 : w = 0$. Type I error rates at different nominal $\alpha$ levels are shown in Figure 2E as probability-probability (P-P) plots, depicting the nominal, $1 - \alpha$, against the empirical, $1 - \hat{\alpha}$, probabilities of accepting the null hypothesis when the null hypothesis is true. While the empirical type I error rate of CUSUM perfectly agrees with the nominal significance level, for PARCS, in contrast, the empirical rate of false discoveries tends toward 0% for $\alpha = 0.05$ and remains smaller than 1% for $\alpha$ as large as 0.18. This entails that PARCS behaves highly conservatively, and that the nominal $\alpha$ level may be adjusted considerably upward without strongly influencing the false discovery rate.
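The single-CP CUSUM test of Section 2.1, which serves as the baseline in these simulations, can be sketched as follows. This is an illustrative reading of the procedure rather than the authors' code: the function names are hypothetical, $\gamma = 0$ is assumed in Eq. 2, and `numpy.quantile` is used as a stand-in for the inverse EDF $F^{-1}(1 - \alpha)$.

```python
import numpy as np

def cusum_stat(x):
    """Return (c_hat, S) from Eq. 2 (with gamma = 0) and Eq. 4."""
    y = np.cumsum(x - x.mean())[:-1]        # y_t for 0 < t < T
    c_hat = int(np.argmax(np.abs(y))) + 1   # 1-based time step
    return c_hat, float(np.abs(y[c_hat - 1]))

def block_permute(x, k, rng):
    """Randomly permute consecutive blocks of size k (last block may be shorter)."""
    blocks = [x[i:i + k] for i in range(0, len(x), k)]
    order = rng.permutation(len(blocks))
    return np.concatenate([blocks[i] for i in order])

def cusum_bootstrap_test(x, k=1, B=1000, alpha=0.05, seed=0):
    """Block-permutation bootstrap test for at most one change in the mean."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    c_hat, S = cusum_stat(x)
    # H0-conform series: subtract the estimated step after c_hat.
    w_hat = x[c_hat:].mean() - x[:c_hat].mean()
    x0 = x.copy()
    x0[c_hat:] -= w_hat
    # Bootstrap samples of the test statistic under H0.
    S0 = np.array([cusum_stat(block_permute(x0, k, rng))[1] for _ in range(B)])
    threshold = np.quantile(S0, 1 - alpha)  # assumed stand-in for F^{-1}(1 - alpha)
    return c_hat, S, bool(S >= threshold)

# Illustrative (assumed) example: white noise with a clear jump after t = 50.
rng = np.random.default_rng(1)
x = rng.normal(0.0, 0.5, 100)
x[50:] += 3.0
c_hat, S, significant = cusum_bootstrap_test(x, k=1, B=500)
print(c_hat, significant)
```

Repeating this test over many pure-noise realisations and counting rejections would give an empirical type I error rate of the kind compared against the nominal level in the P-P plots.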
On the other hand, despite being more conservative, Figure 2I shows that the receiver operating characteristic (ROC) curve for PARCS, depicting the method's false discovery rate against its power for different nominal $\alpha$ levels, consistently lies above that of CUSUM. For estimating the statistical power of each method, 1000 white Gaussian noise realisations with one CP at a random location in the range $[20, 80]\%\,T$ (and $w = \sigma = 1$) are simulated and type II error rates at different nominal $\alpha$ levels are computed. This ROC analysis indicates that for every nominal level $\alpha$ for CUSUM, there exists at least one nominal $\alpha$ for PARCS such that PARCS has both higher power (fewer type II errors) and lower false detection rate (fewer type I errors), making it the preferable method. This point is explored in more detail later in the context of multiple CP detection in Section 3.2. For shorter time series, PARCS behaves similarly as for longer time series in our simulations, while for CUSUM type I error rates now start to fall below the nominal level as well (Figures 2F-H). The area under the ROC curves become smaller for shorter time series for both methods, but the ROC curve of PARCS consistently lies above that of CUSUM in those cases as well (Figures 2J-L).

Figure 2: Centre bias in PARCS compared to CUSUM for temporally independent noise; (A,B) bias, $\langle \hat{c} - c \rangle$, colour-coded as indicated by the colour bar; numbers indicate rounded type II error rates; (C) bias $\pm$ s.e.m. for $c = 20$ (solid) and $c = 60$ (dashed); (D) centre bias distributions for $c \in \{20, 80\}$ and $\sigma = 1.0$; inset shows centre bias distributions as boxplots that mark the median and first and third quartiles; whiskers include points within 1.5 times the interquartile range; outliers are excluded; (E-H) P-P plots comparing nominal (x-axis) versus factual (y-axis) true $H_0$ rejection rates in time series of length (E) $T = 100$, (F) $T = 50$, (G) $T = 26$, and (H) $T = 10$; dotted vertical line, nominal $\alpha = 0.05$; dotted horizontal line, factual $\hat{\alpha} = 0.05$; (I-L) ROC curves depicting false discovery rate (type I error rate; x-axis) versus power (y-axis) for different series lengths as in E-H; dotted vertical line, nominal $\alpha = 0.05$; In E-L, larger filled circles indicate the empirical $H_0$ rejection rates at a nominal $\alpha = 0.05$, and empty circles indicate where the factual $\hat{\alpha} \approx 0.05$.

Next, we examine the behaviour of the tests with dependent noise. In the case of temporally dependent noise, an appropriate block size for the bootstrap procedure could be determined by inspecting the autocorrelation function of $x_0$ (see Eq. 5 and Algorithm 2). 1000 noise realisations of length $T = 100$ are drawn from an order-2 MA process with coefficients $\theta_1 = 0.5/\sigma$ and $\theta_2 = 0.4/\sigma$. Since increasing noise variance in the temporally dependent case is not equivalent to decreasing the step parameter, we repeat the analysis with the same parameters from the temporally independent case above but varying the step parameter, $w \in \{0.7, 0.8, \ldots, 1.3\}$ and considering two levels of Gaussian noise, $\sigma \in \{0.7, 1.0\}$. Figure 3 shows results of the comparison for $\sigma = 0.7$ (top row) and $\sigma = 1.0$ (bottom row). Similar to the white noise case, the CUSUM method's centre bias increases for smaller signal-to-noise ratio (smaller $w$), and PARCS, in comparison, consistently reduces centre bias.
For the same nominal α level, PARCS misses more of the true CPs than CUSUM for peripheral CPs when σ = 0.7 (top panels of Figures 3A,B), but the type II error rates of the two methods are more comparable in the high noise case, σ = 1.0 (bottom panels of Figures 3A,B), despite the PARCS method's more conservative behaviour (far lower type I error rates) in this setting as well. A summary comparison between the two methods is shown in Figure 3C for two exemplary CP locations, c ∈ {20, 60}. Despite a significant reduction when using PARCS, both centre bias distributions in the 62% of realisations with a CP identified by the two methods, and with the two hardest parameter settings (c ∈ {20, 80}, σ = 1.0 and w = 0.7), remain strongly skewed (bottom panel of Figure 3D).

Figure 3: Centre bias in PARCS compared to CUSUM for temporally dependent noise with σ = 0.7 (top) and σ = 1.0 (bottom); (A,B) bias, ⟨ĉ − c⟩, colour-coded as indicated by the colour bar; numbers indicate rounded type II error rates; (C) bias ± s.e.m. for c = 20 (solid) and c = 60 (dashed); (D) centre bias distributions for c ∈ {20, 80} and σ = 1.0; inset shows centre bias distributions as boxplots that mark the median and first and third quartiles; whiskers include points within 1.5 times the interquartile range; outliers are excluded.

So far, we compared PARCS to the CUSUM statistic with γ = 0. It is intuitive when developing PARCS to choose γ = 0 for the CUSUM transformation in Eq. 2, since this directly corresponds to the numerical integral of the time series upon which the PARCS approach is based (but see Section 4). Besides, under certain conditions, the CUSUM method using the test statistic with γ < 0.5 is more sensitive than that with γ = 0.5 (Antoch et al., 1995). However, the CUSUM statistic with γ = 0.5 returns the maximum likelihood estimator of CP location in an AMOC scenario when noise in the step model of Eq. 1 is i.i.d. and normally distributed, leading theoretically to the strongest centre bias reduction under those conditions (Antoch & Hušková, 2001). We therefore also compare PARCS to this maximum likelihood CUSUM estimator here, henceforth referred to as CUSUM_ML.

Univariate time series of length T = 100 are simulated according to the step model of Eq. 1 with different ground truth CP locations, c ∈ {20, 30, ..., 80}. We consider only the scenario with the largest white Gaussian noise variance, σ = 1.0, in this analysis, for which PARCS showed the largest centre bias. A single CP was identified by using the CUSUM_ML method and by estimating the PARCS_1 model, both followed by bootstrap significance testing. Other parameters are as in the previous analyses above. CUSUM_ML results in a significant reduction in centre bias compared to PARCS in three of the most peripheral ground truth CPs, c ∈ {20, 30, 80}, but does not eliminate it completely (see Figure 4A). For the same nominal α level, CUSUM_ML also shows lower type II error rates compared to PARCS for these CPs as reported in Table 1, but recall that PARCS has a far lower type I error rate than CUSUM for the same choice of nominal α (cf. Figures 2E-H and Kirch, 2007). The two methods are comparable in the quality of their detections for all other ground truth CP locations, c ∈ {40, ..., 70}, with PARCS having a slight advantage.

Figure 4: Bias ± s.e.m. in PARCS compared to CUSUM_ML with time series of length (A) T = 100, (B) T = 50, and (C) T = 26; noise is temporally independent with σ = 1.0.
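The role of the weighting exponent γ can be made concrete with a minimal sketch. Eq. 2 is not reproduced in this section, so the code below assumes the weighted CUSUM form that is common in the change point literature, (T / (t (T − t)))^γ |Σ_{i≤t} (x_i − x̄)|; the normalisation used in the paper may differ, and the function name `cusum_locate` is hypothetical.

```python
def cusum_locate(x, gamma=0.0):
    """Locate a single change point with a weighted CUSUM statistic.

    Assumed form (not taken from the paper's Eq. 2):
        stat(t) = (T / (t * (T - t))) ** gamma * |sum_{i<=t} (x_i - mean(x))|
    Returns (t_hat, stat), where t_hat samples precede the estimated change.
    """
    T = len(x)
    mean = sum(x) / T
    best_t, best_stat, running = 0, float("-inf"), 0.0
    for t in range(1, T):                 # keep both segments non-empty
        running += x[t - 1] - mean        # cumulative sum of centred data
        stat = (T / (t * (T - t))) ** gamma * abs(running)
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat


# Noise-free step: 20 samples at 0, then 80 samples at 1.
step = [0.0] * 20 + [1.0] * 80
t_plain, _ = cusum_locate(step, gamma=0.0)   # unweighted CUSUM
t_ml, _ = cusum_locate(step, gamma=0.5)      # exponent of the ML variant
print(t_plain, t_ml)                         # prints: 20 20
```

Without noise both exponents recover the step exactly. The centre bias discussed above only emerges once noise is added, because γ = 0 leaves central time points unweighted while γ = 0.5 boosts the statistic towards the edges of the series.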
In order to assess how well the two methods fare in the small sample size limit, and to characterise the convergence behaviour of the bootstrap procedure in each method, we repeat the same analysis for shorter series lengths, T ∈ {50, 26}. Ground truth CPs are set to the same relative location within the time series as in the T = 100 simulations. As summarised in Table 1, detection rates deteriorate as series length decreases, as does the bias relative to series length (where the relative location within the series with respect to the periphery is more relevant than the absolute CP location; see Figures 4B,C). Especially for T = 26, PARCS performs mostly better than CUSUM_ML, giving higher detection rates (see Table 1) and smaller centre bias (Figure 4C) in the majority of ground truth CPs, although it is still more conservative with near 0% type I error rate (given the bootstrap resolution; see Figure 2G). As we show next, this is a particularly important advantage of PARCS over the CUSUM-based methods when detecting multiple CPs, since CUSUM-based techniques rely on dissecting the time series into smaller segments in this case, reducing sample size at each iteration.

Table 1: Type II error rates (%) in PARCS compared to CUSUM_ML for different lengths T of the time series; columns give the ground truth CP location c = round(p%T); nominal level, 0.05 for both methods.

T     method      p = 20   30   40   50   60   70   80
100   CUSUM_ML        07   03   02   01   01   03   12
      PARCS           17   03   01   01   01   03   16
50    CUSUM_ML        30   20   16   13   16   26   37
      PARCS           44   22   12   08   10   19   41
26    CUSUM_ML        53   38   35   32   38   44   59
      PARCS           68   41   32   24   29   37   58

3.2 Detecting Multiple CPs in Univariate Data

For the scenario with multiple CPs, we assess the performance of PARCS in comparison to the CUSUM method with standard binary segmentation (Bai, 1997; Scott & Knott, 1974) for univariate data with white Gaussian noise. Standard binary segmentation is known to mislocate CPs in some scenarios, but modifications
to the segmentation procedure have been proposed for solving this problem (Fryzlewicz, 2014). We show that PARCS provides an alternative approach. We then discuss a fundamental practical problem in statistical testing when using segmentation methods in general that is avoided by PARCS. Through comparison with standard binary segmentation, we illustrate conditions under which using such methods becomes infeasible. We then consider temporally dependent noise in univariate time series.

The binary segmentation method (Bai, 1997; Scott & Knott, 1974) for detecting multiple CPs proceeds as follows (pseudocode can be found in Fryzlewicz, 2014): If, according to a CUSUM test criterion, a CP ĉ_1 is detected over the full time series, the series is partitioned at ĉ_1. The procedure is repeated on the resulting left and right segments, potentially returning two additional CPs, ĉ_21 and ĉ_22, respectively. The procedure is terminated when no more CPs are detected after subsequent partitioning. Similar to the single CP case, we use bootstrap testing in deciding the significance of a CP at each stage. In the present context, we will refer to this CUSUM-based binary segmentation method simply by 'CUSUM'.

Three different processes with white Gaussian noise, σ = 1.0, of the form given by Eq. 6 and of length T = 100 are simulated for 1000 realisations each. A baseline b = 0 and two CPs at time steps c_1 = 20 and c_2 = 60 are set in all three scenarios. Weights (w_1, w_2) are set to (1, 2), (2, −1) and (2, 1) (see insets in top row of Figure 5).
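The recursion described above can be sketched in a few lines. This is an illustrative sketch only, not the authors' implementation: it uses the unweighted CUSUM statistic, a plain permutation test in place of the bootstrap test of the paper, and recurses until no further CP is significant. The names `cusum_stat`, `permutation_pvalue`, `binary_segmentation` and the parameters `n_perm` and `min_len` are hypothetical.

```python
import random


def cusum_stat(x):
    """Return (t_hat, max |CUSUM|) for the mean-centred series x."""
    T = len(x)
    mean = sum(x) / T
    running, best_t, best = 0.0, 0, 0.0
    for t in range(1, T):
        running += x[t - 1] - mean
        if abs(running) > best:
            best_t, best = t, abs(running)
    return best_t, best


def permutation_pvalue(x, n_perm, rng):
    """Share of shuffled series whose CUSUM maximum reaches the observed one."""
    observed = cusum_stat(x)[1]
    shuffled = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cusum_stat(shuffled)[1] >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def binary_segmentation(x, alpha=0.05, n_perm=200, min_len=4, rng=None, offset=0):
    """Recursively split x at significant CUSUM change points."""
    rng = rng or random.Random(0)
    if len(x) < 2 * min_len:              # segment too short to test
        return []
    t_hat, _ = cusum_stat(x)
    if t_hat == 0 or permutation_pvalue(x, n_perm, rng) > alpha:
        return []                          # no significant CP: stop here
    left = binary_segmentation(x[:t_hat], alpha, n_perm, min_len, rng, offset)
    right = binary_segmentation(x[t_hat:], alpha, n_perm, min_len, rng,
                                offset + t_hat)
    return left + [offset + t_hat] + right


# Steps at 20 and 60 with weights (1, 2); very low noise, for illustration only.
rng = random.Random(1)
levels = [0.0] * 20 + [1.0] * 40 + [3.0] * 40
series = [v + rng.gauss(0.0, 0.05) for v in levels]
cps = binary_segmentation(series, rng=random.Random(2))
print(cps)
```

Note that every recursive call tests a shorter segment than its parent, and that pure-noise segments are tested as well, so the returned list may contain false detections in addition to the two true CPs.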
For the present comparison, binary segmentation is terminated after at most one partitioning, as this completely suffices to compare the methods (note that this allows CUSUM to detect up to three potential CPs, with only two present in the series). Similarly, the PARCS model is estimated, and both methods use the corresponding permutation bootstrap test with B = 10000 and k = 1. In order to compare type I and type II error rates between the two methods, we set nominal α levels to 0.05 and 0.30 for CUSUM and PARCS, respectively, which is expected to return about the same factual type I error rates of 5% for both methods in series of length T ∈ [26, 100], according to Figures 2E-G. A first look at Figure 5 suggests that both CUSUM and PARCS detect CPs that are close to the ground truth. A more detailed comparison with respect to type I and type II error rates and the quality of CP detections is provided in Table 2.

Figure 5: Comparing PARCS to CUSUM with binary segmentation for multiple CP detection; stacked histograms of correct detection rates for CUSUM's ĉ_1, ĉ_21, ĉ_22 (top) and PARCS' ĉ_1, ĉ_2, ĉ_3 (bottom) over 1000 realisations; x-axis, time step; transparent bars show candidate CPs excluded by the permutation test; dashed grey, ground truth CPs; top inset, deterministic component of time series for the respective column's scenario; left, centre, and right panels refer to first, second, and third scenario, respectively.
The quality of detections using accuracy scores is defined as the correct detection rate within a ±5%T range from the ground truth CP location, adjusted for type I errors by an additive term of α̂/M, where α̂ is the factual (empirical) level. This way, the accuracy score is an overall performance measure that takes into account both type I and type II errors, and how far off the detected CP is from the true one.

Table 2: Comparing PARCS to CUSUM with binary segmentation for multiple CP detection for different lengths T of the time series; error rates and accuracy scores (%) are rounded; triplet, scenarios 1 / 2 / 3; nominal levels, 0.05 and 0.30 for CUSUM and PARCS, respectively.

T     method   type I error   type II error   accuracy, c_1 = 20%T   accuracy, c_2 = 60%T
100   CUSUM    10 / 14 / 41   08 / 02 / 01    72 / 92 / 77           91 / 78 / 45
      PARCS    02 / 03 / 02   04 / 00 / 01    80 / 96 / 95           96 / 74 / 76
50    CUSUM    12 / 18 / 19   27 / 21 / 11    34 / 56 / 61           74 / 38 / 16
      PARCS    04 / 04 / 03   13 / 02 / 05    51 / 82 / 82           85 / 52 / 52
26    CUSUM    12 / 22 / 18   41 / 44 / 16    28 / 44 / 71           77 / 26 / 25
      PARCS    06 / 07 / 06   24 / 09 / 10    37 / 75 / 76           79 / 47 / 47

The first scenario (left panels in Figure 5) has the hardest parameter setting, since c_1 is both more peripheral and smaller in magnitude than c_2. CUSUM first detects the easier CP, followed by detecting c_1 at the left hand segment as ĉ_21, but with a lower accuracy score as confirmed in Table 2. Similarly, PARCS returns c_2 and c_1 as the first and second rank CPs, respectively, with accuracy scores higher than those of CUSUM. The relatively low accuracy scores at detecting c_1 in both methods are due to its peripheral location and small magnitude. Both type I and type II error rates are markedly lower in PARCS.

The second and third scenarios (centre and right panels in Figure 5, respectively) are easier in terms of ground truth parameter settings, since the more peripheral CP c_1 has the larger magnitude.
In the second scenario (centre panels in Figure 5), the two methods are comparable with regard to their overall accuracy scores as defined above (many detections lie outside the ±5%T accuracy score range, especially for c_2), but PARCS has the lower type I and type II error rates. CUSUM has a higher rate of false discoveries than in the first scenario. Its first detection is the higher magnitude CP, which comes with a lower accuracy than PARCS, followed by a detection at the right hand segment with a higher accuracy than PARCS. The relatively low accuracy rates for detecting c_2 in both methods are due to its small magnitude. The third scenario (right panels in Figure 5) is an example of a setting in which standard binary segmentation may fail in correctly allocating CPs (see Fryzlewicz, 2014, for a binary segmentation approach that solves this problem). While the performance of PARCS remains about the same as in the second scenario, CUSUM's first detection diverges from either of the two ground truth CPs in a large number of realisations (see top-right panel in Figure 5). The large type I error rate (more than three times that in the first and second scenarios) markedly reduces the accuracy scores, which are substantially lower than for PARCS for both ground truth CPs (see Table 2). Another factor behind the high type I error rates of any iterative procedure, including binary segmentation, is that the same CP could be detected again at later iterations when an earlier detection is slightly biased.

We now compare the impact of shorter series on the performance of binary segmentation methods and PARCS. We consider 1000 noise realisations from each of the three scenarios with T ∈ {50, 26}. Two ground truth CPs are set to the same relative location within the time series as for T = 100.
As seen in Table 2, both type I and type II error rates increase in both methods with the decrease in series length, with the exception of type I error rates for CUSUM in the third scenario. PARCS is consistently the superior method, having both higher statistical power and fewer false discoveries than CUSUM. While accuracy scores predictably decrease with shorter series length, comparison between the two methods remains qualitatively similar to the T = 100 case in the first and third scenario, while there is a marked change in the second scenario: While CUSUM is slightly superior in accurately detecting c_2 for T = 100, PARCS progressively surpasses CUSUM in accuracy as series length decreases. This behaviour is a result of deterioration in the power of the bootstrap test statistic for shorter series. Not only does the overall sample size decrease, but CUSUM in the second scenario with T = 26 is tasked, after a potential first detection of c_1, with bootstrapping the CUSUM test statistic on segments as short as 4 or 5 time steps only, which is statistically infeasible, whether using bootstraps or approximate parametric tests (Cho & Fryzlewicz, 2015; Fryzlewicz, 2014; Olshen et al., 2004). We stress that this is a fundamental drawback to any method that relies on partitioning, and is not specific to standard binary segmentation.

The limitations of binary segmentation methods become more obvious when noise is temporally dependent. For instance, given a time series of length T = 100 with parameters as in the second scenario and an order-2 MA noise process, blocks of size k ≃ 3 are required for proper block-permutation. If CUSUM first detected c_1 = 20 accurately, ĉ_1 = 20, the left hand segment would be only 20 time steps long. This allows for only 7 blocks, yielding 5040 possible permutations. This number drops to 720 permutations had ĉ_1 been detected only 2 time steps further to the left, which makes it hard to approximate the EDF of the CUSUM statistic reliably.
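The permutation counts quoted above follow from simple arithmetic, which the short sketch below reproduces. It assumes that a trailing partial block counts as a block of its own (ceiling division), which is the reading consistent with 7 blocks for a 20-step segment; the function name `n_block_permutations` is hypothetical.

```python
import math


def n_block_permutations(segment_length, block_size):
    """Number of blocks and of distinct block orderings for one segment.

    Assumes a trailing partial block is kept as its own block.
    """
    n_blocks = math.ceil(segment_length / block_size)
    return n_blocks, math.factorial(n_blocks)


print(n_block_permutations(20, 3))   # prints: (7, 5040)
print(n_block_permutations(18, 3))   # prints: (6, 720)
```

Both counts are smaller than the B = 10000 resamples used for the tests elsewhere in this section, so on such short segments the resampled block orderings necessarily repeat.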
In addition, specifying the block size first requires estimating the MA process order by approximating the H_0-conform time series (see Algorithm 2), but potential CPs are not known a priori, due to the recursive nature of binary segmentation methods.

Given these considerations, we focus on PARCS only as we now move over to the case of detecting multiple CPs in series with temporally dependent noise. We draw 1000 noise realisations of length T = 100 from an order-2 MA process with σ = 0.7 and first and second coefficients of magnitude 0.5/σ and 0.4/σ, respectively. Other parameters are as in the previous analysis. In Figure 6, we illustrate distributions of correct detections for the second scenario only (with the other two scenarios qualitatively comparable to their counterparts in Figure 5), but error and accuracy rates on these scenarios are reported as well. After removing the three step changes following PARCS CP detection, the majority of residual time series (70% as shown in bottom panel in Figure 6A) had an autocorrelation that cuts off at the correct order of the ground truth MA(2) noise process, i.e. with acorr(x, 2) being the last coefficient that lies outside the 95% confidence bounds (dashed lines; top panel in Figure 6A; results for the first and third scenarios are comparable). Block-permutation bootstrap testing with nominal α = 0.05 and B = 10000 is carried out on these series with blocks of size 3 and, for other time series, according to the estimated order in Figure 6A (with an upper bound of 10 on block size). Exactly two CPs are detected in more than 99.5% of realisations in all scenarios. Figure 6B shows that the distribution of correct detections for the second scenario is largely concentrated around the ground truth CPs. Accuracy scores in each scenario are, respectively, 96%, 99%, and 99% for c_1 and 99%, 89%, and 89% for c_2. Note also the oscillation in the example realisation in Figure 6C, which results from dependent noise with a negative MA coefficient.
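The order-estimation step described above can be sketched as follows. This is a minimal illustration rather than the paper's Algorithm 2: it takes the last lag whose sample autocorrelation falls outside the usual white-noise bounds of ±1.96/√T, and it assumes a block size of order + 1 (matching blocks of size 3 for an MA(2) process) capped at 10. The function names and the MA coefficients in the usage lines are hypothetical.

```python
import math
import random


def autocorr(x, max_lag):
    """Sample autocorrelation of x at lags 1..max_lag."""
    T = len(x)
    mean = sum(x) / T
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t + lag] - mean) for t in range(T - lag)) / c0
        for lag in range(1, max_lag + 1)
    ]


def estimate_ma_order(x, max_lag=10):
    """Largest lag whose autocorrelation lies outside +/- 1.96 / sqrt(T)."""
    bound = 1.96 / math.sqrt(len(x))
    outside = [lag for lag, r in enumerate(autocorr(x, max_lag), start=1)
               if abs(r) > bound]
    return max(outside) if outside else 0


def block_size(order, cap=10):
    """Block length for block-permutation: order + 1, capped (assumed rule)."""
    return min(order + 1, cap)


# Illustration on simulated MA(2) noise; the coefficients are hypothetical.
rng = random.Random(0)
nu = [rng.gauss(0.0, 1.0) for _ in range(102)]
noise = [nu[t] + 0.7 * nu[t - 1] - 0.6 * nu[t - 2] for t in range(2, 102)]
order = estimate_ma_order(noise)
print(order, block_size(order))
```

In the analysis above, this kind of estimate is applied to the residual series after the detected step changes have been removed, since the steps themselves would otherwise inflate the autocorrelation.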
Figure 6: Multiple CP detection in temporally dependent data from the second scenario; (A) estimating MA order; (top) average autocorrelation over time series realisations for different time lags ± s.d.; dashed grey, 95% confidence interval; (bottom) ratio of 1000 realisations with a given estimated order; (B) stacked histograms of correct detection rates over 1000 realisations; transparent bars show candidate CPs excluded by the permutation test; dashed grey, ground truth CPs; inset, deterministic component of time series; (C) deterministic component of the time series (grey) superimposed on an exemplary time series (blue); bottom panel shows a close up over 16 data points around c_1; red line highlights oscillation due to the negative MA coefficient.

3.3 Detecting Multiple CPs in Multivariate Data

The PARCS method's ability to detect multiple CPs in spatially independent, multivariate time series is demonstrated in Figure 7 on 1000 realisations of length T = 100 with N = 9 covariates and white Gaussian noise, σ = 1. Parameters are set to b = (0, 0, 0, 2, 2, 2, 0, 1, 2), c_1 = 20, w_1 = w_0 · (1, 2, 2, −2, 0, 0, 0, 0, 0), c_2 = 60 and w_2 = w_0 · (2, 1, −1, 0, 1, −1, 0, 0, 0). The scaling parameter w_0 controls signal-to-noise ratio in the time series and is initially set to 1.0. Given these parameter values, the two CPs are not represented in all covariates of the time series, as exemplified in Figure 7A, rendering CPs harder to detect from the averaged univariate time series (with steps differing in sign across the covariates partially cancelling each other, resulting in small weights, ⟨w_1⟩ = 3/9 and ⟨w_2⟩ = 2/9, for the resulting univariate time series).

Following PARCS CP detection, augmented by a bootstrap test with nominal α = 0.05, B = 10000 and k = 1, exactly two CPs are detected in 99.9% of realisations.
Accuracy scores are 99.8% and 98% for c_1 and c_2, respectively. The lower variance in c_1 detections, as seen in Figure 7B, is due to the higher average absolute weight |w_1| = 7/9 compared to |w_2| = 6/9.

We then test the method's performance for smaller signal-to-noise ratios with w_0 ∈ {1.0, 0.9, ..., 0.1}. Figure 7C shows that correct detection rates within a ±2%T range from the ground truth CPs for different values of w_0 remain above 50%, even for magnitudes as small as w_0 = 0.5. These rates are a result of PARCS leveraging CP information from multiple covariates simultaneously, rather than depleting the signal through averaging.

Figure 7: Multiple CP detection in spatially independent, multivariate data; (A) deterministic component of the different covariates in the time series (grey) superimposed on an exemplary time series (blue); (B) stacked histograms of correct detection rates over 1000 realisations; transparent bars show candidate CPs excluded by the permutation test; dashed grey, ground truth CPs; (C) stacked histograms of correct detection rates over 1000 realisations given different signal-to-noise ratios (controlled by w_0), and binned with 5 time step windows; rates are logarithmically scaled as indicated by the colour bar.

3.4 Detecting Neural Events that Reflect Learning

A previous study by one of the authors and colleagues exemplifies the practical value of change point detection in neuroscience (Durstewitz et al., 2010). These authors demonstrated that acquiring a new behavioural rule in rats is accompanied by sudden jumps in behavioural performance, which in turn are reflected in the activity of neural units recorded simultaneously in the medial prefrontal cortex (mPFC). In the current section, we revisit part of these data to showcase PARCS in a real data scenario.
Before moving to the demonstration, it is important to note that the data in question are not normally distributed and potentially include linear trends (Durstewitz et al., 2010) not accounted for by the step models in Eqs. 1, 6 and 9. As such, some preprocessing may be necessary for a statistical analysis that is more consistent with the step model assumptions (this may include detrending and potentially some mild smoothing with Gaussian kernels; see Durstewitz et al., 2010). However, to keep the present demonstration simple, PARCS was applied directly to the data with minimal preprocessing, which only involves square-root-transforming the neural count data to bring them closer to a Gaussian distribution and to stabilise the variance (Kihlberg, Herson, & Schotz, 1972).

In order to show that PARCS can still return reasonable CP estimates under these non-Gaussian conditions, we first test its performance on simulated spike count data, before applying it to the empirical data. We simulate 1000 realisations of length T = 100 with N = 9 covariates according to a Poisson process. Parameters are set to b = (1, 1, 1, 3, 3, 3, 1, 2, 1), c_1 = 20, w_1 = (1, 2, 2, −2, 0, 0, 0, 0, 0), c_2 = 60 and w_2 = (2, 1, −1, 0, 1, −1, 0, 0, 0). This choice of parameters results in average firing rates that are comparable in their means to the white Gaussian noise case (cf. Figures 7A and 8A) and to the low firing rates often observed in mPFC neurons. One obvious deviation from Gaussian assumptions in the case of Poisson noise is that the variance is no longer constant, but is equal to the mean within each of the segments separated by true CPs. After square-root-transforming the data and applying PARCS CP detection, augmented by a bootstrap test with nominal α = 0.05, B = 10000 and k = 1, exactly two CPs are detected in 92% of realisations. Accuracy scores are 98% and 70% for c_1 and c_2, respectively.
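The effect of the square-root transform on Poisson counts can be checked with a short simulation. The sketch below is illustrative only: it compares the variance of raw and transformed counts at rates 1 and 3 (two of the baseline values in b above), and it draws Poisson samples with Knuth's multiplication method because the Python standard library has no Poisson sampler. All function names are hypothetical.

```python
import math
import random


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


rng = random.Random(0)
n = 20000
low = [poisson_sample(1.0, rng) for _ in range(n)]    # rate 1
high = [poisson_sample(3.0, rng) for _ in range(n)]   # rate 3

# Ratio of variances between the two rates, before and after the transform.
raw_ratio = variance(high) / variance(low)
sqrt_ratio = (variance([math.sqrt(c) for c in high])
              / variance([math.sqrt(c) for c in low]))
print(round(raw_ratio, 2), round(sqrt_ratio, 2))
```

For raw counts the variance ratio is close to the ratio of the rates, whereas after the transform it is much closer to 1. At such low rates the stabilisation is only approximate, so some dependence of the variance on the firing rate remains.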
Both accuracy scores, for c_1 and c_2, are lower than in the white Gaussian noise scenario because of the lower signal-to-noise ratios, which result from the increase in noise variance with firing rates (cf. Figures 7B and 8B). Nevertheless, these results sufficiently justify the use of PARCS in the present context.

We now turn to the experimentally obtained dataset. Six animals were trained on a two-choice deterministic operant rule switching task which proceeds as follows: At the beginning of the session, the animal follows a previously acquired behavioural rule whereby it responds to a visual cue by a lever press for attaining a reward (visual rule). Unknown to the animal, reward contingencies are switched after 20 trials to a novel spatial rule, in which attaining the reward requires pressing a certain baited lever (right or left), regardless of the visual cue. The session is terminated when the animal reaches a preset criterion that indicates that the new rule behaviour has been learnt. In addition to the binary behavioural data of lever presses over trials, spike counts emitted by mPFC units during the 3 seconds following cue onset were collected through single unit recording techniques. Neural and behavioural data from one animal are shown in Figure 9A and 9B, respectively. Trials corresponding to the steady state visual and spatial rule (first and last 20 trials, respectively) are not considered in the analysis.

Time series with one or two significant CPs were described in the original study by Durstewitz et al. (2010), so PARCS_2 models are estimated for each animal for both the multivariate neural data (multiple response PARCS model; Figure 9A) and the univariate behavioural data (Figure 9B), in addition to PARCS_1 models for the behavioural data as summarised in Figure 9C. As shown in Figure 9B, one neural CP in that exemplary animal matches its behavioural counterpart.
Figure 8: Multiple CP detection in spatially independent, multivariate, Poisson data; (A) deterministic component of the different covariates in the time series (grey) superimposed on an exemplary time series (blue); y-axis, square-root-transformed spike counts; (B) stacked histograms of correct detection rates over 1000 realisations; transparent bars show candidate CPs excluded by the permutation test; dashed grey, ground truth CPs.

The second neural CP, while not as close to its behavioural counterpart, is only 7 trials away from it, and the two are highly correlated across animals, as shown in Figure 9C, concurring with the original findings of Durstewitz et al. (2010). Besides the significant correlation, the corresponding black linear regression line lies very close to the diagonal, indicating that neural and behavioural CPs are not only correlated, but are almost equal. Moreover, those authors report that data from many animals contain at most a single CP (also note the low weight of one of the CPs as estimated from one of the animals using PARCS_2). Comparing the PARCS_1 behavioural CP to its neural counterpart (blue circles in Figure 9C) shows that correlation remains high and significant. A sample size of s > 5 is usually recommended for evaluating the significance of a correlation. However, the corresponding linear regression line is also close to the diagonal, in further support for the reliability of this result, and in agreement with the original results despite different procedures: For the neural data in the original study, CUSUM-based detection was performed on a multivariate discrimination statistic defined across the whole neural population, while here, the model was determined directly from the multiple spike count data.
Figure 9: Comparing behavioural and mPFC neural CPs; (A) blue, square-root-transformed spike count data in the three seconds following cue onset from 6 representative mPFC units of one rat; grey, mean as estimated by inverting the neural multiple response PARCS model. Note potential CP in top-centre unit which was not detected by PARCS since it did not contribute strongly to population-wide CPs; dashed lines, behavioural CPs from the same animal; (B) blue, lever press at each trial; this animal is rewarded for pressing the right lever during the spatial rule; grey, probability of pressing right lever as estimated by inverting the behavioural PARCS model; dashed lines, neural CPs from the same animal (see A); (C) relating behavioural and neural CPs; blue, behavioural CPs with higher weight; r, correlation coefficients as computed over all 12 data points (black) and over those where behavioural CPs have the higher weight (blue); p-values, significance levels of corresponding r; black and blue lines, respective least-square linear regression fits to the two sets of data points; red and yellow circles, neural and behavioural CP pairs from the exemplary animal in A and B, respectively.

4 Discussion

In the current article, we introduced PARCS, a method for detecting multiple step changes, or CPs, in potentially multivariate, temporally dependent data, supported by a bootstrap-based nonparametric test. We also showed that PARCS substantially reduces centre bias in estimating CPs compared to the most basic specification of the CUSUM method, and presented conditions under which it compares to or outperforms the maximum-likelihood CUSUM statistic.
Furthermore, we demonstrated that PARCS may achieve higher sensitivity (statistical power) than CUSUM-based methods while at the same time having lower type I errors in multiple CP scenarios, mainly because PARCS can make use of the full time series while CUSUM-based methods rely on segmenting the time series for detecting multiple CPs. We finally confirmed previous results pertaining to the acquisition of a new behavioural rule and the role of the medial prefrontal cortex in this process.

As already apparent from some of our simulation studies, the basic PARCS method as introduced here leaves room for improvement. In the presence of a single CP, we showed that PARCS strongly reduces the amount of bias toward the centre that results from the direct application of the most basic form of the CUSUM locator statistic. Theoretically-grounded modifications to the CUSUM transformation that reduce this amount of bias rely on down-weighting more centrally-located points (Kirch, 2007). As shown with PARCS, this problem is not quite as severe. Nevertheless, since PARCS approximates the CUSUM transformation using a regression model, similar down-weighting could be incorporated into the PARCS procedure as well by using weighted least squares instead of regular least squares (Hastie, Tibshirani, & Friedman, 2009), which is a straightforward amendment.

Furthermore, the PARCS method currently requires a liberal guess of the number M of CPs in advance, followed by refinements through nonparametric bootstrap testing. It is desirable, however, especially when no prior information on M is available, to have statistical tests as termination criteria for the forward and backward stages.
In adaptive regression spline methods (Friedman, 1991; Friedman & Silverman, 1989; Stone et al., 1997), there is strong empirical evidence (Hinkley, 1969, 1971b) backed by theoretical results (Feder, 1975) that the difference in residual mean-square-error between two nested models that differ in one additional knot is well approximated, albeit conservatively, by a scaled χ² statistic on 4 degrees of freedom (Friedman, 1991). This led to one nonparametric termination recipe that is based on generalised cross-validation (Craven & Wahba, 1979). Another approach is to infer the piecewise linear regression model with the aid of a parametric test for specifying the number and location of knots, without recourse to iterative procedures (Liu, Wu, & Zidek, 1997). Unfortunately, neither approach is directly applicable to PARCS, since they both require assumptions that are not met in the CUSUM-transformed time series. The CUSUM transformation of the time series is a nonstationary ARMA(1, q) process. Deriving reasonable generalised cross-validation (Craven & Wahba, 1979; Friedman, 1991; Friedman & Silverman, 1989), F-ratio (Durstewitz, 2017; Hastie et al., 2009) or parametric (Liu et al., 1997) test statistics requires currently unknown corrections to those tests which account for nonstationarity and the particular form of the ARMA model underlying the CUSUM-transformed data.

When multiple CPs are present in the data, PARCS can outperform standard binary segmentation (Bai, 1997; Scott & Knott, 1974). Other segmentation methods also solve the problem of mislocating CPs inherent in the standard procedure (Cho & Fryzlewicz, 2015; Fryzlewicz, 2014; Olshen et al., 2004). Wild binary segmentation (WBS; Fryzlewicz, 2014), for instance, relies on sampling local CUSUM transformations of randomly chosen segments of the time series. The candidate CP with the largest value among sampled CUSUM curves is returned to be tested against a criterion, followed by binary segmentation.
WBS is preferable to PARCS in that its test statistic and termination criterion when noise is independent are backed up by rigorous theory, and it may be the favourable method when segments are large enough for the test statistic to converge. If series of only limited length are available, however, WBS may run into similar problems as standard binary segmentation for CUSUM, since each detection is still followed by partitioning the data further. WBS also, to the best of our knowledge, currently lacks a thorough analysis of the behaviour of its test statistic for dependent data. It is tempting to speculate on the potential for a hybrid method that capitalises on the desirable features of both methods. Computational demands arise in WBS from the need to choose segment range parameters by sampling a few thousand CUSUM curves, for which PARCS may offer an easy and efficient workaround: Fryzlewicz (2014) demonstrated that the optimal WBS segment choice is the segment bounded by the two CPs closest to the target CP from each side. PARCS could thus provide an informed selection of boundaries by returning candidate CPs in the data and using these to demarcate segments, rather than random sampling as in WBS.

In dealing with multivariate data, recent methods tackled the computational demands of having a large number of covariates and sparse CP representations (Cho & Fryzlewicz, 2015; Wang & Samworth, 2018). These methods rely on low dimensional projections of the multivariate CUSUM curve that preserve the CPs and follow this projection by a binary segmentation method. Since PARCS for multivariate time series is also based on the CUSUM transformation, it is straightforward to leverage the computational savings provided by such projection methods in reducing the dimensionality of the PARCS input, while avoiding the drawbacks of binary segmentation methods.
This may offer a route for extending PARCS to the important case of multivariate CP detection in mutually dependent time series with spatial dependence, a configuration which these projection methods also consider (Cho & Fryzlewicz, 2015; Wang & Samworth, 2018). Alternatively, nondiagonal covariance structure in multivariate series may be accounted for by extending the PARCS formulation to the multivariate regression spline realm (Friedman, 1991; Stone et al., 1997).

Finally, when analysing the neural and behavioural data during the rule switching task, we mentioned that data may also contain trends that are not accounted for by step change time series models (Durstewitz et al., 2010). Caution must be exercised when analysing real data using CP detection methods in that these methods, PARCS included, assume a step change model underlying the generation of the data and hence may attempt to approximate trends and other nonstationary features by a series of step changes, a point made more explicit by Fryzlewicz (2014) (Durstewitz et al., 2010, therefore removed trends around candidate CPs first). Hence, to avoid wrong conclusions with respect to the source and type of nonstationarity in experimental time series, it may be necessary to either augment change point detection by adequate preprocessing (Durstewitz et al., 2010) or to generalise time series models for CP detection to include other forms of nonstationarity.

Author Contributions
HT, DD conceived study; HT developed and implemented methods; HT carried out simulations; HT analysed data; HT, DD interpreted results; HT prepared figures; HT wrote manuscript; HT, DD revised and finalised manuscript.

Funding
This research was funded by grants to DD by the German Research Foundation (DFG) (SPP1665, DU 354/8-2) and through the German Ministry for Education and Research (BMBF) via the e:Med framework (01ZX1311A & 01ZX1314E).

Acknowledgements
The authors thank Dr. Georgia Koppe and Dr.
Eleonora Russo for discussions and Dr. Jeremy Seamans for providing the neural and behavioural data.

Data Availability Statement
Method implementation will be freely available on an online repository upon publication.

References
Aminikhanghahi, S., & Cook, D. J. (2017). A survey of methods for time series change point detection. Knowl. Inf. Syst., 51(2), 339–367. doi: 10.1007/s10115-016-0987-z
Antoch, J., Hušková, M., & Prášková, Z. (1997). Effect of dependence on statistics for determination of change. J. Stat. Plan. Inference, 60(2), 291–310. doi: 10.1016/S0378-3758(96)00138-3
Antoch, J., Hušková, M., & Veraverbeke, N. (1995). Change-point problem and bootstrap. J. Nonparametr. Statist., 5(2), 123–144. doi: 10.1080/
Antoch, J., & Hušková, M. (2001). Permutation tests in change point analysis. Stat. Probab. Lett., 53(1), 37–46. doi: 10.1016/S0167-7152(01)00009-8
Aston, J. A. D., & Kirch, C. (2012). Evaluating stationarity via change-point alternatives with applications to fMRI data. Ann. Appl. Stat., 6(4), 1906–1948. doi: 10.1214/12-AOAS565
Bai, J. (1997). Estimating multiple breaks one at a time. Econ. Theory, 13(3), 315–352. doi: 10.1017/S0266466600005831
Basseville, M. (1988). Detecting changes in signals and systems–a survey. Automatica, 24(3), 309–326. doi: 10.1016/0005-1098(88)90073-8
Bhattacharya, P. K. (1994). Some aspects of change-point analysis. Lect. Notes Monogr. Ser., 23, 28–56. doi: 10.1214/lnms/1215463112
Brown, R. L., Durbin, J., & Evans, J. M. (1975). Techniques for testing the constancy of regression relationships over time. J. R. Stat. Soc. Series B Stat. Methodol., 149–192.
Chen, J., & Gupta, A. K. (2012). Parametric statistical change point analysis: With applications to genetics, medicine, and finance. Basel, Switzerland: Springer Science+Business Media, LLC.
Chernoff, H., & Zacks, S. (1964). Estimating the current mean of a normal distribution which is subjected to changes in time. Ann. Math. Stat., 35(3), 999–1018. doi: 10.1214/aoms/1177700517
Cho, H., & Fryzlewicz, P. (2015). Multiple-change-point detection for high dimensional time series via sparsified binary segmentation. J. R. Stat. Soc. Series B Stat. Methodol., 77(2), 475–507. doi: 10.1111/rssb.12079
Craven, P., & Wahba, G. (1979). Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math., 31(4), 317–403. doi: 10.1007/BF01404567
Csörgő, M., & Horváth, L. (1997). Limit theorems in change-point analysis (Vol. 18). New York, NY: John Wiley & Sons Inc.
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. New York, NY: Cambridge University Press.
Dümbgen, L. (1991). The asymptotic behavior of some nonparametric change-point estimators. Ann. Stat., 1471–1495. doi: 10.1214/aos/1176348257
Durstewitz, D. (2017). Advanced data analysis in neuroscience: Integrating statistical and computational models. Springer International Publishing.
Durstewitz, D., Vittoz, N. M., Floresco, S. B., & Seamans, J. K. (2010). Abrupt transitions between prefrontal neural ensemble states accompany behavioral transitions during rule learning. Neuron, 66(3), 438–448. doi: 10.1016/j.neuron.2010.03.029
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. Ann. Stat., 32(2), 407–499. doi: 10.1214/009053604000000067
Elsner, J. B., Niu, X., & Jagger, T. H. (2004). Detecting shifts in hurricane rates using a Markov chain Monte Carlo approach. J. Clim., 17(13), 2652–2666. doi: 10.1175/1520-0442(2004)017<2652:DSIHRU>2.0.CO;2
Fan, J., & Yao, Q. (2003). Nonlinear time series: nonparametric and parametric methods. New York, NY: Springer.
Fan, Z., Dror, R. O., Mildorf, T. J., Piana, S., & Shaw, D. E. (2015). Identifying localized changes in large systems: Change-point detection for biomolecular simulations. Proc. Natl. Acad. Sci. U.S.A., 112(24), 7454–9. doi: 10.1073/pnas.1415846112
Feder, P. I. (1975). The log likelihood ratio in segmented regression. Ann. Stat., 84–97.
Friedman, J. H. (1991). Multivariate adaptive regression splines. Ann. Stat., 19(1), 1–67. doi: 10.1214/aos/1176347963
Friedman, J. H., & Silverman, B. W. (1989). Flexible parsimonious smoothing and additive modeling. Technometrics, 31(1), 3–21. doi: 10.1080/00401706.1989.10488470
Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection. Ann. Stat., 42(6), 2243–2281. doi: 10.1214/14-AOS1245
Gärtner, M., Duvarci, S., Roeper, J., & Schneider, G. (2017). Detecting joint pausiness in parallel spike trains. J. Neurosci. Methods, 285, 69–81. doi: 10.1016/j.jneumeth.2017.05.008
Gombay, E., & Horváth, L. (1996). On the rate of approximations for maximum likelihood tests in change-point models. J. Multivar. Anal., 56(1), 120–152. doi: 10.1006/jmva.1996.0007
Hamilton, J. D. (1994). Time series analysis. Princeton, NJ: Princeton University Press.
Hanks, T. D., & Summerfield, C. (2017). Perceptual decision making in rodents, monkeys, and humans. Neuron, 93(1), 15–31. doi: 10.1016/j.neuron.2016.12.003
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (second ed.). New York, NY: Springer.
Hinkley, D. V. (1969). Inference about the intersection in two-phase regression. Biometrika, 56(3), 495–504. doi: 10.1093/biomet/56.3.495
Hinkley, D. V. (1971a). Inference about the change-point from cumulative sum tests. Biometrika, 58(3), 509–523. doi: 10.1093/biomet/58.3.509
Hinkley, D. V. (1971b). Inference in two-phase regression. J. Am. Stat. Ass., 66(336), 736–743. doi: 10.1080/01621459.1971.10482337
Horváth, L. (1997). Detection of changes in linear sequences. Ann. Inst. Stat. Math., 49(2), 271–283. doi: 10.1023/A:1003110912735
Hušková, M. (2004). Permutation principle and bootstrap in change point analysis. In L. Horváth & B. Szyszkowicz (Eds.), Asymptotic methods in stochastics: Festschrift for Miklós Csörgő (Vol. 44, pp. 273–291). American Mathematical Soc.
Hušková, M., & Slabý, A. (2001). Permutation tests for multiple changes. Kybernetika, 37(5), 605–622.
Jirak, M. (2012). Change-point analysis in increasing dimension. J. Multivar. Anal., 111, 136–159. doi: 10.1016/j.jmva.2012.05.007
Kendall, M. G., & Stuart, A. (1983). The advanced theory of statistics (Vol. 3). London, UK: Griffin.
Kihlberg, J., Herson, J., & Schotz, W. (1972). Square root transformation revisited. Appl. Statist., 20, 76–81. doi: 10.1214/aos/1176343000
Kirch, C. (2007). Block permutation principles for the change analysis of dependent data. J. Stat. Plan. Inference, 137(7), 2453–2474. doi: 10.1016/j.jspi.2006.09.026
Latimer, K. W., Yates, J. L., Meister, M. L. R., Huk, A. C., & Pillow, J. W. (2015). Single-trial spike trains in parietal cortex reveal discrete steps during decision-making. Science, 349(6244), 184–187. doi: 10.1126/science.aaa4056
Liu, J., Wu, S., & Zidek, J. V. (1997). On segmented multivariate regression. Stat. Sin., 497–525.
Lombard, F., & Hart, J. (1994). The analysis of change-point data with dependent errors. Lect. Notes Monogr. Ser., 23, 194–209. doi: 10.1214/lnms/1215463125
Matteson, D. S., & James, N. A. (2014). A nonparametric approach for multiple change point analysis of multivariate data. J. Am. Stat. Assoc., 109(505), 334–345. doi: 10.1080/01621459.2013.849605
Olshen, A. B., Venkatraman, E., Lucito, R., & Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics, 5(4), 557–572. doi: 10.1093/biostatistics/kxh008
Page, E. (1954). Continuous inspection schemes. Biometrika, 41(1/2), 100–115. doi: 10.2307/2333009
Paillard, D. (1998). The timing of pleistocene glaciations from a simple multiple-state climate model. Nature, 391(6665), 378. doi: 10.1038/34891
Picard, D. (1985). Testing and estimating change-points in time series. Adv. Appl. Probab., 17(4), 841–867. doi: 10.1017/S0001867800015433
Powell, N. J., & Redish, A. D. (2016). Representational changes of latent strategies in rat medial prefrontal cortex precede changes in behaviour. Nat. Commun., 7(12830). doi: 10.1038/ncomms12830
Quandt, R. E. (1958). The estimation of the parameters of a linear regression system obeying two separate regimes. J. Am. Stat. Assoc., 53(284), 873–880. doi: 10.1080/01621459.1958.10501484
Roitman, J. D., & Shadlen, M. N. (2002). Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J. Neurosci., 22(21), 9475–9489.
Scott, A. J., & Knott, M. (1974). A cluster analysis method for grouping means in the analysis of variance. Biometrics, 30(3), 507–512. doi: 10.2307/2529204
Shah, S. P., Lam, W. L., Ng, R. T., & Murphy, K. P. (2007). Modeling recurrent DNA copy number alterations in array CGH data. Bioinformatics, 23(13), i450–i458. doi: 10.1093/bioinformatics/btm221
Shumway, R. H., & Stoffer, D. S. (2010). Time series analysis and its applications: With R examples (3rd ed.). New York, NY: Springer.
Smith, A. C., Frank, L. M., Wirth, S., Yanike, M., Hu, D., Kubota, Y., . . . Brown, E. N. (2004). Dynamic analysis of learning in behavioral experiments. J. Neurosci., 24(2), 447–461.
Smith, P. L. (1982). Curve fitting and modeling with splines using statistical variable selection techniques (NASA Report 166034). Langley Research Center, Hampton, VA.
Stock, J. H., & Watson, M. W. (2014). Estimating turning points using large data sets. J. Econom., 178, 368–381. doi: 10.1016/j.jeconom.2013.08.034
Stone, C. J., Hansen, M. H., Kooperberg, C., & Truong, Y. K. (1997). Polynomial splines and their tensor products in extended linear modeling: 1994 Wald memorial lecture. Ann. Stat., 25(4), 1371–1470. doi: 10.1214/aos/1031594728
Strogatz, S. H. (2001). Nonlinear dynamics and chaos: With applications to physics, biology, chemistry, and engineering. Perseus Books Publishing.
Vert, J.-P., & Bleakley, K. (2010). Fast detection of multiple change-points shared by many signals using group LARS. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in neural information processing systems 23 (pp. 2343–2351). Curran Associates, Inc.
Wang, T., & Samworth, R. J. (2018). High dimensional change point estimation via sparse projection. J. R. Stat. Soc. Series B Stat. Methodol., 80(1), 57–83. doi: 10.1111/rssb.12243

Statistics arXiv (Cornell University)

Detecting Multiple Change Points Using Adaptive Regression Splines with Application to Neural Recordings

Statistics, Volume 2018 (1802) – Feb 10, 2018

ISSN
1662-5196
eISSN
ARCH-3347
DOI
10.3389/fninf.2018.00067

Abstract

Time series, as is frequently the case in neuroscience, are rarely stationary, but often exhibit abrupt changes due to attractor transitions or bifurcations in the dynamical systems producing them. A plethora of methods for detecting such change points in time series statistics have been developed over the years, in addition to test criteria to evaluate their significance. Issues to consider when developing change point analysis methods include computational demands, difficulties arising from either limited amount of data or a large number of covariates, and arriving at statistical tests with sufficient power to detect as many changes as contained in potentially high-dimensional time series. Here, a general method called Paired Adaptive Regressors for Cumulative Sum is developed for detecting multiple change points in the mean of multivariate time series. The method’s advantages over alternative approaches are demonstrated through a series of simulation experiments. This is followed by a real data application to neural recordings from rat medial prefrontal cortex during learning. Finally, the method’s flexibility to incorporate useful features from state-of-the-art change point detection techniques is discussed, along with potential drawbacks and suggestions to remedy them.

Keywords: change point, cumulative sum, adaptive regression splines, nonstationary, bootstrap test, block-permutation, behaviour, spike counts

arXiv:1802.03627v3 [stat.ME] 3 Sep 2018

1 Introduction

Stationary data are the exception rather than the rule in many areas of science (Aston & Kirch, 2012; Elsner, Niu, & Jagger, 2004; Z. Fan, Dror, Mildorf, Piana, & Shaw, 2015; Gärtner, Duvarci, Roeper, & Schneider, 2017; Latimer, Yates, Meister, Huk, & Pillow, 2015; Paillard, 1998; Shah, Lam, Ng, & Murphy, 2007; Stock & Watson, 2014). Time series statistics often change, sometimes abruptly, due to transitions in the underlying system dynamics, adaptive processes or external factors.
In neuroscience, both behavioural time series (Durstewitz, Vittoz, Floresco, & Seamans, 2010; Powell & Redish, 2016; A. C. Smith et al., 2004) and their neural correlates (Durstewitz et al., 2010; Gärtner et al., 2017; Latimer et al., 2015; Powell & Redish, 2016; Roitman & Shadlen, 2002) exhibit strongly nonstationary features which relate to important cognitive processes such as learning (Durstewitz et al., 2010; Powell & Redish, 2016; A. C. Smith et al., 2004) and perceptual decision making (Hanks & Summerfield, 2017; Latimer et al., 2015; Roitman & Shadlen, 2002). As such, identifying nonstationary features in behavioural and neural time series becomes necessary, both for interpreting the data in relation to the potential influences generating those features, and for removing those features from the data in order to perform statistical analyses that assume stationary observations (Hamilton, 1994; Shumway & Stoffer, 2010). Abrupt jumps in time series statistics form one important class of nonstationary events. These are often caused by bifurcations, which, in turn, may occur with gradual changes in parameters of the underlying system (Strogatz, 2001). Consequently, they are of wide interest to both statistical data analysis and the study of dynamical systems, and are commonly referred to as change points (CP; Chen & Gupta, 2012).

Detecting CPs has a long and varied history in statistics, and we will not attempt to exhaustively survey the different approaches, including regression models (Brown, Durbin, & Evans, 1975; Quandt, 1958), Bayesian techniques (Chernoff & Zacks, 1964) and cumulative sum (CUSUM) statistics (Basseville, 1988; Page, 1954), to name but a few, within the limited scope of this article.
Instead, we refer the reader to the excellent reviews on the topic (Aminikhanghahi & Cook, 2017; Bhattacharya, 1994; Chen & Gupta, 2012) and focus on the offline CUSUM class of methods (Hinkley, 1971a) to which PARCS belongs (as opposed to sequential CUSUM methods, Page, 1954, that locate a CP online, while the time series is evolving), specifically methods that aim at detecting CPs in the mean of the time series. CUSUM-based methods are powerful, easy to implement, and are backed up by an extensive literature, theoretical results and various extensions to multiple CPs and multivariate scenarios, making them an ideal starting point. These methods assume that the time series is piecewise stationary in the statistic under consideration (e.g., piecewise constant mean) and rely on a cumulative sum transformation of the time series. Commonly, at-most-one-change (AMOC) is identified by maximum-type statistics (Kirch, 2007) at the extremum of the curve resulting from that transformation (Antoch, Hušková, & Veraverbeke, 1995; Basseville, 1988). Extending the CUSUM method to multiple CPs usually involves repetitive partitioning of the time series upon each detection (binary segmentation methods; Bai, 1997; Cho & Fryzlewicz, 2015; Fryzlewicz, 2014; Olshen, Venkatraman, Lucito, & Wigler, 2004; Scott & Knott, 1974). This segmentation procedure, however, may hamper detection in later iterations as the reduction in number of observations depletes statistical power exponentially fast as more CPs are to be retrieved.

In this article, we develop the PARCS (Paired Adaptive Regressors for Cumulative Sum) method which offers a straightforward extension that leverages the full time series in order to detect multiple CPs, thus providing a new solution to this issue.
PARCS rests on the fact that a CUSUM transformation of the data relates to computing an integral transformation of the piecewise constant mean time series model, resulting in a piecewise linear mean function that bends at potential CPs and could be approximated by adaptive regression spline methods (Friedman, 1991; Friedman & Silverman, 1989; Stone, Hansen, Kooperberg, & Truong, 1997). Namely, rather than attempting to approximate the discontinuous time series mean directly (Efron, Hastie, Johnstone, & Tibshirani, 2004; Vert & Bleakley, 2010), the PARCS model is an approximation to the continuous CUSUM-transformed time series by a piecewise linear function. The bending points of the PARCS model are each defined by a pair of non-overlapping piecewise linear regression splines that are first selected by a two-stage iterative procedure.

The PARCS model is further refined by a nonparametric CP significance test based on bootstraps (Antoch & Hušková, 2001; Dümbgen, 1991; Hušková, 2004; Kirch, 2007; Matteson & James, 2014). While analytically derived parametric tests may usually be preferable over bootstrap-based tests due to better convergence and coverage of the tails, in the current CP setting closed form expressions for parametric tests are hard to come by and are usually replaced by approximations (Gombay & Horváth, 1996; Horváth, 1997). In this case, tests based on bootstraps are preferable since they are known to converge faster to the limit distribution of the test statistic (often they are also not as conservative as parametric approximations for datasets of a relatively small size; Antoch & Hušková, 2001; Csörgő & Horváth, 1997; Kirch, 2007).
In order to accommodate the possibility of temporally dependent noise in the data (Antoch, Hušková, & Prášková, 1997; Horváth, 1997; Picard, 1985), model selection is carried out by a nonparametric block-permutation bootstrap procedure (Davison & Hinkley, 1997; Hušková & Slabý, 2001; Kirch, 2007) developed specifically for PARCS, which relies on a test statistic that quantifies the amount of bending at each candidate CP. Since model estimation is based on linear regression, PARCS is also effortlessly extended to spatially independent, multivariate time series.

The article is structured as follows. Section 2.1 introduces the CUSUM method for AMOC detection. We then develop the PARCS method, presenting in Section 2.2 the procedure for inferring a nested model that allows for significance testing of multiple CPs, followed in Section 2.3 by an outline of the nonparametric permutation test procedure for refining the PARCS model further. Results in Section 3 illustrate that PARCS improves on several issues inherent in classical methods for change point analysis. In Section 3.1, we compare the PARCS approach to the CUSUM method in detecting a single CP, followed in Section 3.2 by a comparison with standard binary segmentation in detecting multiple CPs. We also demonstrate in Section 3.3 that PARCS is successful in detecting CPs in spatially independent, multivariate time series. We then present in Section 3.4 an example from the neurosciences, in which neural and behavioural CPs are compared during operant rule-switching learning (Durstewitz et al., 2010). Finally, we discuss in Section 4 the PARCS approach in relation to other state-of-the-art CP detection methods, along with drawbacks and potential extensions.

2 Methods

This section outlines the CUSUM method and the PARCS extension to multiple CPs, in addition to a nonparametric permutation technique to test for the statistical significance of CPs as identified by PARCS.
For generality, the formulation assumes temporally dependent observations in the time series, independent observations being a special case.

2.1 CUSUM: Cumulative Sum of Differences to the Mean

A class of methods for identifying a single CP in the mean relies on computing a CUSUM transformation of the time series $x = \{x_t\}_{1:T}$. A useful formulation that allows for dependent observations in the time series is given by the moving average (MA) step model (Antoch et al., 1997; Horváth, 1997; Kirch, 2007; Lombard & Hart, 1994),

$$x_t = b + w \cdot 1_{t-c} + \sum_{\tau \ge 0} \theta_\tau \epsilon_{t-\tau}, \qquad \theta_0 = 1, \quad \epsilon_t \sim \mathcal{N}(0, \sigma^2), \tag{1}$$

where a jump in the time series mean from baseline $b$ to $b + w$ occurs after time step $c$, the change point. The step parameter or weight $w$ is positive (negative) when the time series mean increases (decreases) following $c$. The largest integer $q$ such that noise coefficient $\theta_q \neq 0$ defines a finite order $q$ of the MA process, which is 0 for temporally independent observations. We will assume that the MA process is stationary, which will always be the case if it is finite, with $\epsilon_t$ independent and identically distributed (i.i.d.) random variables (for an infinite process, points $x_t$ for $t \le 0$ may be considered unobserved, and coefficients $\theta_\tau$ have to fulfil certain conditions to make the process stationary, as given, for instance, in Shumway & Stoffer, 2010). The Gaussian noise assumption in the MA process can be relaxed, as long as the noise process has zero mean and finite, constant variance (see Antoch et al., 1997; Horváth, 1997; Kirch, 2007; Lombard & Hart, 1994, for theoretical results on the more general form of dependent noise). The discrete Heaviside step function, $1_{t-c}$, is defined by,

$$1_i = \begin{cases} 1 & \text{if } i > 0, \\ 0 & \text{otherwise.} \end{cases}$$

Identifying the presence of a CP requires testing the null hypothesis, $H_0: w = 0$, against the alternative, $H_1: w \neq 0$ (Antoch et al., 1995; Lombard & Hart, 1994). This begins by inferring the time of the step according to a CP locator statistic.
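To make the step model of Eq. 1 concrete, the following is a minimal simulation sketch, not the authors' implementation. The function name `simulate_step_ma` and its parameters (`theta` for the MA coefficients with `theta[0] = 1`, `sigma` for the noise standard deviation) are hypothetical names chosen for illustration, and only a finite MA order is covered.

```python
import random

def simulate_step_ma(T, b, w, c, theta=(1.0,), sigma=1.0, seed=0):
    """Draw x_1..x_T from a finite-order MA step model in the spirit of Eq. 1.

    theta[0] is assumed to be 1; len(theta) - 1 is the MA order q.
    """
    rng = random.Random(seed)
    q = len(theta) - 1
    # q extra innovations so that x_1 already has a full MA history
    eps = [rng.gauss(0.0, sigma) for _ in range(T + q)]
    x = []
    for t in range(1, T + 1):
        step = w if t > c else 0.0  # Heaviside term: the jump occurs after step c
        noise = sum(theta[tau] * eps[q + t - 1 - tau] for tau in range(q + 1))
        x.append(b + step + noise)
    return x

# Example: 200 samples, baseline 0, jump of +1.5 after t = 120, MA(1) noise
series = simulate_step_ma(200, b=0.0, w=1.5, c=120, theta=(1.0, 0.5), sigma=1.0)
```

Setting `theta=(1.0,)` gives temporally independent observations (q = 0), the special case mentioned above.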
A typical offline CP locator statistic is the maximum point of the weighted absolute cumulative sum of differences to the mean (Antoch & Hušková, 2001; Horváth, 1997),

$$\hat{c} = \arg\max_{0<t<T} \left( \frac{T}{t(T-t)} \right)^{\gamma} \left| \sum_{\tau=1}^{t} \left( x_\tau - \langle x \rangle \right) \right|, \tag{2}$$

where $\langle x \rangle$ is the arithmetic mean of the time series (see Figure 1A). The first term on the right-hand side corrects for bias toward the centre, where more centrally-located points are down-weighed by an amount controlled by parameter $\gamma \in [0, 0.5]$. Other CUSUM-based locator statistics exist with different bias-correcting terms and cumulative sum transformations (Antoch et al., 1997; Bhattacharya, 1994; Jirak, 2012; Kirch, 2007). As outlined in the Discussion, PARCS may be modified to include such bias-correcting terms as well. However, as we will demonstrate, PARCS can significantly reduce centre bias even without recourse to such a term. To show this, we will mostly deal with the generic case, $\gamma = 0$, when comparing PARCS to the CUSUM transformation as defined in Eq. 2. This has the added advantage of avoiding having to select an optimal power or an optimal weight factor, a choice that usually depends on prior assumptions on the CP’s potential location (Bhattacharya, 1994). As such, and unless stated otherwise, the term CUSUM transformation will henceforth refer to the cumulative sum of differences to the mean,

$$y_t \triangleq \sum_{\tau=1}^{t} \left( x_\tau - \langle x \rangle \right), \tag{3}$$

where the maximum value,

$$S = \max_{0<t<T} \left| \sum_{\tau=1}^{t} \left( x_\tau - \langle x \rangle \right) \right| = \max_{0<t<T} |y_t|, \tag{4}$$

defines a test statistic by which it is decided whether to reject the null hypothesis.

Figure 1: Paired Adaptive Regressors for Cumulative Sum; (A,B) time series $x$ with (A) one or (B) two step changes and their corresponding CUSUM transformation $y$; (C) fitting $y$ by a piecewise linear model $\hat{y}$ using two pairs of regressors $h^{\pm}_1$ and $h^{\pm}_2$; (D) the PARCS model fit $\hat{y}$ to the CUSUM transformation $y$ of a time series $x$, returning estimates of multiple CPs, $\hat{c}_1$ and $\hat{c}_2$.

Given potentially dependent observations, $q > 0$, as defined by the model in Eq.
1, nonparametric bootstrap testing proceeds by block-permutation (Davison & Hinkley, 1997; Hušková, 2004; Kirch, 2007), such that temporal dependence in the data is preserved (see Section 3.2). The candidate CP $\hat{c}$ is identified according to Eq. 2 and its associated test statistic $S$ is computed by Eq. 4. Estimates $\hat{b}$ and $\hat{w}$ are retrieved from the arithmetic means of $x$ before and after $\hat{c}$ using the model in Eq. 1. By subtracting $\hat{w} \cdot 1_{t-\hat{c}}$ from the time series $x$ we arrive at a time series $x_0$ that provides an estimate of the null distribution. The stationary time series $x_0$ is split into $n$ blocks of size $k$, chosen such that temporal dependencies are mostly preserved in the permuted time series (Davison & Hinkley, 1997). One way to do so is to select the block size to be larger than the order of the underlying MA process, $k = q + 1$ (since the autocorrelation function of an MA($q$) process cuts off at order $q$; Davison & Hinkley, 1997). This requires identifying the order $q$ which can be determined from the $H_0$-conform time series $x_0$ by inspecting its autocorrelation function (J. Fan & Yao, 2003) for different time lags $\tau$. The autocorrelation function’s asymptotic distribution (Kendall & Stuart, 1983),

$$\mathrm{acorr}(x_0; \tau) \sim \mathcal{N}\bigl( -1/(T-\tau),\; +1/(T-\tau) \bigr), \tag{5}$$

provides a test statistic for deciding the largest time lag $q$ at which to reject the null hypothesis $H_0: \mathrm{acorr}(x_0; q) = 0$, given some preset significance level $\alpha \in [0, 1]$. The resulting blocks are randomly permuted and each permutation is CUSUM-transformed according to Eq. 3 to compute an $H_0$-conform sample $S_0$ of the test statistic $S$ in Eq. 4 (note that we do not know the true step parameter $w$ or the true CP $c$, of course, such that this procedure will only yield an estimate of the $H_0$ distribution). A sufficiently large number $B$ of permutations results in samples $S_0$ of an $H_0$-conform empirical distribution function (EDF) $F(S_0) \triangleq \frac{1}{B} \sum_{i=1}^{B} 1_{S_0 - S_{0,i}}$ that weighs every sample $S_{0,i}$ equally.
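The single-CP procedure described so far (locate the extremum of the CUSUM curve, remove the estimated step, permute blocks, compare against the permutation distribution) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, the unweighted locator is used, the block size `k` is supplied by the caller rather than estimated from the autocorrelation function, and the threshold is taken as an approximate empirical quantile of the permuted statistics.

```python
import random
from itertools import accumulate
from statistics import fmean

def cusum(x):
    """Cumulative sum of differences to the mean (Eq. 3)."""
    mean = fmean(x)
    return list(accumulate(v - mean for v in x))

def locate_cp(x):
    """Return (c_hat, S): arg max and max of |y_t| over 0 < t < T (unweighted case)."""
    y = cusum(x)
    T = len(x)
    # y[t - 1] holds y_t; the search is restricted to t = 1..T-1
    t_best = max(range(1, T), key=lambda t: abs(y[t - 1]))
    return t_best, abs(y[t_best - 1])

def block_permutation_test(x, k, B=500, alpha=0.05, seed=0):
    """Test the candidate CP of x against a block-permutation null sample."""
    rng = random.Random(seed)
    c_hat, S = locate_cp(x)
    # Step size estimated from the means after and before c_hat, then removed
    w_hat = fmean(x[c_hat:]) - fmean(x[:c_hat])
    x0 = [v - w_hat if t >= c_hat else v for t, v in enumerate(x)]
    blocks = [x0[i:i + k] for i in range(0, len(x0), k)]
    null = []
    for _ in range(B):
        rng.shuffle(blocks)  # permute whole blocks to keep short-range dependence
        permuted = [v for block in blocks for v in block]
        null.append(locate_cp(permuted)[1])
    null.sort()
    # Approximate empirical (1 - alpha) quantile of the permuted statistics
    threshold = null[min(B - 1, int((1 - alpha) * B))]
    return c_hat, S, S >= threshold
```

With independent noise a block size of `k = 1` reduces this to an ordinary permutation test; larger `k` trades the number of distinct permutations for better preservation of temporal dependence.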
The candidate CP $\hat{c}$ is detected when the test statistic $S$ as computed from the original time series $x$ satisfies $S \ge F^{-1}(1-\alpha)$, where $\alpha$ is a preset significance level and $F^{-1}(1-\alpha)$ the inverse of the EDF, defined as the $(1-\alpha)B$th largest value out of $B$ permutations (Davison & Hinkley, 1997; Durstewitz, 2017).

2.2 PARCS: Paired Adaptive Regressors for Cumulative Sum

The PARCS method for estimating multiple CPs rests on the fact that the integral of a piecewise constant function is piecewise linear. The AMOC model as defined in Eq. 1 assumes a piecewise stationary MA process, consisting of two segments with constant mean. A process consisting of $M + 1$ segments generalises Eq. 1 to data containing at-most-$M$-change,

$$x_t = b + \sum_{m=1}^{M} w_m \cdot 1_{t-c_m} + \sum_{\tau \ge 0} \theta_\tau \epsilon_{t-\tau}, \qquad \theta_0 = 1, \quad \epsilon_t \sim \mathcal{N}(0, \sigma^2). \tag{6}$$

The CUSUM transformation $y = \{y_t\}_{1:T}$ of this process as given by Eq. 3 corresponds to the numerical integration of a piecewise stationary process $x - \langle x \rangle$. That is, $y$ is approximately (due to the noise) piecewise linear (exactly piecewise linear in the mean; see Figure 1B).
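As a worked check of the piecewise linearity claim, consider the noise-free single-step case (an assumption made here purely for illustration), where $x_t = b$ for $t \le c$ and $x_t = b + w$ for $t > c$. The summation index $\tau$ is a dummy symbol chosen for this sketch.

```latex
\begin{aligned}
\langle x \rangle &= b + \frac{w\,(T-c)}{T}, \\[1ex]
y_t = \sum_{\tau=1}^{t} \bigl( x_\tau - \langle x \rangle \bigr) &=
\begin{cases}
-\dfrac{w\,(T-c)}{T}\; t, & t \le c, \\[2ex]
-\dfrac{w\,c}{T}\; (T-t), & t > c.
\end{cases}
\end{aligned}
```

Both branches agree at $t = c$, where $|y_c| = |w|\,c\,(T-c)/T$ is the extremum of the curve, and $y_T = 0$. The slope is $-w(T-c)/T$ before the change point and $+wc/T$ after it, so the slope changes by exactly $w$ at $t = c$. The same argument applied segment by segment suggests that, in the mean, each change point $c_m$ contributes a bend whose slope change equals its step size $w_m$.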
If points $\{c_m\}_{1:M}$ at which $y$ bends were known, the latter can be fitted by a weighted sum of local piecewise linear basis functions or splines, centred at the knots $\{c_m\}_{1:M}$,

$$h^{+}_{t,c_m} = \begin{cases} t - c_m & \text{if } t > c_m, \\ 0 & \text{otherwise,} \end{cases} \qquad \text{and} \qquad h^{-}_{t,c_m} = \begin{cases} c_m - t & \text{if } t < c_m, \\ 0 & \text{otherwise.} \end{cases}$$

This fit corresponds to modelling the expected value of $y$, conditioned on spline pair set $\mathcal{H} = \{h^{\pm}_{c_m}\}_{1:M}$, resulting in model inference,

$$E\left[ y_t \mid \mathcal{H} \right] \approx \hat{y}_{t,M} = \hat{\beta}_0 + \sum_{m=1}^{M} \hat{\beta}^{+}_m h^{+}_{t,c_m} + \sum_{m=1}^{M} \hat{\beta}^{-}_m h^{-}_{t,c_m},$$

which is a simple regression problem that can be solved by estimating the intercept $\hat{\beta}_0$ and coefficients $\hat{\beta}^{\pm}_m$ that minimise the mean-square-error,

$$\mathrm{mse}_M(y, \hat{y}_M) = \frac{1}{T} \sum_{t=1}^{T} \left( y_t - \hat{y}_{t,M} \right)^2. \tag{7}$$

However, in the multiple CP detection setting (assuming $M$ is known), optimal knot placement is not known a priori, but can be inferred by adaptively adjusting knot locations (Friedman, 1991; Friedman & Silverman, 1989; Stone et al., 1997) to maximally satisfy the goodness-of-fit criterion in Eq. 7. In other words, and as shown in Figures 1C,D, the problem of identifying multiple CPs is replaced by the equivalent problem of inferring the order-$M$ PARCS model (or PARCS$_M$ model),

$$\hat{y}_M = \hat{\beta}_0 + \sum_{m=1}^{M} \hat{\beta}^{+}_m h^{+}_{\hat{c}_m} + \sum_{m=1}^{M} \hat{\beta}^{-}_m h^{-}_{\hat{c}_m}, \tag{8}$$

with associated $M$-tuple $\hat{c} \triangleq (\hat{c}_m)_{1:M}$ that best fits the CUSUM transformation of the time series. Regression coefficients in Eq. 8 are real numbers, while knots in the present time series context are positive integers, excluding the first and last time steps, $\hat{c}_m \in \{2, 3, \ldots, T-1\}$.

Fitting the PARCS$_M$ model is based on a forward/backward spline selection strategy (P. L. Smith, 1982) with added CP ranking stage and proceeds as outlined in Algorithm 1. Starting with an empty PARCS$_0$ model, containing only the intercept $\hat{\beta}_0$, a forward sweep increases model complexity to a forward upper bound order $L > M$ by adding at each iteration the spline pair $h^{\pm}_c$, not yet contained in the model, that decreases residual mean-square-error the most.
A reasonable heuristic for setting $L$ is 2 to 3 times $M$ (assuming $M$ is known or given some liberal guess). This is followed by a backward pruning iteration, in which the spline pair whose removal increases residual mean-square-error the least is dropped from the model. Pruning removes those knots that were added at the beginning of the forward phase which became redundant as the model was refined by later additions (Friedman & Silverman, 1989). This stage continues until the number of knots reaches the preset final upper bound of model complexity $M$, i.e., $L - M$ knots are pruned. Knots are then sorted in descending order according to the amount of explained variance. The ranking iteration returns a nested model by pruning the PARCS$_M$ model further, down to the PARCS$_1$ model. The first knot to be pruned, reducing the number of knots to $M - 1$, explains the least variance and is placed last as $\hat{c}_M$ in the $M$-tuple $\hat{c}$. The last knot to be pruned explains the most variance and is placed first as $\hat{c}_1$. Note that regression coefficients are re-estimated every time a knot is added to or removed from the PARCS model.

Algorithm 1: Procedure for inferring the PARCS$_M$ model with forward/backward spline selection (first/second loop) and CP ranking (third loop). Regression coefficients are computed by least squares estimation, conditioned on the set of knot locations of predefined size $M$ that minimises mean-square-error. Final knot locations are specified by eliminating spurious knots through block-permutation bootstrapping as described in Section 2.3.

    Input: $L$, $M$ and $y$
    Output: $\hat{c} \triangleq (\hat{c}_m)_{1:M}$ and $\hat{y}_M$
    $\hat{c}, \mathcal{H} \leftarrow \varnothing$
    for $m \leftarrow 1$ to $L$ do    // forward stage
        $\hat{c}_m \leftarrow \arg\min_{1<c<T} \mathrm{mse}_m(y, \hat{y}_m)$ with $\hat{y}_m$ fitted on $\mathcal{H} \cup h^{\pm}_c$, $h^{\pm}_c \notin \mathcal{H}$
        $\mathcal{H} \leftarrow \mathcal{H} \cup h^{\pm}_{\hat{c}_m}$
    for $m \leftarrow L$ to $M + 1$ do    // pruning stage
        $\hat{c}_m \leftarrow \arg\min_{c \in \hat{c}} \mathrm{mse}_{m-1}(y, \hat{y}_{m-1})$ with $\hat{y}_{m-1}$ fitted on $\mathcal{H} \setminus h^{\pm}_c$
        $\hat{c} \leftarrow \hat{c} \setminus \hat{c}_m$ and $\mathcal{H} \leftarrow \mathcal{H} \setminus h^{\pm}_{\hat{c}_m}$
    $\hat{y}_M \leftarrow \hat{\beta}_0 + \sum_{m=1}^{M} \hat{\beta}^{+}_m h^{+}_{\hat{c}_m} + \sum_{m=1}^{M} \hat{\beta}^{-}_m h^{-}_{\hat{c}_m}$
    for $m \leftarrow M$ to $1$ do    // ranking stage
        $\hat{c}_m \leftarrow \arg\min_{c} \mathrm{mse}_{m-1}(y, \hat{y}_{m-1})$ with $\hat{y}_{m-1}$ fitted on $\mathcal{H} \setminus h^{\pm}_c$
        $\mathcal{H} \leftarrow \mathcal{H} \setminus h^{\pm}_{\hat{c}_m}$
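The forward, pruning and ranking stages of Algorithm 1 can be sketched in Python with NumPy as below. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names (`spline_pair`, `fit`, `parcs_knots`) are hypothetical, every admissible time step is searched exhaustively as a candidate knot, and `numpy.linalg.lstsq` is used so that the collinearity between spline pairs at different knots is handled by its minimum-norm solution (the residual mean-square-error is unaffected by that choice).

```python
import numpy as np

def spline_pair(T, c):
    """Columns h^+_{t,c} and h^-_{t,c} for t = 1..T."""
    t = np.arange(1, T + 1)
    return np.column_stack([np.maximum(t - c, 0), np.maximum(c - t, 0)])

def fit(y, knots):
    """Least-squares fit of an intercept plus one spline pair per knot.

    Returns (beta, mse); beta[0] is the intercept, followed by the
    (beta_plus, beta_minus) pair of each knot in the given order.
    """
    T = len(y)
    cols = [np.ones((T, 1))] + [spline_pair(T, c) for c in knots]
    X = np.hstack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    mse = float(np.mean((y - X @ beta) ** 2))
    return beta, mse

def parcs_knots(y, L, M):
    """Forward selection up to L knots, pruning down to M, then ranking."""
    T = len(y)
    knots = []
    for _ in range(L):  # forward stage: add the knot that lowers mse the most
        candidates = [c for c in range(2, T) if c not in knots]
        knots.append(min(candidates, key=lambda c: fit(y, knots + [c])[1]))
    while len(knots) > M:  # pruning stage: drop the knot whose removal costs least
        drop = min(knots, key=lambda c: fit(y, [k for k in knots if k != c])[1])
        knots.remove(drop)
    ranked, remaining = [], list(knots)
    while remaining:  # ranking stage: the knot pruned first is ranked last
        drop = min(remaining,
                   key=lambda c: fit(y, [k for k in remaining if k != c])[1])
        remaining.remove(drop)
        ranked.insert(0, drop)
    return ranked

# Usage on a noise-free series with a single step of size 2 after t = 25
x = np.r_[np.zeros(25), 2.0 * np.ones(35)]
y = np.cumsum(x - x.mean())
knots = parcs_knots(y, L=2, M=1)
beta, mse = fit(y, knots)
```

Each forward iteration refits the model once per candidate knot, so the cost of this naive sketch grows roughly with L times T least-squares fits; a practical implementation would likely restrict or subsample the candidate set.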
The model can be effortlessly extended to the multiple response setting in the case of spatially independent time series (extension to a nondiagonal MA covari- ance matrix, Stone et al., 1997, will be considered elsewhere). Given N indepen- dent, piecewise stationary MA processes with common CPsfc g , m 1:M X X x = b + w 1 +   ;  = 1;   N (0;  ); (9) t;n n mn tc  t;n 0 t;n m=1 0 where n = 1; : : : ; N , the corresponding multivariate CUSUM transformation y = fy g is fitted by the multiple response, PARCS model, conditioned on com- t;n 1:N M 9 mon spline pairs, M M X X + + ^ ^ ^ E y H  y^ = + h + h ; t;n t;M;n 0n mn mn t;c^ t;c^ m m m=1 m=1 using Algorithm 1. Returning CPs that are common to all variables x is done by using the goodness-of-fit criterion in Eq. 7, averaged over all responses y . 2.3 PARCS Model Selection by Block-Permutation Bootstrap The piecewise linear PARCS formulation, Eq. 8, of the CUSUM transformation in Eq. 3 bends at the CPs. Due to the presence of noise in the original time series x, some noise realisations may appear as slight bends in the CUSUM-transformed time series, leading PARCS to return false CPs. As such, the amount of bending at knot c^ can be used as a test statistic for bootstrap significance testing that can refine the PARCS model further. No bending indicates either a constant fit, + + ^ ^ ^ ^ = = 0, or a smooth linear fit, = (see also Figure 1C). Thus, a m m m m suitable test statistic that quantifies the amount of bending at c^ is given by, ^ ^ S = + ; (10) m m where for multivariate time series, the test statistic is the average over all time series. Before describing the block-permutation bootstrap method for PARCS, we out- line a procedure for identifying the order q of the MA noise process, provided as pseudocode in Algorithm 2. First, an H -conform time series x = fx g is 0 0 t;0 1:T computed by regressing out the PARCS model y^ of Eq. 
8 from the CUSUM- transformed time series y and then inverting the CUSUM transformation. This is followed by inspecting the autocorrelation function of x for different time lags  . The largest time lag at which the null hypothesis H : acorr(x ; q) = 0 is rejected, 0 0 given some preset significance level 2 [0; 1] is then returned as the order q, given some predefined upper bound of MA order, Q. Given the M -tuple CP set c^ returned by Algorithm 1 and an estimate of the dependent normal noise order q by Algorithm 2, a block-permutation bootstrap test returns the subset &^ of significant CPs, as outlined in Algorithm 3. First, an H -conform time series x is computed. For each CP c^ 2 c^, starting with the 0 0 m one ranked highest, all CP-splines already deemed significant by the bootstrap test are regressed out of y. A PARCS model with the remaining knots, including c^ , is estimated and the test statistic S, evaluated at c^ according to Eq. 10, is computed. Knot c^ is tested for significance against an H -conform EDF, estimated through m 0 block-permutation bootstrapping: A total of B bootstrap samples is generated from 10 Input: y; c^; Q and Output: q  Q q Q P P M M + + ^ ^ ^ y y + h + h 0 0 m=1 m m=1 m c^ c^ m m x y y +hxi for t = 1; : : : ; T where y = 0 t;0 t;0 t1;0 0;0 for  1 to Q do F CDF N 1=(T ); +1=(T ) 1 1 if acorr(x ; ) 2 F ( =2); F (1 =2) then q  1 break Algorithm 2: Identifying the order q of the MA process, given some upper bound Q. The H -conform time series x is estimated before entering the 0 0 loop. The loop increases the autocorrelation time lag and exits when the autocorrelation of x is not significantly different from 0 anymore. the H -conform series x by randomly permuting blocks of size k = q + 1. For 0 0 each of these i = 1 : : : B bootstrap samples test statistic S is evaluated at knot location c^ , yielding an EDF F(S ) which assigns equal probability 1=B to each m 0 bootstrapped S . 
A significant $\hat{c}_m$ is then added to the set of significant CPs $\hat\varsigma$, or rejected as a false discovery otherwise. The procedure repeats for the CP next in the rank order. Similar to Algorithm 1, regression coefficients are re-estimated every time a knot is added to or removed from the PARCS model.

Algorithm 3
  Input: $y$, $\hat{c}$, $k$, $B$ and $\alpha$
  Output: $\hat\varsigma \subseteq \hat{c}$
  $\hat\varsigma \leftarrow \emptyset$
  $y_0 \leftarrow y - \big(\hat\beta_0 + \sum_{m=1}^{M} \hat\beta^{+}_m h^{+}_{\hat{c}_m} + \sum_{m=1}^{M} \hat\beta^{-}_m h^{-}_{\hat{c}_m}\big)$
  $x_{t,0} \leftarrow y_{t,0} - y_{t-1,0} + \langle x \rangle$ for $t = 1, \ldots, T$, where $y_{0,0} = 0$
  for $m \leftarrow 1$ to $M$ do
    regress the splines of all CPs in $\hat\varsigma$ out of $y$
    fit a PARCS model to the result with the remaining knots $\hat{c} \setminus \hat\varsigma$, including $\hat{c}_m$
    $S \leftarrow$ test statistic of Eq. 10, evaluated at $\hat{c}_m$
    $F(S_0) \leftarrow \text{BlockPermutationBootstrap}(x_0, \hat{c}_m, k, B)$
    if $S \ge F^{-1}(1 - \alpha)$ then $\hat\varsigma \leftarrow \hat\varsigma \cup \hat{c}_m$

Algorithm 3: Block-permutation bootstrap procedure for PARCS, given block size $k$. The $H_0$-conform time series $x_0$ is estimated before entering the loop. The loop iterates over the rank-ordered CPs to test for each CP's significance.

3 Results

We first evaluate the PARCS method on synthetic data in single and multiple CP detection settings, followed by a real data example on detecting behavioural and neural change points during rule learning.

3.1 Alleviating CUSUM Bias in AMOC Detection

We first compare the CUSUM method for detecting a single CP to the PARCS approach in order to evaluate the effect of each method on the centre bias in CP detection. Both white and MA Gaussian noise are considered. We also compare PARCS to the CUSUM locator statistic of Eq. 2 with its weighting parameter set to 0.5 (the maximum likelihood estimator of CP location under the assumption of i.i.d. Gaussian noise) and identify conditions under which one method is preferable over the other.

Univariate time series of length $T = 100$ are simulated according to the step model in Eq. 1 with different levels of white Gaussian noise, $\sigma \in \{0.4, 0.5, \ldots, 1.0\}$, and different ground truth CP locations, $c \in \{20, 30, \ldots, 80\}$. Baseline is set to $b = 0$ and step parameter to $w = 1$. Note that in the step model with white Gaussian noise, increasing $\sigma$ is equivalent to reducing $w$.
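The block-permutation test of Algorithm 3 can be illustrated with a simplified Python sketch. It makes several assumptions that go beyond the text: the helper names are hypothetical, the bend statistic is taken as the absolute value of the sum of the two hinge coefficients, only a single knot is fitted (the full procedure refits all remaining knots), and the CUSUM transformation is assumed to be the cumulative sum of the mean-centred series.

```python
import numpy as np

def block_permute(x0, k, rng):
    """Randomly reorder consecutive blocks of length k (the last may be shorter)."""
    blocks = [x0[i:i + k] for i in range(0, len(x0), k)]
    order = rng.permutation(len(blocks))
    return np.concatenate([blocks[i] for i in order])

def bend_statistic(x, c):
    """Assumed bend measure |beta+ + beta-| of a single-knot fit to the CUSUM of x."""
    x = np.asarray(x, dtype=float)
    t = np.arange(1, x.size + 1, dtype=float)
    y = np.cumsum(x - x.mean())                 # assumed CUSUM transformation
    X = np.column_stack([np.ones_like(t),
                         np.maximum(t - c, 0.0),   # h+
                         np.maximum(c - t, 0.0)])  # h-
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(abs(beta[1] + beta[2]))

def is_significant(x, x0, c, k=1, B=1000, alpha=0.05, seed=0):
    """Compare the bend at knot c with its block-permutation null distribution."""
    rng = np.random.default_rng(seed)
    s = bend_statistic(x, c)
    null = np.array([bend_statistic(block_permute(x0, k, rng), c)
                     for _ in range(B)])
    return bool(s >= np.quantile(null, 1 - alpha))
```

In the noise-free case this statistic equals the step height: for a unit step after $t = 50$ in a series of length 100, the fitted slopes left and right of the knot are $-0.5$ and $+0.5$, so the bend is 1. Permuting whole blocks of size $k = q + 1$ keeps the short-range dependence of an order-$q$ MA process intact within each block.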
A single CP was identified by using the CUSUM method and estimating the PARCS$_1$ model, both followed by bootstrap significance testing with $B = 10000$ permutations, nominal significance level $\alpha = 0.05$, and blocks of size $k = 1$ (since noise is independent in this example). Each parameter configuration was repeated for 1000 noise realisations.

We compare bias in CP detection toward the centre of the time series in both the CUSUM and PARCS methods. We measure this centre bias by $cb = (2 \cdot \mathbb{1}_{c \ge T/2} - 1)(c - \hat{c})$, which is positive when estimate $\hat{c}$ falls onto the side located toward the centre from $c$, and is negative otherwise. As expected given that the weighting parameter in Eq. 2 is set to 0, the CUSUM method shows a strong centre bias which increases for lower signal-to-noise ratio and more peripheral CPs (see Figure 2A). The CUSUM method's power decreases for harder parameter settings (higher $\sigma$ and more peripheral $c$) in that true CPs are missed in more of the realisations. The PARCS method results in a significant reduction in centre bias but does not eliminate it completely, and yields more misses relative to CUSUM if both are run at the same nominal $\alpha$ level (see Figure 2B). Summary comparison between the two methods is shown in Figure 2C for two exemplary CP locations, $c \in \{20, 60\}$. To fully appreciate the source of CUSUM centre bias and its reduction by PARCS, time series realisations with the two hardest parameter settings ($\sigma = 1.0$ and $c \in \{20, 80\}$) are considered in Figure 2D, which compares the distribution of $cb$ in the 81% of realisations in which both CUSUM and PARCS returned a CP. The histograms show a strongly skewed, heavy-tailed distribution for CUSUM, compared to a more symmetric distribution around 0 for PARCS, indicating only little bias. Most of the centre bias in PARCS is accounted for by outliers. This is illustrated by excluding outliers in the boxplots, which show a median of 1 time step centre bias in PARCS against median centre bias of 4 time steps in the case of CUSUM.
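The centre bias measure can be written as a small Python helper. The function name `centre_bias` is illustrative, and assigning a CP located exactly at $T/2$ to the right half is an assumption; the sign convention follows the description above (positive when the estimate lies toward the centre from the true CP).

```python
def centre_bias(c, c_hat, T):
    """Signed distance from true CP c to estimate c_hat, positive toward T/2."""
    sign = 1 if c >= T / 2 else -1   # assumed convention for c == T/2
    return sign * (c - c_hat)

# True CP at 20, estimate at 24: displaced 4 steps toward the centre
print(centre_bias(20, 24, 100))
# True CP at 80, estimate at 76: also displaced 4 steps toward the centre
print(centre_bias(80, 76, 100))
# True CP at 20, estimate at 18: displaced 2 steps away from the centre
print(centre_bias(20, 18, 100))
```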
Note that measuring centre bias as defined above does not differentiate between biased detections and false discoveries where, in extreme cases, a CP may be detected beyond the middle point $T/2$ of the time series, corresponding to centre bias greater than $|c - T/2|$. However, this scenario rarely occurred in the simulation results reported here.

While PARCS reduces centre bias, the simulation results above indicate that it behaves more conservatively than CUSUM at the same nominal $\alpha$ level. In principle, type II error rates may be reduced by adjusting the $\alpha$ level, at the same time producing more false discoveries. In order to assess how well the nominal significance level $\alpha$ agrees with the empirical type I error rate (false discoveries), 1000 white Gaussian noise realisations of length $T = 100$ are simulated with $\sigma = 1.0$. Conclusions drawn from this analysis are largely the same for larger signal-to-noise ratio (results not shown). A single CP was extracted using the CUSUM method and estimating the PARCS$_1$ model on these time series conforming to the null hypothesis $H_0: w = 0$. Type I error rates at different nominal $\alpha$ levels are shown in Figure 2E as probability-probability (P-P) plots, depicting the nominal, $1 - \alpha$, against the empirical, $1 - \hat\alpha$, probabilities of accepting the null hypothesis when the null hypothesis is true. While the empirical type I error rate of CUSUM perfectly agrees with the nominal significance level, for PARCS, in contrast, the empirical rate of false discoveries tends toward 0% for $\alpha = 0.05$ and remains smaller than 1% for $\alpha$ as large as 0.18. This entails that PARCS behaves highly conservatively, and that the nominal $\alpha$ level may be adjusted considerably upward without strongly influencing the false discovery rate.
On the other hand, despite being more conservative, Figure 2I shows that the receiver operating characteristic (ROC) curve for PARCS, depicting the method's false discovery rate against its power for different nominal $\alpha$ levels, consistently lies above that of CUSUM. For estimating the statistical power of each method, 1000 white Gaussian noise realisations with one CP at a random location in the range $[20, 80]\%T$ (and $w = \sigma = 1$) are simulated and type II error rates at different nominal $\alpha$ levels are computed.

Figure 2: Centre bias in PARCS compared to CUSUM for temporally independent noise; (A,B) bias, $\langle \hat{c} - c \rangle$, colour-coded as indicated by the colour bar; numbers indicate rounded type II error rates; (C) bias ± s.e.m. for $c = 20$ (solid) and $c = 60$ (dashed); (D) centre bias distributions for $c \in \{20, 80\}$ and $\sigma = 1.0$; inset shows centre bias distributions as boxplots that mark the median and first and third quartiles; whiskers include points within 1.5 times the interquartile range; outliers are excluded; (E-H) P-P plots comparing nominal (x-axis) versus factual (y-axis) rates of accepting the null hypothesis when it is true, in time series of length (E) $T = 100$, (F) $T = 50$, (G) $T = 26$, and (H) $T = 10$; dotted vertical line, nominal $\alpha = 0.05$; dotted horizontal line, factual $\hat\alpha = 0.05$; (I-L) ROC curves depicting false discovery rate (type I error rate; x-axis) versus power (y-axis) for different series lengths as in E-H; dotted vertical line, nominal $\alpha = 0.05$. In E-L, larger filled circles indicate the empirical $H_0$ rejection rates at a nominal $\alpha = 0.05$, and empty circles indicate where the factual $\hat\alpha \approx 0.05$.

This ROC analysis indicates that for every
0 nominal level for CUSUM, there exists at least one nominal for PARCS such that PARCS has both higher power (fewer type II errors) and lower false detection rate (fewer type I errors), making it the preferable method. This point is explored in more detail later in the context of multiple CP detection in Section 3.2. For shorter time series, PARCS behaves similarly as for longer time series in our simulations, while for CUSUM type I error rates now start to fall below the nominal level as well (Figures 2F-H). The area under the ROC curves become smaller for shorter time series for both methods, but the ROC curve of PARCS consistently lies above that of CUSUM in those cases as well (Figures 2J-L). Next, we examine the behaviour of the tests with dependent noise. In the case of temporally dependent noise, an appropriate block size for the bootstrap pro- cedure could be determined by inspecting the autocorrelation function of x (see Eq. 5 and Algorithm 2). 1000 noise realisations of length T = 100 are drawn from an order-2 MA process with coefficients  = 0:5= and  = 0:4=. 1 2 Since increasing noise variance in the temporally dependent case is not equiva- lent to decreasing the step parameter, we repeat the analysis with the same pa- rameters from the temporally independent case above but varying the step pa- rameter, w 2 f0:7; 0:8; : : : ; 1:3g and considering two levels of Gaussian noise, 2 f0:7; 1:0g. Figure 3 shows results of the comparison for  = 0:7 (top row) and  = 1:0 (bottom row). Similar to the white noise case, the CUSUM method’s centre bias increases for smaller signal-to-noise ratio (smaller w), and PARCS, in comparison, consistently reduces centre bias. 
For the same nominal $\alpha$ level, PARCS misses more of the true CPs than CUSUM for peripheral CPs when $\sigma = 0.7$ (top panels of Figures 3A,B), but the type II error rate of the two methods is more comparable in the high noise case, $\sigma = 1.0$ (bottom panels of Figures 3A,B), despite the PARCS method's more conservative behaviour (far lower type I error rates) in this setting as well. A summary comparison between the two methods is shown in Figure 3C for two exemplary CP locations, $c \in \{20, 60\}$. Despite a significant reduction when using PARCS, both centre bias distributions in the 62% of realisations with a CP identified by the two methods, and with the two hardest parameter settings ($c \in \{20, 80\}$, $\sigma = 1.0$ and $w = 0.7$) remain strongly skewed (bottom panel of Figure 3D).

So far, we compared PARCS to the CUSUM statistic with the weighting parameter of Eq. 2 set to 0. This choice is intuitive when developing PARCS, since it directly corresponds to the numerical integral of the time series upon which the PARCS approach is based (but see Section 4). Besides, under certain conditions, the CUSUM method using the test statistic with a weighting parameter below 0.5 is more sensitive than that with a weighting parameter of 0.5 (Antoch et al., 1995). However, the CUSUM statistic with the weighting parameter set to 0.5 returns the maximum likelihood estimator of CP

Figure 3: Centre bias in PARCS compared to CUSUM for temporally dependent noise with $\sigma = 0.7$ (top) and $\sigma = 1.0$ (bottom); (A,B) bias, $\langle \hat{c} - c \rangle$, colour-coded as indicated by the colour bar; numbers indicate rounded type II error rates; (C) bias ± s.e.m.
for $c = 20$ (solid) and $c = 60$ (dashed); (D) centre bias distributions for $c \in \{20, 80\}$ and $\sigma = 1.0$; inset shows centre bias distributions as boxplots that mark the median and first and third quartiles; whiskers include points within 1.5 times the interquartile range; outliers are excluded.

location in an AMOC scenario when noise in the step model of Eq. 1 is i.i.d. and normally distributed, leading theoretically to the strongest centre bias reduction under those conditions (Antoch & Hušková, 2001). We therefore also compare PARCS to this maximum likelihood CUSUM estimator here, henceforth referred to as CUSUM$_{ML}$.

Univariate time series of length $T = 100$ are simulated according to the step model Eq. 1 with different ground truth CP locations, $c \in \{20, 30, \ldots, 80\}$. We consider only the scenario with largest white Gaussian noise variance, $\sigma = 1.0$, in this analysis, for which PARCS showed the largest centre bias. A single CP was identified by using the CUSUM$_{ML}$ method and estimating the PARCS$_1$ model, both followed by bootstrap significance testing. Other parameters are as in the previous analyses above. CUSUM$_{ML}$ results in a significant reduction in centre bias compared to PARCS in three of the most peripheral ground truth CPs, $c \in \{20, 30, 80\}$, but does not eliminate it completely (see Figure 4A). For the same nominal $\alpha$ level, CUSUM$_{ML}$ also shows lower type II error rates compared to PARCS for these CPs as reported in Table 1, but recall that PARCS has a far lower type I error rate than CUSUM for the same choice of nominal $\alpha$ (cf. Figures 2E-H and Kirch, 2007). The two methods are comparable in the quality of their detections for all other ground truth CP locations, $c \in \{40, \ldots, 70\}$, with PARCS having a slight advantage.

Figure 4: Bias ± s.e.m. in PARCS compared to CUSUM$_{ML}$ with time series of length (A) $T = 100$, (B) $T = 50$, and (C) $T = 26$; noise is temporally independent with $\sigma = 1.0$.
In order to assess how well the two methods fare in the small sample size limit, and to characterise the convergence behaviour of the bootstrap procedure in each method, we repeat the same analysis for shorter series lengths, $T \in \{50, 26\}$. Ground truth CPs are set to the same relative location within the time series as in the $T = 100$ simulations. As summarised in Table 1, detection rates deteriorate as series length decreases, as does the bias relative to series length (where the relative location within the series with respect to the periphery is more relevant than the absolute CP location; see Figures 4B,C). Especially for $T = 26$, PARCS performs mostly better than CUSUM$_{ML}$, giving higher detection rates (see Table 1) and smaller centre bias (Figure 4C) in the majority of ground truth CPs, although it is still more conservative with near 0% type I error rate (given the bootstrap resolution; see Figure 2G). As we show next, this is a particularly important advantage of PARCS over the CUSUM-based methods when detecting multiple CPs, since CUSUM-based techniques rely on dissecting the time series into smaller segments in this case, reducing sample size at each iteration.

                        type II error rate (%) at c = round(p%T)
  T     method          p = 20   30   40   50   60   70   80
  100   CUSUM_ML            07   03   02   01   01   03   12
  100   PARCS               17   03   01   01   01   03   16
  50    CUSUM_ML            30   20   16   13   16   26   37
  50    PARCS               44   22   12   08   10   19   41
  26    CUSUM_ML            53   38   35   32   38   44   59
  26    PARCS               68   41   32   24   29   37   58

Table 1: Type II error rates in PARCS compared to CUSUM$_{ML}$ for different lengths of the time series; nominal $\alpha$ level, 0.05 for both methods.

3.2 Detecting Multiple CPs in Univariate Data

For the scenario with multiple CPs, we assess the performance of PARCS in comparison to the CUSUM method with standard binary segmentation (Bai, 1997; Scott & Knott, 1974) for univariate data with white Gaussian noise. Standard binary segmentation is known to mislocate CPs in some scenarios, but modifications
to the segmentation procedure have been proposed for solving this problem (Fryzlewicz, 2014). We show that PARCS provides an alternative approach. We then discuss a fundamental practical problem in statistical testing when using segmentation methods in general that is avoided by PARCS. Through comparison with standard binary segmentation, we illustrate conditions under which using such methods becomes infeasible. We then consider temporally dependent noise in univariate time series.

The binary segmentation method (Bai, 1997; Scott & Knott, 1974) for detecting multiple CPs proceeds as follows (pseudocode can be found in Fryzlewicz, 2014): If, according to a CUSUM test criterion, a CP $\hat{c}_1$ is detected over the full time series, the series is partitioned at $\hat{c}_1$. The procedure is repeated on the resulting left and right segments, potentially returning two additional CPs, $\hat{c}_{21}$ and $\hat{c}_{22}$, respectively. The procedure is terminated when no more CPs are detected after subsequent partitioning. Similar to the single CP case, we use bootstrap testing in deciding the significance of a CP at each stage. In the present context, we will refer to this CUSUM-based binary segmentation method simply by 'CUSUM'.

Three different processes with white Gaussian noise, $\sigma = 1.0$, of the form given by Eq. 6 and of length $T = 100$ are simulated for 1000 realisations each. A baseline $b = 0$ and two CPs at time steps $c_1 = 20$ and $c_2 = 60$ are set in all three scenarios. Weights $(w_1, w_2)$ are set to $(1, 2)$, $(2, -1)$ and $(2, 1)$ (see insets in top row of Figure 5).
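The binary segmentation procedure described above can be sketched recursively in Python. This is an illustrative sketch under stated assumptions, not the implementation used for the reported results: the function names are hypothetical, the unweighted CUSUM maximum serves as locator and test statistic, significance is assessed by plain permutation (block size 1), and a minimum segment length `min_len` is introduced here only to stop the recursion on very short segments.

```python
import numpy as np

def cusum_locate(x):
    """Split point (size of the left segment) and max |CUSUM| of a segment."""
    y = np.cumsum(x - x.mean())[:-1]   # the final value is ~0 by construction
    i = int(np.argmax(np.abs(y)))
    return i + 1, float(abs(y[i]))

def permutation_test(x, stat, B, alpha, rng):
    """Plain permutation bootstrap of the max |CUSUM| statistic (k = 1)."""
    null = np.array([cusum_locate(rng.permutation(x))[1] for _ in range(B)])
    return bool(stat > np.quantile(null, 1 - alpha))

def binary_segmentation(x, B=500, alpha=0.05, min_len=4, rng=None, offset=0):
    """Recursively split x at significant CUSUM maxima; returns sorted CPs."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(0) if rng is None else rng
    if x.size < 2 * min_len:           # too short to test (assumed guard)
        return []
    c, stat = cusum_locate(x)
    if not permutation_test(x, stat, B, alpha, rng):
        return []
    left = binary_segmentation(x[:c], B, alpha, min_len, rng, offset)
    right = binary_segmentation(x[c:], B, alpha, min_len, rng, offset + c)
    return left + [offset + c] + right

# One clear step after t = 50
rng = np.random.default_rng(3)
x = np.r_[np.zeros(50), 3 * np.ones(50)] + 0.3 * rng.standard_normal(100)
print(binary_segmentation(x))
```

Each recursive call tests a shorter segment than its parent, so the number of distinct permutations available to the bootstrap shrinks with every split.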
For the present comparison, binary segmentation is terminated after at most one partitioning, as this completely suffices to compare the methods (note that this allows CUSUM to detect up to three potential CPs, with only two present in the series). Similarly, the PARCS$_3$ model is estimated, and both methods use the corresponding permutation bootstrap test with $B = 10000$ and $k = 1$. In order to compare type I and type II error rates between the two methods, we set nominal $\alpha$ levels to 0.05 and 0.30 for CUSUM and PARCS, respectively, which is expected to return about the same factual type I error rates of 5% for both methods in series of length $T \in [26, 100]$, according to Figures 2E-G.

Figure 5: Comparing PARCS to CUSUM with binary segmentation for multiple CP detection; stacked histograms of correct detection rates for CUSUM's $\hat{c}_1, \hat{c}_{21}, \hat{c}_{22}$ (top) and PARCS' $\hat{c}_1, \hat{c}_2, \hat{c}_3$ (bottom) over 1000 realisations, plotted against time step; transparent bars show candidate CPs excluded by the permutation test; dashed grey, ground truth CPs; top inset, deterministic component of time series for the respective column's scenario; left, centre, and right panels refer to first, second, and third scenario, respectively.

A first look at Figure 5 suggests that both CUSUM and PARCS detect CPs that are close to the ground truth. A more detailed comparison with respect to type I and type II error rates and the quality of CP detections is provided in Table 2.
The quality of detections using accuracy scores is defined as the correct detection rate within a $\pm 5\%T$ range from the ground truth CP location, adjusted for type I errors by an additive term of $-\hat\alpha/M$, where $\hat\alpha$ is the factual (empirical) $\alpha$ level. This way, the accuracy score is an overall performance measure that takes into account both type I and type II errors, and how far off the detected CP is from the true one.

                   error rate (%)                  accuracy score (%)
  T     method     type I          type II         c = 20%T        c = 60%T
  100   CUSUM      10 / 14 / 41    08 / 02 / 01    72 / 92 / 77    91 / 78 / 45
  100   PARCS      02 / 03 / 02    04 / 00 / 01    80 / 96 / 95    96 / 74 / 76
  50    CUSUM      12 / 18 / 19    27 / 21 / 11    34 / 56 / 61    74 / 38 / 16
  50    PARCS      04 / 04 / 03    13 / 02 / 05    51 / 82 / 82    85 / 52 / 52
  26    CUSUM      12 / 22 / 18    41 / 44 / 16    28 / 44 / 71    77 / 26 / 25
  26    PARCS      06 / 07 / 06    24 / 09 / 10    37 / 75 / 76    79 / 47 / 47

Table 2: Comparing PARCS to CUSUM with binary segmentation for multiple CP detection for different lengths of the time series; error rates and accuracy scores are rounded; triplet, scenarios 1 / 2 / 3; nominal $\alpha$ levels, 0.05 and 0.30 for CUSUM and PARCS, respectively.

The first scenario (left panels in Figure 5) has the hardest parameter setting, since $c_1$ is both more peripheral and smaller in magnitude than $c_2$. CUSUM first detects the easier CP, followed by detecting $c_1$ at the left hand segment as $\hat{c}_{21}$, but with a lower accuracy score as confirmed in Table 2. Similarly, PARCS returns $c_2$ and $c_1$ as the first and second rank CPs, respectively, with accuracy scores higher than those of CUSUM. The relatively low accuracy scores at detecting $c_1$ in both methods are due to its peripheral location and small magnitude. Both type I and type II error rates are markedly lower in PARCS.

The second and third scenarios (centre and right panels in Figure 5, respectively) are easier in terms of ground truth parameter settings, since the more peripheral CP $c_1$ has the larger magnitude.
In the second scenario (centre panels in Figure 5), the two methods are comparable with regards to their overall accuracy scores as defined above (many detections lie outside the $\pm 5\%T$ accuracy score range, especially for $c_2$), but PARCS has the lower type I and type II error rates. CUSUM has a higher rate of false discoveries than in the first scenario. Its first detection is the higher magnitude CP, which comes with a lower accuracy than PARCS, followed by a detection at the right hand segment with a higher accuracy than PARCS. The relatively low accuracy rates for detecting $c_2$ in both methods are due to its small magnitude. The third scenario (right panels in Figure 5) is an example of a setting in which standard binary segmentation may fail in correctly locating CPs (see Fryzlewicz, 2014, for a binary segmentation approach that solves this problem). While the performance of PARCS remains about the same as in the second scenario, CUSUM's first detection diverges from either of the two ground truth CPs in a large number of realisations (see top-right panel in Figure 5). The large type I error rate (more than three times that in the first and second scenarios) markedly reduces the accuracy scores, which are substantially lower than for PARCS for both ground truth CPs (see Table 2). Another factor behind the high type I error rates of any iterative procedure including binary segmentation is that the same CP could be detected again at later iterations when an earlier detection is slightly biased.

We now compare the impact of shorter series on the performance of binary segmentation methods and PARCS. We consider 1000 noise realisations from each of the three scenarios with $T \in \{50, 26\}$. Two ground truth CPs are set to the same relative location within the time series as for $T = 100$.
As seen in Table 2, both type I and type II error rates increase in both methods with the decrease in series length, with the exception of type I error rates for CUSUM in the third scenario. PARCS is consistently the superior method, having both higher statistical power and fewer false discoveries than CUSUM. While accuracy scores predictably decrease with shorter series length, comparison between the two methods remains qualitatively similar to the $T = 100$ case in the first and third scenario, while there is a marked change in the second scenario: While CUSUM is slightly superior in accurately detecting $c_2$ for $T = 100$, PARCS progressively surpasses CUSUM in accuracy as series length decreases. This behaviour is a result of deterioration in the power of the bootstrap test statistic for shorter series. Not only does the overall sample size decrease, but CUSUM in the second scenario with $T = 26$ is tasked after a potential first detection of $c_1$ with bootstrapping the CUSUM test statistic on segments as short as 4 or 5 time steps only, which is statistically infeasible, either when using bootstraps or approximate parametric tests (Cho & Fryzlewicz, 2015; Fryzlewicz, 2014; Olshen et al., 2004). We stress that this is a fundamental drawback to any method that relies on partitioning, and is not specific to standard binary segmentation.

The limitations of binary segmentation methods become more obvious when noise is temporally dependent. For instance, given a time series of length $T = 100$ with parameters as in the second scenario and an order-2 MA noise process, blocks of size $k \simeq 3$ are required for proper block-permutation. If CUSUM first detected $c_1 = 20$ accurately, $\hat{c}_1 = 20$, the left hand segment would be only 20 time steps long. This allows for only 7 blocks, yielding 5040 possible permutations. This number drops to 720 permutations had $\hat{c}_1$ been detected only 2 time steps further to the left, which makes it hard to approximate the EDF of the CUSUM statistic reliably.
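The permutation counts quoted above follow from a short calculation, sketched here in Python. The helper name is hypothetical, and counting a trailing partial block as a block of its own is an assumption chosen because it reproduces the figures in the text.

```python
import math

def n_block_permutations(segment_length, k):
    """Number of orderings of the blocks of size k in a segment."""
    n_blocks = math.ceil(segment_length / k)   # a partial last block counts as one
    return math.factorial(n_blocks)

# Left segment of 20 time steps, blocks of size 3: 7 blocks
print(n_block_permutations(20, 3))   # 5040
# Detection 2 time steps further left: 18 time steps, 6 blocks
print(n_block_permutations(18, 3))   # 720
```

With $B = 10000$ bootstrap samples requested, both counts fall below $B$, so many bootstrap samples would necessarily repeat.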
In addition, specifying the block size first requires estimating the MA process order by approximating the H -conform time series (see Algorithm 2), but potential CPs are not known a priori, due to the recursive nature of binary segmentation methods. Given these considerations, we focus on PARCS only as we now move over to the case of detecting multiple CPs in series with temporally dependent noise. 1000 noise realisations of length T = 100 are drawn from an order-2 MA process with = 0:7 and coefficients  = 0:5= and  = 0:4=. Other parameters are as in 1 2 the previous analysis. In Figure 6, we illustrate distributions of correct detections for the second scenario only (with the other two scenarios qualitatively comparable to their counterparts in Figure 5), but error and accuracy rates on these scenarios are reported as well. After removing the three step changes following PARCS CP detection, the majority of residual time series (70% as shown in bottom panel in Figure 6A) had an autocorrelation that cuts off at the correct order of the ground truth MA(2) noise process, i.e. with acorr(x ; 2) being the last coefficient that lies outside the 95% confidence bounds (dashed lines; top panel in Figure 6A; results for the first and third scenarios are comparable). Block-permutation bootstrap test- ing with nominal = 0:05 and B = 10000 is carried out on these series with blocks of size 3 and, for other time series, according to the estimated order in Fig- ure 6A (with an upper bound of 10 on block size). Exactly two CPs are detected in more than 99.5% of realisations in all scenarios. Figure 6B shows that the distri- bution of correct detections for the second scenario is largely concentrated around the ground truth CPs. Accuracy scores in each scenario are, respectively, 96%, 99%, and 99% for c and 99%, 89%, and 89% for c . Note also the oscillation in 1 2 the example realisation in Figure 6C, which results from dependent noise with a negative MA coefficient  . 
Figure 6: Multiple CP detection in temporally dependent data from the second scenario; (A) estimating MA order; (top) average autocorrelation over time series realisations for different time lags ± s.d.; dashed grey, 95% confidence interval; (bottom) ratio of 1000 realisations with a given estimated order; (B) stacked histograms of correct detection rates over 1000 realisations; transparent bars show candidate CPs excluded by the permutation test; dashed grey, ground truth CPs; inset, deterministic component of time series; (C) deterministic component of the time series (grey) superimposed on an exemplary time series (blue); bottom panel shows a close up over 16 data points around one of the CPs; red line highlights oscillation due to the negative MA coefficient.

3.3 Detecting Multiple CPs in Multivariate Data

The PARCS method's ability to detect multiple CPs in spatially independent, multivariate time series is demonstrated in Figure 7 on 1000 realisations of length $T = 100$ with $N = 9$ covariates and white Gaussian noise, $\sigma = 1$. Parameters are set to $b = (0, 0, 0, 2, 2, 2, 0, 1, 2)$, $c_1 = 20$, $w_1 = w_0 \cdot (1, 2, 2, -2, 0, 0, 0, 0, 0)$, $c_2 = 60$ and $w_2 = w_0 \cdot (2, 1, -1, 0, 1, -1, 0, 0, 0)$. The scaling parameter $w_0$ controls signal-to-noise ratio in the time series and is initially set to 1.0. Given these parameter values, the two CPs are not represented in all covariates of the time series, as exemplified in Figure 7A, rendering CPs harder to detect from the averaged univariate time series (with steps differing in sign across the covariates partially cancelling each other, resulting in small weights, $\langle w_1 \rangle = 3/9$ and $\langle w_2 \rangle = 2/9$, for the resulting univariate time series).

Following PARCS CP detection, augmented by a bootstrap test with nominal $\alpha = 0.05$, $B = 10000$ and $k = 1$, exactly two CPs are detected in 99.9% of realisations.
Accuracy scores are 99.8% and 98% for $c_1$ and $c_2$, respectively. The lower variance in $c_1$ detections, as seen in Figure 7B, is due to the higher average absolute weight $|w_1| = 7/9$ compared to $|w_2| = 6/9$.

We then test the method's performance for smaller signal-to-noise ratios with $w_0 \in \{1.0, 0.9, \ldots, 0.1\}$. Figure 7C shows that correct detection rates within a $2\%T$ range from the ground truth CPs for different values of $w_0$ remain above 50%, even for magnitudes as small as $w_0 = 0.5$. These rates are a result of PARCS leveraging CP information from multiple covariates simultaneously, rather than depleting the signal through averaging.

Figure 7: Multiple CP detection in spatially independent, multivariate data; (A) deterministic component of the different covariates in the time series (grey) superimposed on an exemplary time series (blue); (B) stacked histograms of correct detection rates over 1000 realisations; transparent bars show candidate CPs excluded by the permutation test; dashed grey, ground truth CPs; (C) stacked histograms of correct detection rates over 1000 realisations given different signal-to-noise ratios (controlled by $w_0$), and binned with 5 time step windows; rates are logarithmically scaled as indicated by the colour bar.

3.4 Detecting Neural Events that Reflect Learning

A previous study by one of the authors and colleagues exemplifies the practical value of change point detection in neuroscience (Durstewitz et al., 2010). These authors demonstrated that acquiring a new behavioural rule in rats is accompanied by sudden jumps in behavioural performance, which in turn is reflected in the activity of neural units recorded simultaneously in the medial prefrontal cortex (mPFC). In the current section, we revisit part of these data to showcase PARCS in a real data scenario.
Before moving to the demonstration, it is important to note that the data in question are not normally distributed and potentially include linear trends (Durstewitz et al., 2010) not accounted for by the step models in Eqs. 1, 6 and 9. As such, some preprocessing may be necessary for a statistical analysis that is more consistent with the step model assumptions (this may include detrending and potentially some mild smoothing with Gaussian kernels; see Durstewitz et al., 2010). However, to keep the present demonstration simple, PARCS was applied directly to the data with minimal preprocessing, which only involves square-root-transforming the neural count data to bring them closer to a Gaussian distribution and to stabilise the variance (Kihlberg, Herson, & Schotz, 1972).

In order to show that PARCS can still return reasonable CP estimates under these non-Gaussian conditions, we first test its performance on simulated spike count data, before applying it to the empirical data. We simulate 1000 realisations of length $T = 100$ with $N = 9$ covariates according to a Poisson process. Parameters are set to $b = (1, 1, 1, 3, 3, 3, 1, 2, 1)$, $c_1 = 20$, $w_1 = (1, 2, 2, -2, 0, 0, 0, 0, 0)$, $c_2 = 60$ and $w_2 = (2, 1, -1, 0, 1, -1, 0, 0, 0)$. This choice of parameters results in average firing rates that are comparable in their means to the white Gaussian noise case (cf. Figures 7A and 8A) and to the low firing rates often observed in mPFC neurons. One obvious deviation from Gaussian assumptions in the case of Poisson noise is that the variance is no longer constant, but equals the mean within each of the segments separated by true CPs. After square-root-transforming the data and applying PARCS CP detection, augmented by a bootstrap test with nominal $\alpha = 0.05$, $B = 10000$ and $k = 1$, exactly two CPs are detected in 92% of realisations. Accuracy scores are 98% and 70% for $c_1$ and $c_2$, respectively. The lower accuracy scores compared to the white Gaussian noise scenario are due to the lower signal-to-noise ratios, resulting from the increase in noise variance with firing rates (cf. Figures 7B and 8B). Nevertheless, these results sufficiently justify the use of PARCS in the present context.

Figure 8: Multiple CP detection in spatially independent, multivariate, Poisson data; (A) deterministic component of the different covariates in the time series (grey) superimposed on an exemplary time series (blue); y-axis, square-root-transformed spike counts; (B) stacked histograms of correct detection rates over 1000 realisations; transparent bars show candidate CPs excluded by the permutation test; dashed grey, ground truth CPs.

We now turn to the experimentally obtained dataset. Six animals were trained on a two-choice deterministic operant rule switching task which proceeds as follows: At the beginning of the session, the animal follows a previously acquired behavioural rule whereby it responds to a visual cue by a lever press for attaining a reward (visual rule). Unknown to the animal, reward contingencies are switched after 20 trials to a novel spatial rule, in which attaining the reward requires pressing a certain baited lever (right or left), regardless of the visual cue. The session is terminated when the animal reaches a preset criterion that indicates that the new rule behaviour has been learnt. In addition to the binary behavioural data of lever presses over trials, spike counts emitted by mPFC units during the 3 seconds following cue onset were collected through single unit recording techniques. Neural and behavioural data from one animal are shown in Figure 9A and 9B, respectively. Trials corresponding to the steady state visual and spatial rule (first and last 20 trials, respectively) are not considered in the analysis.

Time series with one or two significant CPs were described in the original study by Durstewitz et al. (2010), so PARCS$_2$ models are estimated for each animal for both the multivariate neural data (multiple response PARCS model; Figure 9A) and the univariate behavioural data (Figure 9B), in addition to PARCS$_1$ models for the behavioural data as summarised in Figure 9C. As shown in Figure 9B, one neural CP in that exemplary animal matches its behavioural counterpart. The second neural CP, while not as close to its behavioural counterpart, is only 7 trials away from it, and the two are highly correlated across animals, as shown in Figure 9C, concurring with the original findings of Durstewitz et al. (2010). Besides the significant correlation, the corresponding black linear regression line lies very close to the diagonal, indicating that neural and behavioural CPs are not only correlated, but are almost equal. Moreover, those authors report that data from many animals contain at most a single CP (also note the low weight of one of the CPs as estimated from one of the animals using PARCS$_2$). Comparing the PARCS$_1$ behavioural CP to its neural counterpart (blue circles in Figure 9C) shows that correlation remains high and significant. A sample size of $s > 5$ is usually recommended for evaluating the significance of a correlation. However, the corresponding linear regression line is also close to the diagonal, in further support of the reliability of this result, and in agreement with the original results despite different procedures: For the neural data in the original study, CUSUM-based detection was performed on a multivariate discrimination statistic defined across the whole neural population, while here, the model was determined directly from the multiple spike count data.
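The variance-stabilising effect of the square-root transform applied to the spike counts in this section can be checked numerically. The sketch below computes, by direct summation over the Poisson probability mass function, the variance of raw and square-root-transformed counts. The rates are illustrative values in the range of the simulated firing rates, and the helper names (`poisson_pmf`, `variance_of`) are hypothetical.

```python
import math


def poisson_pmf(k, rate):
    """P(X = k) for X ~ Poisson(rate)."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


def variance_of(transform, rate, k_max=60):
    """Variance of transform(X), X ~ Poisson(rate), by direct summation.

    The sum is truncated at k_max, which is ample for the small rates used here.
    """
    first = sum(transform(k) * poisson_pmf(k, rate) for k in range(k_max + 1))
    second = sum(transform(k) ** 2 * poisson_pmf(k, rate)
                 for k in range(k_max + 1))
    return second - first ** 2


RATES = (1, 2, 3, 4)   # illustrative segment means (low firing rates)

raw_var = {r: variance_of(float, r) for r in RATES}        # equals the rate
sqrt_var = {r: variance_of(math.sqrt, r) for r in RATES}   # varies far less

for r in RATES:
    print(f"rate {r}: var(counts) = {raw_var[r]:.3f}, "
          f"var(sqrt counts) = {sqrt_var[r]:.3f}")
```

For these rates the raw variance grows fourfold, while the variance after the transform changes by well under a factor of two. The stabilisation is therefore only approximate at such low rates, in line with the reduced accuracy reported for the Poisson simulation.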
Figure 9: Comparing behavioural and mPFC neural CPs; (A) blue, square-root-transformed spike count data in the three seconds following cue onset from 6 representative mPFC units of one rat; grey, mean as estimated by inverting the neural multiple response PARCS model. Note potential CP in top-centre unit which was not detected by PARCS since it did not contribute strongly to population-wide CPs; dashed lines, behavioural CPs from the same animal; (B) blue, lever press at each trial; this animal is rewarded for pressing the right lever during the spatial rule; grey, probability of pressing right lever as estimated by inverting the behavioural PARCS model; dashed lines, neural CPs from the same animal (see A); (C) relating behavioural and neural CPs; blue, behavioural CPs with higher weight; r, correlation coefficients as computed over all 12 data points (black) and over those where behavioural CPs have the higher weight (blue); p-values, significance levels of corresponding r; black and blue lines, respective least-square linear regression fits to the two sets of data points; red and yellow circles, neural and behavioural CP pairs from the exemplary animal in A and B, respectively.

4 Discussion

In the current article, we introduced PARCS, a method for detecting multiple step changes, or CPs, in potentially multivariate, temporally dependent data, supported by a bootstrap-based nonparametric test. We also showed that PARCS substantially reduces centre bias in estimating CPs compared to the most basic specification of the CUSUM method, and presented conditions under which it compares to or outperforms the maximum-likelihood CUSUM statistic.
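For orientation, the two CUSUM locators contrasted here can be sketched as follows. The definitions are the common textbook forms and are assumed, rather than confirmed, to match the equations used earlier in the paper: the basic locator maximises $|S_t|$ with $S_t = \sum_{i \le t} (x_i - \bar{x})$, and the standardised locator divides $|S_t|$ by $(t(T - t)/T)^{\gamma}$ with $\gamma = 1/2$. The function names are illustrative.

```python
def cusum(x):
    """Cumulative sum of the mean-centred series, S_t for t = 1..T."""
    m = sum(x) / len(x)
    running, out = 0.0, []
    for value in x:
        running += value - m
        out.append(running)
    return out


def locate(x, gamma=0.0):
    """Time step t in 1..T-1 maximising |S_t| / (t (T - t) / T) ** gamma.

    gamma = 0 gives the basic locator; gamma = 0.5 gives a standardised
    form that gives relatively more weight to points away from the centre.
    """
    T = len(x)
    S = cusum(x)
    best_t, best_value = None, -1.0
    for t in range(1, T):
        value = abs(S[t - 1]) / ((t * (T - t) / T) ** gamma)
        if value > best_value:
            best_t, best_value = t, value
    return best_t


# Noise-free series with a single step after time step 20.
step = [0.0] * 20 + [1.0] * 80
print(locate(step), locate(step, gamma=0.5))
```

On noise-free data both locators recover the step exactly; the centre bias discussed in the text concerns how the unweighted statistic behaves once noise is added.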
Furthermore, we demonstrated that PARCS may achieve higher sensitivity (statistical power) than CUSUM-based methods while at the same time having lower type I errors in multiple CP scenarios, mainly because PARCS can make use of the full time series while CUSUM-based methods rely on segmenting the time series for detecting multiple CPs. We finally confirmed previous results pertaining to the acquisition of a new behavioural rule and the role of the medial prefrontal cortex in this process.

As already apparent from some of our simulation studies, the basic PARCS method as introduced here leaves room for improvement. In the presence of a single CP, we showed that PARCS strongly reduces the amount of bias toward the centre that results from the direct application of the most basic form of the CUSUM locator statistic. Theoretically grounded modifications to the CUSUM transformation that reduce this amount of bias rely on down-weighting more centrally located points (Kirch, 2007). As shown with PARCS, this problem is not quite as severe. Nevertheless, since PARCS approximates the CUSUM transformation using a regression model, similar down-weighting could be incorporated into the PARCS procedure as well by using weighted least squares instead of regular least squares (Hastie, Tibshirani, & Friedman, 2009), which is a straightforward amendment.

Furthermore, the PARCS method currently requires a liberal guess of the number $M$ of CPs in advance, followed by refinements through nonparametric bootstrap testing. It is desirable, however, especially when no prior information on $M$ is available, to have statistical tests as termination criteria for the forward and backward stages.
In adaptive regression spline methods (Friedman, 1991; Friedman & Silverman, 1989; Stone et al., 1997), there is strong empirical evidence (Hinkley, 1969, 1971b) backed by theoretical results (Feder, 1975) that the difference in residual mean-square-error between two nested models that differ in one additional knot is well approximated, albeit conservatively, by a scaled $\chi^2$ statistic on 4 degrees of freedom (Friedman, 1991). This led to one nonparametric termination recipe that is based on generalised cross-validation (Craven & Wahba, 1979). Another approach is to infer the piecewise linear regression model with the aid of a parametric test for specifying the number and location of knots, without recourse to iterative procedures (Liu, Wu, & Zidek, 1997). Unfortunately, neither approach is directly applicable to PARCS, since both require assumptions that are not met in the CUSUM-transformed time series. The CUSUM transformation of the time series is a nonstationary ARMA(1, q) process. Deriving reasonable generalised cross-validation (Craven & Wahba, 1979; Friedman, 1991; Friedman & Silverman, 1989), F-ratio (Durstewitz, 2017; Hastie et al., 2009) or parametric (Liu et al., 1997) test statistics requires currently unknown corrections to those tests which account for nonstationarity and the particular form of the ARMA model underlying the CUSUM-transformed data.

When multiple CPs are present in the data, PARCS can outperform standard binary segmentation (Bai, 1997; Scott & Knott, 1974). Other segmentation methods also solve the problem of mislocating CPs inherent in the standard procedure (Cho & Fryzlewicz, 2015; Fryzlewicz, 2014; Olshen et al., 2004). Wild binary segmentation (WBS; Fryzlewicz, 2014), for instance, relies on sampling local CUSUM transformations of randomly chosen segments of the time series. The candidate CP with the largest value among sampled CUSUM curves is returned to be tested against a criterion, followed by binary segmentation.
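The candidate-selection step of WBS described above can be sketched as follows. This is a minimal illustration, not the reference implementation: it uses the standard two-sample CUSUM contrast on each interval, always includes the full series among the intervals, and omits both the test against a criterion and the subsequent recursion. The function names and the number of intervals are illustrative assumptions.

```python
import math
import random


def cusum_stats(x, s, e):
    """|CUSUM| contrast for every split of x[s..e] (0-based, inclusive bounds).

    Returns (statistic, b) pairs, where b is the 1-based time step of the
    last point to the left of the split.
    """
    n = e - s + 1
    total = sum(x[s:e + 1])
    left_sum, stats = 0.0, []
    for i in range(s, e):
        left_sum += x[i]
        n_left, n_right = i - s + 1, e - i
        stat = (math.sqrt(n_right / (n * n_left)) * left_sum
                - math.sqrt(n_left / (n * n_right)) * (total - left_sum))
        stats.append((abs(stat), i + 1))
    return stats


def wbs_candidate(x, n_intervals, rng):
    """Candidate CP with the largest |CUSUM| over randomly drawn intervals."""
    T = len(x)
    intervals = [(0, T - 1)]                  # always include the full series
    for _ in range(n_intervals):
        s, e = sorted(rng.sample(range(T), 2))
        intervals.append((s, e))
    return max(max(cusum_stats(x, s, e)) for s, e in intervals)


# Noise-free series with steps after time steps 20 and 60.
noise_free = [0.0] * 20 + [1.0] * 40 + [3.0] * 40
statistic, candidate = wbs_candidate(noise_free, 100, random.Random(1))
print(candidate)
```

In the full procedure the returned candidate would be compared against a threshold and, if accepted, the series would be split at that point and the search repeated on each part.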
WBS is preferable to PARCS in that its test statistic and termination criterion when noise is independent are backed up by rigorous theory, and may be the favourable method when segments are large enough for the test statistic to converge. If series of only limited length are available, however, WBS may run into similar problems as standard binary segmentation for CUSUM, since each detection is still followed by partitioning the data further. WBS also, to the best of our knowledge, currently lacks a thorough analysis of the behaviour of its test statistic for dependent data.

It is tempting to speculate on the potential for a hybrid method that capitalises on the desirable features of both methods. Computational demands arise in WBS from the need to choose segment range parameters by sampling a few thousand CUSUM curves, for which PARCS may offer an easy and efficient workaround: Fryzlewicz (2014) demonstrated that the optimal WBS segment choice is the segment bounded by the two CPs closest to the target CP from each side. PARCS could thus provide an informed selection of boundaries by returning candidate CPs in the data and using these to demarcate segments, rather than relying on random sampling as in WBS.

In dealing with multivariate data, recent methods tackled the computational demands of having a large number of covariates and sparse CP representations (Cho & Fryzlewicz, 2015; Wang & Samworth, 2018). These methods rely on low dimensional projections of the multivariate CUSUM curve that preserve the CPs and follow this projection by a binary segmentation method. Since PARCS for multivariate time series is also based on the CUSUM transformation, it is straightforward to leverage the computational savings provided by such projection methods in reducing the dimensionality of the PARCS input, while avoiding the drawbacks of binary segmentation methods.
This may offer a route for extending PARCS to the important case of multivariate CP detection in mutually (spatially) dependent time series, a configuration which these projection methods also consider (Cho & Fryzlewicz, 2015; Wang & Samworth, 2018). Alternatively, nondiagonal covariance structure in multivariate series may be accounted for by extending the PARCS formulation to the multivariate regression spline realm (Friedman, 1991; Stone et al., 1997).

Finally, when analysing the neural and behavioural data during the rule switching task, we mentioned that data may also contain trends that are not accounted for by step change time series models (Durstewitz et al., 2010). Caution is required when analysing real data using CP detection methods, since these methods, PARCS included, assume a step change model underlying the generation of the data and hence may attempt to approximate trends and other nonstationary features by a series of step changes, a point made more explicit by Fryzlewicz (2014); Durstewitz et al. (2010) therefore removed trends around candidate CPs first. Hence, to avoid wrong conclusions with respect to the source and type of nonstationarity in experimental time series, it may be necessary to either augment change point detection by adequate preprocessing (Durstewitz et al., 2010) or to generalise time series models for CP detection to include other forms of nonstationarity.

Author Contributions

HT, DD conceived study; HT developed and implemented methods; HT carried out simulations; HT analysed data; HT, DD interpreted results; HT prepared figures; HT wrote manuscript; HT, DD revised and finalised manuscript.

Funding

This research was funded by grants to DD by the German Research Foundation (DFG) (SPP1665, DU 354/8-2) and through the German Ministry for Education and Research (BMBF) via the e:Med framework (01ZX1311A & 01ZX1314E).

Acknowledgements

The authors thank Dr. Georgia Koppe and Dr.
Eleonora Russo for discussions and Dr. Jeremy Seamans for providing the neural and behavioural data.

Data Availability Statement

Method implementation will be freely available on an online repository upon publication.

References

Aminikhanghahi, S., & Cook, D. J. (2017). A survey of methods for time series change point detection. Knowl. Inf. Syst., 51(2), 339–367. doi: 10.1007/s10115-016-0987-z
Antoch, J., Hušková, M., & Prášková, Z. (1997). Effect of dependence on statistics for determination of change. J. Stat. Plan. Inference, 60(2), 291–310. doi: 10.1016/S0378-3758(96)00138-3
Antoch, J., Hušková, M., & Veraverbeke, N. (1995). Change-point problem and bootstrap. J. Nonparametr. Statist., 5(2), 123–144. doi: 10.1080/
Antoch, J., & Hušková, M. (2001). Permutation tests in change point analysis. Stat. Probab. Lett., 53(1), 37–46. doi: 10.1016/S0167-7152(01)00009-8
Aston, J. A. D., & Kirch, C. (2012). Evaluating stationarity via change-point alternatives with applications to fMRI data. Ann. Appl. Stat., 6(4), 1906–1948. doi: 10.1214/12-AOAS565
Bai, J. (1997). Estimating multiple breaks one at a time. Econ. Theory, 13(3), 315–352. doi: 10.1017/S0266466600005831
Basseville, M. (1988). Detecting changes in signals and systems–a survey. Automatica, 24(3), 309–326. doi: 10.1016/0005-1098(88)90073-8
Bhattacharya, P. K. (1994). Some aspects of change-point analysis. Lect. Notes Monogr. Ser., 23, 28–56. doi: 10.1214/lnms/1215463112
Brown, R. L., Durbin, J., & Evans, J. M. (1975). Techniques for testing the constancy of regression relationships over time. J. R. Stat. Soc. Series B Stat. Methodol., 149–192.
Chen, J., & Gupta, A. K. (2012). Parametric statistical change point analysis: With applications to genetics, medicine, and finance. Basel, Switzerland: Springer Science+Business Media, LLC.
Chernoff, H., & Zacks, S. (1964). Estimating the current mean of a normal distribution which is subjected to changes in time. Ann. Math. Stat., 35(3), 999–1018. doi: 10.1214/aoms/1177700517
Cho, H., & Fryzlewicz, P. (2015). Multiple-change-point detection for high dimensional time series via sparsified binary segmentation. J. R. Stat. Soc. Series B Stat. Methodol., 77(2), 475–507. doi: 10.1111/rssb.12079
Craven, P., & Wahba, G. (1979). Smoothing noisy data with spline functions: Estimating the correct degree of smoothing by the method of generalized cross-validation. Numer. Math., 31(4), 317–403. doi: 10.1007/BF01404567
Csörgő, M., & Horváth, L. (1997). Limit theorems in change-point analysis (Vol. 18). New York, NY: John Wiley & Sons Inc.
Davison, A. C., & Hinkley, D. V. (1997). Bootstrap methods and their application. New York, NY: Cambridge University Press.
Dümbgen, L. (1991). The asymptotic behavior of some nonparametric change-point estimators. Ann. Stat., 1471–1495. doi: 10.1214/aos/1176348257
Durstewitz, D. (2017). Advanced data analysis in neuroscience: Integrating statistical and computational models. Springer International Publishing.
Durstewitz, D., Vittoz, N. M., Floresco, S. B., & Seamans, J. K. (2010). Abrupt transitions between prefrontal neural ensemble states accompany behavioral transitions during rule learning. Neuron, 66(3), 438–448. doi: 10.1016/j.neuron.2010.03.029
Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. Ann. Stat., 32(2), 407–499. doi: 10.1214/009053604000000067
Elsner, J. B., Niu, X., & Jagger, T. H. (2004). Detecting shifts in hurricane rates using a Markov chain Monte Carlo approach. J. Clim., 17(13), 2652–2666. doi: 10.1175/1520-0442(2004)017<2652:DSIHRU>2.0.CO;2
Fan, J., & Yao, Q. (2003). Nonlinear time series: nonparametric and parametric methods. New York, NY: Springer.
Fan, Z., Dror, R. O., Mildorf, T. J., Piana, S., & Shaw, D. E. (2015). Identifying localized changes in large systems: Change-point detection for biomolecular simulations. Proc. Natl. Acad. Sci. U.S.A., 112(24), 7454–9. doi: 10.1073/pnas.1415846112
Feder, P. I. (1975). The log likelihood ratio in segmented regression. Ann. Stat., 84–97.
Friedman, J. H. (1991). Multivariate adaptive regression splines. Ann. Stat., 19(1), 1–67. doi: 10.1214/aos/1176347963
Friedman, J. H., & Silverman, B. W. (1989). Flexible parsimonious smoothing and additive modeling. Technometrics, 31(1), 3–21. doi: 10.1080/00401706.1989.10488470
Fryzlewicz, P. (2014). Wild binary segmentation for multiple change-point detection. Ann. Stat., 42(6), 2243–2281. doi: 10.1214/14-AOS1245
Gärtner, M., Duvarci, S., Roeper, J., & Schneider, G. (2017). Detecting joint pausiness in parallel spike trains. J. Neurosci. Methods, 285, 69–81. doi: 10.1016/j.jneumeth.2017.05.008
Gombay, E., & Horváth, L. (1996). On the rate of approximations for maximum likelihood tests in change-point models. J. Multivar. Anal., 56(1), 120–152. doi: 10.1006/jmva.1996.0007
Hamilton, J. D. (1994). Time series analysis. Princeton, NJ: Princeton University Press.
Hanks, T. D., & Summerfield, C. (2017). Perceptual decision making in rodents, monkeys, and humans. Neuron, 93(1), 15–31. doi: 10.1016/j.neuron.2016.12.003
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (second ed.). New York, NY: Springer.
Hinkley, D. V. (1969). Inference about the intersection in two-phase regression. Biometrika, 56(3), 495–504. doi: 10.1093/biomet/56.3.495
Hinkley, D. V. (1971a). Inference about the change-point from cumulative sum tests. Biometrika, 58(3), 509–523. doi: 10.1093/biomet/58.3.509
Hinkley, D. V. (1971b). Inference in two-phase regression. J. Am. Stat. Assoc., 66(336), 736–743. doi: 10.1080/01621459.1971.10482337
Horváth, L. (1997). Detection of changes in linear sequences. Ann. Inst. Stat. Math., 49(2), 271–283. doi: 10.1023/A:1003110912735
Hušková, M. (2004). Permutation principle and bootstrap in change point analysis. In L. Horváth & B. Szyszkowicz (Eds.), Asymptotic methods in stochastics: Festschrift for Miklós Csörgő (Vol. 44, pp. 273–291). American Mathematical Soc.
Hušková, M., & Slabý, A. (2001). Permutation tests for multiple changes. Kybernetika, 37(5), 605–622.
Jirak, M. (2012). Change-point analysis in increasing dimension. J. Multivar. Anal., 111, 136–159. doi: 10.1016/j.jmva.2012.05.007
Kendall, M. G., & Stuart, A. (1983). The advanced theory of statistics (Vol. 3). London, UK: Griffin.
Kihlberg, J., Herson, J., & Schotz, W. (1972). Square root transformation revisited. Appl. Statist., 20, 76–81. doi: 10.1214/aos/1176343000
Kirch, C. (2007). Block permutation principles for the change analysis of dependent data. J. Stat. Plan. Inference, 137(7), 2453–2474. doi: 10.1016/j.jspi.2006.09.026
Latimer, K. W., Yates, J. L., Meister, M. L. R., Huk, A. C., & Pillow, J. W. (2015). Single-trial spike trains in parietal cortex reveal discrete steps during decision-making. Science, 349(6244), 184–187. doi: 10.1126/science.aaa4056
Liu, J., Wu, S., & Zidek, J. V. (1997). On segmented multivariate regression. Stat. Sin., 497–525.
Lombard, F., & Hart, J. (1994). The analysis of change-point data with dependent errors. Lect. Notes Monogr. Ser., 23, 194–209. doi: 10.1214/lnms/1215463125
Matteson, D. S., & James, N. A. (2014). A nonparametric approach for multiple change point analysis of multivariate data. J. Am. Stat. Assoc., 109(505), 334–345. doi: 10.1080/01621459.2013.849605
Olshen, A. B., Venkatraman, E., Lucito, R., & Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics, 5(4), 557–572. doi: 10.1093/biostatistics/kxh008
Page, E. (1954). Continuous inspection schemes. Biometrika, 41(1/2), 100–115. doi: 10.2307/2333009
Paillard, D. (1998). The timing of Pleistocene glaciations from a simple multiple-state climate model. Nature, 391(6665), 378. doi: 10.1038/34891
Picard, D. (1985). Testing and estimating change-points in time series. Adv. Appl. Probab., 17(4), 841–867. doi: 10.1017/S0001867800015433
Powell, N. J., & Redish, A. D. (2016). Representational changes of latent strategies in rat medial prefrontal cortex precede changes in behaviour. Nat. Commun., 7(12830). doi: 10.1038/ncomms12830
Quandt, R. E. (1958). The estimation of the parameters of a linear regression system obeying two separate regimes. J. Am. Stat. Assoc., 53(284), 873–880. doi: 10.1080/01621459.1958.10501484
Roitman, J. D., & Shadlen, M. N. (2002). Response of neurons in the lateral intraparietal area during a combined visual discrimination reaction time task. J. Neurosci., 22(21), 9475–9489.
Scott, A. J., & Knott, M. (1974). A cluster analysis method for grouping means in the analysis of variance. Biometrics, 30(3), 507–512. doi: 10.2307/2529204
Shah, S. P., Lam, W. L., Ng, R. T., & Murphy, K. P. (2007). Modeling recurrent DNA copy number alterations in array CGH data. Bioinformatics, 23(13), i450–i458. doi: 10.1093/bioinformatics/btm221
Shumway, R. H., & Stoffer, D. S. (2010). Time series analysis and its applications: With R examples (3rd ed.). New York, NY: Springer.
Smith, A. C., Frank, L. M., Wirth, S., Yanike, M., Hu, D., Kubota, Y., ... Brown, E. N. (2004). Dynamic analysis of learning in behavioral experiments. J. Neurosci., 24(2), 447–461.
Smith, P. L. (1982). Curve fitting and modeling with splines using statistical variable selection techniques (NASA Report 166034). Langley Research Center, Hampton, VA.
Stock, J. H., & Watson, M. W. (2014). Estimating turning points using large data sets. J. Econom., 178, 368–381. doi: 10.1016/j.jeconom.2013.08.034
Stone, C. J., Hansen, M. H., Kooperberg, C., & Truong, Y. K. (1997). Polynomial splines and their tensor products in extended linear modeling: 1994 Wald memorial lecture. Ann. Stat., 25(4), 1371–1470. doi: 10.1214/aos/1031594728
Strogatz, S. H. (2001). Nonlinear dynamics and chaos: With applications to physics, biology, chemistry, and engineering. Perseus Books Publishing.
Vert, J.-P., & Bleakley, K. (2010). Fast detection of multiple change-points shared by many signals using group LARS. In J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, & A. Culotta (Eds.), Advances in neural information processing systems 23 (pp. 2343–2351). Curran Associates, Inc.
Wang, T., & Samworth, R. J. (2018). High dimensional change point estimation via sparse projection. J. R. Stat. Soc. Series B Stat. Methodol., 80(1), 57–83. doi: 10.1111/rssb.12243

Journal

Statistics, arXiv (Cornell University)

Published: Feb 10, 2018

References