Estimating Fisher discriminant error in a linear integrator model of neural population activity

Calderini and Thivierge, The Journal of Mathematical Neuroscience (2021) 11:6. DOI: 10.1186/s13408-021-00104-4

School of Psychology, University of Ottawa, 136 Jean Jacques Lussier, Ottawa, ON K1N 6N5, Canada
Brain and Mind Research Institute, University of Ottawa, Ottawa, ON K1N 6N5, Canada

Abstract

Decoding approaches provide a useful means of estimating the information contained in neuronal circuits. In this work, we analyze the expected classification error of a decoder based on Fisher linear discriminant analysis. We provide expressions that relate decoding error to the specific parameters of a population model that performs linear integration of sensory input. Results show conditions that lead to beneficial and detrimental effects of noise correlation on decoding. Further, the proposed framework sheds light on the contribution of neuronal noise, highlighting cases where, counter-intuitively, increased noise may lead to improved decoding performance. Finally, we examined the impact of dynamical parameters, including neuronal leak and integration time constant, on decoding. Overall, this work presents a fruitful approach to the study of decoding using a comprehensive theoretical framework that merges dynamical parameters with estimates of readout error.

Keywords: Linear model; Fisher linear discriminant analysis; Noise correlation

© The Author(s) 2021. This article is licensed under a Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).

1 Introduction

In recent years, neuronal decoding has emerged as a key aspect of understanding the neural code [1]. The aim of decoding algorithms is to read out the sensory-driven responses of a neuronal population and classify them following a given criterion. Popular criteria include Fisher information [2, 3], mutual information [4], and machine learning approaches [5, 6]. While many types of decoders exist [7], a linear readout of neural activity has often been employed to perform sensory classification [8, 9] and predict motor decisions [10, 11]. Further, different classes of linear readouts are amenable to mathematical analysis and capture biological learning rules such as Hebbian learning [12].

In this work, we formally analyze the optimal decoding error of a linear decoder based on Fisher linear discriminant analysis (LDA). Assuming discrete classes of stimuli, LDA provides an upper bound on linear decoding capacity [13]. In addition, LDA shows good agreement with decision-making tasks and offers a bridge between cortical activity and behavioral performance [14, 15].

Importantly, most theoretical approaches based on neural decoding are not concerned with how linear decoders would be influenced by specific dynamical parameters of modeled neural systems [16].
Here, we address this concern by providing expressions that relate decoding error to the adjustable parameters of a rate-based population model with a linear neural integrator [17, 18]. This model captures the average spiking activity of neuronal populations [19–21] and the quasi-linear responses of neurons found in many experimental contexts [22]. Preliminary results have been presented in previous work [13, 14], yet the full analytical solution had remained incomplete and limited to positive noise correlation; we now present the complete solution. The framework relies on the simplifying assumption that signal and noise correlations originate from independent sources. While this assumption does not hold in biological circuits, where signal and noise are related [1], it allows us to systematically explore a wide range of scenarios that describe the impact of neuronal inputs, noise, correlations, and dynamical parameters on linear decoding, where the contribution of each parameter can be examined independently.

This paper begins by describing the neural integrator model and the LDA readout. Then, we provide expressions for LDA error that rely on the parameters of the integrator model. Finally, we consider the effect of correlation, noise, and dynamical parameters on neuronal decoding using both analytical expressions and numerical simulations.

Figure 1. Fisher linear discrimination of neural activity in a population model. (A) Two neural populations (x and y) whose noise correlation is adjusted via a parameter ρ. Each population receives two distinct inputs (ν_1 and ν_2) and a private source of noise whose gain is β_x and β_y, respectively. The stimulus-driven response of each population is described by a tuning curve relating stimulus orientation to firing rate. (B) Activity for populations x and y is shown at discrete time-points (solid black circles). The optimal decision boundary (c) obtained by LDA discriminates amongst the neural activity generated by each of the two inputs. Neural responses follow a Gaussian distribution. The shaded area shows the proportion of discrimination error for stimulus 2.

2 Linear population model

As a starting point, we assume two independent neuronal populations, each projecting in a feedforward manner to a readout discriminating amongst two inputs, ν_1 and ν_2, that are constant over time (Fig. 1(A)). Each population's mean firing rate in response to stimuli is conceptualized by a tuning curve where a stimulus feature, for instance visual orientation, generates a graded response. This scenario is analogous to analyses that examine population responses after performing a dimensionality reduction to generate a "population tuning curve" [23]. While a more complex model could account for a heterogeneity of responses within each population, we choose to limit our model to two homogeneous populations in order for the classification problem to remain tractable.

The activity of each population is described by a linear neural integrator

$$\tau_x \frac{dx_i}{dt} = -\alpha_x x_i + \nu_{i,x} + \beta_x \xi_x(t), \qquad \tau_y \frac{dy_i}{dt} = -\alpha_y y_i + \nu_{i,y} + \beta_y \xi_y(t), \tag{1}$$

where x_i and y_i are the firing rates of each population in response to a given stimulus i, τ is a time constant, α is a leak term, ξ(t) is Gaussian white noise (N(0, 1)), and β is the gain of the noise. Network parameters τ, α, and β are constrained to ℝ_{>0}.
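Although all results below are analytical, Eq. (1) is straightforward to integrate numerically. The following minimal sketch (Python with numpy; the function and its default parameter values are ours, for illustration only) uses an Euler–Maruyama step for one population:

```python
import numpy as np

def simulate_population(nu, tau=1.0, alpha=1.0, beta=1.0,
                        dt=1e-3, T=100.0, x0=0.0, seed=None):
    """Euler-Maruyama integration of one population of Eq. (1):
    tau * dx/dt = -alpha * x + nu + beta * xi(t)."""
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    x = np.empty(n)
    x[0] = x0
    for k in range(1, n):
        drift = (-alpha * x[k - 1] + nu) / tau
        # White noise enters through sqrt(dt) increments of a Wiener process.
        x[k] = x[k - 1] + drift * dt + (beta / tau) * np.sqrt(dt) * rng.standard_normal()
    return x

x = simulate_population(nu=11.0, seed=0)
# After a transient, the trajectory fluctuates around nu/alpha = 11 with
# variance beta**2 / (2 * tau * alpha) = 0.5 (both derived in Sect. 4).
print(x[10_000:].mean(), x[10_000:].var())
```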
We make no distinction between noise induced by stimuli and noise generated through intrinsic neural activity. While their effect on mean rate activity is similar [24], their impact on noise correlations differs [1]; in the model, we explicitly separate the effect of firing rate and noise correlation. This will be done by controlling noise correlation through a tunable parameter, as detailed in Sect. 4. An advantage of this formalism is that the effect of noise correlation can be systematically isolated from changes in firing rates and signal correlation that would be induced through reciprocal connections between the two populations. Further, depending on the choice of parameters, the addition of recurrent weights to Eq. (1) may prevent the system from reaching a stationary state, which is a fundamental assumption of LDA.

3 Fisher linear discriminant decoder

A linear decoder based on LDA reads out the activity of the population model in order to perform a binary discrimination (Fig. 1(B)). Discrimination error generated by LDA provides an estimate of the statistical confidence in distinguishing pairs of stimuli based on network activity. We focus on pairwise discrimination given that error rates obtained from more than two stimuli are well approximated by values obtained from all combinations of pairwise comparisons [25].

LDA assumes that neural activity is sampled from a multivariate Gaussian distribution with class covariance matrix Σ_i and class mean vector μ_i. Further, LDA assumes equal class covariance, therefore Σ_1 = Σ_2 = Σ. LDA attempts to find a projection line w, perpendicular to the decision boundary, onto which the input space is projected. The optimal projection line maximizes the Fisher criterion J(w), defined as the ratio of the projected between- to within-class variance:

$$J(w) = \frac{\big(w \cdot (\mu_2 - \mu_1)\big)^2}{w \cdot \Sigma_w \cdot w}.$$

Given the assumption of equal class covariance, we set Σ_w = 2Σ. By taking the derivative of J(w) with respect to w and setting it to zero, one finds the closed-form solution for the optimal projection line to be

$$W = (2\Sigma)^{-1} (\mu_2 - \mu_1). \tag{2}$$

4 Formulating a model-based linear decoder

To analytically derive means (μ_1 and μ_2) and covariance (Σ) from the neural population model, we rearrange Eq. (1) as follows, using population x as example:

$$dx_i = \frac{\alpha_x}{\tau_x}\left(\frac{\nu_{ix}}{\alpha_x} - x_i\right) dt + \frac{\beta_x}{\tau_x}\,\xi_x(t)\, dt. \tag{3}$$

Given that a white noise process is by definition the time derivative of a Wiener process, ξ(t) = dW_t/dt, we can rewrite Eq. (3) as

$$dx_i = \theta_x (\mu_{ix} - x_i)\, dt + \lambda_x\, dW_{x,t}, \tag{4}$$

with θ_x = α_x/τ_x, μ_{ix} = ν_{ix}/α_x, and λ_x = β_x/τ_x. Equation (4) is an Ornstein–Uhlenbeck process with known solution

$$x_i(t) = \mu_{ix} + (x_{i0} - \mu_{ix})\, e^{-\theta_x t} + \lambda_x \int_0^t e^{-\theta_x (t-s)}\, dB(s). \tag{5}$$

Equation (5) is a mean-reverting process whose stable state follows a Gaussian distribution. A full derivation of this process is found in Sections A.1–A.2. To summarize this derivation, the expected mean and variance are

$$E\big[x_i(t)\big] = \mu_{ix} + (x_{i0} - \mu_{ix})\, e^{-\theta_x t}, \qquad \operatorname{Var}\big[x_i(t)\big] = \frac{\lambda_x^2}{2\theta_x}\big(1 - e^{-2\theta_x t}\big).$$

The stationary mean and variance of Eq. (5) are

$$\lim_{t\to\infty} E[x_i] = \mu_{ix} = \frac{\nu_{x,i}}{\alpha_x}, \qquad \lim_{t\to\infty} \operatorname{Var}(x_i) = \frac{\lambda_x^2}{2\theta_x} = \frac{\beta_x^2}{2\tau_x \alpha_x} = \sigma_x^2.$$

With the assumption that the mean of x is much larger than the variance, there is negligible probability that x would fall below zero. Imposing strictly positive values of x could be achieved by the addition of a constant and would not alter the results obtained from the linear classifier.
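Since the transition density of Eq. (4) is Gaussian with exactly these conditional moments, the process can also be sampled without discretization error. A minimal sketch, assuming the parameter naming of Eq. (4) (helper name ours):

```python
import numpy as np

def sample_ou_exact(nu, tau, alpha, beta, dt, n_steps, x0=0.0, seed=None):
    """Exact sampling of the OU process of Eq. (4), using the conditional
    mean mu + (x - mu) * exp(-theta*dt) and the conditional variance
    lam**2 / (2*theta) * (1 - exp(-2*theta*dt)) implied by Eq. (5)."""
    rng = np.random.default_rng(seed)
    theta, mu, lam = alpha / tau, nu / alpha, beta / tau
    decay = np.exp(-theta * dt)
    sd = np.sqrt(lam**2 / (2.0 * theta) * (1.0 - decay**2))
    x = np.empty(n_steps)
    x[0] = x0
    for k in range(1, n_steps):
        x[k] = mu + (x[k - 1] - mu) * decay + sd * rng.standard_normal()
    return x

x = sample_ou_exact(nu=11.0, tau=1.0, alpha=1.0, beta=1.0,
                    dt=0.1, n_steps=200_000, seed=0)
print(x.mean(), x.var())  # ~ nu/alpha = 11 and beta^2/(2*tau*alpha) = 0.5
```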
The readout of neural activity depends on the following feature space:

$$Z \sim \mathcal{N}(\mu_i, \Sigma), \qquad \mu_i = [\mu_{xi}, \mu_{yi}]^{T}, \qquad \Sigma = \begin{bmatrix} \sigma_x^2 & \rho \sigma_x \sigma_y \\ \rho \sigma_x \sigma_y & \sigma_y^2 \end{bmatrix},$$

where Z is obtained from the probability distribution of a multivariate Gaussian with mean μ_i and covariance Σ. Setting the parameter ρ = 0 would be equivalent to a so-called "diagonal decoder" where off-diagonal elements of the covariance matrix are neglected, thus ignoring noise correlations altogether [16].

The closed-form solution of LDA (Eq. (2)) can be expressed using the parameters of the population model (Eq. (1)) as follows. First, the total within-class scatter S is

$$S = 2\Sigma = 2\begin{bmatrix} \sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2 \end{bmatrix} = 2\begin{bmatrix} \dfrac{\beta_x^2}{2\tau_x\alpha_x} & \rho\sqrt{\dfrac{\beta_x^2}{2\tau_x\alpha_x}}\sqrt{\dfrac{\beta_y^2}{2\tau_y\alpha_y}} \\[10pt] \rho\sqrt{\dfrac{\beta_x^2}{2\tau_x\alpha_x}}\sqrt{\dfrac{\beta_y^2}{2\tau_y\alpha_y}} & \dfrac{\beta_y^2}{2\tau_y\alpha_y} \end{bmatrix}.$$

To alleviate the notation, we define Δμ = [Δμ_x, Δμ_y]^T = μ_1 − μ_0, where Δμ_u = Δν_u/α_u and Δν_u is the difference between the two stimuli, given an index u that stands for either population x or y. In this way, Eq. (2) becomes

$$W = (2\Sigma)^{-1}\Delta\mu = \frac{1}{2(1-\rho^2)} \begin{bmatrix} \dfrac{\Delta\mu_x}{\sigma_x^2} - \rho\,\dfrac{\Delta\mu_y}{\sigma_x\sigma_y} \\[10pt] \dfrac{\Delta\mu_y}{\sigma_y^2} - \rho\,\dfrac{\Delta\mu_x}{\sigma_x\sigma_y} \end{bmatrix} = \frac{1}{1-\rho^2} \begin{bmatrix} \dfrac{\tau_x}{\beta_x^2}\Delta\nu_x - \rho\,\dfrac{\sqrt{\tau_x\tau_y}}{\beta_x\beta_y}\sqrt{\dfrac{\alpha_x}{\alpha_y}}\,\Delta\nu_y \\[10pt] \dfrac{\tau_y}{\beta_y^2}\Delta\nu_y - \rho\,\dfrac{\sqrt{\tau_x\tau_y}}{\beta_x\beta_y}\sqrt{\dfrac{\alpha_y}{\alpha_x}}\,\Delta\nu_x \end{bmatrix}.$$

From the law of total probability, the error rate of classification is given by

$$\varepsilon = P[y=0 \mid k=1]\,P[k=1] + P[y=1 \mid k=0]\,P[k=0], \tag{6}$$

where P[k = j] is the probability that a randomly sampled point from any distribution belongs to class j, and P[y = i | k = j] is the probability that a point is classified as belonging to class i when it belongs to class j. Given that the classifier is unbiased towards each of the two stimulus classes, P[k = 0] = P[k = 1] = 0.5. To calculate conditional probabilities P[y = i | k = j], one must define a threshold c that serves as a boundary between the two distributions. The value of c is chosen to be the midpoint between the means of the projected distributions. We calculate P[y = i | k = j] as the area under the curve of the density function for j in the region where i is the correct class. As a first step, we shift the projected distributions by a factor c, so that the threshold becomes zero to simplify the integration. More specifically, the unshifted threshold c, the means of the shifted distributions η_i, and their variance ζ² are

$$c = \tfrac{1}{2}\, W \cdot (\mu_1 + \mu_0) + b, \qquad \eta_i = W \cdot \mu_i + b - c, \qquad \zeta^2 = W^{T}\Sigma W,$$

with bias term b. The error rate from Eq. (6) then becomes

$$\varepsilon = \frac{1}{2}\int_{-\infty}^{0} \frac{1}{\sqrt{2\pi\zeta^2}}\, e^{-\frac{(w-\eta_1)^2}{2\zeta^2}}\, dw + \frac{1}{2}\int_{0}^{\infty} \frac{1}{\sqrt{2\pi\zeta^2}}\, e^{-\frac{(w-\eta_0)^2}{2\zeta^2}}\, dw.$$

Details of the full integration of error can be found in Section A.3. The final expression is

$$\varepsilon = \frac{1}{2}\,\operatorname{erfc}\!\left(\frac{\eta_1}{\sqrt{2\zeta^2}}\right).$$

This expression is further simplified by introducing the squared Mahalanobis distance:

$$\varepsilon = \frac{1}{2}\,\operatorname{erfc}\!\left(\frac{1}{2\sqrt{2}}\, d\right), \tag{7}$$

where

$$d^2 = \Delta\mu^{T} \Sigma^{-1} \Delta\mu. \tag{8}$$

Because of equal class covariance, the above expression has the property that d(μ_0, μ_1) = d(μ_1, μ_0) = d.
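Equations (2), (7), and (8) translate directly into code. A short sketch (the wrapper is our own; scipy is assumed for the complementary error function):

```python
import numpy as np
from scipy.special import erfc

def lda_projection_and_error(mu0, mu1, Sigma):
    """Optimal projection W = (2*Sigma)^-1 (mu1 - mu0), Eq. (2), and
    expected error 0.5 * erfc(d / (2*sqrt(2))), Eqs. (7)-(8)."""
    dmu = np.asarray(mu1, float) - np.asarray(mu0, float)
    Sigma = np.asarray(Sigma, float)
    W = np.linalg.solve(2.0 * Sigma, dmu)
    d2 = dmu @ np.linalg.solve(Sigma, dmu)  # squared Mahalanobis distance
    eps = 0.5 * erfc(np.sqrt(d2) / (2.0 * np.sqrt(2.0)))
    return W, eps

Sigma = np.array([[0.5, 0.3], [0.3, 0.5]])  # illustrative covariance
W, eps = lda_projection_and_error([11.0, 11.0], [14.0, 14.0], Sigma)
print(W, eps)
```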
Using Eq. (8), we rewrite d² from the network parameters:

$$d^2 = \frac{1}{1-\rho^2} \left[ \frac{\Delta\mu_x^2}{\sigma_x^2} + \frac{\Delta\mu_y^2}{\sigma_y^2} - 2\rho\, \frac{\Delta\mu_x \Delta\mu_y}{\sigma_x \sigma_y} \right] = \frac{2}{1-\rho^2} \left[ \frac{\tau_x}{\beta_x^2 \alpha_x}\, \Delta\nu_x^2 + \frac{\tau_y}{\beta_y^2 \alpha_y}\, \Delta\nu_y^2 - 2\rho\, \frac{\sqrt{\tau_x \tau_y}}{\beta_x \beta_y \sqrt{\alpha_x \alpha_y}}\, \Delta\nu_x \Delta\nu_y \right].$$

As the ratio Δμ_u/σ_u appears often in the above solution, we simplify our notation by introducing

$$r_u = \frac{\Delta\mu_u}{\sigma_u} = \frac{\Delta\nu_u}{\beta_u}\sqrt{\frac{2\tau_u}{\alpha_u}}.$$

This expression simplifies the Mahalanobis distance to

$$d^2 = \frac{1}{1-\rho^2}\big(r_x^2 + r_y^2 - 2\rho\, r_x r_y\big).$$

The full derivation of expected error using the Mahalanobis distance is found in Sections A.3–A.4.

The above analysis provides a relationship between classification error and the network parameters of the population model. In the sections to follow, we explore the various links between these quantities.

5 Noise correlation

Neurons that are in close physical proximity exhibit correlations in their activity. An extensive body of work has examined the impact of these noise correlations on behavioral tasks [26] and the activity of brain circuits [27–35]. Noise correlations may be advantageous or detrimental to cognitive and sensory processing; however, the specific network-level properties that give rise to these effects have not been fully elucidated.

In the proposed model, the effect of noise correlation on classification error is highly dependent upon the sensory inputs (ν_1 and ν_2). We distinguish four main cases that lead to qualitatively different conclusions on the impact of noise correlations. Details of these analyses are provided in Sections A.5–A.6.

Figure 2. Impact of noise correlation on Fisher linear discriminant analysis. (A) Scenario where the tuning curves are the same for both populations of neurons, leading to r_x → r_y. Top left: illustration of tuning curves (black lines) and stimulus orientations (blue and red lines). Top right: example of numerical responses to two stimuli (red and blue circles), with noise correlation of 0.9. Bottom: solid line, analytical estimate; filled black circles, numerical simulations. (B) Scenario where the tuning curves are offset by a fixed orientation. In this case, r_x → −r_y. (C) Symmetrical case arising when one of the populations (for instance, x) generates the same firing rate for a range of stimulus orientations, leading to r_x = 0. (D) Case where one population has higher gain, leading to |r_x| ≠ |r_y|.

A first case arises when the tuning curves of populations x and y are identical in terms of both their orientation preference and their gain (Fig. 2(A)). In this case, r_x → r_y, leading to monotonically increasing error as a function of correlation. Intuitively, this happens because correlation forces the firing rate distributions to "stretch" towards each other. We verified the analytical solution by comparing it to numerical estimates of the error rate as a function of noise correlation. These numerical estimates were obtained with Eq. (1), where populations x and y both received inputs ν_1 = 11 and ν_2 = 14 in order for the model to mimic a scenario where the two populations have identical tuning properties. The goal here is not to capture the model's response to a continuum of stimulus values along the tuning curves, but rather to illustrate the behavior of the model using discrete stimuli. We set τ = 1, β = 1, and α = 1 for both populations. We then numerically generated 5000 points per stimulus class. A subset of 80% of the total number of data points was randomly selected to train the LDA classifier. The proportion of misclassified points was calculated based on the remaining data points. We found good agreement between the numerical estimates and analytical solution (Fig. 2(A)).
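The verification just described can be reproduced in a few lines by sampling the stationary response distributions directly rather than integrating Eq. (1) to steady state. A sketch (helper names ours; the closed-form error uses d² = (r_x² + r_y² − 2ρ r_x r_y)/(1 − ρ²) with r_u = Δμ_u/σ_u):

```python
import numpy as np
from scipy.special import erfc

rng = np.random.default_rng(1)

def analytic_error(rx, ry, rho):
    """Eq. (7) with d^2 = (rx^2 + ry^2 - 2*rho*rx*ry) / (1 - rho^2)."""
    d2 = (rx**2 + ry**2 - 2 * rho * rx * ry) / (1 - rho**2)
    return 0.5 * erfc(np.sqrt(d2) / (2 * np.sqrt(2)))

def empirical_error(mu0, mu1, Sigma, n=5000, train_frac=0.8):
    X = [rng.multivariate_normal(m, Sigma, n) for m in (mu0, mu1)]
    k = int(train_frac * n)
    m = [x[:k].mean(axis=0) for x in X]
    S = 0.5 * (np.cov(X[0][:k].T) + np.cov(X[1][:k].T))  # pooled covariance
    w = np.linalg.solve(S, m[1] - m[0])                  # direction of Eq. (2)
    c = w @ (m[0] + m[1]) / 2                            # midpoint threshold
    return 0.5 * ((X[0][k:] @ w > c).mean() + (X[1][k:] @ w <= c).mean())

# Fig. 2(A) scenario: identical tuning, tau = beta = alpha = 1, nu = 11 and 14.
tau = alpha = beta = 1.0
r = (14.0 - 11.0) / beta * np.sqrt(2 * tau / alpha)      # r_x = r_y
sig2 = beta**2 / (2 * tau * alpha)                       # stationary variance
for rho in (0.0, 0.3, 0.6, 0.9):
    Sigma = sig2 * np.array([[1.0, rho], [rho, 1.0]])
    print(rho, analytic_error(r, r, rho),
          empirical_error([11.0, 11.0], [14.0, 14.0], Sigma))
```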
Note that the range of error may be increased by moving the firing rate distributions closer to each other without altering the overall shape of the function relating error and noise correlation. While the goal here was to show the distribution of readout error across a broad range of correlation values, we acknowledge that not all combinations of tuning curves and noise correlations are physiologically plausible. In fact, while noise correlations in cortex vary across experimental conditions, regions, and behavioral states, they are typically reported to be on the order of 0.1–0.3 for nearby cells [26]. Therefore, extreme values (both positive and negative) are unlikely to arise in living circuits.

In a second scenario, the two populations are offset in terms of their orientation preference (Fig. 2(B)). We examined classification error in this scenario by setting the input of population x to ν_1 = 11 and ν_2 = 14, while population y was set to ν_1 = 14 and ν_2 = 11. Analytically, this scenario leads to r_x → −r_y, resulting in a monotonically decreasing error as correlation increases from −1 to 1. Intuitively, this scenario arises because correlation stretches the distributions of responses along parallel lines, decreasing the overlap between them.

A third case arises when the tuning curve of one of the two populations yields the same response for two stimuli (Fig. 2(C)). This happens if the tuning curve of population x exhibits a broad region where firing rate remains constant despite changes in stimulus orientation. Analytically, this would lead to r_x = 0. We illustrate this scenario by setting ν_1 = 11 and ν_2 = 11 for population x, and ν_1 = 11 and ν_2 = 14 for population y. This case yields a "symmetrical" effect of correlation on readout error, where maximum error is found at ρ = 0 and error tends towards zero as ρ approaches either 1 or −1.

Finally, a fourth scenario occurs when the two populations have tuning curves that are aligned in terms of orientation preference, but where one population has higher response gain (Fig. 2(D)). This case is defined by |r_x| ≠ |r_y|. Error tends to zero as noise correlation (ρ) goes to either −1 or 1. The correlation associated with maximum error is found somewhere in between these extremes and is given by

$$\rho_* = \frac{\min(r_x^2, r_y^2)}{r_x r_y}. \tag{9}$$

To illustrate this scenario, we set ν_1 = 11 and ν_2 = 13 for population x, and ν_1 = 11 and ν_2 = 14 for population y. Graphically, this scenario arises when noise correlation "stretches" the distribution of responses along parallel lines and their centroids do not align on either dimension. Starting from a correlation of zero, as correlation increases, the distributions will stretch towards each other, thus increasing overlap and error. After a maximum overlap defined by ρ_*, further stretching of the distributions will force them to spread too thinly for them to overlap, until the extreme case of a correlation of one, where both distributions would appear as perfectly parallel lines, leading to zero error.

A continuum of cases exists between the different scenarios illustrated in Fig. 2(A)–(D). For instance, the peak error (ρ_*) in Fig. 2(D) can shift to lower correlation values by offsetting one of the tuning curves, yielding a curve closer to Fig. 2(B).
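All four scenarios can be traced with the analytic expression alone, using the input values quoted in this section (a sketch; the analytic_error helper is restated from the previous listing, and signed stimulus differences set the sign of each r):

```python
import numpy as np
from scipy.special import erfc

def analytic_error(rx, ry, rho):
    d2 = (rx**2 + ry**2 - 2 * rho * rx * ry) / (1 - rho**2)
    return 0.5 * erfc(np.sqrt(d2) / (2 * np.sqrt(2)))

# (nu1, nu2) per population; tau = alpha = beta = 1, so r_u = sqrt(2) * dnu_u.
scenarios = {
    "A: r_x -> r_y":     ((11, 14), (11, 14)),
    "B: r_x -> -r_y":    ((11, 14), (14, 11)),
    "C: r_x = 0":        ((11, 11), (11, 14)),
    "D: |r_x| != |r_y|": ((11, 13), (11, 14)),
}
rhos = np.linspace(-0.95, 0.95, 5)
for name, ((x1, x2), (y1, y2)) in scenarios.items():
    rx, ry = np.sqrt(2) * (x2 - x1), np.sqrt(2) * (y2 - y1)
    print(name, np.round(analytic_error(rx, ry, rhos), 4))
    # A increases with rho, B decreases, C is symmetric about rho = 0,
    # and D peaks at rho_* = min(rx^2, ry^2) / (rx * ry), Eq. (9).
```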
In sum, the above results show that, depending upon the structure of the input delivered to the two neural populations, noise correlations produce widely different effects on classification error. While insights into these results can be obtained without the full formalism described here [34], such formalism becomes pivotal when examining the effect of specific network parameters, as described next.

6 Impact of noise gain on classification error

To explore the effect of network parameters on error, we first modify Eq. (9) as follows:

$$\rho_* = \frac{\min(r_x^2, r_y^2)}{r_x r_y} = \begin{cases} \dfrac{r_x}{r_y} & \text{if } |r_x| < |r_y|, \\[8pt] \dfrac{r_y}{r_x} & \text{if } |r_x| > |r_y|, \end{cases} \tag{10}$$

where the ratio r_x/r_y can be expressed using network parameters:

$$\frac{r_x}{r_y} = \frac{\Delta\nu_x}{\Delta\nu_y}\, \frac{\beta_y}{\beta_x} \sqrt{\frac{\tau_x \alpha_y}{\tau_y \alpha_x}}.$$

We define a set containing all network parameters, G = {α_u, τ_u, β_u, Δν_u}. If g is a subset of these parameters, we can manipulate them using a function f(g) while setting the other parameters to a constant c_g. In this way, we can rewrite Eq. (10) as

$$\rho_* = \begin{cases} f(g)\, c_g & \text{if } |r_x| < |r_y|, \\ f(g)^{-1}\, c_g^{-1} & \text{if } |r_x| > |r_y|. \end{cases}$$

We can investigate the effect of network parameters on ρ_*. For example, the effect of noise gain (β_x and β_y) on ρ_*, when keeping all other parameters constant except for the input, is expressed as

$$\rho_* = f(\beta_y, \beta_x)\, c_{\beta_y, \beta_x} = \frac{\beta_y}{\beta_x}\, \frac{\Delta\nu_x}{\Delta\nu_y} \sqrt{\frac{\tau_x \alpha_y}{\tau_y \alpha_x}}$$

for |r_x| < |r_y|.

For illustration purposes, we explored the scenario described in Fig. 2(A), where two populations have equivalent tuning properties. Keeping all parameters constant while altering both β_x and β_y simultaneously has no effect on ρ_* (Fig. 3(A)). The main impact is an increase in the amount of classification error (Fig. 3(B)). This result is not surprising: increasing the gain of the noise worsens readout performance.

However, markedly different results emerge in a scenario where tuning curves are offset (Fig. 2(B)) and β_x is altered while keeping β_y unchanged. In this case, ρ_* = f(β_x) c_β with c_β given by

$$c_\beta = \frac{\beta_y \Delta\nu_x}{\Delta\nu_y} \sqrt{\frac{\tau_x \alpha_y}{\tau_y \alpha_x}},$$

and f(β_x) = 1/β_x. Alterations in β_x impact ρ_* in a non-monotonic fashion (Fig. 3(C)). A small increase from β_x = 1 to β_x = 2 shifts ρ_* towards a more negative value. However, further increasing to β_x = 3 and β_x = 4 increases ρ_* and alters the relationship between correlation and readout error (Fig. 3(D)).

Hidden in these results is a counter-intuitive finding: under certain circumstances, increasing β_x leads to a decrease in classification error. This can be seen with β_x = 10 (Fig. 3(D), dashed line), leading to lower error than β_x = 3 (green line) and β_x = 4 (red line) for negative correlations. Intuitively, this can happen when increasing β_x stretches the distribution of activity for population x along a single dimension away from the classification boundary [13]. Similar findings are borne out of graphical explanations where noise covariance stretches the distribution of firing rates [36].

Figure 3. Influence of noise gain on discrimination error. (A) Scenario where the noise gains of both populations (β_x and β_y) are adjusted simultaneously. In this case, the value of noise correlation leading to maximal error (ρ_*) remains constant. Inset: tuning curves for the two populations. (B) Error as a function of noise correlation for four different values of noise gain, with colors corresponding to the colored circles in panel "A". Filled circles indicate ρ_*. (C) Impact of modifying the noise gain of population x only. (D) Different values of noise gain for population x. (E) Scenario taken from panel D of Fig. 2, showing a monotonic decrease in ρ_* when increasing β_x. (F) Impact of β_x on classification error.
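The case structure of Eq. (10) is easy to encode. The sketch below (helper names ours; parameter values illustrative rather than those used for Fig. 3) computes ρ_* from the network parameters and makes the branch switch behind the non-monotonic effect of β_x explicit:

```python
import numpy as np

def r_of(dnu, beta, tau, alpha):
    """r_u = (dnu_u / beta_u) * sqrt(2 * tau_u / alpha_u)."""
    return (dnu / beta) * np.sqrt(2 * tau / alpha)

def rho_star(rx, ry):
    """Correlation of maximal error, Eqs. (9)-(10); 0 if either ratio is 0."""
    return 0.0 if rx == 0 or ry == 0 else min(rx**2, ry**2) / (rx * ry)

# Offset tuning curves (as in Fig. 2(B)): dnu_x = +3, dnu_y = -3.
ry = r_of(-3.0, beta=1.0, tau=1.0, alpha=1.0)
for beta_x in (0.25, 0.5, 1.0, 2.0, 4.0):
    rx = r_of(3.0, beta=beta_x, tau=1.0, alpha=1.0)
    branch = "r_x/r_y" if abs(rx) < abs(ry) else "r_y/r_x"
    print(beta_x, branch, round(rho_star(rx, ry), 3))
# For beta_x < 1 the |r_x| > |r_y| branch applies (here rho_* = -beta_x);
# for beta_x > 1 the other branch takes over (here rho_* = -1/beta_x).
```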
The benefits of noise gain are even more pronounced in a scenario where one population has higher gain than the other, as in Fig. 2(D). In this case, β_x monotonically shifts ρ_* towards decreasing values (Fig. 3(E)). For a broad range of positive correlation values, a high noise gain (β_x > 1) leads to lower classification error (Fig. 3(F)).

7 Impact of dynamical parameters

The approach described in the previous section can be applied to study the impact of the model's dynamical parameters on readout error. The two parameters of interest are the leak term (α) and the time constant (τ). The effect of the time constants τ_x and τ_y on ρ_* can be expressed as

$$\rho_* = \frac{\beta_y}{\beta_x}\, \frac{\Delta\nu_x}{\Delta\nu_y} \sqrt{\frac{\alpha_y \tau_x}{\alpha_x \tau_y}}.$$

To study the effect of a single term (e.g., τ_x), we set ρ_* = f(τ_x) c_τ with c_τ given by

$$c_\tau = \frac{\beta_y \Delta\nu_x}{\beta_x \Delta\nu_y} \sqrt{\frac{\alpha_y}{\tau_y \alpha_x}},$$

and f(τ_x) = √τ_x. Similarly, the role of the leak terms α_x and α_y on ρ_* is

$$\rho_* = \frac{\Delta\nu_x \beta_y}{\Delta\nu_y \beta_x} \sqrt{\frac{\alpha_y \tau_x}{\alpha_x \tau_y}}.$$

For a single term (α_x), we have ρ_* = f(α_x) c_α with

$$c_\alpha = \frac{\beta_y \Delta\nu_x}{\beta_x \Delta\nu_y} \sqrt{\frac{\tau_x \alpha_y}{\tau_y}},$$

and f(α_x) = 1/√α_x.

Taking one scenario as illustration, we examined the case where tuning curves are offset by a fixed orientation (r_x → −r_y). In this case, the time constant affects the relation between noise correlation and readout error, with larger values of τ_x shifting ρ_* towards smaller negative values of correlation (Fig. 4(A)). The reason for this shift follows from an earlier example (Fig. 2(D)), where an increased correlation resulted in greater overlap between the firing rate distributions, but only up to a point beyond which these distributions became too narrow to overlap. With larger values of τ_x, a given correlation does not create as much overlap as it would for smaller values of τ_x, thus leading to a shift in ρ_*.

Figure 4. Mediating role of dynamical parameters. (A) In a scenario where tuning curves are offset by a fixed orientation, increasing the time constant τ_x leads to an increase in the correlation associated with maximal error (ρ_*). (B) Different values of τ_x (colors corresponding to panel "A") alter the relation between noise correlation and readout error. Inset: examples of firing rate distributions across two stimuli (shown in blue and red). (C) Increasing the leak term α_x leads to a decrease in ρ_*. (D) Readout error across different values of α_x (see panel "C" for colors). Inset: examples of firing rate distributions for |r_x| < |r_y|.
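As a quick check of the τ_x effect just described (a sketch under the same illustrative convention as earlier listings, with offset inputs Δν_x = 3 and Δν_y = −3 and all remaining parameters set to 1):

```python
import numpy as np

def rho_star(rx, ry):
    return min(rx**2, ry**2) / (rx * ry)  # Eq. (9)

ry = -3.0 * np.sqrt(2.0)                  # population y: tau = alpha = beta = 1
for tau_x in (1.0, 2.0, 4.0, 8.0):
    rx = 3.0 * np.sqrt(2.0 * tau_x)       # f(tau_x) = sqrt(tau_x) scaling
    print(tau_x, round(rho_star(rx, ry), 3))
# Prints -1.0, -0.707, -0.5, -0.354: rho_* increases with tau_x, as in
# Fig. 4(A); the leak term enters through the reciprocal scaling
# f(alpha_x) = 1/sqrt(alpha_x).
```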
The overall impact of a larger time constant is a decrease in classification error (Fig. 4(B)): as τ_x increases, there is less overlap between the distributions of firing rate across stimuli (Fig. 4(B), inset). By contrast, shifting the leak term α_x towards higher values decreases ρ_* (Fig. 4(C)) and increases overall readout error (Fig. 4(D)). The impact of increasing α_x on error is due to an increase in the overlap between firing rate distributions (Fig. 4(D), inset). The inverse effects of τ_x and α_x on these distributions explain their opposite impact on ρ_*.

More complex, non-monotonic relations between ρ_* and values of τ_x and α_x are found in different scenarios where the tuning curves of the two populations are aligned (Fig. 5(A)) or when the gain of one population is larger (Fig. 5(B)).

Figure 5. Nonlinear impact of dynamical parameters on classification error. (A) In a scenario where the tuning curves of the two neural populations are equivalent, τ_x and ρ_* have a non-monotonic relation. (B) In a scenario where the gain of one tuning curve is larger, α_x and ρ_* are non-monotonically related.

Together, these results show that the integration time constant and leak term of the population model mediate the impact of noise correlation on classification error by shifting the value ρ_* at which correlation reaches maximal error. The impact of network parameters on readout error is therefore not straightforward to describe, but is brought to light using a framework that derives error estimates from the dynamical parameters of a population model.

8 Discussion

This work described an analytical framework for performing Fisher linear decoding in a rate-based neural model. With this formalism, we began by capturing well-documented findings on the role of noise gain and correlations on discrimination error. Going further, the framework allowed us to analytically examine the mediating role of dynamical parameters (neuronal leak and time constant) on the relation between noise correlation and error. Overall, this framework suggests that linear decoding is highly sensitive to dynamical model parameters as well as the characteristics of the sensory input.

One surprising finding was the presence of conditions where increased neuronal noise led to reduced classification error. This result was especially prominent when the gain of the two population tuning curves was unmatched (Fig. 3(E)–(F)). Taken together, our findings cover all possible qualitative scenarios where noise correlations have either a beneficial, detrimental, or null effect on decoding [36].

A related approach, termed the leaky competing accumulator model, was proposed in order to account for perceptual decision making [37]. Some key differences exist between this model and ours. Firstly, our framework assumes a steady state of neural activity that is characteristic of a decision point and does not capture the time-course of deliberation. Our framework assumes an optimal bound on decision accuracy given a linear decoder, representing a ceiling in accuracy that would be associated with long response times (typically >500 ms in human subjects). Secondly, the accumulator model provides explicit connections, through lateral inhibition, that modulate correlations. These lateral connections, however, may also impact firing rates. By comparison, our framework analytically isolates the contribution of firing rates and correlations, and examines their relative role in perceptual discrimination.

It would be challenging to speculate on whether the analytical results provided would generalize to other classes of neural network models, particularly those that include a nonlinear transfer function [38]. However, our work opens the door to such analyses by describing a framework for linking neuronal readout and dynamical modeling.
Limitations and future work. While the framework described here strived to cover all possible scenarios involving firing rates, noise correlations, and network parameters, it is important to emphasize that not all such scenarios are plausible from a physiological standpoint. In particular, the framework treats firing rates and noise correlations as independent contributors to decoding error and allows for implausible cases where increases in firing rate would lead to an increase, a decrease, or no impact on correlations. Interactions between stimulus and noise correlations are a crucial factor limiting the coding capacity of neural circuits [1, 23] and should be considered alongside the dynamical parameters discussed in this work.

Several future directions based on the proposed framework will be worth exploring. First, the assumption of equal class covariances in LDA is challenged by experimental work showing input-dependent neuronal variance [39]. This assumption could be relaxed by replacing LDA with quadratic discriminant analysis, albeit at the cost of a more complex solution when relating readout error to model parameters. An extension of the current framework could consider the impact of pooling more than two neural populations, as well as more than two stimuli, when performing decoding. This extension would be helpful in examining the interactions between several populations of neurons, each with a unique tuning curve. Going further, one could examine decoding error at the limit of a large number of neurons with heterogeneous tuning curves that vary in both orientation preference and gain [2].

Conclusion. In summary, this work described a theoretical framework that merges Fisher linear decoding with a population model of sensory integration. This approach highlighted the role of correlation, neuronal noise, and network parameters, revealing a broad range of potential outcomes where different conditions generated either detrimental, beneficial, or null impacts on classification performance. These results motivate further developments in theoretical work that systematically link neural network models to optimal decoders in order to reveal the impact of key neurophysiological variables on sensory information processing.

Appendix

A.1 Solving the integrator model as a linear differential equation

To solve the integrator model, we began by dropping the unit indices to alleviate the notation:

$$\tau \frac{dx}{dt} = -\alpha x + \nu + \beta\xi(t) \;\Leftrightarrow\; \frac{dx}{dt} + \frac{\alpha}{\tau}\, x = \frac{\nu + \beta\xi(t)}{\tau} \;\Leftrightarrow\; \frac{dx}{dt} + p(t)\, x = r(t),$$

with p(t) = α/τ and r(t) = (ν + βξ(t))/τ. We defined the integrating factor

$$u(t) = e^{\int p(t)\, dt} = e^{\frac{\alpha}{\tau} t}.$$

Then,

$$u(t)\left(\frac{dx}{dt} + p(t)\, x\right) = u(t)\, r(t) \;\Leftrightarrow\; \frac{d}{dt}\big(u(t)\, x\big) = u(t)\, r(t),$$

where the left-hand side is recognized as the derivative of a product. Integrating both sides from 0 to t,

$$u(t)\, x - u(0)\, x_0 = \int_0^t u(s)\, r(s)\, ds \;\Leftrightarrow\; x = u(t)^{-1}\left[x_0 + \int_0^t u(s)\, r(s)\, ds\right] = e^{-\frac{\alpha}{\tau}t}\, x_0 + \frac{1}{\tau}\int_0^t e^{-\frac{\alpha}{\tau}(t-s)}\big(\nu + \beta\xi(s)\big)\, ds.$$

Splitting the integral into its deterministic and stochastic parts, and evaluating the deterministic part, yields

$$x_i = \frac{\nu_i}{\alpha} + \left(x_0 - \frac{\nu_i}{\alpha}\right) e^{-\frac{\alpha}{\tau}t} + \frac{\beta}{\tau} \int_0^t e^{-\frac{\alpha}{\tau}(t-s)}\, \xi(s)\, ds.$$
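This solution can be sanity-checked symbolically against the standard moment equations of an Ornstein–Uhlenbeck process (a small sketch assuming sympy; the formulas verified are those summarized in Sect. 4 and derived in Section A.2 below):

```python
import sympy as sp

t, tau, alpha, nu, beta, x0 = sp.symbols('t tau alpha nu beta x0', positive=True)
theta, lam = alpha / tau, beta / tau

Ex = nu / alpha + (x0 - nu / alpha) * sp.exp(-theta * t)   # Eq. (11)
Var = lam**2 / (2 * theta) * (1 - sp.exp(-2 * theta * t))  # Section A.2

# E[x] must satisfy tau * dE/dt = -alpha * E + nu (the noise-free Eq. (1)),
# and Var must satisfy dV/dt = -2 * theta * V + lam**2.
assert sp.simplify(tau * sp.diff(Ex, t) + alpha * Ex - nu) == 0
assert sp.simplify(sp.diff(Var, t) + 2 * theta * Var - lam**2) == 0
print("moment equations verified")
```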
A.2 Expected value and variance

We sought to find the expected mean and variance of the random variable x such that

$$x = \mu + (x_0 - \mu)\, e^{-\theta t} + \sigma \int_0^t e^{-\theta(t-s)}\, dB_s.$$

The expected mean is

$$E[x] = E[\mu] + E\big[(x_0 - \mu)\, e^{-\theta t}\big] + E\left[\sigma \int_0^t e^{-\theta(t-s)}\, dB_s\right].$$

Given the zero-mean property of Itô integrals,

$$E\left[\sigma \int_0^t e^{-\theta(t-s)}\, dB_s\right] = 0,$$

we have

$$E[x] = \mu + (x_0 - \mu)\, e^{-\theta t}. \tag{11}$$

The expected variance is

$$\operatorname{var}(x) = \sigma^2 \operatorname{var}\left(\int_0^t e^{-\theta(t-s)}\, dB_s\right) = \sigma^2\, E\left[\left(\int_0^t e^{-\theta(t-s)}\, dB_s\right)^{\!2}\right].$$

By Itô isometry,

$$\sigma^2\, E\left[\left(\int_0^t e^{-\theta(t-s)}\, dB_s\right)^{\!2}\right] = \sigma^2 \int_0^t e^{-2\theta(t-s)}\, ds.$$

Hence, the expected variance can be concisely written as

$$\operatorname{var}(x) = \sigma^2 \int_0^t e^{-2\theta(t-s)}\, ds = \frac{\sigma^2}{2\theta}\big(1 - e^{-2\theta t}\big).$$

A.3 Classification error

The classification error as a function of neural activity is given by

$$\varepsilon = \frac{1}{2}\int_{-\infty}^{0} \frac{1}{\sqrt{2\pi\zeta^2}}\, e^{-\frac{(w-\eta_1)^2}{2\zeta^2}}\, dw + \frac{1}{2}\int_{0}^{\infty} \frac{1}{\sqrt{2\pi\zeta^2}}\, e^{-\frac{(w-\eta_0)^2}{2\zeta^2}}\, dw.$$

Carrying out the two Gaussian integrals and using η_0 = −η_1 (shown in Section A.4),

$$\varepsilon = \frac{1}{4}\left[\operatorname{erfc}\!\left(\frac{\eta_1}{\sqrt{2\zeta^2}}\right) + \operatorname{erf}\!\left(\frac{\eta_0}{\sqrt{2\zeta^2}}\right) + 1\right] = \frac{1}{4}\left[2 - \operatorname{erf}\!\left(\frac{\eta_1}{\sqrt{2\zeta^2}}\right) + \operatorname{erf}\!\left(\frac{-\eta_1}{\sqrt{2\zeta^2}}\right)\right] = \frac{1}{2}\left[1 - \operatorname{erf}\!\left(\frac{\eta_1}{\sqrt{2\zeta^2}}\right)\right] = \frac{1}{2}\operatorname{erfc}\!\left(\frac{\eta_1}{\sqrt{2\zeta^2}}\right).$$

Substituting the shifted mean η_1 = d²/4 and variance ζ² = d²/4 (derived in Section A.4), this becomes

$$\varepsilon = \frac{1}{2}\operatorname{erfc}\!\left(\frac{d^2/4}{\sqrt{2 d^2/4}}\right) = \frac{1}{2}\operatorname{erfc}\!\left(\frac{1}{2\sqrt{2}}\, d\right).$$

A.4 Mahalanobis distance

We began with the following definitions:

$$W = (2\Sigma)^{-1}\Delta\mu, \qquad c = \tfrac{1}{2}\, W \cdot (\mu_1 + \mu_0) + b, \qquad \eta_i = W \cdot \mu_i + b - c, \qquad \varsigma^2 = W^{T}\Sigma W.$$

Expanding η_i yields

$$\eta_i = W \cdot \mu_i + b - \tfrac{1}{2}\, W \cdot (\mu_1 + \mu_0) - b = W \cdot \big(\mu_i - \tfrac{1}{2}(\mu_1 + \mu_0)\big).$$

Given that ½(μ_1 + μ_0) is the midpoint between the class means, η_1 = ½ W · Δμ = −η_0. Expanding W using the property u · v = uᵀv,

$$\eta_1 = \tfrac{1}{2}\big((2\Sigma)^{-1}\Delta\mu\big)^{T}\Delta\mu = \tfrac{1}{4}\,\Delta\mu^{T}\Sigma^{-1}\Delta\mu.$$

Hence, with the squared Mahalanobis distance between means defined as d² = ΔμᵀΣ⁻¹Δμ, we can write

$$\eta_1 = \tfrac{1}{4}\, d^2 = -\eta_0.$$

Similarly, for the variance ς²,

$$\varsigma^2 = \big((2\Sigma)^{-1}\Delta\mu\big)^{T}\, \Sigma\, (2\Sigma)^{-1}\Delta\mu = \tfrac{1}{4}\,\Delta\mu^{T}\Sigma^{-1}\Delta\mu = \tfrac{1}{4}\, d^2.$$
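The integration in A.3 can also be cross-checked numerically for arbitrary values of η_1 and ζ² (a sketch with illustrative values; scipy's quad and erfc assumed):

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import erfc

eta, zeta2 = 1.3, 0.8  # illustrative values of eta_1 and zeta^2

def gauss(w, m):
    """Density of N(m, zeta2) evaluated at w."""
    return np.exp(-(w - m)**2 / (2 * zeta2)) / np.sqrt(2 * np.pi * zeta2)

# Eq. (6) with eta_0 = -eta_1: mass of class 1 below 0 plus mass of
# class 0 above 0, each weighted by P[k] = 1/2.
eps_num = 0.5 * (quad(gauss, -np.inf, 0, args=(eta,))[0]
                 + quad(gauss, 0, np.inf, args=(-eta,))[0])
eps_ana = 0.5 * erfc(eta / np.sqrt(2 * zeta2))
print(eps_num, eps_ana)  # agree to integration tolerance
```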
A.5 Derivation of error

We analyzed the extrema of the error function in relation to noise correlation by taking its first derivative through the chain rule:

$$\frac{d\varepsilon}{d\rho} = \frac{d}{d\rho}\left[\frac{1}{2}\operatorname{erfc}\!\left(\frac{1}{2\sqrt{2}}\, d\right)\right] = \frac{1}{2}\, \frac{d\operatorname{erfc}(z)}{dz}\, \frac{dz}{d(d^2)}\, \frac{d(d^2)}{d\rho}, \tag{12}$$

with z = √(d²)/(2√2) and

$$d^2 = \frac{1}{1-\rho^2}\big(r_x^2 + r_y^2 - 2\rho\, r_x r_y\big). \tag{13}$$

The first factor is

$$\frac{d}{dz}\operatorname{erfc}(z) = \frac{-2\, e^{-z^2}}{\sqrt{\pi}} = \frac{-2\, e^{-\frac{1}{8}d^2}}{\sqrt{\pi}}. \tag{14}$$

The second factor is

$$\frac{dz}{d(d^2)} = \frac{d}{d(d^2)}\left[\frac{\sqrt{d^2}}{2\sqrt{2}}\right] = \frac{1}{4\sqrt{2}\, d}. \tag{15}$$

The third factor is

$$\frac{d(d^2)}{d\rho} = \frac{2\rho}{(1-\rho^2)^2}\big(r_x^2 + r_y^2 - 2\rho\, r_x r_y\big) + \frac{1}{1-\rho^2}\,(-2\, r_x r_y) = \frac{-2}{(1-\rho^2)^2}\left[\rho^2 r_x r_y - \rho\big(r_x^2 + r_y^2\big) + r_x r_y\right]. \tag{16}$$

A.6 Extrema of error

We evaluated the extrema of error by finding the points where Eqs. (14)–(16) are equal to zero. For Eq. (14),

$$0 = \frac{d}{dz}\operatorname{erfc}(z) \;\Leftrightarrow\; 0 = \frac{-2\, e^{-\frac{1}{8}d^2}}{\sqrt{\pi}} \;\Leftrightarrow\; d^2 \to \infty. \tag{17}$$

We assumed that the ratios r_x and r_y are finite and the Euclidean distance between the distribution means is finite and non-null. In other words, if d² → ∞ it is exclusively due to the correlation coefficient. Then,

$$d^2 \to \infty \;\Leftrightarrow\; |\rho| \to 1. \tag{18}$$

We proceeded in a similar fashion for the second factor (Eq. (15)):

$$0 = \frac{dz}{d(d^2)} \;\Leftrightarrow\; 0 = \frac{1}{4\sqrt{2}\, d} \;\Leftrightarrow\; d^2 \to \infty \;\Leftrightarrow\; |\rho| \to 1. \tag{19}$$

The third factor (Eq. (16)) yields

$$0 = \frac{d(d^2)}{d\rho} \;\Leftrightarrow\; 0 = \rho^2 r_x r_y - \rho\big(r_x^2 + r_y^2\big) + r_x r_y. \tag{20}$$

Depending on network parameters, two cases are possible. One case arises if one of the ratios, either r_x or r_y, is zero. This happens if the mean activity of one population is equal across inputs. (If the mean activity of both units remained unchanged, the resulting multivariate distributions would overlap, thus breaking the basic assumptions justifying the choice of LDA.) In this first case,

$$0 = \frac{d(d^2)}{d\rho} \;\Leftarrow\; \rho = 0 \quad \text{if } r_x = 0 \text{ or } r_y = 0. \tag{21}$$

The second case occurs when neither r_x nor r_y is zero:

$$0 = \rho^2 - \rho\, \frac{r_x^2 + r_y^2}{r_x r_y} + 1 \;\Leftrightarrow\; \rho = \frac{r_x^2 + r_y^2 \pm \sqrt{(r_x^2 + r_y^2)^2 - 4 r_x^2 r_y^2}}{2\, r_x r_y} = \frac{r_x^2 + r_y^2 \pm \big[\max(r_x^2, r_y^2) - \min(r_x^2, r_y^2)\big]}{2\, r_x r_y}.$$

The last expression can be decomposed into four distinct cases. First, when r_x → r_y,

$$\rho \to \frac{r_y^2 + r_y^2}{2\, r_y^2} \to 1. \tag{22}$$

Second, when r_x → −r_y,

$$\rho \to \frac{r_y^2 + r_y^2}{-2\, r_y^2} \to -1. \tag{23}$$

Third, when |r_x| ≠ |r_y|, we examined the positive and negative roots of ρ. The positive root is

$$\rho_+ = \frac{r_x^2 + r_y^2 + \max(r_x^2, r_y^2) - \min(r_x^2, r_y^2)}{2\, r_x r_y} = \frac{\max(r_x^2, r_y^2)}{r_x r_y}. \tag{24}$$

Because |max(r_x², r_y²)| > |r_x r_y| under the assumption that the two ratios are unequal and non-null, we have |ρ_+| > 1 for all r_x, r_y. Since the correlation is bound to the range (−1, 1), the positive root must be rejected. The negative root does not suffer from the same problem:

$$\rho_- = \frac{r_x^2 + r_y^2 - \max(r_x^2, r_y^2) + \min(r_x^2, r_y^2)}{2\, r_x r_y} = \frac{\min(r_x^2, r_y^2)}{r_x r_y}. \tag{25}$$

Fourth, when either r_x = 0 or r_y = 0, ρ = 0.

A.7 Minima and maxima

We determined upward and downward trends of the error curve by calculating the sign of the derivative between the potential maxima (considering that they are mutually exclusive). Taking Eqs. (14)–(16) and substituting into Eq. (12),

$$\frac{d\varepsilon}{d\rho} = \frac{1}{2}\, \frac{-2\, e^{-\frac{1}{8}d^2}}{\sqrt{\pi}}\, \frac{1}{4\sqrt{2}\, d}\, \frac{-2}{(1-\rho^2)^2}\left[\rho^2 r_x r_y - \rho\big(r_x^2 + r_y^2\big) + r_x r_y\right],$$

so that

$$\operatorname{sign}\!\left(\frac{d\varepsilon}{d\rho}\right) = \operatorname{sign}\!\Big(\rho^2 r_x r_y - \rho\big(r_x^2 + r_y^2\big) + r_x r_y\Big). \tag{26}$$

For the condition where either r_x = 0 or r_y = 0, the bracketed quadratic reduces to −ρ r_u² (with u indexing the non-zero ratio), so that

$$\operatorname{sign}\!\left(\frac{d\varepsilon}{d\rho}\right) = \operatorname{sign}\big(-\rho\, r_u^2\big) = -\operatorname{sign}(\rho). \tag{27}$$

For the condition where neither r_x nor r_y is zero, we have already found the zeros of ρ²r_x r_y − ρ(r_x² + r_y²) + r_x r_y to be ρ_− and ρ_+. To determine sign(dε/dρ), we need to know whether the extremum of the parabola is a minimum or a maximum. Dividing by r_x r_y as in Eq. (20),

$$\frac{d^2}{d\rho^2}\left[\rho^2 - \rho\,\frac{r_x^2 + r_y^2}{r_x r_y} + 1\right] = 2 > 0.$$

Given that |ρ_+| > 1, within the admissible range of correlations

$$\frac{d\varepsilon}{d\rho} > 0 \;\Leftrightarrow\; \rho \in [-1, \rho_-], \tag{28}$$

$$\frac{d\varepsilon}{d\rho} < 0 \;\Leftrightarrow\; \rho \in [\rho_-, 1]. \tag{29}$$

Regardless of the conditions for r_x and r_y, following Eqs. (27)–(29), the error curve as a function of correlation increases from ρ = −1 until its maximum, found at a value of ρ_* = 0 or ρ_* = ρ_−, and then decreases until ρ = 1.
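As an independent numerical check of Eqs. (25)–(29) (our own, not part of the original derivation), a dense scan of the error over ρ recovers ρ_− as the location of maximal error:

```python
import numpy as np
from scipy.special import erfc

def analytic_error(rx, ry, rho):
    d2 = (rx**2 + ry**2 - 2 * rho * rx * ry) / (1 - rho**2)
    return 0.5 * erfc(np.sqrt(d2) / (2 * np.sqrt(2)))

rx, ry = 2.0, 3.0                          # neither ratio zero, |r_x| != |r_y|
rhos = np.linspace(-0.999, 0.999, 200_001)
rho_hat = rhos[np.argmax(analytic_error(rx, ry, rhos))]
rho_minus = min(rx**2, ry**2) / (rx * ry)  # Eq. (25)
print(rho_hat, rho_minus)                  # both ~ 0.6667
```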
Acknowledgements
This work benefited from discussions with Brent Doiron and Richard Naud.

Funding
This work was supported by a Discovery grant to J.P.T. from the Natural Sciences and Engineering Council of Canada (NSERC Grant No. 210977).

Abbreviations
LDA, Linear discriminant analysis.

Availability of data and materials
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Ethics approval and consent to participate
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Consent for publication
Not applicable.

Authors' contributions
JPT and MC conceptualized the study, designed the work, performed the analyses, interpreted the data, wrote software, drafted the work, and revised the final version of the manuscript. All authors read and approved the final manuscript.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 14 April 2020. Accepted: 3 February 2021.

References
1. Moreno-Bote R, Beck J, Kanitscheider I, Pitkow X, Latham P, Pouget A. Information-limiting correlations. Nat Neurosci. 2014;17:1410–7.
2. Ecker AS, Berens P, Tolias AS, Bethge M. The effect of noise correlations in populations of diversely tuned neurons. J Neurosci. 2011;31:14272–83.
3. Shamir M, Sompolinsky H. Implications of neuronal diversity on population coding. Neural Comput. 2006;18:1951–86.
4. Quian Quiroga R, Panzeri S. Extracting information from neuronal populations: information theory and decoding approaches. Nat Rev Neurosci. 2009;10:173–85.
5. Wen H, Shi J, Zhang Y, Lu KH, Cao J, Liu Z. Neural encoding and decoding with deep learning for dynamic natural vision. Cereb Cortex. 2018;28:4136–60.
6. Glaser JI, Benjamin AS, Chowdhury RH, Perich MG, Miller LE, Kording KP. Machine learning for neural decoding. eNeuro. 2020;7:1–16.
7. Kriegeskorte N, Douglas PK. Interpreting encoding and decoding models. Curr Opin Neurobiol. 2019;55:167–79.
8. Klampfl S, David SV, Yin P, Shamma SA, Maass W. A quantitative analysis of information about past and present stimuli encoded by spikes of A1 neurons. J Neurophysiol. 2012;108:1366–80.
9. Meyers EM, Freedman DJ, Kreiman G, Miller EK, Poggio T. Dynamic population coding of category information in inferior temporal and prefrontal cortex. J Neurophysiol. 2008;100:1407–19.
10. Nienborg H, Cumming B. Correlations between the activity of sensory neurons and behavior: how much do they tell us about a neuron's causality? Curr Opin Neurobiol. 2010;20:376–81.
11. Shadlen MN, Britten KH, Newsome WT, Movshon JA. A computational analysis of the relationship between neuronal and behavioral responses to visual motion. J Neurosci. 1996;16:1486–510.
12. Buonomano DV, Maass W. State-dependent computations: spatiotemporal processing in cortical networks. Nat Rev Neurosci. 2009;10:113–25.
13. Calderini M, Zhang S, Berberian N, Thivierge JP. Optimal readout of correlated neural activity in a decision-making circuit. Neural Comput. 2018;30:1573–611.
14. Berberian N, MacPherson A, Giraud E, Richardson L, Thivierge JP. Neuronal pattern separation of motion-relevant input in LIP activity. J Neurophysiol. 2017;117:738–55.
15. Rich EL, Wallis JD. Decoding subjective decisions from orbitofrontal cortex. Nat Neurosci. 2016;19:973–80.
16. Averbeck BB, Lee D. Effects of noise correlations on information encoding and decoding. J Neurophysiol. 2006;95:3633–44.
17. Cain N, Barreiro AK, Shadlen M, Shea-Brown E. Neural integrators for decision making: a favorable tradeoff between robustness and sensitivity. J Neurophysiol. 2013;109:2542–59.
18. Goldman MS. Memory without feedback in a neural network. Neuron. 2009;61:621–34.
19. Ganguli S, Bisley JW, Roitman JD, Shadlen MN, Goldberg ME, Miller KD. One-dimensional dynamics of attention and decision making in LIP. Neuron. 2008;58:15–25.
20. Miri A, Daie K, Arrenberg AB, Baier H, Aksay E, Tank DW. Spatial gradients and multidimensional dynamics in a neural integrator circuit. Nat Neurosci. 2011;14:1150–9.
21. Murphy BK, Miller KD. Balanced amplification: a new mechanism of selective amplification of neural activity patterns. Neuron. 2009;61:635–48.
22. Chance FS, Abbott LF, Reyes AD. Gain modulation from background synaptic input. Neuron. 2002;35:773–82.
23. Rumyantsev OI, Lecoq JA, Hernandez O, Zhang Y, Savall J, Chrapkiewicz R, Li J, Zeng H, Ganguli S, Schnitzer MJ. Fundamental bounds on the fidelity of sensory cortical coding. Nature. 2020;580:100–5.
24. Burak Y, Fiete IR. Fundamental limits on persistent activity in networks of noisy neurons. Proc Natl Acad Sci. 2012;109:17645–50.
25. Cover TM, Thomas JA. Elements of information theory. 2nd ed. New York: Wiley; 2006.
26. Cohen MR, Kohn A. Measuring and interpreting neuronal correlations. Nat Neurosci. 2011;14:811–9.
27. Zohary E, Shadlen MN, Newsome WT. Correlated neuronal discharge rate and its implications for psychophysical performance. Nature. 1994;370:140–3.
28. Brunel N. Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons. J Comput Neurosci. 2000;8:183–208.
29. Bujan AF, Aertsen A, Kumar A. Role of input correlations in shaping the variability and noise correlations of evoked activity in the neocortex. J Neurosci. 2015;35:8611–25.
30. de la Rocha J, Doiron B, Shea-Brown E, Josić K, Reyes A. Correlation between neural spike trains increases with firing rate. Nature. 2007;448:802–6.
31. Graupner M, Reyes AD. Synaptic input correlations leading to membrane potential decorrelation of spontaneous activity in cortex. J Neurosci. 2013;33:15075–85.
32. Renart A, de la Rocha J, Bartho P, Hollender L, Parga N, Reyes A, Harris KD. The asynchronous state in cortical circuits. Science. 2010;327:587–90.
33. Salinas E, Sejnowski TJ. Impact of correlated synaptic input on output firing rate and variability in simple neuronal models. J Neurosci. 2000;20:6193–209.
34. Hu Y, Zylberberg J, Shea-Brown E. The sign rule and beyond: boundary effects, flexibility, and noise correlations in neural population codes. PLoS Comput Biol. 2014;10:e1003469.
35. Yim MY, Kumar A, Aertsen A, Rotter S. Impact of correlated inputs to neurons: modeling observations from in vivo intracellular recordings. J Comput Neurosci. 2014;37:293–304.
36. Averbeck BB, Latham PE, Pouget A. Neural correlations, population coding and computation. Nat Rev Neurosci. 2006;7:358–66.
37. Usher M, McClelland JL. The time course of perceptual choice: the leaky, competing accumulator model. Psychol Rev. 2001;108:550–92.
38. Ostojic S, Brunel N. From spiking neuron models to linear-nonlinear models. PLoS Comput Biol. 2011;7:e1001056.
39. Churchland AK, Kiani R, Chaudhuri R, Wang X-J, Pouget A, Shadlen MN. Variance as a signature of neural computations during decision making. Neuron. 2011;69:818–31.

Estimating Fisher discriminant error in a linear integrator model of neural population activity

Loading next page...
 
/lp/springer-journals/estimating-fisher-discriminant-error-in-a-linear-integrator-model-of-bXrAI4Om9V

References (39)

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2021
eISSN
2190-8567
DOI
10.1186/s13408-021-00104-4
Publisher site
See Article on Publisher Site

Abstract

School of Psychology, University of Decoding approaches provide a useful means of estimating the information Ottawa, 136 Jean Jacques Lussier, Ottawa, ON K1N 6N5, Canada contained in neuronal circuits. In this work, we analyze the expected classification Brain and Mind Research Institute, error of a decoder based on Fisher linear discriminant analysis. We provide expressions University of Ottawa, Ottawa, ON that relate decoding error to the specific parameters of a population model that K1N 6N5, Canada performs linear integration of sensory input. Results show conditions that lead to beneficial and detrimental effects of noise correlation on decoding. Further, the proposed framework sheds light on the contribution of neuronal noise, highlighting cases where, counter-intuitively, increased noise may lead to improved decoding performance. Finally, we examined the impact of dynamical parameters, including neuronal leak and integration time constant, on decoding. Overall, this work presents a fruitful approach to the study of decoding using a comprehensive theoretical framework that merges dynamical parameters with estimates of readout error. Keywords: Linear model; Fisher linear discriminant analysis; Noise correlation 1 Introduction In recent years, neuronal decoding has emerged as a key aspect of understanding the neu- ral code [1]. The aim of decoding algorithms is to read out the sensory-driven responses of a neuronal population and classify them following a given criterion. Popular criteria in- clude Fisher information [2, 3], mutual information [4], and machine learning approaches [5, 6]. While many types of decoders exist [7], a linear readout of neural activity has of- ten been employed to perform sensory classification [8, 9] and predict motor decisions [10, 11]. Further, different classes of linear readouts are amenable to mathematical analy- sis and capture biological learning rules such as Hebbian learning [12]. In this work, we formally analyze the optimal decoding error of a linear decoder based on Fisher linear discriminant analysis (LDA). Assuming discrete classes of stimuli, LDA provides an upper bound on linear decoding capacity [13]. In addition, LDA shows good agreement with decision-making tasks and offers a bridge between cortical activity and behavioral performance [14, 15]. Importantly, most theoretical approaches based on neural decoding are not concerned with how linear decoders would be influenced by specific dynamical parameters of mod- © The Author(s) 2021. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. Calderini and Thivierge Journal of Mathematical Neuroscience (2021) 11:6 Page 2 of 22 Figure 1 Fisher linear discrimination of neural activity in a population model. 
(A) Two neural populations (x and y) where the noise correlation is adjusted via a parameter ρ. Each population receives two distinct inputs (ν and ν ) and a private source of noise whose gain is β and β , respectively. The stimulus-driven response 1 2 x y of each population is described by a tuning curve relating stimulus orientation to firing rate. (B) Activity for populations x and y is shown at discrete time-points (solid black circles). The optimal decision boundary (c) obtained by LDA discriminates amongst the neural activity generated by each of the two inputs. Neural responses follow a Gaussian distribution. The shaded area shows the proportion of discrimination error for stimulus 2 eled neural systems [16]. Here, we address this concern by providing expressions that re- late decoding error to the adjustable parameters of a rate-based population model with a linear neural integrator [17, 18]. This model captures the average spiking activity of neu- ronal populations [19–21] and the quasi-linear responses of neurons found in many exper- imental contexts [22]. Preliminary results have been presented in previous work [13, 14], yet the full analytical solution had remained incomplete and limited to positive noise cor- relation; we now present the complete solution. The framework relies on the simplifying assumption that signal and noise correlations originate from independent sources. While this assumption does not hold in biological circuits, where signal and noise are related [1], it allows us to systematically explore a wide range of scenarios that describe the impact of neuronal inputs, noise, correlations, and dynamical parameters on linear decoding, where the contribution of each parameter can be examined independently. This paper begins by describing the neural integrator model and the LDA readout. Then, we provide expressions for LDA error that rely on the parameters of the integrator model. Finally, we consider the effect of correlation, noise, and dynamical parameters on neuronal decoding using both analytical expressions and numerical simulations. 2 Linear population model As a starting point, we assume two independent neuronal populations, each projecting in a feedforward manner to a readout discriminating amongst two inputs, ν and ν ,thatare 1 2 constant over time (Fig. 1(A)). Each population’s mean firing rate in response to stimuli is conceptualized by a tuning curve where a stimulus feature, for instance visual orien- tation, generates a graded response. This scenario is analogous to analyses that examine population responses after performing a dimensionality reduction to generate a “popula- tion tuning curve” [23]. While a more complex model could account for a heterogeneity of responses within each population, we choose to limit our model to two homogeneous populations in order for the classification problem to remain tractable. Calderini and Thivierge Journal of Mathematical Neuroscience (2021) 11:6 Page 3 of 22 The activity of each population is described by a linear neural integrator dx τ =–α x + ν + β ξ (t), x x i i,x x x dt (1) dy τ =–α y + ν + β ξ (t), y y i i,y y y dt where x and y are the firing rates of each population in response to a given stimulus i, τ i i is a time constant, α is a leak term, ξ(t)is Gaussian white noise (N (0, 1)), and β is the gain of the noise. Network parameters τ , α,and β are bound to R . We make no distinction >0 between noise induced by stimuli and noise generated through intrinsic neural activity. 
While their effect on mean rate activity is similar [24], their impact on noise correlations differs [1]; in the model, we explicitly separate the effect of firing rate and noise correlation. This will be done by controlling noise correlation through a tunable parameter, as detailed in Sect. 4. An advantage of this formalism is that the effect of noise correlation can be systematically isolated from changes in firing rates and signal correlation that would be induced through reciprocal connections between the two populations. Further, depending on the choice of parameters, the addition of recurrent weights to Eq. (1)may preventthe system from reaching a stationary state, which is a fundamental assumption of LDA. 3 Fisher linear discriminant decoder A linear decoder based on LDA reads out the activity of the population model in order to perform a binary discrimination (Fig. 1(B)). Discrimination error generated by LDA provides an estimate of the statistical confidence in distinguishing pairs of stimuli based on network activity. We focus on pairwise discrimination given that error rates obtained from more than two stimuli are well approximated by values obtained from all combinations of pairwise comparisons [25]. LDA assumes that neural activity is sampled from a multivariate Gaussian distribution with class covariance matrix  and class mean vector μ .Further,LDA assumesequal i i class covariance, therefore  =  = . LDA attempts to find a projection line w,per- 1 2 pendicular to the decision boundary, onto which the input space is projected. The optimal projection line maximizes the Fisher criterion J (w) defined as the ratio of the projected between- to within-class variance: w · (μ – μ ) 2 1 J(w)= . w ·  · w Given the assumption of equal class covariance, we set  =2. By taking the derivative of J (w)with respect to w and setting it to zero, one finds the closed-form solution for the optimal projection line to be –1 W =(2) (μ – μ ). (2) 2 1 4 Formulating a model-based linear decoder To analytically derive means (μ and μ ) and covariance () from the neural population 1 2 model, we rearrange Eq. (1) as follows, using population x as example: α ν β x ix x dx = – x dt + ξ (t) dt.(3) i i x τ α τ x x x Calderini and Thivierge Journal of Mathematical Neuroscience (2021) 11:6 Page 4 of 22 Given that a white noise process is by definition the time derivative of a Weiner process, ξ(t)= dW /dt,wecan rewriteEq. (3)as dx = θ (μ – x ) dt + λ dW,(4) i x ix i x x,t with θ = α /τ , μ = ν /α ,and λ = β /τ .Equation(4)isanOrstein–Uhlenbeck process x x x ix ix x x x x with known solution –θ t –θ (t–s) x x x (t)= μ +(x – μ )e + λ e dB(s). (5) i ix i0 ix Equation (5) is a mean reverting process whose stable state follows a Gaussian distri- bution. A full derivation of this process is found in Sections A.1–A.2. To summarize this derivation, the expected mean and variance are –θ t E x (t) = μ +(x – μ )e , i ix i0 ix x –2θ t Var x (t) = 1– e . 2θ The stationary mean and variance of Eq. (5)are x,i lim E[x ]= μ = , i ix t→∞ 2 2 λ β x x 2 lim Var(x )= = = σ . t→∞ 2θ 2τ α x x x With the assumption that the mean of x is much larger than the variance, there is negli- gible probability that x would fall below zero. Imposing strictly positive values of x could be achieved by the addition of a constant and would not alter the results obtained from the linear classifier. 
The readout of neural activity depends on the following feature space:

$$Z \sim \mathcal{N}(\mu_i, \Sigma), \qquad \mu_i = [\mu_{xi}, \mu_{yi}]^T, \qquad \Sigma = \begin{pmatrix}\sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2\end{pmatrix},$$

where $Z$ is obtained from the probability distribution of a multivariate Gaussian with mean $\mu_i$ and covariance $\Sigma$. Setting the parameter $\rho = 0$ would be equivalent to a so-called "diagonal decoder" where off-diagonal elements of the covariance matrix are neglected, thus ignoring noise correlations altogether [16].

The closed-form solution of LDA (Eq. (2)) can be expressed using the parameters of the population model (Eq. (1)) as follows. First, the total within-class scatter $S$ is

$$S = 2\Sigma = 2\begin{pmatrix}\sigma_x^2 & \rho\sigma_x\sigma_y \\ \rho\sigma_x\sigma_y & \sigma_y^2\end{pmatrix} = 2\begin{pmatrix}\dfrac{\beta_x^2}{2\tau_x\alpha_x} & \rho\sqrt{\dfrac{\beta_x^2}{2\tau_x\alpha_x}\dfrac{\beta_y^2}{2\tau_y\alpha_y}} \\[3mm] \rho\sqrt{\dfrac{\beta_x^2}{2\tau_x\alpha_x}\dfrac{\beta_y^2}{2\tau_y\alpha_y}} & \dfrac{\beta_y^2}{2\tau_y\alpha_y}\end{pmatrix}.$$

To alleviate the notation, we define $\Delta\mu = [\Delta\mu_x, \Delta\mu_y]^T = \mu_1 - \mu_0$, where $\Delta\mu_u = \Delta\nu_u/\alpha_u$, $\Delta\nu_u$ is the difference between the two stimulus inputs, and the index $u$ stands for either population $x$ or $y$. In this way, Eq. (2) becomes

$$W = (2\Sigma)^{-1}\Delta\mu = \frac{1}{2(1-\rho^2)}\begin{pmatrix}\dfrac{\Delta\mu_x}{\sigma_x^2} - \rho\dfrac{\Delta\mu_y}{\sigma_x\sigma_y} \\[3mm] \dfrac{\Delta\mu_y}{\sigma_y^2} - \rho\dfrac{\Delta\mu_x}{\sigma_x\sigma_y}\end{pmatrix} = \frac{1}{1-\rho^2}\begin{pmatrix}\dfrac{\tau_x}{\beta_x^2}\Delta\nu_x - \rho\sqrt{\dfrac{\tau_x\tau_y\alpha_x}{\beta_x^2\beta_y^2\alpha_y}}\,\Delta\nu_y \\[3mm] \dfrac{\tau_y}{\beta_y^2}\Delta\nu_y - \rho\sqrt{\dfrac{\tau_x\tau_y\alpha_y}{\beta_x^2\beta_y^2\alpha_x}}\,\Delta\nu_x\end{pmatrix}.$$

From the law of total probability, the error rate of classification is given by

$$\varepsilon = P[y=0 \mid k=1]\,P[k=1] + P[y=1 \mid k=0]\,P[k=0], \tag{6}$$

where $P[k=j]$ is the probability that a randomly sampled point belongs to class $j$, and $P[y=i \mid k=j]$ is the probability that a point is classified as belonging to class $i$ when it belongs to class $j$. Given that the classifier is unbiased towards each of the two classes, $P[k=0] = P[k=1] = 0.5$. To calculate the conditional probabilities $P[y=i \mid k=j]$, one must define a threshold $c$ that serves as a boundary between the two distributions. The value of $c$ is chosen to be the midpoint between the means of the projected distributions. We calculate $P[y=i \mid k=j]$ as the area under the curve of the density function for $j$ in the region where $i$ is the correct class. As a first step, we shift the projected distributions by a factor $c$, so that the threshold becomes zero to simplify the integration. More specifically, the unshifted threshold $c$, the means of the shifted distributions $\eta_i$, and their variance $\zeta^2$ are

$$c = \tfrac{1}{2}\,W \cdot (\mu_1 + \mu_0) + b, \qquad \eta_i = W \cdot \mu_i + b - c, \qquad \zeta^2 = W^T \Sigma W,$$

with bias term $b$. The error rate from Eq. (6) then becomes

$$\varepsilon = \frac{1}{2}\int_{-\infty}^{0} \frac{1}{\sqrt{2\zeta^2\pi}}\, e^{-\frac{(w-\eta_1)^2}{2\zeta^2}}\,dw + \frac{1}{2}\int_{0}^{\infty} \frac{1}{\sqrt{2\zeta^2\pi}}\, e^{-\frac{(w-\eta_0)^2}{2\zeta^2}}\,dw.$$

Details of the full integration of error can be found in Section A.3. The final expression is

$$\varepsilon = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{\eta_1}{\sqrt{2\zeta^2}}\right).$$

This expression is further simplified by introducing the squared Mahalanobis distance:

$$\varepsilon = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{1}{2}\sqrt{\frac{1}{2}}\,d\right), \tag{7}$$

where

$$d^2 = \Delta\mu^T \Sigma^{-1} \Delta\mu. \tag{8}$$

Because of equal class covariance, the above expression has the property that $d(\mu_0, \mu_1) = d(\mu_1, \mu_0) = d$.

Using Eq. (8), we rewrite $d^2$ from the network parameters:

$$d^2 = \frac{1}{1-\rho^2}\left[\frac{\Delta\mu_x^2}{\sigma_x^2} + \frac{\Delta\mu_y^2}{\sigma_y^2} - 2\rho\,\frac{\Delta\mu_x\Delta\mu_y}{\sigma_x\sigma_y}\right] = \frac{2}{1-\rho^2}\left[\frac{\tau_x}{\beta_x^2\alpha_x}\Delta\nu_x^2 + \frac{\tau_y}{\beta_y^2\alpha_y}\Delta\nu_y^2 - 2\rho\sqrt{\frac{\tau_x\tau_y}{\beta_x^2\beta_y^2\alpha_x\alpha_y}}\,\Delta\nu_x\Delta\nu_y\right].$$

As the ratio $\Delta\mu_u/\sigma_u$ appears often in the above solution, we simplify the notation by introducing

$$r_u = \frac{\Delta\mu_u}{\sigma_u} = \frac{\sqrt{2}\,\Delta\nu_u}{\beta_u}\sqrt{\frac{\tau_u}{\alpha_u}}.$$

This expression simplifies the Mahalanobis distance to

$$d^2 = \frac{1}{1-\rho^2}\left(r_x^2 + r_y^2 - 2\rho r_x r_y\right).$$

The full derivation of expected error using the Mahalanobis distance is found in Sections A.3–A.4.
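The resulting error expression is straightforward to evaluate. The helper below, a sketch with naming and parameter values of our own choosing, computes Eq. (7) directly from the network parameters.

```python
import numpy as np
from scipy.special import erfc

def readout_error(dnu_x, dnu_y, rho,
                  tau=(1.0, 1.0), alpha=(1.0, 1.0), beta=(1.0, 1.0)):
    """LDA error from Eq. (7): r_u = sqrt(2)*dnu_u/beta_u * sqrt(tau_u/alpha_u),
    d^2 = (r_x^2 + r_y^2 - 2*rho*r_x*r_y) / (1 - rho^2)."""
    r_x = np.sqrt(2.0) * dnu_x / beta[0] * np.sqrt(tau[0] / alpha[0])
    r_y = np.sqrt(2.0) * dnu_y / beta[1] * np.sqrt(tau[1] / alpha[1])
    d2 = (r_x**2 + r_y**2 - 2.0 * rho * r_x * r_y) / (1.0 - rho**2)
    return 0.5 * erfc(0.5 * np.sqrt(0.5) * np.sqrt(d2))

# Identical tuning in both populations (dnu = 3), across three correlations
print([readout_error(3.0, 3.0, r) for r in (-0.5, 0.0, 0.9)])
```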
The above analysis provides a relationship between classification error and the network parameters of the population model. In the sections that follow, we explore the various links between these quantities.

5 Noise correlation

Neurons that are in close physical proximity exhibit correlations in their activity. An extensive body of work has examined the impact of these noise correlations on behavioral tasks [26] and on the activity of brain circuits [27–35]. Noise correlations may be advantageous or detrimental to cognitive and sensory processing; however, the specific network-level properties that give rise to these effects have not been fully elucidated.

In the proposed model, the effect of noise correlation on classification error is highly dependent upon the sensory inputs ($\nu_1$ and $\nu_2$). We distinguish four main cases that lead to qualitatively different conclusions on the impact of noise correlations. Details of these analyses are provided in Sections A.5–A.6.

Figure 2 Impact of noise correlation on Fisher linear discriminant analysis. (A) Scenario where the tuning curves are the same for both populations of neurons, leading to $r_x \to r_y$. Top left: illustration of tuning curves (black lines) and stimulus orientations (blue and red lines). Top right: example of numerical responses to two stimuli (red and blue circles), with noise correlation of 0.9. Bottom: solid line, analytical estimate. Filled black circles, numerical simulations. (B) Scenario where the tuning curves are offset by a fixed orientation. In this case, $r_x \to -r_y$. (C) Symmetrical case arising when one of the populations (for instance, $x$) generates the same firing rate for a range of stimulus orientations, leading to $r_x = 0$. (D) Case where one population has higher gain, leading to $|r_x| \neq |r_y|$

A first case arises when the tuning curves of populations $x$ and $y$ are identical in terms of both their orientation preference and their gain (Fig. 2(A)). In this case, $r_x \to r_y$, leading to monotonically increasing error as a function of correlation. Intuitively, this happens because correlation forces the firing rate distributions to "stretch" towards each other.

We verified the analytical solution by comparing it to numerical estimates of the error rate as a function of noise correlation. These numerical estimates were obtained with Eq. (1), where populations $x$ and $y$ both received inputs $\nu_1 = 11$ and $\nu_2 = 14$ in order for the model to mimic a scenario where the two populations have identical tuning properties. The goal here is not to capture the model's response to a continuum of stimulus values along the tuning curves, but rather to illustrate the behavior of the model using discrete stimuli. We set $\tau = 1$, $\beta = 1$, and $\alpha = 1$ for both populations. We then numerically generated 5000 points per stimulus class. A subset of 80% of the total number of data points was randomly selected to train the LDA classifier. The proportion of misclassified points was calculated on the remaining data points. We found good agreement between the numerical estimates and the analytical solution (Fig. 2(A)).

Note that the range of error may be increased by moving the firing rate distributions closer to each other, without altering the overall shape of the function relating error and noise correlation.
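The numerical procedure just described can be reproduced in a few lines of scikit-learn. In this sketch of our own construction, we sample directly from the stationary Gaussian distribution derived in Sect. 4 rather than integrating Eq. (1) to steady state, which the derivation shows to be equivalent; seeds and variable names are arbitrary.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
tau = alpha = beta = 1.0
rho, n = 0.9, 5000                        # noise correlation, points per class

def sample_class(nu_x, nu_y):
    # Stationary distribution of Eq. (1): mean nu/alpha, variance beta^2/(2*tau*alpha)
    mu = np.array([nu_x, nu_y]) / alpha
    s2 = beta**2 / (2 * tau * alpha)
    cov = s2 * np.array([[1.0, rho], [rho, 1.0]])
    return rng.multivariate_normal(mu, cov, size=n)

X = np.vstack([sample_class(11.0, 11.0), sample_class(14.0, 14.0)])
y = np.repeat([0, 1], n)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.8, random_state=0)
err = 1.0 - LinearDiscriminantAnalysis().fit(X_tr, y_tr).score(X_te, y_te)
print(err)   # to be compared against the analytical error, Eq. (7)
```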
While the goal here was to show the distribution of readout error across a broad range of correlation values, we acknowledge that not all combinations of tuning curves and noise correlations are physiologically plausible. In fact, while noise correlations in cortex vary across experimental conditions, regions, and behavioral states, they are typically reported to be on the order of 0.1–0.3 for nearby cells [26]. Therefore, extreme values (both positive and negative) are unlikely to arise in living circuits.

In a second scenario, the two populations are offset in terms of their orientation preference (Fig. 2(B)). We examined classification error in this scenario by setting the input of population $x$ to $\nu_1 = 11$ and $\nu_2 = 14$, while population $y$ was set to $\nu_1 = 14$ and $\nu_2 = 11$. Analytically, this scenario leads to $r_x \to -r_y$, resulting in a monotonically decreasing error as correlation increases from –1 to 1. Intuitively, this happens because correlation stretches the distributions of responses along parallel lines, decreasing the overlap between them.

A third case arises when the tuning curve of one of the two populations yields the same response for the two stimuli (Fig. 2(C)). This happens if the tuning curve of population $x$ exhibits a broad region where firing rate remains constant despite changes in stimulus orientation. Analytically, this leads to $r_x = 0$. We illustrate this scenario by setting $\nu_1 = 11$ and $\nu_2 = 11$ for population $x$, and $\nu_1 = 11$ and $\nu_2 = 14$ for population $y$. This case yields a "symmetrical" effect of correlation on readout error, where maximum error is found at $\rho = 0$ and error tends towards zero as $\rho$ approaches either 1 or –1.

Finally, a fourth scenario occurs when the two populations have tuning curves that are aligned in terms of orientation preference, but where one population has a higher response gain (Fig. 2(D)). This case is defined by $|r_x| \neq |r_y|$. Error tends to zero as noise correlation ($\rho$) goes to either –1 or 1. The correlation associated with maximum error is found somewhere in between these extremes and is given by

$$\rho_* = \frac{\min(r_x^2, r_y^2)}{r_x r_y}. \tag{9}$$

To illustrate this scenario, we set $\nu_1 = 11$ and $\nu_2 = 13$ for population $x$, and $\nu_1 = 11$ and $\nu_2 = 14$ for population $y$. Graphically, this scenario arises when noise correlation "stretches" the distributions of responses along parallel lines whose centroids do not align on either dimension. Starting from a correlation of zero, as correlation increases, the distributions stretch towards each other, thus increasing overlap and error. After the maximum overlap defined by $\rho_*$, further stretching forces the distributions to spread too thinly to overlap, until the extreme case of a correlation of one, where both distributions would appear as perfectly parallel lines, leading to zero error.

A continuum of cases exists between the different scenarios illustrated in Fig. 2(A)–(D). For instance, the peak error ($\rho_*$) in Fig. 2(D) can be shifted to lower correlation values by offsetting one of the tuning curves, yielding a curve closer to Fig. 2(B).

In sum, the above results show that, depending upon the structure of the input delivered to the two neural populations, noise correlations produce widely different effects on classification error. While insights into these results can be obtained without the full formalism described here [34], such formalism becomes pivotal when examining the effect of specific network parameters, as described next.
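Before moving on, Eq. (9) can be checked against a brute-force sweep of the analytical error curve. The sketch below does so for an illustrative pair of ratios; the $r$ values are our own and are not taken from the paper's figures.

```python
import numpy as np
from scipy.special import erfc

# Scenario of Fig. 2(D): same preference, unequal gains (illustrative values)
r_x, r_y = 2.0 * np.sqrt(2.0), 3.0 * np.sqrt(2.0)
rho_star = min(r_x**2, r_y**2) / (r_x * r_y)       # Eq. (9): 2/3

# Brute-force check: maximize the analytical error, Eq. (7), over rho
rho = np.linspace(-0.999, 0.999, 200001)
d2 = (r_x**2 + r_y**2 - 2 * rho * r_x * r_y) / (1 - rho**2)
err = 0.5 * erfc(np.sqrt(d2) / (2 * np.sqrt(2)))
print(rho_star, rho[np.argmax(err)])               # both ~0.667
```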
6 Impact of noise gain on classification error

To explore the effect of network parameters on error, we first rewrite Eq. (9) as follows:

$$\rho_* = \frac{\min(r_x^2, r_y^2)}{r_x r_y} = \begin{cases} \dfrac{r_x}{r_y} & \text{if } |r_x| < |r_y|, \\[2mm] \dfrac{r_y}{r_x} & \text{if } |r_x| > |r_y|, \end{cases} \tag{10}$$

where the ratio $r_x/r_y$ can be expressed using network parameters:

$$\frac{r_x}{r_y} = \frac{\Delta\nu_x\,\beta_y}{\Delta\nu_y\,\beta_x}\sqrt{\frac{\tau_x\,\alpha_y}{\tau_y\,\alpha_x}}.$$

We define a set containing all network parameters, $G = \{\alpha_u, \tau_u, \beta_u, \Delta\nu_u\}$. If $g$ is a subset of these parameters, we can manipulate it through a function $f(g)$ while collecting the remaining parameters into a constant $c_g$. In this way, we can rewrite Eq. (10) as

$$\rho_* = \begin{cases} f(g)\,c_g & \text{if } |r_x| < |r_y|, \\ f(g)^{-1}\,c_g^{-1} & \text{if } |r_x| > |r_y|. \end{cases}$$

We can then investigate the effect of network parameters on $\rho_*$. For example, the effect of the noise gains ($\beta_x$ and $\beta_y$) on $\rho_*$, when keeping all other parameters constant except for the inputs, is expressed as

$$\rho_* = f(\beta_y, \beta_x)\,c_{\beta_y,\beta_x} = \frac{\beta_y\,\Delta\nu_x}{\beta_x\,\Delta\nu_y}\sqrt{\frac{\tau_x\,\alpha_y}{\tau_y\,\alpha_x}}$$

for $|r_x| < |r_y|$.

For illustration purposes, we explored the scenario described in Fig. 2(A), where the two populations have equivalent tuning properties. Keeping all parameters constant while altering both $\beta_x$ and $\beta_y$ simultaneously has no effect on $\rho_*$ (Fig. 3(A)). The main impact is an increase in the amount of classification error (Fig. 3(B)). This result is not surprising: increasing the gain of the noise worsens readout performance.

However, markedly different results emerge in a scenario where the tuning curves are offset (Fig. 2(B)) and $\beta_x$ is altered while keeping $\beta_y$ unchanged. In this case, $\rho_* = f(\beta_x)\,c_{\beta_x}$ with $c_{\beta_x}$ given by

$$c_{\beta_x} = \frac{\beta_y\,\Delta\nu_x}{\Delta\nu_y}\sqrt{\frac{\tau_x\,\alpha_y}{\tau_y\,\alpha_x}},$$

and $f(\beta_x) = 1/\beta_x$. Alterations in $\beta_x$ impact $\rho_*$ in a non-monotonic fashion (Fig. 3(C)). A small increase from $\beta_x = 1$ to $\beta_x = 2$ shifts $\rho_*$ towards a more negative value. However, further increasing to $\beta_x = 3$ and $\beta_x = 4$ increases $\rho_*$ and alters the relationship between correlation and readout error (Fig. 3(D)).

Figure 3 Influence of noise gain on discrimination error. (A) Scenario where the noise gains of both populations ($\beta_x$ and $\beta_y$) are adjusted simultaneously. In this case, the value of noise correlation leading to maximal error ($\rho_*$) remains constant. Inset: tuning curves for the two populations. (B) Error as a function of noise correlation for four different values of noise gain, with colors corresponding to the colored circles in panel "A". Filled circles indicate $\rho_*$. (C) Impact of modifying the noise gain of population $x$ only. (D) Different values of noise gain for population $x$. (E) Scenario taken from panel D of Fig. 2, showing a monotonic decrease in $\rho_*$ when increasing $\beta_x$. (F) Impact of $\beta_x$ on classification error

Hidden in these results is a counter-intuitive finding: under certain circumstances, increasing $\beta_x$ leads to a decrease in classification error. This can be seen with $\beta_x = 10$ (Fig. 3(D), dashed line), leading to lower error than $\beta_x = 3$ (green line) and $\beta_x = 4$ (red line) for negative correlations. Intuitively, this can happen when increasing $\beta_x$ stretches the distribution of activity for population $x$ along a single dimension away from the classification boundary [13]. Similar findings are borne out of graphical explanations where noise covariance stretches the distribution of firing rates [36]. The benefits of noise gain are even more pronounced in a scenario where one population has higher gain than the other, as in Fig. 2(D). In this case, $\beta_x$ monotonically shifts $\rho_*$ towards decreasing values (Fig. 3(E)). For a broad range of positive correlation values, a high noise gain ($\beta_x > 1$) leads to lower classification error (Fig. 3(F)).
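The non-monotonic dependence of $\rho_*$ on $\beta_x$ can be reproduced by evaluating Eq. (9) directly, which covers both branches of Eq. (10) automatically. In the sketch below, the signed input differences encode offset tuning; these values are our own illustration and need not match the exact parameters behind Fig. 3(C).

```python
import numpy as np

def rho_star(dnu_x, dnu_y, beta_x, beta_y=1.0,
             tau_x=1.0, tau_y=1.0, alpha_x=1.0, alpha_y=1.0):
    """rho* from Eq. (9); the applicable branch of Eq. (10) follows automatically."""
    r_x = np.sqrt(2.0) * dnu_x / beta_x * np.sqrt(tau_x / alpha_x)
    r_y = np.sqrt(2.0) * dnu_y / beta_y * np.sqrt(tau_y / alpha_y)
    return min(r_x**2, r_y**2) / (r_x * r_y)

# Offset tuning with unequal input differences: rho* first falls, then rises
for b in (1.0, 2.0, 3.0, 4.0, 10.0):
    print(b, round(rho_star(-6.0, 3.0, b), 3))   # -0.5, -1.0, -0.667, -0.5, -0.2
```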
7 Impact of dynamical parameters

The approach described in the previous section can be applied to study the impact of the model's dynamical parameters on readout error. The two parameters of interest are the leak term ($\alpha$) and the time constant ($\tau$). The effect of the time constants $\tau_x$ and $\tau_y$ on $\rho_*$ can be expressed as

$$\rho_* = \frac{\beta_y\,\Delta\nu_x}{\beta_x\,\Delta\nu_y}\sqrt{\frac{\tau_x\,\alpha_y}{\tau_y\,\alpha_x}}.$$

To study the effect of a single term (e.g., $\tau_x$), we set $\rho_* = f(\tau_x)\,c_{\tau_x}$ with $c_{\tau_x}$ given by

$$c_{\tau_x} = \frac{\beta_y\,\Delta\nu_x}{\beta_x\,\Delta\nu_y}\sqrt{\frac{\alpha_y}{\tau_y\,\alpha_x}},$$

and $f(\tau_x) = \sqrt{\tau_x}$. Similarly, the role of the leak terms $\alpha_x$ and $\alpha_y$ on $\rho_*$ is

$$\rho_* = \frac{\beta_y\,\Delta\nu_x}{\beta_x\,\Delta\nu_y}\sqrt{\frac{\alpha_y\,\tau_x}{\alpha_x\,\tau_y}}.$$

For a single term ($\alpha_x$), we have $\rho_* = f(\alpha_x)\,c_{\alpha_x}$ with

$$c_{\alpha_x} = \frac{\beta_y\,\Delta\nu_x}{\beta_x\,\Delta\nu_y}\sqrt{\frac{\tau_x\,\alpha_y}{\tau_y}},$$

and $f(\alpha_x) = 1/\sqrt{\alpha_x}$.

Figure 4 Mediating role of dynamical parameters. (A) In a scenario where tuning curves are offset by a fixed orientation, increasing the time constant $\tau_x$ leads to an increase in the correlation associated with maximal error ($\rho_*$). (B) Different values of $\tau_x$ (colors corresponding to panel "A") alter the relation between noise correlation and readout error. Inset: examples of firing rate distributions across two stimuli (shown in blue and red). (C) Increasing the leak term $\alpha_x$ leads to a decrease in $\rho_*$. (D) Readout error across different values of $\alpha_x$ (see panel "C" for colors). Inset: examples of firing rate distributions for $|r_x| < |r_y|$

Taking one scenario as an illustration, we examined the case where tuning curves are offset by a fixed orientation ($r_x \to -r_y$). In this case, the time constant affects the relation between noise correlation and readout error, with larger values of $\tau_x$ shifting $\rho_*$ towards smaller negative values of correlation (Fig. 4(A)). The reason for this shift follows from an earlier example (Fig. 2(D)), where an increased correlation resulted in greater overlap between the firing rate distributions, but only up to a point beyond which these distributions became too narrow to overlap. With larger values of $\tau_x$, a given correlation does not create as much overlap as it would for smaller values of $\tau_x$, thus leading to a shift in $\rho_*$.

The overall impact of a larger time constant is a decrease in classification error (Fig. 4(B)): as $\tau_x$ increases, there is less overlap between the distributions of firing rate across stimuli (Fig. 4(B), inset). By contrast, shifting the leak term $\alpha_x$ towards higher values decreases $\rho_*$ (Fig. 4(C)) and increases overall readout error (Fig. 4(D)). The impact of increasing $\alpha_x$ on error is due to an increase in the overlap between firing rate distributions (Fig. 4(D), inset). The inverse effects of $\tau_x$ and $\alpha_x$ on these distributions explain their opposite impact on $\rho_*$.

More complex, non-monotonic relations between $\rho_*$ and the values of $\tau_x$ and $\alpha_x$ are found in different scenarios, where the tuning curves of the two populations are aligned (Fig. 5(A)) or where the gain of one population is larger (Fig. 5(B)).

Figure 5 Nonlinear impact of dynamical parameters on classification error. (A) In a scenario where the tuning curves of the two neural populations are equivalent, $\tau_x$ and $\rho_*$ have a non-monotonic relation. (B) In a scenario where the gain of one tuning curve is larger, $\alpha_x$ and $\rho_*$ are non-monotonically related
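These parameter sweeps are simple to reproduce from Eq. (9). The sketch below, with parameter values of our own choosing placed in the $|r_x| > |r_y|$ branch, illustrates the opposite effects of $\tau_x$ and $\alpha_x$ on $\rho_*$; the exact trajectories depend on which branch of Eq. (10) the parameters occupy, so they need not match the curves in Fig. 4.

```python
import numpy as np

def rho_star(dnu_x, dnu_y, beta_x=0.5, beta_y=1.0,
             tau_x=1.0, tau_y=1.0, alpha_x=1.0, alpha_y=1.0):
    """rho* from Eq. (9); valid in both branches of Eq. (10)."""
    r_x = np.sqrt(2.0) * dnu_x / beta_x * np.sqrt(tau_x / alpha_x)
    r_y = np.sqrt(2.0) * dnu_y / beta_y * np.sqrt(tau_y / alpha_y)
    return min(r_x**2, r_y**2) / (r_x * r_y)

# Offset tuning (signed input differences): rho* rises with tau_x ...
print([round(rho_star(3.0, -3.0, tau_x=t), 3) for t in (0.5, 1.0, 2.0, 4.0)])
# ... and falls as alpha_x grows, mirroring the opposite roles of tau and alpha
print([round(rho_star(3.0, -3.0, alpha_x=a), 3) for a in (0.5, 1.0, 2.0, 4.0)])
```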
Together, these results show that the integration time constant and the leak term of the population model mediate the impact of noise correlation on classification error by shifting the value $\rho_*$ at which correlation produces maximal error. The impact of network parameters on readout error is therefore not straightforward to describe, but is brought to light by a framework that derives error estimates from the dynamical parameters of a population model.

8 Discussion

This work described an analytical framework for performing Fisher linear decoding in a rate-based neural model. With this formalism, we began by capturing well-documented findings on the role of noise gain and correlations in discrimination error. Going further, the framework allowed us to analytically examine the mediating role of dynamical parameters (neuronal leak and time constant) in the relation between noise correlation and error. Overall, this framework suggests that linear decoding is highly sensitive to dynamical model parameters as well as to the characteristics of the sensory input.

One surprising finding was the presence of conditions where increased neuronal noise led to reduced classification error. This result was especially prominent when the gains of the two population tuning curves were unmatched (Fig. 3(E)–(F)). Taken together, our findings cover all possible qualitative scenarios where noise correlations have either a beneficial, detrimental, or null effect on decoding [36].

A related approach, termed the leaky competing accumulator model, was proposed to account for perceptual decision making [37]. Some key differences exist between this model and ours. First, our framework assumes a steady state of neural activity that is characteristic of a decision point and does not capture the time course of deliberation. Our framework assumes an optimal bound on decision accuracy given a linear decoder, representing a ceiling in accuracy that would be associated with long response times (typically >500 ms in human subjects). Second, the accumulator model provides explicit connections, through lateral inhibition, that modulate correlations. These lateral connections, however, may also impact firing rates. By comparison, our framework analytically isolates the contributions of firing rates and correlations, and examines their relative roles in perceptual discrimination.

It remains an open question whether the analytical results provided here would generalize to other classes of neural network models, particularly those that include a nonlinear transfer function [38]. However, our work opens the door to such analyses by describing a framework that links neuronal readout and dynamical modeling.

Limitations and future work. While the framework described here strives to cover all possible scenarios involving firing rates, noise correlations, and network parameters, it is important to emphasize that not all such scenarios are plausible from a physiological standpoint. In particular, the framework treats firing rates and noise correlations as independent contributors to decoding error and allows for implausible cases where increases in firing rate would lead to an increase, a decrease, or no impact on correlations.
Interactions between stimulus and noise correlations are a crucial factor limiting the coding capacity of neural circuits [1, 23] and should be considered alongside the dynamical parameters discussed in this work.

Several future directions based on the proposed framework will be worth exploring. First, the assumption of equal class covariances in LDA is challenged by experimental work showing input-dependent neuronal variance [39]. This assumption could be relaxed by replacing LDA with quadratic discriminant analysis, albeit at the cost of a more complex solution when relating readout error to model parameters. An extension of the current framework could consider the impact of pooling more than two neural populations, as well as more than two stimuli, when performing decoding. This extension would be helpful in examining the interactions between several populations of neurons, each with a unique tuning curve. Going further, one could examine decoding error at the limit of a large number of neurons with heterogeneous tuning curves that vary in both orientation preference and gain [2].

Conclusion. In summary, this work described a theoretical framework that merges Fisher linear decoding with a population model of sensory integration. This approach highlighted the roles of correlation, neuronal noise, and network parameters, revealing a broad range of potential outcomes where different conditions generated either detrimental, beneficial, or null impacts on classification performance. These results motivate further developments in theoretical work that systematically link neural network models to optimal decoders in order to reveal the impact of key neurophysiological variables on sensory information processing.

Appendix

A.1 Solving the integrator model as a linear differential equation

To solve the integrator model, we began by dropping the unit indices to alleviate the notation:

$$\tau\frac{dx}{dt} = -\alpha x + \nu + \beta\xi(t) \;\Leftrightarrow\; \frac{dx}{dt} = -\frac{\alpha}{\tau}x + \frac{\nu + \beta\xi(t)}{\tau} \;\Leftrightarrow\; \frac{dx}{dt} + p(t)x = r(t),$$

with $p(t) = \alpha/\tau$ and $r(t) = (\nu + \beta\xi(t))/\tau$. We defined the integrating factor

$$u(t) = e^{\int p(t)\,dt} = e^{\frac{\alpha}{\tau}t}.$$

Then

$$u(t)\frac{dx}{dt} + u(t)p(t)x = u(t)r(t) \;\Leftrightarrow\; \frac{d}{dt}\big[u(t)x\big] = u(t)r(t).$$

Integrating both sides from $0$ to $t$,

$$u(t)x - u(0)x_0 = \int_0^t u(s)r(s)\,ds \;\Leftrightarrow\; x = u(t)^{-1}\left[x_0 + \int_0^t u(s)r(s)\,ds\right].$$

Substituting $u$ and $r$,

$$x = e^{-\frac{\alpha}{\tau}t}x_0 + \frac{\nu}{\tau}\int_0^t e^{\frac{\alpha}{\tau}(s-t)}\,ds + \frac{\beta}{\tau}\int_0^t e^{\frac{\alpha}{\tau}(s-t)}\xi(s)\,ds = e^{-\frac{\alpha}{\tau}t}x_0 + \frac{\nu}{\alpha}\left(1 - e^{-\frac{\alpha}{\tau}t}\right) + \frac{\beta}{\tau}\int_0^t e^{-\frac{\alpha}{\tau}(t-s)}\xi(s)\,ds,$$

so that

$$x = \frac{\nu_i}{\alpha} + \left(x_0 - \frac{\nu_i}{\alpha}\right)e^{-\frac{\alpha}{\tau}t} + \frac{\beta}{\tau}\int_0^t e^{-\frac{\alpha}{\tau}(t-s)}\xi(s)\,ds.$$
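The deterministic part of this solution ($\beta = 0$) can be verified symbolically. The sketch below, our own construction, uses sympy to solve the leaky-integrator ODE and compare it against the closed form above.

```python
import sympy as sp

# Deterministic part of A.1 (beta = 0): tau * x'(t) = -alpha * x(t) + nu, x(0) = x0
t = sp.symbols('t', nonnegative=True)
tau, alpha, nu, x0 = sp.symbols('tau alpha nu x0', positive=True)
x = sp.Function('x')
sol = sp.dsolve(sp.Eq(tau * x(t).diff(t), -alpha * x(t) + nu),
                x(t), ics={x(0): x0})
closed_form = nu / alpha + (x0 - nu / alpha) * sp.exp(-alpha * t / tau)
print(sp.simplify(sol.rhs - closed_form))   # -> 0
```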
A.2 Expected value and variance

We sought the expected mean and variance of the random variable $x$ such that

$$x = \mu + (x_0 - \mu)e^{-\theta t} + \sigma\int_0^t e^{-\theta(t-s)}\,dB_s.$$

The expected mean is

$$E[x] = E\left[\mu + (x_0 - \mu)e^{-\theta t} + \sigma\int_0^t e^{-\theta(t-s)}\,dB_s\right] = \mu + (x_0 - \mu)e^{-\theta t} + E\left[\sigma\int_0^t e^{-\theta(t-s)}\,dB_s\right].$$

Given the zero-mean property of Itô integrals,

$$E\left[\sigma\int_0^t e^{-\theta(t-s)}\,dB_s\right] = 0,$$

we have

$$E[x] = \mu + (x_0 - \mu)e^{-\theta t}. \tag{11}$$

The expected variance is

$$\mathrm{var}(x) = \mathrm{var}\left(\mu + (x_0 - \mu)e^{-\theta t} + \sigma\int_0^t e^{-\theta(t-s)}\,dB_s\right) = \sigma^2\,E\left[\left(\int_0^t e^{-\theta(t-s)}\,dB_s\right)^2\right].$$

By Itô isometry,

$$\sigma^2\,E\left[\left(\int_0^t e^{-\theta(t-s)}\,dB_s\right)^2\right] = \sigma^2\int_0^t e^{-2\theta(t-s)}\,ds.$$

Hence, the expected variance can be concisely written as

$$\mathrm{var}(x) = \sigma^2\int_0^t e^{-2\theta(t-s)}\,ds = \sigma^2\left[\frac{e^{-2\theta(t-s)}}{2\theta}\right]_0^t = \frac{\sigma^2}{2\theta}\left(e^{-2\theta(t-t)} - e^{-2\theta(t-0)}\right) = \frac{\sigma^2}{2\theta}\left(1 - e^{-2\theta t}\right).$$

A.3 Classification error

The classification error as a function of neural activity is given by

$$\varepsilon = \frac{1}{2}\int_{-\infty}^{0}\frac{1}{\sqrt{2\zeta^2\pi}}\,e^{-\frac{(w-\eta_1)^2}{2\zeta^2}}\,dw + \frac{1}{2}\int_{0}^{\infty}\frac{1}{\sqrt{2\zeta^2\pi}}\,e^{-\frac{(w-\eta_0)^2}{2\zeta^2}}\,dw$$
$$= \frac{1}{2}\frac{1}{\sqrt{2\zeta^2\pi}}\left[\sqrt{\frac{\pi\zeta^2}{2}}\,\mathrm{erfc}\!\left(\frac{\eta_1}{\sqrt{2\zeta^2}}\right) + \sqrt{\frac{\pi\zeta^2}{2}}\left(\mathrm{erf}\!\left(\frac{\eta_0}{\sqrt{2\zeta^2}}\right) + 1\right)\right]$$
$$= \frac{1}{4}\left[\mathrm{erfc}\!\left(\frac{\eta_1}{\sqrt{2\zeta^2}}\right) + \mathrm{erf}\!\left(\frac{\eta_0}{\sqrt{2\zeta^2}}\right) + 1\right] = \frac{1}{4}\left[2 - \mathrm{erf}\!\left(\frac{\eta_1}{\sqrt{2\zeta^2}}\right) + \mathrm{erf}\!\left(\frac{-\eta_1}{\sqrt{2\zeta^2}}\right)\right]$$
$$= \frac{1}{2}\left[1 - \mathrm{erf}\!\left(\frac{\eta_1}{\sqrt{2\zeta^2}}\right)\right] = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{\eta_1}{\sqrt{2\zeta^2}}\right),$$

where the fourth step uses $\eta_0 = -\eta_1$ (Section A.4). Substituting the mean and variance from Section A.4 ($\eta_1 = d^2/4$ and $\zeta^2 = d^2/4$), this becomes

$$\varepsilon = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{d^2/4}{\sqrt{2\,d^2/4}}\right) = \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{1}{2}\sqrt{\frac{1}{2}}\,d\right).$$

A.4 Mahalanobis distance

We began with the following definitions:

$$W = (2\Sigma)^{-1}\Delta\mu, \qquad c = \tfrac{1}{2}\,W\cdot(\mu_1 + \mu_0) + b, \qquad \eta_i = W\cdot\mu_i + b - c, \qquad \zeta^2 = W^T\Sigma W.$$

Expanding $\eta_i$ yields

$$\eta_i = W\cdot\mu_i + b - \left[\tfrac{1}{2}\,W\cdot(\mu_1 + \mu_0) + b\right] = W\cdot\left[\mu_i - \tfrac{1}{2}(\mu_1 + \mu_0)\right],$$

so that $\eta_1 = \tfrac{1}{2}\,W\cdot\Delta\mu = -\eta_0$. Using the property $u\cdot v = u^T v$, we expanded $W$:

$$\eta_1 = \tfrac{1}{2}\left[(2\Sigma)^{-1}\Delta\mu\right]^T\Delta\mu = \tfrac{1}{4}\,\Delta\mu^T\Sigma^{-1}\Delta\mu.$$

Hence, with the squared Mahalanobis distance between the means given by $d^2 = \Delta\mu^T\Sigma^{-1}\Delta\mu$, we can rewrite $\eta_1 = \tfrac{1}{4}d^2 = -\eta_0$. Similarly, for the variance $\zeta^2$,

$$\zeta^2 = \left[(2\Sigma)^{-1}\Delta\mu\right]^T\Sigma\,(2\Sigma)^{-1}\Delta\mu = \tfrac{1}{4}\,\Delta\mu^T\Sigma^{-1}\Delta\mu = \tfrac{1}{4}d^2.$$

A.5 Derivation of error

We analyzed the extrema of the error function in relation to noise correlation by taking its first derivative through the chain rule,

$$\frac{d\varepsilon}{d\rho} = \frac{d}{d\rho}\left[\frac{1}{2}\,\mathrm{erfc}\!\left(\frac{1}{2}\sqrt{\frac{1}{2}}\,d\right)\right] = \frac{1}{2}\frac{d}{d\rho}\,\mathrm{erfc}(z) = \frac{1}{2}\,\frac{d\,\mathrm{erfc}(z)}{dz}\,\frac{dz}{dd^2}\,\frac{dd^2}{d\rho}, \tag{12}$$

with $z = \tfrac{1}{2}\sqrt{d^2/2}$ and

$$d^2 = \frac{1}{1-\rho^2}\left(r_x^2 + r_y^2 - 2\rho r_x r_y\right). \tag{13}$$

The first factor is

$$\frac{d\,\mathrm{erfc}(z)}{dz} = \frac{-2e^{-z^2}}{\sqrt{\pi}} = \frac{-2e^{-\left(\frac{1}{2}\sqrt{d^2/2}\right)^2}}{\sqrt{\pi}} = \frac{-2e^{-\frac{1}{8}d^2}}{\sqrt{\pi}}. \tag{14}$$

The second factor is

$$\frac{dz}{dd^2} = \frac{d}{dd^2}\left(\frac{1}{2}\sqrt{\frac{d^2}{2}}\right) = \frac{1}{4\sqrt{2}\,d}. \tag{15}$$

The third factor is

$$\frac{dd^2}{d\rho} = \frac{2\rho}{(1-\rho^2)^2}\left(r_x^2 + r_y^2 - 2\rho r_x r_y\right) + \frac{1}{1-\rho^2}(-2r_x r_y) = \frac{-2}{(1-\rho^2)^2}\left[\rho^2 r_x r_y - \rho\left(r_x^2 + r_y^2\right) + r_x r_y\right]. \tag{16}$$
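As a sanity check on Eqs. (12)–(16), the sketch below, with illustrative $r$ values of our own choosing, compares the chain-rule expression for $d\varepsilon/d\rho$ against a centered finite difference of the error curve.

```python
import numpy as np
from scipy.special import erfc

r_x, r_y, rho = 2.0, 3.5, 0.3   # illustrative ratios and correlation

def error(p):
    d2 = (r_x**2 + r_y**2 - 2 * p * r_x * r_y) / (1 - p**2)
    return 0.5 * erfc(0.5 * np.sqrt(d2 / 2))

# Analytical derivative: Eq. (12) assembled from Eqs. (14)-(16)
d2 = (r_x**2 + r_y**2 - 2 * rho * r_x * r_y) / (1 - rho**2)
derfc_dz = -2 * np.exp(-d2 / 8) / np.sqrt(np.pi)                 # Eq. (14)
dz_dd2 = 1 / (4 * np.sqrt(2 * d2))                               # Eq. (15)
dd2_drho = (-2 / (1 - rho**2)**2) * (rho**2 * r_x * r_y
            - rho * (r_x**2 + r_y**2) + r_x * r_y)               # Eq. (16)
analytic = 0.5 * derfc_dz * dz_dd2 * dd2_drho

h = 1e-6
numeric = (error(rho + h) - error(rho - h)) / (2 * h)
print(analytic, numeric)   # the two values should agree closely
```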
A.6 Extrema of error

We evaluated the extrema of error by finding the points where Eqs. (14)–(16) are equal to zero. For Eq. (14),

$$0 = \frac{d\,\mathrm{erfc}(z)}{dz} \;\Leftrightarrow\; 0 = \frac{-2e^{-\frac{1}{8}d^2}}{\sqrt{\pi}} \;\Rightarrow\; d^2 \to \infty. \tag{17}$$

We assumed that the ratios $r_x$ and $r_y$ are finite and that the Euclidean distance between the distribution means is finite and non-null. In other words, if $d^2 \to \infty$ it is exclusively due to the correlation coefficient. Then,

$$d^2 \to \infty \;\Leftrightarrow\; |\rho| \to 1. \tag{18}$$

We proceeded in a similar fashion for the second factor (Eq. (15)):

$$0 = \frac{dz}{dd^2} \;\Leftrightarrow\; 0 = \frac{1}{4\sqrt{2}\,d} \;\Rightarrow\; d^2 \to \infty \;\Leftrightarrow\; |\rho| \to 1. \tag{19}$$

For the third factor (Eq. (16)),

$$0 = \frac{dd^2}{d\rho} \;\Leftrightarrow\; 0 = \rho^2 r_x r_y - \rho\left(r_x^2 + r_y^2\right) + r_x r_y. \tag{20}$$

Depending on the network parameters, two cases are possible. One case arises if one of the ratios, either $r_x$ or $r_y$, is zero. This happens if the mean activity of one population is equal across inputs. (If the mean activity of both units remained unchanged, the resulting multivariate distributions would fully overlap, breaking the basic assumptions justifying the choice of LDA.) In this first case,

$$0 = \frac{dd^2}{d\rho} \;\Leftarrow\; 0 = \rho \quad \text{if } r_x = 0 \text{ or } r_y = 0. \tag{21}$$

The second case occurs when neither $r_x$ nor $r_y$ is zero:

$$0 = \frac{dd^2}{d\rho} \;\Leftarrow\; 0 = \rho^2 - \rho\,\frac{r_x^2 + r_y^2}{r_x r_y} + 1$$
$$\Leftrightarrow\; \rho = \frac{\frac{r_x^2 + r_y^2}{r_x r_y} \pm \sqrt{\frac{(r_x^2 + r_y^2)^2}{r_x^2 r_y^2} - 4}}{2} = \frac{r_x^2 + r_y^2 \pm \sqrt{\left(r_x^2 - r_y^2\right)^2}}{2r_x r_y} = \frac{r_x^2 + r_y^2 \pm \left|r_x^2 - r_y^2\right|}{2r_x r_y} = \frac{r_x^2 + r_y^2 \pm \left[\max(r_x^2, r_y^2) - \min(r_x^2, r_y^2)\right]}{2r_x r_y}.$$

The last expression can be decomposed into four distinct cases. First, when $r_x \to r_y$,

$$\rho \to \frac{r_y^2 + r_y^2}{2r_y^2} \to 1. \tag{22}$$

Second, when $r_x \to -r_y$,

$$\rho \to \frac{r_y^2 + r_y^2}{-2r_y^2} \to -1. \tag{23}$$

Third, when $|r_x| \neq |r_y|$, we examined the positive and negative roots of $\rho$. The positive root is

$$\rho_+ = \frac{r_x^2 + r_y^2 + \max(r_x^2, r_y^2) - \min(r_x^2, r_y^2)}{2r_x r_y} = \frac{\max(r_x^2, r_y^2)}{r_x r_y}. \tag{24}$$

Because $\max(r_x^2, r_y^2) > |r_x r_y|$ when the two ratios are unequal and non-null, $|\rho_+| > 1$ for all $r_x$, $r_y$. Since the correlation is bounded in the open interval $({-1}, 1)$, the positive root must be rejected. The negative root does not suffer from the same problem:

$$\rho_- = \frac{r_x^2 + r_y^2 - \max(r_x^2, r_y^2) + \min(r_x^2, r_y^2)}{2r_x r_y} = \frac{\min(r_x^2, r_y^2)}{r_x r_y}. \tag{25}$$

Fourth, when either $r_x = 0$ or $r_y = 0$, $\rho = 0$.

A.7 Minima and maxima

We determined the upward and downward trends of the error curve by calculating the sign of the derivative between the potential maxima (considering that they are mutually exclusive). Taking Eqs. (14)–(16) and substituting into Eq. (12),

$$\frac{d\varepsilon}{d\rho} = \frac{1}{2}\left(\frac{-2e^{-\frac{1}{8}d^2}}{\sqrt{\pi}}\right)\left(\frac{1}{4\sqrt{2}\,d}\right)\left(\frac{-2}{(1-\rho^2)^2}\right)\left[\rho^2 r_x r_y - \rho\left(r_x^2 + r_y^2\right) + r_x r_y\right]$$
$$\Leftrightarrow\; \mathrm{sign}\!\left(\frac{d\varepsilon}{d\rho}\right) = \mathrm{sign}\!\left(\rho^2 r_x r_y - \rho\left(r_x^2 + r_y^2\right) + r_x r_y\right). \tag{26}$$

For the condition where either $r_x = 0$ or $r_y = 0$ (say $r_x = 0$),

$$\mathrm{sign}\!\left(\frac{d\varepsilon}{d\rho}\right) = \mathrm{sign}\!\left(-\rho r_y^2\right) = -\mathrm{sign}(\rho). \tag{27}$$

For the condition where neither ratio is zero, we have already found the zeros of $\rho^2 r_x r_y - \rho(r_x^2 + r_y^2) + r_x r_y$ to be $\rho_-$ and $\rho_+$. To determine $\mathrm{sign}(d\varepsilon/d\rho)$, we need to know whether the extremum of the parabola is a minimum or a maximum:

$$\frac{d^2}{d\rho^2}\left[\rho^2 r_x r_y - \rho\left(r_x^2 + r_y^2\right) + r_x r_y\right] = 2r_x r_y > 0 \quad \text{for } r_x r_y > 0.$$

Given $\rho_+ > 1$,

$$\frac{d\varepsilon}{d\rho} > 0 \;\Leftrightarrow\; \rho \in [-1, \rho_-], \tag{28}$$
$$\frac{d\varepsilon}{d\rho} < 0 \;\Leftrightarrow\; \rho \in [\rho_-, 1]. \tag{29}$$

Regardless of the conditions for $r_x$ and $r_y$, following Eqs. (27)–(29), the error curve as a function of correlation increases from $\rho = -1$ until its maximum, found at a value of $\rho = 0$ or $\rho = \rho_-$, and then decreases until $\rho = 1$.

Acknowledgements
This work benefited from discussions with Brent Doiron and Richard Naud.

Funding
This work was supported by a Discovery grant to J.P.T. from the Natural Sciences and Engineering Research Council of Canada (NSERC Grant No. 210977).

Abbreviations
LDA, Linear discriminant analysis.

Availability of data and materials
Data sharing not applicable to this article as no datasets were generated or analyzed during the current study.

Ethics approval and consent to participate
Not applicable.

Competing interests
The authors declare that they have no competing interests.

Consent for publication
Not applicable.

Authors' contributions
JPT and MC conceptualized the study, designed the work, performed the analyses, interpreted the data, wrote software, drafted the work, and revised the final version of the manuscript. All authors read and approved the final manuscript.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 14 April 2020. Accepted: 3 February 2021.

References
1. Moreno-Bote R, Beck J, Kanitscheider I, Pitkow X, Latham P, Pouget A. Information-limiting correlations. Nat Neurosci. 2014;17:1410–7.
2. Ecker AS, Berens P, Tolias AS, Bethge M. The effect of noise correlations in populations of diversely tuned neurons. J Neurosci. 2011;31:14272–83.
3. Shamir M, Sompolinsky H. Implications of neuronal diversity on population coding. Neural Comput. 2006;18:1951–86.
4. Quian Quiroga R, Panzeri S. Extracting information from neuronal populations: information theory and decoding approaches. Nat Rev Neurosci. 2009;10:173–85.
5. Wen H, Shi J, Zhang Y, Lu KH, Cao J, Liu Z. Neural encoding and decoding with deep learning for dynamic natural vision. Cereb Cortex. 2018;28:4136–60.
6. Glaser JI, Benjamin AS, Chowdhury RH, Perich MG, Miller LE, Kording KP. Machine learning for neural decoding. eNeuro. 2020;7:1–16.
7. Kriegeskorte N, Douglas PK. Interpreting encoding and decoding models. Curr Opin Neurobiol. 2019;55:167–79.
8. Klampfl S, David SV, Yin P, Shamma SA, Maass W. A quantitative analysis of information about past and present stimuli encoded by spikes of A1 neurons. J Neurophysiol. 2012;108:1366–80.
9. Meyers EM, Freedman DJ, Kreiman G, Miller EK, Poggio T. Dynamic population coding of category information in inferior temporal and prefrontal cortex. J Neurophysiol. 2008;100:1407–19.
10. Nienborg H, Cumming B. Correlations between the activity of sensory neurons and behavior: how much do they tell us about a neuron's causality? Curr Opin Neurobiol. 2010;20:376–81.
11. Shadlen MN, Britten KH, Newsome WT, Movshon JA. A computational analysis of the relationship between neuronal and behavioral responses to visual motion. J Neurosci. 1996;16:1486–510.
12. Buonomano DV, Maass W. State-dependent computations: spatiotemporal processing in cortical networks. Nat Rev Neurosci. 2009;10:113–25.
13. Calderini M, Zhang S, Berberian N, Thivierge JP. Optimal readout of correlated neural activity in a decision-making circuit. Neural Comput. 2018;30:1573–611.
14. Berberian N, MacPherson A, Giraud E, Richardson L, Thivierge JP. Neuronal pattern separation of motion-relevant input in LIP activity. J Neurophysiol. 2017;117:738–55.
15. Rich EL, Wallis JD. Decoding subjective decisions from orbitofrontal cortex. Nat Neurosci. 2016;19:973–80.
16. Averbeck BB, Lee D. Effects of noise correlations on information encoding and decoding. J Neurophysiol. 2006;95:3633–44.
17. Cain N, Barreiro AK, Shadlen M, Shea-Brown E. Neural integrators for decision making: a favorable tradeoff between robustness and sensitivity. J Neurophysiol. 2013;109:2542–59.
18. Goldman MS. Memory without feedback in a neural network. Neuron. 2009;61:621–34.
19. Ganguli S, Bisley JW, Roitman JD, Shadlen MN, Goldberg ME, Miller KD. One-dimensional dynamics of attention and decision making in LIP. Neuron. 2008;58:15–25.
20. Miri A, Daie K, Arrenberg AB, Baier H, Aksay E, Tank DW. Spatial gradients and multidimensional dynamics in a neural integrator circuit. Nat Neurosci. 2011;14:1150–9.
21. Murphy BK, Miller KD. Balanced amplification: a new mechanism of selective amplification of neural activity patterns. Neuron. 2009;61:635–48.
22. Chance FS, Abbott LF, Reyes AD. Gain modulation from background synaptic input. Neuron. 2002;35:773–82.
23. Rumyantsev OI, Lecoq JA, Hernandez O, Zhang Y, Savall J, Chrapkiewicz R, Li J, Zeng H, Ganguli S, Schnitzer MJ. Fundamental bounds on the fidelity of sensory cortical coding. Nature. 2020;580:100–5.
24. Burak Y, Fiete IR. Fundamental limits on persistent activity in networks of noisy neurons. Proc Natl Acad Sci. 2012;109:17645–50.
25. Cover TM, Thomas JA. Elements of information theory. 2nd ed. New York: Wiley; 2006.
26. Cohen MR, Kohn A. Measuring and interpreting neuronal correlations. Nat Neurosci. 2011;14:811–9.
27. Zohary E, Shadlen MN, Newsome WT. Correlated neuronal discharge rate and its implications for psychophysical performance. Nature. 1994;370:140–3.
28. Brunel N. Dynamics of sparsely connected networks of excitatory and inhibitory spiking neurons. J Comput Neurosci. 2000;8:183–208.
29. Bujan AF, Aertsen A, Kumar A. Role of input correlations in shaping the variability and noise correlations of evoked activity in the neocortex. J Neurosci. 2015;35:8611–25.
30. de la Rocha J, Doiron B, Shea-Brown E, Josić K, Reyes A. Correlation between neural spike trains increases with firing rate. Nature. 2007;448:802–6.
31. Graupner M, Reyes AD. Synaptic input correlations leading to membrane potential decorrelation of spontaneous activity in cortex. J Neurosci. 2013;33:15075–85.
32. Renart A, de la Rocha J, Bartho P, Hollender L, Parga N, Reyes A, Harris KD. The asynchronous state in cortical circuits. Science. 2010;327:587–90.
33. Salinas E, Sejnowski TJ. Impact of correlated synaptic input on output firing rate and variability in simple neuronal models. J Neurosci. 2000;20:6193–209.
34. Hu Y, Zylberberg J, Shea-Brown E. The sign rule and beyond: boundary effects, flexibility, and noise correlations in neural population codes. PLoS Comput Biol. 2014;10:e1003469.
35. Yim MY, Kumar A, Aertsen A, Rotter S. Impact of correlated inputs to neurons: modeling observations from in vivo intracellular recordings. J Comput Neurosci. 2014;37:293–304.
36. Averbeck BB, Latham PE, Pouget A. Neural correlations, population coding and computation. Nat Rev Neurosci. 2006;7:358–66.
37. Usher M, McClelland JL. The time course of perceptual choice: the leaky, competing accumulator model. Psychol Rev. 2001;108:550–92.
38. Ostojic S, Brunel N. From spiking neuron models to linear-nonlinear models. PLoS Comput Biol. 2011;7:e1001056.
39. Churchland AK, Kiani R, Chaudhuri R, Wang X-J, Pouget A, Shadlen MN. Variance as a signature of neural computations during decision making. Neuron. 2011;69:818–31.
