Causal Network Inference for Neural Ensemble Activity

Rong Chen (rong.chen.mail@gmail.com)
Department of Diagnostic Radiology and Nuclear Medicine, University of Maryland School of Medicine, 22 South Greene Street, Baltimore, MD 21201, USA

Neuroinformatics (2021) 19:515-527. DOI: 10.1007/s12021-020-09505-4

Abstract

Interactions among cellular components forming a mesoscopic-scale brain network (microcircuit) display characteristic neural dynamics. Analysis of microcircuits provides a system-level understanding of the neurobiology of health and disease. Causal discovery aims to detect causal relationships among variables based on observational data. A key barrier in causal discovery is the high dimensionality of the variable space. A method called Causal Inference for Microcircuits (CAIM) is proposed to reconstruct causal networks from calcium imaging or electrophysiology time series. CAIM combines neural recording, Bayesian network modeling, and neuron clustering. Validation experiments based on simulated data and a real-world reaching-task dataset demonstrated that CAIM accurately revealed causal relationships among neural clusters.

Keywords: Causal discovery, Neuroimaging, Dynamic Bayesian network, Clustering

Introduction

Increasing experimental and computational evidence supports the existence of a specific pattern of connectivity among adjacent neurons during cognition and emotion (Yoshimura and Callaway 2005; Yoshimura et al. 2005; Song et al. 2005; Ko et al. 2013; Litwin-Kumar and Doiron 2012). Interactions among cellular components forming a mesoscopic-scale brain network (microcircuit) display characteristic neural dynamics. A microcircuit lies at the heart of the information-processing capability of the brain; it carries out the specific computation of a region. Microcircuits have been shown to encode sensory input (Luczak et al. 2007), motor function (Churchland et al. 2007), spatial maps in the entorhinal cortex (Hafting et al. 2005), and behavior choice (Harvey et al. 2012). Analysis of microcircuits provides a system-level understanding of the neurobiology of health and disease.

Calcium imaging (Kerr and Nimmerjahn 2012; Ghosh et al. 2011; Scott et al. 2013) and electrophysiology with electrodes are powerful ways to study microcircuits, leading to an understanding of the network architecture of behavior, cognition, and emotion (Ko et al. 2013; Barbera et al. 2016). In contrast to the experimental advances in neural recording techniques, computational analysis of ensemble neural activity is still emerging. A fundamental problem in microcircuit analysis is causal discovery, which aims to reveal causal structures by analyzing observational data. Several computational methods have been developed to infer causal networks from ensemble neural activity, including Granger causality (Chen et al. 2006; Hu et al. 2018) and conditional independence inference based on dynamic Bayesian networks (DBNs) (Eldawlatly et al. 2010).

A key barrier in causal discovery from multiple time series is high dimensionality. For example, calcium imaging can observe the ensemble activity of hundreds of neurons. Naively applying causal discovery algorithms to such high-dimensional data causes several problems. First, this naive approach ignores the intrinsic hierarchical structure of the microcircuit. Neurons often form clusters, and neurons in the same cluster have similar functional profiles. For example, D1- and D2-medium spiny neurons (MSNs) in the dorsal striatum are grouped into spatially compact clusters (Barbera et al. 2016). In the visual cortex, highly connected neurons in a cortical column receive similar visual input (Yoshimura et al. 2005).
These studies suggest that neurons in a microcircuit form clusters (or modules, communities). Second, constructing a model from such high-dimensional data with a cluster structure often leads to overfitting (Hastie et al. 2009), an unstable model (Sauerbrei et al. 2011; Chen and Herskovits 2007), and poor parameter estimation (Chen and Herskovits 2007).

The proposed method, called Causal Inference for Microcircuits (CAIM), aims to reconstruct causal mesoscopic-scale networks from observational calcium imaging or electrophysiology time series. CAIM combines neural recording, Bayesian network modeling, and neuron clustering. To address the high-dimensionality problem, CAIM uses clustering to group neurons into clusters. To solve the causal discovery problem, CAIM uses DBNs to identify conditional independence. CAIM enables us to move toward a circuit-based approach to understanding the brain, in which a behavior is understood to result from specific spatiotemporal patterns of circuit activity related to specific neuronal populations.

This paper is organized as follows. "Background and Related Work" describes the background and related work. "Method" presents the CAIM algorithm, including neuron clustering and causal network inference. "Results" presents validation experiments on simulated neural activity data and an application of CAIM to a real-world dataset. "Discussion" discusses the results and the issues requiring further investigation, followed by conclusions.

Background and Related Work

Network analysis (or connectivity analysis) methods for neural signals can be classified into synchrony analysis and causal discovery. In synchrony analysis, an undirected graph is generated. Synchrony has been extensively studied in neuroscience (Averbeck et al. 2006). Correlation, partial correlation, and mutual information have been used to measure the association between a pair of neurons.

The gold standard for establishing a causal relationship is performing planned or randomized experiments (Fisher 1970). Pearl proposed an intervention-based framework for causality analysis (Pearl 2009) and distinguished the observational conditional probability P(Y|X) from the interventional conditional probability P(Y|do(X)), where the do(.) operator is an intervention. Pearl's notion of intervention implies that if we manipulate X and nothing happens, then X is not a cause of Y; otherwise, X is one of the causes of Y. However, in many scenarios, experiments are too expensive, or not feasible or ethical to carry out. Causal discovery (or effective connectivity analysis) aims to infer cause-effect relations among variables based on observational data. Granger proposed a framework to infer causality based on prediction improvement (Granger 1969). An important framework of causal discovery is based on conditional independence (Spirtes et al. 2001). This framework considers the dependence between two variables X and Y given a set of variables Z. Let X ⫫ Y | Z denote that X and Y are conditionally independent given Z; X_t is not a cause of Y_{t+1} if X_t ⫫ Y_{t+1} | Z_t. For a set of variables V = {X_1, ..., X_p}, a causal graphical model is G = (V, E), where an edge X_i → X_j represents that X_i is a direct cause of X_j relative to the variables in V, and G is a directed acyclic graph. The assumptions often used to relate causal structures to probability densities are the causal Markov assumption, the causal faithfulness assumption, and the causal sufficiency assumption (Spirtes et al. 2001). Under these assumptions, a remarkable result due to Geiger and Pearl (1990) and Meek (1995) is the Markov completeness theorem: for linear Gaussian and for multinomial causal relations, an algorithm that identifies the Markov equivalence class is complete (that is, it extracts all information about the underlying causal structure).

There are many studies of causal discovery from multiple time series in problem domains outside neuroscience, such as inferring gene regulatory networks from time-series gene expression data (Bar-Joseph et al. 2012). One inference framework is growth-shrink: such methods first calculate pairwise associations between s_{t+1} and s_t, and then remove redundant or spurious connections (Meyer et al. 2007). An example of a growth-shrink method is MRNET (Meyer et al. 2007), which uses mutual information between variables and minimum-redundancy maximum-relevance to infer networks. Another kind of inference framework treats network inference as a regression problem and uses ensemble learning to construct the network. BTNET (Park et al. 2018) is an ensemble learning-based method that uses boosted trees to construct the predictive model.
Method

CAIM aims to infer causal relationships from observational calcium imaging or electrophysiology time series. In CAIM, microcircuits are modeled as DBNs (Koller and Friedman 2009) representing causal relationships. In a DBN, nodes are variables of interest, and edges (links) represent interactions among variables. If a set of nodes π_i causally affects the activity of node i, then there exists a link from the nodes in π_i to node i; π_i is referred to as the parent set of node i. Each node is associated with a binary variable representing whether the node is activated, and with an updating rule that specifies how its state changes over time due to the activation of its parent set. Network dynamics are determined by these updating rules. DBNs can characterize system dynamics, handle noisy data, describe locally interacting processes, and support causal inference (Chen et al. 2012).

CAIM infers causal networks from neural ensemble activity. Neural activity can be recorded by calcium imaging or electrophysiology with electrodes. Preprocessing algorithms generate binary neuronal events (spike trains or calcium transient events). A preprocessing pipeline (Barbera et al. 2016) can be used to preprocess calcium imaging data, including image registration, cell mask detection, and neuronal event detection; however, preprocessing is not the focus of CAIM. Let P and T denote the number of neurons and the number of time points, respectively. The preprocessing step results in s_{1:T}. For neuron i, s_{i,t} = 1 indicates a neuronal event of neuron i at time point t, while s_{i,t} = 0 indicates no event. s_t = [s_{1,t}, ..., s_{P,t}] is a P-dimensional vector representing the neural events of all neurons at time point t, and s_{1:T} = (s_1, ..., s_T) represents neural activity for all time points.

Figure 1 shows the architecture of CAIM. Figure 1a is the conceptual framework of CAIM. In CAIM, neurons are grouped into clusters; neurons in the same cluster have similar functional profiles. Each cluster is associated with a latent variable (the cluster state variable) which represents whether the cluster is activated. Let Y^A(t) denote the state variable for cluster A at time point t. Y_t = [Y^A(t), ..., Y^Z(t)] is a vector representing the states of all clusters at time point t, and Y_{1:T} = (Y_1, ..., Y_T) represents cluster states for all time points. Interactions among clusters are described by a DBN whose nodes are cluster state variables. The directed temporal interaction between two nodes is represented by a transition probability table (Fig. 1b). For example, Pr(Y^B(t+1) = active | Y^A(t) = active, Y^C(t) = active) = 0.88 represents that activation of cluster A and activation of cluster C at time point t result in the activation of cluster B at time point t+1 with probability 0.88 (a counting sketch of such an estimate is given below).

Fig. 1 The architecture of CAIM.
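Because the cluster states are binary, each entry of a transition probability table can be estimated by simple conditional counting. The following is a minimal sketch; the function name and the synthetic example are hypothetical, not taken from the paper:

```python
import numpy as np

def transition_probability(y_child, parent_states, parent_values):
    """Estimate Pr(child(t+1) = 1 | parents(t) = parent_values) by counting.

    y_child: (T,) binary array, state series of the child cluster.
    parent_states: (T, m) binary array, state series of the m parent clusters.
    parent_values: length-m tuple of 0/1 values to condition on.
    """
    mask = np.all(parent_states[:-1] == np.asarray(parent_values), axis=1)
    if mask.sum() == 0:
        return np.nan  # this parent configuration was never observed
    return y_child[1:][mask].mean()

# Hypothetical example: Pr(Y_B(t+1)=1 | Y_A(t)=1, Y_C(t)=1)
rng = np.random.default_rng(0)
Y_A, Y_C = rng.integers(0, 2, 5000), rng.integers(0, 2, 5000)
Y_B = np.roll(Y_A & Y_C, 1)  # B follows the conjunction of A and C with lag 1
p = transition_probability(Y_B, np.column_stack([Y_A, Y_C]), (1, 1))
print(f"Pr(Y_B(t+1)=1 | Y_A(t)=1, Y_C(t)=1) = {p:.2f}")
```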
Neuron Clustering

The goal of neuron clustering is to group P neurons into K homogeneous clusters. Coherence, i.e., pairwise functional association, plays a key role in neural codes (Averbeck et al. 2006; Zohary et al. 1994). Even weak pairwise linear interactions can result in strongly correlated network states in a neural ensemble (Schneidman et al. 2006). Therefore, our clustering algorithm centers on examining coherence. The objects in this clustering problem are neurons; neurons within each cluster are more similar to each other than to neurons assigned to different clusters. The input to neuron clustering is s_{1:T}. Clustering generates a partition of the variable space. The partition Ω is a vector whose i-th element Ω(i) is the group membership of neuron i.

Neuron clustering is based on the similarity between s_{i,1:T} and s_{j,1:T}, where s_{i,1:T} is the observed trajectory of neuron i. Neuron clustering therefore focuses on the instantaneous (zero-lag) synchrony between neuron pairs. There are many clustering algorithms (Wiwie et al. 2018). s_{i,1:T} is a trajectory with thousands of observation time points, and each time point is a feature; in this clustering problem, P is about several hundred and T is several thousand, so the clustering algorithm must handle high-dimensional data. Since we assume that an object belongs to a single cluster, we do not use fuzzy clustering such as c-means or probabilistic clustering such as Gaussian mixture models.

CAIM uses graph-based clustering. A graph is constructed by using kd-trees to identify the approximate nearest neighbors of each object (Arya et al. 1998); this graph construction algorithm is computationally efficient. Clusters are then detected by the walktrap algorithm (Pons and Latapy 2006) for graph-based community detection. The walktrap algorithm finds densely connected subgraphs based on random walks: it starts by assigning each node to its own community, calculates the distance between every pair of communities, merges the pair with the minimum distance, and repeats. In this way, the walktrap algorithm uses the results of random walks to merge separate communities in a bottom-up manner, creating a dendrogram, and then uses the modularity score to select where to cut the dendrogram. The number of clusters is therefore determined automatically by the algorithm (see the sketch below).
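The clustering step can be sketched with off-the-shelf components: here scipy's exact kd-tree stands in for the approximate nearest-neighbor search of Arya et al. (1998), and python-igraph provides the walktrap community detection of Pons and Latapy (2006). The neighborhood size k = 10 is an assumed parameter; the paper does not specify one.

```python
import numpy as np
from scipy.spatial import cKDTree
import igraph as ig  # pip install python-igraph

def cluster_neurons(S, k=10):
    """Graph-based neuron clustering in the spirit of CAIM.

    S: (P, T) binary event matrix, one row per neuron.
    Returns a length-P list of cluster labels (the partition Omega).
    """
    # Build a k-nearest-neighbor graph with a kd-tree.
    tree = cKDTree(S)
    _, nbrs = tree.query(S, k=k + 1)  # first neighbor is the point itself
    edges = {(int(i), int(j)) if i < j else (int(j), int(i))
             for i, row in enumerate(nbrs) for j in row[1:] if j != i}
    g = ig.Graph(n=S.shape[0], edges=list(edges))
    # Random-walk community detection; the dendrogram is cut at
    # maximum modularity, so the number of clusters is chosen automatically.
    dendrogram = g.community_walktrap()
    return dendrogram.as_clustering().membership
```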
After generating the partition, cluster state variables are inferred by voting. For cluster A, the percentage of neurons in state 1 at time point t is calculated; if this percentage is greater than a threshold, then Y^A(t) = 1, otherwise Y^A(t) = 0. A higher threshold results in sparser cluster activation. If majority voting is adopted, the threshold is 50% (see the sketch below).
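A minimal sketch of the voting step, assuming S is the P x T binary event matrix and membership is the partition Ω from the clustering step:

```python
import numpy as np

def cluster_states(S, membership, threshold=0.5):
    """Infer binary cluster state variables by voting.

    threshold=0.5 corresponds to majority voting; higher thresholds
    yield sparser cluster activation.
    Returns Y: (K, T) binary matrix of cluster states.
    """
    membership = np.asarray(membership)
    labels = np.unique(membership)
    Y = np.zeros((len(labels), S.shape[1]), dtype=int)
    for k, lab in enumerate(labels):
        # Fraction of the cluster's neurons active in each time bin.
        frac_active = S[membership == lab].mean(axis=0)
        Y[k] = (frac_active > threshold).astype(int)
    return Y
```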
Given binary cluster state variables, a loading matrix can be calculated to assess the association between cluster state variables and neurons. The loading matrix has P rows and K columns; its element (i, j) is the relative mutual information (Pregowska et al. 2015) between neuron i and cluster j. The relative mutual information is in [0, 1], and a higher value indicates a stronger association between the two binary random variables (see the sketch below).
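The loading matrix can be sketched as follows. Pregowska et al. (2015) define relative mutual information for binary channels; since the exact normalization is not spelled out here, this sketch assumes division by the smaller marginal entropy, which keeps the value in [0, 1]:

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def loading_matrix(S, Y):
    """P x K matrix of normalized mutual information between neuron
    event series (rows of S) and cluster state series (rows of Y).
    Assumption: normalization by the smaller marginal entropy."""
    def entropy(x):
        p = np.bincount(x, minlength=2) / len(x)
        p = p[p > 0]
        return -(p * np.log(p)).sum()  # natural log, matching sklearn's MI

    L = np.zeros((S.shape[0], Y.shape[0]))
    for i in range(S.shape[0]):
        for j in range(Y.shape[0]):
            denom = min(entropy(S[i]), entropy(Y[j]))
            L[i, j] = mutual_info_score(S[i], Y[j]) / denom if denom > 0 else 0.0
    return L
```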
Causal Network Construction

Causal network construction infers a DBN based on Y_{1:T}, the dataset of cluster states for all time points. A DBN is defined as a pair (B_1, B_→), where B_1 is a Bayesian network defining the baseline probability distribution, and B_→ defines the transition probability P(Y_{t+1} | Y_t); that is, B_→ is a two-slice temporal Bayesian network (2TBN). The state of node i at time point t+1 is determined by the states of its parent set before t+1, and is independent of the states of any other nodes. We use π_i to denote the parent set of node i; π_i is a subset of Y_t. For example, in Fig. 1b, Y^A(t) and Y^C(t) determine Y^B(t+1), so π_B = (Y^A(t), Y^C(t)).

DBN-based causal discovery assumes causal sufficiency, the causal Markov condition, and faithfulness (Spirtes et al. 2001). Under these conditions, the causal relationships can be discovered by machine learning algorithms. Our algorithm generates a directed weighted graph G modeling the linear or nonlinear interactions among cluster state variables. We use a random forest-based method to find the parent set of a node. For a node Y^A(t+1), we construct a random forest model to predict Y^A(t+1) based on the variables in Y_t = [Y^A(t), ..., Y^Z(t)]; the implementation is similar to that in (Huynh-Thu et al. 2010). In the model ensemble, each tree is constructed from a bootstrap sample of the original data, and at each test node a subset of variables is selected at random among all candidate variables in Y_t before determining the best split (the split that divides a tree node into two daughter nodes). To quantify variable importance, for each test node in a tree we compute the reduction of variance of the output variable due to the split. For a single tree, the importance of a variable is the sum of the variance-reduction values over all tree nodes where the variable is used to split; for a tree ensemble, the importance score is the average over all trees. The variable importance of Y^B(t) is used as the weight for the link Y^B(t) → Y^A(t+1); higher weights represent stronger relationships. Random forests can model nonlinear and combinational interactions (interactions involving multiple nodes, rather than pairwise interactions) and handle high-dimensional data. In our implementation, we adopt the random forest parameter tuning process described in (Huynh-Thu et al. 2010). A sketch of this parent ranking step follows.
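A sketch of the random-forest parent ranking in the spirit of GENIE3 (Huynh-Thu et al. 2010), with scikit-learn standing in for the paper's implementation; the paper's tuning procedure is not reproduced, and the hyperparameters shown are assumptions. Self-edges are retained because persistence (a cluster at time t driving itself at t+1) is meaningful for temporal data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rf_edge_weights(Y, n_trees=500, seed=0):
    """Rank candidate parents of each cluster by random-forest
    variable importance.

    Y: (K, T) binary cluster state matrix.
    Returns W where W[b, a] weights the lagged edge Y_b(t) -> Y_a(t+1).
    """
    K, T = Y.shape
    X_t, X_next = Y[:, :-1].T, Y[:, 1:].T  # predictors at t, targets at t+1
    W = np.zeros((K, K))
    for a in range(K):
        rf = RandomForestRegressor(n_estimators=n_trees,
                                   max_features="sqrt", random_state=seed)
        rf.fit(X_t, X_next[:, a])
        # Impurity-based importance = averaged variance reduction per split.
        W[:, a] = rf.feature_importances_
    return W
```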
Results

We evaluated CAIM on simulated spike trains, on data from a biophysics-based simulation, and on real-world neural activity data from a delayed reaching task. All experiments were conducted on a workstation with an Intel Core i7-4720HQ CPU @ 2.6 GHz (4 cores, 8 virtual cores) and 16 GB memory.

Simulated Spike Trains

In this experiment, we used simulated binary spike trains to evaluate CAIM. The interactions among clusters were described by a ground-truth DBN G*. An example of the structure of G* is depicted in Fig. 2a. Parameters in G* were set to represent additive effects; the transition probability table for node 5 is depicted in Fig. 2b. The data generation process included sampling and neural data generation. In the sampling step, we sampled G* and generated simulated data for cluster states. Let Y^i_{1:T} be the trajectory of cluster i. In neural data generation, the trajectory of a neuron in cluster i is generated by flipping the binary state of Y^i_{1:T} with probability λ; λ represents the noise level, and 1-λ characterizes the within-cluster homogeneity (see the sketch below). We evaluated CAIM for varying noise levels (subtask 1), cluster similarity (subtask 2), and numbers of clusters (subtask 3).
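The neural data generation step reduces to copying each cluster trajectory to its member neurons and flipping bins independently with probability λ; a minimal sketch, with hypothetical function and argument names:

```python
import numpy as np

def spikes_from_clusters(Y, n_per_cluster=10, lam=0.1, seed=0):
    """Generate simulated spike trains from cluster state trajectories.

    Each neuron in cluster i copies Y[i] and flips each time bin
    independently with probability lam (the noise level), so 1 - lam
    characterizes within-cluster homogeneity.
    """
    rng = np.random.default_rng(seed)
    S = np.repeat(Y, n_per_cluster, axis=0)  # (K * n_per_cluster, T)
    flips = rng.random(S.shape) < lam
    return np.where(flips, 1 - S, S)
```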
Fig. 2 The simulated spike train data. a The ground-truth DBN model describing temporal interactions among cluster state variables. b The transition probability table for cluster 5. c Spike trains of 60 neurons (noise level 0.1).

To evaluate neuron clustering, we compared CAIM clustering with other clustering methods, including K-means, clustering by density peaks, and the fuzzy c-means (FCM) based method of (Fellous et al. 2004; Toups et al. 2011). K-means defines a cluster as a sphere around the cluster centroid; the number of clusters was estimated by the Calinski-Harabasz index, and K-means was randomly initialized 100 times. Clustering by density peaks is based on the idea that cluster centers have a higher density than their neighbors and lie at a relatively large distance from objects with higher densities; detecting the cluster structure requires manually specifying two parameters. In the FCM-based method, we first calculated a P x P distance matrix whose (i, j) element is the Manhattan distance between neurons i and j, then applied FCM to the columns of the distance matrix; the number of clusters was determined by the gap statistic.

Neuron clustering performance was evaluated with two cluster validity indexes: the Silhouette score and the Rand index (Ye 2003); higher values represent better clustering. The Silhouette score has a range of [-1, 1]: a score near 1 indicates that the sample is far from neighboring clusters, a score of 0 indicates that the sample is on or very close to the decision boundary, and negative values indicate poor assignment. The Rand index measures the similarity between the estimated and ground-truth labels as a function of positive and negative agreements in pairwise cluster assignments; when two labelings agree perfectly, the Rand index is 1. Both indexes can be computed as sketched below.
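Both validity indexes are available in scikit-learn; a sketch (rand_score computes the plain Rand index and requires scikit-learn 0.24 or later):

```python
from sklearn.metrics import silhouette_score, rand_score

def evaluate_clustering(S, labels_pred, labels_true):
    """Cluster validity: Silhouette on the (P, T) event matrix,
    Rand index against the ground-truth labels."""
    return {"silhouette": silhouette_score(S, labels_pred),
            "rand": rand_score(labels_true, labels_pred)}
```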
For causal discovery, we compared our causal network discovery algorithm to Bayesian network structure learning (BNS), Bayesian network structure learning with resampling (BNSR) (Chen et al. 2017), and GLMNET. In BNS, we used the algorithm in (Chen et al. 2012) to detect the parent set of Y_{t+1}. The association among nodes is measured by the Bayesian Dirichlet score (Chen et al. 2012), which is the marginal likelihood or evidence P(G|D), where D is the observed data. The Bayesian Dirichlet score is decomposable; that is, it can be maximized node by node. For each node Y_{t+1}, we used the algorithm in (Chen et al. 2012) to search for the set of nodes in Y_t that maximizes the Bayesian Dirichlet score; this set is the parent set of Y_{t+1}, and from these parent sets we generate a graph describing causal interactions. In BNSR, bootstrap resampling was used to stabilize the Bayesian network learning process: we resampled the original dataset 1000 times and used BNS to generate a DBN model for each resampled dataset. For an edge Y_t → Y_{t+1}, the edge strength was measured by the frequency of that edge in the model ensemble. In GLMNET, for each Y^A(t+1), the variables in Y_t most predictive of Y^A(t+1) were identified by lasso and elastic-net regularized generalized linear models (Friedman et al. 2010), with parameters tuned by internal cross-validation. To improve model stability, we again used bootstrap resampling (1000 resamples) and generated a model for each resampled dataset; for a directed link Y^B(t) → Y^A(t+1), the link strength was measured by the frequency of that link in the 1000-model ensemble. CAIM, BNSR, and GLMNET generated weighted directed graphs, in which a higher edge weight for Y^B(t) → Y^A(t+1) represents a stronger relationship between Y^B(t) and Y^A(t+1); BNS generated an unweighted graph.

For causal discovery, we used the area under the ROC curve (AUC) to evaluate the algorithms' performance. The AUC was calculated from the generated graph and the ground-truth DBN; higher AUC indicates better recovery of the ground-truth DBN structure (see the sketch below).
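Treating ground-truth edges as labels and inferred edge weights as scores, the AUC reduces to a single scikit-learn call; a sketch, where W and G_true are K x K matrices over the lagged edges Y(t) → Y(t+1):

```python
from sklearn.metrics import roc_auc_score

def edge_auc(W, G_true):
    """AUC for recovering the ground-truth adjacency G_true (K x K, 0/1)
    from the inferred edge-weight matrix W (K x K)."""
    return roc_auc_score(G_true.ravel(), W.ravel())
```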
In subtask 1, we evaluated CAIM at different noise levels. Sixty neurons were grouped into 6 clusters of 10 neurons each; T = 5000 and P = 60. The structure of G* is depicted in Fig. 2a. Datasets were generated for three noise levels: 0.1 (low), 0.2 (medium), and 0.3 (high). The first 100 observations of all neurons for noise level 0.1 are depicted in Fig. 2c. In subtask 2, we evaluated CAIM at different cluster similarity levels, again with 60 neurons in 6 clusters of 10 neurons, T = 5000, P = 60, the structure of G* as in Fig. 2a, and noise level 0.2. We varied the parameters of the ground-truth DBNs to generate datasets with different cluster similarity levels; for a dataset, cluster similarity was quantified by the average Hamming distance across all cluster pairs. We generated three datasets: low similarity (Hamming distance = 2696), middle similarity (Hamming distance = 1461), and high similarity (Hamming distance = 862); higher similarity is more challenging for neuron clustering. In subtask 3, we evaluated CAIM with different numbers of clusters; each cluster had 10 neurons, and we generated datasets with 3 clusters (30 neurons), 6 clusters (60 neurons), and 9 clusters (90 neurons). The structure of G* was randomly generated, and the noise level was 0.2.

Neuron clustering results for subtask 1 are summarized in Table 1, and Fig. 3 depicts the loading matrix of neuron clustering for noise level 0.3. For all noise levels, CAIM achieved the best clustering performance: it always detected the correct number of clusters and identified the correct cluster structure (Rand index = 1). Neuron clustering results for subtask 2 are summarized in Table 2; for all cluster similarity levels, CAIM detected the correct number of clusters and identified the correct cluster structure. Neuron clustering results for subtask 3 are summarized in Table 3; for varying cluster numbers, CAIM again detected the correct number of clusters and identified the correct cluster structure. For all experimental conditions, CAIM and FCM consistently achieved higher Silhouette scores and Rand indexes than did K-means and clustering by density peaks. Overall, CAIM achieved the highest Silhouette score and Rand index.

Table 1 Clustering results for the simulated spike trains with different noise levels

Noise level  Method                       Detected clusters  Silhouette  Rand index
0.1          CAIM                         6                  0.268       1.000
0.1          K-means                      6                  0.268       1.000
0.1          Clustering by density peaks  7                  0.075       0.594
0.1          FCM                          6                  0.268       1.000
0.2          CAIM                         6                  0.113       1.000
0.2          K-means                      2                  0.062       0.259
0.2          Clustering by density peaks  4                  0.039       0.488
0.2          FCM                          6                  0.113       1.000
0.3          CAIM                         6                  0.043       1.000
0.3          K-means                      2                  0.024       0.259
0.3          Clustering by density peaks  6                  0.013       0.502
0.3          FCM                          10                 0.015       0.748

Table 2 Clustering results for the simulated spike trains with different cluster similarities

Similarity  Method                       Detected clusters  Silhouette  Rand index
Low         CAIM                         6                  0.145       1.000
Low         K-means                      2                  0.117       0.259
Low         Clustering by density peaks  5                  0.119       0.631
Low         FCM                          6                  0.145       1.000
Middle      CAIM                         6                  0.108       1.000
Middle      K-means                      2                  0.065       0.259
Middle      Clustering by density peaks  4                  0.038       0.488
Middle      FCM                          6                  0.108       1.000
High        CAIM                         6                  0.075       1.000
High        K-means                      4                  0.058       0.550
High        Clustering by density peaks  4                  0.020       0.414
High        FCM                          10                 0.027       0.748

Table 3 Clustering results for the simulated spike trains with varying numbers of clusters

Clusters  Method                       Detected clusters  Silhouette  Rand index
3         CAIM                         3                  0.124       1.000
3         K-means                      3                  0.124       1.000
3         Clustering by density peaks  5                  0.008       0.293
3         FCM                          3                  0.124       1.000
6         CAIM                         6                  0.109       1.000
6         K-means                      2                  0.062       0.259
6         Clustering by density peaks  4                  0.043       0.363
6         FCM                          6                  0.109       1.000
9         CAIM                         9                  0.147       1.000
9         K-means                      8                  0.138       0.876
9         Clustering by density peaks  9                  0.106       0.823
9         FCM                          10                 0.131       0.966

Fig. 3 The loading matrix of neuron clustering for subtask 1 (noise level 0.3).
Figures 4, 5 and 6 depict the AUCs of BNS, BNSR, CAIM, and GLMNET for subtasks 1, 2, and 3, respectively. CAIM achieved the highest AUC in most combinations of experimental setup and threshold. For threshold = 0.5, CAIM's AUCs were 1 in all scenarios, and CAIM was robust to the threshold used to infer binary cluster states. CAIM and BNSR consistently achieved higher AUCs than did BNS and GLMNET. The typical execution times of BNS, BNSR, CAIM, and GLMNET were 0.23 s, 13.73 s, 8.28 s, and 1571.92 s, respectively; CAIM and BNSR had similar execution times, while GLMNET took much longer. The AUCs and execution times of BNSR and CAIM were similar, although the AUC of CAIM was consistently higher. Relative to BNS, BNSR achieved significantly higher AUC because BNSR is an ensemble learning method that achieves consistent estimates by combining solutions from different bootstrap-resampled training datasets. Collectively, these experiments demonstrate that CAIM detects the cluster structure, achieves the best balance of performance (high AUC) and running time, and accurately infers the causal relationships.

Fig. 4 AUCs of BNS, BNSR, CAIM, and GLMNET for the simulated spike train data with varying noise levels.

Fig. 5 AUCs of BNS, BNSR, CAIM, and GLMNET for the simulated spike train data with varying cluster similarity levels.

Fig. 6 AUCs of BNS, BNSR, CAIM, and GLMNET for the simulated spike train data with varying cluster numbers.
Biophysics Based Simulation

In this experiment, a biophysics-based simulation was used to assess CAIM. The simulation modeled interactions among a set of integrate-and-fire (I&F) neurons with noise; such a neuron model can represent virtually all postsynaptic potentials or currents described in the literature (e.g., α-functions, bi-exponential functions) (Brette et al. 2007). The neuron model (Gütig and Sompolinsky 2006) is:

dV/dt = -(V - V_rest)/τ + σ τ^(-0.5) ε    (1)

where V is the membrane potential, V_rest is the rest potential, ε is a Gaussian random variable with mean 0 and standard deviation 1, τ is the membrane time constant, and σ is a parameter controlling the noise term. Spikes received through the synapses trigger changes in V. A neuron fires if V exceeds a threshold, and it cannot generate a second spike for a brief time after the first one (refractoriness); an integration sketch is given at the end of this subsection.

Our simulation included 160 neurons in four groups, A, B, C, and D, with 40 neurons per group. The ground-truth causal graph is depicted in Fig. 7a. Neurons in group A had no parent nodes; they all received a stimulus. Neurons in group B had two or three neurons in group A as parent nodes, as did neurons in group C. If a parent node fired, the membrane potential of the target node increased by w; the w of connections between A and B differed from that between A and C. Firing of neurons in groups B and C caused firing of neurons in group D. The simulated spike trains are depicted in Fig. 7b.

CAIM accurately detected the cluster structure, with a Rand score of 0.98. The weighted graph was robust to the threshold used to infer binary cluster states, remaining stable for thresholds in [0.3, 0.7]; we chose threshold = 0.5. The edge weights had a bimodal distribution: the weights of Y^A(t) → Y^B(t+1), Y^A(t) → Y^C(t+1), Y^B(t) → Y^D(t+1), and Y^C(t) → Y^D(t+1) were 0.90, 0.90, 0.50, and 0.47, respectively, while all other edges had very low weights. The strong links reflect the strong causal relationships in Fig. 7a. Overall, CAIM identified the causal relationships between these neuron groups.

Fig. 7 Causal discovery results for the biophysics-based simulation. a The ground-truth causal graph. b The spike trains of cluster states (the first 200 frames).
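A minimal Euler-integration sketch of the noisy I&F model, assuming the reading of Eq. (1) given above; all parameter values are illustrative assumptions, and a more careful treatment would scale the noise with the integration step as in a stochastic differential equation:

```python
import numpy as np

def simulate_if_neuron(T=1000, dt=1.0, tau=20.0, v_rest=0.0, v_thresh=1.0,
                       sigma=0.1, refractory=5, inputs=None, w=0.2, seed=0):
    """Euler integration of the noisy leaky integrate-and-fire model.

    inputs: optional (T,) binary array of presynaptic spikes, each of
    which increments V by the synaptic weight w.
    """
    rng = np.random.default_rng(seed)
    V, refrac, spikes = v_rest, 0, np.zeros(T, dtype=int)
    for t in range(T):
        if refrac > 0:  # refractoriness: no integration, no second spike
            refrac -= 1
            continue
        eps = rng.normal()  # Gaussian noise, mean 0, sd 1
        V += dt * (-(V - v_rest) / tau + sigma * tau ** -0.5 * eps)
        if inputs is not None and inputs[t]:
            V += w  # a presynaptic spike increments the membrane potential
        if V > v_thresh:  # fire, reset, and enter the refractory period
            spikes[t], V, refrac = 1, v_rest, refractory
    return spikes
```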
Real-World Neural Activity Data for a Delayed Reaching Task

CAIM was evaluated on a spike dataset acquired during the delay period of a standard delayed reaching task (Santhanam et al. 2009). A male rhesus monkey performed a standard instructed-delay center-out reaching task; animal protocols were approved by the Stanford University Institutional Animal Care and Use Committee. The dataset contains spike trains recorded simultaneously by a silicon electrode array (Cyberkinetics, Foxborough, MA) from 61 neurons in the right premotor cortex. The dataset comprises two experimental conditions (conditions 1 and 2), each with 56 trials; spike train lengths ranged from 1018 ms to 1526 ms. Spike trains were binned into non-overlapping 20 ms bins (as sketched below), a bin size found to work well for population activity recorded in the motor cortex (Cowley et al. 2012). Among the 61 neurons, 16 had a low firing rate (<5 spikes/s) and were excluded from the analysis. Excluding these low-firing neurons from causal discovery does not exclude the possibility that they contributed to the observed ensemble activity; we excluded them because they had too few active states to firmly establish causal relationships (Chen et al. 2008).
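Binning reduces to a histogram followed by binarization; a minimal sketch with hypothetical names:

```python
import numpy as np

def bin_spikes(spike_times_ms, duration_ms, bin_ms=20):
    """Bin spike times into non-overlapping 20 ms bins and binarize,
    yielding the event series s_{i,1:T} used by CAIM."""
    edges = np.arange(0, duration_ms + bin_ms, bin_ms)
    counts, _ = np.histogram(spike_times_ms, bins=edges)
    return (counts > 0).astype(int)
```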
CAIM found 4 clusters. Figure 8a depicts the loading matrix of neuron clustering. The average within-cluster relative mutual information was 0.187, while the average between-cluster relative mutual information was 0.016, demonstrating good cluster separation. The detected causal networks for the two conditions are depicted in Fig. 8b and c; strong links (edges with weights greater than the median weight) are shown. Both causal graphs demonstrated persistence: for each cluster, Y(t+1) is driven by Y(t). Persistence may reflect continuous firing. The causal graphs for the two conditions also had significant structural differences. In condition 1, Y^A(t+1) was strongly driven by Y^A(t) together with one other cluster state; this pattern changed in condition 2, where Y^A(t+1) was driven by Y^A(t) and Y^B(t). Y^B(t+1) is driven by Y^B(t) and Y^D(t) in condition 1, while Y^B(t+1) is driven by Y^A(t), Y^B(t), and Y^C(t) in condition 2. Y^D(t+1) is driven by Y^B(t) and Y^D(t) in condition 1, while Y^D(t+1) is driven by Y^D(t) in condition 2. In this analysis, the conditions were predetermined by the experimental design. Our analysis of the reach-task data demonstrated that CAIM can be used for differential causal graph analysis.

Fig. 8 Causal discovery results for the reach-task dataset. a The loading matrix for neuron clustering; rows are neurons (split by cluster label) and columns are clusters. b and c DBNs for conditions 1 and 2; edge weights represent strength of connectivity.

Discussion

We propose a causal discovery method, CAIM, based on DBNs and capable of revealing causal interactions among neural dynamics. Relative to static network analysis, CAIM can model complex spatiotemporal patterns of circuit activity related to a cognitive process or behavior.

We validated CAIM on two simulated studies and a real-world spike dataset acquired during the delay period of a standard delayed reaching task. In the simulated spike train experiment, CAIM accurately detected causal relationships among neuron clusters. We compared CAIM with other methods: for neuron clustering, CAIM achieved a higher Rand index than K-means and clustering by density peaks; for causal discovery, compared with BNS, BNSR, and GLMNET, CAIM achieved the best balance of AUC and running time. In the biophysics-based simulation, we generated data for a set of integrate-and-fire neurons with noise forming four clusters; CAIM accurately identified the cluster structure and the causal relationships between the neuron clusters. In the delayed reaching experiment, 45 neurons formed 4 clusters, and the causal graphs for the two experimental conditions differed: the parent sets of nodes A, B, and D were different between conditions. Collectively, these experiments demonstrate that CAIM is a powerful computational framework for detecting causal relationships among neural dynamics.

The network generated by CAIM differs from that generated by synchrony analysis. Synchrony analysis centers on the cross-correlation between two neural time courses, whereas CAIM models the transition dynamics among neural time courses. Synchrony analysis and CAIM provide complementary information about a cognitive process.

The network model generated by CAIM is explainable: it is a graphical model with excellent interpretability. CAIM is also expandable; its computational framework can be used for other applications, such as modeling cortical traveling waves (Muller et al. 2018). Using the CAIM framework, we can detect clusters of neurons with zero-lag synchrony, then model information propagation in a pathway, focusing on the pattern in which activation of cluster A at time point t leads to activation of cluster B at time point t+1. The biophysics-based simulation provides an example of information propagation along the pathway A → B → D.

We have previously developed dynamic network analysis algorithms to model interactions among neural signals at a macroscopic scale (Chen et al. 2012; Chen et al. 2017; Chen and Herskovits 2015). CAIM and dynamic network analysis handle different kinds of temporal data. Dynamic network analysis is designed to generate a network model from longitudinal MR data, which are short temporal sequences; for most longitudinal image data, the number of visits per subject is small, often fewer than ten. Dynamic network analysis therefore requires data from many subjects to generate a stable model, assuming that the brain network model is invariant across subjects. CAIM, in contrast, is designed to generate a network model from data streams comprising thousands of data points, and therefore does not assume that the brain network model is invariant across subjects.

Bayesian methods have been used to model neural activity data. Ma et al. proposed a Bayesian framework describing how populations of neurons represent uncertainty to perform Bayesian inference (Ma et al. 2006); the probabilistic relationship between stimuli and response is formalized as P(response | stimuli), and a two-layer feed-forward neural network, whose output-layer neurons compute the product of input likelihood functions, is used for decoding. Friston suggested a strong correspondence between the anatomical organization of the neocortex and hierarchical Bayesian generative models (Friston 2003). George and Hawkins (2009) proposed a Bayesian model for cortical circuits that describes Bayesian belief propagation in a spatio-temporal hierarchical model called hierarchical temporal memory (HTM); an HTM node abstracts space as well as time, and HTM graphs use Bayesian belief propagation for inference. Deneve proposed a Bayesian neuron model in which spike trains provide a deterministic, online representation of a log probability ratio (Deneve 2005). However, these Bayesian analyses of neural activity data do not center on causal inference.

The causal sufficiency assumption is widely used in causal discovery to make the process computationally tractable. However, if an unmeasured time series Z influences the observed time series Y, an approach based on causal sufficiency can lead to incorrect causal conclusions. This is one of the limitations of CAIM. Our future research will address this limitation by introducing latent variables representing unmeasured time series and using expectation maximization (EM) to infer the properties of partially observed Markov processes (Geiger et al. 2015).

In CAIM, we assume that the causal structure is invariant across time points. If the dependencies in the underlying process change over time, the generated model is an average over different temporal dependency structures. In the future, we will extend CAIM to handle time-varying causal graphs by generating a causal graph for each time point and aggregating these causal graphs.

In the current framework, we generate a ranking of potential causal interactions. In some real-world applications, a threshold on this ranking is needed to obtain a binary causal graph; in future work, we will develop algorithms to address this challenge. One method is based on the likelihood function: for a generated binary graph, we can calculate a score representing the likelihood that the observed data were generated from that graph, and choose the threshold that maximizes this likelihood (Chen and Herskovits 2015). This process should be run inside a cross-validation procedure to avoid overfitting.

In this paper, the interactions among neural activities are represented by a 2TBN, which describes a first-order time-invariant Markov process; we adopted the 2TBN representation to simplify computation. In CAIM, we group neurons into clusters, effectively reducing the dimensionality of the model space. An alternative approach to dimension reduction is to project the variables into a low-dimensional space and model the dynamics among latent variables; in the future, we will develop such algorithms.

In conclusion, CAIM provides a powerful computational framework for inferring causal graphs from high-dimensional observational neural activity data. We envisage that CAIM will be of great value in understanding the spatiotemporal patterns of circuit activity related to a specific behavior.

Information Sharing Statement

The data of the delayed reaching task are available at https://users.ece.cmu.edu/~byronyu/software/DataHigh/get_started.html. The simulated data and the software package are freely available for academic purposes on request.

Acknowledgments This work was supported by the NIH NINDS (R01NS110421) and the BRAIN Initiative.

Compliance with Ethical Standards

Declaration of Interest: none.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Arya, S., Mount, D. M., Netanyahu, N. S., Silverman, R., & Wu, A. Y. (1998). An optimal algorithm for approximate nearest neighbor searching fixed dimensions. JACM, 45(6), 891-923.
Averbeck, B. B., Latham, P. E., & Pouget, A. (2006). Neural correlations, population coding and computation. Nat Rev Neurosci, 7(5), 358-366.
Barbera, G., Liang, B., Zhang, L., Gerfen, C. R., Culurciello, E., Chen, R., Li, Y., & Lin, D. T. (2016). Spatially compact neural clusters in the dorsal striatum encode locomotion relevant information. Neuron, 92(1), 202-213.
Bar-Joseph, Z., Gitter, A., & Simon, I. (2012). Studying and modelling dynamic biological processes using time-series gene expression data. Nat Rev Genet, 13(8), 552-564.
Brette, R., Rudolph, M., Carnevale, T., Hines, M., Beeman, D., Bower, J. M., Diesmann, M., Morrison, A., Goodman, P. H., Harris Jr., F. C., Zirpe, M., Natschläger, T., Pecevski, D., Ermentrout, B., Djurfeldt, M., Lansner, A., Rochel, O., Vieville, T., Muller, E., Davison, A. P., el Boustani, S., & Destexhe, A. (2007). Simulation of networks of spiking neurons: A review of tools and strategies. J Comput Neurosci, 23(3), 349-398.
Chen, R., & Herskovits, E. H. (2007). Clinical diagnosis based on Bayesian classification of functional magnetic-resonance data. Neuroinformatics, 5(3), 178-188.
Chen, R., & Herskovits, E. H. (2015). Predictive structural dynamic network analysis. J Neurosci Methods, 245, 58-63.
Chen, Y., Bressler, S. L., & Ding, M. (2006). Frequency decomposition of conditional Granger causality and application to multivariate neural field potential data. J Neurosci Methods, 150(2), 228-237.
Chen, R., Hillis, A. E., Pawlak, M., & Herskovits, E. H. (2008). Voxelwise Bayesian lesion-deficit analysis. Neuroimage, 40(4), 1633-1642.
Chen, R., Resnick, S. M., Davatzikos, C., & Herskovits, E. H. (2012). Dynamic Bayesian network modeling for longitudinal brain morphometry. Neuroimage, 59(3), 2330-2338.
Chen, R., Zheng, Y., Nixon, E., & Herskovits, E. H. (2017). Dynamic network model with continuous valued nodes for longitudinal brain morphometry. Neuroimage, 155, 605-611.
Churchland, M. M., Yu, B. M., Sahani, M., & Shenoy, K. V. (2007). Techniques for extracting single-trial activity patterns from large-scale neural recordings. Curr Opin Neurobiol, 17(5), 609-618.
Cowley, B. R., Kaufman, M. T., Churchland, M. M., Ryu, S. I., Shenoy, K. V., & Yu, B. M. (2012). DataHigh: Graphical user interface for visualizing and interacting with high-dimensional neural activity. Proc Annu Int Conf IEEE Eng Med Biol Soc EMBS, 10(6), 4607-4610.
Deneve, S. (2005). Bayesian inference in spiking neurons. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in Neural Information Processing Systems, 17 (pp. 353-360). Vancouver: MIT Press.
Eldawlatly, S., Zhou, Y., Jin, R., & Oweiss, K. G. (2010). On the use of dynamic Bayesian networks in reconstructing functional neuronal networks from spike train ensembles. Neural Comput, 22(1), 158-189.
Fellous, J.-M., Tiesinga, P. H., Thomas, P. J., & Sejnowski, T. J. (2004). Discovering spike patterns in neuronal responses. J Neurosci, 24(12), 2989-3001.
Fisher, F. M. (1970). A correspondence principle for simultaneous equation models. Econometrica, 73-92.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J Stat Softw, 33(1), 1-22.
Friston, K. (2003). Learning and inference in the brain. Neural Netw, 16(9), 1325-1352.
Geiger, D., & Pearl, J. (1990). On the logic of causal models. In Machine Intelligence and Pattern Recognition (Vol. 9, pp. 3-14). Elsevier.
Geiger, P., Zhang, K., Schoelkopf, B., Gong, M., & Janzing, D. (2015). Causal inference by identification of vector autoregressive processes with hidden components. In International Conference on Machine Learning (pp. 1917-1925). Lille, France: ICML'15.
George, D., & Hawkins, J. (2009). Towards a mathematical theory of cortical micro-circuits. PLoS Computational Biology, 5(10), e1000532.
Ghosh, K. K., Burns, L. D., Cocker, E. D., Nimmerjahn, A., Ziv, Y., Gamal, A. E., & Schnitzer, M. J. (2011). Miniaturized integration of a fluorescence microscope. Nat Methods, 8(10), 871-878.
Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3), 424-438.
Gütig, R., & Sompolinsky, H. (2006). The tempotron: A neuron that learns spike timing-based decisions. Nat Neurosci, 9(3), 420-428.
Hafting, T., Fyhn, M., Molden, S., Moser, M.-B., & Moser, E. I. (2005). Microstructure of a spatial map in the entorhinal cortex. Nature, 436(7052), 801-806.
Harvey, C. D., Coen, P., & Tank, D. W. (2012). Choice-specific sequences in parietal cortex during a virtual-navigation decision task. Nature, 484(7392), 62-68.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer. https://doi.org/10.1007/978-0-387-84858-7.
Hu, M., Li, W., & Liang, H. (2018). A copula-based Granger causality measure for the analysis of neural spike train data. IEEE/ACM Trans Comput Biol Bioinforma, 15(2), 562-569.
Huynh-Thu, V. A., Irrthum, A., Wehenkel, L., & Geurts, P. (2010). Inferring regulatory networks from expression data using tree-based methods. PLoS One, 5(9), 1-10. https://doi.org/10.1371/journal.pone.0012776.
Kerr, J. N. D., & Nimmerjahn, A. (2012). Functional imaging in freely moving animals. Curr Opin Neurobiol, 22(1), 45-53.
Ko, H., Cossell, L., Baragli, C., Antolik, J., Clopath, C., Hofer, S. B., & Mrsic-Flogel, T. D. (2013). The emergence of functional microcircuits in visual cortex. Nature, 496(7443), 96-100.
Koller, D., & Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. Cambridge: MIT Press.
Litwin-Kumar, A., & Doiron, B. (2012). Slow dynamics and high variability in balanced cortical networks with clustered connections. Nat Neurosci, 15(11), 1498-1505.
Luczak, A., Bartho, P., Marguet, S. L., Buzsaki, G., & Harris, K. D. (2007). Sequential structure of neocortical spontaneous activity in vivo. Proc Natl Acad Sci U S A, 104(1), 347-352.
Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nat Neurosci, 9(11), 1432-1438.
Meek, C. (1995). Strong completeness and faithfulness in Bayesian networks. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (pp. 411-418). San Francisco: Morgan Kaufmann Publishers Inc.
Meyer, P. E., Kontos, K., Lafitte, F., & Bontempi, G. (2007). Information-theoretic inference of large transcriptional regulatory networks. EURASIP J Bioinforma Syst Biol, 2007, 79879.
Muller, L., Chavane, F., Reynolds, J., & Sejnowski, T. J. (2018). Cortical travelling waves: Mechanisms and computational principles. Nat Rev Neurosci, 19(5), 255-268.
Park, S., Kim, J. M., Shin, W., Han, S. W., Jeon, M., Jang, H. J., et al. (2018). BTNET: Boosted tree based gene regulatory network inference algorithm using time-course measurement data. BMC Systems Biology, 12(2), 69-77. https://doi.org/10.1186/s12918-018-0547-0.
Pearl, J. (2009). Causality: Models, Reasoning and Inference (2nd ed.). New York: Cambridge University Press.
Pons, P., & Latapy, M. (2006). Computing communities in large networks using random walks. J Graph Algorithms Appl, 10, 191-218.
Pregowska, A., Szczepanski, J., & Wajnryb, E. (2015). Mutual information against correlations in binary communication channels. BMC Neurosci, 16, 32.
Santhanam, G., Yu, B. M., Gilja, V., Ryu, S. I., Afshar, A., Sahani, M., & Shenoy, K. V. (2009). Factor-analysis methods for higher-performance neural prostheses. J Neurophysiol, 102(2), 1315-1330.
Sauerbrei, W., Boulesteix, A.-L., & Binder, H. (2011). Stability investigations of multivariable regression models derived from low- and high-dimensional data. J Biopharm Stat, 21(6), 1206-1231.
Schneidman, E., Berry, M. J., Segev, R., & Bialek, W. (2006). Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087), 1007-1012.
Scott, B. B., Brody, C. D., & Tank, D. W. (2013). Cellular resolution functional imaging in behaving rats using voluntary head restraint. Neuron, 80(2), 371-384.
Song, S., et al. (2005). Highly nonrandom features of synaptic connectivity in local cortical circuits. PLoS Biology, 3(3), e68.
Spirtes, P., Glymour, C., & Scheines, R. (2001). Causation, Prediction, and Search (2nd ed.). Cambridge: MIT Press.
Toups, J. V., Fellous, J.-M., Thomas, P. J., Sejnowski, T. J., & Tiesinga, P. H. (2011). Finding the event structure of neuronal spike trains. Neural Comput, 23(9), 2169-2208.
Wiwie, C., Baumbach, J., & Röttger, R. (2018). Guiding biomedical clustering with ClustEval. Nat Protoc, 13(6), 1429-1444.
Ye, N. (2003). The Handbook of Data Mining. Mahwah: Lawrence Erlbaum Associates.
Yoshimura, Y., & Callaway, E. M. (2005). Fine-scale specificity of cortical networks depends on inhibitory cell type and connectivity. Nat Neurosci, 8(11), 1552-1559.
Yoshimura, Y., Dantzker, J. L. M., & Callaway, E. M. (2005). Excitatory cortical neurons form fine-scale functional networks. Nature, 433, 868-873.
Zohary, E., Shadlen, M. N., & Newsome, W. T. (1994). Correlated neuronal discharge rate and its implications for psychophysical performance. Nature, 370(6485), 140-143.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Causal Network Inference for Neural Ensemble Activity

Neuroinformatics , Volume 19 (3) – Jan 4, 2021

Loading next page...
 
/lp/springer-journals/causal-network-inference-for-neural-ensemble-activity-zSQnAKABh8

References (97)

Publisher
Springer Journals
Copyright
Copyright © The Author(s) 2021
ISSN
1539-2791
eISSN
1559-0089
DOI
10.1007/s12021-020-09505-4
Publisher site
See Article on Publisher Site

Abstract

Interactions among cellular components forming a mesoscopic scale brain network (microcircuit) display characteristic neural dynamics. Analysis of microcircuits provides a system-level understanding of the neurobiology of health and disease. Causal discovery aims to detect causal relationships among variables based on observational data. A key barrier in causal discovery is the high dimensionality of the variable space. A method called Causal Inference for Microcircuits (CAIM) is proposed to reconstruct causal networks from calcium imaging or electrophysiology time series. CAIM combines neural recording, Bayesian network modeling, and neuron clustering. Validation experiments based on simulated data and a real-world reaching task dataset dem- onstrated that CAIM accurately revealed causal relationships among neural clusters. . . . Keywords Causal discovery Neuroimaging Dynamic Bayesian network Clustering Introduction and emotion (Ko et al. 2013; Barbera et al. 2016). In contrast to the experimental advances in neural recording techniques, Increasing experimental and computational evidence supports computational analysis of ensemble neural activities is still the existence of a specific pattern of connectivity among ad- emerging. A fundamental problem in microcircuit analysis is jacent neurons during cognition and emotion (Yoshimura and causal discovery. Causal discovery aims to reveal causal struc- Callaway 2005; Yoshimura et al. 2005;Songet al. 2005;Ko tures by analyzing observational data. Several computational et al. 2013; Litwin-Kumar and Doiron 2012). Interactions methods have been developed to infer causal networks from among cellular components forming a mesoscopic scale brain ensemble neural activity, including Granger causality (Chen network (microcircuit) display characteristic neural dynamics. et al. 2006; Hu et al. 2018) and conditional independence A microcircuit lies at the heart of the information processing inference based on dynamic Bayesian networks (DBNs) capability of the brain. It carries out a specific computation of (Eldawlatly et al. 2010). a region. Microcircuits have been shown to encode sensory A key barrier in causal discovery from multiple time series input (Luczak et al. 2007), motor function (Churchland et al. is high dimensionality. For example, calcium imaging can 2007), spatial maps in the entorhinal cortex (Hafting et al. observe ensemble neural activity of hundreds of neurons. 2005), and behavior choice (Harvey et al. 2012). Analysis of Naively applying causal discovery algorithms to such high- microcircuits provides a system-level understanding of the dimensional data causes several problems. First, this naïve neurobiology of health and disease. approach ignores the intrinsic hierarchical structure of the mi- Calcium imaging (Kerr and Nimmerjahn 2012; Ghosh crocircuit. Neurons often form clusters and neurons in the et al. 2011; Scott et al. 2013) and electrophysiology with elec- same cluster have similar functional profiles. For example, trodes are powerful ways to study microcircuits, leading to an D1- and D2-medium spiny neurons (MSNs) in the dorsal stri- understanding of network architecture of behavior, cognition, atum are grouped into spatially compact clusters (Barbera et al. 2016). In the visual cortex, highly connected neurons in a cortical column receive similar visual input (Yoshimura * Rong Chen et al. 2005). These studies suggest that neurons in a microcir- rong.chen.mail@gmail.com cuit form clusters (or modules, communities). 
Second, con- structing a model from such high-dimensional data with a Department of Diagnostic Radiology and Nuclear Medicine, cluster structure often leads to overfitting (Hastie et al. University of Maryland School of Medicine, 22 South Greene Street, 2009), an unstable model (Sauerbrei et al. 2011;Chen and Baltimore, MD 21201, USA 516 Neuroinform (2021) 19:515–527 Herskovits 2007), and poor parameter estimation (Chen and X ⫫ Y | Z denote that X and Y are conditionally independent Herskovits 2007). given Z. X is not the cause of Y if X ⫫ Y | Z . For a set of t t +1 t The proposed method, called Causal Inference for variables V ={X , …, X }, a causal graphical model is G =(V, 1 p Microcircuits (CAIM), aims to reconstruct causal E), where an edge X → X represents X is a direct cause of X i j i j mesoscopic-scale networks from observational calcium relative to variables in V,and G is a directed acyclic graph. imaging or electrophysiology time series. CAIM com- The assumptions which often are used to relate causal struc- bines neural recording, Bayesian network modeling, and tures to probability densities are the causal Markov assump- neuron clustering. To address the high-dimensionality tion, the causal faithfulness assumption, and the causal suffi- problem, CAIM utilizes clustering to group neurons into ciency assumption (Spirtes et al. 2001). Under these assump- clusters. To solve the causal discovery problem, CAIM tions, a remarkable result according to Geiger and Pearl uses DBNs to identify conditional independence. CAIM (Geiger and Pearl 1990) and Meek (Meek 1995)isthe enables us to move toward a circuit-based approach to Markov completeness theorem: for linear Gaussian and for understand the brain, in which a behavior is understood multinomial causal relations, an algorithm that identifies the to result from specific spatiotemporal patterns of circuit Markov equivalent class is complete (that is, it extracts all activity related to specific neuronal populations. information about the underlying causal structure). This paper is organized as follows. "Background and There are many studies of causal discovery from multiple Related Work" describes the background and related work. time series from problem domains which are not neurosci- "Method" provides the CAIM algorithm, including neuron ence-related, such as inferring gene regulatory networks using clustering and causal network inference. In "Results", valida- time-series gene expression data (Bar-Joseph et al. 2012). A tion experiments on simulated neural activity data and appli- kind of inference framework is growth-shrink. Such methods cation of CAIM to a real-world dataset are presented. first calculate pairwise associations between s and s;and t+ 1 t "Discussion" includes the discussion and issues requiring fur- then remove redundant or spurious connections (Meyer et al. ther investigation are provided, which are followed by 2007). An example of a growth-shrink based method is conclusions. MRNET (Meyer et al. 2007), which uses mutual information between variables and minimum-redundancy-maximum- relevance to infer networks. Another kind of inference frame- Background and Related Work work considers network inference as a regression problem and uses ensemble learning to construct the network. BTNET Network analysis (or connectivity analysis) methods for neu- (Park et al. 
Method

CAIM aims to infer causal relationships from observational calcium imaging or electrophysiology time series. In CAIM, microcircuits are modeled as DBNs (Koller and Friedman 2009) representing causal relationships. In a DBN, nodes are variables of interest, and edges (links) represent interactions among variables. If a set of nodes π_i causally affects the activity of node i, then there are links from the nodes in π_i to node i; π_i is referred to as the parent set of node i. Each node is associated with a binary variable that represents whether the node is activated, and with an updating rule that specifies how its state changes over time due to the activation of its parent set. Network dynamics are determined by these updating rules. DBNs can characterize system dynamics, handle noisy data, describe locally interacting processes, and support causal inference (Chen et al. 2012).

CAIM infers causal networks from neural ensemble activities. Neural activities can be recorded by calcium imaging or by electrophysiology with electrodes. Preprocessing algorithms generate binary neuronal events (spike trains or calcium transient events); a preprocessing pipeline (Barbera et al. 2016) can be used to preprocess calcium imaging data, including image registration, cell mask detection, and neuronal event detection. However, preprocessing is not the focus of CAIM. Let P and T denote the number of neurons and the number of time points, respectively. The preprocessing step results in s_{1:T}. For neuron i, s_{i,t} = 1 indicates a neuronal event of neuron i at time point t, while s_{i,t} = 0 indicates no event. s_t = [s_{1,t}, …, s_{P,t}] is a P-dimensional vector representing the neural events of all neurons at time point t, and s_{1:T} = (s_1, …, s_T) represents the neural activity for all time points.
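As an illustration of how the binary event matrix s_{1:T} might be produced from raw spike time stamps, here is a minimal binning sketch; the function name, the input format, and the 20-ms default bin width (the value used later for the reaching-task dataset) are our assumptions rather than part of the CAIM preprocessing pipeline.

```python
import numpy as np

def binarize_spikes(spike_times, n_neurons, duration_ms, bin_ms=20):
    """Convert per-neuron spike time stamps into the binary event matrix s.

    spike_times: list of arrays; spike_times[i] holds neuron i's spike times (ms).
    Returns s with shape (P, T), where s[i, t] = 1 iff neuron i fired in bin t.
    """
    T = int(np.ceil(duration_ms / bin_ms))
    s = np.zeros((n_neurons, T), dtype=int)
    for i, times in enumerate(spike_times):
        bins = (np.asarray(times, dtype=float) // bin_ms).astype(int)
        s[i, bins[bins < T]] = 1
    return s

# toy usage: three neurons observed for one second
spikes = [np.array([12.0, 480.5]), np.array([100.0]), np.array([])]
s = binarize_spikes(spikes, n_neurons=3, duration_ms=1000)
```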
Figure 1 shows the architecture of CAIM; Fig. 1a is its conceptual framework. In CAIM, neurons are grouped into clusters, and neurons in the same cluster have similar functional profiles. Each cluster is associated with a latent variable (the cluster state variable) that represents whether the cluster is activated. Let Y^A(t) denote the state variable for cluster A at time point t. Y_t = [Y^A(t), …, Y^Z(t)] is a vector representing the states of all clusters at time point t, and Y_{1:T} = (Y_1, …, Y_T) represents the cluster states for all time points. Interactions among clusters are described by a DBN whose nodes are the cluster state variables. The directed temporal interaction between two nodes is represented by a transition probability table (Fig. 1b). For example, Pr(Y^B(t+1) = active | Y^A(t) = active, Y^C(t) = active) = 0.88 means that activation of cluster A and activation of cluster C at time point t result in activation of cluster B at time point t+1 with probability 0.88.

Fig. 1 The architecture of CAIM
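A transition probability table of this form can be estimated from binary cluster-state sequences by simple transition counting. The sketch below is a minimal illustration of that bookkeeping under our own naming conventions; it is not the paper's implementation.

```python
import numpy as np

def transition_probability(y_child, parent_states):
    """Estimate Pr(child(t+1) = 1 | parent configuration at t).

    y_child: (T,) binary array for the child cluster, e.g. Y^B.
    parent_states: (T, k) binary array; column j is parent j, e.g. [Y^A, Y^C].
    Returns a dict mapping each parent configuration to an estimated probability.
    """
    T, k = parent_states.shape
    table = {}
    for config in range(2 ** k):
        bits = tuple((config >> j) & 1 for j in range(k))
        # time points t whose parent states match this configuration
        mask = np.all(parent_states[:-1] == bits, axis=1)
        if mask.sum() > 0:
            table[bits] = y_child[1:][mask].mean()
    return table

# toy usage: Pr(Y^B(t+1) = active | Y^A(t), Y^C(t)), planted at 0.88
rng = np.random.default_rng(1)
y_a, y_c = rng.integers(0, 2, 5000), rng.integers(0, 2, 5000)
y_b = np.zeros(5000, dtype=int)
both_on = (y_a[:-1] == 1) & (y_c[:-1] == 1)
y_b[1:] = np.where(both_on, rng.random(4999) < 0.88, rng.random(4999) < 0.05)
print(transition_probability(y_b, np.column_stack([y_a, y_c])))
```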
Neuron Clustering

The goal of neuron clustering is to group the P neurons into K homogeneous clusters. Coherence, that is, pairwise functional association, plays a key role in neural codes (Averbeck et al. 2006; Zohary et al. 1994), and even weak pairwise linear interactions can result in strongly correlated network states in a neural ensemble (Schneidman et al. 2006). Therefore, our clustering algorithm centers on examining coherence. The objects in this clustering problem are neurons; neurons within each cluster are more similar to each other than to neurons assigned to different clusters. The input to neuron clustering is s_{1:T}. Clustering generates a partition of the variable space: the partition Ω is a vector whose i-th element Ω(i) is the group membership of neuron i.

Neuron clustering is based on the similarity between s_{i,1:T} and s_{j,1:T}, where s_{i,1:T} is the observed trajectory of neuron i. Neuron clustering therefore focuses on the instantaneous synchrony (the zero-lag synchrony) between neuron pairs. There are many clustering algorithms (Wiwie et al. 2018). s_{i,1:T} is a trajectory with thousands of observation time points, and each time point is a feature; in this clustering problem, P is about several hundred and T is several thousand, so the clustering algorithm must handle high-dimensional data. Since we assume that an object belongs to a single cluster, we do not use fuzzy clustering such as c-means or probabilistic clustering such as Gaussian mixture models.

CAIM uses graph-based clustering. A graph is constructed by using kd-trees to identify the approximate nearest neighbors of each object (Arya et al. 1998); this graph construction algorithm is computationally efficient. Clusters are detected by the walktrap algorithm for graph-based community detection (Pons and Latapy 2006), which finds densely connected subgraphs based on random walks. The algorithm starts by assigning each node to its own community and calculating the distance for every pair of communities; the two communities at minimum distance are then merged, and the process is repeated. The walktrap algorithm thus merges communities bottom-up using the results of random walks, creates a dendrogram, and uses the modularity score to select where to cut the dendrogram. Therefore, the number of clusters is determined automatically by the algorithm.

After generating the partition, the cluster state variables are inferred by voting. For cluster A, the percentage of neurons in state 1 at time point t is calculated; if this percentage is greater than a threshold, then Y^A(t) = 1, otherwise Y^A(t) = 0. A higher threshold results in sparser cluster activation; if majority voting is adopted, the threshold is 50%.

Given the binary cluster state variables, a loading matrix can be calculated to assess the association between cluster state variables and neurons. The loading matrix has P rows and K columns, and its (i, j) element is the relative mutual information (Pregowska et al. 2015) between neuron i and cluster j. The relative mutual information lies in [0, 1]; higher relative mutual information indicates a stronger association between two binary random variables.
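The following is a minimal sketch of this clustering-and-voting pipeline, assuming the event matrix s is a (P, T) NumPy array. We use scikit-learn's exact kd-tree nearest-neighbor search in place of the approximate scheme of Arya et al., and python-igraph's walktrap implementation; both substitutions, the neighbor count, and the Euclidean metric are our assumptions.

```python
import numpy as np
import igraph as ig
from sklearn.neighbors import NearestNeighbors

def cluster_neurons(s, n_neighbors=10):
    """Group neurons by zero-lag synchrony: kNN graph + walktrap communities.

    s: (P, T) binary event matrix; each neuron's trajectory is one object.
    Returns the partition Omega as an array of cluster labels.
    """
    nn = NearestNeighbors(n_neighbors=n_neighbors, algorithm="kd_tree").fit(s)
    _, idx = nn.kneighbors(s)          # idx[i] lists the neighbors of neuron i
    edges = {(int(min(i, j)), int(max(i, j)))
             for i in range(len(s)) for j in idx[i] if i != j}
    g = ig.Graph(n=len(s), edges=sorted(edges))
    dendrogram = g.community_walktrap()
    # cutting the dendrogram at maximum modularity fixes K automatically
    return np.array(dendrogram.as_clustering().membership)

def cluster_states(s, omega, threshold=0.5):
    """Infer binary cluster state variables Y(t) by within-cluster voting."""
    states = []
    for label in np.unique(omega):
        frac_active = s[omega == label].mean(axis=0)  # share of neurons in state 1
        states.append((frac_active > threshold).astype(int))
    return np.vstack(states)                          # (K, T) matrix of Y(t)
```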
Causal Network Construction

Causal network construction infers a DBN based on Y_{1:T}, the dataset of cluster states for all time points. A DBN is defined as a pair (B_1, B_→), where B_1 is a Bayesian network defining the baseline probability distribution and B_→ defines the transition probability P(Y_{t+1} | Y_t); that is, B_→ is a two-slice temporal Bayesian network (2TBN). The state of node i at time point t+1 is determined by the states of its parent set before t+1, and is independent of the states of any other nodes. We use π_i to denote the parent set of node i; π_i is a subset of Y_t. For example, in Fig. 1b, Y^A_t and Y^C_t determine Y^B_{t+1}, so π_B = (Y^A_t, Y^C_t).

DBN-based causal discovery assumes causal sufficiency, the causal Markov condition, and faithfulness (Spirtes et al. 2001). Under these conditions, the causal relationships can be discovered by machine learning algorithms. Our algorithm generates a directed weighted graph G modeling the linear and nonlinear interactions among the cluster state variables, and uses a random forest-based method to find the parent set of each node. For a node Y^A_{t+1}, we construct a random forest ensemble to predict Y^A_{t+1} from the variables in Y_t = [Y^A(t), …, Y^Z(t)]; the implementation is similar to that in (Huynh-Thu et al. 2010). In the ensemble, each tree is constructed from a bootstrap sample of the original sample and, at each test node, a subset of variables is selected at random among all candidate variables in Y_t before determining the best split (which divides a node in a tree into two daughter nodes). To quantify variable importance, for each test node in a tree we compute the reduction of variance of the output variable due to the split. For a single tree, the importance of a variable is the sum of the variance reduction values over all tree nodes where this variable is used to split; for a tree ensemble, the importance score of a variable is the average over all trees. The variable importance of Y^B_t is used as the weight of the link Y^B_t → Y^A_{t+1}; higher weights represent stronger relationships. Random forests can model nonlinear and combinational interactions (interactions involving multiple nodes instead of pairwise interactions) and handle high-dimensional data. In our implementation, we adopt the random forest parameter tuning process described in (Huynh-Thu et al. 2010).
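The sketch below shows this GENIE3-style scoring loop using scikit-learn's RandomForestRegressor. Treating the binary next-step states as a regression target and reading edge weights from impurity-based feature importances mirrors (Huynh-Thu et al. 2010), but the hyperparameters here are illustrative, not the paper's tuned values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def infer_causal_graph(Y, n_trees=500, seed=0):
    """Weighted adjacency W with W[b, a] = weight of the link Y^b_t -> Y^a_{t+1}.

    Y: (K, T) binary matrix of cluster states.
    For each target cluster a, a random forest predicts Y^a(t+1) from Y(t);
    the importance of feature b becomes the weight of the edge b -> a.
    """
    K, T = Y.shape
    X_t = Y[:, :-1].T            # predictors: all cluster states at time t
    W = np.zeros((K, K))
    for a in range(K):
        y_next = Y[a, 1:]        # target: cluster a at time t + 1
        rf = RandomForestRegressor(n_estimators=n_trees,
                                   max_features="sqrt",  # random variable subsets
                                   random_state=seed)    # trees use bootstrap samples
        rf.fit(X_t, y_next)
        W[:, a] = rf.feature_importances_  # mean variance-reduction importances
    return W
```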
Results

We evaluated CAIM on simulated spike trains, on data from a biophysics-based simulation, and on real-world neural activity data from a delayed reaching task. All experiments were conducted on a workstation with an Intel Core i7-4720HQ CPU @ 2.6 GHz (4 cores, 8 virtual cores) and 16 GB of memory.

Simulated Spike Trains

In this experiment, we used simulated binary spike trains to evaluate CAIM. The interactions among clusters were described by a ground-truth DBN G*. An example of the structure of G* is depicted in Fig. 2a; parameters in G* were set to represent additive effects, and the transition probability table for node 5 is depicted in Fig. 2b. The data generation process included sampling and neural data generation. In the sampling step, we sampled G* and generated simulated data for the cluster states. Let Y^i_{1:T} be the trajectory of cluster i. In neural data generation, the trajectory of a neuron in cluster i was generated by flipping the binary state of Y^i_{1:T} with probability λ; λ represented the noise level, and 1−λ characterized the within-cluster homogeneity. We evaluated CAIM for various noise levels (subtask 1), cluster similarities (subtask 2), and numbers of clusters (subtask 3).

Fig. 2 The simulated spike train data. a The ground-truth DBN model describing temporal interactions among cluster state variables. b The transition probability table for cluster 5. c Spike trains of 60 neurons; noise level 0.1

To evaluate neuron clustering, we compared CAIM clustering with other clustering methods, including K-means, clustering by density peaks, and the fuzzy c-means (FCM) based method in (Fellous et al. 2004; Toups et al. 2011). K-means defines a cluster as a sphere around the cluster centroid; the number of clusters was estimated by the Calinski-Harabasz index, and K-means was randomly initialized 100 times. Clustering by density peaks is based on the idea that cluster centers are characterized by a higher density than their neighbors and by a relatively large distance from objects with higher densities; to detect the cluster structure, two parameters must be specified manually. In the FCM-based method, we first calculated a P×P distance matrix whose (i, j) element is the Manhattan distance between neurons i and j, and then applied FCM to the columns of the distance matrix; the number of clusters was determined by the gap statistic. Neuron clustering performance was evaluated by two cluster validity indexes, the Silhouette score and the Rand index (Ye 2003); higher values of either represent better clustering. The Silhouette score has a range of [−1, 1]: a score near 1 indicates that the sample is far from neighboring clusters, a score of 0 indicates that the sample is on or very close to the decision boundary, and negative values indicate poor assignment. The Rand index measures the similarity between the estimated labels and the ground-truth labels as a function of positive and negative agreements in pairwise cluster assignments; when two labelings agree perfectly, the Rand index is 1.

For causal discovery, we compared our causal network discovery algorithm to Bayesian network structure learning (BNS), Bayesian network structure learning with resampling (BNSR) (Chen et al. 2017), and GLMNET. In BNS, we used the algorithm in (Chen et al. 2012) to detect the parent set of Y_{t+1}. The association among nodes is measured by the Bayesian Dirichlet score (Chen et al. 2012), which is the marginal likelihood or evidence P(G | D), where D is the observed data. The Bayesian Dirichlet score is decomposable, so it can be maximized node by node: for each node Y_{t+1}, we used the algorithm in (Chen et al. 2012) to search for the set of nodes in Y_t that maximizes the Bayesian Dirichlet score, and this set of nodes is the parent set of Y_{t+1}. Based on these parent sets, we can generate a graph describing causal interactions. In BNSR, bootstrap resampling was used to stabilize the Bayesian network learning process: we resampled the original dataset 1000 times and used BNS to generate a DBN model for each resampled dataset. For an edge Y^B_t → Y^A_{t+1}, the edge strength was measured by the frequency with which this edge appeared in the model ensemble. In GLMNET, for Y^A_{t+1}, the variables in Y_t that were most predictive of Y^A_{t+1} were identified by lasso and elastic-net regularized generalized linear models (Friedman et al. 2010), with parameters tuned by internal cross-validation. To improve model stability, we used bootstrap resampling to resample the raw dataset 1000 times and generated a model for each resampled dataset; the model ensemble included 1000 models, and for a directed link Y^B_t → Y^A_{t+1} the link strength was measured by the frequency with which the link appeared in the ensemble. CAIM, BNSR, and GLMNET generated weighted directed graphs, in which a higher edge weight of Y^B_t → Y^A_{t+1} represents a stronger relationship between Y^B_t and Y^A_{t+1}; BNS generated an unweighted graph.

For causal discovery, we used the area under the ROC curve (AUC) to evaluate the algorithms' performance. AUC was calculated based on the generated graph and the ground-truth DBN; a higher AUC indicates better performance in detecting the ground-truth DBN structure.

In subtask 1, we evaluated CAIM with different noise levels. Sixty neurons were grouped into 6 clusters of 10 neurons each; in the simulation, T = 5000 and P = 60, and the structure of G* is depicted in Fig. 2a. Datasets were generated for three noise levels: 0.1 (low), 0.2 (medium), and 0.3 (high). The first 100 observations of all neurons for noise level 0.1 are depicted in Fig. 2c. In subtask 2, we evaluated CAIM with different cluster similarity levels. Again, 60 neurons were grouped into 6 clusters of 10 neurons each, with T = 5000, P = 60, the structure of G* as in Fig. 2a, and a noise level of 0.2. We varied the parameters of the ground-truth DBNs to generate datasets with different cluster similarity levels; for a dataset, cluster similarity was quantified by the average Hamming distance across all cluster pairs. We generated three datasets: low similarity (Hamming distance = 2696), middle similarity (Hamming distance = 1461), and high similarity (Hamming distance = 862); higher similarity is more challenging for neuron clustering. In subtask 3, we evaluated CAIM with different numbers of clusters. Each cluster had 10 neurons, and we generated datasets with 3 clusters (30 neurons), 6 clusters (60 neurons), and 9 clusters (90 neurons); the structure of G* was randomly generated, and the noise level was 0.2.
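As a sketch of the neural data generation step, the snippet below flips sampled cluster-state trajectories with probability λ to produce per-neuron spike trains. The ground-truth state sampler is stubbed with random states, since the actual G* is only shown in Fig. 2a; the function and variable names are ours.

```python
import numpy as np

def generate_neurons(cluster_states, neurons_per_cluster=10, lam=0.2, seed=0):
    """Flip each cluster's binary trajectory with probability lam per neuron.

    cluster_states: (K, T) binary matrix sampled from the ground-truth DBN.
    Returns a (K * neurons_per_cluster, T) binary spike-train matrix.
    """
    rng = np.random.default_rng(seed)
    s = np.repeat(cluster_states, neurons_per_cluster, axis=0)
    flips = rng.random(s.shape) < lam   # lam is the noise level
    return np.where(flips, 1 - s, s)    # 1 - lam is the within-cluster homogeneity

# stub for the sampling step: replace with trajectories sampled from G*
Y = np.random.default_rng(1).integers(0, 2, (6, 5000))
s = generate_neurons(Y, lam=0.2)        # (60, 5000), matching subtask 1
```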
Neuron clustering results for subtask 1 are summarized in Table 1, and Fig. 3 depicts the loading matrix of neuron clustering for noise level 0.3. For all noise levels, CAIM achieved the best clustering performance: it always detected the correct number of clusters and identified the correct cluster structure (Rand index = 1). Neuron clustering results for subtask 2 are summarized in Table 2; for all cluster similarity levels, CAIM consistently detected the correct number of clusters and identified the correct cluster structure. Neuron clustering results for subtask 3 are summarized in Table 3; for varying cluster numbers, CAIM again detected the correct number of clusters and identified the correct cluster structure. For all experimental conditions, CAIM and FCM consistently achieved higher Silhouette scores and Rand indexes than did K-means and clustering by density peaks; overall, CAIM achieved the highest Silhouette score and Rand index.

Table 1 Clustering results for the simulated spike trains with different noise levels

Noise level   Method                        Detected clusters   Silhouette score   Rand index
0.1           CAIM                          6                   0.268              1.000
0.1           K-means                       6                   0.268              1.000
0.1           Clustering by density peaks   7                   0.075              0.594
0.1           FCM                           6                   0.268              1.000
0.2           CAIM                          6                   0.113              1.000
0.2           K-means                       2                   0.062              0.259
0.2           Clustering by density peaks   4                   0.039              0.488
0.2           FCM                           6                   0.113              1.000
0.3           CAIM                          6                   0.043              1.000
0.3           K-means                       2                   0.024              0.259
0.3           Clustering by density peaks   6                   0.013              0.502
0.3           FCM                           10                  0.015              0.748

Fig. 3 The loading matrix of neuron clustering for subtask 1; noise level 0.3

Table 2 Clustering results for the simulated spike trains with different cluster similarities

Cluster similarity   Method                        Detected clusters   Silhouette score   Rand index
Low                  CAIM                          6                   0.145              1.000
Low                  K-means                       2                   0.117              0.259
Low                  Clustering by density peaks   5                   0.119              0.631
Low                  FCM                           6                   0.145              1.000
Middle               CAIM                          6                   0.108              1.000
Middle               K-means                       2                   0.065              0.259
Middle               Clustering by density peaks   4                   0.038              0.488
Middle               FCM                           6                   0.108              1.000
High                 CAIM                          6                   0.075              1.000
High                 K-means                       4                   0.058              0.550
High                 Clustering by density peaks   4                   0.020              0.414
High                 FCM                           10                  0.027              0.748

Table 3 Clustering results for the simulated spike trains with varying numbers of clusters

Number of clusters   Method                        Detected clusters   Silhouette score   Rand index
3                    CAIM                          3                   0.124              1.000
3                    K-means                       3                   0.124              1.000
3                    Clustering by density peaks   5                   0.008              0.293
3                    FCM                           3                   0.124              1.000
6                    CAIM                          6                   0.109              1.000
6                    K-means                       2                   0.062              0.259
6                    Clustering by density peaks   4                   0.043              0.363
6                    FCM                           6                   0.109              1.000
9                    CAIM                          9                   0.147              1.000
9                    K-means                       8                   0.138              0.876
9                    Clustering by density peaks   9                   0.106              0.823
9                    FCM                           10                  0.131              0.966

Figures 4, 5 and 6 depict the AUCs of BNS, BNSR, CAIM, and GLMNET for subtasks 1, 2, and 3, respectively. CAIM achieved the highest AUC in most combinations of experimental setups and thresholds; for threshold = 0.5, CAIM's AUCs were 1 in all scenarios, and CAIM was robust to the threshold used to infer the binary cluster states. CAIM and BNSR consistently achieved higher AUCs than did BNS and GLMNET. The typical execution times of BNS, BNSR, CAIM, and GLMNET were 0.23 s, 13.73 s, 8.28 s, and 1571.92 s, respectively: CAIM and BNSR had similar execution times, while GLMNET took much longer. The AUCs and execution times of BNSR and CAIM were similar, although the AUC of CAIM was consistently higher. Relative to BNS, BNSR achieved significantly higher AUC, because BNSR is an ensemble learning based method that achieves consistent estimates by combining solutions from different bootstrap-resampled training datasets. Collectively, these experiments demonstrate that CAIM can detect the cluster structure and achieves the best balance of high AUC and short running time; CAIM accurately inferred the causal relationships.

Fig. 4 AUCs of BNS, BNSR, CAIM, and GLMNET for the simulated spike train data with varying noise levels

Fig. 5 AUCs of BNS, BNSR, CAIM, and GLMNET for the simulated spike train data with varying cluster similarity levels

Fig. 6 AUCs of BNS, BNSR, CAIM, and GLMNET for the simulated spike train data with varying cluster numbers

Biophysics-Based Simulation

In this experiment, a biophysics-based simulation was used to assess CAIM. The simulation modeled interactions among a set of integrate-and-fire (I&F) neurons with noise; such a neuron model can represent virtually all postsynaptic potentials or currents described in the literature (e.g., α-functions and bi-exponential functions) (Brette et al. 2007). The neuron model (Gütig and Sompolinsky 2006) is as follows:

dV/dt = −(V − V_rest)/τ + σ τ^(−0.5) ε    (1)

where V is the membrane potential, V_rest is the rest potential, ε is a Gaussian random variable with mean 0 and standard deviation 1, τ is the membrane time constant, and σ is a parameter controlling the noise term. Spikes received through the synapses trigger changes in V. A neuron fires if V exceeds a threshold, and it cannot generate a second spike for a brief time after the first one (refractoriness).
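A minimal Euler-discretized sketch of Eq. (1) with threshold firing and refractoriness appears below; the numerical values (threshold, time step, refractory period, synaptic weight w) are illustrative assumptions rather than the simulation's actual settings.

```python
import numpy as np

def simulate_if_neuron(spike_input, w=2.0, v_rest=0.0, v_thresh=1.0,
                       tau=20.0, sigma=0.3, dt=1.0, refractory=5, seed=0):
    """Euler integration of Eq. (1) plus synaptic drive and refractoriness.

    spike_input: (T,) counts of presynaptic spikes per time step; each
    presynaptic spike increments the membrane potential by w.
    Returns a binary (T,) spike train.
    """
    rng = np.random.default_rng(seed)
    T = len(spike_input)
    v, spikes, blocked = v_rest, np.zeros(T, dtype=int), 0
    for t in range(T):
        eps = rng.standard_normal()
        v += dt * (-(v - v_rest) / tau + sigma * tau ** -0.5 * eps)
        v += w * spike_input[t]          # synaptic drive from parent neurons
        if blocked > 0:                  # refractory: no second spike allowed
            blocked -= 1
            continue
        if v > v_thresh:                 # fire and reset
            spikes[t], v, blocked = 1, v_rest, refractory
    return spikes

# group A receives an external stimulus; a B neuron has two A parents
stim = (np.random.default_rng(1).random(2000) < 0.05).astype(int)
a1 = simulate_if_neuron(stim, seed=2)
a2 = simulate_if_neuron(stim, seed=3)
b = simulate_if_neuron(a1 + a2, seed=4)
```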
Our simulation included 160 neurons in four groups, A, B, C, and D, with 40 neurons per group. The ground-truth causal graph is depicted in Fig. 7a. Neurons in group A had no parent nodes; they all received a stimulus. Neurons in group B had two or three neurons in group A as parent nodes, as did neurons in group C. If a parent node fired, the membrane potential of the target node increased by w; the w of the connections between A and B differed from that of the connections between A and C. Firing of neurons in groups B and C caused firing of neurons in group D. The simulated spike trains are depicted in Fig. 7b.

CAIM accurately detected the cluster structure, with a Rand score of 0.98. The weighted graph was robust to the threshold used to infer the binary cluster states, remaining stable for thresholds in [0.3, 0.7]; we chose a threshold of 0.5. The edge weights had a bimodal distribution: the weights of Y^A_t → Y^B_{t+1}, Y^A_t → Y^C_{t+1}, Y^B_t → Y^D_{t+1}, and Y^C_t → Y^D_{t+1} were 0.90, 0.90, 0.50, and 0.47, respectively, while all other edges had very low weights. The strong links match the strong causal relationships in Fig. 7a. Overall, CAIM was able to identify the causal relationships between these neuron groups.

Fig. 7 Causal discovery results for the biophysics-based simulation. a The ground-truth causal graph. b The spike trains of the cluster states (the first 200 frames)
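The AUC metric used in these experiments can be computed by scoring a recovered weighted adjacency matrix against the binary ground-truth graph, as sketched below with scikit-learn's roc_auc_score; excluding the self-links (persistence edges) from scoring is our simplifying choice, not a detail stated in the paper.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def graph_auc(w_est, g_true):
    """AUC of a weighted adjacency matrix against a binary ground-truth graph.

    w_est: (K, K) edge weights, where w_est[b, a] scores the link b -> a.
    g_true: (K, K) binary ground-truth adjacency.
    """
    off_diag = ~np.eye(len(g_true), dtype=bool)   # self-links excluded
    return roc_auc_score(g_true[off_diag].ravel(), w_est[off_diag].ravel())

# toy usage against the Fig. 7a pathways A -> B, A -> C, B -> D, C -> D
g_true = np.zeros((4, 4), dtype=int)
for b, a in [(0, 1), (0, 2), (1, 3), (2, 3)]:
    g_true[b, a] = 1
w_est = g_true * 0.9 + np.random.default_rng(0).random((4, 4)) * 0.05
print(graph_auc(w_est, g_true))                   # close to 1 for a good ranking
```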
Real-World Neural Activity Data for a Delayed Reaching Task

CAIM was evaluated on a spike dataset acquired during the delay period of a standard delayed reaching task (Santhanam et al. 2009). A male rhesus monkey performed a standard instructed-delay center-out reaching task; the animal protocols were approved by the Stanford University Institutional Animal Care and Use Committee. The dataset contains spike trains recorded simultaneously by a silicon electrode array (Cyberkinetics, Foxborough, MA) from 61 neurons in the right premotor cortex. The dataset contains two experimental conditions (conditions 1 and 2), each with 56 trials, and the spike trains have lengths between 1018 ms and 1526 ms. Spike trains were binned using non-overlapping bins with a width of 20 ms; this bin size has been found to work well for population activity recorded in the motor cortex (Cowley et al. 2012). Among the 61 neurons, 16 had a low firing rate (<5 spikes/s) and were excluded from the analysis. Excluding these low-firing neurons from causal discovery does not exclude the possibility that they contributed to the observed ensemble activity; we excluded them because they had too few active states to firmly establish causal relationships (Chen et al. 2008).

CAIM found 4 clusters. Figure 8a depicts the loading matrix of neuron clustering. The average within-cluster relative mutual information was 0.187, while the average between-cluster relative mutual information was 0.016; these results demonstrate good cluster separation. The detected causal networks are depicted in Fig. 8b and c, which show the networks for the two conditions; strong links (edges with weights greater than the median weight) are shown. Both causal graphs demonstrated persistence: for each cluster, Y_{t+1} is driven by Y_t. Persistence may reflect continuous firing. The causal graphs for the two conditions also had significant structural differences. In condition 1, Y^A_{t+1} was strongly driven by Y^A_t and one other cluster state (Fig. 8b); this pattern changed in condition 2, where Y^A_{t+1} was driven by Y^A_t and Y^B_t. Y^B_{t+1} was driven by Y^B_t and Y^D_t in condition 1, but by Y^A_t, Y^B_t, and Y^C_t in condition 2. Y^D_{t+1} was driven by Y^B_t and Y^D_t in condition 1, but only by Y^D_t in condition 2. In this analysis, the conditions were predetermined by the experimental design. Our analysis of the reach-task data demonstrates that CAIM can be used for differential causal graph analysis.

Fig. 8 Causal discovery results for the reach-task dataset. a The loading matrix for neuron clustering; rows are neurons (split by cluster label) and columns are clusters. b and c DBNs for conditions 1 and 2; edge weights represent the strength of connectivity
Discussion

We propose a causal discovery method, CAIM, that is based on DBNs and is capable of revealing causal interactions among neural dynamics. Relative to static network analysis, CAIM can model complex spatiotemporal patterns of circuit activity related to a cognitive process or behavior.

We validated CAIM on two simulated studies and on a real-world spike dataset acquired during the delay period of a standard delayed reaching task. In the simulated spike train experiment, we demonstrated that CAIM accurately detected causal relationships among neuron clusters, and we compared CAIM with other methods: for neuron clustering, CAIM achieved a higher Rand index than K-means and clustering by density peaks; for causal discovery, compared to BNS, BNSR, and GLMNET, CAIM achieved the best balance of AUC and running time. In the biophysics-based simulation, we generated simulated data for a set of integrate-and-fire neurons with noise; these neurons formed four clusters, and CAIM accurately identified the cluster structure and the causal relationships between the neuron clusters. In the delayed reaching experiment, 45 neurons formed 4 clusters, and the causal graphs for the two experimental conditions differed: the parent sets of nodes A, B, and D were different between the conditions. Collectively, these experiments demonstrate that CAIM is a powerful computational framework for detecting causal relationships among neural dynamics.

The network generated by CAIM differs from that generated by synchrony analysis. Synchrony analysis centers on calculating the cross-correlation between two neural time courses, whereas CAIM models the transition dynamics among neural time courses; the two approaches therefore provide complementary information about a cognitive process.

The network model generated by CAIM is explainable: it is a graphical model and has excellent interpretability. CAIM is also expandable. Its computational framework can be used for other applications, such as modeling cortical traveling waves (Muller et al. 2018). Using the CAIM framework, we can detect clusters of neurons with zero-lag synchrony and then model information propagation along a pathway, focusing on the pattern in which activation of cluster A at time point t leads to activation of cluster B at time point t+1. The biophysics-based simulation provides an example of information propagation along the pathway A → B → D.

We have previously developed dynamic network analysis algorithms to model interactions among neural signals at a macroscopic scale (Chen et al. 2012; Chen et al. 2017; Chen and Herskovits 2015). CAIM and dynamic network analysis handle different kinds of temporal data. Dynamic network analysis is designed to generate a network model from longitudinal MR data, which are short temporal sequences: for most longitudinal image data, the number of visits per subject is small, often fewer than ten. Dynamic network analysis therefore requires data from many subjects to generate a stable model, under the assumption that the brain network model is invariant across subjects. CAIM, in contrast, is designed to generate a network model from data streams that include thousands of data points, so it does not assume that the brain network model is invariant across subjects.

Bayesian methods have been used to model neural activity data. Ma et al. proposed a Bayesian framework describing how populations of neurons represent uncertainty to perform Bayesian inference (Ma et al. 2006): the probabilistic relationship between stimuli and response is formalized as P(response | stimuli), and decoding is performed by a two-layer feed-forward neural network in which the output-layer neurons compute the product of the input likelihood functions. Friston suggested a strong correspondence between the anatomical organization of the neocortex and hierarchical Bayesian generative models (Friston 2003). In (George and Hawkins 2009), a Bayesian model for cortical circuits was proposed; it describes Bayesian belief propagation in a spatio-temporal hierarchical model called hierarchical temporal memory (HTM), in which an HTM node abstracts space as well as time and HTM graphs use Bayesian belief propagation for inference. Deneve proposed a Bayesian neuron model in which spike trains provide a deterministic, online representation of a log probability ratio (Deneve 2005). However, these studies of Bayesian analysis of neural activity data do not center on causality inference.

The causal sufficiency assumption is widely used in causal discovery to make the discovery process computationally tractable. However, if there is an unmeasured time series Z that influences the observed time series Y, then an approach based on the causal sufficiency assumption can lead to incorrect causal conclusions. This is one of the limitations of CAIM. Our future research will address this limitation by introducing latent variables that represent unmeasured time series and using expectation maximization (EM) to infer the properties of partially observed Markov processes (Geiger et al. 2015).
In CAIM, we assume that the causal structure is invariant across time points. If the dependencies in the underlying process change over time, the generated model is an average over the different temporal dependency structures. In the future, we will extend CAIM to handle time-varying causal graphs: in that framework, we will generate a causal graph for each time point and aggregate these causal graphs.

In the current framework, we generate a ranking of potential causal interactions. In some real-world applications, a threshold on this ranking must be determined to obtain a binary causal graph, and in future work we will develop algorithms to address this challenge. One method is based on the likelihood function: for a generated binary graph, we can calculate a score representing the likelihood that the observed data were generated from that graph, and choose the threshold that maximizes this likelihood (Chen and Herskovits 2015). This process should be performed inside a cross-validation procedure to avoid overfitting.

In this paper, the interactions among neural activities are represented by a 2TBN, which represents a first-order time-invariant Markov process; we adopted the 2TBN representation to simplify the computation. In CAIM, we group neurons into clusters, effectively reducing the dimensionality of the model space. An alternative approach to dimension reduction is to project the variables into a low-dimensional space and model the dynamics among the latent variables; in the future, we will develop such algorithms.

In conclusion, CAIM provides a powerful computational framework for inferring causal graphs from high-dimensional observational neural activity data. We envisage that CAIM will be of great value in understanding the spatiotemporal patterns of circuit activity related to a specific behavior.
Information Sharing Statement

The data of the delayed reaching task is available at https://users.ece.cmu.edu/~byronyu/software/DataHigh/get_started.html. The simulated data and the software package are freely available for academic purposes on request.

Acknowledgments This work was supported by the NIH NINDS (R01NS110421) and the BRAIN Initiative.

Compliance with Ethical Standards

Declaration of Interest None.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

Arya, S., Mount, D. M., Netanyahu, N. S., Silverman, R., & Wu, A. Y. (1998). An optimal algorithm for approximate nearest neighbor searching fixed dimensions. JACM, 45(6), 891–923.

Averbeck, B. B., Latham, P. E., & Pouget, A. (2006). Neural correlations, population coding and computation. Nat Rev Neurosci, 7(5), 358–366.

Barbera, G., Liang, B., Zhang, L., Gerfen, C. R., Culurciello, E., Chen, R., Li, Y., & Lin, D. T. (2016). Spatially compact neural clusters in the dorsal striatum encode locomotion relevant information. Neuron, 92(1), 202–213.

Bar-Joseph, Z., Gitter, A., & Simon, I. (2012). Studying and modelling dynamic biological processes using time-series gene expression data. Nat Rev Genet, 13(8), 552–564.

Brette, R., Rudolph, M., Carnevale, T., Hines, M., Beeman, D., Bower, J. M., Diesmann, M., Morrison, A., Goodman, P. H., Harris Jr., F. C., Zirpe, M., Natschläger, T., Pecevski, D., Ermentrout, B., Djurfeldt, M., Lansner, A., Rochel, O., Vieville, T., Muller, E., Davison, A. P., el Boustani, S., & Destexhe, A. (2007). Simulation of networks of spiking neurons: A review of tools and strategies. J Comput Neurosci, 23(3), 349–398.

Chen, R., & Herskovits, E. H. (2007). Clinical diagnosis based on Bayesian classification of functional magnetic-resonance data. Neuroinformatics, 5(3), 178–188.

Chen, R., & Herskovits, E. H. (2015). Predictive structural dynamic network analysis. J Neurosci Methods, 245, 58–63.

Chen, R., Hillis, A. E., Pawlak, M., & Herskovits, E. H. (2008). Voxelwise Bayesian lesion-deficit analysis. Neuroimage, 40(4), 1633–1642.

Chen, R., Resnick, S. M., Davatzikos, C., & Herskovits, E. H. (2012). Dynamic Bayesian network modeling for longitudinal brain morphometry. Neuroimage, 59(3), 2330–2338.

Chen, R., Zheng, Y., Nixon, E., & Herskovits, E. H. (2017). Dynamic network model with continuous valued nodes for longitudinal brain morphometry. Neuroimage, 155, 605–611.

Chen, Y., Bressler, S. L., & Ding, M. (2006). Frequency decomposition of conditional Granger causality and application to multivariate neural field potential data. J Neurosci Methods, 150(2), 228–237.

Churchland, M. M., Yu, B. M., Sahani, M., & Shenoy, K. V. (2007). Techniques for extracting single-trial activity patterns from large-scale neural recordings. Curr Opin Neurobiol, 17(5), 609–618.

Cowley, B. R., Kaufman, M. T., Churchland, M. M., Ryu, S. I., Shenoy, K. V., & Yu, B. M. (2012). DataHigh: Graphical user interface for visualizing and interacting with high-dimensional neural activity. Proc Annu Int Conf IEEE Eng Med Biol Soc EMBS, 10(6), 4607–4610.

Deneve, S. (2005). Bayesian inference in spiking neurons. In L. K. Saul, Y. Weiss, & L. Bottou (Eds.), Advances in Neural Information Processing Systems, 17 (pp. 353–360). Vancouver: MIT Press.

Eldawlatly, S., Zhou, Y., Jin, R., & Oweiss, K. G. (2010). On the use of dynamic Bayesian networks in reconstructing functional neuronal networks from spike train ensembles. Neural Comput, 22(1), 158–189.

Fellous, J.-M., Tiesinga, P. H., Thomas, P. J., & Sejnowski, T. J. (2004). Discovering spike patterns in neuronal responses. J Neurosci, 24(12), 2989–3001.

Fisher, F. M. (1970). A correspondence principle for simultaneous equation models. Econom J Econom Soc, 73–92.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J Stat Softw, 33(1), 1–22.

Friston, K. (2003). Learning and inference in the brain. Neural Netw, 16(9), 1325–1352.

Geiger, D., & Pearl, J. (1990). On the logic of causal models. In Machine Intelligence and Pattern Recognition, vol. 9 (pp. 3–14). Elsevier.

Geiger, P., Zhang, K., Schoelkopf, B., Gong, M., & Janzing, D. (2015). Causal inference by identification of vector autoregressive processes with hidden components. In International Conference on Machine Learning (pp. 1917–1925). Lille, France: ICML'15.

George, D., & Hawkins, J. (2009). Towards a mathematical theory of cortical micro-circuits. PLoS Computational Biology, 5(10), e1000532.

Ghosh, K. K., Burns, L. D., Cocker, E. D., Nimmerjahn, A., Ziv, Y., Gamal, A. E., & Schnitzer, M. J. (2011). Miniaturized integration of a fluorescence microscope. Nat Methods, 8(10), 871–878.

Granger, C. W. J. (1969). Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37(3), 424–438.

Gütig, R., & Sompolinsky, H. (2006). The tempotron: A neuron that learns spike timing-based decisions. Nat Neurosci, 9(3), 420–428.

Hafting, T., Fyhn, M., Molden, S., Moser, M.-B., & Moser, E. I. (2005). Microstructure of a spatial map in the entorhinal cortex. Nature, 436(7052), 801–806.

Harvey, C. D., Coen, P., & Tank, D. W. (2012). Choice-specific sequences in parietal cortex during a virtual-navigation decision task. Nature, 484(7392), 62–68.

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer. https://doi.org/10.1007/978-0-387-84858-7.

Hu, M., Li, W., & Liang, H. (2018). A copula-based Granger causality measure for the analysis of neural spike train data. IEEE/ACM Trans Comput Biol Bioinforma, 15(2), 562–569.

Huynh-Thu, V. A., Irrthum, A., Wehenkel, L., & Geurts, P. (2010). Inferring regulatory networks from expression data using tree-based methods. PLoS One, 5(9), 1–10. https://doi.org/10.1371/journal.pone.0012776.

Kerr, J. N. D., & Nimmerjahn, A. (2012). Functional imaging in freely moving animals. Curr Opin Neurobiol, 22(1), 45–53.

Ko, H., Cossell, L., Baragli, C., Antolik, J., Clopath, C., Hofer, S. B., & Mrsic-Flogel, T. D. (2013). The emergence of functional microcircuits in visual cortex. Nature, 496(7443), 96–100.

Koller, D., & Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. Cambridge: MIT Press.

Litwin-Kumar, A., & Doiron, B. (2012). Slow dynamics and high variability in balanced cortical networks with clustered connections. Nat Neurosci, 15(11), 1498–1505.

Luczak, A., Bartho, P., Marguet, S. L., Buzsaki, G., & Harris, K. D. (2007). Sequential structure of neocortical spontaneous activity in vivo. Proc Natl Acad Sci U S A, 104(1), 347–352.

Ma, W. J., Beck, J. M., Latham, P. E., & Pouget, A. (2006). Bayesian inference with probabilistic population codes. Nat Neurosci, 9(11), 1432–1438.

Meek, C. (1995). Strong completeness and faithfulness in Bayesian networks. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence (pp. 411–418). San Francisco: Morgan Kaufmann Publishers Inc.

Meyer, P. E., Kontos, K., Lafitte, F., & Bontempi, G. (2007). Information-theoretic inference of large transcriptional regulatory networks. EURASIP J Bioinforma Syst Biol, 2007, 79879.

Muller, L., Chavane, F., Reynolds, J., & Sejnowski, T. J. (2018). Cortical travelling waves: Mechanisms and computational principles. Nat Rev Neurosci, 19(5), 255–268.
Park, S., Kim, J. M., Shin, W., Han, S. W., Jeon, M., Jang, H. J., et al. (2018). BTNET: Boosted tree based gene regulatory network inference algorithm using time-course measurement data. BMC Systems Biology, 12(2), 69–77. https://doi.org/10.1186/s12918-018-0547-0.

Pearl, J. (2009). Causality: Models, Reasoning and Inference (2nd ed.). New York: Cambridge University Press.

Pons, P., & Latapy, M. (2006). Computing communities in large networks using random walks. J Graph Algorithms Appl, 10, 191–218.

Pregowska, A., Szczepanski, J., & Wajnryb, E. (2015). Mutual information against correlations in binary communication channels. BMC Neurosci, 16, 32.

Santhanam, G., Yu, B. M., Gilja, V., Ryu, S. I., Afshar, A., Sahani, M., & Shenoy, K. V. (2009). Factor-analysis methods for higher-performance neural prostheses. J Neurophysiol, 102(2), 1315–1330.

Sauerbrei, W., Boulesteix, A.-L., & Binder, H. (2011). Stability investigations of multivariable regression models derived from low- and high-dimensional data. J Biopharm Stat, 21(6), 1206–1231.

Schneidman, E., Berry, M. J., Segev, R., & Bialek, W. (2006). Weak pairwise correlations imply strongly correlated network states in a neural population. Nature, 440(7087), 1007–1012.

Scott, B. B., Brody, C. D., & Tank, D. W. (2013). Cellular resolution functional imaging in behaving rats using voluntary head restraint. Neuron, 80(2), 371–384.

Song, S., et al. (2005). Highly nonrandom features of synaptic connectivity in local cortical circuits. PLoS Biology, 3(3), e68.

Spirtes, P., Glymour, C., & Scheines, R. (2001). Causation, Prediction, and Search (2nd ed.). Cambridge: MIT Press.

Toups, J. V., Fellous, J.-M., Thomas, P. J., Sejnowski, T. J., & Tiesinga, P. H. (2011). Finding the event structure of neuronal spike trains. Neural Comput, 23(9), 2169–2208.

Wiwie, C., Baumbach, J., & Röttger, R. (2018). Guiding biomedical clustering with ClustEval. Nat Protoc, 13(6), 1429–1444.

Ye, N. (2003). The Handbook of Data Mining. Mahwah: Lawrence Erlbaum Associates.

Yoshimura, Y., & Callaway, E. M. (2005). Fine-scale specificity of cortical networks depends on inhibitory cell type and connectivity. Nat Neurosci, 8(11), 1552–1559.

Yoshimura, Y., Dantzker, J. L. M., & Callaway, E. M. (2005). Excitatory cortical neurons form fine-scale functional networks. Nature, 433, 868–873.

Zohary, E., Shadlen, M. N., & Newsome, W. T. (1994). Correlated neuronal discharge rate and its implications for psychophysical performance. Nature, 370(6485), 140–143.

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
