A Review of Hidden Markov Models and Recurrent Neural Networks for Event Detection and Localization in Biomedical Signals

Yassin Khalifa (a), Danilo Mandic (b) and Ervin Sejdić (a, c, d, e, *)

(a) Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA
(b) Department of Electrical and Computer Engineering, Imperial College, London, SW7 2BT, United Kingdom
(c) Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
(d) Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
(e) Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
(*) Corresponding author. Email address: esejdic@ieee.org (E. Sejdić). URL: www.imedlab.org (E. Sejdić). ORCID(s): 0000-0003-4987-8298 (E. Sejdić).

Keywords: Event Detection; Hidden Markov Models; Recurrent Neural Networks; Deep Learning; Biomedical Signal Processing; Transfer Learning

ABSTRACT

Biomedical signals carry signature rhythms of complex physiological processes that control our daily bodily activity. The properties of these rhythms indicate the nature of the interaction dynamics among the physiological processes that maintain homeostasis. Abnormalities associated with diseases or disorders usually appear as disruptions in the structure of these rhythms. This makes it indispensable to isolate the rhythms and to differentiate between them. Computer aided diagnosis systems are ubiquitous nowadays in almost every medical facility and, more intimately, in wearable technology. Rhythm or event detection is the first of many intelligent steps that these systems perform. How are these rhythms isolated? How can we develop a model that describes the transitions between processes in time? Many methods in the literature address these questions and decode biomedical signals into separate rhythms. Here, we demystify the most effective methods used to detect and isolate rhythms or events in time series. We highlight how they have been applied to different biomedical signals and how they contribute to information fusion. We also discuss the key strengths and limitations of these methods, as well as the challenges encountered when applying them to biomedical signals.

1. Introduction

Physiological processes are complex tasks performed by the different systems of the human body. They are rarely periodic and usually irregular, and they deliver an action that could be biochemical, electrical, or mechanical [1, 2]. Some of these actions are obvious, like the heartbeat, breathing, and other physical activities. Others are less obvious, like the hormonal stimulation that regulates multiple body functions. The action produced usually manifests as some sort of signal that holds information about the parent physiological process [2]. Disruptions in these physiological processes associated with diseases lead to the development of pathological processes that alter the performance of the human body. The manifested signals, and the associated changes in their waveforms, hold both normal and pathological processes, in addition to artifacts from the environment and surrounding processes. These signals are called biomedical signals and can take many forms, including electrical (potential or current changes) and physical (force or temperature) [2].

Artificial intelligence is currently being used to empower a variety of assistive technologies that help solve the problems of the healthcare sector, given the continuously increasing cost and shortage of professional caregivers. These technologies are advancing to perform not only diagnosis but also intervention and treatment, owing to their superior sensitivity, adaptability, and fast response. Among these assistive technologies, computer aided diagnosis and wearable systems are powered by the virtual side of artificial intelligence (machine learning techniques). They play a vital role in anomaly detection, monitoring, and even emergency response [3]. The rise of such systems has led to the evolution of biomedical signal analysis, which has been the focus of researchers for the last couple of decades. This evolution included not only the macro-analysis of gross processes but also the detection and analysis of micro-events within each gross process [3]. As mentioned before, biomedical signals carry the signatures of many processes and artifacts. This makes the extraction/identification of the specific part of interest (called an event or epoch) the first step of any systematic signal analysis or monitoring [4]. Further, the need for robust event extraction algorithms for biomedical signals is driven by the exponential growth in the amount and complexity of data generated by biomedical systems [5]. Moreover, reducing the human-dependent steps in the analysis mitigates the reliability and subjectivity issues associated with human tolerance.

Epoch extraction is not only essential for systematic signal analysis but also substantial to information fusion for multichannel systems and/or sensor networks. Such systems represent a large portion of today's biomedical-signal-based decision-making systems. Multiple fusion models can employ epoch extraction and event detection to overcome different obstacles, including but not limited to signal synchronization and feature fusion [6, 7]. In complementary data-level fusion, events can be used to align signals in preparation for feature extraction, such as using heartbeats to align the signals from multiple electrocardiography (ECG) leads.
In feature-level fusion models, event detection can be used to combine features from different signals only during the events of interest that contribute to morphology analysis and the decision-making process [7, 8].

Khalifa et al., 2020. arXiv:2012.06104v1 [cs.LG], 11 Dec 2020.

Epoch extraction algorithms have been used repeatedly in the segmentation of many biomedical signals, including, but not limited to, heart sounds and ECG [9, 10], electroencephalography (EEG) [11–13], and swallowing vibrations [14–17]. Such algorithms depend immensely on modeling time series. Regular machine learning and sequence-agnostic models, such as support vector machines, regression, and feed-forward neural networks, do not explicitly provide this paradigm [18]. These models rest on a major assumption that training and test examples are independent and not related in time or space, so the entire state of the model is reset after each example [18]. Specifically, splitting a time series into data chunks and using consecutive chunks independently to build models is unacceptable. Even when modeling a time series with i.i.d. processes, the underlying processes might be longer than a single chunk, which induces dependency between consecutive chunks. The sliding window approach was introduced to tackle the dependence between consecutive chunks. It uses an overlap that guarantees that part of each chunk is carried over to the next chunk. Although this might be useful in modeling many processes, it fails to model long-range dependencies. It also requires optimizing both the chunk and overlap lengths to best represent the target processes. Additionally, windowing in the time domain distorts the frequency representation because of spectral leakage, and it can only be used for modeling fixed-length input/output scenarios [18]. All of this raised the need for models that are capable of:

- selectively transferring states across time,
- processing sequences of not necessarily independent elements, and
- providing a computational paradigm that can handle variable-length inputs and outputs [19].

It was not long before researchers started to bring in stochastic models [20] and design deep recurrent networks [19]. These were aimed at fitting event extraction problems and overcoming the limitations of regular machine learning methodologies.

Multiple models have been offered for representing time dependency, including hidden Markov models (HMMs) and recurrent neural networks (RNNs). HMMs were introduced as an extension of Markov chains to probabilistically model a sequence of observations based on an unobserved sequence of states [20]. RNNs, on the other hand, generalize feed-forward neural networks with the ability to process sequential data one step at a time while selectively transferring information across sequence elements [18]. Hence, RNNs are successful in modeling sequences with unknown length, components that are not independent, and multi-scale sequential dependencies [19, 21, 22]. Further, RNNs overcame a major HMM limitation in modeling long-range dependencies within sequences [18, 23].

In this manuscript, we review the fundamental methods developed for event extraction in biomedical signals. We unravel the key differences between these methods based on state-of-the-art practices and results. We show the theoretical and practical aspects of most of the methods and the way in which they were used to handle time modeling in event detection problems. Further, we discuss recent major machine learning applications in biomedical signal processing and the anticipated advances for future implementations.

2. Hidden Markov Models

A time series can be characterized using either deterministic or stochastic models. Deterministic models usually describe the series through specific properties, such as being a sum of sinusoids or exponentials. They aim to estimate the values of the parameters contributing to these properties (e.g., the amplitude, frequency, and phase of the sinusoids) [20]. Statistical models, on the other hand, assume that the series can be described by a parametric random process whose parameters can be estimated in a well-defined way [20, 24]. HMMs belong to the category of statistical models and are usually referred to in the literature as probabilistic functions of Markov chains [20, 25].

2.1. Markov Chains

A Markov chain is a stochastic process modeled by a finite state machine that, at any instant of time, is in one of $N$ distinct states. These states can be tags or symbols representing the problem of interest. At regularly spaced discrete times, the machine may stay in the same state or switch to another state. This happens according to a set of transition probabilities associated with each state [20, 24], and these transition probabilities are assumed to be time independent. The initial state is deemed to be known. The transition probabilities are described by the transition matrix $A = \{a_{ij}\}$, where $a_{ij}$ is the transition probability from state $S_i$ to state $S_j$, and both $i$ and $j$ take values from 1 to $N$. The actual state at time $t$ is denoted $q_t$. A full description of the probabilistic model requires specifying the current state as well as at least the state preceding it (for a first-order Markov chain). The first-order Markov chain assumes that the current state depends only on the previous state:

$$P(q_t = j \mid q_{t-1} = i, q_{t-2} = k, \ldots) = P(q_t = j \mid q_{t-1} = i).$$

This results in the following properties for the transition probabilities:

$$a_{ij} = P(q_t = j \mid q_{t-1} = i), \qquad 1 \le i, j \le N,$$
$$a_{ij} \ge 0, \qquad \sum_{j=1}^{N} a_{ij} = 1.$$

The probability of being in state $S_i$ at $t = 1$ is denoted $\pi_i$, and the initial probability distribution is:

$$\pi_i = P[q_1 = S_i], \quad 1 \le i \le N, \qquad \pi = [\pi_1, \pi_2, \ldots, \pi_N].$$

An example of a 4-state Markov chain is shown in Fig. 1. This stochastic process is called an observable Markov model, since each state corresponds to a visible (observable) event.

Figure 1: An example of a Markov chain with 4 states, S1 to S4, and selected state transitions. A set of probabilities is associated with each state. These probabilities indicate how the system changes from one state to itself or to another at regular discrete times.

2.2. Hidden Markov Models

So far, we have introduced Markov chains in which each state corresponds to an observable event. This is insufficient for most applications, where the states cannot always be observed. Markov chain models are therefore extended to HMMs, which are widely used in many applications [20]. An HMM is a doubly stochastic process. It consists of an underlying stochastic process (the state sequence) that is hidden from the observer, and a second process that produces the observations [20]. An HMM is characterized by the following properties [20, 24]:

1. The number of states, $N$, included in the model, so that $q_t \in \{S_1, S_2, \ldots, S_N\}$. As mentioned before, the states are usually hidden in HMMs, but sometimes they have a physical significance.
2. The number of distinct observations a state can take, $M$.
3. The state transition matrix or distribution $A = \{a_{ij}\}$.
4. The observation probability distribution for each state, $B = \{b_j(k)\}$, with $b_j(k) = P[v_k \text{ at } t \mid q_t = S_j]$. Here $v_k$ represents an element of the distinct observations that a state can take, $1 \le j \le N$, and $1 \le k \le M$.
5. The initial state distribution $\pi = \{\pi_i\}$.

When known, these parameters fully describe the HMM, $\lambda = (A, B, \pi)$. They can be used to generate an observation sequence $O = \{O_1, O_2, \ldots, O_T\}$ as shown in Algorithm 1.

Algorithm 1: HMM as observations generator
  Set $t = 1$;
  Choose an initial state $q_1 = S_i$ according to $\pi$;
  while $t \le T$ do
    Choose $O_t = v_k$ according to the observation distribution in the current state, $b_i(k)$;
    Move from the current state $S_i$ to the new state $q_{t+1} = S_j$ according to $a_{ij}$;
    Set $t = t + 1$;
  end
  Result: $O = \{O_1, O_2, \ldots, O_T\}$

For the model to be useful in practical applications, three fundamental problems must be addressed [26]:

• Likelihood: Computing the probability of an observation sequence $O = \{O_1, O_2, \ldots, O_T\}$, given the model, $P(O \mid \lambda)$.
• Decoding: Choosing the optimal hidden state sequence $Q = \{q_1, q_2, \ldots, q_T\}$ that best represents a given observation sequence $O = \{O_1, O_2, \ldots, O_T\}$.
• Estimation: Adjusting the model parameters $\lambda = (A, B, \pi)$ to maximize the likelihood of a given sequence of observations $O$.

2.3. Likelihood Problem Solution

In Markov chains, where the states are not hidden, computing the likelihood is much easier. The computational burden narrows to multiplying the transition probabilities along the underlying state sequence. In HMMs, the states are hidden, so computing the joint probability requires including all possible state sequences ($N^T$ possible hidden state sequences). A dynamic programming solution, the forward-backward algorithm, was created for the likelihood problem [20]. Its time complexity is $O(N^2 T)$, instead of the exponential cost of enumerating every sequence. The forward-backward algorithm sums the probabilities of all possible state sequences that could have generated the target observation sequence. It does so efficiently by defining and inductively computing the forward variable $\alpha(t, i) = P(O_1 O_2 \cdots O_t, q_t = S_i \mid \lambda)$ [20, 27, 28]. This is the joint probability of the partial observation sequence up to time $t$ and of being in state $S_i$ at time $t$. The forward algorithm for the likelihood problem is fully described as follows:

Algorithm 2: The forward algorithm
  Input: $O = \{O_1, O_2, \ldots, O_T\}$; $S \in \{S_1, S_2, \ldots, S_N\}$
  Create the forward probability table $\alpha[T, N]$;
  foreach state $S \in \{S_1, S_2, \ldots, S_N\}$ do
    $\alpha[1, S] \leftarrow \pi_S \, b_S(O_1)$;  // Initialization
  end
  foreach time step $t \in \{2, 3, \ldots, T\}$ do
    foreach state $S \in \{S_1, S_2, \ldots, S_N\}$ do
      $\alpha[t, S] \leftarrow \sum_{S'=S_1}^{S_N} \alpha[t-1, S'] \, a_{S',S} \, b_S(O_t)$;  // Induction
    end
  end
  $P(O \mid \lambda) \leftarrow \sum_{S=S_1}^{S_N} \alpha[T, S]$;  // Termination
  Result: $P(O \mid \lambda)$

As part of the forward-backward algorithm, a second variable is considered that will help in solving the estimation problem. This variable, stored in the backward probability table, is $\beta(t, i) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid q_t = S_i, \lambda)$. It represents the probability of the partial observation sequence that starts one time step after the current observation, given the current state $S_i$ and the model. The backward probability can be calculated in a similar way to the forward probability (Algorithm 3).

Algorithm 3: Computing the backward probability
  Create the backward probability table $\beta[T, N]$;
  foreach state $S \in \{S_1, S_2, \ldots, S_N\}$ do
    $\beta[T, S] \leftarrow 1$;  // Initialization
  end
  foreach time step $t \in \{T-1, T-2, \ldots, 1\}$ do
    foreach state $S \in \{S_1, S_2, \ldots, S_N\}$ do
      $\beta[t, S] \leftarrow \sum_{S'=S_1}^{S_N} a_{S,S'} \, b_{S'}(O_{t+1}) \, \beta[t+1, S']$;  // Induction
    end
  end
  Result: $\beta[T, N]$

2.4. Decoding Problem Solution: The Viterbi Algorithm

Finding the optimal hidden state sequence that best represents a sequence of observations is more challenging than the likelihood problem. The likelihood problem has an exact solution. The decoding problem, however, has no single "correct" state sequence unless the model is degenerate, which makes it hard to choose the optimality criterion used to judge a state sequence [20]. For example, one may choose each state based on its individual likelihood of occurrence. This maximizes the expected number of individually correct states but does not optimize the sequence as a whole; the result may even contain transitions of zero probability [20]. Another way to solve the decoding problem is to compute the probability of every possible hidden state sequence and choose the most probable one. With $N^T$ candidate sequences, however, this is computationally infeasible [26]. Like the forward-backward algorithm, the Viterbi algorithm solves the decoding problem using dynamic programming. The algorithm recursively computes the probability of being in state $S$ at time $t$ along the most probable state sequence (path) $q_1, q_2, \ldots, q_{t-1}$ that leads to this state. The Viterbi algorithm is shown in Algorithm 4.

Algorithm 4: The Viterbi algorithm
  Input: $O = \{O_1, O_2, \ldots, O_T\}$; $S \in \{S_1, S_2, \ldots, S_N\}$
  Create the best path probability table $\delta[T, N]$;
  Create the state index table $\psi[T, N]$ (the index of the state that, when added to the path, maximizes $\delta$);
  foreach state $S \in \{S_1, S_2, \ldots, S_N\}$ do
    $\delta[1, S] \leftarrow \pi_S \, b_S(O_1)$;  // Initialization
    $\psi[1, S] \leftarrow 0$;
  end
  foreach time step $t \in \{2, 3, \ldots, T\}$ do
    foreach state $S \in \{S_1, S_2, \ldots, S_N\}$ do
      $\delta[t, S] \leftarrow \max_{S'=S_1}^{S_N} \delta[t-1, S'] \, a_{S',S} \, b_S(O_t)$;  // Induction
      $\psi[t, S] \leftarrow \arg\max_{S'=S_1}^{S_N} \delta[t-1, S'] \, a_{S',S} \, b_S(O_t)$;
    end
  end
  $P^* \leftarrow \max_{S=S_1}^{S_N} \delta[T, S]$;  // Termination
  $q^*_T \leftarrow \arg\max_{S=S_1}^{S_N} \delta[T, S]$;
  for $t \in \{T, T-1, \ldots, 2\}$ do
    $q^*_{t-1} \leftarrow \psi[t, q^*_t]$;  // Backtracking
  end
  Result: the optimal state sequence $q^*_1, q^*_2, \ldots, q^*_T$

2.5. Model Estimation Problem Solution

The third problem can be formulated as finding the HMM parameters $\lambda = (A, B, \pi)$ that maximize the conditional probability of the observation sequence given the model [20]. This problem has no analytical solution. Iterative methods, however, can be used to find a local maximum of $P(O \mid \lambda)$. Here, we focus on the Baum-Welch algorithm, which is based on the expectation-maximization (EM) method [29, 30]. The algorithm maximizes Baum's auxiliary function over the updated model parameters $\bar{\lambda}$:

$$Q(\lambda, \bar{\lambda}) = \sum_{q} P(O_{1:T}, q_{1:T} \mid \lambda) \log P(O_{1:T}, q_{1:T} \mid \bar{\lambda}),$$

where $P(O_{1:T}, q_{1:T} \mid \lambda) = \pi_{q_1} b_{q_1}(O_1) \prod_{t=1}^{T-1} a_{q_t q_{t+1}} b_{q_{t+1}}(O_{t+1})$ and $\lambda$ is the initial (current) model. The iterations use the forward and backward probabilities described previously and proceed as described in Algorithm 5.

Algorithm 5: The estimation algorithm
  Input: $O = \{O_1, O_2, \ldots, O_T\}$; $S \in \{S_1, S_2, \ldots, S_N\}$
  Initialize $\lambda = (A, B, \pi)$;
  repeat
    Using the forward-backward algorithm and $\lambda$, calculate $\alpha[T, N]$ and $\beta[T, N]$;
    Create the probability tables $\xi[T, N, N]$ (the probability of being in state $S_i$ at time $t$ and in state $S_j$ at time $t+1$) and $\gamma[T, N]$ (the probability of being in state $S_i$ at time $t$);
    foreach time step $t \in \{1, 2, \ldots, T-1\}$ do
      foreach state $S \in \{S_1, S_2, \ldots, S_N\}$ do
        foreach state $S' \in \{S_1, S_2, \ldots, S_N\}$ do
          $\xi[t, S, S'] \leftarrow \frac{\alpha[t, S] \, a_{S,S'} \, b_{S'}(O_{t+1}) \, \beta[t+1, S']}{\sum_{R=S_1}^{S_N} \sum_{R'=S_1}^{S_N} \alpha[t, R] \, a_{R,R'} \, b_{R'}(O_{t+1}) \, \beta[t+1, R']}$;
        end
        $\gamma[t, S] \leftarrow \sum_{S'=S_1}^{S_N} \xi[t, S, S']$;
      end
    end
    foreach state $S$ do $\gamma[T, S] \leftarrow \alpha[T, S] / \sum_{S'=S_1}^{S_N} \alpha[T, S']$;
    $\bar{\pi}_S \leftarrow \gamma[1, S]$;
    $\bar{a}_{S,S'} \leftarrow \frac{\sum_{t=1}^{T-1} \xi[t, S, S']}{\sum_{t=1}^{T-1} \gamma[t, S]}$;
    $\bar{b}_S(k) \leftarrow \frac{\sum_{t=1,\ O_t = v_k}^{T} \gamma[t, S]}{\sum_{t=1}^{T} \gamma[t, S]}$;
    $\lambda \leftarrow (\bar{A}, \bar{B}, \bar{\pi})$;
  until convergence;
  Result: $\lambda = (A, B, \pi)$

2.6. Continuous Density HMM

The HMM solutions described so far require the observations to be discrete. This is restrictive because in most cases the observations are continuous. To use a discrete HMM, a necessary first step is to transform the continuous observation sequence into a discrete one. This can be done by dividing the observation space into sub-spaces and using codebooks to assign a discrete symbol/value to each sub-space [24]. This, however, introduces quantization errors into the problem. One way to overcome this is to use continuous observation densities in HMMs. The finite mixture representation of the observation density function is one of the representations that has a formulated re-estimation procedure:

$$b_j(O) = \sum_{m=1}^{M} c_{jm} \, \mathcal{N}[O, \mu_{jm}, U_{jm}], \qquad 1 \le j \le N,$$

where $O$ is the observation vector and $c_{jm}$ is the mixture coefficient for the $m$-th mixture in state $j$. $\mathcal{N}$ is any log-concave or elliptically symmetric density with mean vector $\mu_{jm}$ and covariance matrix $U_{jm}$ [31–33]. A Gaussian density is usually used for $\mathcal{N}$; however, non-Gaussian models have also been considered in many applications [34, 35]. The pdf is guaranteed to be normalized, provided that the mixture coefficients satisfy the stochastic constraints

$$\sum_{m=1}^{M} c_{jm} = 1, \qquad c_{jm} \ge 0, \qquad 1 \le j \le N, \; 1 \le m \le M.$$

The parameters of the observation density function ($c_{jm}$, $\mu_{jm}$, $U_{jm}$) can be estimated with a modified Baum-Welch algorithm [20]. Using continuous densities makes the HMM more accurate. It does, however, require a larger dataset and a more complex training algorithm.

2.7. State Duration in HMM

One convenient way to include state duration in HMMs, especially for physical signals, is to explicitly model the duration density and set the self-transition coefficients to zero [20]. A transition from one state to another then occurs only after the current state has produced the number of observations specified by the duration density, as shown in Fig. 2. In standard HMMs, the states have exponential (geometric, in discrete time) duration densities that depend on the self-transition coefficients $a_{ii}$ and $a_{jj}$, as in Fig. 2(a). The probability of staying in state $S_i$ for exactly $d$ steps is $p_i(d) = a_{ii}^{d-1}(1 - a_{ii})$. In HMMs where state duration is modeled by explicit duration densities, there is no self-transition. The transition happens only after a specific number of observations determined by the duration density, as in Fig. 2(b). The re-estimation formulae needed for model estimation can be derived by including the state duration in the calculation of the forward and backward variables. These formulae can be found in detail in the tutorial of Rabiner [20].

Figure 2: An illustration of interstate connections in HMMs. (a) A standard HMM with self-transitions from each state back to itself. (b) A variable-duration HMM with no self-transitions and specified state duration densities.

3. Recurrent Neural Networks

Neural networks are biologically inspired computational models composed of a set of artificial neurons (nodes) joined by directed weighted edges. They have recently become popular as pattern classifiers [18, 36]. The network is usually activated by feeding it an input that then spreads throughout the network along the edges. Many types of neural networks have evolved since their first appearance. They fall under two main categories: networks whose connections form cycles and networks that are acyclic [36]. RNNs are the type of neural network that introduces the notion of time by using cyclic edges between adjacent time steps. RNNs have been proposed in many forms, including Elman networks, Jordan networks, and echo state networks [37–40].

Figure 3: A simple RNN with a single hidden layer. At each time step t, the output is produced by passing activations as in a feed-forward network. Activations are also passed to the next node at time t + 1 to achieve recurrence.

As shown in Fig. 3, the hidden units at time $t$ receive input from the current input $x_t$ and the previous hidden unit value $h_{t-1}$. The output $y_t$ is calculated using the current hidden unit value $h_t$. Time dependency is created between time steps by means of recurrent connections between hidden units. In a forward pass, all the computations are specified by the following two equations:

$$h_t = \sigma_h(W_x x_t + W_h h_{t-1} + b_h),$$
$$y_t = \sigma_y(W_y h_t + b_y),$$

where $W_x$ and $W_y$ are the weight matrices between the hidden units and the input and output, respectively, and $W_h$ is the matrix of weights between adjacent time steps. $b_h$ and $b_y$ are bias vectors that allow an offset to be learned at each node. Nonlinearity is introduced through the activation functions $\sigma_h$ and $\sigma_y$. These can be the hyperbolic tangent (tanh), the sigmoid, or the rectified linear unit (ReLU). In a simple RNN unit, tanh is usually used.

Figure 4: Early designs of RNNs. The dotted arrows represent the edges feeding at the next time step. (a) Jordan network. Output units are connected to context units that provide feedback at the next time step to the hidden units and to themselves. (b) Elman network. Hidden units are connected to the context units that provide feedback to the hidden units only at the next time step.

3.1. Early RNN Architectures

Jordan [41] introduced an early form of recurrence in networks by adding extra "special" units, called context or state units, that feed values to the hidden units in the following time step. The network was as simple as a multi-layer feed-forward network. Its context units took input from the network output at the current time step and fed it back to themselves and to the hidden units at the next time step, as shown in Fig. 4(a). The context units allow the network to remember its outputs at previous time steps. Being self-connected, they can send information across time steps without intermediate output perturbation [18]. Elman [37] also introduced a simple architecture in which a context unit is associated with each hidden layer unit at the current time step. Each context unit gives feedback to the same hidden unit at the next time step, as shown in Fig. 4(b). This notion of self-connected hidden units became the basis for the design of long short-term memory (LSTM) units [19]. Elman [37] demonstrated that this type of recurrence can learn time dependencies.

3.2. Training of RNNs

The expression of a generic RNN can be written as $h_t = F(h_{t-1}, x_t, \theta) = W_h \sigma(h_{t-1}) + W_x x_t + b_h$. Here $\theta$ refers to the network parameters: $W_h$, the recurrent weight matrix; $W_x$, the input weight matrix; and $b_h$, the bias. (This formulation does not contradict the previously mentioned formulation $h_t = \sigma_h(W_x x_t + W_h h_{t-1} + b_h)$; both have the same behavior [42].) The initial state $h_0$ is usually set to zero, provided by the user, or learned. Network performance on a certain task is measured through a cost function $\mathcal{E} = \sum_{1 \le t \le T} \mathcal{E}_t$, where $\mathcal{E}_t = \mathcal{L}(h_t)$. $T$ is the sequence length (total number of time steps), and $\mathcal{L}$ is the cost operator that measures the performance of the network (e.g., squared error or cross-entropy). The gradients needed for optimization can be computed using backpropagation through time (BPTT). In BPTT, the network is unrolled in time so that backpropagation can be applied, as shown in Fig. 5.

Figure 5: Recurrent neural network unfolded in time [42]. $\mathcal{E}_t$ denotes the error calculated from the output, $h_t$ represents the hidden state, and $x_t$ represents the input at time $t$.

The gradient is calculated as a summation of temporal components:

$$\frac{\partial \mathcal{E}}{\partial \theta} = \sum_{1 \le t \le T} \frac{\partial \mathcal{E}_t}{\partial \theta},$$
$$\frac{\partial \mathcal{E}_t}{\partial \theta} = \sum_{1 \le k \le t} \left( \frac{\partial \mathcal{E}_t}{\partial h_t} \frac{\partial h_t}{\partial h_k} \frac{\partial^{+} h_k}{\partial \theta} \right),$$
$$\frac{\partial h_t}{\partial h_k} = \prod_{t \ge i > k} \frac{\partial h_i}{\partial h_{i-1}} = \prod_{t \ge i > k} W_h^{\top} \operatorname{diag}\left(\sigma'(h_{i-1})\right), \tag{1}$$

where $\partial^{+} h_k / \partial \theta$ denotes the immediate partial derivative of $h_k$ with respect to $\theta$, treating $h_{k-1}$ as a constant. The effect that the network parameters ($\theta$) at step $k$ have on the cost at subsequent steps ($t > k$) can be measured through the temporal gradient component $\frac{\partial \mathcal{E}_t}{\partial h_t} \frac{\partial h_t}{\partial h_k} \frac{\partial^{+} h_k}{\partial \theta}$. In Eq. 1, the factor $\partial h_t / \partial h_k$ is a product of $t - k$ Jacobian matrices. This product will either explode or shrink to zero depending on whether the recurrent weights are greater or smaller than one [42]. More precisely, let $\gamma$ bound $|\sigma'|$. Vanishing is guaranteed when the largest singular value of $W_h$ is smaller than $1/\gamma$, and a value larger than $1/\gamma$ is necessary for explosion. Vanishing gradients are common when using sigmoid activations, while exploding gradients are more common when using rectified linear unit activations [18, 42]. One solution to this problem is to constrain the weights, through regularization, to values that help avoid vanishing and exploding gradients. Truncated backpropagation through time (TBPTT) is another solution for exploding gradients; it sets a maximum number of time steps through which the error is propagated [18].

3.3. Current RNN Designs

Early RNN designs helped map input sequences into output sequences using contextual information. This contextual mapping, however, had a limited range. The influence of the input on the hidden layers, and thus on the output, either vanishes or blows up as it cycles through the network's recurrent connections, as described previously [43]. The vanishing/exploding gradient problem has led to the emergence of new network designs that improve convergence [44, 45]. Of these designs, LSTM, gated recurrent units (GRU), and bidirectional RNNs (BRNN) have proved superior in long-range contextual mapping. They also employ both future and past contexts to determine the output of the network [18]. Both LSTM and GRU resemble a standard RNN, but with each hidden node replaced by a complete cell, as shown in Fig. 6. They also employ a unity-weighted recurrent edge to ensure the transfer of gradients across time steps without decaying or exploding. LSTM forms long-term memory through the weights, which change slowly during training. Short-term memory, by contrast, is formed by transient activations that pass between successive nodes [18]. GRU is an LSTM alternative that has a simpler structure and is faster to train, while still providing performance comparable to LSTM [46].

In an LSTM unit, the forget gate is an adaptive gate whose output is squashed through a sigmoid activation. It resets the memory blocks once they are out of date and prevents information storage for arbitrary time lags [47]. The input gate is a sigmoid-activated gate that regulates the new information written to the cell state. The output gate is also a sigmoid-activated gate. It regulates how much of the internal state, after being squashed through a tanh activation, is forwarded as the unit output. The GRU unit has a similar design, but it does not have an output gate. Instead, it has a reset gate that works like a forget gate, and an update gate. The update gate regulates the write operation into the unit output from both the state of the past time step and the input of the current time step.

BRNNs also resemble a standard RNN architecture, but with two hidden layers instead of one, each connected to both the input and the output. One hidden layer passes activations in the forward direction (from past time steps), and the other passes activations in the backward direction (from future time steps). BRNN is in fact a wiring method for RNN hidden layers, regardless of the type of nodes. This makes it compatible with most RNN architectures, including LSTM and GRU [18, 44].

Figure 6: Current designs of RNNs. The diagrams use symbols for concatenation, element-wise summation (+), element-wise multiplication, a sigmoid activation (σ), and a hyperbolic tangent (tanh) activation. (a) Schematic of an LSTM unit, which is typically composed of three main parts: the input, output, and forget gates. (b) Schematic of a GRU unit, which is a simplified version of LSTM with only reset and update gates.

4. Critical Differences between HMMs and RNNs

As demonstrated in the previous sections, the construction of hidden Markov models relies on an explicit state space from which the states are drawn. Scaling such a system has long been considered difficult or infeasible, even with dynamic programming solutions such as the Viterbi algorithm. Inference and the transition probability matrix are quadratic in the number of states, so parameter estimation and inference time grow as the state space grows [48]. Modeling long-range dependencies is also impractical in HMMs. Transitions occur from one state to the next with no memory of earlier states. The only workaround is to create a new state space with all possible cross-transitions at each time window, which leads to exponential growth of the state space [18, 23]. In RNNs, by contrast, the number of states that a hidden layer can represent increases exponentially with the number of nodes in the layer. This allows the nodes to carry information from contexts of arbitrary length. Moreover, despite this exponential growth in the expressive power of the network, training and inference complexities grow at most quadratically [18]. From a theoretical point of view, RNNs can be efficient in perceiving long contexts, but this comes at the cost of error propagation. Highly sampled inputs, as in the case of raw waveforms, lengthen the range through which the error signal propagates. This makes the network hard to optimize and reduces the efficiency of computational acceleration tools such as GPUs [49, 50].

5. Event Detection in Electrocardiography

ECG is the graphical representation of the skin-recorded electrical activity of the electric field originating in the heart [74]. ECG provides information about heart activity that is not readily available through other methods. It is the most commonly used procedure in the diagnosis of cardiac diseases because it is non-invasive, simple, and cost-effective. This makes ECG subject to intense research related to automatic analysis to reduce the subjectivity and the time spent on interpreting hours of recordings

clude the different sleep stages and sleep disorders, epileptic seizures, the effect of music or other artifacts, and the motor imagery tasks.

6.1. Sleep Staging in EEG

Sleep is an essential part of the human life cycle and plays a vital role in maintaining most body functions [99]. Sleep disorders include problems with initiating sleep, insomnia, and sleep apnea syndrome (SAS) [100]. Sleep disorders can be diagnosed by identifying sleep stages in an overnight polysomnogram (PSG), which uses EEG as one of its sensing modalities [101]. Visual scoring of the PSG components is the basic way to categorize sleep epochs. Like any manual rating, it suffers from subjectivity and inter-rater variability. Many attempts have been proposed in the literature to remedy the problems of expert-based visual scoring of the different components of PSG. These attempts employed multiple algorithms to achieve automatic sleep staging, including Markov models and neural networks. Here, we list recent publications on sleep staging (Table 2) and describe the methods used within the scope of our review.
Epilepsy detection in EEG [54, 75, 76]. ECG is a time periodic signal, which allows Epilepsy is one of the episodic disorders of the brain to mark out an elementary beat that constitutes the basis for that is characterized by recurrent seizures, unjustified by ECG signal analysis [54]. For instance, heart rate can be any known immediate cause [102, 103]. Epileptic seizure estimated through the detection of QRS-complex from an is the clinical manifestation that results from the abnormal ECG signal and the time interval between successive QRS- excessive discharge of some set of neurons in the brain [102]. complexes (also known as R-R interval) can be used to detect The seizure consists of transient abnormal alterations of sen- premature ectopic beats [74]. In that sense, ECG beat de- tection is considered fundamental for most of the automated sory, motor, consciousness, or psychic behavior [102, 103]. analysis algorithms. A detailed description of the recent pub- Around 80% of the epileptic seizures can be effectively treated lications that cover event detection in ECG using different if early discovered [104]. Although seizure activity can be methods, is included in Table 1. easily distinguished in EEG as transient spikes and relatively quiescent periods, it is a time-consuming process and needs clinicians to devote a tremendous amount of time going 6. Event Detection in Electroencephalography through hours and days of EEG activity [105]. An efficient EEG is mostly a non-invasive technique to measure the and reliable seizure prediction/detection method can be of a electrical activity of the brain through a set of electrodes great help for the diagnosis, treatment, and even early warn- placed on the subject’s scalp. EEG exhibits highly non- ing for patients to stop activities that might be of a significant stationary behavior and significant non-linear dynamics [77]. danger during an episode like driving. 
Several methods have The excitatory and inhibitory postsynaptic potentials of the been proposed for seizure prediction, at which EEG signal cortical nerve cells are considered the main source of EEG features are temporally analyzed and compared to heuristic signals [78]. EEG can be invasive if acquired using subdu- thresholds to trigger a warning for seizures; however, these ral electrode grids or using depth electrodes and is called methods lack generalization when investigated on extensive intracranial EEG (iEEG); however, typical EEG signals are datasets [106–111]. This can be referred to using feature sets recorded from scalp locations specified by the 10-20 electrode that are not highly affected by the transition from seizure-free placement criterion designed by the International Federation to peri-ictal or seizure states or simply the effect cannot be of Societies for Electroencephalography and have an am- tracked using low-order statistics [106]. Therefore, stochastic- plitude of 10-100 V and a frequency range of 1-100 Hz based models, multivariate analysis, and long-range analysis [77, 78]. EEG signals are used in the diagnosis of multiple methods were investigated to provide better performance and neurological disorders including epilepsy, lesions, tumors, generalization for EEG-based epileptic seizure prediction. In and depression and their characteristics depend strongly on Table 3, we review the recent publications that use HMMs the age and state of the subject. There are multiple events and RNNs for seizure prediction. that influence EEG and require the tedious job of analyzing hours of recordings to be extracted. These events range from 6.3. BCI Tasks in EEG the diagnosis/detection of certain seizures and syndromes to Motor imagery alters the the neural activity of the brain’s the tasks of brain computer interface (BCI). 
These events in- sensorimotor cortex in a way that is as observable as if the Khalifa et al., 2020 Page 8 of 21 Table 1 Summary of event detection work done in ECG event detection Publication Event under investigation Implementation details Dataset Gersch et al. [51], 1975 Premature Ventricular Contraction A three states Markov chain was used to model R-R interval (quan- Clinical test data from patients with (PVC) through R-R intervals tized as short, regular, or long) sequences and then the model is atrial fibrillation (AF) used to characterize rhythms through the probability that the ob- served R-R symbol sequence is generated by any of a set of models generated from multiple cardiac arrhythmias. Theb manuscript used a maximum likelihood approach to determine the arrhythmia type. Coast et al. [52], 1990 Beat detection for arrhythmia anal- A parallel combination of HMMs (one for each arrhythmia type), The American Heart Association ysis is used to classify arrhythmia. The classification process is inferred (AHA) ventricular arrhythmia through determining the most likely path through the parallel models. database [53] All ECG waveform parts were included in the states of each model. The results reported in this study relied on single ECG channel and didn’t include multi-channel ECG fusion. Andreao et al. [54], 2006 ECG beat detection and segmenta- An HMM was constructed for ECG beat with each waveform part QT database [55] tion represented in the model including the isoelectric parts (ISO, P, PQ, QRS, ST, T). Model parameters were estimated using Baum-Welch method and the number of states in each model were specified empir- ically to achieve a good complexity-performance compromise. The proposed segmentation in this study was based on a single channel but the authors provided insights about the possibility of adaptation with multi-channel fusion. Sandberg et al. 
[56], 2008 Atrial fibrillation frequency tracking An HMM is used for frequency tracking to overcome the corruption Simulated atrial fibrillation signals with of residual ECG by muscular activity or insufficient beat cancellation. four different frequency trends: con- States of the HMM were used to represent the underlying frequencies stant frequency, varying frequency, in short-time Fourier transform while observations corresponded to gradually decreasing frequency, and the estimated frequency of specific time intervals from the signal. stepwise decreasing frequency. Experiments were performed on single channel simulated signals with inclusion of mutli-channel fusion. Oliveira et al. [57], 2017 Automatic segmentation (beat) of An ECG channel along with a phonocardiogram were fused in a sin- A self-recorded dataset from healthy ECG and Phonocardiogram (PCG) gle coupled HMM for beat detection. The coupled HMM was con- male adults. structed to consider the high dynamics and non-stationarity of the signals where the channels were assumed to be co-dependent through past states and observations. Each of ECG and phonocardiogram was modeled using 4 states. This study introduced a decision-level fusion through combining two channels in a single HMM. The study also experimented two different coupled HMMs, a fully connected where transition can happen between any two states from both chan- nels and a partially connected model where certain limitations were added over transitions through considering the prior knowledge of the relationship between heart sounds and ECG components. Übeyli [58], 2009 Arrhythmia detection/classification An Elman-based RNN is used for beat classification with the Four types of ECG beats obtained Levenberg-Marquardat algorithm for training (a least-squares esti- from Physiobank Database [59]. mation algorithm based on the maximum neighborhood idea). 
This model used power spectral density (calculated with three different methods; Pisarenko, MUSIC, and Minimum-Norm) of ECG signals as input. All the models trained in this study, used feature-level fusion. Zhang et al. [60], 2017 Supraventriular and verntricular ec- An LSTM-based RNN preceded by a density-based clustering for MIT-BIH Arrhythmia database topic beat detection (SVEB and training data selection from a large data pool. In this implementation, (MITDB) [61]. VEB) the authors fed the RNN with the current ECG beat and the T wave part from the former beat to automatically learn the underlying features. The RNN layers were followed by two fully connected layers in order to combine the temporal features and generate the desired output. This study only used a single channel ECG (limb lead II) with no multi-channel fusion. Xiong et al. [62], 2017 Atrial fibrillation automatic detec- A 3 layer RNN was implemented to extract the temporal features The 2017 PhysioNet/CinC Challenge tion from the raw ECG signals. No multi-channel fusion was performed dataset [59]. in this study and only a single ECG channel was employed. Schwab et al. [49], 2017 Different cardiac arrhythmia classi- In this work a combination of GRU and bidirectional LSTM The 2017 PhysioNet/CinC Challenge fication/detection (BLSTM) based RNNs and nonparameteric Hidden Semi-Markov dataset [59]. Models (HSMM), was used for building the beat classification model and then a blender [63] was used to combine the predictions from the models. No multi-channel fusion was performed in this study and only a single ECG lead was employed. Zihlmann et al. [64], 2017 Atrial fibrillation detection A single layer LSTM-based convolutional RNN (CRNN) was con- The 2017 PhysioNet/CinC Challenge structed for atrial fibrillation detection in arbitrary length ECG record- dataset [59]. ings. This work employed the log spectrogram as an input to the CRNN to increase the accuracy. 
No multi-channel fusion was per- formed in this study and only a single ECG lead was used. Limam and Precioso [65], Atrial fibrillation detection A two layer LSTM-based CRNN was used for atrial fibrillation detec- The 2017 PhysioNet/CinC Challenge 2017 tion from single-lead ECG and heart rate. Feature-level fusion was dataset [59]. performed after the convolutional neural network (CNN) layers to combine features from both inputs. The output from the RNN was used to either feed a dense layer to perform classification directly or train an SVM for classification and the results from both models were compared. Chang et al. [66], 2018 Atrial fibrillation detection A single layer LSTM-based RNN was constructed for atrial fibrillation Multiple datasets for atrial fibrillation detection in multi-lead ECG. This model also used spectrograms of and normal sinus rhythms [59, 61, 67– the input ECG signals to feed the network. Feature-level fusion 70]. was performed to combine spectrograms of multi-lead ECG before feeding into the LSTM units. Lui and Chow [71], 2018 Myocardial infarction classification A deep single-layer LSTM based CRNN was used for classifying ECG The Physikalisch-Technische Bun- beats from single-lead ECG. Multiple models were performed includ- desanstalt (PTB) diagnostic ECG ing a direct 4-class beat classifier from the LSTM CRNN via dense database [70] and the AF classifica- layers and 4-class beat classifier via the fusion of multiple one-versus- tion from a short single lead ECG one binary classification networks using stacking. recording: Physionet/computing in cardiology challenge 2017 database (AF-Challenge) [72]. Singh et al. [73], 2018 Arrhythmia detection 3 models were built for arrhythmia detection, each of them is based MIT-BIH Arrhythmia database on a different type of RNN. Regular RNNs, GRU, and LSTM were (MITDB) [61]. used for each of the three models. 
Each model included 3 layers of different unit sizes with a dense layer to generate a classification output (normal/abnormal). No multi-channel fusion was performed in this study and only a single ECG lead (ML2) was employed. movement was really executed [132]. Identification of the relatively low cost of the systems used and the high temporal transient patterns in EEG signals during the different motor resolution [135]. This type of BCI is called asynchronous imagery tasks like imagining the movement of one of the BCI because the subject is free to invoke specific thought limbs, is recognized among the most promising and widely [132]. On the other hand, synchronous BCI includes the used techniques of BCI [133–136]. This is referred to the generation of specific mental states in response to external Khalifa et al., 2020 Page 9 of 21 Table 2 Summary of EEG-based sleep staging. Publication Event under investigation Implementation details Dataset Flexerand et al. [79], Sleep staging in combined A three state (wakefulness, deep sleep, and rapid eye movement sleep) Gaussian obser- Nine whole-night sleep 2002 EEG and EMG vation HMM (GOHMM) was used and sleep stages were represented as mixtures of the recordings from a group of basic three states. The probability of being in any of the three states was computed for nine healthy adults. 1 sec windows so that a continuous probability monitoring can be achieved. Expectation- maximization algorithm was used for parameter estimation and the Viterbi algorithm was used to calculate the posteriori estimate for being in each state. Feature-level fusion was performed on features from EEG channels (C3 and C4) and EMG. Flexer et al. [80], 2005 Sleep staging in single channel A three state (wakefulness, deep sleep, and rapid eye movement sleep) Gaussian ob- Two datasets were used, EEG (C3) servation HMM (GOHMM) was used and sleep stages were represented as mixtures of the first consists of 40 the basic three states. 
The probability of being in any of the three states was com- whole night sleep record- puted for 1 sec windows so that a continuous probability monitoring can be achieved. ings from healthy adults Expectation-maximization algorithm was used for parameter estimation and the Viterbi and the second consists of algorithm was used to calculate the posteriori estimate for being in each state. No 28 whole night sleep record- multi-channel fusion was performed in this study and only a single EEG channel was ings of healthy adults. used. Doroshenkov et al. [81], Sleep staging using two chan- A six state HMM was constructed for the purpose of sleep staging. Baum-welch algo- Sleep-EDF database [82]. 2007 nel EEG (Fpz-Cz and Pz-Oz) rithm was used for model’s parameter estimation and the Viterby algorithm for state sequence decoding. Feature-level fusion was performed for features calculated from the two EEG channels. Bianchi et al. [83], 2012 Sleep cycle (quantifying prob- An eight state HMM was constructed for sleep-wake activity. The connectivity be- Sleep Heart Health Study abilistic transitions between tween states was inferred through exponential fitting of subsets of the pooled bouts database [84]. stages and multi-exponential and adjacent-stage analysis. dynamics) and fragmentation in case of apnea in PSG Pan et al. [85], 2012 Sleep staging using central A six state transition-constrained discrete HMM was constructed for sleep staging. Thir- PSG including six chan- EEG (C3-A2), chin elec- teen features were utilized including temporal and spectrum analyses of the EEG, EOG nel EEG, EOG, EMG, and tromyography (EMG), and and EMG signals with feature-level fusion employed. ECG signals, was obtained electrooculogram (EOG) from 20 healthy subjects. Yaghouby and Sunderam Sleep staging and scoring A five state Gaussian HMM was constructed for sleep staging with Baum-Welch al- Sleep-EDF database [82]. [86], 2015 (quasi-supervised) in PSG gorithm for parameter estimation. 
In this implementation, feature-level fusion was achieved through feeding augmented vector of PSG features and human rated scores into the estimation algorithm in order to obtain the parameters to maximize the likeli- hood that a model with larger number of states explains the data. Onton et al. [87], 2016 Sleep staging in 2-channel A five state Gaussian HMM was constructed for sleep staging with expectation- A self recorded data from home EEG (FP1-A2 and FP2- maximization algorithm for parameter estimation and the Viterbi algorithm to find the 51 participants who were A2) and electrodermal activity maximum a posteriori estimate of state sequence. In this implementation, the relative medication-free and self- (EDA) power across the entire night was averaged in five frequency bands and fed into the reported asymptomatic model (feature-level fusion). sleepers and wit no history of neurologic or psychiatric disorders. Davidson et al. [88], Behavioral microsleep detec- This study utilized an LSTM-based RNN to detect the lapses in visuomotor performance A self-recorded dataset 2005 tion in EEG (P3-01 and P4- associated with behavioral microsleep events. The network used the power spectral from 15 subjects perform- 02) density of 1 sec windows of the used two channels (calculated using the covariance ing visuomotor tracking method) with feature-level fusion in place to combine data. The network included 6 task. LSTM blocks of 3 memory cells each. Hsu et al. [89], 2013 Automatic deep sleep staging This study utilized an Elman recurrent neural network that works on the energy features Sleep-EDF database [82]. in single channel EEG (Fpz- extracted from a single channel EEG to perform 5-level sleep staging. No multi-channel Cz) fusion was employed in this study. Supratak et al. [90], Automatic sleep staging in sin- A convolutional RNN (CRNN) was constructed to work directly of the raw signal data. 
Montreal Archive of Sleep 2017 gle channel EEG (Fpz-Cz or Two branches of CNN, each of 4 layers, were used for representation learning and their Studies (MASS) [91] and Pz-Oz) outputs were combined and fed into a two layer LSTM-based BRNN with skip branch Sleep-EDF database [82]. to generate the sleep stage. No multi-channel fusion was employed in this study. Biswal et al. [92], 2017 Automatic sleep staging Raw EEG signals were split into 30-seconds windows, then the spectrogram and ex- 10,000 PSG studies with pert defined features were extracted and fused at the feature-level. The best accuracy multi-channel EEG data reported among different RNN architectures, was reported for a 5-layer LSTM-based (F3, F4, C3, C4, O1 and RNN. This study presented also an LSTM-based CRNN architecture to extract spa- O2 referenced to the con- tial features automatically and then pass them to the RNN part for temporal context tralateral mastoid, M1 or extraction. M2). Phan et al. [93], 2018 Automatic deep sleep staging A two-layer GRU-based BRNN was constructed to learn temporal features from the Sleep-EDF database [82]. in single channel EEG (Fpz- single channel EEG. This implementation included an attention mechanism that was Cz) applied on the BRNN output features. The weighted output was then used to feed a linear SVM classifier. No multi-channel fusion has been employed in this study. Bresch et al. [94], 2018 Sleep staging in single-channel An LSTM-based CRNN with 3 CNN layers and 3 LSTM layers, was built to process The SIESTA database [95] EEG 30-seconds windows of raw EEG data (FPz, left EOG, and right EOG referenced to and a self-recorded dataset M2). No multi-channel fusion has been employed in this study. with 147 recordings from 29 healthy subjects. Phan et al. [96], 2019 Automatic sleep staging This study featured multi-modality fusion on the feature level between EEG, EOG, and Montreal Archive of Sleep EMG. 
All were split into windows and converted into time-frequency representation Studies (MASS) Dataset using filter banks. The fused data were fed into a BRNN that is used to encode the [91]. features, then the output is passed through an attention layer followed by another BRNN that performs the cclassification of the sleep stage. Michielli et al. [97], 2019 Automatic sleep staging in sin- A dual branch LSTM-based RNN was constructed for the classification of 5 different Sleep-EDF database [59]. gle channel EEG sleep stages. the network starts with a preprocessing and feature extraction stages and then the data is distributed over two branches. The first branch uses mRMR for feature selection followed by a one layer LSTM and fully connected layer to classify between 4 classes only (W, N1-REM, N2 and N3). The second branch uses PCA for feature selection followed by a 2 layer LSTM and a fully connected layer for binary classification. The LSTM in the second branch takes the classification output from the first branch to consider only the combined stage N1-REM for separation. No multi-channel fusion has been employed in this study. Sun et al. [98], 2019 Sleep staging in single channel A two stage network was built to perform the classification. The first stage is time Sleep-EDF database [59]. EEG distributed stage that included two parallel branches, the first included a window deep belief network for feature extraction followed by a dense layer and a second branch with hand-crafted features extraction then a dense layer. The two branches were then fused through another dense layer and fed as an input to an LSTM-based BRNN (the second stage) to generate the classes. stimuli [132]. EEG analysis for BCI applications includes as desynchronization [132]. 
To overcome such a limitation, the processing of EEG oscillatory activity and the different probabilistic models like HMMs and models capable of rep- shifts in its sub-bands in addition to the event-related poten- resenting long range dependencies have been proposed into tials like VEP and P300 [132, 137]. Many modeling schemes the implementation of BCI systems. As follows in Table 4, have been introduced to solve the of multi-class BCI prob- we list the recent work the relies on HMMs and RNNs in BCI lem; however, most of them process EEG signals in short systems and uses EEG as the source signal. windows where stationarity is assumed, which limits the mod- eling process and excludes the dynamic EEG patterns such Khalifa et al., 2020 Page 10 of 21 Table 3 Summary of EEG-based seizure prediction. Publication Event under investigation Implementation details Dataset Wong et al. [112], 2007 Evaluation framework for A three state HMM (baseline, detected, and seizure) was constructed to iEEG data collected from patients seizure prediction in iEEG evaluate the prediction algorithms of epileptic seizures. The prediction al- diagnosed with mesial temporal gorithm is used to generate a binary sequence which is combined with the lobe epilepsy using 20-36 surgically ground truth (binary detector outputs plus gold-standard human seizure mark- implanted electrodes on the brain or ings) and converted into a trinary observation sequence. The trinary vector brain substance [113]. is used to train the HMM using Baum-Welch which is then used to Viterbi decode the observation sequences into the hidden states sequence. A hy- pothesis test that a statistical association exists between the detected and seizure states, is performed through counting the transitions from detected state into seizure states in the HMM output. Santaniello et al. 
[106], Early detection of seizures in Multichannel iEEG were used and Welch’s cross power spectral density was Data collected from male Sprague- 2011 iEEG from a rat model calculated over windows of 3 sec for each pair of channels which were used Dawley rats with four implanted as input for the detection model. A two state HMM was constructed to skull screw EEG electrodes placed map the iEEG signals into either normal or peri-ictal states. Baum-Wlech bifrontally and posteriorly behind algorithm was used for parameter estimation and a Bayesian evolution model bregma and a fifth depth electrode was used determine the time of state transition. placed in hippocampus, were col- lected and used for this study. Direito et al. [114], 2012 Identification of the different The relative power in EEG sub-bands (delta, theta, alpha, beta, and gamma) EPILEPSIAE database [115]. states of epileptic brain was calculated and used for computing the topographic maps of each sub- band. The maps were then segmented and used overtime to train a 4 state (preictal, ictal, postictal and interictal) HMM. The Baum–Welch algorithm was used to train the model and the Viterbi algorithm to decode the state- sequence. Abdullah et al. [104], 2012 Seizure detection in iEEG A three state discrete HMM was built to classify iEEG segments into one of Freiburg Seizure Prediction EEG three states (ictal, preictal, and interictal). Seven level decomposition sta- (FSPEEG) database [108]. tionary wavelet transform (SWT) was applied on the signals (as input features for the model) and a code book was created to perform vector quantization. Baum-Welch algorithm was used for model parameter estimation and the Viterbi algorithm for recognition. This study employed a feature-level fusion model to feed the data into the prediction model. Smart and Chen [105], Seizure detection in scalp This study used a 5 sec sliding window with 1 sec increments to process CHB-MIT Scalp EEG Database 2015 EEG the EEG signals. 
A set of 45 measurements was calculated for each sliding window, and principal component analysis (PCA) was then used to reduce dimensionality. One of the models used was an HMM; specifically, a two-state (seizure and non-seizure) HMM was constructed to perform the detection. Baum-Welch was used here as well to estimate the model parameters. This study used a feature-level fusion model for multi-channel EEG data to feed the data into the prediction model. Dataset: [116].

Petrosian et al. [117], 2000. Event: Onset detection of epileptic seizures in both scalp and intracranial EEG. Implementation: Both raw EEG data and their wavelet transform ("daub4") were used to train an Elman RNN. This study used a feature-level fusion model for multi-channel EEG data to provide an input for the RNN. Dataset: Scalp and iEEG data were collected from two patients who were undergoing long-term electrophysiological monitoring for epilepsy.

Güler et al. [118], 2005. Event: Identification of subject condition in terms of epilepsy (healthy, epilepsy patient during a seizure-free interval, and epilepsy patient during a seizure episode) using surface and intracranial EEG. Implementation: Lyapunov exponents of the EEG signals were used to train an Elman RNN for the identification task. This study used a feature-level fusion model for multi-channel EEG data to train the RNN. Dataset: Publicly available epilepsy dataset by the University of Bonn [119].

Kumar et al. [120], 2008. Event: Automatic detection of epileptic seizures in surface and intracranial EEG. Implementation: Wavelet and spectral entropy were extracted from the EEG signals and used to train an Elman RNN. This study used a feature-level fusion model for multi-channel EEG data to train the RNN. Dataset: Publicly available epilepsy dataset by the University of Bonn [119].

Minasyan et al. [121], 2010. Event: Automatic detection of epileptic seizures prior to or immediately after clinical onset in scalp EEG. Implementation: A set of time-domain, spectral-domain, wavelet-domain, and information-theoretic features was used to train an Elman RNN for each EEG channel. The channel outputs are combined in time and space by a decision-making module that performs decision-level fusion, declaring a seizure event if N out of M channels detect it. Dataset: EEG from 25 patients hospitalized for long-term EEG monitoring in five centers: Thomas Jefferson University, Dartmouth University, University of Virginia, UCLA, and University of Michigan medical centers.

Naderi and Mahdavi-Nasab [122], 2010. Event: Automatic detection of epileptic seizures in surface and intracranial EEG. Implementation: The power spectral density of the EEG signals was calculated using Welch's method. A dimensionality reduction algorithm was then applied, and its output was used to train an Elman RNN. This study used a feature-level fusion model for multi-channel EEG data to train the RNN. Dataset: Publicly available epilepsy dataset by the University of Bonn [119].

Vidyaratne et al. [123], 2016. Event: Automated patient-specific seizure detection using scalp EEG. Implementation: The preprocessed (denoised) EEG signals were segmented into 1 s non-overlapping epochs and used to train a BRNN. Data from all channels were used simultaneously (feature-level fusion model). Dataset: CHB-MIT Scalp EEG Database [116].

Talathi [124], 2017. Event: Epileptic seizure detection. Implementation: Single-channel EEG data (no multi-channel fusion) were used to train a GRU-based RNN that classifies each EEG segment into one of three states: healthy, inter-ictal, or ictal. Two GRU layers were used; the first was followed by a fully connected layer and the second by a logistic regression classification layer. Dataset: Publicly available epilepsy dataset by the University of Bonn [119].

Golmohammadi et al. [125], 2017. Event: Epileptic seizure detection. Implementation: Linear frequency cepstral coefficients were extracted from the EEG data and fed into a CRNN based on a bidirectional LSTM. Features from multi-channel EEG were fused before being fed into the CRNN. The network employed both 2D and 1D CNNs at different stages. A second network, in which the LSTM was replaced with a GRU, was also developed for comparison. Dataset: A subset of the TUH EEG Corpus (TUEEG) [126] that has been manually annotated for seizure events [127].

Raghu et al. [128], 2017. Event: Epileptic seizure classification. Implementation: This study developed two techniques based on an Elman RNN that work on features extracted from EEG signals. The first technique used wavelet decomposition with the estimation of log energy and norm entropy to feed the RNN classifier (normal vs. preictal). The second technique extracted the log energy entropy to feed the RNN classifier. Dataset: Publicly available epilepsy dataset by the University of Bonn [119].

Abdelhameed et al. [129], 2018. Event: Epileptic seizure detection. Implementation: Raw EEG signals were fed into a 1D CRNN based on a bidirectional LSTM. The network classified EEG segments in two settings: normal vs. ictal, and normal vs. ictal vs. interictal. Dataset: Publicly available epilepsy dataset by the University of Bonn [119].

Daoud and Bayoumi [130], 2018. Event: Epileptic seizure prediction. Implementation: Raw EEG signals were fed into a 2D CRNN based on a bidirectional LSTM to classify EEG segments into one of two classes (preictal and interictal). Dataset: A publicly available dataset recorded at Children's Hospital Boston [59, 116].

Hussein et al. [131], 2019. Event: Epileptic seizure detection. Implementation: This study developed an LSTM-based RNN that takes raw EEG signals as input to produce predictions. The network was composed of a one-layer LSTM followed by a fully connected layer, an average pooling layer to combine the temporal features, and an output softmax layer. Dataset: Publicly available epilepsy dataset by the University of Bonn [119].

7. Event detection in EMG

Electromyography (EMG) is the method of sensing the electric potential evoked by the activity of muscle fibers as driven by the spikes from spinal motor neurons. EMGs are recorded either with surface electrodes or with needle electrodes. However, surface EMG (sEMG) is rarely used clinically to evaluate neuromuscular function; its use is limited to the measurement of voluntary muscle activity [161]. Routine evaluation of the neuromuscular function

Khalifa et al., 2020 Page 11 of 21

Table 4 Summary of EEG-based BCI systems.

Columns: Publication, Event under investigation, Implementation details, Dataset.

Obermaier et al. [138], 2001. Event: 5-task BCI system (imagining left-hand, right-hand, foot, or tongue movements, or a simple calculation). Implementation: A 5-state HMM with up to 8 Gaussian mixtures per state was used to model the spatiotemporal patterns in each signal segment. Features were extracted from all electrodes and fused into a combined feature vector, whose dimensionality was reduced before the model was built. The expectation-maximization algorithm was used to estimate the transition matrix and the mixtures. Dataset: Data from 3 male subjects were collected for motor imagery tasks. All participants were free of any medical or central nervous system conditions.

Obermaier et al. [139], 2001. Event: Two-class motor imagery (left and right hands) BCI. Implementation: Two 5-state HMMs (one for each class) with up to 8 Gaussian mixtures per state were used to model the spatiotemporal patterns in each signal segment. The Hjorth parameters of two channels (C3 and C4) were fused and fed into the HMMs to calculate the single best path probabilities for both models. The expectation-maximization algorithm was used to estimate the transition matrix and the mixtures. Dataset: Data from 4 male subjects were collected for motor imagery tasks. All participants were free of any medical or central nervous system conditions.

Pfurtscheller et al. [140], 2003. Event: Two-class motor imagery BCI for virtual keyboard control. Implementation: Two HMMs were trained, one for each class, and the class whose HMM achieves the maximal probability is chosen. Dataset: Signals from two bipolar channels were acquired from three able-bodied subjects.

Solhjoo et al. [141], 2005. Event: EEG-based mental task classification (left or right hand movement). Implementation: Discrete HMM and multi-Gaussian HMM-based classifiers were applied to raw EEG signals. Dataset: Dataset III of BCI Competition II (2003) provided by the BCI research group at Graz University [142].

Suk and Lee [143], 2010. Event: Multi-class motor imagery classification. Implementation: Dynamic patterns in EEG signals were modeled using a two-layer HMM. First, time-domain patterns were extracted from the signals and their dimensionality was reduced using PCA. Second, the likelihood for each channel was computed in the first HMM layer and assembled into a vector, whose dimension was also reduced with PCA. Finally, the class label was determined by the largest likelihood in the upper HMM layer. The Baum-Welch algorithm was used to estimate the initial state distribution, the state transition probability distribution, and the observation probability distribution, and the Viterbi algorithm was used to decode the state sequence. Dataset: Dataset IIa of BCI Competition IV (2008) provided by the BCI research group at Graz University [144].

Speier et al. [145], 2014. Event: P300 speller. Implementation: An HMM was used to model typing as a sequential process in which each character selection is influenced by previous selections. The Viterbi algorithm was used to decode the optimal sequence of target characters. Dataset: Data were collected from 15 healthy graduate students and faculty aged 20 to 35 with normal or corrected-to-normal vision.
Erfanian and Mahmoudi [146], 2005. Event: Real-time adaptive noise canceler for ocular artifact suppression in EEG. Implementation: A recurrent multi-layer perceptron with a single hidden layer was trained for noise canceling, with the contaminated EEG signal and the reference EOG as inputs. Dataset: A simulated EEG dataset generated by an autoregressive process driven by Gaussian white noise.

Forney and Anderson [147], 2011. Event: EEG signal forecasting and mental task classification. Implementation: An Elman RNN was trained to forecast EEG a single time step ahead. An Elman RNN-based classifier was then trained to classify the mental task associated with the EEG signals. Dataset: A 4-class dataset collected from 3 subjects, including combinations of the following mental tasks: clenching the right hand, shaking the left leg, visualizing a tumbling cube, counting backward from 100 by 3's, and singing a favorite song.

Balderas et al. [148], 2015. Event: EEG classification for 2-class motor imagery (left hand and right hand). Implementation: An LSTM-based classifier was trained and evaluated for the classification of EEG oscillatory components and compared with regular neural network implementations. Dataset: BCI Competition IV (2007) dataset 2b [149].

Maddula et al. [150], 2017. Event: P300 BCI classification. Implementation: A 3D CNN and a 2D CNN were combined with an LSTM-based RNN to capture spatiotemporal patterns in EEG. Dataset: Data from a P300 segment speller, in which the subjects mentally noted whenever the flashed letter was part of their target [151].

Thomas et al. [152], 2017. Event: Steady-state visual evoked potential (SSVEP)-based BCI classification. Implementation: A single-layer BRNN was used to perform classification and compared to different architectures and traditional classification techniques. Dataset: 5-class SSVEP dataset [153].

Spampinato et al. [154], 2017. Event: Visual object classifier using EEG signals evoked by visual stimuli. Implementation: An LSTM-based encoder learns high-order and temporal feature representations from EEG signals, and a classifier then identifies the visual object that generated the stimuli. The authors tested several encoder architectures: a common LSTM for all channels, channel LSTMs plus a common LSTM, and a common LSTM plus a fully connected layer. They also trained a CNN-based regressor that generates the EEG features from the source images of the visual stimuli alone, replacing the whole EEG module. Dataset: A subset of the ImageNet dataset (40 classes) [155] was used to generate visual stimuli for six subjects while EEG data were recorded.

Hosman et al. [156], 2019. Event: Intracortical BCI for cursor control. Implementation: A single-layer LSTM-based decoder was built with three outputs: the cursor speed in the x and y directions and the distance to target. Dataset: Intracortical neural signals recorded from three participants, each with two 96-channel microelectrode arrays [157].

Zhang et al. [158], 2020. Event: EEG-based human intention recognition. Implementation: Multi-channel raw EEG sequences were converted into mesh-like representations that capture the spatiotemporal characteristics of EEG and its acquisition. These meshes were then fed into deep neural networks that perform the recognition. Multiple network architectures were investigated. The first was a CRNN: a 2D CNN processes the meshes, a two-layer LSTM-based RNN extracts the temporal features, and a fully connected layer and an output layer follow. The second network had two parallel branches: a two-layer LSTM-based RNN that extracts the temporal features and a multi-layer 2D/3D CNN that extracts the spatial features. The outputs of the two branches were fused and used for recognition. This study used fusion at both the data level and the feature level. Dataset: EEG Motor Movement/Imagery Dataset [59, 159].

Tortora et al. [160], 2020. Event: BCI for gait decoding from EEG. Implementation: EEG data were preprocessed with high-pass filtering and independent component analysis to remove motion artifacts. Different frequency bands were then extracted, and a separate classifier was trained on each frequency band. Each classifier was a two-layer LSTM-based RNN followed by a fully connected layer, a softmax layer, and an output layer that produces the prediction. Dataset: EEG data were recorded from 11 subjects walking on a treadmill, using a 64-channel amplifier and the 10/20 montage.

is typically performed using needle (invasive) EMG. Needle EMG is effective, and several electrode types are available to suit many clinical questions. Despite this, it is often painful and traumatic and may destroy several muscle fibers [161, 162]. sEMG has been widely used as a control signal for multiple applications, especially in rehabilitation. These include, but are not limited to, body-powered prostheses, grasping control, and gesture-based interfaces [163]. The events manifested in a myoelectric signal usually fall into two states. The first is the transient state, which emerges as the muscle goes from rest to voluntary contraction. The second is the steady state, which represents maintaining the contraction level in the muscle [163]. It has been shown that steady-state segments are more robust as control signals

Table 5 Summary of event detection in EMG signals.
Columns: Publication, Event under investigation, Implementation details, Dataset.

Chan and Englehart [165], 2005. Event: Continuous identification of six classes of hand movement in sEMG. Implementation: An HMM with uniformly distributed initial states and a Gaussian observation probability density function was constructed for the detection process. The parameters of this density can be estimated completely from the training data. The expectation-maximization algorithm was not used, because the initial state probabilities were assumed uniform and the Gaussian parameters were estimated directly from the training data. Overlapping 256 ms observation windows were used. In each window, the root mean square value and the first 6 autoregressive coefficients were computed as features. Dataset: 4-channel sEMG collected from the forearm of 11 subjects for six distinct motions (wrist flexion, wrist extension, supination, pronation, hand open, and hand close) [166].

Zhang et al. [167], 2011. Event: Hand gesture recognition in acceleration and sEMG. Implementation: The authors identified the active segments by processing and thresholding the average signal of the multichannel sEMG. The onset is when the energy rises above a certain threshold, and the offset is when it falls below another threshold. Time-domain, frequency-domain, and time-frequency-domain features were extracted from both the acceleration signals and the sEMG and fed to five-state HMMs for classification. The Baum-Welch algorithm was used for training, with multivariate Gaussian distributions for the observations. Decision making is done in a tree structure (decision-level fusion) through four layers of classifiers, with the HMM as the last layer. Dataset: sEMG and 3D acceleration were collected from two right-handed subjects. The subjects performed 72 Chinese sign language words in a sequence with 12 repetitions per motion, and 40 predefined sentences with 2 repetitions per sentence.

Wheeler et al. [168], 2006. Event: Hand gesture recognition in sEMG. Implementation: A moving average of the sEMG signals provided the input for continuous left-to-right HMMs with tied Gaussian mixtures. Training was performed using the Baum-Welch algorithm, and real-time recall was performed with the Viterbi algorithm. The models were initialized using K-means clustering so that the states were partitioned to equalize the amount of variance within each state. This study employed feature-level fusion to combine multi-channel data. Dataset: Four pairs of dry electrodes recorded one participant repeating 4 joystick gestures (left, right, up, and down) 50 times each. A second portion of data was collected using 8 pairs of wet electrodes while the participant typed on a number pad keyboard (0-9), with 40 strokes on each key.

Monsifrot et al. [169], 2014. Event: Extraction of the activity of individual motor neurons in single-channel intramuscular EMG (iEMG). Implementation: The iEMG signal was modeled as a sum of independent filtered spike trains embedded in noise. A Markov model of sparse signals was introduced, in which the sparsity of the trains was exploited by modeling the time between spikes with a discrete Weibull distribution. The authors also introduced an online estimation method for the Weibull distribution parameters and an implementation of the impulse responses of the model. Dataset: The method was tested on both simulated and experimental iEMG signals. The simulated signals were generated by the Markov model at a 10 kHz sampling frequency. Filter shapes obtained from experimental iEMG were used to make the simulation more realistic. The experimental iEMG signals were acquired from the extensor digitorum of a healthy subject with Teflon-coated stainless steel wire electrodes.
Lee [170], 2008. Event: sEMG-based speech recognition. Implementation: A continuous HMM with a Gaussian mixture model was adopted for sEMG-based word recognition. Its input was the log mel-filter bank spectrogram of the windowed EMG signals. The segmental K-means algorithm was used to estimate the HMM parameters. The parameters for the i-th state of the k-th word are estimated from the observations of the corresponding state of the same word. The Viterbi algorithm was used for decoding. Dataset: EMG signals were collected from the articulatory facial muscles of 8 Korean male subjects. The subjects were asked to pronounce each word from a 60-word vocabulary in a consistent manner. They also produced a random set of words drawn from this vocabulary.

Chan et al. [171], 2002. Event: sEMG-based automatic speech recognition. Implementation: A six-state left-right HMM with single-mixture observation densities was constructed to identify words. It used three features extracted from sEMG: the first two autoregressive coefficients and the integrated absolute value. The HMM was trained using the expectation-maximization algorithm. Dataset: sEMG was collected from five articulatory facial muscles. The dataset was a subset of the dataset described in [172], with a ten-word English vocabulary.

Li et al. [173], 2014. Event: Identification/prediction of functional electrical stimulation (FES)-induced muscular dynamics with evoked EMG (eEMG). Implementation: A nonlinear ARX-type RNN was used to predict the stimulated muscular torque and track muscle fatigue. The model takes the eEMG as input and produces the predicted torque. Dataset: The experiments were conducted on 5 subjects with spinal cord injuries.

Xia et al. [174], 2018. Event: Hand motion estimation from sEMG. Implementation: A CRNN with 3 CNN layers and 2 LSTM layers was used for the prediction, with the power spectral density as input. Dataset: sEMG signals were collected from 8 healthy subjects using 5 pairs of bipolar electrodes placed on the shoulder. The electrodes recorded EMG from the biceps brachii, triceps brachii, anterior deltoid, posterior deltoid, and middle deltoid. The hand position in 3D space was tracked as the objective for this system.

Quivira et al. [175], 2018. Event: Simple hand and finger movement identification in sEMG. Implementation: An LSTM-based RNN was used to implement a recurrent mixture density network (RMDN) [176]. The RMDN probabilistically models the output of the network to capture the complex features present in the hand movement. Dataset: 8-channel EMG signals were collected from the proximal forearm region, targeting most muscles used in hand manipulation. Hand pose tracking was performed with a Leap Motion sensor. The subjects were asked to perform 7 hand gestures, with repetitions per gesture.

Hu et al. [177], 2018. Event: Hand gesture recognition in sEMG. Implementation: sEMG signals from all channels were segmented into fixed-size windows and transformed into an image representation. The images were fed into a CNN with two convolutional layers, two locally connected layers, and three fully connected layers. The CNN was followed by an LSTM-based RNN and an attention layer to enhance the output of the network. Dataset: Experiments were performed on the first and second sub-databases of the NinaPro (Non Invasive Adaptive Prosthetics) database [178].

Samadani [179], 2018. Event: EMG-based hand gesture classification. Implementation: Several RNN architectures were tested to choose the best-performing one. The evaluated models included unidirectional and bidirectional LSTM- and GRU-based RNNs with attention mechanisms. The models worked on the preprocessed (denoised) raw EMG signals. Dataset: The publicly available NinaPro hand gesture dataset (NinaPro2) [180].

Simão et al. [181], 2019. Event: EMG-based online gesture classification. Implementation: Features were extracted from multi-channel EMG (the standard deviation along each time frame) and fed into a dynamic RNN model. The model was composed of a dense layer, an LSTM-based RNN layer, another dense layer, and the output layer. It was compared to a similar GRU-based model and to a static feed-forward neural network model. This study used a combined feature vector as the input for the models. Dataset: The synthetic sequences of the UC2018 DualMyo dataset [182] and a similar subset of the NinaPro DB5 dataset [183].

compared to the transient state, owing to their longer duration and better classification rates [164]. Table 5 reviews recent advances in the detection of myoelectric events in EMG signals.

8. Event detection in other biomedical signals

Physiological monitoring is an essential part of all care units nowadays, and it is not limited to the aforementioned biomedical signals. Tens of variables are collected as time series containing hundreds of events that are important for diagnosis and treatment/rehabilitation. Event detection methods have had a strong presence in the analysis of such series. For instance, cardiovascular disorders are not assessed through ECG alone; the phonocardiogram is also used by general practitioners as an easier way to identify changes in heart sounds. Extracting the cardiac cycle has been one of the major problems in phonocardiography, and it was addressed using HMMs in multiple works [184–187]. On the other hand, most RNN-based methods for phonocardiograms have been used purely for classification and anomaly recognition [152].

3D acceleration is another emerging technology. It has been used extensively to assess and detect many medical conditions in swallowing [188] and in human gait analysis [189]. In swallowing, acceleration signals have been used to detect pharyngeal swallowing activity. This was done via maximum likelihood methods with minimum description length in [16] and via the short-time Fourier transform and neural networks in [14]. RNNs were also employed for event detection in swallowing acceleration signals, including upper esophageal sphincter opening [15, 190], laryngeal vestibule closure [191], and hyoid bone motion during swallowing [192]. In gait analysis, HMMs were used for recognition and extraction on multiple occasions [193–196], as were RNNs [197–199].

9. Challenges and Future Directions

Event detection in biomedical signals is a critical step in diagnostic and intervention procedures used daily in nearly every standard clinical setting. It is also the core of various eHealth technologies that rely on wearable devices and regular monitoring of physiological signs. Because it is so fundamental to clinical decision making, it requires precise detection in a fairly complex environment where multiple events occur concurrently. In particular, the false positive rate in clinical testing is an important indicator of how well the detection model generalizes and distinguishes the event of interest from background noise. Building such highly accurate models depends on many factors, including the diversity of the dataset and labels used, in addition to model capacity.

9.1. Classical Models Scaling: Challenges

As mentioned before, biomedical signals are the manifestation of well-coordinated yet complex physiological processes. These processes involve various anatomical structures that are close in position and share several functions. Hence, the collected signals pick up not only the target physiological process but also other, unavoidable, neighboring processes. Examples include the combined activation of multiple muscles in sEMG, eye blinking along with neural activity in EEG, and head movement along with swallowing vibrations in swallowing accelerometry. Extracting the event of interest in such cases requires exhaustive labeling of the underlying set of processes. This labeling is needed to build the predefined state space, from which the state sequence is drawn, for classical stochastic methods such as HMMs. Manual labeling or interpretation of biomedical signals is an exhausting task, and it also requires extensive domain knowledge and expertise.

One way to enhance the expressive power of stochastic models such as HMMs is to include non-Gaussian mixtures. These can boost performance in many cases, because Gaussianity is not always a reasonable assumption. One mixture model proposed as a non-Gaussian extension is the independent component analyzers mixture model (ICAMM). It has been applied to multiple biomedical signal problems, such as sleep disorder detection and the classification of neuropsychological tasks in EEG [34, 35]. Another way to increase model capacity, and with it the ability to model the underlying sequence of events, is to use strongly representative domain features. One of the most popular domain representations is wavelet decomposition. It has proven superior at providing high-level representations of events in a wide variety of biomedical signals, such as phonocardiograms [9, 187], EEG [104, 117, 120, 121], and EMG [164]. Handcrafting features, however, is not an easy task. It requires extensive domain knowledge and significant effort to find cues that identify specific signal components. Furthermore, mapping the feature space into a more compact, lower-dimensional space is often an essential step before the model is built. Given these factors, models that can learn high-level representations from raw signals, and that also have the expressive power to model tasks involving long time lags, can be of great benefit [200].

9.2. High Capacity Models Embedding Feature Extraction

The evolution of deep learning has revolutionized how problems are addressed. Classification and detection systems that relied solely on handcrafted features are being replaced by end-to-end systems trained to handle every step from the raw input to the final output. End-to-end systems are complex but rich processing pipelines. They make the most of the available information by training the system as a whole, from input to output, in a unified scheme [201]. It has been shown that deep architectures can replace handcrafted feature extraction stages and work directly on raw data to produce high levels of abstraction. In 1996, RNNs were used to identify arm kinematics during hand drawing from raw EMG signals [202], and the same architecture was later adopted for lower limb kinematics in [203]. In both studies, the authors verified that an RNN could map raw EMG signals to limb kinematics: drawing for the arm and human locomotion for the lower limb. Chauhan and Vig [204] and Sujadevi et al. [205] have also applied more sophisticated multi-layer LSTM-based RNN architectures to raw ECG signals for arrhythmia detection. Spampinato et al. [154] have employed RNNs as well, to extract a discriminative brain manifold for visual categories from EEG signals. Further, Vidyaratne et al. [123] used RNNs for seizure detection in EEG, although they used a denoised and segmented version of the signals. As mentioned earlier, RNNs are efficient at modeling long contexts. However, when they are fed highly sampled inputs such as raw signals, their error signals must propagate through a tremendous number of time steps, which hurts the network's optimizability and training speed [49, 50].

In this regard, convolutional neural networks (CNNs) have been used to perceive small local contexts. These are then passed either to an RNN, which perceives the temporal context, or to a feed-forward network for a classification or prediction target. CNNs were introduced to let recognition systems learn the hierarchical internal representations that form scenes in vision applications: pixels form edgelets, edgelets form motifs, motifs form parts, parts form objects, and objects form scenes [206, 207]. Thus, CNNs are multi-stage trainable architectures whose stages are stacked on top of each other, each stage learning one level of the feature hierarchy [200, 206]. Each stage is usually composed of three layers: a filter bank layer, a non-linear activation layer, and a pooling layer. The filter bank layer extracts particular features at all locations of the input. The non-linear activation acts as a regulator that determines whether a neuron should fire. It checks the neuron's value and decides whether the following connections should treat this neuron as activated [200]. The pooling layer is a dimensionality reduction step that produces lower-resolution feature maps, robust to small variations in the location of features [206]. The filter coefficients are the trainable parameters of a CNN. The training algorithm updates them simultaneously to minimize the discrepancy between the actual output and the desired output [206].

The design concept of CNNs first evolved for vision applications, but it has since been adopted for pattern analysis and recognition in biomedical signals [174, 208–212]. For instance, Shashikumar et al. [210] detected atrial fibrillation by processing the wavelet transform of ECG signals with a 5-layer 2D CNN, followed by a BRNN with a soft attention mechanism. Tan et al. [211] used a 2-layer 1D CNN with a 3-layer LSTM-based RNN for the detection of coronary artery disease in ECG. Further, Xiong et al. [212] used a residual convolutional recurrent neural network for the detection of cardiac arrhythmia in ECG. In all these experiments, placing RNNs on top of CNNs produced high levels of abstraction and rich temporal representations. These representations perceive long-range contexts without human intervention, and the combined networks are also easier to optimize. CNNs have also been combined with fully connected networks to increase the capacity of HMMs in connectionist hybrid DNN-HMM models, owing to the ability of CNNs to process high-dimensional multi-step inputs [213]. Such hybrid systems have provided state-of-the-art performance, especially in handwriting recognition [214, 215].

9.3. Transfer Learning

Most of the methods mentioned above achieve strong results on certain datasets, but they are well known to overfit easily, which leads to poor generalization. Models that generalize well therefore need datasets for training and validation that are both very large and diverse. In biomedical signal processing, collecting such datasets can be an obstacle to developing reliable models. It may not be feasible to find a large population of subjects when studying a rare disease. Even when it is feasible, acquiring expert reference annotations for the dataset is extremely difficult [216]. Many factors contribute to this. As mentioned before, the noisy nature of biomedical signals makes manual interpretation harder and calls for reference modalities that capture accurate information about the underlying processes. One example is collecting x-ray videofluoroscopy simultaneously with swallowing accelerometry [217]. Another factor is that the experts annotating the data must remain reliable over time and consistent with peer experts. This can be difficult to achieve and may require continuous training and checking of the experts' reliability. One way to overcome datasets of limited size or diversity is transfer learning: models pretrained on relatively different domains are reused and applied to the targeted problem [218]. In transfer learning, the pretrained model's weights are used as the initialization and then fine-tuned to fit the new dataset. In most cases, retraining uses a much lower learning rate (roughly 10 times smaller) than the original training. Transfer learning has been used for event detection and classification tasks in multiple biomedical signals. Examples include ECG for cardiac arrhythmia detection [219], EEG for drowsiness detection [220] and driving fatigue detection [221], and EMG for hand gesture classification [222]. However, transfer learning may not outperform the originally trained model when the datasets differ substantially or when inter-subject variability is high [218].

10. Conclusion

In this paper, we provided a comprehensive review of event extraction methods in biomedical signals, in particular hidden Markov models and recurrent neural networks. An HMM is a probabilistic model that represents a sequence of observations in terms of a hidden sequence of states. It also provides the concepts and methods for finding the state sequence that best describes the observations. An RNN is a type of neural network introduced to model time dependency and perform contextual mapping in sequences. This review showed that efficient algorithms such as expectation-maximization (Baum-Welch) and Viterbi decoding led to the widespread use of HMMs, which were used to dynamically transcribe the context of many biomedical signals. Before long, HMMs became insufficient for time series modeling needs, specifically for modeling long-range dependencies and larger state spaces. RNNs then gradually started to replace HMMs in time-dependent contextual mappings. So far, RNNs have proven superior in time series modeling, especially for biomedical signals. They continue to expand their dominance in automatic detection and diagnosis systems through emerging designs and practices tested in nearly every field.

Acknowledgments

The work reported in this manuscript was supported by the National Science Foundation under CAREER Award Number 1652203. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Science Foundation.

References

[1] L. Glass, Synchronization and rhythmic processes in physiology, Nature 410 (2001) 277–284.
[2] R. M. Rangayyan, N. P. Reddy, Biomedical signal analysis: A case-study approach, Annals of Biomedical Engineering 30 (2002) 983–983.
[3] P. Rashidi, A. Mihailidis, A survey on ambient-assisted living tools for older adults, IEEE Journal of Biomedical and Health Informatics 17 (2013) 579–590.
[4] J. Kim, M. Kim, I. Won, S. Yang, K. Lee, W. Huh, A biomedical signal segmentation algorithm for event detection based on slope tracing, in: Proceedings of the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, 2009, pp. 1889–1892. doi:10.1109/IEMBS.2009.5333874.
[5] J. Andreu-Perez, C. C. Poon, R. D. Merrifield, S. T. Wong, G. Z. Yang, Big data for health, IEEE Journal of Biomedical and Health Informatics 19 (2015) 1193–1208.
[6] R. Gravina, P. Alinia, H. Ghasemzadeh, G. Fortino, Multi-sensor fusion in body sensor networks: State-of-the-art and research challenges, Information Fusion 35 (2017) 68–80.
[7] D. P. Mandic, D. Obradovic, A. Kuh, T. Adali, U. Trutschell, M. Golz, P. De Wilde, J. Barria, A. Constantinides, J. Chambers, W. Duch, J. Kacprzyk, E. Oja, S. Zadrożny, Data Fusion for Modern Engineering Applications: An Overview, 2005.
[8] D. Mandic, M. Golz, A. Kuh, D. Obradovic, T.

networks in high resolution cervical auscultation, IEEE Journal of Biomedical and Health Informatics (2020).
[16] E. Sejdić, C. M. Steele, T. Chau, Segmentation of dual-axis swallowing accelerometry signals in healthy subjects with analysis of anthropometric effects on duration of swallowing activities, IEEE Transactions on Biomedical Engineering 56 (2009) 1090–1097.
[17] S. Damouras, E. Sejdić, C. M. Steele, T. Chau, An online swallow detection algorithm based on the quadratic variation of dual-axis accelerometry, IEEE Transactions on Signal Processing 58 (2010) 3352–3359.
[18] Z. C. Lipton, J. Berkowitz, C. Elkan, A critical review of recurrent neural networks for sequence learning, arXiv preprint arXiv:1506.00019 (2015).
[19] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735–1780.
[20] L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE 77 (1989) 257–286.
[21] P. J. Werbos, Backpropagation through time: what it does and how to do it, Proceedings of the IEEE 78 (1990) 1550–1560.
[22] D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning representations by back-propagating errors, Nature 323 (1986) 533–536.
[23] A. Graves, G. Wayne, I. Danihelka, Neural Turing machines, CoRR abs/1410.5401 (2014).
[24] A. Cohen, Hidden Markov models in biomedical signal processing, in: Proceedings of the 20th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, volume 3, IEEE, 1998, pp. 1145–1150.
[25] L. E. Baum, T. Petrie, Statistical inference for probabilistic functions of finite state Markov chains, Annals of Mathematical Statistics 37 (1966) 1554–1563.
[26] D. Jurafsky, J. H. Martin, Speech and language processing, 2nd ed., Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2009.
[27] L. E. Baum, J. A. Eagon, An inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecology, Bulletin of the American Mathematical Society 73 (1967) 360–363.
[28] L. E. Baum, G. R. Sell, Growth transformations for functions on manifolds, Pacific Journal of Mathematics 27 (1968) 211–227.
[29] L. Baum, An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process, in: Proceedings of the 3rd Symposium on Inequalities, volume 3, 1972, pp. 1–8.
[30] A. P. Dempster, N. M. Laird, D. B. Rubin, Maximum likelihood from
Tanaka, Signal Pro- incomplete data via the EM algorithm, Journal of the Royal Statistical cessing Techniques for Knowledge Extraction and Information Fu- Society: Series B (Statistical Methodology) 39 (1977) 1–38. sion, Springer US, 2008. [31] L. A. Liporace, Maximum-likelihood estimation for multivariate [9] L. Huiying, L. Sakari, H. Iiro, A heart sound segmentation algorithm observations of Markov sources, IEEE Transactions on Information using wavelet decomposition and reconstruction, in: Proceedings of Theory 28 (1982) 729–734. the 19th Annual International Conference of the IEEE Engineering [32] B. H. Juang, Maximum-likelihood estimation for mixture multivariate in Medicine and Biology Society, volume 4, IEEE, 1997, pp. 1630– stochastic observations of Markov-chains, AT&T Technical Journal 1633. 64 (1985) 1235–1249. [10] J. Pan, W. J. Tompkins, A real-time QRS detection algorithm, IEEE [33] Levinson, S, M. Sondhi, Maximum likelihood estimation for multi- Transactions on Biomedical Engineering 32 (1985) 230–236. variate mixture observations of markov chains, IEEE Transactions [11] V. Srinivasan, C. Eswaran, N. Sriraam, Approximate entropy-based on Information Theory 32 (1986) 307–309. epileptic EEG detection using artificial neural networks, IEEE Trans- [34] G. Safont, A. Salazar, L. Vergara, E. Gómez, V. Villanueva, Mul- actions on Information Technology in Biomedicine 11 (2007) 288– tichannel dynamic modeling of non-Gaussian mixtures, Pattern Recognition 93 (2019) 312–323. [12] N. Kannathal, M. L. Choo, U. R. Acharya, P. K. Sadasivan, Entropies [35] A. Salazar, L. Vergara, R. Miralles, On including sequential depen- for detection of epilepsy in EEG, Computer Methods and Programs dence in ICA mixture models, Signal Processing 90 (2010) 2314– in Biomedicine 80 (2005) 187–94. [13] A. Schlogl, F. Lee, H. Bischof, G. Pfurtscheller, Characterization of [36] A. 
Graves, Supervised sequence labelling, in: Supervised sequence four-class motor imagery EEG data for the BCI-competition 2005, labelling with recurrent neural networks, Springer, 2012, pp. 5–13. Journal of Neural Engineering 2 (2005) L14–L22. [37] J. L. Elman, Finding structure in time, Cognitive Science 14 (1990) [14] Y. Khalifa, J. L. Coyle, E. Sejdić, Non-invasive identification of 179 – 211. swallows via deep learning in high resolution cervical auscultation [38] M. I. Jordan, Attractor dynamics and parallelism in a connectionist recordings, Scientific Reports 10 (2020) 8704. sequential machine, in: J. Diederich (Ed.), Artificial Neural Networks, [15] Y. Khalifa, C. Donohue, J. Coyle, E. Sejdić, Upper esophageal IEEE Press, Piscataway, NJ, USA, 1990, pp. 112–127. sphincter opening segmentation with convolutional recurrent neural Khalifa et al., 2020 Page 16 of 21 [39] H. Jaeger, The "echo state" approach to analysing and training recur- [58] E. D. Übeyli, Combining recurrent neural networks with eigenvector rent neural networks-with an erratum note, Technical Report, German methods for classification of ECG beats, Digital Signal Processing National Research Center for Information Technology, 2001. 19 (2009) 320–329. [40] Y. Khalifa, Z. Zhang, E. Sejdić, Sparse recovery of time-frequency [59] A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. representations via recurrent neural networks, in: Proceedings of the Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, H. E. 22nd International Conference on Digital Signal Processing, ACM, Stanley, PhysioBank, PhysioToolkit, and PhysioNet: Components of 2017, pp. 1–5. a new research resource for complex physiologic signals, Circulation [41] M. I. Jordan, Serial order: A parallel distributed processing approach, 101 (2000) E215–E220. in: Neural Network Models of Cognition, volume 121 of Advances [60] C. Zhang, G. Wang, J. Zhao, P. Gao, J. Lin, H. 
Yang, Patient-specific in Psychology, North-Holland, 1997, pp. 471–495. ECG classification based on recurrent neural networks and clustering [42] R. Pascanu, T. Mikolov, Y. Bengio, On the difficulty of training technique, in: Proceedings of the 13th International Conference on recurrent neural networks, in: Proceedings of the 30th International Biomedical Engineering, 2017, pp. 63–67. Conference on Machine Learning, volume 28, 2013, pp. III–1310– [61] G. B. Moody, R. G. Mark, The impact of the MIT-BIH arrhythmia III–1318. database, IEEE Engineering in Medicine and Biology Magazine 20 [43] Y. Bengio, P. Simard, P. Frasconi, Learning long-term dependen- (2001) 45–50. cies with gradient descent is difficult, IEEE Transactions on Neural [62] Z. Xiong, M. K. Stiles, J. Zhao, Robust ECG signal classification Networks 5 (1994) 157–166. for detection of atrial fibrillation using a novel neural network, in: [44] A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, Computing in Cardiology, volume 44, 2017, pp. 1–4. J. Schmidhuber, A novel connectionist system for unconstrained [63] D. H. Wolpert, Stacked generalization, Neural Networks 5 (1992) handwriting recognition, IEEE Transactions on Pattern Analysis and 241–259. Machine Intelligence 31 (2009) 855–868. [64] M. Zihlmann, D. Perekrestenko, M. Tschannen, Convolutional recur- [45] X. Glorot, Y. Bengio, Understanding the difficulty of training deep rent neural networks for electrocardiogram classification, in: Com- feedforward neural networks, in: Y. W. Teh, M. Titterington (Eds.), puting in Cardiology, 2017, pp. 1–4. Proceedings of the 13th International Conference on Artificial Intelli- [65] M. Limam, F. Precioso, Atrial fibrillation detection and ECG classifi- gence and Statistics, volume 9 of Proceedings of Machine Learning cation based on convolutional recurrent neural network, in: Comput- Research, PMLR, 2010, pp. 249–256. ing in Cardiology, 2017, pp. 1–4. doi:10.22489/CinC.2017.171-325. [46] K. Cho, B. 
van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, [66] Y. Chang, S. Wu, L. Tseng, H. Chao, C. Ko, AF detection by exploit- H. Schwenk, Y. Bengio, Learning phrase representations using RNN ing the spectral and temporal characteristics of ECG signals with the encoder–decoder for statistical machine translation, in: Proceed- LSTM model, in: Computing in Cardiology, volume 45, 2018, pp. ings of the Conference on Empirical Methods in Natural Language 1–4. Processing, 2014, pp. 1724–1734. [67] S. Petrutiu, A. V. Sahakian, S. Swiryn, Abrupt changes in fibrillatory [47] F. A. Gers, J. Schmidhuber, F. Cummins, Learning to forget: Contin- wave characteristics at the termination of paroxysmal atrial fibrillation ual prediction with LSTM, in: Proceedings of the 9th International in humans, EP Europace 9 (2007) 466–470. Conference on Artificial Neural Networks, volume 2, IEEE, 1999, [68] A. Taddei, G. Distante, M. Emdin, P. Pisani, G. B. Moody, C. Zee- pp. 850–855. doi:10.1049/cp:19991218. lenberg, C. Marchesi, The European ST-T database: standard for [48] A. Viterbi, Error bounds for convolutional codes and an asymptoti- evaluating systems for the analysis of ST-T changes in ambulatory cally optimum decoding algorithm, IEEE Transactions on Informa- electrocardiography, European Heart Journal 13 (1992) 1164–1172. tion Theory 13 (1967) 260–269. [69] F. M. Nolle, F. K. Badura, J. M. Catlett, R. W. Bowser, M. H. Sketch, [49] P. Schwab, G. C. Scebba, J. Zhang, M. Delai, W. Karlen, Beat by CREI-GARD, a new concept in computerized arrhythmia monitoring beat: Classifying cardiac arrhythmias with recurrent neural networks, systems, Computers in Cardiology 13 (1987) 515–518. in: Computing in Cardiology, volume 44, 2017, pp. 1–4. [70] R. Bousseljot, D. Kreiseler, A. Schnabel, Nutzung der EKG- [50] M. F. Stollenga, W. Byeon, M. Liwicki, J. 
Schmidhuber, Parallel Signaldatenbank CARDIODAT der PTB über das Internet, Biomedi- multi-dimensional LSTM, with application to fast biomedical volu- zinische Technik/Biomedical Engineering 40 (1995) 317–318. metric image segmentation, arXiv preprint arXiv:1506.07452 (2015). [71] H. W. Lui, K. L. Chow, Multiclass classification of myocardial infarc- [51] W. Gersch, P. Lilly, E. Dong, PVC detection by the heart-beat interval tion with convolutional and recurrent neural networks for portable data—Markov chain approach, Computers and Biomedical Research ECG devices, Informatics in Medicine Unlocked 13 (2018) 26–33. 8 (1975) 370 – 378. [72] G. D. Clifford, C. Liu, B. Moody, L. H. Lehman, I. Silva, Q. Li, A. E. [52] D. A. Coast, R. M. Stern, G. G. Cano, S. A. Briller, An approach Johnson, R. G. Mark, AF classification from a short single lead ECG to cardiac arrhythmia analysis using hidden Markov models, IEEE recording: The PhysioNet/computing in cardiology challenge 2017, Transactions on Biomedical Engineering 37 (1990) 826–836. 2017, pp. 1–4. doi:10.22489/CinC.2017.065-469. [53] R. E. Hermes, D. B. Geselowitz, G. Oliver, Development, distribution, [73] S. Singh, S. K. Pandey, U. Pawar, R. R. Janghel, Classification of ECG and use of the American Heart Association database for ventricular Arrhythmia using Recurrent Neural Networks, Procedia Computer arrhythmia detector evaluation, Computers in Cardiology (1980) Science 132 (2018) 1290–1297. 263–266. [74] A. Kadish, A. E. Buxton, H. Kennedy, B. P. Knight, J. W. Mason, [54] R. V. Andreao, B. Dorizzi, J. Boudy, ECG signal analysis through hid- C. Schuger, C. Tracy, W. L. Winters, A. W. Boone, M. Elnicki, den Markov models, IEEE Transactions on Biomedical Engineering J. W. Hirshfeld, B. H. Lorell, G. Rodgers, H. H. Weitz, ACC/AHA 53 (2006) 1541–1549. clinical competence statement on electrocardiography and ambulatory [55] P. Laguna, R. G. Mark, A. Goldberg, G. B. 
Moody, A database for electrocardiography, Journal of the American College of Cardiology evaluation of algorithms for measurement of QT and other waveform 38 (2001) 3169–3178. intervals in the ECG, in: Computers in Cardiology, 1997, pp. 673– [75] M. H Crawford, S. Bernstein, P. Deedwania, J. Dimarco, K. J Ferrick, 676. A. Garson, L. Green, H. Leon Greene, M. Silka, P. H Stone, C. Tracy, [56] F. Sandberg, M. Stridh, L. Sornmo, Frequency tracking of atrial R. Gibbons, ACC/AHA guidelines for ambulatory electrocardiog- fibrillation using hidden Markov models, IEEE Transactions on raphy, Journal of the American College of Cardiology 34 (1999) Biomedical Engineering 55 (2008) 502–511. 912–948. [57] J. Oliveira, C. Sousa, M. T. Coimbra, Coupled hidden Markov model [76] K. S. Sayed, A. F. Khalaf, Y. M. Kadah, Arrhythmia classification for automatic ECG and PCG segmentation, in: Proceedings of the based on novel distance series transform of phase space trajectories, IEEE International Conference on Acoustics, Speech and Signal in: Proceedings of the 37th Annual International Conference of the Processing, 2017, pp. 1023–1027. IEEE Engineering in Medicine and Biology Society, 2015, pp. 5195– Khalifa et al., 2020 Page 17 of 21 5198. neering in Medicine and Biology Magazine 20 (2001) 51–57. [77] D. L. Schomer, F. L. Da Silva, Niedermeyer’s electroencephalogra- [96] H. Phan, F. Andreotti, N. Cooray, O. Y. Chén, M. De Vos, Se- phy: basic principles, clinical applications, and related fields, 6th ed., qSleepNet: End-to-End Hierarchical Recurrent Neural Network for Lippincott Williams \& Wilkins, 2012. Sequence-to-Sequence Automatic Sleep Staging, IEEE Transactions [78] D. P. Subha, P. K. Joseph, U. R. Acharya, C. M. Lim, EEG signal on Neural Systems and Rehabilitation Engineering 27 (2019) 400– analysis: A survey, Journal of Medical Systems 34 (2010) 195–212. 410. [79] A. Flexerand, G. Dorffner, P. Sykacekand, I. Rezek, An automatic, [97] N. Michielli, U. R. Acharya, F. 
Molinari, Cascaded LSTM recurrent continuous and probabilistic sleep stager based on a hidden markov neural network for automated sleep stage classification using single- model, Applied Artificial Intelligence 16 (2002) 199–207. channel EEG signals, Computers in Biology and Medicine 106 (2019) [80] A. Flexer, G. Gruber, G. Dorffner, A reliable probabilistic sleep stager 71–81. based on a single EEG signal, Artificial Intelligence in Medicine 33 [98] C. Sun, J. Fan, C. Chen, W. Li, W. Chen, A Two-Stage Neural (2005) 199–207. Network for Sleep Stage Classification Based on Feature Learning, [81] L. G. Doroshenkov, V. A. Konyshev, S. V. Selishchev, Classification Sequence Learning, and Data Augmentation, IEEE Access 7 (2019) of human sleep stages based on EEG processing using hidden Markov 109386–109397. models, Biomedical Engineering 41 (2007) 25–28. [99] S. H. Sheldon, R. Ferber, M. H. Kryger, Principles and practice of [82] B. Kemp, A. H. Zwinderman, B. Tuk, H. A. Kamphuisen, J. J. Oberye, pediatric sleep medicine, 1st ed., Elsevier Health Sciences, 2005. Analysis of a sleep-dependent neuronal feedback loop: the slow- [100] D. Y. Kang, P. N. DeYoung, A. Malhotra, R. L. Owens, T. P. Coleman, wave microcontinuity of the EEG, IEEE Transactions on Biomedical A state space and density estimation framework for sleep staging in Engineering 47 (2000) 1185–1194. obstructive sleep apnea, IEEE Transactions on Biomedical Engineer- [83] M. T. Bianchi, N. A. Eiseman, S. S. Cash, J. Mietus, C. K. Peng, R. J. ing 65 (2018) 1201–1212. Thomas, Probabilistic sleep architecture models in patients with and [101] A. Roebuck, V. Monasterio, E. Gederi, M. Osipov, J. Behar, A. Mal- without sleep apnea, Journal of Sleep Research 21 (2012) 330–341. hotra, T. Penzel, G. D. Clifford, A review of signals used in sleep [84] S. F. Quan, B. V. Howard, C. Iber, J. P. Kiley, F. J. Nieto, G. T. analysis, Physiological Measurement 35 (2013) R1–R57. O’Connor, D. M. Rapoport, S. Redline, J. Robbins, J. 
M. Samet, P. W. [102] C. on Epidemiology and Prognosis, I. L. A. Epilepsy, Guidelines for Wahl, The sleep heart health study: Design, rationale, and methods, epidemiologic studies on epilepsy, Epilepsia 34 (1993) 592–596. Sleep 20 (1997) 1077–1085. [103] W. W. Lytton, Computer modelling of epilepsy, Nature Reviews: [85] S. T. Pan, C. E. Kuo, J. H. Zeng, S. F. Liang, A transition-constrained Neuroscience 9 (2008) 626–637. discrete hidden Markov model for automatic sleep staging, Biomedi- [104] M. H. Abdullah, J. M. Abdullah, M. Z. Abdullah, Seizure detection cal Engineering Online 11 (2012) 52. by means of hidden Markov model and stationary wavelet transform [86] F. Yaghouby, S. Sunderam, Quasi-supervised scoring of human sleep of electroencephalograph signals, in: Proceedings of the IEEE- in polysomnograms using augmented input variables, Computers in EMBS International Conference on Biomedical and Health Informat- Biology and Medicine 59 (2015) 54–63. ics, IEEE, 2012, pp. 62–65. doi:10.1109/BHI.2012.6211506. [87] J. A. Onton, D. Y. Kang, T. P. Coleman, Visualization of whole-night [105] O. Smart, M. Chen, Semi-automated patient-specific scalp EEG sleep EEG from 2-channel mobile recording device reveals distinct seizure detection with unsupervised machine learning, in: Proceed- deep sleep stages with differential electrodermal activity, Frontiers ings of the IEEE Conference on Computational Intelligence in Bioin- in Human Neuroscience 10 (2016) 605. formatics and Computational Biology, 2015, pp. 1–7. [88] P. R. Davidson, R. D. Jones, M. T. R. Peiris, Detecting behavioral [106] S. Santaniello, D. L. Sherman, M. A. Mirski, N. V. Thakor, S. V. microsleeps using EEG and LSTM recurrent neural networks, in: Sarma, A Bayesian framework for analyzing iEEG data from a rat Proceedings of the 20th Annual International Conference of the IEEE model of epilepsy, in: Proceedings of the 33rd Annual International Engineering in Medicine and Biology Society, IEEE, 2005, pp. 
5754– Conference of the IEEE Engineering in Medicine and Biology Soci- 5757. ety, 2011, pp. 1435–1438. [89] Y. L. Hsu, Y. T. Yang, J. S. Wang, C. Y. Hsu, Automatic sleep [107] F. Mormann, T. Kreuz, C. Rieke, R. G. Andrzejak, A. Kraskov, stage recurrent neural classifier using energy features of EEG signals, P. David, C. E. Elger, K. Lehnertz, On the predictability of epileptic Neurocomputing 104 (2013) 105–114. seizures, Clinical Neurophysiology 116 (2005) 569–587. [90] A. Supratak, H. Dong, C. Wu, Y. Guo, DeepSleepNet: A model [108] T. Maiwald, M. Winterhalder, R. Aschenbrenner-Scheibe, H. U. Voss, for automatic sleep stage scoring based on raw single-channel EEG, A. Schulze-Bonhage, J. Timmer, Comparison of three nonlinear IEEE Transactions on Neural Systems and Rehabilitation Engineering seizure prediction methods by means of the seizure prediction char- 25 (2017) 1998–2008. acteristic, Physica D-Nonlinear Phenomena 194 (2004) 357–368. [91] C. O’Reilly, N. Gosselin, J. Carrier, T. Nielsen, Montreal archive of [109] P. E. McSharry, L. A. Smith, L. Tarassenko, Prediction of epileptic sleep studies: An open-access resource for instrument benchmarking seizures: Are nonlinear methods relevant?, Nature Medicine 9 (2003) and exploratory research, Journal of Sleep Research 23 (2014) 628– 241–242. 635. [110] Y. C. Lai, M. A. Harrison, M. G. Frei, I. Osorio, Controlled test for [92] S. Biswal, J. Kulas, H. Sun, B. Goparaju, M. B. Westover, M. T. predictive power of Lyapunov exponents: their inability to predict Bianchi, J. Sun, SLEEPNET: Automated Sleep Staging System via epileptic seizures, Chaos 14 (2004) 630–642. Deep Learning, arXiv preprint arXiv:1707.08262 (2017). [111] M. Winterhalder, T. Maiwald, H. U. Voss, R. Aschenbrenner-Scheibe, [93] H. Phan, F. Andreotti, N. Cooray, O. Y. Chén, M. D. Vos, Auto- J. Timmer, A. 
Schulze-Bonhage, The seizure prediction characteris- matic sleep stage classification using single-channel EEG: Learning tic: A general framework to assess and compare seizure prediction sequential features with attention-based recurrent neural networks, in: methods, Epilepsy & Behavior 4 (2003) 318–325. Proceedings of the 40th Annual International Conference of the IEEE [112] S. Wong, A. B. Gardner, A. M. Krieger, B. Litt, A stochastic Engineering in Medicine and Biology Society, 2018, pp. 1452–1455. framework for evaluating seizure prediction algorithms using hidden [94] E. Bresch, U. Großekathöfer, G. Garcia-Molina, Recurrent Deep Markov models, Journal of Neurophysiology 97 (2007) 2525–2532. Neural Networks for Real-Time Sleep Stage Classification From [113] A. B. Gardner, A. M. Krieger, G. Vachtsevanos, B. Litt, One-class Single Channel EEG, Frontiers in Computational Neuroscience 12 novelty detection for seizure analysis from intracranial EEG, Journal (2018) 85. of Machine Learning Research 7 (2006) 1025–1044. [95] G. Klosh, B. Kemp, T. Penzel, A. Schlogl, P. Rappelsberger, [114] B. Direito, C. Teixeira, B. Ribeiro, M. Castelo-Branco, F. Sales, E. Trenker, G. Gruber, J. Zeithofer, B. Saletu, W. M. Herrmann, S. L. A. Dourado, Modeling epileptic brain states using EEG spectral Himanen, D. Kunz, M. J. Barbanoj, J. Roschke, A. Varri, G. Dorffner, analysis and topographic mapping, Journal of Neuroscience Methods The SIESTA project polygraphic and clinical database, IEEE Engi- 210 (2012) 220–229. Khalifa et al., 2020 Page 18 of 21 [115] M. Ihle, H. Feldwisch-Drentrup, C. A. Teixeira, A. Witon, B. Schelter, [134] G. Schalk, E. C. Leuthardt, Brain-computer interfaces using electro- J. Timmer, A. Schulze-Bonhage, EPILEPSIAE - a European epilepsy corticographic signals, IEEE Reviews in Biomedical Engineering 4 database, Computer Methods and Programs in Biomedicine 106 (2011) 140–154. (2012) 127–138. [135] K. Sayed, M. Kamel, M. Alhaddad, H. M. Malibary, Y. M. 
Kadah, [116] A. H. Shoeb, Application of machine learning to epileptic seizure on- Characterization of phase space trajectories for Brain-Computer In- set detection and treatment, {PhD} {Thesis}, Massachusetts Institute terface, Biomedical Signal Processing and Control 38 (2017) 55–66. of Technology, 2009. [136] K. Sayed, M. Kamel, M. Alhaddad, H. M. Malibary, Y. M. [117] A. Petrosian, D. Prokhorov, R. Homan, R. Dasheiff, D. Wunsch, Kadah, Extracting phase space morphological features for Recurrent neural network based prediction of epileptic seizures in electroencephalogram-based brain-computer interface, Journal of intra- and extracranial EEG, Neurocomputing 30 (2000) 201–218. Medical Imaging and Health Informatics 7 (2017) 771–774. [118] N. F. Güler, E. D. Übeyli, n. Güler, Recurrent neural networks [137] E. Donchin, K. M. Spencer, R. Wijesinghe, The mental prosthesis: employing Lyapunov exponents for EEG signals classification, Expert Assessing the speed of a P300-based brain-computer interface, IEEE Systems with Applications 29 (2005) 506–514. Transactions on Rehabilitation Engineering 8 (2000) 174–179. [119] R. G. Andrzejak, K. Lehnertz, F. Mormann, C. Rieke, P. David, C. E. [138] B. Obermaier, C. Neuper, C. Guger, G. Pfurtscheller, Information Elger, Indications of nonlinear deterministic and finite-dimensional transfer rate in a five-classes brain-computer interface, IEEE Trans- structures in time series of brain electrical activity: dependence on actions on Neural Systems and Rehabilitation Engineering 9 (2001) recording region and brain state, Physical Review. E: Statistical, 283–288. Nonlinear, and Soft Matter Physics 64 (2001) 061907. [139] B. Obermaier, C. Guger, C. Neuper, G. Pfurtscheller, Hidden Markov [120] S. P. Kumar, N. Sriraam, P. G. Benakop, Automated detection of models for online classification of single trial EEG data, Pattern epileptic seizures using wavelet entropy feature with recurrent neural Recognition Letters 22 (2001) 1299–1309. 
network classifier, in: Proceedings of the IEEE Region 10 Interna- [140] G. Pfurtscheller, C. Neuper, G. R. Muller, B. Obermaier, G. Krausz, tional Conference, 2008, pp. 1–5. A. Schlogl, R. Scherer, B. Graimann, C. Keinrath, D. Skliris, [121] G. R. Minasyan, J. B. Chatten, M. J. Chatten, R. N. Harner, Patient- M. Wortz, G. Supp, C. Schrank, Graz-BCI: State of the art and specific early seizure detection from scalp EEG, Journal of Clinical clinical applications, IEEE Transactions on Neural Systems and Neurophysiology 27 (2010) 163–178. Rehabilitation Engineering 11 (2003) 177–180. [122] M. A. Naderi, H. Mahdavi-Nasab, Analysis and classification of EEG [141] S. Solhjoo, A. M. Nasrabadi, M. R. H. Golpayegani, Classification signals using spectral analysis and recurrent neural networks, in: Pro- of chaotic signals using HMM classifiers: EEG-based mental task ceedings of the 17th Iranian Conference of Biomedical Engineering, classification, in: Proceedings of the 13th European Signal Processing 2010, pp. 1–4. Conference, 2005, pp. 1–4. [123] L. Vidyaratne, A. Glandon, M. Alam, K. M. Iftekharuddin, Deep [142] G. Pfurtscheller, A. Schlögl, Dataset III: Motor imagery, Technical recurrent neural network for seizure detection, in: Proceedings of Report, 2003. the IEEE International Joint Conference on Neural Networks, IEEE, [143] H. Suk, S. Lee, Two-layer hidden Markov models for multi-class 2016, pp. 1202–1207. motor imagery classification, in: Proceedings of the 1st Workshop on [124] S. S. Talathi, Deep Recurrent Neural Networks for seizure detection Brain Decoding: Pattern Recognition Challenges in Neuroimaging, and early seizure detection systems, arXiv preprint arXiv:1706.03283 2010, pp. 5–8. (2017). [144] C. Brunner, R. Leeb, G. Müller-Putz, A. Schlögl, G. Pfurtscheller, [125] M. Golmohammadi, S. Ziyabari, V. Shah, E. Von Weltin, C. Camp- Dataset IIa: Graz dataset A, Technical Report, 2008. bell, I. Obeid, J. Picone, Gated recurrent networks for seizure [145] W. 
Speier, C. Arnold, J. Lu, A. Deshpande, N. Pouratian, Integrating detection, in: Proceedings of the 2017 IEEE Signal Process- language information with a hidden Markov model to improve com- ing in Medicine and Biology Symposium (SPMB), 2017, pp. 1–5. munication rate in the P300 speller, IEEE Transactions on Neural doi:10.1109/SPMB.2017.8257020. Systems and Rehabilitation Engineering 22 (2014) 678–684. [126] I. Obeid, J. Picone, The Temple University Hospital EEG Data [146] A. Erfanian, B. Mahmoudi, Real-time ocular artifact suppression Corpus, Frontiers in Neuroscience 10 (2016). using recurrent neural network for electro-encephalogram based brain- [127] M. Golmohammadi, V. Shah, S. Lopez, S. Ziyabari, S. Yang, J. Ca- computer interface, Medical & Biological Engineering & Computing maratta, I. Obeid, J. Picone, The TUH EEG seizure corpus, in: 43 (2005) 296–305. Proceedings of the American Clinical Neurophysiology Society An- [147] E. M. Forney, C. W. Anderson, Classification of EEG during imagined nual Meeting, 2017, p. 1. mental tasks by forecasting with Elman recurrent neural networks, in: [128] S. Raghu, N. Sriraam, G. P. Kumar, Classification of epileptic seizures Proceedings of the IEEE International Joint Conference on Neural using wavelet packet log energy and norm entropies with recurrent Networks, IEEE, 2011, pp. 2749–2755. Elman neural network classifier, Cognitive Neurodynamics 11 (2017) [148] D. Balderas, A. Molina, P. Ponce, Alternative classification tech- 51–66. niques for brain-computer interfaces for smart sensor manufacturing [129] A. M. Abdelhameed, H. G. Daoud, M. Bayoumi, Deep Convolu- environments, IFAC-PapersOnLine 48 (2015) 680–685. tional Bidirectional LSTM Recurrent Neural Network for Epilep- [149] R. Leeb, F. Lee, C. Keinrath, R. Scherer, H. Bischof, G. Pfurtscheller, tic Seizure Detection, in: 2018 16th IEEE International New Brain-computer communication: Motivation, aim, and impact of Circuits and Systems Conference (NEWCAS), 2018, pp. 
139–143. exploring a virtual apartment, IEEE Transactions on Neural Systems doi:10.1109/NEWCAS.2018.8585542. and Rehabilitation Engineering 15 (2007) 473–482.
[130] H. Daoud, M. Bayoumi, Deep Learning based Reliable Early Epileptic Seizure Predictor, in: 2018 IEEE Biomedical Circuits and Systems Conference (BioCAS), 2018, pp. 1–4. doi:10.1109/BIOCAS.2018.8584678.
[131] R. Hussein, H. Palangi, R. K. Ward, Z. J. Wang, Optimized deep neural network architecture for robust detection of epileptic seizures using EEG signals, Clinical Neurophysiology 130 (2019) 25–37.
[132] G. Pfurtscheller, C. Neuper, Motor imagery and direct brain-computer communication, Proceedings of the IEEE 89 (2001) 1123–1134.
[133] E. C. Leuthardt, G. Schalk, J. R. Wolpaw, J. G. Ojemann, D. W. Moran, A brain-computer interface using electrocorticographic signals in humans, Journal of Neural Engineering 1 (2004) 63–71.
[150] R. Maddula, J. Stivers, M. Mousavi, S. Ravindran, V. de Sa, Deep recurrent convolutional neural networks for classifying P300 BCI signals, in: Proceedings of the 7th Graz Brain-Computer Interface Conference, 2017.
[151] J. Stivers, V. de Sa, Spelling in parallel: Towards a rapid, spatially independent BCI, in: Proceedings of the 7th Graz Brain-Computer Interface Conference, 2017.
[152] J. Thomas, T. Maszczyk, N. Sinha, T. Kluge, J. Dauwels, Deep learning-based classification for brain-computer interfaces, in: Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2017, pp. 234–239.
[153] V. P. Oikonomou, G. Liaros, K. Georgiadis, E. Chatzilari, K. Adam, S. Nikolopoulos, I. Kompatsiaris, Comparative evaluation of state-of-the-art algorithms for SSVEP-based BCIs, arXiv preprint arXiv:1602.00904 (2016).
[154] C. Spampinato, S. Palazzo, I. Kavasidis, D. Giordano, N. Souly, M. Shah, Deep learning human mind for automated visual classification, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6809–6817.
[155] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, L. Fei-Fei, ImageNet large scale visual recognition challenge, International Journal of Computer Vision 115 (2015) 211–252.
[156] T. Hosman, M. Vilela, D. Milstein, J. N. Kelemen, D. M. Brandman, L. R. Hochberg, J. D. Simeral, BCI decoder performance comparison of an LSTM recurrent neural network and a Kalman filter in retrospective simulation, in: Proceedings of the 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER), 2019, pp. 1066–1071. doi:10.1109/NER.2019.8717140.
[157] L. R. Hochberg, M. D. Serruya, G. M. Friehs, J. A. Mukand, M. Saleh, A. H. Caplan, A. Branner, D. Chen, R. D. Penn, J. P. Donoghue, Neuronal ensemble control of prosthetic devices by a human with tetraplegia, Nature 442 (2006) 164–171.
[158] D. Zhang, L. Yao, K. Chen, S. Wang, X. Chang, Y. Liu, Making Sense of Spatio-Temporal Preserving Representations for EEG-Based Human Intention Recognition, IEEE Transactions on Cybernetics 50 (2020) 3033–3044.
[159] G. Schalk, D. J. McFarland, T. Hinterberger, N. Birbaumer, J. R. Wolpaw, BCI2000: a general-purpose brain-computer interface (BCI) system, IEEE Transactions on Biomedical Engineering 51 (2004) 1034–1043.
[160] S. Tortora, S. Ghidoni, C. Chisari, S. Micera, F. Artoni, Deep learning-based BCI for gait decoding from EEG with LSTM recurrent neural network, Journal of Neural Engineering 17 (2020) 046011.
[161] M. J. Zwarts, D. F. Stegeman, Multichannel surface EMG: Basic aspects and clinical utility, Muscle & Nerve 28 (2003) 1–17.
[162] J. Y. Hogrel, Clinical applications of surface electromyography in neuromuscular disorders, Neurophysiologie Clinique 35 (2005) 59–71.
[163] M. A. Oskoei, H. S. Hu, Myoelectric control systems-A survey, Biomedical Signal Processing and Control 2 (2007) 275–294.
[164] K. Englehart, B. Hudgins, P. A. Parker, A wavelet-based continuous classification scheme for multifunction myoelectric control, IEEE Transactions on Biomedical Engineering 48 (2001) 302–311.
[165] A. D. Chan, K. B. Englehart, Continuous myoelectric control for powered prostheses using hidden Markov models, IEEE Transactions on Biomedical Engineering 52 (2005) 121–124.
[166] K. Englehart, B. Hudgins, A. D. C. Chan, Continuous multifunction myoelectric control using pattern recognition, Technology and disability 15 (2003) 95–103.
[167] X. Zhang, X. Chen, Y. Li, V. Lantz, K. Q. Wang, J. H. Yang, A framework for hand gesture recognition based on accelerometer and EMG sensors, IEEE Transactions on Systems Man and Cybernetics Part a-Systems and Humans 41 (2011) 1064–1076.
[168] K. R. Wheeler, M. H. Chang, K. H. Knuth, Gesture-based control and EMG decomposition, IEEE Transactions on Systems Man and Cybernetics Part C-Applications and Reviews 36 (2006) 503–514.
[169] J. Monsifrot, E. Le Carpentier, Y. Aoustin, D. Farina, Sequential decoding of intramuscular EMG signals via estimation of a Markov model, IEEE Transactions on Neural Systems and Rehabilitation Engineering 22 (2014) 1030–1040.
[170] K. S. Lee, EMG-based speech recognition using hidden markov models with global control variables, IEEE Transactions on Biomedical Engineering 55 (2008) 930–940.
[171] A. D. Chan, K. Englehart, B. Hudgins, D. F. Lovely, Hidden Markov model classification of myoelectric signals in speech, IEEE Engineering in Medicine and Biology Magazine 21 (2002) 143–146.
[172] A. D. Chan, K. Englehart, B. Hudgins, D. F. Lovely, Myo-electric signals to augment speech recognition, Medical & Biological Engineering & Computing 39 (2001) 500–504.
[173] Z. Li, M. Hayashibe, C. Fattal, D. Guiraud, Muscle fatigue tracking with evoked EMG via recurrent neural network: Toward personalized neuroprosthetics, IEEE Computational Intelligence Magazine 9 (2014) 38–46.
[174] P. Xia, J. Hu, Y. Peng, EMG-based estimation of limb movement using deep learning with recurrent convolutional neural networks, Artificial Organs 42 (2018) E67–E77.
[175] F. Quivira, T. Koike-Akino, Y. Wang, D. Erdogmus, Translating sEMG signals to continuous hand poses using recurrent neural networks, in: Proceedings of the IEEE-EMBS International Conference on Biomedical and Health Informatics, 2018, pp. 166–169.
[176] A. Graves, Generating sequences with recurrent neural networks, CoRR abs/1308.0850 (2013).
[177] Y. Hu, Y. Wong, W. Wei, Y. Du, M. Kankanhalli, W. Geng, A novel attention-based hybrid CNN-RNN architecture for sEMG-based gesture recognition, PloS One 13 (2018) e0206049.
[178] M. Atzori, A. Gijsberts, C. Castellini, B. Caputo, A. G. Hager, S. Elsig, G. Giatsidis, F. Bassetto, H. Muller, Electromyography data for non-invasive naturally-controlled robotic hand prostheses, Scientific data 1 (2014) 140053.
[179] A. Samadani, Gated Recurrent Neural Networks for EMG-Based Hand Gesture Classification. A Comparative Study, in: Proceedings of the 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2018, pp. 1–4. doi:10.1109/EMBC.2018.8512531.
[180] M. Atzori, A. Gijsberts, S. Heynen, A.-G. M. Hager, O. Deriaz, P. van der Smagt, C. Castellini, B. Caputo, H. Müller, Building the Ninapro database: A resource for the biorobotics community, in: Proceedings of the 2012 4th IEEE RAS EMBS International Conference on Biomedical Robotics and Biomechatronics (BioRob), 2012, pp. 1258–1265. doi:10.1109/BioRob.2012.6290287.
[181] M. Simão, P. Neto, O. Gibaru, EMG-based online classification of gestures with recurrent neural networks, Pattern Recognition Letters 128 (2019) 45–51.
[182] M. Simão, P. Neto, O. Gibaru, UC2018 DualMyo Hand Gesture Dataset, 2018.
[183] S. Pizzolato, L. Tagliapietra, M. Cognolato, M. Reggiani, H. Müller, M. Atzori, Comparison of six electromyography acquisition setups on hand movement classification tasks, PloS One 12 (2017) e0186132.
[184] S. E. Schmidt, C. Holst-Hansen, C. Graff, E. Toft, J. J. Struijk, Segmentation of heart sound recordings by a duration-dependent hidden Markov model, Physiological Measurement 31 (2010) 513–529.
[185] A. D. Ricke, R. J. Povinelli, M. T. Johnson, Automatic segmentation of heart sound signals using hidden markov models, in: Computers in Cardiology, volume 32, 2005, pp. 953–956.
[186] P. Sedighian, A. W. Subudhi, F. Scalzo, S. Asgari, Pediatric heart sound segmentation using Hidden Markov Model, in: Proceedings of the 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2014, pp. 5490–5493.
[187] C. S. Lima, D. Barbosa, Automatic segmentation of the second cardiac sound by using wavelets and hidden Markov models, in: Proceedings of the 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2008, pp. 334–337.
[188] E. Sejdić, G. A. Malandraki, J. L. Coyle, Computational deglutition: Using signal- and image-processing methods to understand swallowing and associated disorders, IEEE Signal Processing Magazine 36 (2019) 138–146.
[189] P. B. Shull, W. Jirattigalachote, M. A. Hunt, M. R. Cutkosky, S. L. Delp, Quantified self and human movement: a review on the clinical impact of wearable sensing and feedback for gait analysis and intervention, Gait and Posture 40 (2014) 11–9.
[190] C. Donohue, Y. Khalifa, S. Perera, E. Sejdić, J. L. Coyle, How Closely do Machine Ratings of Duration of UES Opening During Videofluoroscopy Approximate Clinician Ratings Using Temporal Kinematic Analyses and the MBSImP?, Dysphagia (2020).
[191] S. Mao, A. Sabry, Y. Khalifa, J. L. Coyle, E. Sejdić, Estimation of laryngeal closure duration during swallowing without invasive X-rays, Future Generation Computer Systems (2020).
[192] S. Mao, Z. Zhang, Y. Khalifa, C. Donohue, J. L. Coyle, E. Sejdić, Neck sensor-supported hyoid bone movement tracking during swallowing, Royal Society Open Science 6 (2019) 181982.
[193] C. Nickel, C. Busch, S. Rangarajan, M. Möbius, Using hidden Markov models for accelerometer-based biometric gait recognition, in: Proceedings of the IEEE 7th International Colloquium on Signal Processing and its Applications, 2011, pp. 58–63.
[194] A. Mannini, A. M. Sabatini, A hidden Markov model-based technique for gait segmentation using a foot-mounted gyroscope, in: Proceedings of the 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2011, pp. 4369–4373.
[195] C. Nickel, C. Busch, Classifying Accelerometer Data via Hidden Markov Models to Authenticate People by the Way They Walk, IEEE Aerospace and Electronic Systems Magazine 28 (2013) 29–35.
[196] G. Panahandeh, N. Mohammadiha, A. Leijon, P. Handel, Continuous hidden Markov model for pedestrian activity classification and gait analysis, IEEE Transactions on Instrumentation and Measurement 62 (2013) 1073–1083.
[197] M. Inoue, S. Inoue, T. Nishida, Deep recurrent neural network for mobile human activity recognition with high throughput, Artificial Life and Robotics 23 (2018) 173–185.
[198] A. Lisowska, G. Wheeler, V. Ceballos Inza, I. Poole, An evaluation of supervised, novelty-based and hybrid approaches to fall detection using silmee accelerometer data, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 402–408.
[199] T. Theodoridis, V. Solachidis, N. Vretos, P. Daras, Human fall detection from acceleration measurements using a recurrent neural network, in: N. Maglaveras, I. Chouvarda, P. de Carvalho (Eds.), Precision Medicine Powered by pHealth and Connected Health, Springer Singapore, 2017, pp. 145–149.
[200] Y. Lecun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document recognition, Proceedings of the IEEE 86 (1998) 2278–2324.
[201] T. Glasmachers, Limits of End-to-End Learning, arXiv preprint arXiv:1704.08305 (2017).
[202] G. Cheron, J.-P. Draye, M. Bourgeios, G. Libert, A dynamic neural network identification of electromyography and arm trajectory relationship during complex movements, IEEE Transactions on Biomedical Engineering 43 (1996) 552–558.
[203] G. Cheron, F. Leurs, A. Bengoetxea, J. P. Draye, M. Destrée, B. Dan, A dynamic recurrent neural network for multiple muscles electromyographic mapping to elevation angles of the lower limb in human locomotion, Journal of Neuroscience Methods 129 (2003) 95–104.
[204] S. Chauhan, L. Vig, Anomaly detection in ECG time signals via deep long short-term memory networks, in: Proceedings of the IEEE International Conference on Data Science and Advanced Analytics, 2015, pp. 1–7. doi:10.1109/DSAA.2015.7344872.
[205] V. G. Sujadevi, K. P. Soman, R. Vinayakumar, Real-time detection of atrial fibrillation from short time single lead ECG traces using recurrent neural networks, in: S. M. Thampi, S. Mitra, J. Mukhopadhyay, K.-C. Li, A. P. James, S. Berretti (Eds.), Intelligent Systems Technologies and Applications, Advances in Intelligent Systems and Computing, Springer International Publishing, Cham, 2017, pp. 212–221. doi:10.1007/978-3-319-68385-0_18.
[206] Y. LeCun, K. Kavukcuoglu, C. Farabet, Convolutional networks and applications in vision, in: Proceedings of the 2010 IEEE International Symposium on Circuits and Systems, 2010, pp. 253–256. doi:10.1109/ISCAS.2010.5537907.
[207] Y. LeCun, B. E. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. E. Hubbard, L. D. Jackel, Handwritten Digit Recognition with a Back-Propagation Network, in: D. S. Touretzky (Ed.), Proceedings of the 3rd Conference on Neural Information Processing Systems, Morgan-Kaufmann, 1990, pp. 396–404.
[208] H. Cecotti, A. Graser, Convolutional neural networks for P300 detection with application to brain-computer interfaces, IEEE Transactions on Pattern Analysis and Machine Intelligence 33 (2011) 433–445.
[209] S. Kiranyaz, T. Ince, M. Gabbouj, Real-time patient-specific ECG classification by 1-D convolutional neural networks, IEEE Transactions on Biomedical Engineering 63 (2016) 664–675.
[210] S. P. Shashikumar, A. J. Shah, G. D. Clifford, S. Nemati, Detection of paroxysmal atrial fibrillation using attention-based bidirectional recurrent neural networks, in: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’18, ACM, London, United Kingdom, 2018, pp. 715–723. doi:10.1145/3219819.3219912.
[211] J. H. Tan, Y. Hagiwara, W. Pang, I. Lim, S. L. Oh, M. Adam, R. S. Tan, M. Chen, U. R. Acharya, Application of stacked convolutional and long short-term memory network for accurate identification of CAD ECG signals, Computers in Biology and Medicine 94 (2018) 19–26.
[212] Z. Xiong, M. P. Nash, E. Cheng, V. V. Fedorov, M. K. Stiles, J. Zhao, ECG signal classification for the detection of cardiac arrhythmias using a convolutional recurrent neural network, Physiological Measurement 39 (2018) 094006.
[213] Y. M. Saidutta, J. Zou, F. Fekri, Increasing the learning Capacity of BCI Systems via CNN-HMM models, in: Proceedings of the 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), IEEE, 2018, pp. 1–4.
[214] Z.-R. Wang, J. Du, W.-C. Wang, J.-F. Zhai, J.-S. Hu, A comprehensive study of hybrid neural network hidden Markov model for offline handwritten Chinese text recognition, International Journal on Document Analysis and Recognition (IJDAR) 21 (2018) 241–251.
[215] Z.-R. Wang, J. Du, J.-M. Wang, Writer-aware CNN for parsimonious HMM-based offline handwritten Chinese text recognition, Pattern Recognition 100 (2020) 107102.
[216] N. C. Dvornek, D. Yang, P. Ventola, J. S. Duncan, Learning generalizable recurrent neural networks from small task-fMRI datasets, in: A. F. Frangi, J. A. Schnabel, C. Davatzikos, C. Alberola-López, G. Fichtinger (Eds.), Proceedings of the 21st Conference on Medical Image Computing and Computer Assisted Intervention, Springer International Publishing, 2018, pp. 329–337.
[217] C. Yu, Y. Khalifa, E. Sejdić, Silent aspiration detection in high resolution cervical auscultations, in: Proceedings of the IEEE-EMBS International Conference on Biomedical and Health Informatics, 2019, pp. 1–4.
[218] S. J. Pan, Q. A. Yang, A Survey on transfer learning, IEEE Transactions on Knowledge and Data Engineering 22 (2010) 1345–1359.
[219] A. Isin, S. Ozdalili, Cardiac arrhythmia detection using deep learning, Procedia Computer Science 120 (2017) 268–275.
[220] C. Wei, Y. Lin, Y. Wang, T. Jung, N. Bigdely-Shamlo, C. Lin, Selective transfer learning for EEG-based drowsiness detection, in: Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, 2015, pp. 3229–3232.
[221] Y.-Q. Zhang, W.-L. Zheng, B.-L. Lu, Transfer components between subjects for EEG-based driving fatigue detection, in: S. Arik, T. Huang, W. K. Lai, Q. Liu (Eds.), Proceedings of the 29th Conference on Neural Information Processing Systems, Springer International Publishing, 2015, pp. 61–68.
[222] U. Côté-Allard, C. L. Fall, A. Drouin, A. Campeau-Lecours, C. Gosselin, K. Glette, F. Laviolette, B. Gosselin, Deep learning for electromyographic hand gesture signal classification using transfer learning, IEEE Transactions on Neural Systems and Rehabilitation Engineering 27 (2019) 760–771.

Quantitative Biology, arXiv (Cornell University)

A Review of Hidden Markov Models and Recurrent Neural Networks for Event Detection and Localization in Biomedical Signals

Quantitative Biology, Volume 2020 (2012) – Dec 11, 2020

ISSN: 1566-2535
eISSN: ARCH-3345
DOI: 10.1016/j.inffus.2020.11.008

Abstract

Yassin Khalifa (a), Danilo Mandic (b), and Ervin Sejdić (a, c, d, e, *)

(a) Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, PA, USA
(b) Department of Electrical and Computer Engineering, Imperial College, London, SW7 2BT, United Kingdom
(c) Department of Bioengineering, University of Pittsburgh, Pittsburgh, PA, USA
(d) Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
(e) Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA
* Corresponding author. Email address: esejdic@ieee.org (E. Sejdić). URL: www.imedlab.org (E. Sejdić). ORCID: 0000-0003-4987-8298 (E. Sejdić).

Keywords: Event Detection; Hidden Markov Models; Recurrent Neural Networks; Deep Learning; Biomedical Signal Processing; Transfer Learning

Biomedical signals carry signature rhythms of the complex physiological processes that control our daily bodily activity. The properties of these rhythms indicate the nature of the interaction dynamics among the physiological processes that maintain homeostasis. Abnormalities associated with diseases or disorders usually appear as disruptions in the structure of these rhythms, which makes isolating the rhythms, and being able to differentiate between them, indispensable. Computer-aided diagnosis systems are now ubiquitous in almost every medical facility and, increasingly, in wearable technology, and rhythm or event detection is the first of many intelligent steps that they perform. How are these rhythms isolated? How can one develop a model that describes the transitions between processes in time? Many methods in the literature address these questions and decode biomedical signals into separate rhythms. Here, we demystify the most effective methods used for detecting and isolating rhythms or events in time series, highlight how they have been applied to different biomedical signals, and show how they contribute to information fusion. We also discuss the key strengths and limitations of these methods, as well as the challenges encountered when applying them to biomedical signals.

1. Introduction
Physiological processes are complex tasks performed by the different systems of the human body, rarely in a periodic but rather in an irregular manner, to deliver an action that may be biochemical, electrical, or mechanical [1, 2]. Some of these actions are obvious, such as the heartbeat, breathing, and other physical activities, while others are less obvious, such as the hormonal stimulation that regulates multiple body functions. The action produced usually manifests as some sort of signal that holds information about the parent physiological process [2]. Disruptions in these physiological processes associated with diseases lead to the development of pathological processes that alter the performance of the human body. Both normal and pathological processes, in addition to other artifacts from the environment and surrounding processes, are all held in the manifested signals and in the associated changes in their waveforms. These signals are called biomedical signals and can take many forms, including electrical (potential or current changes) or physical (force or temperature) [2].

Artificial intelligence is currently taking over to empower a variety of assistive technologies that help solve the problems of the healthcare sector, given the continuously increasing costs and the shortage of professional caregivers. These technologies are advancing to perform not only diagnosis but also intervention and curing, owing to their superior sensitivity, adaptability, and fast response. Among these assistive technologies, computer-aided diagnosis and wearable systems are powered by the virtual side of artificial intelligence (machine learning techniques) and play a vital role in anomaly detection, monitoring, and even emergency response [3]. The rise of such systems has driven the evolution of biomedical signal analysis, which has been a focus of researchers for the last couple of decades. This evolution has included not only the macro-analysis of gross processes but also the detection and analysis of micro-events within each gross process [3]. As mentioned before, biomedical signals carry the signatures of many processes and artifacts, which makes the extraction/identification of the specific part of interest (called an event or epoch) the first step of any systematic signal analysis or monitoring [4]. Further, the need for robust event extraction algorithms for biomedical signals is driven by the exponential growth in the amount and complexity of data generated by biomedical systems [5]. Moreover, reducing the human-dependent steps in the analysis mitigates the reliability and subjectivity issues associated with human tolerance.

Epoch extraction is not only essential for systematic signal analysis but also substantial to information fusion in multichannel systems and/or sensor networks, which represent a large portion of today's biomedical-signal-based decision-making systems. Multiple fusion models can employ epoch extraction and event detection to overcome different obstacles, including but not limited to signal synchronization and feature fusion [6, 7]. In complementary data-level fusion, events can be used to align signals in preparation for feature extraction, for example, using heartbeats to align the signals from multiple electrocardiography (ECG) leads.
In feature-level fusion models, event detection can be used to combine features from different signals only during the events of interest that contribute to morphology analysis and the decision-making process [7, 8].

Epoch extraction algorithms have been used repeatedly in the segmentation of many biomedical signals, including, but not limited to, heart sounds and ECG [9, 10], electroencephalography (EEG) [11–13], and swallowing vibrations [14–17]. Such algorithms depend heavily on modeling time series, a paradigm that is not explicitly provided by regular machine learning and sequence-agnostic models such as support vector machines, regression, and feed-forward neural networks [18]. These models rest on a major assumption: the training and test examples are independent and unrelated in time or space, so the entire state of the model is reset after each example [18]. In particular, splitting a time series into data chunks and using consecutive chunks independently to build models is unacceptable, because even when the time series is modeled with i.i.d. processes, the underlying processes may be longer than a single chunk, which induces dependency between consecutive chunks.

The sliding-window approach was introduced to tackle the dependence between consecutive chunks by using an overlap, which guarantees that part of each chunk is carried over to the next chunk. Although this can be useful for modeling many processes, it fails to model long-range dependencies and requires optimizing both the chunk and overlap lengths to best represent the target processes. Additionally, windowing in the time domain distorts the frequency representation because of the leakage effect, and it can only be used for fixed-length input/output scenarios [18]. All of this raised the need for models capable of selectively transferring state across time, processing sequences of not necessarily independent elements, and providing a computational paradigm that can handle variable-length inputs and outputs [19]. It was not long before researchers started to bring in stochastic models [20] and design deep recurrent networks [19] to fit event extraction problems and overcome the limitations of regular machine learning methodologies.

Multiple models have been proposed for representing time dependency, including hidden Markov models (HMMs) and recurrent neural networks (RNNs). HMMs were introduced as an extension of Markov chains to probabilistically model a sequence of observations based on an unobserved sequence of states [20]. RNNs, on the other hand, generalize feed-forward neural networks with the ability to process sequential data one step at a time while selectively transferring information across sequence elements [18]. Hence, RNNs are successful in modeling sequences of unknown length, whose components are not independent and which exhibit multi-scale sequential dependencies [19, 21, 22]. Further, RNNs overcame a major HMM limitation in modeling long-range dependencies within sequences [18, 23].

In this manuscript, we review the fundamental methods developed for event extraction in biomedical signals and unravel the key differences between these methods based on state-of-the-art practices and results. We present the theoretical and practical aspects of most of the methods and the way in which they have been used to handle time modeling in event detection problems. Further, we discuss recent major machine learning applications in biomedical signal processing and the anticipated advances for future implementations.

arXiv:2012.06104v1 [cs.LG] 11 Dec 2020

2. Hidden Markov Models
A time series can be characterized using either deterministic or stochastic models. Deterministic models usually describe the series through specific properties, such as being a sum of sinusoids or exponentials, and aim to estimate the values of the parameters contributing to these properties (e.g., the amplitude, frequency, and phase of the sinusoids) [20]. Statistical models, on the other hand, assume that the series can be described by a parametric random process whose parameters can be estimated in a well-defined way [20, 24]. HMMs belong to the category of statistical models and are usually referred to in the literature as probabilistic functions of Markov chains [20, 25].

2.1. Markov Chains
A Markov chain is a stochastic process modeled by a finite state machine that, at any instant of time, can be described as being in one of N distinct states. These states can be tags or symbols representing the problem of interest. At regularly spaced discrete times, the machine may stay in the same state or switch to another state according to a set of transition probabilities associated with each state [20, 24]; these transition probabilities are assumed to be time-independent. The initial state is deemed to be known, and the transition probabilities are described by the transition matrix $A = \{a_{ij}\}$, where $a_{ij}$ is the transition probability from state $S_i$ to state $S_j$, and both $i$ and $j$ take values from 1 to N. The actual state at time t is denoted $q_t$. For a full description of the probabilistic model, the current state as well as at least the state preceding it (for a first-order Markov chain) need to be specified. The first-order Markov chain assumes that the current state depends only on the previous state:

$$P(q_t = S_j \mid q_{t-1} = S_i, q_{t-2} = S_k, \ldots) = P(q_t = S_j \mid q_{t-1} = S_i).$$

This results in the following properties for the transition probabilities:

$$a_{ij} = P(q_t = S_j \mid q_{t-1} = S_i), \quad 1 \le i, j \le N,$$
$$a_{ij} \ge 0, \qquad \sum_{j=1}^{N} a_{ij} = 1.$$

The probability of being in state $S_i$ at $t = 1$ is denoted $\pi_i$, and the initial probability distribution is

$$\pi_i = P[q_1 = S_i], \quad 1 \le i \le N, \qquad \pi = [\pi_1, \pi_2, \ldots, \pi_N].$$

An example of a 4-state Markov chain is shown in Fig. 1. This stochastic process is called an observable Markov model, since each state corresponds to a visible (observable) event.

Figure 1: An example of a Markov chain with 4 states, S_1 to S_4, and selected state transitions. A set of probabilities is associated with each state to indicate how the system changes from one state to itself or to another at regular discrete times.

2.2. Hidden Markov Models
So far, we have introduced Markov chains in which each state corresponds to an observable event; however, this is insufficient for most applications, where the states cannot always be observed. Therefore, Markov chain models are extended to HMMs, which can be used widely in many applications [20]. An HMM is a doubly stochastic process in which one of the two processes is hidden, i.e., not observable: the states are hidden from the observer [20]. An HMM is characterized by the following properties [20, 24]:

1. The number of states, N, included in the model. As mentioned before, the states are usually hidden in HMMs, but sometimes they have a physical significance: $q_t \in \{S_1, S_2, \ldots, S_N\}$.
2. The number of distinct observations a state can take, M.
3. The state transition matrix or distribution $A = \{a_{ij}\}$.
4. The observation probability distribution for each state, $B = \{b_j(k)\}$, with $b_j(k) = P[v_k \text{ at } t \mid q_t = S_j]$, where $v_k$ represents an element of the distinct observations that a state can take, $1 \le j \le N$, and $1 \le k \le M$.
5. The initial state distribution $\pi = \{\pi_i\}$.

When known, these parameters fully describe the HMM, $\lambda = (A, B, \pi)$, and can be used to generate an observation sequence $O = \{O_1, O_2, \ldots, O_T\}$ as shown in Algorithm 1.

Algorithm 1: HMM as observations generator
Set t = 1;
Choose an initial state $q_1 = S_i$ according to $\pi$;
while $t \le T$ do
    Choose $O_t = v_k$ according to the observation distribution in the current state, $b_i(k)$;
    Move from the current state $S_i$ to the new state $q_{t+1} = S_j$ according to $a_{ij}$;
    Set t = t + 1;
end
Result: $O = \{O_1, O_2, \ldots, O_T\}$

For the model to be useful in practical applications, it must address three fundamental problems [26]:

• Likelihood: computing the probability of an observation sequence $O = \{O_1, O_2, \ldots, O_T\}$ given the model, $P(O \mid \lambda)$.
• Decoding: choosing the optimal hidden state sequence $Q = \{q_1, q_2, \ldots, q_T\}$ that best represents a given observation sequence $O = \{O_1, O_2, \ldots, O_T\}$.
• Estimation: adjusting the model parameters $(A, B, \pi)$ to maximize the likelihood of a given sequence of observations O.

2.3. Likelihood Problem Solution
In Markov chains, where the states are not hidden, computing the likelihood is much easier, as the computational burden reduces to multiplying the transition probabilities along the underlying state sequence. In HMMs, the states are hidden, which necessitates including all possible state sequences in computing the joint probability ($N^T$ possible hidden state sequences). A dynamic programming solution called the forward-backward algorithm was devised for the likelihood problem, whose cost grows only linearly with T (on the order of $N^2 T$ operations) rather than exponentially [20]. The forward-backward algorithm sums the probabilities of all possible state sequences that could have generated the target observation sequence. It does so efficiently by defining and inductively computing the forward variable $\alpha(t, i) = P(O_1 O_2 \cdots O_t, q_t = S_i \mid \lambda)$, which represents the probability of the partial observation sequence up to time t ending in state $S_i$ [20, 27, 28]. The forward algorithm for the likelihood problem is fully described as follows:

Algorithm 2: The forward algorithm
$O = \{O_1, O_2, \ldots, O_T\}$;
$S \in \{S_1, S_2, \ldots, S_N\}$;
Create the forward probability table $\alpha[T, N]$;
foreach state $S \in \{S_1, S_2, \ldots, S_N\}$ do
    $\alpha[1, S] \leftarrow \pi_S\, b_S(O_1)$;  // Initialization
end
foreach time step $t \in \{2, 3, \ldots, T\}$ do
    foreach state $S \in \{S_1, S_2, \ldots, S_N\}$ do
        $\alpha[t, S] \leftarrow \sum_{S'=S_1}^{S_N} \alpha[t-1, S']\, a_{S',S}\, b_S(O_t)$;  // Induction
    end
end
$P(O \mid \lambda) \leftarrow \sum_{S=S_1}^{S_N} \alpha[T, S]$;  // Termination
Result: $P(O \mid \lambda)$

As part of the forward-backward algorithm, another variable is considered that will help in solving the estimation problem. This variable, stored in the backward probability table, is $\beta(t, i) = P(O_{t+1}, O_{t+2}, \ldots, O_T \mid q_t = S_i, \lambda)$, which represents the probability of the partial observation sequence that starts one time step after the current observation, given the current state $S_i$ and the model. The backward probability can be calculated in a similar way to the forward probability (Algorithm 3).
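To make Algorithms 1–3 concrete, the following is a minimal plain-Python sketch (with 0-based time indices) that samples from a discrete HMM and computes the forward and backward tables, checking the forward likelihood against a brute-force sum over all $N^T$ state sequences. The two-state "baseline/event" model and all of its probabilities are hypothetical values chosen only for illustration, not taken from the review. The recursions are left unscaled, so on long recordings they would underflow; practical implementations usually rescale α and β at every step or work in log space.

```python
import random
from itertools import product

# Hypothetical two-state toy model (illustrative values only):
# state 0 = "baseline", state 1 = "event";
# observation symbols: 0 = low amplitude, 1 = high amplitude.
pi = [0.8, 0.2]                 # initial distribution pi_i
A = [[0.9, 0.1],                # transition probabilities a_ij (rows sum to 1)
     [0.3, 0.7]]
B = [[0.85, 0.15],              # emission probabilities b_j(k) (rows sum to 1)
     [0.20, 0.80]]


def generate(T, pi, A, B, rng):
    """Sample a hidden state path and T observations (cf. Algorithm 1)."""
    N, M = len(pi), len(B[0])
    q = rng.choices(range(N), weights=pi)[0]                # initial state drawn from pi
    states, obs = [], []
    for _ in range(T):
        states.append(q)
        obs.append(rng.choices(range(M), weights=B[q])[0])  # emit O_t using row q of B
        q = rng.choices(range(N), weights=A[q])[0]          # transition using row q of A
    return states, obs


def forward(obs, pi, A, B):
    """Forward pass (cf. Algorithm 2): returns the alpha table and P(O | lambda)."""
    N, T = len(pi), len(obs)
    alpha = [[0.0] * N for _ in range(T)]
    for j in range(N):                                      # initialization
        alpha[0][j] = pi[j] * B[j][obs[0]]
    for t in range(1, T):                                   # induction
        for j in range(N):
            s = sum(alpha[t - 1][i] * A[i][j] for i in range(N))
            alpha[t][j] = s * B[j][obs[t]]
    return alpha, sum(alpha[T - 1])                         # termination


def backward(obs, A, B):
    """Backward pass (cf. Algorithm 3): returns the beta table."""
    N, T = len(A), len(obs)
    beta = [[0.0] * N for _ in range(T)]
    beta[T - 1] = [1.0] * N                                 # initialization
    for t in range(T - 2, -1, -1):                          # induction, backwards in time
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(N))
    return beta


def brute_force_likelihood(obs, pi, A, B):
    """Sum P(O, Q | lambda) over all N**T state paths Q (tiny T only)."""
    total = 0.0
    for q in product(range(len(pi)), repeat=len(obs)):
        p = pi[q[0]] * B[q[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[q[t - 1]][q[t]] * B[q[t]][obs[t]]
        total += p
    return total


if __name__ == "__main__":
    states, obs = generate(8, pi, A, B, random.Random(0))
    alpha, likelihood = forward(obs, pi, A, B)
    beta = backward(obs, A, B)
    print("hidden states:", states)
    print("observations: ", obs)
    print("P(O | lambda), forward:    ", likelihood)
    print("P(O | lambda), brute force:", brute_force_likelihood(obs, pi, A, B))
    # sum_i alpha_t(i) * beta_t(i) should equal P(O | lambda) at every t
    print([sum(a * b for a, b in zip(alpha[t], beta[t])) for t in range(len(obs))])
```

The brute-force check is only feasible for very short sequences, which is precisely the motivation for the dynamic programming formulation: the forward pass needs on the order of $N^2 T$ operations, whereas the number of state paths grows as $N^T$.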
8 end 9 foreach time step t Ë 2;3;§ ; T do Algorithm 3: Computing the backward probability 10 foreach state S Ë ^S ; S ;§ ; S ` do 1 2 N 1 Create the backward probability table [T; N]; ‚ 11 [t; S] } max [t * 1; S]  a  b O ; // Induction S t S;S S=S 2 foreach state S Ë ^S ; S ;§ ; S ` do 1 1 2 N 3 [T; S] } 1; // Initialization 12 [t; S] } arg max [t * 1; S]  a  b O ; ‚ S t S;S 4 end S=S 5 foreach time step t Ë T * 1; T * 2;§ ;1 do 13 end 6 foreach state S Ë ^S ; S ;§ ; S ` do 1 2 N 14 end 7 [t; S] } [t + 1; S]  a  b .O /; ‚ ‚ 15 P } max [T; S]; // Termination S;S S t+1 ‚ S=S S=S // Induction < 16 q } arg max [T; S]; S=S 8 end 17 for t Ë ^T; T * 1; T * 2; § ; 2` do 9 end 18 q } [t; q ]; // Backtracking 10 Result: [T; N] t*1 19 end < < < 20 Result: The optimal state sequence: q ; q ; § ; q 1 2 T 2.4. Decoding Problem Solution: The Viterbi Algorithm Finding the optimal hidden states sequence that best rep- resents a sequence of observations is more challenging com- „ „ Q.; / = P.O ; q ð/log P.O ; q ð/; 1:T 1:T 1:T 1:T pared to the likelihood problem. Unlike the likelihood prob- Åq lem, the decoding problem does not have an exact solution T*1 unless the model is degenerate, which makes it hard to choose where P.O ; q ð/ =  a b .O /, and  is 1:T 1:T q ;q q t+1 the optimality criterion that judges the state sequence [20]. t t+1 t+1 t=1 For example, one may choose states based on the individ- the initial model. The iterations are performed based on the ual likelihood of occurrence which achieves the maximum calculations by the forward-backward probabilities described number of correct states individually but not for the overall previously in the solution of the first two problems, and they computed sequence [20]. Another way to solve the decod- go as described in Algorithm 5. ing problem can be achieved through running the forward- backward algorithm for all possible hidden state sequences 2.6. 
Continuous Density HMM and choose the sequence with the maximum likelihood prob- The previously described adaptations for HMM problems ability, however this is computationally unfeasible [26]. are based on the requirement that the observations are dis- In the same way as the forward-backward algorithm, the crete which is considered restrictive because in most cases Viterbi algorithm solves the decoding problem using dynamic they are continuous. Therefore, a necessary first step will programming. The algorithm recursively computes the prob- be the transformation of continuous observation sequence ability of being in a state S at time t taking in consideration into a discrete vector. This can be done through dividing the most probable state sequence (path) q ; q ; § ; q the observations’ space into sub-spaces and using codebooks 1 2 t*1 that leads to this state. The Viterbi algorithm is shown in to give discrete symbol/value for each sub-space [24]; how- Algorithm 4. ever, this introduces quantization errors into the problem. One way to overcome this, is using continuous observation 2.5. Model Estimation Problem Solution densities in HMM’s. The finite mixture representation of The third problem can be formulated as finding HMM’s the observation density function, is one of the representa- model parameters .A; B;/ to maximize the conditional prob- tions that has a formulated re-estimation procedure: b .O/ = ability of observation sequence, given that model [20]. Such a problem doesn’t have an analytical solution, however, itera- c N[O;  ; U ]; where 1 f j f N, O is the observa- jm jm jm m=1 tive methods can be used to find a local maxima for P.Oð/. th tion vector, c is the mixture coefficient for the m mixture jm Here, we focus on the Baum-Welch algorithm that is based in state j, and N is an elliptically or long-concave symmetric on the expectation-maximization method [29, 30]. 
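The following is a short NumPy sketch of a continuous-density HMM. It assumes full-covariance Gaussian components (taking $\mathcal{N}$ as Gaussian) and a single observation sequence. The helpers `log_gaussian`, `mixture_emissions`, `forward_loglik`, and `viterbi`, and the toy parameter values, are hypothetical illustrations, not the authors' implementation. Once $b_j(O_t)$ is evaluated from the mixture, the forward recursion (Algorithm 2) and the Viterbi recursion (Algorithm 4) apply unchanged. The sketch adds two things that the algorithms above do not spell out: per-step scaling in the forward pass, and log-space arithmetic in Viterbi. Both are common ways to avoid numerical underflow on long sequences.

```python
import numpy as np


def log_gaussian(x, mean, cov):
    """log N[x, mean, cov] for a full-covariance multivariate Gaussian."""
    d = x.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)      # (x - mu)^T U^{-1} (x - mu)
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + maha)


def mixture_emissions(obs, weights, means, covs):
    """B[t, j] = b_j(O_t) = sum_m c_jm N[O_t, mu_jm, U_jm].

    obs: (T, D), weights: (N, M), means: (N, M, D), covs: (N, M, D, D).
    """
    T = obs.shape[0]
    N, M = weights.shape
    B = np.zeros((T, N))
    for t in range(T):
        for j in range(N):
            for m in range(M):
                B[t, j] += weights[j, m] * np.exp(log_gaussian(obs[t], means[j, m], covs[j, m]))
    return B


def forward_loglik(pi, A, B):
    """Forward recursion (Algorithm 2) with per-step scaling; returns log P(O | lambda)."""
    alpha = pi * B[0]                             # initialization: pi_S b_S(O_1)
    loglik = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for t in range(1, B.shape[0]):
        alpha = (alpha @ A) * B[t]                # induction: sum_S' alpha[S'] a_{S',S} b_S(O_t)
        loglik += np.log(alpha.sum())             # product of scale factors equals P(O | lambda)
        alpha = alpha / alpha.sum()
    return loglik


def viterbi(pi, A, B):
    """Viterbi decoding (Algorithm 4) in log space; returns (best path, its log probability)."""
    T, N = B.shape
    with np.errstate(divide="ignore"):            # log(0) -> -inf marks forbidden transitions
        log_pi, log_A, log_B = np.log(pi), np.log(A), np.log(B)
    delta = log_pi + log_B[0]
    psi = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_A           # scores[S', S] = delta[t-1, S'] + log a_{S',S}
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_B[t]
    path = np.zeros(T, dtype=int)
    path[-1] = delta.argmax()                     # termination
    for t in range(T - 1, 0, -1):                 # backtracking through psi
        path[t - 1] = psi[t, path[t]]
    return path, delta.max()


# Toy two-state model, each state a two-component 1-D Gaussian mixture (illustrative values).
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
weights = np.array([[0.5, 0.5], [0.5, 0.5]])
means = np.array([[[0.0], [0.5]], [[5.0], [5.5]]])
covs = np.ones((2, 2, 1, 1))
obs = np.array([[0.1], [-0.2], [0.0], [5.1], [4.9], [5.2]])

B = mixture_emissions(obs, weights, means, covs)
loglik = forward_loglik(pi, A, B)
path, best_logp = viterbi(pi, A, B)
print(path, loglik, best_logp)
```

The forward likelihood sums over all state paths, while Viterbi keeps only the best one. As a result, `loglik` is never smaller than `best_logp`. For this toy sequence, the decoded path switches from the first state to the second when the observations jump from around 0 to around 5.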
A Gaussian density function is usually used for $\mathcal{N}$; however, non-Gaussian models have also been considered in many applications [34, 35]. The pdf is guaranteed to be normalized if the mixture coefficients satisfy the stochastic constraints $\sum_{m=1}^{M} c_{jm} = 1$ and $c_{jm} \ge 0$, for $1 \le j \le N$ and $1 \le m \le M$. The parameters of the observation density function ($c_{jm}$, $\mu_{jm}$, $U_{jm}$) can be estimated with a modified Baum-Welch algorithm [20]. Continuous densities make the HMM more accurate; however, they require a larger dataset and a more complex training algorithm.

Algorithm 5: The estimation algorithm
  O = {O_1, O_2, …, O_T};
  S ∈ {S_1, S_2, …, S_N};
  Initialize λ = (A, B, π);
  repeat
      Using the forward-backward algorithm and λ, calculate α[T; N] and β[T; N];
      Create the probability tables ξ[T; N; N] (the probability of being in state S_i at time t and in state S_j at time t + 1) and γ[T; N] (the probability of being in state S_i at time t);
      foreach time step t ∈ {1, 2, …, T−1} do
          foreach state S ∈ {S_1, S_2, …, S_N} do
              foreach state S' ∈ {S_1, S_2, …, S_N} do
                  ξ[t; S; S'] ← α[t; S] a_{S,S'} b_{S'}(O_{t+1}) β[t+1; S'] / Σ_{S=1}^{N} Σ_{S'=1}^{N} α[t; S] a_{S,S'} b_{S'}(O_{t+1}) β[t+1; S'];
              end
              γ[t; S] ← Σ_{S'=1}^{N} ξ[t; S; S'];
          end
      end
      γ[T; S] ← α[T; S] / Σ_{S'=1}^{N} α[T; S'] for each state S;   // because β[T; S] = 1
      π̄_S ← γ[1; S];
      ā_{S,S'} ← Σ_{t=1}^{T−1} ξ[t; S; S'] / Σ_{t=1}^{T−1} γ[t; S];
      b̄_S(k) ← Σ_{t=1, O_t = v_k}^{T} γ[t; S] / Σ_{t=1}^{T} γ[t; S];
      λ ← λ̄ = (Ā, B̄, π̄);
  until convergence;
  Result: λ = (A, B, π)

2.7. State Duration in HMM
One convenient way to include state duration in HMMs, especially for physical signals, is to model the duration density explicitly and set the self-transition coefficients to zero [20]. A transition from one state to another then occurs only after the current state has produced the number of observations specified by the duration density, as shown in Fig. 2. In normal HMMs, the states have exponential duration densities that depend on the self-transition coefficients $a_{ii}$ and $a_{jj}$, as in Fig. 2(a). In discrete time, this is the geometric density $p_i(d) = a_{ii}^{\,d-1}(1 - a_{ii})$. In HMMs where state duration is modeled by explicit duration densities, there is no self transition, and the transition happens only after the number of observations determined by the duration density, as in Fig. 2(b). The re-estimation formulae needed for model estimation are obtained by including the state duration in the calculation of the forward and backward variables. They are given in detail in the tutorial of Rabiner [20].

Figure 2: An illustration of interstate connections in HMMs. (a) A normal HMM with self transitions from each state back to itself. (b) A variable-duration HMM with no self transitions and specified state duration densities.

3. Recurrent Neural Networks
Neural networks are biologically inspired computational models composed of a set of artificial neurons (nodes) joined by directed, weighted edges, and they have recently become popular as pattern classifiers [18, 36]. The network is usually activated by feeding it an input, which then spreads through the network along the edges. Many types of neural networks have evolved since their first appearance, but they fall into two main categories: networks whose connections form cycles and networks that are acyclic [36]. RNNs are the type of neural network that introduces the notion of time by using cyclic edges between adjacent time steps. RNNs have been proposed in many forms, including Elman networks, Jordan networks, and echo state networks [37–40].

Figure 3: A simple RNN with a single hidden layer. At each time step t, the output is produced by passing activations forward as in a feedforward network. Activations are also passed to the next node at time t + 1 to achieve recurrence.

As shown in Fig. 3, the hidden units at time t receive input from the current input $x_t$ and from the previous hidden unit value $h_{t-1}$. The output $y_t$ is calculated from the current hidden unit value $h_t$. Recurrent connections between hidden units create the time dependency between time steps. In a forward pass, all computations are specified by the following two equations:

$$h_t = \sigma_h(W_x x_t + W_h h_{t-1} + b_h), \qquad y_t = \sigma_y(W_y h_t + b_y),$$

where $W_x$ and $W_y$ are the weight matrices connecting the hidden units to the input and to the output, respectively, and $W_h$ is the weight matrix between adjacent time steps. $b_h$ and $b_y$ are bias vectors that let each node learn an offset. Nonlinearity is introduced through the activation functions $\sigma_h$ and $\sigma_y$, which can be the hyperbolic tangent (tanh), the sigmoid, or the rectified linear unit (ReLU). A simple RNN unit usually uses tanh.

Figure 4: Early designs of RNNs. The dotted arrows represent edges that feed the next time step. (a) Jordan network: output units are connected to context units, which at the next time step feed back to the hidden units and to themselves. (b) Elman network: hidden units are connected to context units, which at the next time step feed back to the hidden units only.

3.1. Early RNN Architectures
Jordan [41] introduced an early form of recurrence by adding extra "special" units, called context or state units, that feed values to the hidden units at the following time step. The network was as simple as a multi-layer feedforward network. Its context units take input from the network output at the current time step and feed it back to themselves and to the hidden units at the next time step, as shown in Fig. 4(a). The context units allow the network to remember its outputs at previous time steps. Because they are self-connected, they can also send information across time steps without intermediate output perturbation [18]. Elman [37] also introduced a simple architecture in which a context unit is associated with each hidden-layer unit at the current time step and feeds back to the same hidden unit at the next time step, as shown in Fig. 4(b). This notion of self-connected hidden units became the basis for the design of long short-term memory (LSTM) units [19]. Elman demonstrated that this type of recurrence can learn time dependencies [37].

3.2. Training of RNNs
A generic RNN can be written as $h_t = F(h_{t-1}, x_t, \theta) = W_h \sigma(h_{t-1}) + W_x x_t + b_h$,¹ where θ denotes the network parameters: the recurrent weight matrix $W_h$, the input weight matrix $W_x$, and the bias $b_h$. The initial state $h_0$ is usually set to zero, provided by the user, or learned. Network performance on a given task is measured by a cost function $\varepsilon = \sum_{1 \le t \le T} \varepsilon_t$, where $\varepsilon_t = L(h_t)$, T is the sequence length (total number of time steps), and L is the cost operator that measures the performance of the network (e.g., squared error or cross-entropy). The gradients needed for optimization can be computed with backpropagation through time (BPTT). BPTT unrolls the network in time so that ordinary backpropagation can be applied, as shown in Fig. 5.

¹ This formulation does not contradict the previously mentioned formulation $h_t = \sigma_h(W_x x_t + W_h h_{t-1} + b_h)$; both have the same behavior [42].

Figure 5: Recurrent neural network unfolded in time [42]. $\varepsilon_t$ denotes the error calculated from the output, $h_t$ represents the hidden state, and $x_t$ represents the input at time t.

A gradient component is calculated as the sum of temporal components:

$$\frac{\partial \varepsilon}{\partial \theta} = \sum_{1 \le t \le T} \frac{\partial \varepsilon_t}{\partial \theta}, \qquad \frac{\partial \varepsilon_t}{\partial \theta} = \sum_{1 \le k \le t} \left( \frac{\partial \varepsilon_t}{\partial h_t}\, \frac{\partial h_t}{\partial h_k}\, \frac{\partial^{+} h_k}{\partial \theta} \right),$$
$$\frac{\partial h_t}{\partial h_k} = \prod_{t \ge i > k} \frac{\partial h_i}{\partial h_{i-1}} = \prod_{t \ge i > k} W_h^{\top} \operatorname{diag}\big(\sigma'(h_{i-1})\big), \qquad (1)$$

where $\partial^{+} h_k / \partial \theta$ is the immediate partial derivative of $h_k$ with respect to θ, with $h_{k-1}$ treated as constant [42]. The temporal gradient component $\frac{\partial \varepsilon_t}{\partial h_t} \frac{\partial h_t}{\partial h_k} \frac{\partial^{+} h_k}{\partial \theta}$ measures how the network parameters θ at step k affect the cost at a later step t > k. In Eq. 1, the factor $\partial h_t / \partial h_k$ is a product of t − k Jacobian matrices. This product either explodes or shrinks to zero depending on whether the recurrent weights are larger or smaller than one [42]. More precisely, what matters is whether the largest singular value of $W_h$ lies above or below $1/\gamma$, where γ bounds $|\sigma'|$. Vanishing gradients are common with sigmoid activations, while exploding gradients are more common with rectified linear unit activations [18, 42]. One solution is to constrain the weights through regularization to values that avoid vanishing and exploding gradients. Another solution to exploding gradients is truncated backpropagation through time (TBPTT), which sets a maximum number of time steps through which the error is propagated [18].

3.3. Current RNN Designs
Early RNN designs could map input sequences to output sequences using contextual information, but the range of this contextual mapping was limited. As described previously, the influence of an input on the hidden layers, and thus on the output, either vanishes or blows up as it cycles through the recurrent connections [43]. The vanishing/exploding gradient problem led to new network designs that improve convergence [44, 45]. Of these designs, LSTM, gated recurrent units (GRU), and bidirectional RNNs (BRNN) have proved superior at long-range contextual mapping and at using both future and past context to determine the network output [18]. Both LSTM and GRU resemble a standard RNN in which each hidden node is replaced by a complete cell, as shown in Fig. 6. They also employ a unity-weighted recurrent edge so that the gradient is transferred across time steps without decaying or exploding.
LSTM forms long-term memory through its weights, which change slowly during training. Short-term memory, on the other hand, is formed by transient activations that pass between successive nodes [18]. GRU is an LSTM alternative with a simpler structure that is faster to train, yet it still provides performance comparable to LSTM [46].

In an LSTM unit, the forget gate is an adaptive gate whose output is squashed through a sigmoid activation. It resets the memory blocks once their contents are out of date, which prevents information from being stored for arbitrary time lags [47]. The input gate is a sigmoid-activated gate that regulates the new information written to the cell state. The output gate is also sigmoid-activated. It regulates how much of the internal state, after passing through a tanh activation, is forwarded as the unit output. The GRU unit has a similar design but no output gate. Instead, it has a reset gate that works as a forget gate, and an update gate that regulates how the unit output is written from both the state of the previous time step and the input of the current time step.

Figure 6: Current designs of RNNs. The symbols in both diagrams denote concatenation, element-wise summation (+), element-wise multiplication, sigmoid activation (σ), and hyperbolic tangent activation (tanh). (a) Schematic of an LSTM unit, which is typically composed of three main parts: the input, output, and forget gates. (b) Schematic of a GRU unit, a simplified version of LSTM with only reset and update gates.
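The gate descriptions above can be made concrete with a NumPy sketch of a single LSTM step and a single GRU step. It assumes one stacked weight matrix per unit acting on the concatenation $[h_{t-1}; x_t]$, which corresponds to the concatenation operation in Fig. 6. Names, shapes, and the random initialization are hypothetical. GRU references differ on whether the update gate multiplies $h_{t-1}$ or the candidate state; the sketch uses one common convention. The last lines saturate the forget gate open and the input gate shut, which carries the cell state across a step unchanged. This is the mechanism behind the unity-weighted recurrent edge mentioned earlier.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev; x] to the stacked pre-activations of the
    forget, input and output gates and of the candidate cell value."""
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[:H])                 # forget gate: how much of the old cell state to keep
    i = sigmoid(z[H:2 * H])            # input gate: how much new information to write
    o = sigmoid(z[2 * H:3 * H])        # output gate: how much of the cell state to expose
    g = np.tanh(z[3 * H:])             # candidate cell value
    c = f * c_prev + i * g             # additive update; f = 1, i = 0 carries c unchanged
    h = o * np.tanh(c)                 # cell state squashed by tanh, then gated
    return h, c


def gru_step(x, h_prev, W_g, b_g, W_n, b_n):
    """One GRU step with an update gate u and a reset gate r (no output gate)."""
    H = h_prev.shape[0]
    gates = sigmoid(W_g @ np.concatenate([h_prev, x]) + b_g)
    u, r = gates[:H], gates[H:]
    n = np.tanh(W_n @ np.concatenate([r * h_prev, x]) + b_n)   # candidate; r acts like a forget gate
    return (1.0 - u) * h_prev + u * n                           # u mixes past state and candidate


rng = np.random.default_rng(1)
H, D = 4, 3
W, b = 0.1 * rng.standard_normal((4 * H, H + D)), np.zeros(4 * H)
W_g, b_g = 0.1 * rng.standard_normal((2 * H, H + D)), np.zeros(2 * H)
W_n, b_n = 0.1 * rng.standard_normal((H, H + D)), np.zeros(H)

h, c, h_gru = np.zeros(H), np.zeros(H), np.zeros(H)
for x in rng.standard_normal((5, D)):             # a 5-step input sequence
    h, c = lstm_step(x, h, c, W, b)
    h_gru = gru_step(x, h_gru, W_g, b_g, W_n, b_n)

# Saturate the forget gate open and the input gate shut: the cell state passes through unchanged.
b_carry = np.zeros(4 * H)
b_carry[:H], b_carry[H:2 * H] = 50.0, -50.0
c0 = np.array([0.3, -0.7, 1.2, 2.0])
_, c1 = lstm_step(np.ones(D), np.zeros(H), c0, np.zeros((4 * H, H + D)), b_carry)
print(h, h_gru, c1)
```

Because the cell update is additive, the direct path from $c_{t-1}$ to $c_t$ is scaled only by the forget gate. With the gate close to one, gradients along that path neither decay nor explode, whereas the plain RNN multiplies by $W_h$ at every step.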
Scaling such system has long forget gate and an update gate to regulate the write operation been considered to be difficult or infeasible even with the pres- into the unit output from both the state of the past time step ence of dynamic programming solutions such as the Viterbi and the input from the current time step. algorithm due to the quadratic complexity nature of the infer- On the other hand, BRNNs resemble a standard RNN ence problem and transition probability matrix which causes architecture as well but with two hidden layers instead of the model parameter estimation and inference to scale in time one and each hidden layer is connected to both input and as the size of the state space grows [48]. Modeling long output . One hidden layer passes activations in the forward range dependencies also is impractical in HMMs as transi- directions (from the past time steps) and the other layer passes tions occur from a state to the following with no memory the activations in the backward direction (from future time of the previous state unless a new space is created with all steps). BRNN is in fact a wiring method for RNN hidden possible cross-transitions at each time window which leads layers regardless of the type of the nodes, which makes it to exponential growth of the state space size [18, 23]. On compatible with most RNN architectures including LSTM the other hand, the number of states that can be represented and GRU [18, 44]. by a hidden layer in RNNs increases exponentially with the number of nodes in the layer leading to nodes that can carry Khalifa et al., 2020 Page 7 of 21 information from contexts of arbitrary lengths. Moreover, clude the different sleep stages and sleep disorders, epileptic despite of the exponential growth of the expressive power of seizures, the effect of music or other artifacts, and the motor the network, training and inference complexities only grow imagery tasks. quadratically at most [18]. From a theoretical point of view, 6.1. 
Sleep Staging in EEG RNNs can be efficient in the perception of long contexts; Sleep is an essential part of the human life cycle and plays however, this comes at the cost of error propagation. Highly a vital role in maintaining most of the body functionality [99]. sampled inputs as in the case of raw waveforms, can lead Sleep disorders include problems with initiating sleep, insom- to elongation of the range through which the error signal nia, and sleep apnea syndrome (SAS) [100]. Diagnosis of propagates, thus making the network hard to optimize and sleep disorders can be done through identifying sleep stages reducing the efficiency of computational acceleration tools in an overnight polysomnogram (PSG) which utilizes EEG such as GPUs [49, 50]. as one of its sensing modalities [101]. Visual scoring of the PSG components is the basic way to categorize sleep epochs 5. Event Detection in Electrocardiography and as any manual rating, it suffers from subjectivity and ECG is the graphical interpretation of skin-recorded elec- inter-rater tolerance. Many attempts have been proposed in trical activity of the electric field originating in the heart the literature to remedy the problems of expert-based visual [74]. ECG provides information that is not readily available scoring of the different components of PSG. The attempts through other methods about heart activity and is consid- employed multiple algorithms to achieve automatic sleep ered the most commonly used procedure in the diagnosis of staging including Markov models and neural networks. Here, cardiac diseases due to the fact that it is non-invasive, sim- we list the recent publications (Table 2) for sleep staging and ple, and cost-effective. This makes ECG subject to intense the detailed description of the methods used within the scope research related to the automatic analysis to reduce the subjec- of our review. tivity and the time spent on interpreting hours of recordings 6.2. 
Epilepsy detection in EEG [54, 75, 76]. ECG is a time periodic signal, which allows Epilepsy is one of the episodic disorders of the brain to mark out an elementary beat that constitutes the basis for that is characterized by recurrent seizures, unjustified by ECG signal analysis [54]. For instance, heart rate can be any known immediate cause [102, 103]. Epileptic seizure estimated through the detection of QRS-complex from an is the clinical manifestation that results from the abnormal ECG signal and the time interval between successive QRS- excessive discharge of some set of neurons in the brain [102]. complexes (also known as R-R interval) can be used to detect The seizure consists of transient abnormal alterations of sen- premature ectopic beats [74]. In that sense, ECG beat de- tection is considered fundamental for most of the automated sory, motor, consciousness, or psychic behavior [102, 103]. analysis algorithms. A detailed description of the recent pub- Around 80% of the epileptic seizures can be effectively treated lications that cover event detection in ECG using different if early discovered [104]. Although seizure activity can be methods, is included in Table 1. easily distinguished in EEG as transient spikes and relatively quiescent periods, it is a time-consuming process and needs clinicians to devote a tremendous amount of time going 6. Event Detection in Electroencephalography through hours and days of EEG activity [105]. An efficient EEG is mostly a non-invasive technique to measure the and reliable seizure prediction/detection method can be of a electrical activity of the brain through a set of electrodes great help for the diagnosis, treatment, and even early warn- placed on the subject’s scalp. EEG exhibits highly non- ing for patients to stop activities that might be of a significant stationary behavior and significant non-linear dynamics [77]. danger during an episode like driving. 
Several methods have The excitatory and inhibitory postsynaptic potentials of the been proposed for seizure prediction, at which EEG signal cortical nerve cells are considered the main source of EEG features are temporally analyzed and compared to heuristic signals [78]. EEG can be invasive if acquired using subdu- thresholds to trigger a warning for seizures; however, these ral electrode grids or using depth electrodes and is called methods lack generalization when investigated on extensive intracranial EEG (iEEG); however, typical EEG signals are datasets [106–111]. This can be referred to using feature sets recorded from scalp locations specified by the 10-20 electrode that are not highly affected by the transition from seizure-free placement criterion designed by the International Federation to peri-ictal or seizure states or simply the effect cannot be of Societies for Electroencephalography and have an am- tracked using low-order statistics [106]. Therefore, stochastic- plitude of 10-100 V and a frequency range of 1-100 Hz based models, multivariate analysis, and long-range analysis [77, 78]. EEG signals are used in the diagnosis of multiple methods were investigated to provide better performance and neurological disorders including epilepsy, lesions, tumors, generalization for EEG-based epileptic seizure prediction. In and depression and their characteristics depend strongly on Table 3, we review the recent publications that use HMMs the age and state of the subject. There are multiple events and RNNs for seizure prediction. that influence EEG and require the tedious job of analyzing hours of recordings to be extracted. These events range from 6.3. BCI Tasks in EEG the diagnosis/detection of certain seizures and syndromes to Motor imagery alters the the neural activity of the brain’s the tasks of brain computer interface (BCI). 
These events in- sensorimotor cortex in a way that is as observable as if the Khalifa et al., 2020 Page 8 of 21 Table 1 Summary of event detection work done in ECG event detection Publication Event under investigation Implementation details Dataset Gersch et al. [51], 1975 Premature Ventricular Contraction A three states Markov chain was used to model R-R interval (quan- Clinical test data from patients with (PVC) through R-R intervals tized as short, regular, or long) sequences and then the model is atrial fibrillation (AF) used to characterize rhythms through the probability that the ob- served R-R symbol sequence is generated by any of a set of models generated from multiple cardiac arrhythmias. Theb manuscript used a maximum likelihood approach to determine the arrhythmia type. Coast et al. [52], 1990 Beat detection for arrhythmia anal- A parallel combination of HMMs (one for each arrhythmia type), The American Heart Association ysis is used to classify arrhythmia. The classification process is inferred (AHA) ventricular arrhythmia through determining the most likely path through the parallel models. database [53] All ECG waveform parts were included in the states of each model. The results reported in this study relied on single ECG channel and didn’t include multi-channel ECG fusion. Andreao et al. [54], 2006 ECG beat detection and segmenta- An HMM was constructed for ECG beat with each waveform part QT database [55] tion represented in the model including the isoelectric parts (ISO, P, PQ, QRS, ST, T). Model parameters were estimated using Baum-Welch method and the number of states in each model were specified empir- ically to achieve a good complexity-performance compromise. The proposed segmentation in this study was based on a single channel but the authors provided insights about the possibility of adaptation with multi-channel fusion. Sandberg et al. 
[56], 2008 Atrial fibrillation frequency tracking An HMM is used for frequency tracking to overcome the corruption Simulated atrial fibrillation signals with of residual ECG by muscular activity or insufficient beat cancellation. four different frequency trends: con- States of the HMM were used to represent the underlying frequencies stant frequency, varying frequency, in short-time Fourier transform while observations corresponded to gradually decreasing frequency, and the estimated frequency of specific time intervals from the signal. stepwise decreasing frequency. Experiments were performed on single channel simulated signals with inclusion of mutli-channel fusion. Oliveira et al. [57], 2017 Automatic segmentation (beat) of An ECG channel along with a phonocardiogram were fused in a sin- A self-recorded dataset from healthy ECG and Phonocardiogram (PCG) gle coupled HMM for beat detection. The coupled HMM was con- male adults. structed to consider the high dynamics and non-stationarity of the signals where the channels were assumed to be co-dependent through past states and observations. Each of ECG and phonocardiogram was modeled using 4 states. This study introduced a decision-level fusion through combining two channels in a single HMM. The study also experimented two different coupled HMMs, a fully connected where transition can happen between any two states from both chan- nels and a partially connected model where certain limitations were added over transitions through considering the prior knowledge of the relationship between heart sounds and ECG components. Übeyli [58], 2009 Arrhythmia detection/classification An Elman-based RNN is used for beat classification with the Four types of ECG beats obtained Levenberg-Marquardat algorithm for training (a least-squares esti- from Physiobank Database [59]. mation algorithm based on the maximum neighborhood idea). 
This model used power spectral density (calculated with three different methods; Pisarenko, MUSIC, and Minimum-Norm) of ECG signals as input. All the models trained in this study, used feature-level fusion. Zhang et al. [60], 2017 Supraventriular and verntricular ec- An LSTM-based RNN preceded by a density-based clustering for MIT-BIH Arrhythmia database topic beat detection (SVEB and training data selection from a large data pool. In this implementation, (MITDB) [61]. VEB) the authors fed the RNN with the current ECG beat and the T wave part from the former beat to automatically learn the underlying features. The RNN layers were followed by two fully connected layers in order to combine the temporal features and generate the desired output. This study only used a single channel ECG (limb lead II) with no multi-channel fusion. Xiong et al. [62], 2017 Atrial fibrillation automatic detec- A 3 layer RNN was implemented to extract the temporal features The 2017 PhysioNet/CinC Challenge tion from the raw ECG signals. No multi-channel fusion was performed dataset [59]. in this study and only a single ECG channel was employed. Schwab et al. [49], 2017 Different cardiac arrhythmia classi- In this work a combination of GRU and bidirectional LSTM The 2017 PhysioNet/CinC Challenge fication/detection (BLSTM) based RNNs and nonparameteric Hidden Semi-Markov dataset [59]. Models (HSMM), was used for building the beat classification model and then a blender [63] was used to combine the predictions from the models. No multi-channel fusion was performed in this study and only a single ECG lead was employed. Zihlmann et al. [64], 2017 Atrial fibrillation detection A single layer LSTM-based convolutional RNN (CRNN) was con- The 2017 PhysioNet/CinC Challenge structed for atrial fibrillation detection in arbitrary length ECG record- dataset [59]. ings. This work employed the log spectrogram as an input to the CRNN to increase the accuracy. 
No multi-channel fusion was per- formed in this study and only a single ECG lead was used. Limam and Precioso [65], Atrial fibrillation detection A two layer LSTM-based CRNN was used for atrial fibrillation detec- The 2017 PhysioNet/CinC Challenge 2017 tion from single-lead ECG and heart rate. Feature-level fusion was dataset [59]. performed after the convolutional neural network (CNN) layers to combine features from both inputs. The output from the RNN was used to either feed a dense layer to perform classification directly or train an SVM for classification and the results from both models were compared. Chang et al. [66], 2018 Atrial fibrillation detection A single layer LSTM-based RNN was constructed for atrial fibrillation Multiple datasets for atrial fibrillation detection in multi-lead ECG. This model also used spectrograms of and normal sinus rhythms [59, 61, 67– the input ECG signals to feed the network. Feature-level fusion 70]. was performed to combine spectrograms of multi-lead ECG before feeding into the LSTM units. Lui and Chow [71], 2018 Myocardial infarction classification A deep single-layer LSTM based CRNN was used for classifying ECG The Physikalisch-Technische Bun- beats from single-lead ECG. Multiple models were performed includ- desanstalt (PTB) diagnostic ECG ing a direct 4-class beat classifier from the LSTM CRNN via dense database [70] and the AF classifica- layers and 4-class beat classifier via the fusion of multiple one-versus- tion from a short single lead ECG one binary classification networks using stacking. recording: Physionet/computing in cardiology challenge 2017 database (AF-Challenge) [72]. Singh et al. [73], 2018 Arrhythmia detection 3 models were built for arrhythmia detection, each of them is based MIT-BIH Arrhythmia database on a different type of RNN. Regular RNNs, GRU, and LSTM were (MITDB) [61]. used for each of the three models. 
Each model included 3 layers of different unit sizes with a dense layer to generate a classification output (normal/abnormal). No multi-channel fusion was performed in this study and only a single ECG lead (ML2) was employed. movement was really executed [132]. Identification of the relatively low cost of the systems used and the high temporal transient patterns in EEG signals during the different motor resolution [135]. This type of BCI is called asynchronous imagery tasks like imagining the movement of one of the BCI because the subject is free to invoke specific thought limbs, is recognized among the most promising and widely [132]. On the other hand, synchronous BCI includes the used techniques of BCI [133–136]. This is referred to the generation of specific mental states in response to external Khalifa et al., 2020 Page 9 of 21 Table 2 Summary of EEG-based sleep staging. Publication Event under investigation Implementation details Dataset Flexerand et al. [79], Sleep staging in combined A three state (wakefulness, deep sleep, and rapid eye movement sleep) Gaussian obser- Nine whole-night sleep 2002 EEG and EMG vation HMM (GOHMM) was used and sleep stages were represented as mixtures of the recordings from a group of basic three states. The probability of being in any of the three states was computed for nine healthy adults. 1 sec windows so that a continuous probability monitoring can be achieved. Expectation- maximization algorithm was used for parameter estimation and the Viterbi algorithm was used to calculate the posteriori estimate for being in each state. Feature-level fusion was performed on features from EEG channels (C3 and C4) and EMG. Flexer et al. [80], 2005 Sleep staging in single channel A three state (wakefulness, deep sleep, and rapid eye movement sleep) Gaussian ob- Two datasets were used, EEG (C3) servation HMM (GOHMM) was used and sleep stages were represented as mixtures of the first consists of 40 the basic three states. 
The probability of being in any of the three states was com- whole night sleep record- puted for 1 sec windows so that a continuous probability monitoring can be achieved. ings from healthy adults Expectation-maximization algorithm was used for parameter estimation and the Viterbi and the second consists of algorithm was used to calculate the posteriori estimate for being in each state. No 28 whole night sleep record- multi-channel fusion was performed in this study and only a single EEG channel was ings of healthy adults. used. Doroshenkov et al. [81], Sleep staging using two chan- A six state HMM was constructed for the purpose of sleep staging. Baum-welch algo- Sleep-EDF database [82]. 2007 nel EEG (Fpz-Cz and Pz-Oz) rithm was used for model’s parameter estimation and the Viterby algorithm for state sequence decoding. Feature-level fusion was performed for features calculated from the two EEG channels. Bianchi et al. [83], 2012 Sleep cycle (quantifying prob- An eight state HMM was constructed for sleep-wake activity. The connectivity be- Sleep Heart Health Study abilistic transitions between tween states was inferred through exponential fitting of subsets of the pooled bouts database [84]. stages and multi-exponential and adjacent-stage analysis. dynamics) and fragmentation in case of apnea in PSG Pan et al. [85], 2012 Sleep staging using central A six state transition-constrained discrete HMM was constructed for sleep staging. Thir- PSG including six chan- EEG (C3-A2), chin elec- teen features were utilized including temporal and spectrum analyses of the EEG, EOG nel EEG, EOG, EMG, and tromyography (EMG), and and EMG signals with feature-level fusion employed. ECG signals, was obtained electrooculogram (EOG) from 20 healthy subjects. Yaghouby and Sunderam Sleep staging and scoring A five state Gaussian HMM was constructed for sleep staging with Baum-Welch al- Sleep-EDF database [82]. [86], 2015 (quasi-supervised) in PSG gorithm for parameter estimation. 
In this implementation, feature-level fusion was achieved through feeding augmented vector of PSG features and human rated scores into the estimation algorithm in order to obtain the parameters to maximize the likeli- hood that a model with larger number of states explains the data. Onton et al. [87], 2016 Sleep staging in 2-channel A five state Gaussian HMM was constructed for sleep staging with expectation- A self recorded data from home EEG (FP1-A2 and FP2- maximization algorithm for parameter estimation and the Viterbi algorithm to find the 51 participants who were A2) and electrodermal activity maximum a posteriori estimate of state sequence. In this implementation, the relative medication-free and self- (EDA) power across the entire night was averaged in five frequency bands and fed into the reported asymptomatic model (feature-level fusion). sleepers and wit no history of neurologic or psychiatric disorders. Davidson et al. [88], Behavioral microsleep detec- This study utilized an LSTM-based RNN to detect the lapses in visuomotor performance A self-recorded dataset 2005 tion in EEG (P3-01 and P4- associated with behavioral microsleep events. The network used the power spectral from 15 subjects perform- 02) density of 1 sec windows of the used two channels (calculated using the covariance ing visuomotor tracking method) with feature-level fusion in place to combine data. The network included 6 task. LSTM blocks of 3 memory cells each. Hsu et al. [89], 2013 Automatic deep sleep staging This study utilized an Elman recurrent neural network that works on the energy features Sleep-EDF database [82]. in single channel EEG (Fpz- extracted from a single channel EEG to perform 5-level sleep staging. No multi-channel Cz) fusion was employed in this study. Supratak et al. [90], Automatic sleep staging in sin- A convolutional RNN (CRNN) was constructed to work directly of the raw signal data. 
Montreal Archive of Sleep 2017 gle channel EEG (Fpz-Cz or Two branches of CNN, each of 4 layers, were used for representation learning and their Studies (MASS) [91] and Pz-Oz) outputs were combined and fed into a two layer LSTM-based BRNN with skip branch Sleep-EDF database [82]. to generate the sleep stage. No multi-channel fusion was employed in this study. Biswal et al. [92], 2017 Automatic sleep staging Raw EEG signals were split into 30-seconds windows, then the spectrogram and ex- 10,000 PSG studies with pert defined features were extracted and fused at the feature-level. The best accuracy multi-channel EEG data reported among different RNN architectures, was reported for a 5-layer LSTM-based (F3, F4, C3, C4, O1 and RNN. This study presented also an LSTM-based CRNN architecture to extract spa- O2 referenced to the con- tial features automatically and then pass them to the RNN part for temporal context tralateral mastoid, M1 or extraction. M2). Phan et al. [93], 2018 Automatic deep sleep staging A two-layer GRU-based BRNN was constructed to learn temporal features from the Sleep-EDF database [82]. in single channel EEG (Fpz- single channel EEG. This implementation included an attention mechanism that was Cz) applied on the BRNN output features. The weighted output was then used to feed a linear SVM classifier. No multi-channel fusion has been employed in this study. Bresch et al. [94], 2018 Sleep staging in single-channel An LSTM-based CRNN with 3 CNN layers and 3 LSTM layers, was built to process The SIESTA database [95] EEG 30-seconds windows of raw EEG data (FPz, left EOG, and right EOG referenced to and a self-recorded dataset M2). No multi-channel fusion has been employed in this study. with 147 recordings from 29 healthy subjects. Phan et al. [96], 2019 Automatic sleep staging This study featured multi-modality fusion on the feature level between EEG, EOG, and Montreal Archive of Sleep EMG. 
All were split into windows and converted into time-frequency representation Studies (MASS) Dataset using filter banks. The fused data were fed into a BRNN that is used to encode the [91]. features, then the output is passed through an attention layer followed by another BRNN that performs the cclassification of the sleep stage. Michielli et al. [97], 2019 Automatic sleep staging in sin- A dual branch LSTM-based RNN was constructed for the classification of 5 different Sleep-EDF database [59]. gle channel EEG sleep stages. the network starts with a preprocessing and feature extraction stages and then the data is distributed over two branches. The first branch uses mRMR for feature selection followed by a one layer LSTM and fully connected layer to classify between 4 classes only (W, N1-REM, N2 and N3). The second branch uses PCA for feature selection followed by a 2 layer LSTM and a fully connected layer for binary classification. The LSTM in the second branch takes the classification output from the first branch to consider only the combined stage N1-REM for separation. No multi-channel fusion has been employed in this study. Sun et al. [98], 2019 Sleep staging in single channel A two stage network was built to perform the classification. The first stage is time Sleep-EDF database [59]. EEG distributed stage that included two parallel branches, the first included a window deep belief network for feature extraction followed by a dense layer and a second branch with hand-crafted features extraction then a dense layer. The two branches were then fused through another dense layer and fed as an input to an LSTM-based BRNN (the second stage) to generate the classes. stimuli [132]. EEG analysis for BCI applications includes as desynchronization [132]. 
To overcome such a limitation, the processing of EEG oscillatory activity and the different probabilistic models like HMMs and models capable of rep- shifts in its sub-bands in addition to the event-related poten- resenting long range dependencies have been proposed into tials like VEP and P300 [132, 137]. Many modeling schemes the implementation of BCI systems. As follows in Table 4, have been introduced to solve the of multi-class BCI prob- we list the recent work the relies on HMMs and RNNs in BCI lem; however, most of them process EEG signals in short systems and uses EEG as the source signal. windows where stationarity is assumed, which limits the mod- eling process and excludes the dynamic EEG patterns such Khalifa et al., 2020 Page 10 of 21 Table 3 Summary of EEG-based seizure prediction. Publication Event under investigation Implementation details Dataset Wong et al. [112], 2007 Evaluation framework for A three state HMM (baseline, detected, and seizure) was constructed to iEEG data collected from patients seizure prediction in iEEG evaluate the prediction algorithms of epileptic seizures. The prediction al- diagnosed with mesial temporal gorithm is used to generate a binary sequence which is combined with the lobe epilepsy using 20-36 surgically ground truth (binary detector outputs plus gold-standard human seizure mark- implanted electrodes on the brain or ings) and converted into a trinary observation sequence. The trinary vector brain substance [113]. is used to train the HMM using Baum-Welch which is then used to Viterbi decode the observation sequences into the hidden states sequence. A hy- pothesis test that a statistical association exists between the detected and seizure states, is performed through counting the transitions from detected state into seizure states in the HMM output. Santaniello et al. 
[106], Early detection of seizures in Multichannel iEEG were used and Welch’s cross power spectral density was Data collected from male Sprague- 2011 iEEG from a rat model calculated over windows of 3 sec for each pair of channels which were used Dawley rats with four implanted as input for the detection model. A two state HMM was constructed to skull screw EEG electrodes placed map the iEEG signals into either normal or peri-ictal states. Baum-Wlech bifrontally and posteriorly behind algorithm was used for parameter estimation and a Bayesian evolution model bregma and a fifth depth electrode was used determine the time of state transition. placed in hippocampus, were col- lected and used for this study. Direito et al. [114], 2012 Identification of the different The relative power in EEG sub-bands (delta, theta, alpha, beta, and gamma) EPILEPSIAE database [115]. states of epileptic brain was calculated and used for computing the topographic maps of each sub- band. The maps were then segmented and used overtime to train a 4 state (preictal, ictal, postictal and interictal) HMM. The Baum–Welch algorithm was used to train the model and the Viterbi algorithm to decode the state- sequence. Abdullah et al. [104], 2012 Seizure detection in iEEG A three state discrete HMM was built to classify iEEG segments into one of Freiburg Seizure Prediction EEG three states (ictal, preictal, and interictal). Seven level decomposition sta- (FSPEEG) database [108]. tionary wavelet transform (SWT) was applied on the signals (as input features for the model) and a code book was created to perform vector quantization. Baum-Welch algorithm was used for model parameter estimation and the Viterbi algorithm for recognition. This study employed a feature-level fusion model to feed the data into the prediction model. Smart and Chen [105], Seizure detection in scalp This study used a 5 sec sliding window with 1 sec increments to process CHB-MIT Scalp EEG Database 2015 EEG the EEG signals. 
A set of 45 measurements was calculated for each slid- [116]. ing window then principal component analysis (PCA) was used to reduce dimensionality. One of the used models was HMM, particularly a two state (seizure and non-seizure) HMM was constructed to perform the detection. Baum-Welch was used here as well to estimate the model parameters.This study used a feature-level fusion model for multi-channel EEG data to feed the data into the prediction model. Petrosian et al. [117], 2000 Onset detection of epileptic Both raw EEG data and their wavelet transform "daub4" were used in train- Scalp and iEEG data were collected seizures in both scalp and in- ing an Elman RNN. This study used a feature-level fusion model for multi- from two patients who were under- tracranial EEG channel EEG data to provide an input for the RNN. going long-term electrophysiologi- cal monitoring for epilepsy. Güler et al. [118], 2005 Identification of subject con- Lyapunov exponents of the EEG signals were used to train an Elman RNN Publicly available epilepsy dataset dition in terms of epilepsy for the identification task. This study used a feature-level fusion model for by University of Bonn [119]. (healthy, epilepsy patient dur- multi-channel EEG data to train the RNN. ing seizure-free interval, and epilepsy patient during seizure episode) using surface and in- tracranial EEG Kumar et al. [120], 2008 Automatic detection of epilep- Wavelet and spectral entropy were extracted from the EEG signals and used Publicly available epilepsy dataset tic seizure in surface and in- to train an Elman RNN. This study used a feature-level fusion model for by University of Bonn [119]. tracranial EEG multi-channel EEG data to train the RNN. Minasyan et al. 
[121], 2010 Automatic detection of epilep- A set of time domain, spectral domain, wavelet domain, and information EEG dataset from 25 patients hos- tic seizures prior to or imme- theoretic features were used to train an ELman RNN per each channel of pitalized for long-term EEG mon- diately after clinical onset in the EEG and the output is combined in time and space through a decision itoring in five centers including scalp EEG making module that performs a decision-level fusion in order to declare a Thomas Jefferson University, Dart- seizure event if N out of M channels declared it. mouth University, University of Vir- ginia, UCLA and University of Michigan medical centers. Naderi and Mahdavi-Nasab Automatic detection of epilep- Power spectral density was calculated for EEG signals using Welch method Publicly available epilepsy dataset [122], 2010 tic seizure in surface and in- then a dimensionality reduction algorithm was applied and the output was by University of Bonn [119]. tracranial EEG used to train an ELman RNN. This study used a feature-level fusion model for multi-channel EEG data to train the RNN. Vidyaratne et al. [123], Automated patient specific The preprocessed (denoised) EEG signals were segmented into 1 sec non CHB-MIT Scalp EEG Database 2016 seizure detection using scalp overlapping epochs and used to train a BRNN. Data from all channels were [116]. EEG used simultaneously (feature-level fusion model). Talathi [124], 2017 Epileptic seizures detection Single-channel EEG data (no multi-channel fusion) were used to train a Publicly available epilepsy dataset GRU-based RNN that classifies each EEG segment into one of three states: by University of Bonn [119]. healthy, inter-ictal, or ictal. Two layers of GRU were used, the first was followed by a fully connected layer and the second was followed by a logistic regression classification layer. Golmohammadi et al. 
[125], Epileptic seizure detection Linear frequency cepstral coefficient feature extraction was performed for the A subset of the TUH EEG Corpus 2017 EEG data and used to feed a CRNN that is based on a bidirectional LSTM. (TUEEG) [126] that has been man- Features from multi-channel EEG were fused prior to feeding into the CRNN. ually annotated for seizure events The network used in this study employed both 2D and 1D CNN at different [127]. stages. Another network where LSTM was replaced with GRU was devloped as well for comparison. Raghu et al. [128], 2017 Epileptic seizures classifica- This study developed two techniques that are based on Elman RNN that Publicly available epilepsy dataset tion works on features extracted from EEG signals. The first technique used by University of Bonn [119]. wavelet decomposition with the estimation of log energy and norm entropy to feed the RNN classifier (normal vs preictal). The second way extracted the log energy entropy to feed the RNN classifier. Abdelhameed et al. [129], Epileptic seizure detection This study used raw EEG signals to feed a 1D CRNN that is based on Publicly available epilepsy dataset 2018 bidirectional LSTM to classify EEG segments into one of two states (normal- by University of Bonn [119]. ictal and normal-ictal-interictal). Daoud and Bayoumi [130], Epileptic seizure prediction This study used raw EEG signals to feed a 2D CRNN that is based on a A dataset recorded at Children’s 2018 bidirectional LSTM to classify EEG segments into one of two classes (preictal Hospital Boston which is publicly and interictal). available [59, 116]. Hussein et al. [131], 2019 Epileptic seizures detection This study developed an LSTM-RNN that takes raw EEG signals as input Publicly available epilepsy dataset in order to create predictions. The network was composed of a one layer by University of Bonn [119]. 
LSTM followed by a fully connected layer and an average pooling layer to combine the temporal features and then an output softmax layer. recorded either using surface electrodes or via needle elec- 7. Event detection in EMG trodes; however, surface EMG (sEMG) is rarely used clin- Electromyography (EMG) is the method of sensing the ically in the evaluation of neuromuscular function and its electric potential evoked by the activity of muscle fibers as use is limited to the measurement of voluntary muscle activ- driven by the spikes from spinal motor neurons. EMGs are ity [161]. Routine evaluation of the neuromuscular function Khalifa et al., 2020 Page 11 of 21 Table 4 Summary of EEG-based BCI systems. Publication Event under investigation Implementation details Dataset Obermaier et al. [138], 5 tasks BCI system (imagining A 5 state HMM with 8 (max) Gaussian mixtures per state, was used Data from 3 male subjects were col- 2001 left-hand, right-hand, foot, tongue to model the spatiotemporal patterns in each signal segment. Fea- lected for motor imagery tasks with movements, or simple calculation). tures were extracted from all electrodes and fused into a combined the participants free of any medical or feature vector and it had its dimensionality reduced before use in central nervous system conditions. building the model. The expectation-maximization algorithm was used for the estimation of the transition matrix and the mixtures. Obermaier et al. [139], Two class motor imagery (left and Two 5 state HMMs (one for each class) with 8 (max) Gaussian Data from 4 male subjects were col- 2001 right hands) BCI mixtures per state, was used to model the spatiotemporal patterns lected for motor imagery tasks with in each signal segment. The Hjorth parameters of two channels the participants free of any medical or (C3 and C4) were fused and fed into the HMM models to calculate central nervous system conditions. the single best path probabilities for both models. 
The expectation- maximization algorithm was used for the estimation of the transition matrix and the mixtures. Pfurtscheller et al. [140], Two class motor imagery BCI for Two HMMs, one for each class, were trained and the maximal proba- Signals from two bipolar channels were 2003 virtual keyboard control bility achieved by the respective HMM-model represents the chosen acquired from three able-bodied sub- class. jects. Solhjoo et al. [141], 2005 EEG-based mental task classifica- Discrete HMM and multi-Gaussian HMM -based classifiers have been Dataset III of BCI Competition II tion (left or right hand movement) used for raw EEG signals. (2003) provided by the BCI research group at Graz University [142]. Suk and Lee [143], 2010 Multi-class motor imagery classifica- In this study, dynamic patterns in EEG signals were modeled using Dataset IIa of BCI Competition IV tion two layers HMM. First time-domain patterns were extracted from (2008) provided by the BCI research the signals and have dimension reduced using PCA. Second, the like- group at Graz University [144]. lihood for each channel is computed in the first layer of HMM and assembled in vector whose dimension is reduced with PCA as well. fi- nally, the class label is calculated through the largest likelihood in the upper layer of HMM. Baum-Welch algorithm was used to estimate the parameters of the initial state distribution, the state transition probability distribution, and the observation probability distribution and Viterbi algorithm was used for decoding the state sequence. Speier et al. [145], 2014 P300 speller An HMM was used to model typing as a sequential process where Data were collected from 15 healthy each character selection is influenced by previous selections. The graduate students and faculty with nor- Viterbi algorithm was used to decode the optimal sequence of target mal or corrected to normal vision be- characters. tween the ages of 20 and 35. 
Erfanian and Mahmoudi Real-time adaptive noise canceler A recurrent multi-layer perceptron with a single hidden layer was A simulated EEG dataset was used for [146], 2005 for ocular artifact suppression in trained for the noise canceling with the inputs as the contaminated this study, generated through Gaussian EEG EEG signal and the reference EOG. white noise-based autoregressive pro- cess. Forney and Anderson [147], EEG signal forecasting and mental An Elman RNN was trained for forecasting EEG a single time step 4 class dataset was collected from 3 2011 tasks classification ahead then an Elman RNN-based classifier was trained to classify subjects including combinations of the the mental task associated with the EEG signals. following mental tasks: clenching of right hand, shaking of left leg, visu- alization of a tumbling cube, counting backward from 100 by 3’s, and singing a favorite song. Balderas et al. [148], 2015 EEG classification for 2 class motor An LSTM based classifier was trained and evaluated for EEG oscilla- BCI competition IV (2007) dataset 2b imagery (left hand and right hand) tory components classification and compared with the regular neural [149] network implementations. Maddula et al. [150], 2017 P300 BCI classification A 3D CNN in conjunction with a 2D CNN were combined with an Data from P300 segment speller were LSTM-based RNN to capture spatio-temporal patterns in EEG. collected, where the subjects mentally noted whenever the flashed letter is part of their target [151]. Thomas et al. [152], 2017 Steady-state visual evoked poten- A single layer BRNN was used to perform classification and compared 5-class SSVEP dataset [153]. tial (SSVEP)-based BCI classifica- to different architecture and traditional classifying techniques. tion Spampinato et al. 
[154], Visual object classifier using EEG An LSTM based encoder to learn high order and temporal feature A subset of ImageNet dataset (40 2017 signals evoked by visual stimuli representations from EEG signals and then a classifier is used for classes) [155] was used to generate vi- identifying the visual object tat generated the stimuli. The authors sual stimuli for six subjects while EEG here tested different architectures for the encoder including a com- data is recorded. mon LSTM for all channels, channel LSTMs + common LSTM, and Common LSTM + fully connected layer. The authors also trained a CNN-based regressor for generating the EEG features to replace the whole EEG module and work only using source images of visual stimuli. Hosman et al. [156], 2019 Intercortical BCI for cursor control An single layer LSTM-based decoder was built with three outputs Intercortical neural signals recorded to generate the cursor speed in x and y directions in addition to the from three participants, each with distanc to target. 2 96-channel micro-electrode arrays [157]. Zhang et al. [158], 2020 EEG-Based Human Intention In this study, multi-channel raw EEG sequences into mesh-like rep- EEG Motor Movement/Imagery Recognition resentations that can capture spatiotemporal characteristics of EEG Dataset [59, 159]. and its acquisition. These meshes are then fed into deep neural networks that perform the recognition process. Multiple network ar- chitectures were investigated including a CRNN that starts with a 2D CNN that processes the meshes followed by a two-layer LSTM-based RNN to extract the temporal features, then a fully connected layer and an output layer. The second network investigated was composed of two parallel branches the first was a two layer LSTM-based RNN to extract the temporal features and the second was a multi-layer 2D/3D CNN to extract the spatial features and the output from the two branches is fused and used for recognition. 
This study used fusion on both data-level and feature-level. Tortora et al. [160], 2020 BCI for gait decoding from EEG EEG data were preprocessed to remove motion artifacts through EEG data were recorded from 11 sub- high pass filtration and independent component analysis. Differ- jects walking on a treadmill using a 64- ent frequency bands were then extracted and a separate classifier channel amplifier and 10/20 montage. is trained based on each frequency band. The classifiers were based on a two-layer LSTM-based RNN followed by a fully connected layer, a softmax layer, and an output layer that manifests the prediction output. is typically performed using needle (invasive) EMG that, grasping control, and gesture based interfaces [163]. A myo- despite of its effectiveness and the availability of several electric signal usually has its manifested events as two states, electrode types that suite many clinical questions, is often the first is the transient state which emanates as the muscle painful and traumatic and may lead to the destruction of sev- goes from the resting state to voluntary contraction. The eral muscle fibers [161, 162]. sEMG has been widely used as second is the steady state which represents maintaining the control signals for multiple applications especially in rehabil- contraction level in the muscle [163]. It has been shown that itation including but not limited to body-powered prostheses, the steady state segments are more robust as control signals Khalifa et al., 2020 Page 12 of 21 Table 5 Summary of event detection in EMG signals. 
Publication Event under investigation Implementation details Dataset Chan and Englehart [165], Continuous identification of six An HMM with uniformly distributed initial states and Gaussian ob- 4-channel sEMG collected from the 2005 classes hand movement in sEMG servation probability density function whose parameters can be com- forearm of 11 subjects for six distinct pletely estimated from the training data, was constructed for the motions (wrist flexion, wrist extension, detection process. The expectation-maximization algorithm wasn’t supination, pronation, hand open, and used here due to the assumption of uniform initial state probabilities hand close) [166]. and directly estimating the Gaussian parameters from the training data. Overlapping 256 ms observation windows were used and in each observation window the root mean square value and the first 6 autoregressive coefficients were computed as features. Zhang et al. [167], 2011 Hand gesture recognition in acceler- In this work, the authors actually identified the active segments via sEMG and 3d acceleration were col- ation and sEMG processing and thresholding of the average signal of the multichannel lected from two right-handed subjects sEMG. The onset is when the energy is higher than a certain thresh- who performed 72 Chinese sign lan- old and the offset when the energy is lower than another threshold. guage words in a sequence with 12 Features from time, frequency, and time-frequency domains were repetitions per motion, and a prede- extracted from both acceleration signals and sEMG, and fed to five- fined 40 sentences with 2 repetitions state HMMs for classification. Baum-Welch algorithm was used for per sentence. training with Gaussian multivariate distribution for observations. De- cision making here is done in a tree-structure (decision-level fusion) through four layers of classifiers with the last layer as the HMM. Wheeler et al. 
[168], 2006 Hand gesture recognition in sEMG Moving average was used on the sEMG signals to provide the input Data from one participant repeating 4 for continuous left-to-right HMMs with tied Gaussian mixtures. The gestures on a joystick (left, right, up, training was performed using the Baum-Welch algorithm and the and down) for 50 times per gesture, real-time recall was performed with The Viterbi algorithm. The were collected using four pairs of dry models were also initialized using K-means clustering so that the electrodes. Another portion of data states were partitioned to equalize the amount of variance within was collected using 8 pairs of wet elec- each state. This study employed feature-level fusion to combine trodes on gestures of typing on a num- multi-channel data. ber pad keyboard (0-9) for 40 strokes on each key. Monsifrot et al. [169], 2014 Extraction of the activity of individ- The iEMG signal was modeled as a sum of independent filtered The method introduced was tested ual motor neurons in single channel spike trains embedded in noise. A Markov model of sparse signals over both simulated and experimental intramuscular EMG (iEMG) was introduced where the sparsity of the trains was exploited through iEMG signals. the simulated signals modeling the time between spikes as discrete weibull distribution. An were generated via Markov model un- online estimation method for the weibull distribution parameters was der 10 kHz sampling frequency and introduced as well as an implementation of the impulse responses of with filter shapes obtained from experi- the model. mental iEMG for more realistic simula- tion. The experimental iEMG signals were acquired from the extensor digi- torum of a healthy subject with teflon coated stainless steel wire electrodes. 
Lee [170], 2008 sEMG-based speech recognition A continuous HMM was constructed with Gaussian mixtures model EMG signals were collected from ar- adopted for sEMG-based word recognition based on log mel-filter ticulatory facial muscles from 8 Ko- bank spectrogram of the windowed EMG signals. The segmental K- rean male subjects. The subjects were means algorithm was used for optimal HMM parameters estimation asked to pronounce each word from th th a 60-word vocabulary in a consistent where HMM parameters for the i state and k word are estimated manner in addition to generating a ran- from the observations of the corresponding state of the same word. dom set of words based on this vocab- Viterbi algorithm was used for the decoding process. ulary. Chan et al. [171], 2002 sEMG-based automatic speech A six state left-right HMM with single mixture observation densities, sEMG from five articulatory facial mus- recognition was constructed for identifying the words based on three features cles were collected. The dataset used extracted from sEMG that included the first two autoregressive coef- here was a subset of the dataset de- ficients and the integrated absolute value. HMM was trained in this scribed in [172] with ten-English word work using the expectation-maximization algorithm. vocabulary. Li et al. [173], 2014 Identification/prediction of func- A nonlinear ARX-type RNN was used to predict the stimulated mus- The experiments were conducted on 5 tional electrical stimulation (FES)- cular torque and track muscle fatigue. The model takes the eEMG subjects with spinal cord injuries. induced muscular dynamics with as an input and produces the predicted torque. evoked EMG (eEMG) Xia et al. [174], 2018 Hand motion estimation from A CRNN with 3 CNN layers and 2 LSTM layers was used for the sEMG signals were collected from 8 sEMG prediction and the model used the power spectral density as input. 
healthy subjects using 5 pairs of bipolar electrodes placed on the shoulder to record EMG from the biceps brachii, triceps brachii, anterior deltoid, posterior deltoid, and middle deltoid. The hand position in 3D space was tracked as the objective for this system.
Quivira et al. [175], 2018 | Simple hand finger movement identification in sEMG | An LSTM-based RNN was used to implement a recurrent mixture density network (RMDN) [176] that probabilistically models the output of the network in order to capture the complex features present in the hand movement. | 8-channel EMG signals were collected from the proximal forearm region, targeting most muscles used in hand manipulation. Hand pose tracking was performed with a Leap Motion sensor, and the subjects were asked to perform 7 hand gestures with repetitions per gesture.
Hu et al. [177], 2018 | Hand gesture recognition in sEMG | sEMG signals from all channels were segmented into fixed-size windows and transformed into an image representation that was then fed into a CNN with two convolutional layers, two locally connected layers, and three fully connected layers, followed by an LSTM-based RNN and an attention layer to enhance the output of the network. | Experiments were performed over the first and second sub-databases of the NinaPro (Non Invasive Adaptive Prosthetics) database [178].
Samadani [179], 2018 | EMG-based hand gesture classification | Different RNN architectures were tested in this study to choose the best-performing one. The evaluated models included uni- and bidirectional LSTM- and GRU-based RNNs with attention mechanisms. The models worked on the preprocessed (denoised) raw EMG signals. | The publicly available NinaPro hand gesture dataset (NinaPro2) was used [180].
Simão et al. [181], 2019 | EMG-based online gesture classification | Features were extracted from multi-channel EMG (standard deviation along each time frame) and fed into a dynamic RNN model composed of a dense layer, an LSTM-based RNN layer, and another dense layer, followed by the output layer. This model was compared to a similar GRU-based model and to a static feed-forward neural network model. This study used a combined feature vector as the input to the models. | The synthetic sequences of the UC2018 DualMyo dataset [182] and a similar subset of the NinaPro DB5 dataset [183].

compared to the transient state due to its longer duration and better classification rates [164]. Table 5 summarizes the recent advances in the detection of myoelectric events in EMG signals.

8. Event detection in other biomedical signals
Physiological monitoring is an essential part of all care units nowadays, and it is not limited to the aforementioned biomedical signals. Tens of variables are collected in the form of time series containing hundreds of events that are important for diagnosis and treatment/rehabilitation. Event detection methods have had a strong presence in the analysis of such series. For instance, cardiovascular disorders are assessed not only through ECG; the phonocardiogram is also used as an easier way for general practitioners to identify changes in heart sounds. Extracting the cardiac cycle has been one of the major problems in phonocardiography as well, and it was addressed using HMMs in multiple works [184–187]. On the other hand, most RNN-based methods in phonocardiography have been used for pure classification and anomaly recognition [152].

3D acceleration is another emerging technology that has been extensively used in the assessment and detection of many medical conditions in swallowing [188] and in human gait analysis [189]. In swallowing, acceleration signals have been used to detect pharyngeal swallowing activity via maximum likelihood methods with minimum description length in [16], and using the short-time Fourier transform and neural networks in [14]. RNNs were also employed for event detection in swallowing acceleration signals, including upper esophageal sphincter opening [15, 190], laryngeal vestibule closure [191], and hyoid bone motion during swallowing [192]. In gait analysis, HMMs [193–196] as well as RNNs [197–199] have been used for recognition and extraction on multiple occasions.

9. Challenges and Future Directions
Event detection in biomedical signals is a critical step for diagnosis and intervention procedures that are used extensively on a daily basis in nearly every standard clinical setting. It also represents the core of various eHealth technologies that employ wearable devices and regular monitoring of physiological signs. Being such a fundamental operation that drives the clinical decision-making process, it requires precise detection in a fairly complex environment that contains multiple concurrently occurring events. In particular, the false positive rate in clinical testing is an important indicator of how well the detection model generalizes and differentiates between the event of interest and the background noise. Building such highly accurate models depends on many factors, including the diversity of the dataset and labels used, in addition to model capacity.

9.1. Classical Models Scaling: Challenges
As mentioned before, biomedical signals are the manifestation of well-coordinated, yet complex, physiological processes that involve various anatomical structures that are close in position and share several functions. Hence, the collected signals pick up not only the target physiological process but also other, unavoidable neighboring processes. Examples include the combined activation of multiple muscles in sEMG, eye blinking along with neural activity in EEG, and head movement along with swallowing vibrations in swallowing accelerometry. Extracting the event of interest in this case requires exhaustive labeling of the underlying set of processes in order to build the predefined state space, from which the state sequence is drawn, for classical stochastic methods such as HMMs. Manual labeling or interpretation of biomedical signals is not only an exhausting task but also requires extensive domain knowledge and expertise.

One way to enhance the expressive power of stochastic models such as HMMs is the inclusion of non-Gaussian mixtures, which can boost performance in many cases because Gaussianity is not always a reasonable assumption. One model proposed as an extension to non-Gaussian mixtures is the independent component analyzers mixture model (ICAMM), which has been applied in multiple biomedical signal applications, such as sleep disorder detection and the classification of neuropsychological tasks in EEG [34, 35].

An additional way to increase the model capacity, and its ability to model the underlying sequence of events, is to use strongly representative domain features. One of the most popular domain representations is wavelet decomposition, which has proven superior in providing high-level representations of events in a wide variety of biomedical signals, such as phonocardiograms [9, 187], EEG [104, 117, 120, 121], and EMG [164]. Handcrafting features, however, is not an easy task: it requires extensive domain knowledge and significant effort to come up with cues that trigger the identification of specific signal components. Furthermore, mapping the feature space into a more compact space of lower dimensionality is often an essential operation prior to building the model. Given these factors, models that can learn high-level representations directly from raw signals, while having the expressive power to model tasks involving long time lags, can be of great benefit [200].

9.2. High Capacity Models Embedding Feature Extraction
The evolution of deep learning has revolutionized the way problems are addressed: instead of classification and detection systems that rely solely on handcrafted features, end-to-end systems are being trained to handle all steps from the raw input to the final output. End-to-end systems are complex, yet rich, processing pipelines that make the most of the available information by using a unified scheme that trains the system as a whole, from the input until the output is produced [201]. It has been shown that deep architectures can replace handcrafted feature extraction stages and work directly on raw data to produce high levels of abstraction. RNNs were introduced in 1996 for the identification of arm kinematics during hand drawing from raw EMG signals [202], and the same architecture was later adopted for lower limb kinematics [203]. In both studies, the authors verified that an RNN was able to map the relationship between raw EMG signals and limb kinematics, during drawing for the arm and during human locomotion for the lower limb. Chauhan and Vig [204] and Sujadevi et al. [205] also used more sophisticated multi-layer LSTM-based RNN architectures on raw ECG signals for arrhythmia detection. Spampinato et al. [154] employed RNNs as well to extract a discriminative brain manifold for visual categories from EEG signals. Further, Vidyaratne et al. [123] used RNNs for seizure detection in EEG; however, they used a denoised and segmented version of the signals. As mentioned earlier, although RNNs are efficient at modeling long contexts, when they are fed highly sampled inputs such as raw signals, their error signals must propagate through a tremendous number of steps, which hurts the network's optimizability and training speed [49, 50].

In this regard, convolutional neural networks (CNNs) have been used to perceive small local contexts, which are then propagated to an RNN for the perception of temporal contexts, or to a feed-forward network for a classification or prediction target. CNNs were introduced as a solution to enable recognition systems to learn the hierarchical internal representations that form scenes in vision applications (pixels form edglets, edglets form motifs, motifs form parts, parts form objects, and objects form scenes) [206, 207]. Thus, CNNs are basically multi-stage trainable architectures whose stages are stacked on top of each other to learn each level of the feature hierarchy [200, 206]. Each stage is usually composed of three layers: a filter bank layer, a non-linear activation layer, and a pooling layer. A filter bank layer extracts particular features at all locations on the input. The non-linear activation acts as a regulator that determines whether a neuron should fire by checking its value and deciding whether the following connections should consider this neuron activated [200]. A pooling layer is a dimensionality reduction procedure that processes the feature maps to produce lower-resolution maps that are robust to small variations in the location of features [206]. The coefficients of the filters are the trainable parameters of the CNN, and they are updated simultaneously by the training algorithm to minimize the discrepancy between the actual output and the desired output [206].

The design concept of CNNs first evolved for vision applications, but the same concept has since been adopted for pattern analysis and recognition in biomedical signals [174, 208–212]. For instance, Shashikumar et al. [210] used a 5-layer 2D CNN followed by a bidirectional RNN (BRNN) with a soft attention mechanism to process the wavelet transform of ECG signals for the detection of atrial fibrillation. Tan et al. [211] used a 2-layer 1D CNN with a 3-layer LSTM-based RNN for the detection of coronary artery disease in ECG. Further, Xiong et al. [212] used a residual convolutional recurrent neural network for the detection of cardiac arrhythmia in ECG. All these experiments using RNNs on top of CNNs for biomedical signal analysis succeeded in producing high levels of abstraction and rich temporal representations that can perceive long-range contexts without human intervention, in addition to being easier to optimize computationally. CNNs have also been used together with fully connected networks to increase the capacity of HMMs in connectionist hybrid DNN-HMM models, owing to the ability of CNNs to process high-dimensional multi-step inputs [213]. Such hybrid systems have provided state-of-the-art performance, especially in handwriting recognition [214, 215].

9.3. Transfer Learning
Although most of the previously mentioned methods achieve great results on certain datasets, it is well known that they can easily overfit the data, resulting in poor generalization. Training and validating models that generalize well therefore requires datasets that are not only very large but also diverse. In the biomedical signal processing field, collecting such datasets can be an obstacle to developing reliable models. Strictly speaking, it may not be feasible to find a large population of subjects when studying a rare disease, and even when it is feasible, it is extremely difficult to acquire expert reference annotations for the underlying dataset [216]. Many factors contribute to this. As mentioned before, the noisy nature of biomedical signals increases the difficulty of manual interpretation and necessitates reference modalities to acquire accurate information about the processes, such as collecting x-ray videofluoroscopy simultaneously with swallowing accelerometry [217]. Another factor is that the experts annotating the data need to maintain a high record of reliability across time and relative to peer experts, which might be difficult to achieve or require continuous training and checking of the experts' reliability. One way to overcome datasets of limited size and/or diversity is to use pretrained models from relatively different domains and apply them to the particular target problem, the so-called transfer learning [218]. In transfer learning, the pretrained model's weights are used as the initialization and then fine-tuned to fit the new dataset. In most cases, retraining uses a much lower learning rate (about 10 times smaller) than the original. Transfer learning has been used for event detection and classification tasks in multiple biomedical signals, including ECG for cardiac arrhythmia detection [219], EEG for drowsiness detection [220] and driving fatigue detection [221], and EMG for hand gesture classification [222]. However, it is worth noting that transfer learning may sometimes fail to perform better than the originally trained model if there are large differences between the datasets or deterioration in inter-subject variability [218].

10. Conclusion
In this paper, we provided a comprehensive review of event extraction methods in biomedical signals, in particular hidden Markov models and recurrent neural networks. An HMM is a probabilistic model that represents a sequence of observations in terms of a hidden sequence of states, and it provides the concepts and methods for finding the optimal state sequence that best describes the observations. An RNN is a type of neural network that was introduced to model time dependency and perform contextual mapping in sequences. This review showed that the availability of dynamic programming algorithms such as EM and Viterbi led to the widespread use of HMMs, which were used to dynamically transcribe the context of many biomedical signals. It was not long until HMMs became insufficient for time series modeling needs, specifically modeling long-range dependencies and larger state spaces, and RNNs started to gradually replace HMMs in time-dependent contextual mappings. So far, RNNs have proven superior in time series modeling, especially in biomedical signals, and continue to expand their dominance in building automatic detection and diagnosis systems through the emerging designs and practices tested in nearly every field.

Acknowledgments
The work reported in this manuscript was supported by the National Science Foundation under the CAREER Award Number 1652203. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Science Foundation.

References
[1] L. Glass, Synchronization and rhythmic processes in physiology, Nature 410 (2001) 277–284.
[2] R. M. Rangayyan, N. P. Reddy, Biomedical signal analysis: A case-study approach, Annals of Biomedical Engineering 30 (2002) 983–983.
[3] P. Rashidi, A. Mihailidis, A survey on ambient-assisted living tools for older adults, IEEE Journal of Biomedical and Health Informatics 17 (2013) 579–590.
[4] J. Kim, M. Kim, I. Won, S. Yang, K. Lee, W. Huh, A biomedical signal segmentation algorithm for event detection based on slope tracing, in: Proceedings of the 31st Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, 2009, pp. 1889–1892. doi:10.1109/IEMBS.2009.5333874.
[5] J. Andreu-Perez, C. C. Poon, R. D. Merrifield, S. T. Wong, G. Z. Yang, Big data for health, IEEE Journal of Biomedical and Health Informatics 19 (2015) 1193–1208.
[6] R. Gravina, P. Alinia, H. Ghasemzadeh, G. Fortino, Multi-sensor fusion in body sensor networks: State-of-the-art and research challenges, Information Fusion 35 (2017) 68–80.
[7] D. P. Mandic, D. Obradovic, A. Kuh, T. Adali, U. Trutschell, M. Golz, P. De Wilde, J. Barria, A. Constantinides, J. Chambers, W. Duch, J. Kacprzyk, E. Oja, S. Zadrożny, Data Fusion for Modern Engineering Applications: An Overview, 2005.
[8] D. Mandic, M. Golz, A. Kuh, D. Obradovic, T. Tanaka, Signal Processing Techniques for Knowledge Extraction and Information Fusion, Springer US, 2008.
[9] L. Huiying, L. Sakari, H. Iiro, A heart sound segmentation algorithm using wavelet decomposition and reconstruction, in: Proceedings of the 19th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, volume 4, IEEE, 1997, pp. 1630–1633.
[10] J. Pan, W. J. Tompkins, A real-time QRS detection algorithm, IEEE Transactions on Biomedical Engineering 32 (1985) 230–236.
[11] V. Srinivasan, C. Eswaran, N. Sriraam, Approximate entropy-based epileptic EEG detection using artificial neural networks, IEEE Transactions on Information Technology in Biomedicine 11 (2007) 288–
[12] N. Kannathal, M. L. Choo, U. R. Acharya, P. K. Sadasivan, Entropies for detection of epilepsy in EEG, Computer Methods and Programs in Biomedicine 80 (2005) 187–94.
[13] A. Schlogl, F. Lee, H. Bischof, G. Pfurtscheller, Characterization of four-class motor imagery EEG data for the BCI-competition 2005, Journal of Neural Engineering 2 (2005) L14–L22.
[14] Y. Khalifa, J. L. Coyle, E. Sejdić, Non-invasive identification of swallows via deep learning in high resolution cervical auscultation recordings, Scientific Reports 10 (2020) 8704.
[15] Y. Khalifa, C. Donohue, J. Coyle, E. Sejdić, Upper esophageal sphincter opening segmentation with convolutional recurrent neural networks in high resolution cervical auscultation, IEEE Journal of Biomedical and Health Informatics (2020).
[16] E. Sejdić, C. M. Steele, T. Chau, Segmentation of dual-axis swallowing accelerometry signals in healthy subjects with analysis of anthropometric effects on duration of swallowing activities, IEEE Transactions on Biomedical Engineering 56 (2009) 1090–1097.
[17] S. Damouras, E. Sejdić, C. M. Steele, T. Chau, An online swallow detection algorithm based on the quadratic variation of dual-axis accelerometry, IEEE Transactions on Signal Processing 58 (2010) 3352–3359.
[18] Z. C. Lipton, J. Berkowitz, C. Elkan, A critical review of recurrent neural networks for sequence learning, arXiv preprint arXiv:1506.00019 (2015).
[19] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Computation 9 (1997) 1735–1780.
[20] L. R. Rabiner, A Tutorial on hidden Markov-models and selected applications in speech recognition, Proceedings of the IEEE 77 (1989) 257–286.
[21] P. J. Werbos, Backpropagation through time - what it does and how to do it, Proceedings of the IEEE 78 (1990) 1550–1560.
[22] D. E. Rumelhart, G. E. Hinton, R. J. Williams, Learning representations by back-propagating errors, Nature 323 (1986) 533–536.
[23] A. Graves, G. Wayne, I. Danihelka, Neural turing machines, CoRR abs/1410.5401 (2014).
[24] A. Cohen, Hidden Markov models in biomedical signal processing, in: Proceedings of the 20th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, volume 3, IEEE, 1998, pp. 1145–1150.
[25] L. E. Baum, T. Petrie, Statistical inference for probabilistic functions of finite state Markov chains, Annals of Mathematical Statistics 37 (1966) 1554–1563.
[26] D. Jurafsky, J. H. Martin, Speech and language processing, 2nd ed., Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 2009.
[27] L. E. Baum, J. A. Eagon, An Inequality with Applications to Statistical Estimation for Probabilistic Functions of Markov Processes and to a Model for Ecology, Bulletin of the American Mathematical Society 73 (1967) 360–363.
[28] L. E. Baum, G. R. Sell, Growth transformations for functions on manifolds, Pacific Journal of Mathematics 27 (1968) 211–227.
[29] L. Baum, An inequality and associated maximization technique in statistical estimation of probabilistic functions of a Markov process, in: Proceedings of the 3rd Symposium on Inequalities, volume 3, 1972, pp. 1–8.
[30] A. P. Dempster, N. M. Laird, D. B. Rubin, Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 39 (1977) 1–38.
[31] L. A. Liporace, Maximum-likelihood estimation for multivariate observations of Markov sources, IEEE Transactions on Information Theory 28 (1982) 729–734.
[32] B. H. Juang, Maximum-likelihood estimation for mixture multivariate stochastic observations of Markov-chains, AT&T Technical Journal 64 (1985) 1235–1249.
[33] S. Levinson, M. Sondhi, Maximum likelihood estimation for multivariate mixture observations of markov chains, IEEE Transactions on Information Theory 32 (1986) 307–309.
[34] G. Safont, A. Salazar, L. Vergara, E. Gómez, V. Villanueva, Multichannel dynamic modeling of non-Gaussian mixtures, Pattern Recognition 93 (2019) 312–323.
[35] A. Salazar, L. Vergara, R. Miralles, On including sequential dependence in ICA mixture models, Signal Processing 90 (2010) 2314–
[36] A. Graves, Supervised sequence labelling, in: Supervised sequence labelling with recurrent neural networks, Springer, 2012, pp. 5–13.
[37] J. L. Elman, Finding structure in time, Cognitive Science 14 (1990) 179–211.
[38] M. I. Jordan, Attractor dynamics and parallelism in a connectionist sequential machine, in: J. Diederich (Ed.), Artificial Neural Networks, IEEE Press, Piscataway, NJ, USA, 1990, pp. 112–127.
[39] H. Jaeger, The "echo state" approach to analysing and training recurrent neural networks-with an erratum note, Technical Report, German National Research Center for Information Technology, 2001.
[40] Y. Khalifa, Z. Zhang, E. Sejdić, Sparse recovery of time-frequency representations via recurrent neural networks, in: Proceedings of the 22nd International Conference on Digital Signal Processing, ACM, 2017, pp. 1–5.
[41] M. I. Jordan, Serial order: A parallel distributed processing approach, in: Neural Network Models of Cognition, volume 121 of Advances in Psychology, North-Holland, 1997, pp. 471–495.
[42] R. Pascanu, T. Mikolov, Y. Bengio, On the difficulty of training recurrent neural networks, in: Proceedings of the 30th International Conference on Machine Learning, volume 28, 2013, pp. III–1310–III–1318.
[43] Y. Bengio, P. Simard, P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks 5 (1994) 157–166.
[44] A. Graves, M. Liwicki, S. Fernandez, R. Bertolami, H. Bunke, J. Schmidhuber, A novel connectionist system for unconstrained handwriting recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 31 (2009) 855–868.
[45] X. Glorot, Y. Bengio, Understanding the difficulty of training deep feedforward neural networks, in: Y. W. Teh, M. Titterington (Eds.), Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, volume 9 of Proceedings of Machine Learning Research, PMLR, 2010, pp. 249–256.
[46] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, Y. Bengio, Learning phrase representations using RNN encoder–decoder for statistical machine translation, in: Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2014, pp. 1724–1734.
[47] F. A. Gers, J. Schmidhuber, F. Cummins, Learning to forget: Continual prediction with LSTM, in: Proceedings of the 9th International Conference on Artificial Neural Networks, volume 2, IEEE, 1999, pp. 850–855. doi:10.1049/cp:19991218.
[48] A. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Transactions on Information Theory 13 (1967) 260–269.
[49] P. Schwab, G. C. Scebba, J. Zhang, M. Delai, W. Karlen, Beat by beat: Classifying cardiac arrhythmias with recurrent neural networks, in: Computing in Cardiology, volume 44, 2017, pp. 1–4.
[50] M. F. Stollenga, W. Byeon, M. Liwicki, J. Schmidhuber, Parallel multi-dimensional LSTM, with application to fast biomedical volumetric image segmentation, arXiv preprint arXiv:1506.07452 (2015).
[51] W. Gersch, P. Lilly, E. Dong, PVC detection by the heart-beat interval data—Markov chain approach, Computers and Biomedical Research 8 (1975) 370–378.
[52] D. A. Coast, R. M. Stern, G. G. Cano, S. A. Briller, An approach to cardiac arrhythmia analysis using hidden Markov models, IEEE Transactions on Biomedical Engineering 37 (1990) 826–836.
[53] R. E. Hermes, D. B. Geselowitz, G. Oliver, Development, distribution, and use of the American Heart Association database for ventricular arrhythmia detector evaluation, Computers in Cardiology (1980) 263–266.
[54] R. V. Andreao, B. Dorizzi, J. Boudy, ECG signal analysis through hidden Markov models, IEEE Transactions on Biomedical Engineering 53 (2006) 1541–1549.
[55] P. Laguna, R. G. Mark, A. Goldberg, G. B. Moody, A database for evaluation of algorithms for measurement of QT and other waveform intervals in the ECG, in: Computers in Cardiology, 1997, pp. 673–676.
[56] F. Sandberg, M. Stridh, L. Sornmo, Frequency tracking of atrial fibrillation using hidden Markov models, IEEE Transactions on Biomedical Engineering 55 (2008) 502–511.
[57] J. Oliveira, C. Sousa, M. T. Coimbra, Coupled hidden Markov model for automatic ECG and PCG segmentation, in: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2017, pp. 1023–1027.
[58] E. D. Übeyli, Combining recurrent neural networks with eigenvector methods for classification of ECG beats, Digital Signal Processing 19 (2009) 320–329.
[59] A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C. K. Peng, H. E. Stanley, PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals, Circulation 101 (2000) E215–E220.
[60] C. Zhang, G. Wang, J. Zhao, P. Gao, J. Lin, H. Yang, Patient-specific ECG classification based on recurrent neural networks and clustering technique, in: Proceedings of the 13th International Conference on Biomedical Engineering, 2017, pp. 63–67.
[61] G. B. Moody, R. G. Mark, The impact of the MIT-BIH arrhythmia database, IEEE Engineering in Medicine and Biology Magazine 20 (2001) 45–50.
[62] Z. Xiong, M. K. Stiles, J. Zhao, Robust ECG signal classification for detection of atrial fibrillation using a novel neural network, in: Computing in Cardiology, volume 44, 2017, pp. 1–4.
[63] D. H. Wolpert, Stacked generalization, Neural Networks 5 (1992) 241–259.
[64] M. Zihlmann, D. Perekrestenko, M. Tschannen, Convolutional recurrent neural networks for electrocardiogram classification, in: Computing in Cardiology, 2017, pp. 1–4.
[65] M. Limam, F. Precioso, Atrial fibrillation detection and ECG classification based on convolutional recurrent neural network, in: Computing in Cardiology, 2017, pp. 1–4. doi:10.22489/CinC.2017.171-325.
[66] Y. Chang, S. Wu, L. Tseng, H. Chao, C. Ko, AF detection by exploiting the spectral and temporal characteristics of ECG signals with the LSTM model, in: Computing in Cardiology, volume 45, 2018, pp. 1–4.
[67] S. Petrutiu, A. V. Sahakian, S. Swiryn, Abrupt changes in fibrillatory wave characteristics at the termination of paroxysmal atrial fibrillation in humans, EP Europace 9 (2007) 466–470.
[68] A. Taddei, G. Distante, M. Emdin, P. Pisani, G. B. Moody, C. Zeelenberg, C. Marchesi, The European ST-T database: standard for evaluating systems for the analysis of ST-T changes in ambulatory electrocardiography, European Heart Journal 13 (1992) 1164–1172.
[69] F. M. Nolle, F. K. Badura, J. M. Catlett, R. W. Bowser, M. H. Sketch, CREI-GARD, a new concept in computerized arrhythmia monitoring systems, Computers in Cardiology 13 (1987) 515–518.
[70] R. Bousseljot, D. Kreiseler, A. Schnabel, Nutzung der EKG-Signaldatenbank CARDIODAT der PTB über das Internet, Biomedizinische Technik/Biomedical Engineering 40 (1995) 317–318.
[71] H. W. Lui, K. L. Chow, Multiclass classification of myocardial infarction with convolutional and recurrent neural networks for portable ECG devices, Informatics in Medicine Unlocked 13 (2018) 26–33.
[72] G. D. Clifford, C. Liu, B. Moody, L. H. Lehman, I. Silva, Q. Li, A. E. Johnson, R. G. Mark, AF classification from a short single lead ECG recording: The PhysioNet/computing in cardiology challenge 2017, 2017, pp. 1–4. doi:10.22489/CinC.2017.065-469.
[73] S. Singh, S. K. Pandey, U. Pawar, R. R. Janghel, Classification of ECG Arrhythmia using Recurrent Neural Networks, Procedia Computer Science 132 (2018) 1290–1297.
[74] A. Kadish, A. E. Buxton, H. Kennedy, B. P. Knight, J. W. Mason, C. Schuger, C. Tracy, W. L. Winters, A. W. Boone, M. Elnicki, J. W. Hirshfeld, B. H. Lorell, G. Rodgers, H. H. Weitz, ACC/AHA clinical competence statement on electrocardiography and ambulatory electrocardiography, Journal of the American College of Cardiology 38 (2001) 3169–3178.
[75] M. H. Crawford, S. Bernstein, P. Deedwania, J. Dimarco, K. J. Ferrick, A. Garson, L. Green, H. Leon Greene, M. Silka, P. H. Stone, C. Tracy, R. Gibbons, ACC/AHA guidelines for ambulatory electrocardiography, Journal of the American College of Cardiology 34 (1999) 912–948.
[76] K. S. Sayed, A. F. Khalaf, Y. M. Kadah, Arrhythmia classification based on novel distance series transform of phase space trajectories, in: Proceedings of the 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2015, pp. 5195–5198.
[77] D. L. Schomer, F. L. Da Silva, Niedermeyer's electroencephalography: basic principles, clinical applications, and related fields, 6th ed., Lippincott Williams & Wilkins, 2012.
[78] D. P. Subha, P. K. Joseph, U. R. Acharya, C. M. Lim, EEG signal analysis: A survey, Journal of Medical Systems 34 (2010) 195–212.
[79] A. Flexer, G. Dorffner, P. Sykacek, I. Rezek, An automatic, continuous and probabilistic sleep stager based on a hidden markov model, Applied Artificial Intelligence 16 (2002) 199–207.
[80] A. Flexer, G. Gruber, G. Dorffner, A reliable probabilistic sleep stager based on a single EEG signal, Artificial Intelligence in Medicine 33 (2005) 199–207.
[81] L. G. Doroshenkov, V. A. Konyshev, S. V. Selishchev, Classification of human sleep stages based on EEG processing using hidden Markov models, Biomedical Engineering 41 (2007) 25–28.
[82] B. Kemp, A. H. Zwinderman, B. Tuk, H. A. Kamphuisen, J. J. Oberye, Analysis of a sleep-dependent neuronal feedback loop: the slow-wave microcontinuity of the EEG, IEEE Transactions on Biomedical Engineering 47 (2000) 1185–1194.
[83] M. T. Bianchi, N. A. Eiseman, S. S. Cash, J. Mietus, C. K. Peng, R. J. Thomas, Probabilistic sleep architecture models in patients with and without sleep apnea, Journal of Sleep Research 21 (2012) 330–341.
[84] S. F. Quan, B. V. Howard, C. Iber, J. P. Kiley, F. J. Nieto, G. T. O'Connor, D. M. Rapoport, S. Redline, J. Robbins, J. M. Samet, P. W. Wahl, The sleep heart health study: Design, rationale, and methods, Sleep 20 (1997) 1077–1085.
[85] S. T. Pan, C. E. Kuo, J. H. Zeng, S. F. Liang, A transition-constrained discrete hidden Markov model for automatic sleep staging, Biomedical Engineering Online 11 (2012) 52.
[86] F. Yaghouby, S. Sunderam, Quasi-supervised scoring of human sleep in polysomnograms using augmented input variables, Computers in Biology and Medicine 59 (2015) 54–63.
[87] J. A. Onton, D. Y. Kang, T. P. Coleman, Visualization of whole-night sleep EEG from 2-channel mobile recording device reveals distinct deep sleep stages with differential electrodermal activity, Frontiers in Human Neuroscience 10 (2016) 605.
[88] P. R. Davidson, R. D. Jones, M. T. R. Peiris, Detecting behavioral microsleeps using EEG and LSTM recurrent neural networks, in: Proceedings of the 20th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, IEEE, 2005, pp. 5754–5757.
[89] Y. L. Hsu, Y. T. Yang, J. S. Wang, C. Y. Hsu, Automatic sleep stage recurrent neural classifier using energy features of EEG signals, Neurocomputing 104 (2013) 105–114.
[90] A. Supratak, H. Dong, C. Wu, Y. Guo, DeepSleepNet: A model for automatic sleep stage scoring based on raw single-channel EEG, IEEE Transactions on Neural Systems and Rehabilitation Engineering 25 (2017) 1998–2008.
[91] C. O'Reilly, N. Gosselin, J. Carrier, T. Nielsen, Montreal archive of sleep studies: An open-access resource for instrument benchmarking and exploratory research, Journal of Sleep Research 23 (2014) 628–635.
[92] S. Biswal, J. Kulas, H. Sun, B. Goparaju, M. B. Westover, M. T. Bianchi, J. Sun, SLEEPNET: Automated Sleep Staging System via Deep Learning, arXiv preprint arXiv:1707.08262 (2017).
[93] H. Phan, F. Andreotti, N. Cooray, O. Y. Chén, M. De Vos, Automatic sleep stage classification using single-channel EEG: Learning sequential features with attention-based recurrent neural networks, in: Proceedings of the 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2018, pp. 1452–1455.
[94] E. Bresch, U. Großekathöfer, G. Garcia-Molina, Recurrent Deep Neural Networks for Real-Time Sleep Stage Classification From Single Channel EEG, Frontiers in Computational Neuroscience 12 (2018) 85.
[95] G. Klosh, B. Kemp, T. Penzel, A. Schlogl, P. Rappelsberger, E. Trenker, G. Gruber, J. Zeithofer, B. Saletu, W. M. Herrmann, S. L. Himanen, D. Kunz, M. J. Barbanoj, J. Roschke, A. Varri, G. Dorffner, The SIESTA project polygraphic and clinical database, IEEE Engineering in Medicine and Biology Magazine 20 (2001) 51–57.
[96] H. Phan, F. Andreotti, N. Cooray, O. Y. Chén, M. De Vos, SeqSleepNet: End-to-End Hierarchical Recurrent Neural Network for Sequence-to-Sequence Automatic Sleep Staging, IEEE Transactions on Neural Systems and Rehabilitation Engineering 27 (2019) 400–410.
[97] N. Michielli, U. R. Acharya, F. Molinari, Cascaded LSTM recurrent neural network for automated sleep stage classification using single-channel EEG signals, Computers in Biology and Medicine 106 (2019) 71–81.
[98] C. Sun, J. Fan, C. Chen, W. Li, W. Chen, A Two-Stage Neural Network for Sleep Stage Classification Based on Feature Learning, Sequence Learning, and Data Augmentation, IEEE Access 7 (2019) 109386–109397.
[99] S. H. Sheldon, R. Ferber, M. H. Kryger, Principles and practice of pediatric sleep medicine, 1st ed., Elsevier Health Sciences, 2005.
[100] D. Y. Kang, P. N. DeYoung, A. Malhotra, R. L. Owens, T. P. Coleman, A state space and density estimation framework for sleep staging in obstructive sleep apnea, IEEE Transactions on Biomedical Engineering 65 (2018) 1201–1212.
[101] A. Roebuck, V. Monasterio, E. Gederi, M. Osipov, J. Behar, A. Malhotra, T. Penzel, G. D. Clifford, A review of signals used in sleep analysis, Physiological Measurement 35 (2013) R1–R57.
[102] Commission on Epidemiology and Prognosis, International League Against Epilepsy, Guidelines for epidemiologic studies on epilepsy, Epilepsia 34 (1993) 592–596.
[103] W. W. Lytton, Computer modelling of epilepsy, Nature Reviews: Neuroscience 9 (2008) 626–637.
[104] M. H. Abdullah, J. M. Abdullah, M. Z. Abdullah, Seizure detection by means of hidden Markov model and stationary wavelet transform of electroencephalograph signals, in: Proceedings of the IEEE-EMBS International Conference on Biomedical and Health Informatics, IEEE, 2012, pp. 62–65. doi:10.1109/BHI.2012.6211506.
[105] O. Smart, M. Chen, Semi-automated patient-specific scalp EEG seizure detection with unsupervised machine learning, in: Proceedings of the IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, 2015, pp. 1–7.
[106] S. Santaniello, D. L. Sherman, M. A. Mirski, N. V. Thakor, S. V. Sarma, A Bayesian framework for analyzing iEEG data from a rat model of epilepsy, in: Proceedings of the 33rd Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2011, pp. 1435–1438.
[107] F. Mormann, T. Kreuz, C. Rieke, R. G. Andrzejak, A. Kraskov, P. David, C. E. Elger, K. Lehnertz, On the predictability of epileptic seizures, Clinical Neurophysiology 116 (2005) 569–587.
[108] T. Maiwald, M. Winterhalder, R. Aschenbrenner-Scheibe, H. U. Voss, A. Schulze-Bonhage, J. Timmer, Comparison of three nonlinear seizure prediction methods by means of the seizure prediction characteristic, Physica D-Nonlinear Phenomena 194 (2004) 357–368.
[109] P. E. McSharry, L. A. Smith, L. Tarassenko, Prediction of epileptic seizures: Are nonlinear methods relevant?, Nature Medicine 9 (2003) 241–242.
[110] Y. C. Lai, M. A. Harrison, M. G. Frei, I. Osorio, Controlled test for predictive power of Lyapunov exponents: their inability to predict epileptic seizures, Chaos 14 (2004) 630–642.
[111] M. Winterhalder, T. Maiwald, H. U. Voss, R. Aschenbrenner-Scheibe, J. Timmer, A. Schulze-Bonhage, The seizure prediction characteristic: A general framework to assess and compare seizure prediction methods, Epilepsy & Behavior 4 (2003) 318–325.
[112] S. Wong, A. B. Gardner, A. M. Krieger, B. Litt, A stochastic framework for evaluating seizure prediction algorithms using hidden Markov models, Journal of Neurophysiology 97 (2007) 2525–2532.
[113] A. B. Gardner, A. M. Krieger, G. Vachtsevanos, B. Litt, One-class novelty detection for seizure analysis from intracranial EEG, Journal of Machine Learning Research 7 (2006) 1025–1044.
[114] B. Direito, C. Teixeira, B. Ribeiro, M. Castelo-Branco, F. Sales, A. Dourado, Modeling epileptic brain states using EEG spectral analysis and topographic mapping, Journal of Neuroscience Methods 210 (2012) 220–229.
[115] M. Ihle, H. Feldwisch-Drentrup, C. A. Teixeira, A. Witon, B. Schelter, J. Timmer, A. Schulze-Bonhage, EPILEPSIAE - a European epilepsy database, Computer Methods and Programs in Biomedicine 106 (2012) 127–138.
[134] G. Schalk, E. C. Leuthardt, Brain-computer interfaces using electrocorticographic signals, IEEE Reviews in Biomedical Engineering 4 (2011) 140–154.
[135] K. Sayed, M. Kamel, M. Alhaddad, H. M. Malibary, Y. M.
Kadah, [116] A. H. Shoeb, Application of machine learning to epileptic seizure on- Characterization of phase space trajectories for Brain-Computer In- set detection and treatment, {PhD} {Thesis}, Massachusetts Institute terface, Biomedical Signal Processing and Control 38 (2017) 55–66. of Technology, 2009. [136] K. Sayed, M. Kamel, M. Alhaddad, H. M. Malibary, Y. M. [117] A. Petrosian, D. Prokhorov, R. Homan, R. Dasheiff, D. Wunsch, Kadah, Extracting phase space morphological features for Recurrent neural network based prediction of epileptic seizures in electroencephalogram-based brain-computer interface, Journal of intra- and extracranial EEG, Neurocomputing 30 (2000) 201–218. Medical Imaging and Health Informatics 7 (2017) 771–774. [118] N. F. Güler, E. D. Übeyli, n. Güler, Recurrent neural networks [137] E. Donchin, K. M. Spencer, R. Wijesinghe, The mental prosthesis: employing Lyapunov exponents for EEG signals classification, Expert Assessing the speed of a P300-based brain-computer interface, IEEE Systems with Applications 29 (2005) 506–514. Transactions on Rehabilitation Engineering 8 (2000) 174–179. [119] R. G. Andrzejak, K. Lehnertz, F. Mormann, C. Rieke, P. David, C. E. [138] B. Obermaier, C. Neuper, C. Guger, G. Pfurtscheller, Information Elger, Indications of nonlinear deterministic and finite-dimensional transfer rate in a five-classes brain-computer interface, IEEE Trans- structures in time series of brain electrical activity: dependence on actions on Neural Systems and Rehabilitation Engineering 9 (2001) recording region and brain state, Physical Review. E: Statistical, 283–288. Nonlinear, and Soft Matter Physics 64 (2001) 061907. [139] B. Obermaier, C. Guger, C. Neuper, G. Pfurtscheller, Hidden Markov [120] S. P. Kumar, N. Sriraam, P. G. Benakop, Automated detection of models for online classification of single trial EEG data, Pattern epileptic seizures using wavelet entropy feature with recurrent neural Recognition Letters 22 (2001) 1299–1309. 
network classifier, in: Proceedings of the IEEE Region 10 Interna- [140] G. Pfurtscheller, C. Neuper, G. R. Muller, B. Obermaier, G. Krausz, tional Conference, 2008, pp. 1–5. A. Schlogl, R. Scherer, B. Graimann, C. Keinrath, D. Skliris, [121] G. R. Minasyan, J. B. Chatten, M. J. Chatten, R. N. Harner, Patient- M. Wortz, G. Supp, C. Schrank, Graz-BCI: State of the art and specific early seizure detection from scalp EEG, Journal of Clinical clinical applications, IEEE Transactions on Neural Systems and Neurophysiology 27 (2010) 163–178. Rehabilitation Engineering 11 (2003) 177–180. [122] M. A. Naderi, H. Mahdavi-Nasab, Analysis and classification of EEG [141] S. Solhjoo, A. M. Nasrabadi, M. R. H. Golpayegani, Classification signals using spectral analysis and recurrent neural networks, in: Pro- of chaotic signals using HMM classifiers: EEG-based mental task ceedings of the 17th Iranian Conference of Biomedical Engineering, classification, in: Proceedings of the 13th European Signal Processing 2010, pp. 1–4. Conference, 2005, pp. 1–4. [123] L. Vidyaratne, A. Glandon, M. Alam, K. M. Iftekharuddin, Deep [142] G. Pfurtscheller, A. Schlögl, Dataset III: Motor imagery, Technical recurrent neural network for seizure detection, in: Proceedings of Report, 2003. the IEEE International Joint Conference on Neural Networks, IEEE, [143] H. Suk, S. Lee, Two-layer hidden Markov models for multi-class 2016, pp. 1202–1207. motor imagery classification, in: Proceedings of the 1st Workshop on [124] S. S. Talathi, Deep Recurrent Neural Networks for seizure detection Brain Decoding: Pattern Recognition Challenges in Neuroimaging, and early seizure detection systems, arXiv preprint arXiv:1706.03283 2010, pp. 5–8. (2017). [144] C. Brunner, R. Leeb, G. Müller-Putz, A. Schlögl, G. Pfurtscheller, [125] M. Golmohammadi, S. Ziyabari, V. Shah, E. Von Weltin, C. Camp- Dataset IIa: Graz dataset A, Technical Report, 2008. bell, I. Obeid, J. Picone, Gated recurrent networks for seizure [145] W. 
Speier, C. Arnold, J. Lu, A. Deshpande, N. Pouratian, Integrating detection, in: Proceedings of the 2017 IEEE Signal Process- language information with a hidden Markov model to improve com- ing in Medicine and Biology Symposium (SPMB), 2017, pp. 1–5. munication rate in the P300 speller, IEEE Transactions on Neural doi:10.1109/SPMB.2017.8257020. Systems and Rehabilitation Engineering 22 (2014) 678–684. [126] I. Obeid, J. Picone, The Temple University Hospital EEG Data [146] A. Erfanian, B. Mahmoudi, Real-time ocular artifact suppression Corpus, Frontiers in Neuroscience 10 (2016). using recurrent neural network for electro-encephalogram based brain- [127] M. Golmohammadi, V. Shah, S. Lopez, S. Ziyabari, S. Yang, J. Ca- computer interface, Medical & Biological Engineering & Computing maratta, I. Obeid, J. Picone, The TUH EEG seizure corpus, in: 43 (2005) 296–305. Proceedings of the American Clinical Neurophysiology Society An- [147] E. M. Forney, C. W. Anderson, Classification of EEG during imagined nual Meeting, 2017, p. 1. mental tasks by forecasting with Elman recurrent neural networks, in: [128] S. Raghu, N. Sriraam, G. P. Kumar, Classification of epileptic seizures Proceedings of the IEEE International Joint Conference on Neural using wavelet packet log energy and norm entropies with recurrent Networks, IEEE, 2011, pp. 2749–2755. Elman neural network classifier, Cognitive Neurodynamics 11 (2017) [148] D. Balderas, A. Molina, P. Ponce, Alternative classification tech- 51–66. niques for brain-computer interfaces for smart sensor manufacturing [129] A. M. Abdelhameed, H. G. Daoud, M. Bayoumi, Deep Convolu- environments, IFAC-PapersOnLine 48 (2015) 680–685. tional Bidirectional LSTM Recurrent Neural Network for Epilep- [149] R. Leeb, F. Lee, C. Keinrath, R. Scherer, H. Bischof, G. Pfurtscheller, tic Seizure Detection, in: 2018 16th IEEE International New Brain-computer communication: Motivation, aim, and impact of Circuits and Systems Conference (NEWCAS), 2018, pp. 
139–143. exploring a virtual apartment, IEEE Transactions on Neural Systems doi:10.1109/NEWCAS.2018.8585542. and Rehabilitation Engineering 15 (2007) 473–482. [130] H. Daoud, M. Bayoumi, Deep Learning based Reliable Early Epilep- [150] R. Maddula, J. Stivers, M. Mousavi, S. Ravindran, V. de Sa, Deep tic Seizure Predictor, in: 2018 IEEE Biomedical Circuits and Sys- recurrent convolutional neural networks for classifying P300 BCI tems Conference (BioCAS), 2018, pp. 1–4. doi:10.1109/BIOCAS.2018. signals, in: Proceedings of the 7th Graz Brain-Computer Interface 8584678. Conference, 2017. [131] R. Hussein, H. Palangi, R. K. Ward, Z. J. Wang, Optimized deep [151] J. Stivers, V. de Sa, Spelling in parallel: Towards a rapid, spatially neural network architecture for robust detection of epileptic seizures independent BCI, in: Proceedings of the 7th Graz Brain-Computer using EEG signals, Clinical Neurophysiology 130 (2019) 25–37. Interface Conference, 2017. [132] G. Pfurtscheller, C. Neuper, Motor imagery and direct brain-computer [152] J. Thomas, T. Maszczyk, N. Sinha, T. Kluge, J. Dauwels, Deep communication, Proceedings of the IEEE 89 (2001) 1123–1134. learning-based classification for brain-computer interfaces, in: Pro- [133] E. C. Leuthardt, G. Schalk, J. R. Wolpaw, J. G. Ojemann, D. W. Moran, ceedings of the IEEE International Conference on Systems, Man, and A brain-computer interface using electrocorticographic signals in Cybernetics, 2017, pp. 234–239. humans, Journal of Neural Engineering 1 (2004) 63–71. [153] V. P. Oikonomou, G. Liaros, K. Georgiadis, E. Chatzilari, K. Adam, Khalifa et al., 2020 Page 19 of 21 S. Nikolopoulos, I. Kompatsiaris, Comparative evaluation of [173] Z. Li, M. Hayashibe, C. Fattal, D. Guiraud, Muscle fatigue tracking state-of-the-art algorithms for SSVEP-based BCIs, arXiv preprint with evoked EMG via recurrent neural network: Toward personal- arXiv:1602.00904 (2016). ized neuroprosthetics, IEEE Computational Intelligence Magazine 9 [154] C. 
Spampinato, S. Palazzo, I. Kavasidis, D. Giordano, N. Souly, (2014) 38–46. M. Shah, Deep learning human mind for automated visual classifica- [174] P. Xia, J. Hu, Y. Peng, EMG-based estimation of limb movement tion, in: Proceedings of the IEEE Conference on Computer Vision using deep learning with recurrent convolutional neural networks, and Pattern Recognition, 2017, pp. 6809–6817. Artificial Organs 42 (2018) E67–E77. [155] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. H. [175] F. Quivira, T. Koike-Akino, Y. Wang, D. Erdogmus, Translating Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, L. Fei- sEMG signals to continuous hand poses using recurrent neural net- Fei, ImageNet large scale visual recognition challenge, International works, in: Proceedings of the IEEE-EMBS International Conference Journal of Computer Vision 115 (2015) 211–252. on Biomedical and Health Informatics, 2018, pp. 166–169. [156] T. Hosman, M. Vilela, D. Milstein, J. N. Kelemen, D. M. Brandman, [176] A. Graves, Generating sequences with recurrent neural networks, L. R. Hochberg, J. D. Simeral, BCI decoder performance comparison CoRR abs/1308.0850 (2013). of an LSTM recurrent neural network and a Kalman filter in retro- [177] Y. Hu, Y. Wong, W. Wei, Y. Du, M. Kankanhalli, W. Geng, A spective simulation, in: Proceedings of the 2019 9th International novel attention-based hybrid CNN-RNN architecture for sEMG-based IEEE/EMBS Conference on Neural Engineering (NER), 2019, pp. gesture recognition, PloS One 13 (2018) e0206049. 1066–1071. doi:10.1109/NER.2019.8717140. [178] M. Atzori, A. Gijsberts, C. Castellini, B. Caputo, A. G. Hager, S. Elsig, [157] L. R. Hochberg, M. D. Serruya, G. M. Friehs, J. A. Mukand, M. Saleh, G. Giatsidis, F. Bassetto, H. Muller, Electromyography data for non- A. H. Caplan, A. Branner, D. Chen, R. D. Penn, J. P. 
Donoghue, invasive naturally-controlled robotic hand prostheses, Scientific data Neuronal ensemble control of prosthetic devices by a human with 1 (2014) 140053. tetraplegia, Nature 442 (2006) 164–171. [179] A. Samadani, Gated Recurrent Neural Networks for EMG-Based [158] D. Zhang, L. Yao, K. Chen, S. Wang, X. Chang, Y. Liu, Making Hand Gesture Classification. A Comparative Study, in: Proceedings Sense of Spatio-Temporal Preserving Representations for EEG-Based of the 2018 40th Annual International Conference of the IEEE En- Human Intention Recognition, IEEE Transactions on Cybernetics 50 gineering in Medicine and Biology Society (EMBC), 2018, pp. 1–4. (2020) 3033–3044. doi:10.1109/EMBC.2018.8512531. [159] G. Schalk, D. J. McFarland, T. Hinterberger, N. Birbaumer, J. R. [180] M. Atzori, A. Gijsberts, S. Heynen, A.-G. M. Hager, O. Deriaz, Wolpaw, BCI2000: a general-purpose brain-computer interface (BCI) P. van der Smagt, C. Castellini, B. Caputo, H. Müller, Building system, IEEE Transactions on Biomedical Engineering 51 (2004) the Ninapro database: A resource for the biorobotics community, 1034–1043. in: Proceedings of the 2012 4th IEEE RAS EMBS International [160] S. Tortora, S. Ghidoni, C. Chisari, S. Micera, F. Artoni, Deep Conference on Biomedical Robotics and Biomechatronics (BioRob), learning-based BCI for gait decoding from EEG with LSTM recurrent 2012, pp. 1258–1265. doi:10.1109/BioRob.2012.6290287. neural network, Journal of Neural Engineering 17 (2020) 046011. [181] M. Simão, P. Neto, O. Gibaru, EMG-based online classification of [161] M. J. Zwarts, D. F. Stegeman, Multichannel surface EMG: Basic gestures with recurrent neural networks, Pattern Recognition Letters aspects and clinical utility, Muscle & Nerve 28 (2003) 1–17. 128 (2019) 45–51. [162] J. Y. Hogrel, Clinical applications of surface electromyography in [182] M. Simão, P. Neto, O. Gibaru, UC2018 DualMyo Hand Gesture neuromuscular disorders, Neurophysiologie Clinique 35 (2005) 59– Dataset, 2018. 
71. [183] S. Pizzolato, L. Tagliapietra, M. Cognolato, M. Reggiani, H. Müller, [163] M. A. Oskoei, H. S. Hu, Myoelectric control systems-A survey, M. Atzori, Comparison of six electromyography acquisition setups on Biomedical Signal Processing and Control 2 (2007) 275–294. hand movement classification tasks, PloS One 12 (2017) e0186132. [164] K. Englehart, B. Hudgins, P. A. Parker, A wavelet-based continuous [184] S. E. Schmidt, C. Holst-Hansen, C. Graff, E. Toft, J. J. Struijk, Seg- classification scheme for multifunction myoelectric control, IEEE mentation of heart sound recordings by a duration-dependent hidden Transactions on Biomedical Engineering 48 (2001) 302–311. Markov model, Physiological Measurement 31 (2010) 513–529. [165] A. D. Chan, K. B. Englehart, Continuous myoelectric control for [185] A. D. Ricke, R. J. Povinelli, M. T. Johnson, Automatic segmentation powered prostheses using hidden Markov models, IEEE Transactions of heart sound signals using hidden markov models, in: Computers on Biomedical Engineering 52 (2005) 121–124. in Cardiology, volume 32, 2005, pp. 953–956. [166] K. Englehart, B. Hudgins, A. D. C. Chan, Continuous multifunc- [186] P. Sedighian, A. W. Subudhi, F. Scalzo, S. Asgari, Pediatric heart tion myoelectric control using pattern recognition, Technology and sound segmentation using Hidden Markov Model, in: Proceedings of disability 15 (2003) 95–103. the 36th Annual International Conference of the IEEE Engineering [167] X. Zhang, X. Chen, Y. Li, V. Lantz, K. Q. Wang, J. H. Yang, A in Medicine and Biology Society, 2014, pp. 5490–5493. framework for hand gesture recognition based on accelerometer and [187] C. S. Lima, D. Barbosa, Automatic segmentation of the second EMG sensors, IEEE Transactions on Systems Man and Cybernetics cardiac sound by using wavelets and hidden Markov models, in: Part a-Systems and Humans 41 (2011) 1064–1076. Proceedings of the 30th Annual International Conference of the IEEE [168] K. R. Wheeler, M. H. 
Chang, K. H. Knuth, Gesture-based control Engineering in Medicine and Biology Society, 2008, pp. 334–337. and EMG decomposition, IEEE Transactions on Systems Man and [188] E. Sejdić, G. A. Malandraki, J. L. Coyle, Computational deglutition: Cybernetics Part C-Applications and Reviews 36 (2006) 503–514. Using signal- and image-processing methods to understand swallow- [169] J. Monsifrot, E. Le Carpentier, Y. Aoustin, D. Farina, Sequential ing and associated disorders, IEEE Signal Processing Magazine 36 decoding of intramuscular EMG signals via estimation of a Markov (2019) 138–146. model, IEEE Transactions on Neural Systems and Rehabilitation [189] P. B. Shull, W. Jirattigalachote, M. A. Hunt, M. R. Cutkosky, S. L. Engineering 22 (2014) 1030–1040. Delp, Quantified self and human movement: a review on the clin- [170] K. S. Lee, EMG-based speech recognition using hidden markov mod- ical impact of wearable sensing and feedback for gait analysis and els with global control variables, IEEE Transactions on Biomedical intervention, Gait and Posture 40 (2014) 11–9. Engineering 55 (2008) 930–940. [190] C. Donohue, Y. Khalifa, S. Perera, E. Sejdić, J. L. Coyle, How [171] A. D. Chan, K. Englehart, B. Hudgins, D. F. Lovely, Hidden Markov Closely do Machine Ratings of Duration of UES Opening During model classification of myoelectric signals in speech, IEEE Engi- Videofluoroscopy Approximate Clinician Ratings Using Temporal neering in Medicine and Biology Magazine 21 (2002) 143–146. Kinematic Analyses and the MBSImP?, Dysphagia (2020). [172] A. D. Chan, K. Englehart, B. Hudgins, D. F. Lovely, Myo-electric [191] S. Mao, A. Sabry, Y. Khalifa, J. L. Coyle, E. Sejdić, Estimation signals to augment speech recognition, Medical & Biological Engi- of laryngeal closure duration during swallowing without invasive neering & Computing 39 (2001) 500–504. X-rays, Future Generation Computer Systems (2020). Khalifa et al., 2020 Page 20 of 21 [192] S. Mao, Z. Zhang, Y. Khalifa, C. Donohue, J. L. 
Coyle, E. Sejdić, [208] H. Cecotti, A. Graser, Convolutional neural networks for P300 detec- Neck sensor-supported hyoid bone movement tracking during swal- tion with application to brain-computer interfaces, IEEE Transactions lowing, Royal Society Open Science 6 (2019) 181982. on Pattern Analysis and Machine Intelligence 33 (2011) 433–445. [193] C. Nickel, C. Busch, S. Rangarajan, M. Möbius, Using hidden Markov [209] S. Kiranyaz, T. Ince, M. Gabbouj, Real-time patient-specific ECG models for accelerometer-based biometric gait recognition, in: Pro- classification by 1-D convolutional neural networks, IEEE Transac- ceedings of the IEEE 7th International Colloquium on Signal Pro- tions on Biomedical Engineering 63 (2016) 664–675. cessing and its Applications, 2011, pp. 58–63. [210] S. P. Shashikumar, A. J. Shah, G. D. Clifford, S. Nemati, Detection [194] A. Mannini, A. M. Sabatini, A hidden Markov model-based tech- of paroxysmal atrial fibrillation using attention-based bidirectional re- nique for gait segmentation using a foot-mounted gyroscope, in: current neural networks, in: Proceedings of the 24th ACM SIGKDD Proceedings of the 33rd Annual International Conference of the IEEE International Conference on Knowledge Discovery & Data Mining, Engineering in Medicine and Biology Society, 2011, pp. 4369–4373. KDD ’18, ACM, London, United Kingdom, 2018, pp. 715–723. [195] C. Nickel, C. Busch, Classifying Accelerometer Data via Hidden doi:10.1145/3219819.3219912. Markov Models to Authenticate People by the Way They Walk, IEEE [211] J. H. Tan, Y. Hagiwara, W. Pang, I. Lim, S. L. Oh, M. Adam, R. S. Aerospace and Electronic Systems Magazine 28 (2013) 29–35. Tan, M. Chen, U. R. Acharya, Application of stacked convolutional [196] G. Panahandeh, N. Mohammadiha, A. Leijon, P. 
Handel, Continuous and long short-term memory network for accurate identification of hidden Markov model for pedestrian activity classification and gait CAD ECG signals, Computers in Biology and Medicine 94 (2018) analysis, IEEE Transactions on Instrumentation and Measurement 19–26. 62 (2013) 1073–1083. [212] Z. Xiong, M. P. Nash, E. Cheng, V. V. Fedorov, M. K. Stiles, J. Zhao, [197] M. Inoue, S. Inoue, T. Nishida, Deep recurrent neural network for ECG signal classification for the detection of cardiac arrhythmias mobile human activity recognition with high throughput, Artificial using a convolutional recurrent neural network, Physiological Mea- Life and Robotics 23 (2018) 173–185. surement 39 (2018) 094006. [198] A. Lisowska, G. Wheeler, V. Ceballos Inza, I. Poole, An evaluation [213] Y. M. Saidutta, J. Zou, F. Fekri, Increasing the learning Capacity of of supervised, novelty-based and hybrid approaches to fall detec- BCI Systems via CNN-HMM models, in: Proceedings of the 2018 tion using silmee accelerometer data, in: Proceedings of the IEEE 40th Annual International Conference of the IEEE Engineering in International Conference on Computer Vision, 2015, pp. 402–408. Medicine and Biology Society (EMBC), IEEE, 2018, pp. 1–4. [199] T. Theodoridis, V. Solachidis, N. Vretos, P. Daras, Human fall de- [214] Z.-R. Wang, J. Du, W.-C. Wang, J.-F. Zhai, J.-S. Hu, A compre- tection from acceleration measurements using a recurrent neural hensive study of hybrid neural network hidden Markov model for network, in: N. Maglaveras, I. Chouvarda, P. de Carvalho (Eds.), Pre- offline handwritten Chinese text recognition, International Journal on cision Medicine Powered by pHealth and Connected Health, Springer Document Analysis and Recognition (IJDAR) 21 (2018) 241–251. Singapore, 2017, pp. 145–149. [215] Z.-R. Wang, J. Du, J.-M. Wang, Writer-aware CNN for parsimonious [200] Y. Lecun, L. Bottou, Y. Bengio, P. 
Haffner, Gradient-based learning HMM-based offline handwritten Chinese text recognition, Pattern applied to document recognition, Proceedings of the IEEE 86 (1998) Recognition 100 (2020) 107102. 2278–2324. [216] N. C. Dvornek, D. Yang, P. Ventola, J. S. Duncan, Learning gen- [201] T. Glasmachers, Limits of End-to-End Learning, arXiv preprint eralizable recurrent neural networks from small task-fMRI datasets, arXiv:1704.08305 (2017). in: A. F. Frangi, J. A. Schnabel, C. Davatzikos, C. Alberola-López, [202] G. Cheron, J.-P. Draye, M. Bourgeios, G. Libert, A dynamic neu- G. Fichtinger (Eds.), Proceedings of the 21st Conference on Medi- ral network identification of electromyography and arm trajectory cal Image Computing and Computer Assisted Intervention, Springer relationship during complex movements, IEEE Transactions on International Publishing, 2018, pp. 329–337. Biomedical Engineering 43 (1996) 552–558. [217] C. Yu, Y. Khalifa, E. Sejdić, Silent aspiration detection in high reso- [203] G. Cheron, F. Leurs, A. Bengoetxea, J. P. Draye, M. Destrée, B. Dan, lution cervical auscultations, in: Proceedings of the IEEE-EMBS In- A dynamic recurrent neural network for multiple muscles electromyo- ternational Conference on Biomedical and Health Informatics, 2019, graphic mapping to elevation angles of the lower limb in human pp. 1–4. locomotion, Journal of Neuroscience Methods 129 (2003) 95–104. [218] S. J. Pan, Q. A. Yang, A Survey on transfer learning, IEEE Transac- [204] S. Chauhan, L. Vig, Anomaly detection in ECG time signals via tions on Knowledge and Data Engineering 22 (2010) 1345–1359. deep long short-term memory networks, in: Proceedings of the IEEE [219] A. Isin, S. Ozdalili, Cardiac arrhythmia detection using deep learning, International Conference on Data Science and Advanced Analytics, Procedia Computer Science 120 (2017) 268 – 275. 2015, pp. 1–7. doi:10.1109/DSAA.2015.7344872. [220] C. Wei, Y. Lin, Y. Wang, T. Jung, N. Bigdely-Shamlo, C. Lin, Se- [205] V. G. 
Sujadevi, K. P. Soman, R. Vinayakumar, Real-time detection lective transfer learning for EEG-based drowsiness detection, in: of atrial fibrillation from short time single lead ECG traces using Proceedings of the IEEE International Conference on Systems, Man, recurrent neural networks, in: S. M. Thampi, S. Mitra, J. Mukhopad- and Cybernetics, 2015, pp. 3229–3232. hyay, K.-C. Li, A. P. James, S. Berretti (Eds.), Intelligent Systems [221] Y.-Q. Zhang, W.-L. Zheng, B.-L. Lu, Transfer components be- Technologies and Applications, Advances in Intelligent Systems and tween subjects for EEG-based driving fatigue detection, in: S. Arik, Computing, Springer International Publishing, Cham, 2017, pp. 212– T. Huang, W. K. Lai, Q. Liu (Eds.), Proceedings of the 29th Confer- 221. doi:10.1007/978-3-319-68385-0_18. ence on Neural Information Processing Systems, Springer Interna- [206] Y. LeCun, K. Kavukcuoglu, C. Farabet, Convolutional networks and tional Publishing, 2015, pp. 61–68. applications in vision, in: Proceedings of the 2010 IEEE International [222] U. Côté-Allard, C. L. Fall, A. Drouin, A. Campeau-Lecours, C. Gos- Symposium on Circuits and Systems, 2010, pp. 253–256. doi:10.1109/ selin, K. Glette, F. Laviolette, B. Gosselin, Deep learning for elec- ISCAS.2010.5537907. tromyographic hand gesture signal classification using transfer learn- [207] Y. LeCun, B. E. Boser, J. S. Denker, D. Henderson, R. E. Howard, ing, IEEE Transactions on Neural Systems and Rehabilitation Engi- W. E. Hubbard, L. D. Jackel, Handwritten Digit Recognition with a neering 27 (2019) 760–771. Back-Propagation Network, in: D. S. Touretzky (Ed.), Proceedings of the 3rd Conference on Neural Information Processing Systems, Morgan-Kaufmann, 1990, pp. 396–404. Khalifa et al., 2020 Page 21 of 21

Journal: Quantitative Biology, arXiv (Cornell University)

Published: Dec 11, 2020
