4-Class MI-EEG Signal Generation and Recognition with CVAE-GAN

Jun Yang 1, Huijuan Yu 1, Tao Shen 1,*, Yaolian Song 1 and Zhuangfei Chen 2

1 School of Information Science and Automation, Kunming University of Science and Technology, Kunming 650504, China; yang-jun@kust.edu.cn (J.Y.); yuhuijuan@stu.kust.edu.cn (H.Y.); songlyaolian@kust.edu.cn (Y.S.)
2 Medical Faculty, Kunming University of Science and Technology, Kunming 650504, China; micro_ant@hotmail.com
* Correspondence: shentao@kust.edu.cn

Abstract: Since the electroencephalogram (EEG) is known to measure the real-time electrodynamics of the human brain, signal processing techniques, particularly deep learning, can both provide novel solutions for learning and optimize robust representations from EEG signals. Considering the limited data collection and subjects' inadequate concentration during testing, it becomes essential to obtain sufficient training data and useful features for a potential end-user of a brain–computer interface (BCI) system. In this paper, we combined a conditional variational auto-encoder network (CVAE) with a generative adversarial network (GAN) for learning latent representations from EEG brain signals. By updating the fine-tuned parameters fed into the resulting generative model, we could synthesize EEG signals of a specific category. We employed an encoder network to obtain the distributed samples of the EEG signal and applied an adversarial learning mechanism to continuously optimize the parameters of the generator, discriminator, and classifier. The CVAE was adopted to make the synthetic samples approximate the real sample classes more closely. Finally, we demonstrated that our approach takes advantage of both statistic and feature matching to make the training process converge faster and more stably, and addresses the problem of small-scale datasets in deep learning applications for motor imagery tasks through data augmentation. The augmented training datasets produced by our proposed CVAE-GAN method significantly enhance the performance of MI-EEG recognition.

Keywords: brain computer interface; conditional variational auto-encoders; generative adversarial network

Citation: Yang, J.; Yu, H.; Shen, T.; Song, Y.; Chen, Z. 4-Class MI-EEG Signal Generation and Recognition with CVAE-GAN. Appl. Sci. 2021, 11, 1798. https://doi.org/10.3390/app11041798

Received: 26 January 2021; Accepted: 9 February 2021; Published: 18 February 2021

1. Introduction

Electroencephalography (EEG) records the electric potential variations of pyramidal neurons in the cortical layers; it is therefore recognized as a reflection of brain activity and can be used to study mind processes [1–3]. Although EEG has been shown to be a critical tool in many domains, it still suffers from a few limitations that hinder its effective analysis or processing. Because the brain activity is buried under multiple environmental sources, EEG has a quite low signal-to-noise ratio (SNR) [4,5]. Consequently, various filtering and noise reduction techniques, including deep learning (DL) [6] methods, have been used to minimize the impact of these noise sources and extract true brain activity from the recorded signals.
Meanwhile, the DL framework has shown outstanding performance in the field of complex data processing, such as text, audio signals, and images [7], and plays an ever-increasing role in industrial applications. Given sufficient training data, DL can build computational models and learn hierarchical representations of input data through successive non-linear transformations [8,9], which makes the size of the available training data a restriction on the performance of the identifying model in a brain computer interface (BCI) [10,11].

The EEG-based BCI decoding process involves pre-processing, feature extraction, and pattern recognition (classification or regression) [12]. The goal of pre-processing and feature extraction is to extract the target band and channel information from raw EEG and represent it in a compact and relevant manner that is conducive to classification. Regression and classification map the extracted feature vector to a probability value or n-category classification results. Figure 1 shows the general paradigm of a BCI system, which receives brain signals and maps them into control commands for robotic equipment. The system includes several key components, including a decoding part and a control part. The brain signals are collected from humans and sent to the preprocessing component for denoising and enhancement. Then, discriminating features are extracted from the processed signals and sent to the classifier, which recognizes the signals and converts them into external device commands.

[Figure 1 block diagram: EEG collection → pre-processing (filtering, artefact removal) → pattern recognition (feature extraction, classification) → robotic control, with feedback results returned to the user.]
Figure 1. The technical pipeline involved in the BCI system.

Generative adversarial networks (GAN) allow the synthesis of data from latent representations of samples [13], but rely heavily on optimizing generative models and suffer from training instability [14].

In this work, we employed a GAN combined with a conditional variational auto-encoder (CVAE) [15] to synthesize EEG signals. By varying the motor imagery category label fed into the resulting generative model, we could generate EEG of a specific category with random noise on a latent attribute vector. We adopted an encoder network to learn the temporal and spectral representations of real EEG samples while simultaneously training a GAN to enforce the learning of the representations. Accordingly, our proposed approach makes two main contributions. First, this distribution learning of latent representations helps generate close imitations of real EEG based on its task features, which can be used for data augmentation. Second, benefiting from the adversarial learning network, the proposed model also performs well in motor imagery (MI) task classification through the temporal-spectrum image transformed from the raw EEG. We experimented with public and private MI-EEG data, demonstrating the model's robust capability of generating realistic and diverse samples with motor imagery labels.
Finally, we compared the models across methodological choices using several evaluation metrics, thus revealing the optimal results obtained in this study.

Our approach has two novel aspects. First, we adopted the CVAE as the generative model for feature sub-space learning and synthetic EEG reconstruction, by virtue of the class information of the MI-EEG samples. Second, we constructed generative adversarial patterns according to the feature sub-space, taking advantage of both statistic and pairwise features to make the training process converge faster and more robustly, finally paving the way to improve the efficiency of EEG-based BCI systems.

2. Related Work

Conventional research on generative models, including principal component analysis (PCA) [16], the Gaussian mixture model (GMM) [17], and independent component analysis (ICA) [18], assumes a simple formation of the data set. Later, restricted Boltzmann machines (RBMs) [19] and Markov random fields (MRF) [20] were hindered by their lack of effective latent representations.

Different from these generative models, the GAN architecture consists of two opposing networks that try to outperform each other through a min-max game process. A GAN is capable of modeling complicated and high-dimensional distributions and thus trains the learning model to generate data more similar to real samples. Nevertheless, the GAN model faces a convergence problem in the training stage, leading to wide discrepancies between generated samples and natural ones [21]. Meanwhile, the GAN cannot accurately represent the intrinsic characteristics of normal samples because the latent manifold provides little useful information [22]. To improve the quality of GAN-based detection, mean and covariance feature matching GAN [23] was employed to provide mean and covariance feature matching, thus restricting the range of the parameters to reduce discrimination. Loss-sensitive GAN [24] tries to learn a loss function quantifying the quality of generated samples and uses this loss to generate high-quality data.

In this paper, we combine GAN and VAE, learning latent spatial and temporal representations of real EEG. Due to imperfect measures such as the squared error and the injected noise, the generated signal samples are often distorted, which is viewed as a disadvantage of the VAE [25]; this can be compensated nicely through repeated adjustment by the GAN model. Our model aims to decode MI-EEG by simultaneously training a conditional VAE and a GAN network, enforcing the learning of EEG data representations. Besides, we utilized statistic feature fusion to make the training converge more stably and smoothly.
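For reference, the min-max game described above is the standard GAN value function introduced by Goodfellow et al. [9]; the compact statement below is our paraphrase, not an equation reproduced from this paper:

```latex
\min_{G}\max_{D} V(D,G)
  = \mathbb{E}_{x\sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z\sim p_{z}(z)}\big[\log\big(1-D(G(z))\big)\big]
```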
3. Methods

In this work, we first employed a data preprocessing step based on the short-time Fourier transform (STFT) to convert MI-EEG signals (C3, C4, and Cz electrodes) into a set of time-spectrum images. Then, a CVAE-GAN architecture was proposed for EEG signal generation and classification from a latent learned representation, obtained through an adversarial learning process. Considering that CVAE-generated images are decent but fuzzy, while those from a CGAN are sharp but vary significantly, the CVAE-GAN was used as an appropriate compromise. For example, when generating motor imagery EEG, the CVAE-GAN can calculate the class probability of the EEG data with its additional classifier part. Therefore, it can make use of the latent class information in the training data to generate samples regularized by Gaussian distributions with learnable statistics.

The proposed CVAE-GAN architecture for EEG mainly consists of three parts: (1) the encoder network, mapping the EEG sample x to a latent representation z through a convolutional neural network (CNN); (2) the generator network, generating fake EEG signals from a latent vector; and (3) the recognition network, trained to distinguish between real and fake EEG data and to measure the class probability of the real and generated input. All parts are then guided to adjust themselves.

3.1. Datasets and Conversion Based on STFT

We evaluated our approach on a private EEG dataset collected by ourselves and additionally on a public dataset (BCI Competition IV Data sets 2a). The data were separated into a 70% training set and a 30% test set repeatedly for cross-validation. The dataset details are shown in Table 1.

Table 1. Properties of raw materials.

Datasets                    Private (D1)                        Public (D2)
Subjects                    3                                   9
Sample rate (Hz)            250                                 250
Imagery tasks               left hand, right hand, tongue,      left hand, right hand, tongue,
                            and both feet                       and both feet
MI period for processing    2 s                                 2 s

Throughout the experiments, subjects were seated in front of a computer screen and instructed to perform motor imagery tasks, including imagining the movement of the right or left hand. Each trial took 8 s, with an inter-trial resting period of the same length. After executing an MI task, the energy in the mu band (8–13 Hz) observed over the motor cortex of the brain decreases, which is called event-related de-synchronization (ERD), while the energy increase in the beta band (17–30 Hz) is called event-related synchronization (ERS). MI tasks were found to cause ERD and ERS, respectively, on the right and left sides of the motor cortex, affecting the EEG signals at the C4 and C3 electrodes. Cz is mainly affected by the tongue and feet MI tasks. Consequently, the datasets used in this work included recordings from three electrodes (C3, Cz, and C4) of the left/right hand MI tasks, and the short-time Fourier transform (STFT) was applied to convert each 2 s sequential EEG trial into a temporal-spectrum power image. A set of time-frequency power images was generated by a short time window sliding along the time axis of the sequential EEG signal. The STFT can be defined as:

$\mathrm{STFT}(\tau,\omega)=\int_{-\infty}^{\infty} s(t)\,h(t-\tau)\,e^{-j\omega t}\,dt$, (1)

where h(t) is a window with a limited number of nonzero points and τ is the window position on the temporal axis. In the case of the 250 Hz signal, a 2 s trial corresponds to 500 samples. The window size of the STFT was set to 64, and the number of points of overlap between segments was set to 50. The STFT was computed over 32 windows for all samples, leading to 257 × 32 spectrum images, where 257 is the number of sample frequencies and 32 the number of time segments. Ultimately, the mu and beta frequency bands of the spectrum images were extracted. The final input size of the normalized image for the mu and beta bands was empirically chosen as 32 × 32.
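For illustration, the conversion of Equation (1) with the stated settings (250 Hz, 2 s trials, window 64, overlap 50, 257 × 32 spectrum, 32 × 32 per-band images) can be sketched as below. This is a minimal sketch, not the authors' code; the exact band-cropping and resizing policy is an assumption, since the paper does not spell out how the mu and beta rows are combined into the final 32 × 32 image.

```python
# Minimal sketch of the STFT preprocessing described above (Equation (1)).
# Assumptions: a Hann window, an FFT length of 512 to reproduce the 257
# frequency bins, and bilinear resizing of the cropped band rows to 32 x 32.
import numpy as np
from scipy.signal import stft
from scipy.ndimage import zoom

FS = 250  # sampling rate (Hz)

def trial_to_image(trial, band=(8.0, 30.0)):
    """trial: (3, 500) array holding C3, Cz, C4 for one 2 s MI period.
    Returns a (32, 32, 3) normalized time-frequency power image."""
    channels = []
    for x in trial:  # one electrode at a time
        f, t, Z = stft(x, fs=FS, window="hann", nperseg=64, noverlap=50,
                       nfft=512, boundary=None, padded=False)
        power = np.abs(Z) ** 2            # 257 frequencies x 32 windows
        rows = power[(f >= band[0]) & (f <= band[1])]   # crop mu/beta rows
        img = zoom(rows, (32 / rows.shape[0], 32 / rows.shape[1]), order=1)
        img = (img - img.min()) / (img.max() - img.min() + 1e-8)  # normalize
        channels.append(img)
    return np.stack(channels, axis=-1)    # (32, 32, 3), one plane per electrode

image = trial_to_image(np.random.randn(3, 2 * FS))
assert image.shape == (32, 32, 3)
```

With these settings the segment count works out to (500 − 64)/14 + 1 = 32 windows and 512/2 + 1 = 257 one-sided frequency bins, matching the 257 × 32 size reported above.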
3.2. CVAE-GAN Construction

The CVAE models a generator as a pair of encoder and decoder networks. Compared to the VAE, the CVAE is able to generate data conditioned on certain attributes. The encoder learns a latent representation z from the data x conditioned on the label y of a specific category, while the decoder aims to predict the specific category according to the learned representation z. The generated data can be distorted owing to deficiencies between input and output. Fortunately, the generative performance improves when the model is combined with the discriminator. On the other side, the generator of a plain GAN is isolated from the real EEG, so the introduction of the encoder makes the GAN more stable. The whole CVAE-GAN architecture is shown in Figure 2.

[Figure 2 schematic: real EEG (C3, Cz, C4, 0–2 s) is 0–100 Hz cropped and STFT-transformed into 6–13 Hz and 17–30 Hz images; labels and the latent representation feed the encoder and generator; the generated EEG (via inverse STFT) and the real EEG feed the recognizer, whose discriminator softmax outputs real-or-fake and whose classifier softmax predicts the class.]
Figure 2. The proposed CVAE-GAN architecture for synthetic EEG detection.

Considering the goal of controlling the generating process with a specified category, label data were given as extra input to the encoder and decoder. We feed the training samples and their corresponding labels to the encoder, then concatenate the hidden representation with the corresponding label and feed it to the decoder to train the network. Ultimately, after the training process, we can generate data with a specified label by feeding the decoder with noise sampled from the Gaussian distribution together with the assigned label. Therefore, the CVAE loss can be formulated as:

$L_{G}=-\sum_{i}\big\{\ell(x_{i},\hat{x}_{i})+0.5\big[\sigma^{2}(x_{i}\,|\,y_{i})+\mu^{2}(x_{i}\,|\,y_{i})-1-\log\sigma^{2}(x_{i}\,|\,y_{i})\big]\big\}$. (2)

We employed a CNN as the implementation architecture of the encoder and decoder. The mapping details are illustrated in Table 2.

Table 2. Implementation details for the proposed CVAE-GAN architecture.

Part                 Layer     Input         Output        Processing
Pre-processing       -         T × E         32 × 32 × 3   STFT
Encoder              E1        32 × 32 × 3   15 × 32 × 1   Convolution kernel 20@3 × 1 × 3; Batch Norm, MaxPool2D (20 × 2) & Dropout (0.5)
                     E2        15 × 32 × 1   15 × 1 × 40   Convolution kernel 40@1 × 3 × 1; Batch Norm, MaxPool2D (1 × 15) & Dropout (0.5)
                     E3        15 × 1 × 40   600           Flatten
                     E4        600           µ, σ          Mapping into two feature compression layers
Latent repr.         R         µ, σ          z             Map representation z
Generator            G1        z             600           Fully-connected layer
                     G2        600           15 × 1 × 40   Reshape
                     G3        15 × 1 × 40   15 × 32 × 1   De-convolution and bilinear interpolation
                     G4        15 × 32 × 1   32 × 32 × 3   De-convolution and bilinear interpolation
Recognizer
  Feature extractor  F1        32 × 32 × 3   15 × 32 × 1   Convolution kernel 20@3 × 1 × 3; Batch Norm, MaxPool2D (20 × 2) & Dropout (0.5)
                     F2        15 × 32 × 1   15 × 1 × 40   Convolution kernel 40@1 × 3 × 1; Batch Norm, MaxPool2D (1 × 15) & Dropout (0.5)
  Discriminator      D3        15 × 1 × 40   60            Flatten and fully-connected layer
                     softmax   60            2             Recognition of real or fake signal
  Classifier         C1        15 × 1 × 40   100           Flatten and fully-connected layer
                     softmax   100           4             Classification with softmax

The input images are convolved with the kernels to be trained and then passed through to generate the output map in the convolution layer [26]. In a given layer, the k-th feature map can be obtained as:

$H_{ij}^{k}=f(a_{ij}^{k})=f\big((W^{k}\ast x)_{ij}+b_{k}\big)$, (3)

$f(a)=\mathrm{ReLU}(a)=\ln(1+e^{a})$, (4)

where 20 and 40 kernels were employed in the first and second layer, respectively. The max-pooling layer then connects layers of different dimensions: the feature map produced by the convolutional kernel is down-sized into a smaller sample in the pooling layer. With the proposed approach, the network is fed with the labeled training set, and the error E is computed taking into account how the desired output differs from the output of the network.
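As a concrete illustration of the encoder stack in Table 2, consider the minimal Keras sketch below. The kernel and pooling shapes printed in the table are ambiguous after typesetting, so the shapes here are our assumptions and only approximate the reported sizes (32 × 32 × 3 input, a 600-unit bottleneck, and two heads for µ and σ); the latent dimension is also an assumption, as the paper does not state the size of z.

```python
# Minimal Keras sketch of the encoder (E1-E4 in Table 2); shapes simplified.
import tensorflow as tf
from tensorflow.keras import layers

latent_dim = 16  # assumed; the size of z is not stated in the paper

inputs = tf.keras.Input(shape=(32, 32, 3))  # mu/beta time-frequency image
x = layers.Conv2D(20, (3, 3), padding="same", activation="relu")(inputs)  # E1
x = layers.BatchNormalization()(x)
x = layers.MaxPool2D((2, 2))(x)
x = layers.Dropout(0.5)(x)
x = layers.Conv2D(40, (3, 3), padding="same", activation="relu")(x)       # E2
x = layers.BatchNormalization()(x)
x = layers.MaxPool2D((2, 2))(x)
x = layers.Dropout(0.5)(x)
x = layers.Flatten()(x)
x = layers.Dense(600, activation="relu")(x)   # E3: 600-unit bottleneck
mu = layers.Dense(latent_dim)(x)              # E4: two compression heads
log_var = layers.Dense(latent_dim)(x)
encoder = tf.keras.Model(inputs, [mu, log_var], name="encoder")
encoder.summary()
```

The recognizer's feature extractor (F1–F2) reuses the same convolutional stack, with the discriminator and classifier heads attached after the flattened features.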
Subsequently, the stochastic gradient descent algorithm [27] was applied to minimize the error occurring with changes in the network and thus to optimize the network weights and the filters in the convolutional layers. The output of the encoder formed the vector z through a two-layer CNN, which extracts the temporal, spectral, and spatial features from the 3D power image representation of MI-EEG. The encoding vector h was mapped into two feature compression layers in order to calculate the mean µ and variance σ of the neural networks. The latent representation z is calculated as:

$z=\mu+\sigma\odot\varepsilon$, (5)

where ε is randomly sampled from the standard normal distribution N(0, 1) and ⊙ denotes an element-wise product. The covariance matrix of z is diagonal because we expected the components of the latent vector z to be mutually independent so as to make them informative. After every convolution operation, we adopted batch normalization to improve the model training efficiency and nonlinearity. A 50% dropout rate was used to prevent overfitting. The generator mainly consists of deconvolution and bilinear interpolation up-sampling processes that aim to restore the signal time-spectrum image from the learned features. The recognition network was used to efficiently identify real and fake signals under the specific category condition: composed of the discriminator and the classifier, it was employed to discover abnormal power image signals and to classify the MI task results.

3.3. Optimization of Training Process

The encoder E maps the input data x to a latent representation z through a learned distribution $P(z\,|\,x,c)$, where c represents the category of the data. The generative network G generates EEG data from a learned distribution $P(x\,|\,z,c)$. The roles of the generator and discriminator are similar to those in a generative adversarial network (GAN): the generator tries to model the real data distribution according to the gradients given by the discriminator, which learns to distinguish between real and fake samples. In the encoder of the CVAE framework, $L_{KL}$ is employed in the encoding network to indicate whether the distribution of the latent variable is similar to the expected distribution, as shown:

$L_{KL}=\{\mu^{T}\mu+\mathrm{sum}[\exp(\sigma)-\sigma-k]\}$, (6)

where the overall loss function consists of the reconstruction loss of the decoder and the variational posterior loss of the encoder; this framework enforces the representation z with respect to c. However, this alone is known to be imperfect in generation practice, so we applied a GAN to tackle the problem. The GAN consists of a generator G (the CVAE) and a discriminator D, where the generator loss and the discriminator loss can be written respectively as:

$L_{G}\leftarrow \tfrac{1}{2}\,\|x_{r}-x_{f}\|_{2}^{2}$, (7)

$L_{D}\leftarrow -\big[\log(d_{r})+\log(1-d_{g})\big]$. (8)

The effect taken from the discriminator can be denoted as:

$L_{GD}\leftarrow \tfrac{1}{2}\,\|f_{D}(x_{r})-f_{D}(x_{f})\|_{2}^{2}$. (9)

In the classifier part, the classification loss can be denoted as:

$L_{C}\leftarrow \tfrac{1}{2}\big(\|y_{r}-c\|_{2}^{2}+\|y_{g}-c\|_{2}^{2}\big)$. (10)

The goal of the CVAE-GAN is to minimize the total loss function:

$L=L_{KL}+L_{G}+L_{GD}+L_{D}+L_{C}$. (11)
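For illustration, the loss terms (6)–(11) can be assembled as in the following TensorFlow sketch. The variable names (encoder outputs mu/log_var, discriminator probabilities d_r/d_g, intermediate discriminator features f_r/f_g, classifier outputs y_r/y_g, one-hot labels c) are ours, not the authors' code, and the exact reductions are assumptions.

```python
# A sketch of the CVAE-GAN loss terms (6)-(11); names and reductions assumed.
import tensorflow as tf

def cvae_gan_losses(mu, log_var, x_r, x_f, d_r, d_g, f_r, f_g, y_r, y_g, c):
    # (6) KL divergence between q(z|x, c) and the standard normal prior
    l_kl = 0.5 * tf.reduce_sum(tf.square(mu) + tf.exp(log_var) - log_var - 1.0)
    # (7) pairwise reconstruction loss between real and generated images
    l_g = 0.5 * tf.reduce_sum(tf.square(x_r - x_f))
    # (8) discriminator loss: real should score 1, generated should score 0
    l_d = -tf.reduce_mean(tf.math.log(d_r + 1e-8) + tf.math.log(1.0 - d_g + 1e-8))
    # (9) feature matching on the discriminator's intermediate features
    l_gd = 0.5 * tf.reduce_sum(tf.square(f_r - f_g))
    # (10) classification loss on real and generated samples
    l_c = 0.5 * (tf.reduce_sum(tf.square(y_r - c)) +
                 tf.reduce_sum(tf.square(y_g - c)))
    # (11) total objective
    return l_kl + l_g + l_gd + l_d + l_c
```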
The whole training pipeline of the proposed algorithm is shown in Algorithm 1.

Algorithm 1. Training Process of the Proposed CVAE-GAN
Initialization: m is the batch size, n is the number of categories, and epoch is the number of iterations. Initialize the networks (E, G, D) with random weights.
1: Sample a batch of m samples from the real EEG data: $\{x_{r}\}\sim p_{r}$.
2: Map the feature parameters using the encoder network: $\mu,\sigma\leftarrow E(x_{r},c_{r})$.
3: Resample the feature parameters to obtain the latent representation z.
4: Generate conditional samples through the generator network: $x_{g}\leftarrow G(z,c_{r})$.
5: Feed the real and generated samples into the decoder network for authenticity identification, $d_{r}\leftarrow D(x_{r})$, $d_{g}\leftarrow D(x_{g})$, and MI-task classification, $y_{r}\leftarrow C(x_{r})$, $y_{g}\leftarrow C(x_{g})$.
6: Optimize the CVAE-GAN by the loss L:
   $L_{KL}\leftarrow KL(q(z\,|\,x_{r},c)\,\|\,P_{z})$
   $L_{G}\leftarrow \tfrac{1}{2}\,\|x_{r}-x_{f}\|_{2}^{2}$;  $L_{GD}\leftarrow \tfrac{1}{2}\,\|f_{D}(x_{r})-f_{D}(x_{f})\|_{2}^{2}$;  $L_{D}\leftarrow -[\log(d_{r})+\log(1-d_{g})]$;  $L_{C}\leftarrow \tfrac{1}{2}(\|y_{r}-c\|_{2}^{2}+\|y_{g}-c\|_{2}^{2})$
   $L\leftarrow L_{KL}+L_{G}+L_{GD}+L_{D}+L_{C}$
7: Update the network parameters:
   $\omega_{E}\leftarrow -\nabla_{\omega_{E}}(L_{KL})$;  $\omega_{G}\leftarrow -\nabla_{\omega_{G}}(L_{G}+L_{GD})$;  $\omega_{D}\leftarrow -\nabla_{\omega_{D}}(L_{D})$;  $\omega_{C}\leftarrow -\nabla_{\omega_{C}}(L_{C})$
Output: end when L has converged and save all network parameters.

4. Experimental Results

4.1. Feature Extraction and Convergence

For a better elaboration of spatial feature extraction, the one-task average feature power of the real and the generated EEG, plotted on topographic maps according to the 16 electrode positions, is illustrated in Figure 3. We find a high spatial match between the real and generated EEG power and apparent variation characteristics of ERD.

Figure 3. The average feature power on topographic maps.

Figure 4 shows the training curves of the proposed model for one participant. The generator loss gradually decreases, reducing the distance between the distributions of the generated and real samples. After approximately 6 epochs, both networks' losses converge to roughly constant values and the whole training becomes stable. The other subjects' training results show the same trend.

Figure 4. The training curves of the proposed model.

4.2. Generation Evaluation

Figure 5 plots the average of the real and generated (labeled as fake) samples of each individual subject (S1–S3) from D1. The blue and the green colors represent the real and the artificial fake data, respectively. A high match between them can be observed. Furthermore, we can see diversity in the samples generated for an individual, which suggests that the proposed model learns complicated temporal variations from real samples rather than copying the same ones repeatedly. This result paves a smooth way for exploring the practical application of generating EEG samples while considering individual diversity.

[Figure 5 panels: real vs. fake average waveforms for subjects S1–S3; horizontal axis in time steps (20 ms/div).]
Figure 5. The average of the real and the generated samples from different subjects.

We evaluated the proposed CVAE-GAN-based approach against the following frameworks: (1) CVAE, (2) convolutional encoder (CNN), and (3) conditional GAN (CGAN) [28]. For each dataset, 70% of the normal samples were randomly selected to constitute the training set; the testing set was composed of all abnormal samples and the remaining normal samples. Table 3 shows the metric results for the different architectures. The inception score (IS) is a common method of quantifying the quality of a trained generator. The Frechet inception distance (FID) is also used to better evaluate the quality of the generated samples by calculating the distance between the distributions of the real and generated samples. The sliced Wasserstein distance (SWD) [29] is employed to approximate the Wasserstein distance by computing projections of the two distributions. These metrics give evidence of the quality of the generative model to some extent.
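For illustration, the SWD can be approximated with random projections as in the rough numpy sketch below. This is our sketch, not the evaluation code used in the paper; the number of projections is an assumption, and the 1-D distance via sorted differences assumes equally sized sample sets.

```python
# Rough sketch of the sliced Wasserstein distance: project both sample sets
# onto random unit directions and average the 1-D Wasserstein-1 distances.
import numpy as np

def sliced_wasserstein(real, fake, n_proj=128, seed=0):
    """real, fake: (n_samples, n_features) arrays of flattened images,
    with the same number of samples in each set."""
    rng = np.random.default_rng(seed)
    dims = real.shape[1]
    total = 0.0
    for _ in range(n_proj):
        v = rng.normal(size=dims)
        v /= np.linalg.norm(v)                # random unit direction
        p_real = np.sort(real @ v)            # sorted 1-D projections
        p_fake = np.sort(fake @ v)
        total += np.mean(np.abs(p_real - p_fake))  # 1-D Wasserstein-1
    return total / n_proj
```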
The test accuracy of the classifier used for calculating the IS and FID was 86.14%. Obviously, the CVAE-GAN performs best on the IS and SWD metrics. The CNN outperforms the others on FID but is relatively worse on the remaining metrics. Overall, the CVAE-GAN and CGAN are the better architecture choices for class-conditional EEG generation.

Table 3. Evaluation metrics for different generation models.

Model       IS      FID      SWD
Real        1.478   0        0
CNN         1.284   10.234   0.087
CGAN        1.315   15.635   0.076
CVAE        1.296   32.743   0.082
CVAE-GAN    1.357   11.364   0.067

4.3. Classifier Performance

We utilized the same classifier to compare the recognition performance before and after the data augmentation. The augmented part (generated samples) accounted for a quarter of the real training sets, as sketched below.
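A minimal numpy sketch of this augmentation setup follows; the generate_fn hook and the array names are hypothetical, and only the 1:4 generated-to-real ratio comes from the text above.

```python
# Sketch: append generated samples at 25% of the real training-set size.
import numpy as np

def augment(x_real, y_real, generate_fn, ratio=0.25, seed=0):
    """generate_fn(label) -> one synthetic (32, 32, 3) image for that class."""
    rng = np.random.default_rng(seed)
    n_fake = int(len(x_real) * ratio)
    labels = rng.choice(y_real, size=n_fake)        # keep the class balance
    x_fake = np.stack([generate_fn(c) for c in labels])
    x_aug = np.concatenate([x_real, x_fake], axis=0)
    y_aug = np.concatenate([y_real, labels], axis=0)
    perm = rng.permutation(len(x_aug))              # shuffle before training
    return x_aug[perm], y_aug[perm]
```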
Figure 6 shows the testing classification performance for all subjects of D1 and D2. There was an obvious growth in classification accuracy after data augmentation in both D1 and D2. Moreover, we found evidence of overfitting in D2-S4 and reduced deviations of accuracy in D2, indicating improved robustness across subjects with appropriate data augmentation.

Figure 6. Testing classification performance before and after data augmentation for all subjects of D1 and D2.

Figure 7 shows the classification results of the different models. The proposed CVAE-GAN is superior to the other methods and achieves the best accuracy on D1 and D2, manifesting the superiority of adversarial models with a latent generation code and the generalization ability to automatically extract latent features from the subjects' diversity. We can also see that the adversarial CGAN model achieves better results, but not on all subjects. This suggests that latent representation encoding can be helpful for subject-invariant representations, inspiring us to explore subject-invariant features further.

Figure 7. Experimental results of classification performance on the D1 and D2 datasets.

Figure 8 shows the confusion matrices and average accuracies of the four classes over all subjects of D2, compared with two methods: CGAN, which applies conditional adversarial processing for EEG signal discrimination, and CNN, which applies convolution processing to recognize the 2D time-frequency target MI-EEG transformed by STFT. The lower-right value corresponds to the overall accuracy, the bottom row depicts the sensitivity, and the rightmost column lists the precision, which indicates class-specific classification characteristics. The CVAE-GAN model shows an obvious improvement in the four-class MI-task classification, better than the CNN trained without appended generated data. The result also demonstrates the enhancement of MI-EEG-based recognition achieved by appending generated data with the proposed framework.

Figure 8. Classification performance of confusion matrices for the three methods.

Figure 9 illustrates the confusion matrices corresponding to recognition with real and with synthetic data. The recognition accuracy on real EEG exceeds that on the synthetic source by 4 percent. We also noticed the relatively high sensitivity and precision of the tongue and feet movement imagery tasks in the synthetic EEG data, revealing that the proposed generative architecture has an advantage in single-channel feature learning, due to the fact that the discriminating information of tongue and feet imagery is mainly contained in the Cz channel.

[Figure 9 panels: confusion matrices over Hand(L), Hand(R), Tongue, and Feet (actual vs. predicted, with sensitivity and precision) for synthetic and real EEG.]
Figure 9. Classification performance of confusion matrices on real and synthetic samples.

4.4. Efficiency Analysis

Generally, deep learning algorithms require substantial time to execute, limiting their suitability for BCI applications, which typically require close to real-time performance. For instance, the practical deployment of a BCI system could be limited by its recognition time-delay if it takes two minutes to recognize the user's intent. In this section, we focus on the running time of our approach and compare it to the widely used baselines. As shown in Figure 10, the CNN required the least training time as a result of its simple framework and small number of weights. Furthermore, employing the CVAE as the generative model effectively reduced the time consumed during adversarial training. However, training is a one-off operation; for practical purposes, the execution time of an algorithm during testing is what matters most. The testing time of our approach is less than ten seconds, similar to the other baselines.

Figure 10. Training and testing time comparison of the different methods.

5. Discussion

We have demonstrated that an unsupervised model using a CVAE to learn the statistical structure of MI-EEG input data can effectively augment and generate EEG data, so as to address the problem of small-scale datasets in deep learning applications for MI tasks. The public BCI Competition IV dataset and a private dataset collected in our lab were used to evaluate the method.

The generative task is supposed to synthesize 2D-format EEG following the existing statistical distribution. First, we tested the three data generating methods. Then, we analyzed and compared the generated and real samples across the different subjects' data sources. Although some generating differences and randomness still exist, the experimental results revealed the approximate variation tendency and distribution, guiding the goal and direction of our future exploration of improved methods and synthesis.

As the second contribution, the MI-EEG synthesized by our model also revealed its usefulness in data augmentation for training a better recognition model. From this study, we clearly recognize the importance of data augmentation and the challenge of capturing useful features in an insufficient-data situation, in accordance with the experimental results. Furthermore, applying the CVAE as the generative model improves the robustness of the adversarial training, taking advantage of both statistic and feature matching to make the training process converge faster and more stably.
6. Conclusions and Future Work

This paper studied the application of the CVAE-GAN to class-conditional motor imagery EEG generation, with various prospective uses in BCI systems, for instance, reconstruction of corrupted data and non-homologous data augmentation. In order to achieve high-quality generation of 4-class conditional MI-task EEG, a combination of latent representation and an adversarial network is proposed to learn subject-invariant representations and to make the generated samples approach the real ones. Compared with other generative models on the public and private datasets, the proposed approach generated EEG samples from different subjects according to the MI task, and the experimental results show the effectiveness of the CVAE-GAN. In the experimental evaluation, training with data augmentation under the proposed framework enhanced the performance of MI-EEG recognition. In the future, this study should be continued to evaluate how the generated EEG data affect the performance of the BCI system, and to explore transfer learning approaches that develop shared structure in the class-labeled EEG data.

Author Contributions: Conceptualization, J.Y.; methodology, J.Y.; formal analysis, T.S.; writing–original draft preparation, J.Y., H.Y. and Y.S.; writing–review and editing, J.Y., H.Y. and Y.S.; supervision, Z.C. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by Yunnan Reserve Talents of Young and Middle-aged Academic and Technical Leaders (Shen Tao, 2018) and Yunnan Young Top Talents of Ten Thousand Plan (Shen Tao, Zhu Yan, Yunren Social Development No. 2018 73).

Institutional Review Board Statement: The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board.

Informed Consent Statement: All subjects involved in this study gave their informed consent. Institutional review board approval of our hospital was obtained for this study.

Data Availability Statement: The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgments: The research was supported and sponsored by the following projects: Introduction of Talent Research Startup Fund Project of Kunming University of Science and Technology under Program Approval Number KKSY201903028; National Natural Science Foundation of China under Program Approval Number 31760281; Yunnan Reserve Talents of Young and Middle-aged Academic and Technical Leaders (Shen Tao, 2018); and Yunnan Young Top Talents of Ten Thousand Plan (Shen Tao, Zhu Yan, Yunren Social Development No. 2018 73). Furthermore, we are grateful for the development of the Keras API, which made the implementation of the deep learning algorithms easier.

Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References
1. Biasiucci, A.; Franceschiello, B.; Murray, M.M. Electroencephalography. Curr. Biol. 2019, 29, R80–R85, doi:10.1016/j.cub.2018.11.052.
2. Niedermeyer, E. Niedermeyer's Electroencephalography; Oxford University Press: Oxford, UK, 2017; Volume 15, pp. 1–15.
3. Hari, R.; Puce, A. Introduction; Oxford University Press: Oxford, UK, 2017; Volume 28, pp. 3–12.
4. Bigdely-Shamlo, N.; Mullen, T.; Kothe, C.; Su, K.M.; Robbins, K.A. The PREP pipeline: Standardized preprocessing for large-scale EEG analysis. Front. Neuroinformatics 2015, 9, 16.
5. Jas, M.; Engemann, D.A.; Bekhti, Y.; Raimondo, F.; Gramfort, A. Autoreject: Automated artifact rejection for MEG and EEG data. NeuroImage 2017, 159, 417–429, doi:10.1016/j.neuroimage.2017.06.030.
6. Roy, Y.; Banville, H.; Albuquerque, I.; Gramfort, A.; Falk, T.H.; Faubert, J. Deep learning-based electroencephalography analysis: A systematic review. J. Neural Eng. 2019, 16, 051001, doi:10.1088/1741-2552/ab260c.
7. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
8. Özdenizci, O.; Wang, Y.; Koike-Akino, T.; Erdoğmuş, D. Transfer Learning in Brain-Computer Interfaces with Adversarial Variational Autoencoders. In Proceedings of the 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER), San Francisco, CA, USA, 20–23 March 2019; pp. 207–210, doi:10.1109/NER.2019.8716897.
9. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Bengio, Y. Generative adversarial networks. arXiv 2014, arXiv:1406.2661.
10. Fahimi, F.; Zhang, Z.; Goh, W.B.; Ang, K.K.; Guan, C. Towards EEG Generation Using GANs for BCI Applications. In Proceedings of the IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), Chicago, IL, USA, 19–22 May 2019; pp. 1–4, doi:10.1109/BHI.2019.8834503.
11. Luo, Y.; Lu, B.-L. EEG Data Augmentation for Emotion Recognition Using a Conditional Wasserstein GAN. In Proceedings of the 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, USA, 18–21 July 2018; pp. 2535–2538.
12. Ming, Y.; Ding, W.; Pelusi, D.; Wu, D.; Wang, Y.-K.; Prasad, M.; Lin, C.-T. Subject adaptation network for EEG data analysis. Appl. Soft Comput. 2019, 84, 105689, doi:10.1016/j.asoc.2019.105689.
13. Li, J.; He, H. Information Generative Bayesian Adversarial Networks: A Representation Learning Model for Transmission Gear Parameters. IEEE/ASME Trans. Mechatron. 2019, 24, 1998–2007, doi:10.1109/tmech.2019.2935350.
14. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved training of Wasserstein GANs. arXiv 2017, arXiv:1704.00028.
15. Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. arXiv 2013, arXiv:1312.6114.
16. Turk, M.A.; Pentland, A.P. Face Recognition Using Eigenfaces. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Maui, HI, USA, 3–6 June 1991; pp. 586–591.
17. Theis, L.; Hosseini, R.; Bethge, M. Mixtures of Conditional Gaussian Scale Mixtures Applied to Multiscale Image Representations. PLoS ONE 2012, 7, e39857, doi:10.1371/journal.pone.0039857.
18. Hyvärinen, A.; Karhunen, J.; Oja, E. Independent component analysis. Statistician 2003, 46, doi:10.2307/4128225.
19. Salakhutdinov, R.; Hinton, G. Deep Boltzmann machines. In AISTATS; Microtome Publishing: Clearwater Beach, FL, USA, 2009; Volume 1, p. 3.
20. Ranzato, M.A.; Mnih, V.; Hinton, G.E. Generating more realistic images using gated MRFs. Adv. Neural Inf. Process. Syst. 2010, 23, 2002–2010.
21. Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training GANs. arXiv 2016, arXiv:1606.03498.
22. Bao, J.; Chen, D.; Wen, F.; Li, H.; Hua, G. CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2764–2773.
23. Bian, J.; Hui, X.; Sun, S.; Zhao, X.; Tan, M. A Novel and Efficient CVAE-GAN-Based Approach with Informative Manifold for Semi-Supervised Anomaly Detection. IEEE Access 2019, 7, 88903–88916, doi:10.1109/access.2019.2920251.
24. Mroueh, Y.; Sercu, T.; Goel, V. McGAN: Mean and Covariance Feature Matching GAN. In Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017; pp. 2527–2535.
25. Qi, G.-J. Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities. Int. J. Comput. Vis. 2019, 128, 1118–1140, doi:10.1007/s11263-019-01265-2.
26. Kavasidis, I.; Palazzo, S.; Spampinato, C.; Giordano, D.; Shah, M. Brain2Image: Converting Brain Signals into Images. In Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA, 23–27 October 2017; pp. 1809–.
27. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 5–8 May 2015.
28. Zhu, M.; Fang, C.; Du, H.; Qi, M.; Wu, Z. Application research on improved CGAN in image raindrop removal. J. Eng. 2019, 2019, 8404–8408, doi:10.1049/joe.2019.1092.
29. Oudre, L.; Jakubowicz, J.; Bianchi, P.; Simon, C. Classification of Periodic Activities Using the Wasserstein Distance. IEEE Trans. Biomed. Eng. 2012, 59, 1610–1619, doi:10.1109/tbme.2012.2190930.

4-Class MI-EEG Signal Generation and Recognition with CVAE-GAN

Loading next page...
 
/lp/multidisciplinary-digital-publishing-institute/4-class-mi-eeg-signal-generation-and-recognition-with-cvae-gan-eemSlxy4jj

References (27)

Publisher
Multidisciplinary Digital Publishing Institute
Copyright
© 1996-2021 MDPI (Basel, Switzerland) unless otherwise stated Disclaimer The statements, opinions and data contained in the journals are solely those of the individual authors and contributors and not of the publisher and the editor(s). MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. Terms and Conditions Privacy Policy
ISSN
2076-3417
DOI
10.3390/app11041798
Publisher site
See Article on Publisher Site

Abstract

Article 4-Class MI-EEG Signal Generation and Recognition with CVAE-GAN 1 1 1, 1 2 Jun Yang , Huijuan Yu , Tao Shen *, Yaolian Song and Zhuangfei Chen School of Information Science and Automation, Kunming University of Science and Technology, Kunming 650504, China; yang-jun@kust.edu.cn (J.Y.); yuhuijuan@stu.kust.edu.cn (H.Y.); songlyaolian@kust.edu.cn (Y.S.) Medical Faculty, Kunming University of Science and Technology, Kunming 650504, China; micro_ant@hotmail.com * Correspondence: shentao@kust.edu.cn Abstract: As the capability of an electroencephalogram’s (EEG) measurement of the real-time elec- trodynamics of the human brain is known to all, signal processing techniques, particularly deep learning, could either provide a novel solution for learning but also optimize robust representations from EEG signals. Considering the limited data collection and inadequate concentration of during subjects testing, it becomes essential to obtain sufficient training data and useful features with a potential end-user of a brain–computer interface (BCI) system. In this paper, we combined a condi- tional variational auto-encoder network (CVAE) with a generative adversarial network (GAN) for learning latent representations from EEG brain signals. By updating the fine-tuned parameter fed into the resulting generative model, we could synthetize the EEG signal under a specific category. We employed an encoder network to obtain the distributed samples of the EEG signal, and applied an adversarial learning mechanism to continuous optimization of the parameters of the generator, discriminator and classifier. The CVAE was adopted to adjust the synthetics more approximately to the real sample class. Finally, we demonstrated our approach take advantages of both statistic and feature matching to make the training process converge faster and more stable and address the Citation: Yang, J.; Yu, H.; Shen, T.; problem of small-scale datasets in deep learning applications for motor imagery tasks through data Song, Y.; Chen, Z. 4-Class MI-EEG augmentation. The augmented training datasets produced by our proposed CVAE-GAN method Signal Generation and Recognition significantly enhance the performance of MI-EEG recognition. with CVAE-GAN. Appl. Sci. 2021, 11, 1798. https://doi.org/10.3390/app Keywords: brain computer interface; conditional variational auto-encoders; generative adversarial network Received: 26 January 2021 Accepted: 9 February 2021 Published: 18 February 2021 1. Introduction Publisher’s Note: MDPI stays neu- Electroencephalogram (EEG) records the electric potential variations from pyramidal tral with regard to jurisdictional neurons in the cortical layers, therefore recognized as the reflection of the brain activity, claims in published maps and insti- and can be used to study mind processes [1–3]. Although EEG has shown to be a critical tutional affiliations. tool in many domains, it still suffers from a few limitations that hinder its effective anal- ysis or processing. Owing to the brain activity buried under multiple environmental sources, EEG maintains a quite low signal-to-noise ratio (SNR) [4,5]. Consequently, vari- ous filtering and noise reduction techniques including the deep learning (DL) [6] method Copyright: © 2021 by the authors. Li- have been used to minimize the impact of these noise sources and extract true brain ac- censee MDPI, Basel, Switzerland. tivity from the recorded signals. 
Meanwhile, the DL framework has shown outstanding This article is an open access article performance in the field of complex data processing such as text audio signals and images distributed under the terms and con- [7], playing an ever-increasing role in industrial applications. By virtue of the sufficient ditions of the Creative Commons At- training data, DL can study computational models and learn hierarchical representations tribution (CC BY) license (http://crea- of input data through successive non-linear transformations [8,9], indicating the size of tivecommons.org/licenses/by/4.0/). Appl. Sci. 2021, 11, 1798. https://doi.org/10.3390/app11041798 www.mdpi.com/journal/applsci ... ... ... ... ... ... Appl. Sci. 2021, 11, 1798 2 of 14 the available training data as the restriction of the performance about the identifying model in brain computer interface (BCI) [10,11]. The EEG-based BCI decoding process involves pre-processing, feature extracting, and pattern recognition (classification or regression) [12]. The goal of pre-processing and feature extracting is to extract the target band and channel information from raw EEG and represent it in a compact and relevant manner which is conductive to classification. Re- gression and classification map the extracted feature vector for probability value or n- categories classification results. Figure 1 shows the general paradigm of a BCI system, receiving brain signals and mapping them into control commands for robotic equipment. The system includes several key components, including decoding and control part. The brain signals are collected from humans and sent to the preprocessing component for de- noising and enhancement. Then, the discriminating features can be extracted from the processed signals and sent to the classifier, recognizing and converting the signals into external device commands. Artefact Filtering Removal EEG collection Pre-processing Feature extraction Classification Feedback results Pattern recognition Robotic control Figure 1. The technical pipeline involved in the BCI system. Generative Adversarial Networks (GAN) allows synthesis of data from latent repre- sentations of samples [13], while relying heavily on optimizing generative models, how- ever, suffering from training instability [14]. In this work, we employed the GAN combined with a conditional variational auto- encoder (CVAE) [15] to synthesize EEG signals. By varying the motor imagery category label fed into the resulting generative model, we could generate EEG in a specific category with random noise on a latent attribute vector. We adopted an encoder network to learn the temporal and spectral representations of real EEG samples by simultaneously training a GAN to enforce the learning of the representations. Due to this, our proposed approach had achieved two main contributions. First, this distribution learning of latent represen- tations could help us to generate high imitation of real EEG based on its task features used for data increase and augmentation. Second, benefited from the adversarial learning net- work, the proposed model also could perform well in motor imagery (MI) task classifica- tion through the temporal-spectrum image transformed form the raw EEG. We experi- mented with public and private MI-EEG data, demonstrating its robust capability of gen- erating realistic and diverse samples with motor imagery labels. 
Finally, we compared the diverse discussions based on methodological choices of the models with evaluation met- rics, thus revealing the optimal results obtained from this study. Appl. Sci. 2021, 11, 1798 3 of 14 Our approach had two novel aspects. First, we adopted CVAE as the generative model for feature sub-space and synthetic EEG reconstruction by virtue of class infor- mation of MI-EEG samples. Second, we constructed generative adversarial patterns ac- cording to the feature sub-space, taking advantages of both statistic and pairwise features to make the training process converge faster and more robust, finally paving the way to improve the efficiency of the EEG-based BCI system. 2. Related Work Conventional research of generative models, including principal component analysis (PCA) [16], the Gaussian mixture model (GMM) [17], and independent component anal- ysis (ICA) [18], assuming a simple formation of the data set. Later, restricted Boltzmann machines (RBMs) [19] and the Markov random field (MRF) [20] were hindered for the reason of lack of effective latent representations. Different from the present promising generative models, the GAN architecture con- sists of two opposing networks trying to outperform each other through a min-max game process. GAN is capable of modeling complicated and high-dimensional distributions, thus trains the learning model to generate data more similar to real samples. Nevertheless, the GAN model faces the converging problem in the training stage leading to the wide discrepancies between generated samples and the natural ones [21]. Meanwhile, the GAN cannot accurately represent the intrinsic characteristics of the normal samples because of the latent manifold providing little useful information [22]. To improve the quality of GAN-based detection, Mean and covariance feature matching GAN [23] was employed to provide mean and covariance feature matching, thus restricting the range of the param- eters for reducing discrimination. Loss-sensitive GAN [24] tries to learn a loss function quantifying the quality of generated samples and uses this loss to generate high-quality data. In this paper, we manage to combine GAN and VAE, learning latent both spatial and temporal representations of real EEG. Due to imperfect measures such as the squared error and the injected noise, the generated signal samples are often distorted, viewed as a disadvantage of VAE [25], which can be made up nicely through repeated adjustment by the GAN model. Our model aimed to decode the MI-EEG by simultaneously training a conditional VAE and GAN network, enforcing the learning of the EEG data representations. Besides, we utilized statistic feature fusion to make the training converge more stable and fluent. 3. Methods In this work, firstly, we employed a data preprocessing step based on short-term Fourier transform (STFT) to convert MI-EEG signals (C3, C4, and Cz electrodes) to a set of time-spectrum images. Then, a CVAE-GAN architecture was proposed for EEG signal generation and classification from a latent learning representation, obtained from an ad- versarial learning process. Considering that CVAE generated images are decent but fuzzy and the ones from CGAN are clear but contain a significant change, CVAE-GAN was used as an appropriate solution. For example, when we go to generate the motor imagery EEG, CVAE-GAN can calculate the class probability of the EEG data with its additional part. 
Therefore, it can make use of the latent class in training data to generate samples regular- ized by Gaussian distributions with learnable statistic values. The proposed CVAE-GAN architecture for EEG mainly consists of three parts: (1) The encoder network, mapping the EEG sample x to a latent representation z through a convolution neural network (CNN). (2) The generator network, generating fake EEG signals with respect to a latent vector. (3) The recognition network, being trained to distinguish between real and fake EEG data and measuring the class probability of the data from the real and generated input. Then, all the parts would be guided to adjust itself. Appl. Sci. 2021, 11, 1798 4 of 14 3.1. Datasets and Conversion Based on STFT We evaluated our approach on a private EEG dataset collected by ourselves and ad- ditionally a public dataset (Competition IV Data sets 2a). The data were separated into 70% training set and 30% test set repeatedly for cross-validation. The data detail is shown in Table 1. Table 1. Properties of raw materials. Private Public Datasets D1 D2 Subject 3 9 Sample rate (Hz) 250 250 left, right hand, tongue, and l left, right hand, tongue, and Imagery task both feet both feet MI periods for processing 2s 2s Throughout the experiments, subjects were scheduled in front of a computer screen and instructed to perform motor imagery tasks including executing the movement of the right or left hand. Each trial took 8 s with same inter-trial resting period’s length. The fact is that the energy in mu band (8–13 Hz) observed in the motor cortex of the brain de- creases, called event-related de-synchronization (ERD), meanwhile the energy increase caused in the beta band (17–30 Hz) is called event-related synchronization (ERS) after ex- ecuting the MI task. MI tasks were found to cause ERD and ERS, respectively, on the right and left sides of the motor cortex affecting EEG signals at C4 and C3 electrodes. Cz is mainly affected by both tongue and feet MI tasks. Consequently, the datasets we used in this work included recordings from three electrodes (C3, Cz, and C4) of left/right handle MI task and short time Fourier transform (STFT) was applied to converting the 2 s long sequential EEG trial to temporal-spectrum power image. A set of time-frequency power image was generated by a short time window sliding along the time axis of the sequential EEG signal. The STFT can be defined as: −jt ω STFT(, τω)=− s(t)h(t τ)e dt , (1) −∞ where h(t) is a window with a limited number of nonzero points and τ is the window position on the temporal axis. In the case of the 250 Hz signal, this corresponded to 500 samples. The window size of STFT was set to 64 and its number of points to overlap be- tween segments was set to 50. STFT was computed for 32 windows over all samples lead- ing to are 257 × 32 as the size of the spectrum images, where 257 presented the number of sample frequencies and 32 meant the number of segment times. Ultimately, mu and beta frequency bands of the spectrum images were extracted. The final input sizes of the nor- malized image for mu and beta band were empirically chosen as 32 × 32. 3.2. CVAE-GAN Construction CVAE models a generator as a pair of encoder and decoder networks. Compared to VAE, CVAE is able to generate data based on certain attributes. The encoder learns a latent representation z from the data x on condition of the y belonging to a specific category, while the decoder aims to predict the specific category according to the learned represen- tation z. 
The generated data will be distorted owing to the deficiency of input and output. Fortunately, the generated performance will be improved when it combines with the dis- criminator. On the other side, the generator of GAN is isolated from real EEG, thus the introduction of the encoder will make GAN more stable. The whole CVAE-GAN architec- ture is shown in Figure 2. Appl. Sci. 2021, 11, 1798 5 of 14 Labels Latent representation Real EEG Encoder Generator Generated EEG C3 C3 Cz Cz C4 0-100Hz cropping C4 6-13Hz & 17-30Hz STFT Invers STFT 0-2s Recognizer Discriminator Real or fake Softmax Classifier Predict class Softmax Figure 2. The proposed CVAE-GAN architecture for synthetic EEG detection. Considering the goal of generating process under the control of the specified cate- gory, label dates were extra input to the encoder and decoder. We feed the training sam- ples and its corresponding label to the encoder, then concatenate the hidden representa- tion with the corresponding label and feed it to the decoder to train the network. Ulti- mately, we could generate data with the specified label by feeding the decoder with the noise sampled from the Gaussian distribution and the assigned label after the training process. Therefore, the CVAE can be formulated as: (2) L=− {(xx )+ 0.5[ (x | y )+ μ (x | y )− 1− log (x | y )]}   Gi i i i i i i i We employed the CNN as the implementation architecture of the encoder and de- coder. Their mapping details are illustrated in Table 2. Table 2. Implementation details for the proposed CVAE-GAN architecture. Layers Input Output Processing Pre-processing T × E 32 × 32 × 3 STFT Convolution kenel 20@ 3 × 1 × 3 E1 32 × 32 × 3 15 × 32 × 1 Batch Norm and MaxPool2D (20 × 2) & Dropout (0.5) Convolution kenel 40@1 × 3 × 1 Encoder E2 15 × 32 × 1 15 × 1 × 40 Batch Norm and MaxPool2D (1 × 15) & Dropout (0.5) E3 15 × 1 × 40 600 Flatten E4 600 µ, σ mapping into two feature compression layers Latent Representation R µ, σ z Map representation z G1 z 600 Fully-Connected Layer G2 600 15 × 1 × 40 Reshape Generator G3 15 × 1 × 40 15 × 32 × 1 De-convolution and bilinear interpolation G4 15 × 32 × 1 32 × 32 × 3 De-convolution and bilinear interpolation Convolution kenel 20@ 3 × 1 × 3 F1 32 × 32 × 3 15 × 32 × 1 Batch Norm and MaxPool2D (20 × 2) & Recognizer Feature extractor Dropout (0.5) F2 15 × 32 × 1 15 × 1 × 40 Convolution kenel 40@ 1 × 3 × 1 Appl. Sci. 2021, 11, 1798 6 of 14 Batch Norm and MaxPool2D (1 × 15) & Dropout (0.5) D3 15 × 1 × 40 60 Flatten and Fully-Connected Layer Discriminator softmax 60 2 Recognition for real or fake signal C1 15 × 1 × 40 100 Flatten and Fully-Connected Layer Classifier softmax 100 4 Classification with softmax The input images were convolved with the kernels to be trained and were then put through to generate the map of output in the convolution layer [26]. In a given layer, the map of k-th features can be obtained as: kk Hf== () a f (Wx∗ )+b , (3) ij ij k (4) fa ( ) = ReLU(a)=ln(1+e ) , where 20 and 40 kennels were employed in the first and second layer, respectively. The max-pooling layer is a completely connected layer for different dimensions. The feature mapping after convolutional kernel was down-sized into a smaller sample in the pooling layer. With the proposed approach, the network is fed with the labeled training set, and the error E is computed taking into account the fact that the desired output is different from the output of the network. 
In each convolution layer, the input images are convolved with the trainable kernels and passed through an activation to generate the output feature maps [26]. In a given layer, the k-th feature map is obtained as:

H_{ij}^{k} = f(a_{ij}^{k}) = f((W^{k} \ast x)_{ij} + b_{k}),  (3)

f(a) = \mathrm{ReLU}(a) = \ln(1 + e^{a}),  (4)

where 20 and 40 kernels were employed in the first and second layers, respectively. The max-pooling layer reduces the feature dimensions: the feature map produced by each convolutional kernel is down-sampled into a smaller representation. With the proposed approach, the network is fed with the labeled training set, and the error E is computed by comparing the desired output with the actual output of the network. Subsequently, a stochastic gradient descent algorithm [27] was applied to minimize this error and thereby optimize the network weights and the filters in the convolutional layers.

The output of the encoder forms the vector z through a two-layer CNN, which extracts the temporal, spectral, and spatial features from the 3D power-image representation of MI-EEG. The encoding vector h is mapped into two feature compression layers in order to calculate the mean µ and variance σ of the latent Gaussian. The latent representation z is calculated as:

z = \mu + \sigma \odot \varepsilon,  (5)

where ε is randomly sampled from the standard normal distribution N(0, 1) and ⊙ denotes an element-wise product. The covariance matrix of z is diagonal because we expect the components of the latent vector z to be mutually independent, which keeps them informative. After every convolution operation, we adopted batch normalization to improve training efficiency and nonlinearity, and a 50% dropout rate was used to prevent overfitting. The generator mainly consists of deconvolution and bilinear-interpolation up-sampling processes that restore the time-spectrum image of the signal from the learned features. The recognizer, composed of the discriminator and the classifier, was used to efficiently identify real and fake signals under the specific category condition, to discover abnormal power-image signals, and to classify the MI-task results.

3.3. Optimization of the Training Process

The encoder E maps the input data x to a latent representation z through a learned distribution P(z | x, c), where c represents the category of the data. The generative network G generates EEG data from a learned distribution P(x | z, c). The generator and discriminator operate as in a standard generative adversarial network (GAN): the generator tries to model the real data distribution according to the gradients given by the discriminator, which learns to distinguish between real and fake samples. In the encoder of the CVAE framework, the loss L_KL indicates whether the distribution of the latent variable is close to the expected distribution:

L_{KL} = 0.5\, \{ \mu^{T}\mu + \mathrm{sum}[\exp(\sigma) - \sigma - 1] \},  (6)

where σ denotes the log-variance vector. Since the overall loss consists of the reconstruction loss of the decoder and the variational posterior loss of the encoder, this framework enforces a representation z consistent with the category c. However, the CVAE alone is known to be imperfect in generation practice, so we applied a GAN to tackle this problem. The GAN consists of a generator G (the CVAE decoder) and a discriminator D, whose losses can be written respectively as:

L_{G} \leftarrow \tfrac{1}{2} \| x_r - x_f \|_2^2,  (7)

L_{D} \leftarrow -[\log d_r + \log(1 - d_g)].  (8)

The feature-matching effect taken from the discriminator can be denoted as:

L_{GD} \leftarrow \tfrac{1}{2} \| f_D(x_r) - f_D(x_f) \|_2^2.  (9)

In the classifier part, the classification loss can be denoted as:

L_{C} \leftarrow \tfrac{1}{2} ( \| y_r - c \|_2^2 + \| y_g - c \|_2^2 ).  (10)

The goal of the CVAE-GAN is to minimize the total loss:

L = L_{KL} + L_{G} + L_{GD} + L_{D} + L_{C}.  (11)
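These loss terms translate directly into code. The following NumPy sketch mirrors Eqs. (5)–(11); the function and variable names are ours, and σ is treated as a log-variance vector as in Eq. (6).

```python
# Hedged NumPy sketch of the CVAE-GAN loss terms, Eqs. (5)-(11).
import numpy as np

def sample_z(mu, log_var, rng=np.random.default_rng(0)):
    """Eq. (5): reparameterization trick, z = mu + sigma * eps."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def l_kl(mu, log_var):                  # Eq. (6): KL divergence to N(0, I)
    return 0.5 * np.sum(mu**2 + np.exp(log_var) - log_var - 1)

def l_g(x_real, x_fake):                # Eq. (7): reconstruction loss
    return 0.5 * np.sum((x_real - x_fake)**2)

def l_d(d_real, d_fake):                # Eq. (8): discriminator loss
    return -(np.log(d_real) + np.log(1.0 - d_fake))

def l_gd(f_real, f_fake):               # Eq. (9): feature-matching loss
    return 0.5 * np.sum((f_real - f_fake)**2)

def l_c(y_real, y_fake, c):             # Eq. (10): classification loss
    return 0.5 * (np.sum((y_real - c)**2) + np.sum((y_fake - c)**2))

# Eq. (11): total objective minimized during training
# total = l_kl(...) + l_g(...) + l_gd(...) + l_d(...) + l_c(...)
```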
The whole training pipeline of the proposed algorithm is shown in Algorithm 1.

Algorithm 1. Training Process of the Proposed CVAE-GAN
Initialization: m is the batch size, n is the number of categories, and epoch is the number of iterations. Initialize the networks (E, G, D) with random weights.
1: Sample a batch of m real EEG samples {x_r} ~ p_r.
2: Map the feature parameters with the encoder network: µ, σ ← E(x_r, c_r).
3: Resample the feature parameters to obtain the latent representation z.
4: Generate conditional samples through the generator network: x_g ← G(z, c_r).
5: Feed the real and generated samples into the recognition network for authenticity identification, d_r ← D(x_r), d_g ← D(x_g), and MI-task classification, y_r ← C(x_r), y_g ← C(x_g).
6: Compute L_KL ← KL(q(z | x_r, c) || P_z) and the losses L_G, L_GD, L_D, and L_C as in Eqs. (7)–(10); set L ← L_KL + L_G + L_GD + L_D + L_C.
7: Update the network parameters: ω_E ← −∇_{ω_E}(L_KL + L_G), ω_G ← −∇_{ω_G}(L_G + L_GD), ω_D ← −∇_{ω_D} L_D, ω_C ← −∇_{ω_C} L_C.
Output: repeat until L has converged, then save all network parameters.

4. Experimental Results

4.1. Feature Extraction and Convergence

To better illustrate the spatial feature extraction, the one-task average feature power of the real and the generated EEG is shown in Figure 3 on topographic maps of the 16 electrode positions. We find a high spatial match between the real and generated EEG power, with clear variation characteristics of the ERD.

Figure 3. The average feature power on topographic maps.

Figure 4 shows the training curves of the proposed model for one participant. The generator loss gradually decreases, reducing the distance between the distributions of real and generated samples. After approximately 6 epochs, both networks' losses converge to roughly constant values and the whole training becomes stable. The training results of the other subjects show the same trend.

Figure 4. Training curves of the proposed model for one participant.

4.2. Generation Evaluation

Figure 5 plots the average of the real and generated (labeled as fake) samples of each individual subject (S1–S3) from D1. The blue and green colors represent the real and the artificially generated data, respectively. A high match between them can be observed. Furthermore, the diversity of the samples generated for an individual suggests that the proposed model learns the complicated temporal variation of the real samples rather than repeatedly copying the same ones. This result paves the way for exploring practical applications of EEG sample generation that account for individual diversity.

[Figure 5 panels: real vs. fake average waveforms for subjects S1–S3; horizontal axis in time steps, 20 ms/div.]
Figure 5. The average of the real and the generated samples from different subjects.

We evaluated the proposed CVAE-GAN-based approach against the following frameworks: (1) CVAE, (2) convolutional encoder (CNN), and (3) conditional GAN (CGAN) [28]. For each dataset, 70% of the normal samples were randomly used to constitute the training set; the testing set was composed of all abnormal samples and the remaining normal samples. Table 3 shows the metric results for the different architectures. The inception score (IS) is a common measure of the quality of a trained generator. The Fréchet inception distance (FID) is also used to better evaluate the quality of the generated samples by calculating the distance between the distributions of the real and generated samples. The sliced Wasserstein distance (SWD) [29] approximates the Wasserstein distance by computing projections of the two distributions.
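As an illustration of the last metric, the following sketch computes a sliced Wasserstein distance in the way just described: both sample sets are projected onto random unit directions, and the one-dimensional Wasserstein distances are averaged. The number of projections and the flattened-image representation are our assumptions, not the paper's settings.

```python
# Hedged sketch of the sliced Wasserstein distance (SWD) used in Table 3.
import numpy as np
from scipy.stats import wasserstein_distance

def sliced_wasserstein(a, b, n_proj=50, seed=0):
    """Average 1-D Wasserstein distance over random projections."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=a.shape[1])
        theta /= np.linalg.norm(theta)          # random unit direction
        total += wasserstein_distance(a @ theta, b @ theta)
    return total / n_proj

real = np.random.randn(200, 32 * 32)   # flattened real spectrum images
fake = np.random.randn(200, 32 * 32)   # flattened generated images
print(sliced_wasserstein(real, fake))
```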
These metrics give evidence of the quality of the generative model to some extent. The test accuracy of the classifier used for calculating the IS and FID was 86.14%. CVAE-GAN clearly performs best on the IS and SWD metrics; CNN outperforms on FID but is relatively worse on the other metrics. Overall, CVAE-GAN and CGAN are the better architecture choices for class-conditional EEG generation.

Table 3. Evaluation metrics for different generation models.

Model       IS      FID      SWD
Real        1.478   0        0
CNN         1.284   10.234   0.087
CGAN        1.315   15.635   0.076
CVAE        1.296   32.743   0.082
CVAE-GAN    1.357   11.364   0.067

4.3. Classifier Performance

We used the same classifier to compare the recognition performance before and after data augmentation. The augmented part (generated samples) accounted for a quarter of the real training sets; a hedged sketch of this augmentation step follows.
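The following Python sketch shows one way to implement that 25% augmentation. The generator's call signature and the reuse of real labels for conditioning are our assumptions; the paper does not specify them.

```python
# Hedged sketch of the Section 4.3 augmentation: append generated samples
# amounting to 25% of the real training set before training the classifier.
import numpy as np

def augment(x_real, y_real, generator, ratio=0.25, latent_dim=40, seed=0):
    rng = np.random.default_rng(seed)
    n_fake = int(ratio * len(x_real))
    z = rng.normal(size=(n_fake, latent_dim))              # Gaussian latent codes
    labels = y_real[rng.integers(0, len(y_real), n_fake)]  # reuse real labels
    x_fake = generator(z, labels)                          # assumed interface
    return (np.concatenate([x_real, x_fake]),
            np.concatenate([y_real, labels]))
```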
Figure 6 shows the testing classification performance for all subjects of D1 and D2. Classification accuracy clearly increased after data augmentation on both D1 and D2. Moreover, we found evidence of overfitting in D2-S4 and reduced accuracy deviations in D2, indicating improved robustness across subjects with appropriate data augmentation.

Figure 6. Testing classification performance before and after data augmentation for all subjects of D1 and D2.

Figure 7 shows the classification results of the different models on D2. The proposed CVAE-GAN is superior to the other methods and achieves the best accuracy on D1 and D2, demonstrating the advantage of adversarial models with a latent generation code and their ability to automatically extract latent features across subject diversity. The adversarial model CGAN also achieves good results, but not for all subjects. This suggests that latent representation encoding can help obtain subject-invariant representations, inspiring us to explore subject-invariant features further.

Figure 7. Experimental results of classification performance on the D1 and D2 datasets.

Figure 8 shows the confusion matrices and average accuracies over the four classes for all subjects of D2, compared with two other methods: CGAN, which applies conditional adversarial processing for EEG signal discrimination, and CNN, which applies convolutional processing to recognize the 2D time-frequency MI-EEG images transformed by STFT. The lower-right value corresponds to the overall accuracy, the bottom row depicts sensitivity, and the rightmost column lists precision, which indicates class-specific classification characteristics. The CVAE-GAN model shows an obvious improvement in the four-class MI-task classification, better than the CNN trained without appended generated data. This result also demonstrates the enhancement of MI-EEG-based recognition obtained by appending data generated with the proposed framework.

Figure 8. Classification performance of confusion matrices for the three methods.

Figure 9 shows the confusion matrices for recognition with real and synthetic data. Recognition on real EEG achieved an accuracy about 4 percent higher than on the synthetic source. We also noticed the relatively high sensitivity and precision of the tongue and feet movement imagery tasks on the synthetic EEG data, revealing that the proposed generative architecture has advantages in single-channel feature learning, since the discriminating information of the tongue and feet tasks is mainly contained in the Cz channel.

[Figure 9 panels: predicted vs. actual confusion matrices for Hand(L), Hand(R), Tongue, and Feet, with sensitivity and precision, for synthetic EEG and real EEG.]
Figure 9. Classification performance of confusion matrices on real and synthetic samples.

4.4. Efficiency Analysis

Deep learning algorithms generally require substantial time to execute, which limits their suitability for BCI applications that typically require near-real-time performance. For instance, the practical deployment of a BCI system would be limited by its recognition delay if it took two minutes to recognize the user's intent. In this section, we focus on the running time of our approach and compare it to widely used baselines. As shown in Figure 10, the CNN required the least training time as a result of its simple framework and small number of weights. Furthermore, employing the CVAE as the generative model effectively reduced the time consumed during adversarial training. However, training is a one-off operation; for practical purposes, the execution time of an algorithm during testing is what matters most. The testing time of our approach is less than ten seconds, similar to the other baselines.

Figure 10. Training and testing times of the compared methods.

5. Discussion

We have demonstrated that an unsupervised model using a CVAE to learn the statistical structure of MI-EEG input data can effectively augment and generate EEG data, addressing the problem of small-scale datasets in deep learning applications for MI tasks. The public BCI Competition IV dataset and a private dataset collected in our lab were used to evaluate the method.

The generative task is supposed to synthesize 2D-format EEG following the existing statistical distribution. First, we tested the three kinds of data-generating methods. Then, we analyzed and compared the generated and real samples across the different subjects' data sources. Although some generation differences and randomness remain, the experimental results revealed an approximately matching variation tendency and distribution, guiding the goal and direction of our future exploration of improved generation and synthesis methods.

As the second contribution, the MI-EEG synthesized by our model also proved useful for data augmentation and for training a better recognition model. From this study, we clearly recognize the importance of data augmentation and the challenge of capturing useful features in insufficient-data situations, in accordance with the experimental results. Furthermore, applying the CVAE as the generative model improves the robustness of adversarial training, taking advantage of both statistic and feature matching to make the training process converge faster and more stably.
6. Conclusions and Future Work

This paper studied the application of the CVAE-GAN to classified motor imagery EEG generation, with various prospective uses in BCI systems, for instance, reconstruction of corrupted data and non-homologous data augmentation. To achieve high-quality generation of 4-class conditional MI-task EEG, a combination of latent representation and an adversarial network is proposed to learn subject-invariant representations and to make the generated samples approach the real ones. Compared with other generative models on the public and private datasets, the proposed approach generated EEG samples for different subjects according to their MI task, and the experimental results show the effectiveness of the CVAE-GAN. In the experimental evaluation, training with data augmentation based on the proposed framework enhances the performance of MI-EEG recognition. In the future, this study should be continued to evaluate how the generated EEG data affect the performance of a complete BCI system and to explore transfer learning approaches that develop shared structure in the classified EEG data.

Author Contributions: Conceptualization, J.Y.; methodology, J.Y.; formal analysis, T.S.; writing—original draft preparation, J.Y., H.Y. and Y.S.; writing—review and editing, J.Y., H.Y. and Y.S.; supervision, Z.C. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by Yunnan Reserve Talents of Young and Middle-aged Academic and Technical Leaders (Shen Tao, 2018) and Yunnan Young Top Talents of Ten Thousand Plan (Shen Tao, Zhu Yan, Yunren Social Development No. 2018 73).

Institutional Review Board Statement: The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Institutional Review Board.

Informed Consent Statement: All subjects involved in this study gave their informed consent. Institutional review board approval of our hospital was obtained for this study.

Data Availability Statement: The data used to support the findings of this study are available from the corresponding author upon request.

Acknowledgments: The research was supported and sponsored by the following projects: Introduction of Talent Research Startup Fund Project of Kunming University of Science and Technology under Program Approval Number KKSY201903028; National Natural Science Foundation of China under Program Approval Number 31760281; Yunnan Reserve Talents of Young and Middle-aged Academic and Technical Leaders (Shen Tao, 2018); Yunnan Young Top Talents of Ten Thousand Plan (Shen Tao, Zhu Yan, Yunren Social Development No. 2018 73). Furthermore, we are grateful for the development of the Keras API, which made the implementation of deep learning algorithms easier.

Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References
1. Biasiucci, A.; Franceschiello, B.; Murray, M.M. Electroencephalography. Curr. Biol. 2019, 29, R80–R85, doi:10.1016/j.cub.2018.11.052.
2. Niedermeyer, E. Niedermeyer's Electroencephalography; Oxford University Press: Oxford, UK, 2017; Volume 15, pp. 1–15.
3. Hari, R.; Puce, A. Introduction; Oxford University Press: Oxford, UK, 2017; Volume 28, pp. 3–12.
4. Bigdely-Shamlo, N.; Mullen, T.; Kothe, C.; Su, K.M.; Robbins, K.A.
The PREP pipeline: Standardized preprocessing for large-scale EEG analysis. Front. Neuroinform. 2015, 9, 16.
5. Jas, M.; Engemann, D.A.; Bekhti, Y.; Raimondo, F.; Gramfort, A. Autoreject: Automated artifact rejection for MEG and EEG data. NeuroImage 2017, 159, 417–429, doi:10.1016/j.neuroimage.2017.06.030.
6. Roy, Y.; Banville, H.; Albuquerque, I.; Gramfort, A.; Falk, T.H.; Faubert, J. Deep learning-based electroencephalography analysis: A systematic review. J. Neural Eng. 2019, 16, 051001, doi:10.1088/1741-2552/ab260c.
7. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
8. Özdenizci, O.; Wang, Y.; Koike-Akino, T.; Erdoğmuş, D. Transfer Learning in Brain-Computer Interfaces with Adversarial Variational Autoencoders. In Proceedings of the 2019 9th International IEEE/EMBS Conference on Neural Engineering (NER), San Francisco, CA, USA, 20–23 March 2019; pp. 207–210, doi:10.1109/NER.2019.8716897.
9. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Bengio, Y. Generative adversarial networks. arXiv 2014, arXiv:1406.2661.
10. Fahimi, F.; Zhang, Z.; Goh, W.B.; Ang, K.K.; Guan, C. Towards EEG Generation Using GANs for BCI Applications. In Proceedings of the IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), Chicago, IL, USA, 19–22 May 2019; pp. 1–4, doi:10.1109/BHI.2019.8834503.
11. Luo, Y.; Lu, B.-L. EEG Data Augmentation for Emotion Recognition Using a Conditional Wasserstein GAN. In Proceedings of the 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Honolulu, HI, USA, 18–21 July 2018; pp. 2535–2538.
12. Ming, Y.; Ding, W.; Pelusi, D.; Wu, D.; Wang, Y.-K.; Prasad, M.; Lin, C.-T. Subject adaptation network for EEG data analysis. Appl. Soft Comput. 2019, 84, 105689, doi:10.1016/j.asoc.2019.105689.
13. Li, J.; He, H. Information Generative Bayesian Adversarial Networks: A Representation Learning Model for Transmission Gear Parameters. IEEE/ASME Trans. Mechatron. 2019, 24, 1998–2007, doi:10.1109/tmech.2019.2935350.
14. Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved training of Wasserstein GANs. arXiv 2017, arXiv:1704.00028.
15. Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. arXiv 2013, arXiv:1312.6114.
16. Turk, M.A.; Pentland, A.P. Face Recognition Using Eigenfaces. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Maui, HI, USA, 3–6 June 1991; pp. 586–591.
17. Theis, L.; Hosseini, R.; Bethge, M. Mixtures of Conditional Gaussian Scale Mixtures Applied to Multiscale Image Representations. PLoS ONE 2012, 7, e39857, doi:10.1371/journal.pone.0039857.
18. Hyvärinen, A.; Karhunen, J.; Oja, E. Independent component analysis. Statistician 2003, 46, doi:10.2307/4128225.
19. Salakhutdinov, R.; Hinton, G. Deep Boltzmann machines. In AISTATS; Microtome Publishing: Clearwater Beach, FL, USA, 2009; Volume 1, p. 3.
20. Ranzato, M.A.; Mnih, V.; Hinton, G.E. Generating more realistic images using gated MRFs. Adv. Neural Inf. Process. Syst. 2010, 23, 2002–2010.
21. Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training GANs. arXiv 2016, arXiv:1606.03498.
22. Bao, J.; Chen, D.; Wen, F.; Li, H.; Hua, G. CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training.
In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2764–2773.
23. Bian, J.; Hui, X.; Sun, S.; Zhao, X.; Tan, M. A Novel and Efficient CVAE-GAN-Based Approach with Informative Manifold for Semi-Supervised Anomaly Detection. IEEE Access 2019, 7, 88903–88916, doi:10.1109/access.2019.2920251.
24. Mroueh, Y.; Sercu, T.; Goel, V. McGan: Mean and Covariance Feature Matching GAN. In Proceedings of the 34th International Conference on Machine Learning (ICML), Sydney, Australia, 6–11 August 2017; pp. 2527–2535.
25. Qi, G.-J. Loss-Sensitive Generative Adversarial Networks on Lipschitz Densities. Int. J. Comput. Vis. 2019, 128, 1118–1140, doi:10.1007/s11263-019-01265-2.
26. Kavasidis, I.; Palazzo, S.; Spampinato, C.; Giordano, D.; Shah, M. Brain2Image: Converting Brain Signals into Images. In Proceedings of the 25th ACM International Conference on Multimedia, Mountain View, CA, USA, 23–27 October 2017; pp. 1809–.
27. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 5–8 May 2015.
28. Zhu, M.; Fang, C.; Du, H.; Qi, M.; Wu, Z. Application research on improved CGAN in image raindrop removal. J. Eng. 2019, 2019, 8404–8408, doi:10.1049/joe.2019.1092.
29. Oudre, L.; Jakubowicz, J.; Bianchi, P.; Simon, C. Classification of Periodic Activities Using the Wasserstein Distance. IEEE Trans. Biomed. Eng. 2012, 59, 1610–1619, doi:10.1109/tbme.2012.2190930.
