Inter-individual variability of EEG features during microsleep events

Current Directions in Biomedical Engineering 2019;5(1):13-16

Martin Golz*, Adolf Schenka, Florian Haselbeck and Martin P. Pauli

https://doi.org/10.1515/cdbme-2019-0004

Abstract: This paper examines the question of how strongly the spectral properties of the EEG during microsleep differ between individuals. For this purpose, 3859 microsleep examples were compared with 4044 counterexamples in which drivers were very drowsy but were still able to perform the driving task. Two types of signal features were compared: logarithmic power spectral densities and entropy measures of wavelet coefficient series. Discriminant analyses were performed with the following machine learning methods: support-vector machines, gradient boosting, and learning vector quantization. To the best of our knowledge, this is the first time that results of leave-one-subject-out cross-validation (LOSO CV) for the detection of microsleep are presented. Error rates lower than 5.0 % resulted in 17 subjects and lower than 13 % in another 11 subjects. In 3 individuals, the EEG features could not be explained by the pool of EEG features of all other individuals; for them, detection errors were 15.1 %, 17.1 %, and 27.0 %. In comparison, cross-validation by means of repeated random subsampling, in which individuality is not considered, yielded mean error rates of 5.0 ± 0.5 %. A subsequent inspection of the raw EEG data showed that in two individuals poor signal quality due to poor electrode attachment could be the cause, and in one individual a very unusual behavior, a strong and long-lasting eyelid activity, interfered with the recorded EEG in all channels.

Keywords: Microsleep, EEG, periodogram, support-vector machines, SVM, learning vector quantization, LVQ, gradient boosting, leave one out, LOSO cross-validation.

*Corresponding author: Martin Golz, University of Applied Sciences Schmalkalden, Blechhammer 4, Schmalkalden, Germany, e-mail: m.golz@hs-sm.de
Adolf Schenka, Florian Haselbeck, Martin Patrick Pauli: University of Applied Sciences Schmalkalden, Schmalkalden, Germany

Open Access. © 2019 Martin Golz, Adolf Schenka, Florian Haselbeck, Martin P. Pauli, published by De Gruyter. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.

1 Introduction

The recognition of changes in brain state is a challenging task. This is especially the case for short-term requirements, e.g. when microsleep events (MSE) of car drivers should be detected from the EEG. Machine learning methods have been used successfully for this purpose [1,2]. Because these methods are based solely on the given data set and on assumptions about the underlying data-generating stochastic processes, great care must be taken when selecting data [3]. Therefore, one research focus should be on validation. On the one hand, the training sample should be as large as possible in order to enable confident and accurate learning; on the other hand, enough test examples should be available, which must be drawn identically distributed and statistically independently from the same unknown distribution as the training examples [4].

To the best of our knowledge, this is the first time that results of a leave-one-subject-out cross-validation (LOSO CV) [4] for MSE detection are presented. LOSO CV involves training on the data of all subjects except one and validation on the data of the subject held out. This procedure is performed for each individual, such that the data of every individual are used n − 1 times for training and once for testing, where n is the number of individuals. By holding out the data of an individual, LOSO CV simulates the case that data of this individual are not available for training, but only later for testing the learned model. It asks how much the results are influenced by whether data of an individual are available, and therefore how generally the results of machine learning can be interpreted.
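As a minimal illustration of this evaluation protocol, the following Python sketch runs a LOSO CV loop over a feature matrix X, class labels t, and per-segment subject identifiers. These variable names, the use of scikit-learn, and the RBF-kernel SVM placed inside the loop are assumptions made for illustration, not the authors' implementation.

```python
# Sketch of leave-one-subject-out cross-validation (LOSO CV):
# for n subjects, each model is trained on n-1 subjects and tested on the one held out.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def loso_error_rates(X, t, subject_ids):
    """Return one mean test-error rate per held-out subject."""
    X, t, subject_ids = np.asarray(X), np.asarray(t), np.asarray(subject_ids)
    errors = {}
    for train_idx, test_idx in LeaveOneGroupOut().split(X, t, groups=subject_ids):
        # scaler and classifier are fitted on the other subjects only,
        # so no information about the held-out subject leaks into training
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
        clf.fit(X[train_idx], t[train_idx])
        held_out = subject_ids[test_idx][0]
        errors[held_out] = 1.0 - clf.score(X[test_idx], t[test_idx])
    return errors  # n entries, one mean test error per individual
```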
2 Material

The LOSO CV analysis is based on data of two studies performed in our driving simulation lab in the years 2007 [5] and 2016. The first was completed by 16 young adults (5 ♀, 11 ♂, age: 24.4 ± 3.1) and the second by 15 young adults (7 ♀, 8 ♂, age: 24.4 ± 2.8). The procedure was almost the same in both studies: seven driving sessions with a duration of 40 minutes each were started hourly between 1 and 8 am. All subjects wore an actometer for three days before the study, so that it could be verified, among other things, that sleep ended no earlier than 8 am and that sleep was initiated no later than 1 am in the two nights before the study. The design of both studies ensured that the following four factors were effective for achieving high drowsiness: (1) time since sleep was at least 16 hours, (2) time on task was relatively long (280 min), (3) time of day was near the circadian trough, and (4) high monotony occurred due to the driving task and the absence of communication with others.

There were differences between the two studies, particularly in the technical lab equipment, including the EEG devices. The SigmaPL-Pro (Neurowerk GmbH, Gelenau, Germany) was used in 2007 and the SomnoScreen (Somnomedics GmbH, Kist, Germany) in 2016. Electrodes were attached at eight positions (Fp1, Fp2, C3, C4, O1, O2, A1, A2; reference electrode: Cz; common average reference).

Based on video recordings of the eye region, head and shoulders, as well as of the driving scene, the starting times and lengths of MSE were determined. MSE were always determined based on observable behavioral characteristics, in particular prolonged eyelid closure and slow eye movements. This visual assessment was performed by a trained person with long experience in this field [3]. A total of 7903 events consisting of 3859 MSE and 4044 counterexamples was drawn from the recordings.

3 Methods

The machine learning methods achieved the lowest error rates if the EEG was segmented such that the beginning and end of the EEG segments were 1 s before and 3 s after the start time of the MSE. This setting was fixed uniformly for all channels and all subjects. Details of these empirical optimizations are addressed in section 4. Logarithmic power spectral densities (LogPSD) were estimated using the modified periodogram with Hann tapers. Alternatively, the following direct and indirect PSD estimation methods were tested: Welch, multi-taper, Yule-Walker, and Burg. For them, however, the final error rates of machine learning were slightly higher.

Additionally, feature extraction was extended to the discrete wavelet transform (DWT) and to the wavelet packet transform. Signal decomposition was performed up to level 8. The total power and the following entropies were estimated for each detail and for the approximation coefficient series: Shannon, threshold, SURE, l norm, and mean logarithmic instantaneous power [6]. The DWT using the Daubechies mother wavelet of order 2 (db2) at decomposition level 5, with the total power as well as all 5 types of entropies, provided the feature sets that proved most successful for machine learning. Thus, from each EEG segment a 216-dimensional feature vector was extracted, consisting of 6 measures (total power and 5 entropies) of 6 wavelet coefficient series (1 approximation, 5 details) in each of 6 EEG channels (the A1 and A2 channels were not processed). Comparisons of the machine learning performance for these DWT feature sets and for the LogPSD feature sets are presented in section 4.

Empirical optimization of the LogPSD estimates led to the following setting: spectral bands with a width of 1.0 Hz across the interval from 0.2 to 40.2 Hz. This way, the number of LogPSD values was 40 per channel and 240 in total. Processing a smaller number of channels led to increased errors.
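The LogPSD feature extraction described above can be sketched as follows. The sampling rate, the array layout of a 4-s segment, and the use of SciPy's periodogram with a Hann window are assumptions chosen to match the reported settings (1.0-Hz bands from 0.2 to 40.2 Hz, 40 values per channel, 6 channels), not the authors' exact code.

```python
# Sketch of LogPSD feature extraction: modified periodogram (Hann taper),
# log power in 1.0-Hz bands from 0.2 to 40.2 Hz, 6 channels -> 240 features.
import numpy as np
from scipy.signal import periodogram

def logpsd_features(segment, fs=256.0, f_lo=0.2, f_hi=40.2, bandwidth=1.0):
    """segment: array of shape (n_channels, n_samples), e.g. (6, 4 * fs). fs is assumed."""
    band_edges = np.arange(f_lo, f_hi + 1e-9, bandwidth)  # 0.2, 1.2, ..., 40.2 -> 40 bands
    features = []
    for ch in segment:
        freqs, psd = periodogram(ch, fs=fs, window="hann", detrend="constant")
        for lo, hi in zip(band_edges[:-1], band_edges[1:]):
            idx = (freqs >= lo) & (freqs < hi)
            # mean PSD within the 1-Hz band, log-transformed; small offset avoids log(0)
            features.append(np.log10(psd[idx].mean() + 1e-12))
    return np.asarray(features)  # 40 bands x 6 channels = 240 values per segment
```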
For supervised training of the machine learning methods, a sample containing pairs of feature vectors x and target variables t must be provided:

S = {(x_n, t_n) | x_n ∈ ℝ^d, t_n ∈ {0, 1}, n = 1, 2, …, N}    (1)

with N = 7903 and d = 240 for LogPSD, d = 216 for DWT. As binary target variable, the class label t = 1 was used for MSE examples and t = 0 for counterexamples.

The following three machine learning methods were applied in order to compare their performance:
- Learning vector quantization (LVQ)
- Gradient boosting (GB)
- Support-vector machines (SVM)

LVQ is a neural network consisting of one layer of neurons performing competitive learning: the neuron whose weight vector w_k is closest to the current input vector x_n in terms of a pre-selected vector norm wins the competition among all neurons and is adapted by the following learning rule:

Δw_c = ±η (x_n − w_c)    (2)

with step size η ∈ (0, 1), winner index c = argmin_k ‖x_n − w_k‖, x_n ∈ S, and prototypes {(w_k, τ_k) | w_k ∈ ℝ^d, τ_k ∈ {0, 1}, k = 1, …, K}, K ≪ N. The sign is positive if the class label τ_c of the winning neuron matches the target t_n, and negative otherwise.
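A minimal sketch of this LVQ1-style update is given below. The prototype initialization, the constant learning rate, and the number of prototypes per class are simplifying assumptions, not the authors' training configuration.

```python
# Minimal sketch of the LVQ1 update in equation (2): the winning prototype is
# pulled towards the input if its label matches the target, pushed away otherwise.
import numpy as np

def train_lvq1(X, t, n_prototypes_per_class=10, eta=0.05, epochs=30, seed=0):
    X, t = np.asarray(X, dtype=float), np.asarray(t)
    rng = np.random.default_rng(seed)
    protos, labels = [], []
    for cls in (0, 1):
        idx = rng.choice(np.flatnonzero(t == cls), n_prototypes_per_class, replace=False)
        protos.append(X[idx].copy())
        labels.append(np.full(n_prototypes_per_class, cls))
    W = np.vstack(protos)            # prototype weight vectors w_k
    tau = np.concatenate(labels)     # prototype class labels tau_k
    for _ in range(epochs):
        for n in rng.permutation(len(X)):
            c = np.argmin(np.linalg.norm(W - X[n], axis=1))   # winner of the competition
            sign = 1.0 if tau[c] == t[n] else -1.0
            W[c] += sign * eta * (X[n] - W[c])                # equation (2)
    return W, tau

def predict_lvq(W, tau, X):
    # assign each input to the class of its nearest prototype
    d = np.linalg.norm(np.asarray(X)[:, None, :] - W[None, :, :], axis=2)
    return tau[np.argmin(d, axis=1)]
```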
The gradient boosting (GB) algorithm creates an ensemble of decision trees. At each iteration, a new tree is created for each class such that it corrects the errors of the previous trees; this is achieved by following the negative gradient of the squared error loss function. Trees added to the ensemble are not modified further. In the end, for a given data example all trees are evaluated and a majority decision is made: the class supported by more trees is selected. Since GB tends to overfit relatively quickly, the decision trees should be limited in the number of branches per level and in depth. Here LightGBM was used, a numerically efficient GB variant developed by Microsoft [7], which can use an available graphics processing unit (GPU) in addition to the central processing unit. However, our data sets were too small to benefit from using the GPU. LightGBM tends to grow deeper rather than wider decision trees [7].

SVM aims at finding a mapping x ↦ t for any vector x ∈ ℝ^d based on all samples (x_n, t_n) ∈ S. The largest possible margin of a linear separation function wᵀx + b = 0 between the two class domains is sought, i.e. the parameters w ∈ ℝ^d and b ∈ ℝ must be optimized such that the distance between the separation function and the nearest feature vectors, the support vectors, is maximal. It has been proved that this problem has a unique solution. The corresponding Lagrangian

L(w, b, α) = ½ ‖w‖² − Σ_{n=1}^{N} α_n (t_n (wᵀx_n + b) − 1)    (3)

must be at a saddle point, because L(w, b, α) must be minimized with respect to w, b and maximized with respect to α. The solution can be formulated explicitly:

w = Σ_{n=1}^{N_S} α_n t_n x_n ,   b = t_S − wᵀx_S    (4)

Only the set of support vectors {x_n | α_n > 0} contributes to equation (4). If the training set is not separable, an error term C Σ_{i=1}^{N_sl} ξ_i with N_sl slack variables ξ_i ≥ 0 can be introduced to forgive classification errors (soft-margin principle). This leads to a restriction of the multipliers to the interval 0 ≤ α_n ≤ C, ∀n = 1, …, N. The regularization parameter C must be optimized empirically by minimizing mean training errors. Additionally, the solution should be searched in a high-dimensional reproducing kernel Hilbert space by applying an admissible kernel function k(x_n, x) in order to take advantage of the blessings of high dimensionality and to obtain non-linear separation functions in the input space. This way, the separation function changes from wᵀx + b = 0 to k(x_n, x) + b = 0, and equation (3) changes to L(w, b, α) = ½ ‖w‖² − Σ_{n=1}^{N} α_n (t_n (k(x_n, x) + b) − 1). Gaussian kernel functions k(x_n, x) = exp(−γ ‖x_n − x‖²) were used, which have only one free parameter γ to be optimized empirically.

Table 1: Mean and standard deviation of MSE detection errors estimated by repeated random subsampling using three different learning methods. The number of EEG segments N and the number of LogPSD features N_F were equal for all methods.

Method | N     | N_F | E_TRAIN [%] | E_TEST [%]
LVQ    | 7,903 | 240 |             |
GB     | 7,903 | 240 |             |
SVM    | 7,903 | 240 |             |

4 Results

First, the results of repeated random subsampling are presented. It turned out that the length of the EEG segments can be varied between 4 and 10 s without any degradation in classification performance. In order to have maximum time resolution, the segmentation length was set to 4 s. The second segmentation parameter, the time offset between the segment center and the beginning of the MSE, had a very sensitive influence on classification performance (Fig. 1). Only in a small interval can the algorithms learn accurately; it is optimal to set the center of the segment 1 s after the MSE starts. Also, LogPSD features resulted in lower error rates than DWT features.

Figure 1: Mean and standard deviation of classification errors on the test set versus the time offset between segment center and MSE start time. Results were estimated using the LVQ algorithm on LogPSD and on DWT feature sets.

For these optimal settings, the comparison of the LVQ, GB, and SVM learning algorithms resulted in small differences in mean errors and their standard deviations (Tab. 1). The computational load of SVM was at least 10 times higher than that of GB, which in turn was about the same factor higher than that of LVQ.
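A Table-1-style comparison by repeated random subsampling could be sketched as follows. The hyperparameter values, the number of repetitions, and the 80/20 split are placeholders rather than the empirically optimized settings of the paper, and LVQ is omitted because it is not part of scikit-learn (the LVQ sketch after equation (2) could be plugged in instead).

```python
# Sketch of repeated random subsampling: data from all subjects appear in both
# training and test sets; mean and standard deviation of test errors per method.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from lightgbm import LGBMClassifier  # assumes the lightgbm package is installed

def repeated_subsampling_errors(X, t, n_repeats=20, test_size=0.2, seed=0):
    models = {
        "SVM": lambda: make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma=0.01)),
        "GB":  lambda: LGBMClassifier(n_estimators=300, max_depth=6, num_leaves=31),
    }
    errors = {name: [] for name in models}
    for r in range(n_repeats):
        X_tr, X_te, t_tr, t_te = train_test_split(
            X, t, test_size=test_size, stratify=t, random_state=seed + r)
        for name, build in models.items():
            clf = build().fit(X_tr, t_tr)
            errors[name].append(1.0 - clf.score(X_te, t_te))
    # mean and standard deviation of the test error per learning method
    return {name: (float(np.mean(e)), float(np.std(e))) for name, e in errors.items()}
```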
It must be emphasized, however, that these are results from repeated random subsampling into independent training and test sets that include data from all subjects in both sets. Thus, the methods already have information about each subject during training. This is not the case for LOSO CV, in which the data of one subject are held out of training and used to estimate the mean test error. For 31 subjects, the average size of the training set is 30/31 · N = 7648 and that of the test set is 1/31 · N = 255. Since the data of each test person are kept out and the data of all others are used for training, there are 31 different mean test errors (Fig. 2).

The results show that there are large differences in learning performance. Data from most subjects were classified accurately by classifiers that learned from the data of all other subjects. However, if only the results of the best classifier (SVM) are considered, it turns out that five subjects are classified with errors higher than 10 %. That is, during training the methods did not obtain enough information from the data of the other 30 subjects to correctly classify the data of the subject held out. Comparison of the learning methods shows that in LOSO CV the SVM classifies almost always better than GB and LVQ, often much better.

Figure 2: Results of LOSO cross-validation of all subjects. Mean classification errors for three different learning algorithms.

5 Conclusions

The presented investigation shows that the classification of short-term EEG segments with respect to behavioral characteristics is possible at low error rates if the characteristics relate to a change in brain state, i.e. microsleep. It has been demonstrated that accurate learnability is given only in a short time interval around the onset of MSE. For segments that are a few seconds before or after MSE onset, such low errors cannot be achieved, because obviously no specific brain state can be defined there, but rather the normal multi-process mode of the awake brain, which is, however, limited by high drowsiness.

Direct estimation of the LogPSD by the modified periodogram led to higher accuracies than other estimation methods, which provide lower variance. This suggests that the trade-off between bias and variance must be chosen differently for machine learning methods: they obviously benefit from PSD estimation methods with lower bias at the price of higher variance. It remains a future challenge to find a suitable feature extraction from DWT coefficient series. Power and entropy measures were of little use for achieving low classification errors in the subsequent step.

LOSO CV simulates the case that, in the future, a set of examples of a new subject is added to a data set on which learning methods had already been trained in the past. It was found that the data of a few subjects could not be well explained by the data of the other 30 subjects. In order to explain this result, the raw recordings and the distributions of the extracted features were inspected. No easily identifiable cause was found. One subject (#30) had unusually strong and long-lasting blink activity in periods of high drowsiness. This activity superimposed all channels and may have led to a considerable bias in the estimated spectral features. Others were found to have poor signal quality, probably due to poor electrode contact resistance in combination with low-amplitude EEG.

It might also be that individually diverse EEG characteristics are a further explanation. This has already been reported from studies on features extracted from 30-s EEG segments during sleep. In studies with monozygotic and dizygotic twins, evidence was found of high individuality and high heritability of EEG features. One group of authors concluded that the EEG could possibly be the most heritable trait of humans [8]. We are working on carefully processing data from further studies in our laboratory and incorporating them into LOSO CV analyses in order to ultimately obtain indications for a conclusive explanation.

Author Statement
The authors state that no funding was involved and that there is no conflict of interest. Informed consent has been obtained from all individuals included in both studies. Ethical approval: The research related to human use complies with all relevant national regulations and institutional policies, was performed in accordance with the tenets of the Helsinki Declaration, and has been approved by the authors' institutional review board.
References
[1] Golz M, Sommer D. Automatic recognition of microsleep events. Biomedizinische Technik 2004;49(Suppl 2):332-3.
[2] Peiris M, Jones R, Davidson P, Bones P. Detecting behavioral microsleeps from EEG power spectra. Proc 28th EMBS Conf 2006:5723-6.
[3] Golz M, Schenka A, Sommer D, et al. The role of expert evaluation for microsleep detection. Current Directions in Biomedical Engineering 2015;1(1):92-5.
[4] Xu G, Huang JZ. Asymptotic optimality and efficient computation of the leave-subject-out cross-validation. The Annals of Statistics 2012;40(6):3003-30.
[5] Golz M, Sommer D, Trutschel U, et al. Evaluation of fatigue monitoring technologies. Somnology 2010;14(3):187-99.
[6] Misiti M, Misiti Y, Oppenheim G, Poggi JM. Wavelet Toolbox. The MathWorks Inc., Natick (MA), USA; 2017.
[7] Ke G, Meng Q, Finley T, Wang T, Chen W, et al. LightGBM: A highly efficient gradient boosting decision tree. Adv Neural Inform Proc Syst 2017:3146-54.
[8] De Gennaro L, Marzano C, Fratello F, et al. The EEG fingerprint of sleep is genetically determined: A twin study. Annals of Neurology 2008;64:455-60.

