Detection of Pitting in Gears Using a Deep Sparse Autoencoder
Qu, Yongzhi;He, Miao;Deutsch, Jason;He, David
2017-05-16 00:00:00
Applied Sciences, Article

Yongzhi Qu 1, Miao He 2, Jason Deutsch 2 and David He 2,3,*

1 School of Mechanical and Electronic Engineering, Wuhan University of Technology, Wuhan 430070, China; quwong@whut.edu.cn
2 Department of Mechanical and Industrial Engineering, University of Illinois at Chicago, Chicago, IL 60607, USA; mhe21@uic.edu (M.H.); jdeuts4@uic.edu (J.D.)
3 College of Mechanical Engineering and Automation, Northeastern University, Shenyang 110819, China
* Correspondence: davidhe@uic.edu; Tel.: +86-24-8369-6132

Academic Editor: César M. A. Vasques
Received: 15 March 2017; Accepted: 12 May 2017; Published: 16 May 2017

Abstract: In this paper, a new method for gear pitting fault detection is presented. The presented method is developed based on a deep sparse autoencoder. The method integrates dictionary learning in sparse coding into a stacked autoencoder network. Sparse coding with dictionary learning is viewed as an adaptive feature extraction method for machinery fault diagnosis. An autoencoder is an unsupervised machine learning technique. A stacked autoencoder network with multiple hidden layers is considered to be a deep learning network. The presented method uses a stacked autoencoder network to perform the dictionary learning in sparse coding and extract features from raw vibration data automatically. These features are then used to perform gear pitting fault detection. The presented method is validated with vibration data collected from gear tests with pitting faults in a gearbox test rig and compared with an existing deep learning-based approach.

Keywords: gear; pitting detection; deep sparse autoencoder; vibration; deep learning

1. Introduction

Gears are one of the most critical components in many industrial machines. Health monitoring and fault diagnosis of gears are necessary to reduce breakdown time and increase productivity.
Pitting is one of the most common gear faults and is normally difficult to detect. An undetected gear pitting fault during the operation of the gears can lead to catastrophic failures of the machines.

In recent years, many gear pitting fault detection methods have been developed. Following the classification of machine fault diagnostic and prognostic methods in [1,2], gear pitting fault detection methods can be classified into two main categories, namely model-based methods and data-driven methods. The model-based techniques rely on accurate dynamic models of the systems, while the data-driven approaches use data to train fault detection models. Model-based approaches obtain the residuals between the actual system and the model output. These residuals are then used as the indicator of the actual faults [3,4]. However, the model-based approaches require not only expertise in dynamic modeling, but also accurate condition parameters of the studied system. On the other hand, data-driven approaches do not require knowledge of the target system or dynamic modeling expertise. In comparison with model-based techniques, data-driven approaches can design a fault detection system that can be easily applied when massive data are available. Data-driven techniques are appropriate when a comprehensive understanding of system operation is absent, or when it is sufficiently difficult to model the complicated system [5].

Appl. Sci. 2017, 7, 515; doi:10.3390/app7050515; www.mdpi.com/journal/applsci

Data-driven gear pitting fault detection methods in general rely on feature extraction by human experts and complicated signal processing techniques. For example, Reference [6] used a zoomed phase map of continuous wavelet transform to detect minor damage such as gear pitting. References [7,8] used the mean frequency of a scalogram to get features for gear pitting fault detection.
Reference [9] extracted condition indicators from time-averaged vibration data for gear pitting damage detection. Reference [10] used empirical mode decomposition (EMD) to extract features from vibration signals for gear pitting detection. Reference [11] combined EMD and fast independent component analysis to extract features from stator current signals for gear pitting fault detection. References [12,13] applied spectral kurtosis to extract features for gear pitting fault detection. Reference [14] extracted statistical parameters of vibration signals in the frequency domain as inputs to an artificial neural network for gear pitting fault classification.

One challenge facing the abovementioned data-driven gear fault detection methods in the era of big data is that the features extracted from vibration signals depend greatly on prior knowledge of complicated signal processing and diagnosis expertise. Besides, features are selected for specific fault detection problems and may not be appropriate for different fault detection problems. An approach that can automatically and effectively self-learn gear fault features from big vibration data and effectively detect the gear fault is necessary to address the challenge.

As a data-driven approach, sparse coding is a class of unsupervised methods for learning sets of overcomplete bases to represent data efficiently. Unlike principal component analysis (PCA), which learns a complete set of basis vectors efficiently, sparse coding learns an overcomplete basis. This gives sparse coding the advantage of generating basis vectors that are able to better capture structures and patterns inherent in the input data. Recently, sparse coding-based methods have been developed for machinery fault diagnosis [15–18]. However, these methods used manually constructed overcomplete dictionaries that cannot be guaranteed to match the structures in the analyzed data.
Sparse coding with dictionary learning is viewed as an adaptive feature extraction method for machinery fault diagnosis [19]. The study reported in Reference [19] developed a feature extraction approach for machinery fault diagnosis using sparse coding with dictionary learning. In that approach, the dictionary is learned by solving a joint optimization problem alternately: one step for the dictionary and one for the sparse coefficients. One limitation of this approach is that solving the joint optimization problem alternately for massive data is NP-hard [20] and therefore is not efficient for automation.

In this paper, a new method is proposed. The proposed approach combines the advantages of sparse coding with dictionary learning in feature extraction and the self-learning power of the deep sparse autoencoder for dictionary learning. An autoencoder is an unsupervised machine learning technique, and a deep autoencoder is a stacked autoencoder network with multiple hidden layers. To the knowledge of the authors, no attempt to combine sparse coding with dictionary learning and a deep sparse autoencoder for gear pitting fault detection has been reported in the literature.

2. The Methodology

The general procedure of the presented method is shown in Figure 1. As shown in Figure 1, the presented method is composed of three main steps. First, the dictionary and the corresponding representation of the raw data are obtained through unsupervised learning by the deep sparse autoencoder. Then, a simple backpropagation neural network, constructed from the last hidden layer and the output layer, is trained to classify the healthy and pitting gear conditions using the learnt representation. Finally, with the learnt dictionary and trained classifier, the testing raw data are imported into the network for pitting fault detection. It should be noted that the dictionary learning process is an unsupervised learning process.
Thus, the representations regarded as features extracted from raw signals are learnt completely unsupervised, without fine-tuning. Sections 2.1 and 2.2 give a brief introduction to dictionary learning and autoencoders, respectively. Dictionary learning using the deep sparse autoencoder for gear pitting detection is explained in Section 2.3.

[Figure 1 flowchart labels: Raw vibration data; Data without known labels; Data with known labels; Dictionary learning using deep sparse autoencoder; Stacked dictionaries and sparse representations of signals; Gear pitting classifier training; Gear pitting detection model; Gear pitting detection results]

Figure 1. General procedure of the deep sparse autoencoder-based gear pitting detection.

2.1. Dictionary Learning

In recent years, the application of dictionary learning has been popularized in various fields, including image and speech recognition [21–25]. The study of dictionary learning applications in vision can be traced back to the end of the last century [26]. The goal of dictionary learning is to learn a basis for representation of the original input data. The expansion of dictionary learning based applications has benefited from the introduction of K-SVD [27,28]. K-SVD is an algorithm that decomposes the training data in matrix form into a dense basis and sparse coefficients.
Given an input signal $x = [x_1, x_2, \ldots, x_n]^T$, the basic dictionary learning formula can be expressed as:

$$\min_{D,S} \|x - DS\|_2 \tag{1}$$

where $D \in \mathbb{R}^{n \times K}$ represents the dictionary matrix to be learnt, with $n$ the number of data points in the input signal $x$ and $K$ the number of atoms in the dictionary $D$; each column $d_k$ of $D$ is a basis function, also known as an atom in dictionary learning; $S = [s_1, s_2, \ldots, s_K]^T$ contains the representation coefficients of the input signal $x$; and $\|\cdot\|_2$ is the $l_2$ norm used to assess the approximation accuracy.

The goal of dictionary learning is to learn a basis which can represent the samples in a sparse representation, meaning that $S$ is required to be sparse. Thus, the fundamental principle of dictionary learning with sparse representations is expressed as:

$$\min_{S} \|S\|_0 \tag{2}$$

subject to:

$$\|x - DS\|_2 \le \gamma \tag{3}$$

where $\|\cdot\|_0$ is the $l_0$ norm, which counts the nonzero entries of a vector and serves as a sparsity measurement, and $\gamma$ is the approximation error tolerance. As shown in Equation (2), solving the $l_0$ norm minimization is an NP-hard problem [20].
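The constrained problem in Equations (2) and (3) is usually attacked greedily rather than exactly. As a minimal sketch, the NumPy code below implements a basic orthogonal matching pursuit for a fixed dictionary. The function name `omp`, the toy dictionary and the tolerance are illustrative assumptions and are not taken from the presented method.

```python
import numpy as np

def omp(D, x, max_atoms, tol=1e-8):
    """Greedy sparse coding: approximate x with at most max_atoms columns of D."""
    x = np.asarray(x, dtype=float)
    s = np.zeros(D.shape[1])
    support, coef = [], np.zeros(0)
    residual = x.copy()
    for _ in range(max_atoms):
        if np.linalg.norm(residual) <= tol:      # error tolerance of Equation (3) met
            break
        corr = np.abs(D.T @ residual)            # correlation of every atom with the residual
        if support:
            corr[support] = 0.0                  # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Re-fit all selected coefficients by least squares (the "orthogonal" step)
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    if support:
        s[support] = coef
    return s

# Toy overcomplete dictionary (assumed): 8 unit vectors plus 7 normalized pair sums
n = 8
eye = np.eye(n)
pairs = np.stack([(eye[i] + eye[i + 1]) / np.sqrt(2) for i in range(n - 1)], axis=1)
D = np.hstack([eye, pairs])                      # shape (8, 15), unit-norm columns
x = 2.0 * D[:, 3] - 1.5 * D[:, 7]                # signal built from two atoms
s = omp(D, x, max_atoms=2)
print(np.count_nonzero(s), np.linalg.norm(x - D @ s))
```

The number of nonzero entries of `s` plays the role of the $l_0$ norm in Equation (2), and the residual norm is the quantity bounded in Equation (3).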
Thus, orthogonal matching pursuit (OMP) [29] is commonly used to solve the $l_0$ norm minimization approximately. As mentioned previously, the popular dictionary learning algorithm K-SVD was also developed with OMP. K-SVD consists of two main procedures. In the first procedure, the dictionary matrix is learnt; in the second, the learnt dictionary is used to represent the data sparsely. In the dictionary learning procedure, K-SVD estimates the atoms one at a time using a rank-one update. This strategy gives K-SVD a relatively low computing efficiency, since a singular value decomposition (SVD) is required in each iteration.

The basis functions of the dictionary matrix $D$ can be either manually constructed or automatically learned from the input data. Manually constructed basis functions are simple and lead to fast algorithms, but they match the structure of the analyzed data poorly. An adaptive dictionary should be learned from the input data through machine learning based methods, such that the basis functions capture a maximal amount of the structure in the data.

2.2. Autoencoder

The structure of an autoencoder is shown in Figure 2. A typical autoencoder contains two parts, namely the encoding part and the decoding part. As shown in Figure 2, the encoding part maps the input data to the latent expression in the hidden layer, and the decoding part then reconstructs the original data from the latent expression as output. In an autoencoder, all the neurons in the input layer are connected to all the neurons in the hidden layer, and vice versa. With a given input data vector $x$ (bias term included), the latent expression in the hidden layer $h$ can be written as:

$$h = f(wx) \tag{4}$$

where $w$ represents the weight matrix between the neurons in the input layer and those in the hidden layer, and $f$ the non-linear activation function used to smooth the output of the hidden layer.
Commonly, the activation function is selected as the sigmoid or tanh function. The decoding portion maps the latent expression back to the data space as:

$$\hat{x} = f_d\left(w' f_e(wx)\right) \tag{5}$$

where $\hat{x}$ represents the reconstructed data mapped from the latent expression in the hidden layer, $f_e$ the encoding activation function (written as $f$ in Equation (4)), $w' = w^T$ the weight matrix between the hidden layer and the output layer, and $f_d$ the activation function used to smooth the output layer results. Likewise, $f_d$ is usually selected as the sigmoid or tanh function. The objective in the autoencoder training procedure is to obtain the set of encoding weights $w$ and decoding weights $w'$ such that the error between the original input data and the reconstructed data is minimized. The learning objective can be written as:

$$\underset{w,\,w'}{\arg\min}\ \|x - \hat{x}\| \tag{6}$$

Because the activation functions in Equation (5) are smooth and continuously differentiable, the problem in Equation (6), although non-convex, can be solved by gradient descent techniques.

[Figure 2 labels: Input layer/Input data; Encoding; Hidden layer; Decoding; Output layer]

Figure 2. Scheme of single autoencoder.

Furthermore, multiple autoencoders can be stacked to construct a deep structure. The deep autoencoder structure is illustrated in Figure 3.

[Figure 3 labels: Input layer; Hidden layer 1; Hidden layer 2; Hidden layer m; Autoencoder 1; Autoencoder 2; Autoencoder m]

Figure 3. Scheme of deep autoencoder structure.
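The single-autoencoder mapping of Equations (4) to (6) can be illustrated with a short NumPy sketch. It assumes sigmoid activations for both encoding and decoding and tied weights $w' = w^T$. The layer sizes, the random initialization and the helper names are hypothetical choices made only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_forward(x, w):
    """Tied-weight autoencoder: h = f(w x), x_hat = f_d(w^T h)."""
    h = sigmoid(w @ x)            # Equation (4): latent expression
    x_hat = sigmoid(w.T @ h)      # Equation (5) with w' = w^T
    return h, x_hat

def reconstruction_error(x, x_hat):
    """Quantity minimized over the weights in Equation (6)."""
    return float(np.linalg.norm(x - x_hat))

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=16)         # toy input scaled to the sigmoid range
w = rng.normal(0.0, 0.1, size=(4, 16))     # 16 inputs, 4 hidden units (assumed sizes)
h, x_hat = autoencoder_forward(x, w)
print(h.shape, x_hat.shape, reconstruction_error(x, x_hat))
```

With untrained random weights the reconstruction error is large; training would adjust `w` by gradient descent to reduce it.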
For the deep autoencoder shown in Figure 3, the overall cost function can be expressed based on Equation (6) as:

$$\min_{w_1,\ldots,w_m,\,w'_1,\ldots,w'_m} \|x - \hat{x}\|, \qquad \hat{x} = f_{d1}\left\{w'_1 f_{d2}\left[w'_2 \cdots f_{dm}\left(w'_m E(x)\right)\right]\right\}$$

and

$$E(x) = f_{em}\left\{w_m f_{e(m-1)}\left[w_{m-1} \cdots f_{e1}(w_1 x)\right]\right\} \tag{7}$$

where $w_i$ and $w'_i$ ($i = 1, 2, \ldots, m$) represent the encoding and decoding weight matrices of the $i$-th autoencoder in the network, respectively, and $f_{ei}$ and $f_{di}$ the encoding and decoding activation functions of the $i$-th autoencoder. The large number of parameters (weight matrices) in Equation (7) leads to computational challenges and overfitting. Thus, searching for an appropriate solution is commonly accomplished through layer-wise learning.

2.3. Dictionary Learning Using Deep Sparse Autoencoder

Based on the previously reviewed dictionary learning and stacked autoencoder models, a deep sparse autoencoder based dictionary learning method is presented in this section.
Like the structure of a deep autoencoder, the deep sparse autoencoder based dictionary learning can be illustrated as in Figure 4.

[Figure 4 labels: X; D; S]

Figure 4. Scheme of stacked dictionary learning.

As shown in Figure 4, each dashed block represents a shallow (single) dictionary learning process. The first dictionary learning process can be written as:

$$X = D_1 S_1 \tag{8}$$

where $X = \{x_i \in \mathbb{R}^d\}_{i=1}^{N}$ stands for the set of $N$ input signals, $x_i$ the signal vector with a length of $d$, $D_1 = \{d_1^1, d_1^2, \ldots, d_1^j, \ldots, d_1^K\}$ for $d_1^j \in \mathbb{R}^d$ the first learnt dictionary, and $S_1 = \{s_1^1, s_1^2, \ldots, s_1^g, \ldots, s_1^N\}$ for $s_1^g \in \mathbb{R}^K$ the first latent expression of $X$ in $D_1$. Treating the deep autoencoder as a dictionary learning network, one can define $D'_1 = \{d'^1_1, d'^2_1, \ldots, d'^p_1, \ldots, d'^K_1\}$ for $d'^p_1 \in \mathbb{R}^d$ as the reconstruction weight from the latent expression $S_1$ to the original input $X$. Like the expression in the autoencoder, the reconstructed input data $\hat{X}$ can be written as:

$$\hat{X} = f_d\left[D'_1 f_e(S_1)\right] \tag{9}$$

where $f_e$ and $f_d$ represent the encoding and decoding activation functions, respectively. Substituting $S_1$ in Equation (9) using Equation (8), $\hat{X}$ can be written as:

$$\hat{X} = f_d\left\{D'_1 f_e\left[D_1^{-1} X\right]\right\} \tag{10}$$

In this study, the activation functions for both the encoding and decoding processes are selected as the sigmoid function. Dropping the layer index for brevity, the cost function of dictionary learning using the deep sparse autoencoder can be expressed as:

$$\min_{D,\,D'}\ \frac{1}{2N}\|X - \hat{X}\|^2 + \beta \sum_{j=1}^{K} KL(\rho \,\|\, \hat{\rho}_j) \tag{11}$$

$$KL(\rho \,\|\, \hat{\rho}_j) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1 - \rho)\log\frac{1 - \rho}{1 - \hat{\rho}_j} \tag{12}$$

$$\hat{\rho}_j = \left[1 + \exp\left(-D^{-1} X\right)\right]^{-1} \tag{13}$$

where $N$ represents the number of input vectors, $\beta$ the parameter controlling the weight of the sparsity penalty term, $\rho$ the sparsity parameter, and $\hat{\rho}_j$ the average activation of the hidden unit $j$ over all $N$ training samples.
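The sparsity term in Equations (11) and (12) is simple to evaluate numerically. The sketch below, in plain Python, computes the KL penalty for a few hypothetical average activations. The values of `rho`, `beta` and the activation list are illustrative assumptions, not the settings of the experiments.

```python
import math

def kl_bernoulli(rho, rho_hat):
    """KL divergence of Equation (12) between target sparsity rho and average activation rho_hat."""
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))

def sparsity_penalty(rho, rho_hats, beta):
    """Second term of Equation (11): beta times the sum of KL terms over hidden units."""
    return beta * sum(kl_bernoulli(rho, r) for r in rho_hats)

rho, beta = 0.05, 3.0                      # illustrative values only
avg_activations = [0.05, 0.10, 0.50]       # hypothetical average activations of three hidden units
for r in avg_activations:
    print(r, kl_bernoulli(rho, r))
print(sparsity_penalty(rho, avg_activations, beta))
```

A hidden unit whose average activation equals the target contributes nothing to the penalty, while units that fire more often than the target are penalized increasingly, which is what pushes the learnt representation towards sparsity.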
The sparsity penalty term is defined as the Kullback-Leibler (KL) divergence, which is used to measure the difference between two distributions. It satisfies $KL(\rho \,\|\, \hat{\rho}_j) = 0$ when $\hat{\rho}_j = \rho$; otherwise the KL divergence increases as the difference between $\hat{\rho}_j$ and $\rho$ increases. In comparison with the similar k-sparse autoencoder proposed in [30], the advantages of the deep sparse autoencoder include:

(1) The introduction of the sparsity penalty leads to the automatic determination of the sparsity rather than a pre-defined k-sparsity. It enables the deep sparse autoencoder to extract the sparse features more accurately based on the characteristics of the data.
(2) The dictionary is learnt in the encoding procedure. The encoding dictionary is different from the encoding weight matrix.
(3) The deep sparse autoencoder does not require a fine-tuning process, while the performance of k-sparse autoencoders relies on a supervised fine-tuning process.

In the deep autoencoder, the output of a hidden layer in the previous autoencoder can be taken as the input to the next autoencoder. Let the first layer of the $k$-th autoencoder in the deep autoencoder be the $k$-th layer and the second layer be the $(k+1)$-th layer. Also, let $D_k$ and $D'_k$ be the dictionary and reconstruction weight for the $k$-th layer in the deep autoencoder. The encoding procedure in the $k$-th autoencoder can then be expressed as:

$$a^{k} = f\left(z^{k}\right) \tag{14}$$

$$z^{k+1} = D_k^{-1} a^{k} \tag{15}$$

where $a^{k}$ stands for the output of the $k$-th layer, and $z^{k}$ and $z^{k+1}$ the inputs for the $k$-th and $(k+1)$-th layers, respectively.

Similarly, the decoding procedure in the $k$-th autoencoder can be expressed as:

$$a^{k} = f\left(z^{k}\right) \tag{16}$$

$$z^{k-1} = D'_k a^{k} \tag{17}$$

Thus, the original input $X$ can be expressed by the latent expression $S_k$ in the $(k+1)$-th layer as:

$$X = (D_1 D_2 \cdots D_k) S_k \tag{18}$$

where $D_k$ represents the learnt dictionary in the $k$-th dictionary learning process, and $S_k$ the latent expression of $X$ in $D_k$. The stacked dictionaries $(D_1 D_2 \cdots D_k)$ will be learnt in a greedy layer-by-layer way.
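The greedy layer-by-layer idea can be sketched as follows: each sparse autoencoder is trained on the output of the previous one, and only its encoder is kept. This is a minimal NumPy illustration using standard sparse-autoencoder gradients with untied encoder and decoder weights and no bias terms. It is not the authors' implementation, and the layer sizes, learning rate, epoch count and sparsity settings are assumed values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sparse_autoencoder(X, n_hidden, rho=0.05, beta=3.0, lr=0.5, epochs=200, seed=0):
    """Train one sparse autoencoder on X (N samples x d features) by gradient descent.
    Minimizes squared reconstruction error plus a KL sparsity penalty; returns encoder weights."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    W = rng.normal(0.0, 0.1, size=(d, n_hidden))       # encoder weights
    W_dec = rng.normal(0.0, 0.1, size=(n_hidden, d))   # decoder weights
    for _ in range(epochs):
        A = sigmoid(X @ W)                              # hidden activations
        X_hat = sigmoid(A @ W_dec)                      # reconstruction
        rho_hat = np.clip(A.mean(axis=0), 1e-6, 1 - 1e-6)  # average activation per hidden unit
        delta_out = (X_hat - X) * X_hat * (1 - X_hat) / N
        sparse_grad = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat)) / N
        delta_hid = (delta_out @ W_dec.T + sparse_grad) * A * (1 - A)
        W_dec -= lr * (A.T @ delta_out)
        W -= lr * (X.T @ delta_hid)
    return W

def greedy_stack(X, layer_sizes):
    """Train the autoencoders one at a time; each layer's codes feed the next layer."""
    weights, codes = [], X
    for size in layer_sizes:
        W = train_sparse_autoencoder(codes, size)
        codes = sigmoid(codes @ W)
        weights.append(W)
    return weights, codes

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(20, 32))               # toy data: 20 samples of length 32
weights, features = greedy_stack(X, layer_sizes=[16, 8, 4])
print([W.shape for W in weights], features.shape)
```

Because every layer is optimized separately, each training problem involves only one pair of weight matrices, which is the practical appeal of the layer-wise scheme.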
The greedy layer-by-layer learning guarantees the convergence at each layer.

3. Gear Test Experimental Setup and Data Collection

The gear pitting tests were performed on a single stage gearbox installed in an electrically closed transmission test rig. The gearbox test rig includes two 45 kW Siemens servo motors. One of the motors can act as the driving motor while the other can be configured as the load motor. The configuration of the driving mode is flexible. Compared with a traditional open loop test rig, the electrically closed test rig is economically more efficient and can be configured with virtually arbitrary load and speed specifications within the rated power. The overall gearbox test rig, excluding the control system, is shown in Figure 5.
Figure 5. The gearbox test rig.

The testing gearbox is a single stage gearbox with spur gears. The gearbox has a speed reduction of 1.8:1. The input driving gear has 40 teeth and the driven gear has 72 teeth. The 3-D geometric model of the gearbox is shown in Figure 6.

Figure 6. 3-D model of the gears under testing: (a) gear models in 3-D; (b) gear models in 2-D.
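The tooth counts given above, together with the 3 mm module and 20° pressure angle listed in Table 1, determine several of the tabulated values through standard spur-gear relations. The plain-Python sketch below recomputes them as a consistency check. The relations are textbook formulas and the variable names are illustrative.

```python
import math

MODULE_MM = 3.0
PRESSURE_ANGLE_DEG = 20.0
TEETH = {"driving": 40, "driven": 72}

def pitch_diameter(module_mm, teeth):
    """Pitch diameter = module x number of teeth."""
    return module_mm * teeth

def base_circle_diameter(pitch_d_mm, pressure_angle_deg):
    """Base circle diameter = pitch diameter x cos(pressure angle)."""
    return pitch_d_mm * math.cos(math.radians(pressure_angle_deg))

reduction_ratio = TEETH["driven"] / TEETH["driving"]
circular_pitch_mm = math.pi * MODULE_MM
diametral_pitch = 25.4 / MODULE_MM          # teeth per inch of pitch diameter
center_distance_mm = 0.5 * sum(pitch_diameter(MODULE_MM, z) for z in TEETH.values())

print("reduction ratio:", reduction_ratio)
for name, z in TEETH.items():
    d = pitch_diameter(MODULE_MM, z)
    print(name, "pitch diameter (mm):", d,
          "base circle diameter (mm):", round(base_circle_diameter(d, PRESSURE_ANGLE_DEG), 3))
print("circular pitch (mm):", round(circular_pitch_mm, 5))
print("diametral pitch:", round(diametral_pitch, 4))
print("theoretical center distance (mm):", center_distance_mm)
```

The recomputed values can be compared line by line with the pitch diameter, base circle diameter, circular pitch, diametral pitch and theoretical center distance rows of Table 1.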
Gear parameters are provided in Table 1.

Table 1. List of gear parameters for the tested gearbox.

| Gear Parameter | Driving Gear | Driven Gear |
|---|---|---|
| Tooth number | 40 | 72 |
| Module | 3 mm | 3 mm |
| Base circle diameter | 112.763 mm | 202.974 mm |
| Pitch diameter | 120 mm | 216 mm |
| Pressure angle | 20° | 20° |
| Addendum coefficient | 1 | 1 |
| Coefficient of top clearance | 0.25 | 0.25 |
| Diametral pitch | 8.4667 | 8.4667 |
| Engaged angle | 19.7828° | 19.7828° |
| Circular pitch | 9.42478 mm | 9.42478 mm |
| Addendum | 4.5 mm | 3.588 mm |
| Dedendum | 2.25 mm | 3.162 mm |
| Addendum modification coefficient | 0.5 | 0.196 |
| Addendum modification | 1.5 mm | 0.588 mm |
| Fillet radius | 0.9 mm | 0.9 mm |
| Tooth thickness | 5.8043 mm | 5.1404 mm |
| Tooth width | 85 mm | 85 mm |
| Theoretical center distance | 168 mm | 168 mm |
| Actual center distance | 170.002 mm | 170.002 mm |

The pitting fault was simulated by using an electrical discharge machine to erode the gear tooth face.
The pitting location is on one of the teeth of the output driven gear with 72 teeth. The gear tooth face was eroded to a depth of approximately 0.5 mm. One row of pitting faults was created along the tooth width. The simulated pitting fault is shown in Figure 7.

Figure 7. Simulated gear pitting fault.

A tri-axial accelerometer was attached on the gearbox case close to the bearing house on the output end, as shown in Figure 8.

[Figure 8 label: Tri-axial accelerometer]

Figure 8. Vibration and torque measurement for the testing gearbox.

Both healthy and pitted gearboxes were run under various operating conditions and the vibration signals were collected. The tested operating conditions are listed in Table 2. The vibration signals were collected with a sampling rate of 20.48 kHz.

Table 2. Operating conditions of the experiments.

| Speed (rpm) | 100 | 200 | 500 | 1000 |
|---|---|---|---|---|
| Torque (Nm) | 50/100/200/300/400/500 | 50/100/200/300/400/500 | 50/100/200/300/400/500 | 50/100/200/300/400/500 |

Figure 9 shows the raw vibration signals collected for the normal gear and the pitting gear at loading conditions of 100 Nm and 500 Nm.

[Figure 9 panels (a) to (d): raw vibration waveform plots]

Figure 9. The waveforms of raw vibration data for normal and pitting gear at various rotating speeds under loading conditions of 100 Nm and 500 Nm: (a) waveforms of healthy signals with 100 Nm; (b) waveforms of healthy signals with 500 Nm; (c) waveforms of faulty signals with 100 Nm; (d) waveforms of faulty signals with 500 Nm.

4. The Validation Results

The proposed deep sparse autoencoder structure was implemented to accomplish the greedy deep dictionary learning.
The vibration signals along the vertical z direction were used in this study since they contain the richest vibration information among the three monitored directions. At first, the gear pitting detection was carried out using signals with light loading as the training data and signals with heavy loading as the testing data. Loadings of 100 Nm torque and 500 Nm torque were used as the light loading condition and the heavy loading condition, respectively. Signals at rotating speeds of 100, 200, 500 and 1000 rpm were used for validation tests. To study the influence of different rotating speeds on the gear pitting fault detection performance of the deep sparse autoencoder, 100 and 1000 rpm were selected as the low and high speeds for independent validations. The length of the samples was decided to ensure that at least one revolution of the output driven gear was included. Therefore, there were 23,000 data points in each sample for signals at 100 rpm and 15,000 data points for signals at a speed of at least 200 rpm.
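The sample lengths quoted above can be checked against the one-revolution requirement. The plain-Python sketch below assumes that the listed speeds refer to the input (driving) shaft, so that the output gear turns 1.8 times slower, and uses the 20.48 kHz sampling rate. The function name is illustrative.

```python
SAMPLING_RATE_HZ = 20480            # 20.48 kHz
DRIVING_TEETH, DRIVEN_TEETH = 40, 72

def points_per_output_revolution(input_rpm):
    """Number of samples recorded during one revolution of the driven gear."""
    output_rpm = input_rpm * DRIVING_TEETH / DRIVEN_TEETH   # assumes rpm is the input speed
    seconds_per_rev = 60.0 / output_rpm
    return SAMPLING_RATE_HZ * seconds_per_rev

for rpm, sample_len in [(100, 23000), (200, 15000), (500, 15000), (1000, 15000)]:
    needed = points_per_output_revolution(rpm)
    print(rpm, round(needed, 1), sample_len >= needed)
```

Under this assumption, one output revolution at 100 rpm takes 1.08 s, or about 22,118 samples, so a 23,000-point sample covers it; at 200 rpm and above, fewer samples are needed and 15,000 points suffice.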
With these sample lengths, 26 samples of signals at 100 rpm were generated for each of the healthy and pitting gear conditions, giving 52 samples in the training dataset and 52 samples in the testing dataset. Similarly, 40 samples of signals at 1000 rpm were generated for each gear condition, giving 80 samples in the training dataset and 80 samples in the testing dataset. The deep sparse autoencoder was structured separately for signals at 100 rpm and for signals at speeds above 100 rpm: one input layer (23,000 neurons for signals at 100 rpm and 15,000 neurons for signals at speeds above 100 rpm), four hidden layers (1000-500-200-50 neurons), and one output layer (2 neurons). Following the suggestions in [31], the sparsity parameters of each sparse autoencoder were set to β = 3 and ρ = 0.005. The sparse representations of the original signals were then fed into a classifier for pitting gear fault detection.
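To make the role of the two sparsity settings concrete, the following sketch evaluates the Kullback-Leibler sparsity penalty used in the standard sparse autoencoder formulation of the lecture notes cited as [31], with a penalty weight of 3 and a target mean activation of 0.005. It is an illustration under assumptions, not the authors' implementation: the function name, the batch of activations, and the use of sigmoid-range activations in (0, 1) are all hypothetical.

```python
import numpy as np

def kl_sparsity_penalty(hidden_activations, rho=0.005, beta=3.0):
    """Return beta * sum_j KL(rho || rho_hat_j) for activations in (0, 1).

    hidden_activations: array of shape (n_samples, n_hidden).
    rho: target mean activation; beta: weight of the penalty term.
    """
    # rho_hat_j is the mean activation of hidden unit j over the batch.
    rho_hat = hidden_activations.mean(axis=0)
    # Clip to keep the logarithms finite.
    rho_hat = np.clip(rho_hat, 1e-8, 1 - 1e-8)
    kl = (rho * np.log(rho / rho_hat)
          + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return beta * kl.sum()

# Units that already sit at the target activation incur (almost) no penalty,
# while units that are active half of the time are penalized.
at_target = np.full((4, 10), 0.005)
too_active = np.full((4, 10), 0.5)
print(kl_sparsity_penalty(at_target), kl_sparsity_penalty(too_active))
```

In training, this term would be added to the reconstruction error of each autoencoder layer, so a small target activation pushes most hidden units toward zero for any given input.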
The last hidden layer of 50 neurons and the output layer of 2 neurons were constructed as a simple backpropagation neural network serving as the classifier for gear pitting detection. The two neurons in the output layer classify the input signal as either gear pitting fault or normal gear. The training parameters of the backpropagation neural network classifier were set as follows: 100 training epochs, a learning rate of 0.05, and a momentum of 0.05. For each gear condition, the models were run 5 times to obtain the average detection accuracy. The detection results are shown in Table 3.

Table 3. Detection results at 100 and 1000 rpm (trained with light loading samples and tested with heavy loading samples).

| Gear Conditions | Training Accuracy (100 Nm) (100 rpm/1000 rpm) | Testing Accuracy (500 Nm) (100 rpm/1000 rpm) |
|---|---|---|
| Healthy gear | 100%/99.50% | 99.23%/98.84% |
| Pitting gear | 98.43%/100% | 98.43%/98.91% |
| Overall accuracy | 99.22%/99.75% | 98.83%/98.88% |

The detection results in Table 3 show a good adaptive learning performance of the presented method. The overall testing accuracy reaches 98.83% at 100 rpm and 98.88% at 1000 rpm, which is slightly lower than the training accuracy. A possible explanation is that signals recorded under the light loading condition contain less fault-related information. Furthermore, the same deep sparse autoencoder structure was tested with heavy loading training data and light loading testing data. The detection results are shown in Table 4.

Table 4. Detection results at 100 and 1000 rpm (trained with heavy loading samples and tested with light loading samples).

| Gear Conditions | Training Accuracy (500 Nm) (100 rpm/1000 rpm) | Testing Accuracy (100 Nm) (100 rpm/1000 rpm) |
|---|---|---|
| Healthy gear | 100%/99.95% | 100%/99.90% |
| Pitting gear | 100%/100% | 99.23%/100% |
| Overall accuracy | 100%/99.98% | 99.62%/99.95% |

It can be observed that, compared with the results in Table 3, the testing accuracy in Table 4 is slightly higher (99.62%/99.95% versus 98.83%/98.88% overall). The better adaptive feature extraction and fault detection results arise because signals recorded under the heavy loading condition contain more fault-related information. The results in Tables 3 and 4 show that the rotating shaft speed has only a marginal influence on the pitting fault detection performance.

Moreover, signals with stable loading and mixed rotating speeds were also tested in the study. Each dataset used in the validation is described in Table 5. The detection results are presented in Tables 6 and 7.

Table 5. Dataset description.

| Dataset | Loading Condition of the Training Dataset (Nm) | Loading Condition of the Testing Dataset (Nm) | Rotating Speed (rpm) | Length of Signal Sample |
|---|---|---|---|---|
| A | 100 | 500 | 100 | 23,000 |
| B | 500 | 100 | 100 | 23,000 |
| C | 100 | 500 | 1000 | 15,000 |
| D | 500 | 100 | 1000 | 15,000 |
| E | 100 | 500 | 100/200/500/1000 | 15,000 |
| F | 500 | 100 | 100/200/500/1000 | 15,000 |

Table 6. Detection results at mixed rotating speeds (trained with light loading samples and tested with heavy loading samples).

| Gear Conditions | Training Accuracy (100 Nm) | Testing Accuracy (500 Nm) |
|---|---|---|
| Healthy gear | 99.45% | 97.21% |
| Pitting gear | 99.65% | 97.05% |
| Overall accuracy | 99.55% | 97.13% |

Table 7. Detection results at mixed rotating speeds (trained with heavy loading samples and tested with light loading samples).

| Gear Conditions | Training Accuracy (500 Nm) | Testing Accuracy (100 Nm) |
|---|---|---|
| Healthy gear | 99.45% | 99.94% |
| Pitting gear | 99.58% | 99.84% |
| Overall accuracy | 99.52% | 99.89% |

As before, torque loadings of 100 and 500 Nm were selected as the light and heavy loading conditions.
For each loading condition, 52 samples at 100 rpm and 80 samples at the speeds above 100 rpm were generated for the healthy and pitting gear conditions. Compared with the results in Tables 3 and 4, the detection accuracies in Tables 6 and 7 for both cases (trained with light loading samples and tested with heavy loading samples, and vice versa) are slightly lower, yet the overall testing accuracies obtained by the deep sparse autoencoders remain as high as 97.13% and 99.89%. These results indicate that the deep sparse autoencoders perform well regardless of the rotating speed. Furthermore, they show the capability of the deep sparse autoencoders to automatically extract adaptive features from the raw vibration signals. The validation results demonstrate the good robustness of the deep sparse autoencoders for gear pitting detection, with little influence from working conditions, including loading and rotating speed.

For comparison, a typical autoencoder-based deep neural network (DNN) presented in [32] was selected to detect the gear pitting fault using the same data. The DNN was designed with a structure similar to that of the deep sparse autoencoder, namely one input layer (23,000 or 15,000 neurons), four hidden layers (1000-500-200-50 neurons), and one output layer (2 neurons). As in the deep sparse autoencoder, the last hidden layer and the output layer of the DNN were designed as a backpropagation neural network classifier. Since an autoencoder-based neural network normally requires a supervised fine-tuning process for better classification, the DNN was tested both without and with supervised fine-tuning. The detection results of the DNN are provided in Tables 8 and 9.

Table 8. Detection results of DNN at mixed rotating speeds (trained with light loading samples and tested with heavy loading samples).

| Gear Conditions | Training Accuracy (100 Nm), Without Supervised Fine-Tuning | Training Accuracy (100 Nm), With Supervised Fine-Tuning | Testing Accuracy (500 Nm), Without Supervised Fine-Tuning | Testing Accuracy (500 Nm), With Supervised Fine-Tuning |
|---|---|---|---|---|
| Healthy gear | 85.25% | 90.50% | 83.42% | 89.85% |
| Pitting gear | 85.85% | 89.95% | 81.15% | 88.24% |
| Overall accuracy | 85.55% | 90.23% | 82.29% | 89.05% |

Table 9. Detection results of DNN at mixed rotating speeds (trained with heavy loading samples and tested with light loading samples).

| Gear Conditions | Training Accuracy (500 Nm), Without Supervised Fine-Tuning | Training Accuracy (500 Nm), With Supervised Fine-Tuning | Testing Accuracy (100 Nm), Without Supervised Fine-Tuning | Testing Accuracy (100 Nm), With Supervised Fine-Tuning |
|---|---|---|---|---|
| Healthy gear | 82.18% | 90.25% | 84.25% | 91.50% |
| Pitting gear | 84.15% | 88.85% | 85.17% | 91.50% |
| Overall accuracy | 83.17% | 89.55% | 84.71% | 91.50% |

Comparing the results obtained by the DNN in Tables 8 and 9 with those obtained by the deep sparse autoencoder, one can see that the deep sparse autoencoder performs better than the DNN-based approach for gear pitting fault detection. In both cases, the detection accuracies obtained by the DNN are much lower than those of the deep sparse autoencoder. Compared with the DNN, the presented method is more robust in automatically extracting the adaptive features for gear pitting detection. In addition, the presented method does not require the supervised fine-tuning process. This advantage will increase the computational efficiency and enhance the robustness of gear pitting fault detection when dealing with massive data.

To verify the ability of the presented method to extract adaptive features automatically, principal component analysis (PCA) was employed to visualize the extracted features, using an approach similar to that in [33]. The values of the neurons in the last hidden layer were regarded as pitting fault features since they were used for pitting detection in the output layer. Therefore, 50 features were obtained by the deep sparse autoencoders. The first two principal components were used for the visualization since they carried more than 90% of the information in the feature domain. Since the pitting gear detection results at a rotating speed of 1000 rpm are slightly more accurate than those at 100 rpm, only the features obtained using signals at 100 rpm were plotted for observation. The scatter plots of the principal components of the features automatically extracted from datasets A, B, E and F are presented in Figure 10.

Figure 10. Scatter plot of principal components for the features extracted from: (a) dataset A; (b) dataset B; (c) dataset E, and (d) dataset F.

It can be observed from Figure 10 that the features of the same health condition are grouped in corresponding clusters that are clearly separated from each other. Compared with Figure 10a,c, Figure 10b,d show better clustering performance and a clearer separation boundary between the healthy and pitting gear conditions. This could be because the features of the heavy loading conditions in A and E are extracted using deep sparse autoencoders trained with data from light loading conditions. The fault features of light loading conditions are normally less significant than those of heavy loading conditions.
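The PCA step used for the visualization can be sketched as follows. This is a minimal illustration rather than the authors' code: the 50-dimensional feature vectors are synthetic stand-ins for last-hidden-layer activations, the two cluster centres are arbitrary, and `pca_project` is a hypothetical helper built on a plain singular value decomposition.

```python
import numpy as np

def pca_project(features, n_components=2):
    """Project rows of `features` onto the leading principal components.

    Returns the projected scores and the fraction of total variance
    carried by the retained components.
    """
    centered = features - features.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    variance = singular_values ** 2
    explained_ratio = variance[:n_components].sum() / variance.sum()
    return scores, explained_ratio

rng = np.random.default_rng(0)
# Synthetic 50-dimensional features: 26 "healthy" and 26 "pitting" samples.
healthy = rng.normal(0.2, 0.05, size=(26, 50))
pitting = rng.normal(0.8, 0.05, size=(26, 50))
features = np.vstack([healthy, pitting])

scores, ratio = pca_project(features)
print(scores.shape, round(float(ratio), 3))
```

Plotting the two columns of `scores` against each other, coloured by gear condition, gives a scatter plot of the same kind as Figure 10; the returned ratio corresponds to the share of information that the text reports as exceeding 90% for the real features.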
5. Conclusions

Gears are one of the most critical components in many industrial machines, and pitting is one of the most common gear faults and is normally difficult to detect. A gear pitting fault that goes undetected during operation can lead to catastrophic failure of the machine. In this paper, a new method for gear pitting fault detection was presented. The presented method was developed based on a deep sparse autoencoder that integrates dictionary learning in sparse coding into a stacked autoencoder network. The presented method uses a stacked autoencoder network to perform the dictionary learning in sparse coding and automatically extract features from raw vibration data. These features are then used to train a simple backpropagation neural network to perform pitting fault detection. The presented method was validated with vibration data collected from tests with gear pitting faults in a gearbox test rig and compared with a deep neural network based approach. In the validation tests, data obtained from one loading condition were used to train the gear pitting detection model, and the model was then tested with data obtained from a different loading condition. The validation results have shown the good robustness of the deep sparse autoencoders for gear pitting detection, with little influence from working conditions, including loading and rotating speed. The comparison between the deep sparse autoencoder and the deep neural network has shown that the presented method extracts adaptive features automatically more effectively than the deep neural network based method.

Acknowledgments: This work was partially supported by NSFC (51505353) and NSF of Hubei Province (2016CFB584).

Author Contributions: Yongzhi Qu conceived, designed, and performed the gear experiments; Miao He and Jason Deutsch analyzed the data; Yongzhi Qu, Miao He, and David He wrote the paper.

Conflicts of Interest: The authors declare no conflict of interest.

References
1. Jardine, A.K.S.; Lin, D.; Banjevic, D. A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mech. Syst. Signal Process. 2006, 20, 1483–1510.
2. Heng, A.; Zhang, S.; Tan, A.C.C.; Mathew, J. Rotating machinery prognostics: State of the art. Challenges and opportunities. Mech. Syst. Signal Process. 2009, 23, 724–739.
3. Rahmounea, C.; Benazzouz, D. Early detection of pitting failure in gears using a spectral kurtosis analysis. Mech. Ind. 2012, 13, 245–254.
4. Feki, N.; Cavoret, J.; Ville, F.; Velex, P. Gear tooth pitting modelling and detection based on transmission error measurements. Eur. J. Comput. Mech. 2013, 22, 106–119.
5. Liu, J.; Wang, G. A multi-step predictor with a variable input pattern for system state forecasting. Mech. Syst. Signal Process. 2009, 23, 1586–1599.
6. Lee, S.K.; Shim, J.-S.; Cho, B.-O. Damage detection of a gear with initial pitting using the zoomed phase map of continuous wavelet transform. Key Eng. Mater. 2006, 306–308, 223–228.
7. Ozturk, H.; Sabuncu, M.; Yesilyurt, I. Early detection of pitting damage in gears using mean frequency of scalogram. J. Vib. Control 2008, 14, 469–484.
8. Ozturk, H.; Yesilyurt, I.; Sabuncu, M. Detection and advancement monitoring of distributed pitting failure in gears. J. Non-Destruct. Eval. 2010, 29, 63–73.
9. Lewicki, D.G.; Dempsey, P.J.; Heath, G.F.; Shanthakumaran, P. Gear fault detection effectiveness as applied to tooth surface pitting fatigue damage. In Proceedings of the American Helicopter Society 65th Annual Forum, Grapevine, TX, USA, 27–29 May 2009.
10. Teng, W.; Wang, F.; Zhang, K.; Liu, Y.; Ding, X. Pitting fault detection of a wind turbine gearbox using empirical mode decomposition. Stroj. Vestnik J. Mech. Eng. 2014, 60, 12–20.
11. He, Q.; Ren, X.; Jiang, G.; Xie, P. A hybrid feature extraction methodology for gear pitting fault detection using motor stator current signal. Insight Non-Destruct. Test. Cond. Monit. 2014, 56, 326–333.
12. Peršin, G.; Vižintin, J.; Juričić, Đ. Gear pitting detection based on spectral kurtosis and adaptive denoising filtering. In Proceedings of the 11th International Conference on Condition Monitoring and Machinery Failure Prevention Technologies, CM 2014/MFPT 2014, Manchester, UK, 10–12 June 2014.
13. Elasha, F.; Ruiz-Carcel, C.; Mba, D.; Kiat, G.; Nze, I.; Yebra, G. Pitting detection in worm gearboxes with vibration analysis. Eng. Fail. Anal. 2014, 23, 231–241.
14. Ümütlü, R.; Rafet, C.; Hizarci, B.; Ozturk, H.; Kiral, Z. Pitting detection in a worm gearbox using artificial neural networks. In Proceedings of the INTER-NOISE 2016 - 45th International Congress and Exposition on Noise Control Engineering: Towards a Quieter Future, Hamburg, Germany, 21–24 August 2016.
15. Liu, B.; Ling, S.F.; Gribonval, R. Bearing failure detection using matching pursuit. NDT Eval. Int. 2002, 35, 255–262.
16. Yang, H.; Mathew, J.; Ma, L. Fault diagnosis of rolling element bearings using basis pursuit. Mech. Syst. Signal Process. 2005, 19, 341–356.
17. Feng, Z.; Chu, F. Application of atomic decomposition to gear damage detection. J. Sound Vib. 2007, 32, 138–151.
18. Zhao, F.; Chen, J.; Dong, G. Application of matching pursuit in fault diagnosis of gears. J. Shanghai Jiaotong Univ. 2009, 43, 910–913.
19. Liu, H.; Liu, C.; Huang, Y. Adaptive feature extraction using sparse coding for machinery fault diagnosis. Mech. Syst. Signal Process. 2011, 25, 550–574.
20. Natarajan, B.K. Sparse approximate solutions to linear systems. SIAM J. Comput. 1995, 24, 227–234.
21. Ravishankar, S.; Bresler, Y. MR Image reconstruction from highly undersampled k-space data by dictionary learning. IEEE Trans. Med. Imaging 2010, 30, 1028–1041.
22. Dong, W.; Lin, X.; Zhang, L.; Shi, G. Sparsity-based image denoising via dictionary learning and structural clustering. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011.
23. Yang, M.; Zhang, L.; Feng, X.; Zhang, D. Sparse representation based Fisher discrimination dictionary learning for image classification. Int. J. Comput. Vis. 2014, 109, 209–232.
24. Jafari, M.G.; Plumbley, M.D. Fast dictionary learning for sparse representations of speech signals. IEEE J. Sel. Top. Signal Process. 2011, 5, 1025–1031.
25. Sigg, C.D.; Dikk, T.; Buhmann, J.M. Speech enhancement using generative dictionary learning. IEEE Trans. Audio Speech Lang. Process. 2012, 20, 1698–1712.
26. Olshausen, B.A.; Field, D.J. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vis. Res. 1997, 37, 3311–3325.
27. Aharon, M.; Elad, M.; Bruckstein, A. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans. Signal Process. 2010, 54, 4311–4322.
28. Rubinstein, R.; Bruckstein, A.M.; Elad, M. Dictionaries for sparse representation modeling. Proc. IEEE 2010, 98, 1045–1057.
29. Pati, Y.; Rezaiifar, R.; Krishnaprasad, P. Orthogonal Matching Pursuit: Recursive function approximation with application to wavelet decomposition. In Proceedings of the Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 1–3 November 1993.
30. Makhzani, A.; Frey, B. K-Sparse Autoencoders. In Proceedings of the 2nd International Conference on Learning Representations (ICLR2014), Banff, AB, Canada, 14–16 April 2014.
31. Ng, A. CS 294A Lecture Notes: Sparse Autoencoder; Stanford University: Palo Alto, CA, USA, 2010.
32. Jia, F.; Lei, Y.; Lin, J.; Zhou, X.; Lu, N. Deep neural networks: A promising tool for fault characteristic mining and intelligent diagnosis of rotating machinery with massive data. Mech. Syst. Signal Process. 2016, 72–73, 303–315.
33. Yunusa-Kaltungo, A.; Sinha, J.K. Sensitivity analysis of higher order coherent spectra in machine faults diagnosis. Struct. Health Monit. 2016, 15, 555–567.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Applied Sciences
Multidisciplinary Digital Publishing Institute