Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

An extreme rainfall-induced landslide susceptibility assessment using autoencoder combined with random forest in Shimane Prefecture, Japan

An extreme rainfall-induced landslide susceptibility assessment using autoencoder combined with... Background: Landslide-affecting factors are uncorrelated or non-linearly correlated, limiting the predictive performance of traditional machine learning methods for landslide susceptibility assessment. Deep learning methods can take advantage of the high-level representation and reconstruction of information from landslide-affecting factors. In this paper, a novel deep learning-based algorithm that combine classifiers of both deep learning and machine learning is proposed for landslide susceptibility assessment. A stacked autoencoder (StAE) and a sparse autoencoder (SpAE) both consist of an input layer for raw data, hidden layer for feature extraction, and output layer for classification and prediction. As a study case, Oda City and Gotsu City in Shimane Prefecture, southwestern Japan, were used for susceptibility assessment and prediction of landslides triggered by extreme rainfall. Results: The prediction performance was compared by analyzing real landslide and non-landslide data. The prediction performance of random forest (RF) was evaluated as better than that of a support vector machine (SVM) in traditional machine learning, so RF was combined with both StAE and SpAE. The results show that the prediction ratio of the combined classifiers was 93.2% for StAE combined with RF model and 92.5% for SpAE combined with RF model, which were higher than those of the SVM (87.4%), RF (89.7%), StAE (84.2%), and SpAE (88.2%). Conclusions: This study provides an example of combined classifiers giving a better predictive ratio than a single classifier. The asymmetric and unsupervised autoencoder combined with RF can exploit optimal non-linear features from landslide-affecting factors successfully, outperforms some conventional machine learning methods, and is promising for landslide susceptibility assessment. Keywords: Stacked autoencoder, Sparse autoencoder, Support vector machine, Random forest, Landslide susceptibility Introduction remote sensing are more accurate than statistical ap- Landslide susceptibility assessment is a cogent research proaches (Alexakis et al. 2014; Ciampalini et al. 2015;Di topic intended to determine the spatial probability of land- Martire et al. 2016), while physical methods are not suitable slide occurrence since landslides continuously result in for large areas (Tien Bui et al. 2016). Therefore, statistical damages and casualties worldwide (Corominas et al. 2013). approaches have received much attention because it is Spatial occurrence is called susceptibility, and landslide sus- efficient for fast recognizing landslides in large areas (Chen ceptibility maps generated from landslide-affecting factors et al. 2018b). It is necessary for decision makers to fast using statistical approaches subdivide areas into different recognize large areas where landslides are expected to re- terrains that are likely to cause certain types of landslides sult in land use planning and disaster control. Landslide (Segoni et al. 2018). Physical methods using GIS and susceptibility prediction based on statistical approaches can achieve this goal efficiently (Borrelli et al. 2018;Huang et al. 2019). Most of the quantitative methods for produ- * Correspondence: geonamsoil@gmail.com cing landslide susceptibility maps refer to regression or Department of Earth Science, Shimane University, 1060 Nishikawatsu-cho, Matsue, Shimane 690-8504, Japan © The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 2 of 16 classification approaches between real landslide data and stacked autoencoder and dropout by sparse autoencoder artificially created non-landslide data (Fell et al. 2008). The for non-linear correlations of the landslide-affecting factors quantitative methods most widely used for landslide sus- and gives better feature descriptions than the original data. ceptibility mapping are such as logistic regression (Lee and It does not require any additional methods which are Talib 2005; Ayalew and Yamagishi 2005; Bai et al. 2010; required for appropriate training data. In summary, this Aditian et al. 2018), naïve Bayes (Tien Bui et al. 2012;Tsan- study proposes the combined method of the advantage of garatos and Ilia 2016), artificial neural networks (Pradhan deep learning and the benefits of machine learning for et al. 2010;Arnoneetal. 2016), support vector machines landslide susceptibility assessment. The landslides in Oda (Yao et al. 2008; Yilmaz 2010; Ballabio and Sterlacchini City and Gotsu City in Shimane Prefecture, southwestern 2012;Xuet al. 2012), decision trees (Saito et al. 2009;Yeon Japan, are used as case study. A stacked autoencoder and et al. 2010), and random forest (Alessandro et al. 2015; sparse autoencoder are combined with random forest ac- Trigila et al. 2015;Hongetal. 2016;Chenetal. 2019b;Park quired from the results of a better predictive performance et al. 2019) in machine learning techniques. between support vector machine and random forest. Recently, deep learning algorithms have made a series of revolutions in the field of machine learning (Huang et al. Study area 2019) since the classification capability of a neural network The study area is located in Oda City and Gotsu City, to fit a decision boundary plane has become significantly Shimane Prefecture, southwestern Japan (Fig. 1). The more reliable (LeCun et al. 2015) which can successfully elevation varies from sea level to 1123 m (Table 1). The learn and extract patterns and unique features from big average annual rainfall recorded from the rainfall sta- data (Ayinde et al. 2019). Deep learning also can effectively tions at Fukumitsu, Oda, and Sakurae are 1657 mm, avoid local optimization and eliminates the need to set 1786 mm, and 2011 mm from 2008 to 2018 (Fig. 2). The model parameters because of autonomous processes cumulative rainfall for 2013 recorded from the rainfall (Zhang et al. 2017). At the moment, the core techniques of stations at Fukumitsu, Oda, and Sakurae are 2270 mm, deep learning are neural networks that have two or more 2102 mm, and 2656 mm, respectively (http://www.jma. hidden layers, including the following techniques: the adap- go.jp/jma/index.html). In this study, a total of 90 land- tive neuro-fuzzy inference system (Park et al. 2012); recur- slides were caused by extreme rainfall from May to rent neural networks (Chen et al. 2015); deep belief October 2013 (Table 2), and 69 of the landslides were networks (Huang and Xiang 2018); long short-term mem- triggered by extreme rainfall in August 2013. These ory (Xiao et al. 2018;Yangetal. 2019); and convolutional landslides can be described as shallow landslides that neural networks (Wang et al. 2019). Deep learning-based were determined based on field investigation. autoencoder is a semi-unsupervised learning method with no prior knowledge, such as landslide inventory, which Spatial data setting means that landslide and non-landslide labels and linear Landslide susceptibility prediction can be evaluated as a and non-linear correlation assumptions are not needed binary classification problem between landslides and non- (Huang et al. 2019). For landslide susceptibility assessment, landslides. A spatial database setting including landslide traditional methods for de-correlation are based on the pixel grid, non-landslide pixel grid, and related landslide- prior assumption that there are linear correlations between affecting factors is needed for statistical analysis (Huang landslides and non-landslides. However, landslide-affecting et al. 2019). This spatial database was divided into a train- factors are usually non-linear in practical applications. The ing dataset and a validation dataset. autoencoder driven by data rather than prior knowledge These real 90 of landslides and 90 of non-landslides arti- can transform raw data into non-linear correlated features. ficially generated from ArcGIS software were randomly In this paper, novel deep learning algorithms, namely, split into two parts with a ratio of 70% and 30%. Seventy both stacked autoencoder and sparse autoencoder com- percent of the landslide and non-landslide grid cells were bined with traditional machine learning, are proposed for selected for the training model, and the remaining 30% landslide susceptibility prediction. StAE and SpAE are un- were used for the validation model. Furthermore, the supervised learning as it does not require external labels landslide (event) and non-landslide (non - event) grid cells on landslides information. The encoding and decoding were set to 1 and 0, respectively, and the values of 1 and 0 process all happen in the dataset. The input and output were used for classification and prediction as the output data have the same number of dimensions, and the hidden variables of the landslide susceptibility prediction models. layer has fewer dimensions. Autoencoders are learned Thereafter, the calculated frequency ratio (FR) values were automatically from dataset, which is easy to train special- considered as numeric input variables of landslide suscep- ized instances of the algorithm that will perform well on a tibility prediction models. specific type of landslide-affecting factors. The autoencoder The landslide-affecting factors in study area are com- technique takes advantage of dimension reduction by plex, and it is difficult to confirm which affecting factors Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 3 of 16 Fig. 1 Study area and landslide inventory of Oda Citiy and Gotsu City in Shimane Prefecture, southwestern Japan (RGB color of Sentinel-2 satellite) Table 1 Description and frequency ratio (FR) of topographical and distance to factors in the study area Factors Values No. of Landslides FR Factors Values No. of Landslides FR Altitude (m) 0–105.77 59 2.15 Profile curvature −37.55 - (− 3.71) 3 0.84 105.78–215.95 21 0.80 −3.72 - (−1.18) 6 0.43 215.96–339.36 10 0.43 −1.19 - 1.03 39 0.90 339.37–550.90 0 0 1.04–3.87 36 1.53 550.91–1123.84 0 0 3.88–43.08 6 1.06 Slope (degree) 0–9.50 2 0.78 Dis. to stream < 101 58 1.64 9.51–19.00 30 1.11 101–200 21 0.73 19.01–28.21 29 1.08 201–300 8 0.44 28.22–38.00 24 0.92 301–400 3 0.46 38.01–73.40 5 0.71 > 401 0 0 Plan curvature −49.05 - (−3.81) 2 0.78 Dis. to road < 200 39 2.91 −3.82 - (−1.11) 12 1.11 201–400 17 1.38 −1.12 - 0.57 47 1.08 401–600 7 0.64 0.58–2.60 24 0.92 601–800 5 0.52 2.61–37.03 5 0.71 > 801 22 0.50 Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 4 of 16 Fig. 2 Annual rainfall from 2008 to 2018, and monthly rainfall of 2013 in the study area are the most important and necessary among the topo- The topographic factors were acquired and calculated graphic, geological, hydrological, distance to stream and based on the digital elevation model (DEM), with a spatial distance to road. In landslide susceptibility modeling, resolution of 10 m, including altitude, slope angle, plan landslides may reoccur under conditions similar to those curvature, profile curvature (Yilmaz et al. 2012), distance of past landslides (Westen et al. 2003; Lee and Talib 2005; to stream (Devkota et al. 2013; Guo et al. 2015), stream Dagdelenler et al. 2016). A total of 14 affecting factors power index (SPI) (Park and Kim 2019), and topographic were acquired and chosen as input variables for landslide wetness index (TWI) (Althuwaynee et al. 2016;Colkesen susceptibility models (Figs. 3, 4 and 5). et al. 2016). The distance to road (Alexakis et al. 2014;Roy Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 5 of 16 Table 2 Description and frequency ratio (FR) of remote sensing - derived index and hydrological factors in the study area Factors Values No. of Landslides FR Factors Values No. of Landslides FR NDVI −0.242 - 0.143 15 7.26 SPI −13.816 - (−8.806) 3 0.65 0.144–0.255 19 3.15 −8.805 - (−4.352) 14 0.83 0.256–0.332 30 1.40 −4.351 - 0.101 20 0.76 0.333–0.391 26 0.75 0.102–2.773 46 1.28 0.392–0.650 0 0 2.774–14.574 7 1.04 NDWI −0.324 - 0.046 1 0.22 TWI −7.969 - (−2.240) 17 0.82 0.047–0.125 22 2.19 −2.239 - 2.114 23 0.75 0.126–0.179 36 1.75 2.115–4.520 38 1.43 0.180–0.223 25 0.79 4.521–8.416 12 1.22 0.224–0.483 6 0.26 8.417–21.250 0 0 BI 0.248–0.312 4 0.86 Rainfall (mm) 1338.5–1424 13 0.54 0.313–0.322 10 0.45 (from May to Oct. in 2013) 1424.1–1528.5 5 0.36 0.323–0.329 33 0.97 1528.6–1633 2 0.17 0.330–0.339 34 1.50 1633.1–1718.5 47 1.69 0.340–0.436 9 1.41 1718.6–1823 23 1.82 and Saha 2019) was acquired from Geospatial Information provide valuable information for producing landslide sus- Authority of Japan (https://fgd.gsi.go.jp/download/menu. ceptibility maps, as quantitative measurement determined php). Normalized difference vegetation index (NDVI) by frequency ratio. Therefore, all 14 landslide-affecting (Chen et al. 2019a), normalized difference water index factors are utilized as input variables in the model to (NDWI) (Luo et al. 2019), and bare soil index (BI) (Huang evaluate their capabilities in performance and feature ex- et al. 2019) were derived from the Landsat TM 8 image traction for the landslide susceptibility assessment. data, resampled with a 10 m resolution (Zhu et al. 2018). The geological factors were derived from the 1:200,000 Methodology scale geological map, which was obtained from the Geo- This study was performed using the following main logical Survey of Japan, AIST (https://www.gsj.jp/en/). steps (Fig. 6): (1) correlation analysis between landslide These landslide-affecting factors were reflected using the inventory and landslide-affecting factors using frequency raster format with a spatial resolution of 10 × 10 m, which ratio, (2) landslide susceptibility prediction using SVM results in raster format that has the advantages of regular and RF models in machine learning, (3) landslide sus- shape, quick subdivision, and high modeling efficiency ceptibility prediction using StAE and SpAE employing (Huang et al. 2019). back propagation neural network in deep learning, (4) For continuous affecting factors, the Jenks natural break evaluation of StAE and SpAE combined with machine method was used to divide each continuous affecting learning acquired from a better prediction ratio between factor into five classes. Then the frequency ratio of all sub- SVM and RF, and (5) validation and comparison of pre- classes of each landslide affecting factor was calculated as dictive performance from the area under the curves and shown in Tables 1, 2 and 3. The frequency ratio allows landslide susceptibility maps produced by six models. that all 14 landslide-affecting factors have significant influ- The landslide samples were created after collecting and ences on landslide occurrence. Some studies have sug- preparing the landslide inventory map, the DEM derived gested that the correlations between affecting factors factors, and remote sensing and geological factors. The should be eliminated to reduce model noise for the land- landslide inventory samples were counted and used to slide susceptibility assessment (Hong et al. 2017; Lin et al. randomly generate non-landslide samples. The final data 2017;Chenet al. 2018a). However, the number of input combined the landslides and non-landslides samples variables of the deep learning algorithm is generally hun- with a defined label (1 and 0, respectively) for each sam- dreds or thousands due to their strong feature extraction ple. Fourteen landslide-affecting factors were prepared ability, and 14 input variables will not result in informa- from a spatial database. The values of the landslide- tion redundancy. On the other hand, some collinearity affecting factors at each sample location were utilized, phenomena between landslide-affecting factors can be tol- and the derived information was prepared using RStudio. erated by the fast-developed machine learning models The dependent variable was converted with one-hot en- (Huang et al. 2019). These 14 landslide-affecting factors coding. The data were then categorized into subsets: for Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 6 of 16 Fig. 3 Thematic maps of topographic factors (a-d) and distance to factors (e and f) considered in this study: a elevation (m), b slope angle (degree), c plan curvature, d profile curvature, e distance to stream (m), and f distance to road (m) Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 7 of 16 Fig. 4 Thematic maps of remote sensing - derived index (a-c) and hydrological factors (d-f) considered in this study: a NDVI, b NDWI, c BI, d SPI, e TWI, and f cumulative rainfall from March to October in 2013 (mm) Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 8 of 16 Fig. 5 Thematic maps of geological factors considered in this study: a geological age and b lithology training (70%) and validation (30%). The StAE and SpAE prediction rate than SVM model. In this study, the valid- model was trained in an unsupervised manner for feature ation of the proposed models was based on a well-known extraction, and a set of new features was generated. These area under the receiver operating characteristic curve. Par- new features were used to train StAE-bpnn and SpAE- ameter tuning was also utilized to assess better accuracy. bpnn in deep learning, and anomaly detection based StAE Finally, landslide susceptibility maps were generated using with RF and SpAE with RF which is selected as better equal interval function in ArcGIS 10.6 software. Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 9 of 16 Table 3 Description and frequency ratio (FR) of geological factors in the study area Factors Values No. of Ls FR Factors Values No. of Ls FR Geological age Unknown age 22 1.31 Lithology Felsic plutonic rocks 9 0.93 Triassic to Jurassic 8 0.41 Gabbro and diorite in accretionary complex 13 1.12 Present 14 3.09 Higher terrace 27 0.85 Permian 13 1.53 Lower terrace 16 0.82 Paleocene to Early Eocene 2 0.36 Mafic plutonic rocks 10 1.50 Middle to Late Miocene 2 1.37 Marine and non-marine sediments 13 3.94 Middle to Late Miocene 10 9.15 Non-alkaline felsic volcanic intrusive rocks 2 0.64 Middle Eocene 1 0.15 Non-alkaline felsic volcanic rocks 0 0 Late Pleistocene to Holocene 18 1.41 Non-alkaline mafic volcanic rocks 0 0 Late Pleistocene 0 0 Non-alkaline pyroclastic flow volcanic rocks 0 0 Late Miocene to Holocene 0 0 Sand dune deposits 0 0 Late Eocene to Early Oligocene 0 0 Schist 0 0 Late Cretaceous 0 0 Ultramafic rocks 0 0 Early to Middle Miocene 0 0 Volcanic debris 0 0 Early Pleistocene 0 0 Water 0 0 Early Miocene to Middle Miocene 0 0 0 0 Frequency ratio (FR) the ratio is less than 1, then the relationship will be The number of landslide pixel grids in each class is eval- weak. If the value is 1, it means an average correlation uated, and the frequency ratio for each factor class is (Meten et al. 2015). assigned by dividing the landslide ratio by the area ratio. The frequency ratio shows the correlation between land- Support vector machine (SVM) slides and affecting factors in a specific area. If this ratio Two main principles of SVM are the optimal classification is greater than 1, then the relationship between a land- hyperplane and the use of kernel features. The purpose of slide and the affecting factor’s class will be strong but if optimal sorting hyperplanes is to accurately distinguish Fig. 6 Flow chart of the research process Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 10 of 16 the two types of samples between landslides and non- can effectively prevent overfitting and gradient disappear- landslides while maximizing the sorting margin. Deter- ance (Huang et al. 2019). To initially achieve de- mining kernel function and optimal parameters are critical correlation among the 14 landslide-affecting factors, drop- for evaluating landslide susceptibility using SVM. Polyno- out was added to the input layer. mial kernels and radial basis function are the most com- The process of StAE is as follows. First, some of the monly used kernels in the literature (Huang and Zhao neurons in the network are randomly dropped in the 2018). To optimize two parameters, both penalty coeffi- mini-batch training samples and the remaining neurons cient C and kernel function parameters are needed in the are fed to the next layer. After obtaining this mini-batch SVM model. training sample, the deleted neurons are recovered and some neurons in the network are randomly deleted once Random forest (RF) again. The corresponding parameters are updated based The random forest, a classification tree algorithm with on the stochastic gradient descent method, performed repeated dichotomy data, can significantly reduce the on the neurons that have not been removed. computations required for classification and regression. In RF algorithms, predictive models are established by Results utilizing many decision trees. Based on randomly se- Landslide susceptibility modelling using the six models lected variables and samples, these trees and their deci- All models based on the deep learning and machine sions are generated. Once the model is established, the learning were coded in R language on RStudio. For the samples are first sorted individually according to all de- SVM model and RF model, parameters were determined cision trees in the model, and then by all trees (Huang using a 10-fold cross-validation approach. With radial and Zhao 2018). The proportion of decision tree esti- basis function, SVM model was acquired from grid mates and generates landslide susceptibility indexes, search for SVM parameter tuning. For RF model, it was which can predict landslide occurrence between all deci- composed of ‘mtry’ and ‘tree’, which were 3 and 300, re- sion trees in the RF model (Goetz et al. 2015). spectively. The autoencoder models based on the deep neural network were coded in R language on RStudio Stacked autoencoder (StAE) using H2O packages. These algorithms were performed The StAE is an artificial neural network, which is a spe- using hyperbolic tangent function (i.e., the tanh func- cial type of multi-layer perceptron. It is a type of un- tion) in every hidden layer which was used to encode supervised learning algorithm with an asymmetric and decode the input to the output in the undercom- structure, in which the middle layer represents the en- plete autoencoder. In the H2O library, five hidden layers coding of the input data in the bottleneck layer (Yu and with encoders and decoders were designed by using the Príncipe 2019). The bottleneck constrains the amount of tanh activation function in each layer. Stacked autoenco- information that can traverse the full network, forcing ders (StAE) were constructed by organizing autoencoder the learned compression of the input data. The StAE is on top of each other also known as deep autoencoder. trained to reconstruct the input of landslide-affecting StAE consists of multiple autoencoder stacked into mul- factors onto the output layer for feature representation, tiple layers where the output of each layer was wired to which prevents the simple copying of the data and the the inputs of the successive layers, as seen in Fig. 7, network. The middle layer has a lower dimension to which was composed of 80–50–2-50-80. To obtain good avoid overfitting, which can either select a subset of fea- parameters, StAE employed greedy layer-wise training. tures with the highest importance or apply some dimen- The benefit of StAE was that it can evaluate the benefits sion reduction techniques (Hinton and Salakhutdinov of deep network, which has greater expressive power. 2006; Charte et al. 2018). In this study, the StAE com- Furthermore, it usually can capture useful hierarchical bined with back propagation neural network was proc- grouping of the input. Finally, model construction was essed for a lower dimension of features than the input determined by the majority vote among all trees using data have, which can be used for learning the most im- RF models. The aim of sparse autoencoder (SpAE) was portant features of the data. to make a large number of neurons to have low average output so that neurons may be inactive most of the time. Sparse autoencoder (SpAE) The limitation of autoencoders to have only small num- The SpAE consists of an input layer, hidden layers, and an bers of hidden units can be overcome by adding a spars- output layer. Each layer in this neural network contains a ity constraint, where a large number of hidden units can sufficient number of neurons. Dropout can randomly clas- be utilized usually more than one input. Three hidden sify the weight of some implicit layer nodes and reduce layers with encoders and decoders were designed by the mutual dependence between nodes to realize the using the tanh activation function in each layer in the normalization of neural networks. Additionally, dropout H2O library. Sparsity can be achieved by introducing a Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 11 of 16 Fig. 7 Architecture of stacked autoencoder combined with random forest employing reconstruction error value by anomaly detection loss function during training or manually zeroing few low (0.0–0.2) index of the susceptibility map (Fig. 9d, e, f strongest hidden unit activations, which was composed and Table 4). of 200–200-200 (Fig. 8). For classification, model class was constructed by RF model by means of the majority Discussion vote among all trees. Reconstruction error value employ- Validation of prediction performance ing mean square error was used by means of anomaly The landslide susceptibility assessment was verified detection in both StAE and SpAE, which were 0.068 and using the area under the curve on the validation dataset 0.088, respectively. for six models. The predictive ratio for landslide suscep- tibility assessment is mainly calculated by confusion matrix. The true positive rate (TPR) is defined as the ra- Landslide susceptibility maps produced by the six models tio of true positive to the sum of true positive and false The landslide susceptibility maps were derived from negative, and the false positive rate (FPR) is defined as SVM, RF, StAE, SpAE, StAE with RF, and SpAE with RF the ratio of false positive to the sum of false positive and in the ArcGIS 10.6 software (Fig. 9). For better true negative to the number of validation samples visualization and comparison, the indices were reclassi- (Zhang and Wang 2019). In general, the true positive de- fied into five classes using the equal interval function: fines the landslide grid cells that are predictive as land- very low (0–0.2), low (0.2–0.4), moderate (0.4–0.6), high slides, true negative means non-landslide grid cells that (0.6–0.8), and very high (0.8–1). The susceptibility class are predictive as non-landslides, false-positive reflects area of the StAE model as the best performance (Table 4) non-landslide grid cells that are predictive as landslides, were 6.31%, 13.58%, 33.04%, 36.81%, and 10.26%, re- and false negative means landslide grid cells that are pre- spectively. The susceptibility class area of the RF model dictive as non-landslides (Huang et al. 2019). The area (Fig. 9b) and StAE model (Fig. 9c) has very high value. under the curve was applied to assess the prediction per- The susceptibility index value of the SVM model (Fig. formance of landslide susceptibility index values on the 9a) and StAE models (Fig. 9c) were prominent near the validation dataset. The prediction rate values of SVM, road (Fig. 3f). SpAE and SpAE with RF have lower values RF, StAE, SpAE, StAE with RF and SpAE with RF model of class area percentage for a very high (0.8–1.0) index are obtained by calculating the area under the prediction of the susceptibility map. RF and StAE have lower values rate curves. The StAE with RF and SpAE model of com- of class area percentage for a moderate (0.4–0.6) index bined classifier have relatively higher prediction rates of the susceptibility map. StAE with RF and SpAE with than using SVM, RF, StAE, and SpAE model of single RF have lower values of class area percentage for a very classifier (Fig. 10). This means that the classifiers Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 12 of 16 Fig. 8 Architecture of sparse autoencoder combined with random forest employing reconstruction error value by anomaly detection combined with both autoencoder and traditional ma- the result of the statistical analysis, as an increase in chine learning are better than using a single classifier. sample size, the result would be more acceptable. Ac- Autoencoder is unsupervised learning as it does not cording to Demoulin and Chung (2007), in spite of the require external labels on landslide information. The limited sample size using ten landslides in about 15 × encoding and decoding process all happen within the 15 km scale, Bayesian method in machine learning deliv- dataset. The input and output data have the same ered satisfying prediction rates. Heckmann et al. (2014) number of dimensions, and the hidden layer has state that small samples result in large standard errors fewer dimensions. Thus, it contains compressed infor- and wide confidence intervals for the population param- mation of the input layer, which is why it acts as a eters. In the case of regression parameters, small samples dimension reduction for the original input layer. From cause the estimation to be uncertain, and there is a the hidden layer, the neural network is able to decode higher risk of coefficients being insignificant when the the information to its original dimensions. Autoenco- respective confidence interval includes zero. With re- ders are learned automatically from data examples, spect to replicate sampling and model selection, it is ex- which is a useful property. It means that it is easy to pected that the diversity of models. However, increasing train specialized instances of the algorithm that will sample sizes causes standard errors and confidence in- perform well on a specific type of input. It does not tervals in parameter estimation to decrease. In a require any additional methods which are required for significance-based stepwise model selection, very large appropriate training data. samples are expected to facilitate the inclusion of more and more variables. Reichenbach et al. (2018) present Sample size that some articles did not use any landslide inventory, One of the challenges for landslide susceptibility map- which are based on the relative importance of the the- ping is to suggest the sample size on the number of matic maps as landslide-affecting factors (Adler and landslide inventories. Several articles have been reported Huffman 2007). In this study, all models obtained from to address adequate numbers of landslide inventories 84% to 93% prediction rate using 90 landslides (about needed to make acceptable landslide susceptibility map- 20 km square), which is similar to previous study ping where sample size varies from 0 to several thousand (Sabokbar et al. 2014) of different study area where 82 in different scales of study areas. The sample size affects landslides were used (about 24 km square). Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 13 of 16 Fig. 9 Landslide susceptibility maps of Shimane prefecture using (a) support vector machine, (b) random forest, (c) stacked autoencoder, (d) sparse autoencoder, (e) stacked autoencoder combined with RF, and (f) sparse autoencoder combined with RF Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 14 of 16 Table 4 Number of landslides occurred and percentage of landslide susceptibility class area Landslide SVM RF StAE SpAE StAE with RF SpAE with RF susceptibility No. % No. % No. % No. % No. % No. % class(index value) Very low (0–0.2) 15 17.00 1 11.80 5 35.65 7 30.16 2 6.31 3 5.82 Low (0.2–0.4) 4 19.61 2 10.45 1 11.09 10 19.07 6 13.58 8 14.38 Moderate (0.4–0.6) 7 25.83 4 8.43 3 10.52 15 24.73 16 33.04 13 35.96 High (0.6–0.8) 17 24.94 13 33.43 4 13.07 44 22.60 66 36.81 66 33.25 Very high (0.8–1.0) 47 12.62 70 35.89 77 29.67 14 3.44 0 10.26 0 10.59 Sum 90 100 90 100 90 100 90 100 90 100 90 100 Study limitation strategy to construct non-landslide for regression and In this study, all landslide points were obtained through classification, 3) scale of study area, 4) resolution of DEM, GPS by field investigation from May to October in 2013 5) relatively equal scatter distribution of landslide inven- without the aid of satellite imagery or unnamed aerial ve- tory in study area 6) setting boundary of study area to hicle (UAV). As seen in Fig. 2f, most landslide points were construct landslide-affecting factors, 7) reasonable selec- in the vicinity of human activity near the roads in the tion of landslide-affecting factors. To construct distinct mountains, not inside the mountainous area. The landslide landslide inventory with distinguishing landslide triggering inventory near the roads may affect landslide susceptibility factors between rainfall and earthquake is considered the maps (Fig. 9), which results in landslide susceptibility index most important key step than using any advanced classi- value near the roads higher than in other areas. fier for landslide susceptibility mapping. Landslide susceptibility mapping is based on the prob- ability of reoccurrence at the area where landslides already Conclusion occurred, unlike mapping physically based on modeling, In this study, the classifiers combined with both deep which relies on as follows: 1) the number of abundant learning and traditional machine learning, StAE with RF landslide inventories for statistical analysis, 2) sampling and SpAE with RF models, are proposed for landslide Fig. 10 The area under the curves for prediction ratio and validation of landslide susceptibility maps produced by the six models Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 15 of 16 susceptibility prediction. The autoencoder consists of in- neural network in a tertiary region of Ambon, Indonesia. Geomorphology 318:101–111 put layers for raw data, hidden layers for feature extrac- Alessandro T, Carla I, Carlo E, Gabriele SM (2015) Comparison of logistic tion, and output layers for landslide susceptibility regression and random forests techniques for shallow landslide susceptibility prediction. The combined classifiers have the advantage assessment in Giampilieri (NE Sicily, Italy). Geomorphology 249:119–136 Alexakis D, Agapiou A, Tzouvaras M, Themistocleous K, Neocleous K, Michaelides of both machine learning and deep learning, i.e., dimen- S, Hadjimitsis D (2014) Integrated use of GIS and remote sensing for sion reduction of the StAE model and dropout of the monitoring landslides in transportation pavements: the case study of Paphos SpAE model for feature extraction. area in Cyprus. Nat Hazards 72:119–141 Althuwaynee OF, Pradhan B, Lee S (2016) A novel integrated model for assessing The six models were applied in Oda City and Gotsu landslide susceptibility mapping using CHAID and AHP pair-wise comparison. City, Shimane Prefecture, southwestern Japan. The cor- Int J Remote Sens 37(5):1190–1209 relation between landslides and landslide-affecting fac- Arnone E, Francipane A, Scarbaci A, Puglisi C, Noto LV (2016) Effect of raster resolution and polygon-conversion algorithm on landslide susceptibility tors using frequency ratio was high in NDVI, distance to mapping. Environ Model Softw 84:467–481 road, and altitude. Performance assessment was carried Ayalew L, Yamagishi H (2005) The application of GIS-based logistic regression for out with the SVM, RF, StAE, SpAE, StAE with RF, and landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan. Geomorphology 65:15–31 SpAE with RF models. The results show that the pro- Ayinde BO, Inanc T, Zurada JM (2019) Regularizing deep neural networks by posed StAE with RF and SpAE with RF models have a enhancing diversity in feature extraction. IEEE Transa Neural Netw Learn Syst relatively better prediction rate than a single classifier 30(9):1–12 Bai S, Wang J, Lü G, Zhou P, Hou S, Xu S (2010) GIS-based logistic regression for such as SVM, RF, StAE and SpAE models. In conclusion, landslide susceptibility mapping of the Zhongxian segment in the three the proposed combined classifier is promising for classi- gorges area, China. Geomorphology 115:23–31 fication between landslide and non-landslide following Ballabio C, Sterlacchini S (2012) Support vector machines for landslide susceptibility mapping: the Staffora River basin case study, Italy. Math Geosci landslide susceptibility prediction because it can over- 44(1):47–70 come the limitations of conventional machine learning Borrelli L, Ciurleo M, Gullà G (2018) Shallow landslide susceptibility assessment in algorithms, extract features and pattern recognition, re- granitic rocks using GIS-based statistical methods: the contribution of the duce computations, and improve performance. weathering grade map. Landslides 15(6):1127–1142 Charte D, Charte F, García S, Jesus MJ, Herrera F (2018) A practical tutorial on autoencoders for nonlinear feature fusion: taxonomy, models, software and Acknowledgements guidelines. Inf Fusion 44:78–96 The authors express their sincere gratitude to Zili DAI (Shimane University, Chen H, Zeng Z, Tang H (2015) Landslide deformation prediction based on Matsue, Japan), Prakash DHUNGANA (Shimane University, Matsue, Japan), recurrent neural network. Neural Process Lett 41(2):169–178 Ran LI (Shimane University, Matsue, Japan), Rong Zhou (Shimane University, Chen W, Panahi M, Tsangaratos P, Shahabi H, Ilia I, Panahi S, Li S, Jaafari A, Ahmadg Matsue, Japan), and Akinori IIO (Shimane University, Matsue, Japan), for their BB (2019a) Applying population-based evolutionary algorithms and a neuro- kind assistance. The authors also acknowledge financial support from fuzzy system for modeling landslide susceptibility. Catena 172:212–231 Shimane University, Japan. Chen W, Peng J, Hong H, Shahabi H, Pradhan B, Liu J, Zhu AX, Pei X, Duan Z (2018a) Landslide susceptibility modelling using GIS-based machine learning Authors’ contributions techniques for Chongren county, Jiangxi province, China. Sci Total Environ FW conducted field investigation in 2013 and provided guidance in the 626:1121–1135 study area of landslides triggered by extreme rainfall in Shimane Prefecture, Chen W, Sun Z, Han J (2019b) Landslide susceptibility modeling using integrated southwestern Japan. KN carried out the landslide susceptibility assessment ensemble weights of evidence with logistic regression and random forest and produced landslide susceptibility maps using deep learning combined models. Appl Sci 9(1):171 with machine learning. Both authors read and approved the final Chen W, Zhang S, Li R, Shahabi H (2018b) Performance evaluation of the GIS- manuscript. based data mining techniques of best-first decision tree, random forest, and naïve Bayes tree for landslide susceptibility modeling. Sci Total Environ 644: Funding 1006–1018 The study was financially supported by funding awarded under the study, Ciampalini A, Raspini F, Bianchini S, Frodella W, Bardi F, Lagomarsino D, Traglia F, “Initiation and motion mechanisms of long runout landslides due to rainfall Moretti S, Proietti C, Pagliara P, Onori R, Corazza A, Duro A, Basile G, Casagli N and earthquake in the falling pyroclastic deposit slope area” (JSPS-B- (2015) Remote sensing as tool for development of landslide databases: the case 19H01980, Principal Investigator: Fawu Wang). of the Messina Province (Italy) geodatabase. Geomorphology 249:103–118 Colkesen I, Sahin EK, Kavzoglu T (2016) Susceptibility mapping of shallow Availability of data and materials landslides using kernel-based Gaussian process, support vector machines and The DEM data utilized in this study are freely available from the Geospatial logistic regression. J Afr Earth Sci 118:53–64 Information Authority of Japan (https://fgd.gsi.go.jp/download/menu.php). Corominas J, Van Westen C, Frattini P, Cascini L, Malet J, Fotopoulou S, Catani F, The Landsat 8 satellite image are acquired from the United States Geological Van Den Eeckhaut M, Mavrouli O, Agliardi F, Pitilakis K, Winter M, Pastor M, Survey (USGS) (https://earthexplorer.usgs.gov). All six models for statistical Ferlisi S, Tofani V, Hervás J, Smith J (2013) Recommendations for the computing and data visualization are coded by R (ver. 3.6.1) open-source quantitative analysis of landslide risk. Bull Eng Geol Environ 73(2):209–263 software (https://www.r-project.org/). Dagdelenler G, Nefeslioglu HA, Gokceoglu C (2016) Modification of seed cell sampling strategy for landslide susceptibility mapping: an application from Competing interests the eastern part of the Gallipoli peninsula (Canakkale, Turkey). Bull Eng Geol The authors declare that they have no competing interests. Environ 75(2):575–590 Demoulin A, Chung C (2007) Mapping landslide susceptibility from small datasets: a Received: 13 November 2019 Accepted: 21 January 2020 case study in the pays de Herve (E Belgium). Geomorphology 89:391–404 Devkota KC, Regmi AD, Pourghasemi HR, Yoshida K, Pradhan B, Ryu IC, Dhital MR, Althuwaynee OF (2013) Landslide susceptibility mapping using certainty References factor, index of entropy and logistic regression models in GIS and their Aditian A, Kubota T, Shinohara Y (2018) Comparison of GIS-based landslide comparison at Mugling-Narayanghat road section in Nepal Himalaya. Nat susceptibility models using frequency ratio, logistic regression, and artificial Hazards 65(1):135–165 Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 16 of 16 Di Martire D, Tessitore S, Brancato D, Ciminelli MG, Costabile S, Costantini M, Segoni S, Tofani V, Rosi A, Catani F, Casagli N (2018) Combination of rainfall Graziano GV, Minati F, Ramondini M, Calcaterra D (2016) Landslide detection thresholds and susceptibility maps for dynamic landslide hazard assessment integrated system (LaDIS) based on in-situ and satellite SAR interferometry at regional scale. Front Earth Sci 6(85):1–11 measurements. Catena 137:406–421 Tien Bui D, Ho TC, Pradhan B, Pham BT, Nhu VH, Revhaug I (2016) GIS-based modeling Fell R, Corominas J, Bonnard C, Cascini L, Leroi E, Savage W (2008) Guidelines for of rainfall-induced landslides using data mining-based functional trees classifier with landslide susceptibility, hazard and risk zoning for land use planning. Eng AdaBoost, bagging, and MultiBoost ensemble frameworks. Environ Earth Sci 75:1–22 Geol 102(3–4):85–98 Tien Bui D, Pradhan B, Lofman O, Revhaug I (2012) Landslide susceptibility assessment in Vietnam using support vector machines, decision tree, and Goetz JN, Brenning A, Petschko H, Leopold P (2015) Evaluating machine learning Naïve Bayes models. Math Probl Eng 2012:1–26 and statistical prediction techniques for landslide susceptibility modeling. Tsangaratos P, Ilia I (2016) Comparison of a logistic regression and naïve Bayes Comput Geosci 81:1–11 classifier in landslide susceptibility assessments: the influence of models Guo C, David RM, Zhang Y, Wang K, Yang Z (2015) Quantitative assessment of complexity and training dataset size. Catena 145:164–179 landslide susceptibility along the Xianshuihe fault zone, Tibetan plateau, Wang Y, Fang Z, Hong H (2019) Comparison of convolutional neural networks for China. Geomorphology 248:93–110 landslide susceptibility mapping in Yanshan County, China. Sci Total Environ Heckmann T, Gegg K, Gegg A, Becht M (2014) Sample size matters: investigating 666:975–993 the effect of sample size on a logistic regression susceptibility model for Westen CJV, Rengers N, Soeters R (2003) Use of geomorphological information in debris flows. Nat Hazards Earth Syst Sci 180:259–278 indirect landslide susceptibility assessment. Nat Hazards 30(3):399–419 Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with Xiao L, Zhang Y, Peng G (2018) Landslide susceptibility assessment using integrated neural networks. Science 313:504–507 deep learning algorithm along the China-Nepal highway. Sensors 18:1–13 Hong H, Ilia I, Tsangaratos P, Chen W, Xu C (2017) A hybrid fuzzy weight of Xu C, Dai F, Xu X, Lee YH (2012) GIS-based support vector machine modeling of evidence method in landslide susceptibility analysis on the Wuyuan area, earthquake-triggered landslide susceptibility in the Jianjiang River watershed, China. Geomorphology 290:1–16 China. Geomorphology 145-146:70–80 Hong H, Pourghasemi HR, Pourtaghi ZS (2016) Landslide susceptibility Yang BB, Yin KL, Lacasse S, Liu ZQ (2019) Time series analysis and long short- assessment in Lianhua County (China): a comparison between a random term memory neural network to predict landslide displacement. Landslides forest data mining technique and bivariate and multivariate statistical 16(4):677–694 models. Geomorphology 259:105–118 Yao X, Tham LG, Dai FC (2008) Landslide susceptibility mapping based on Hong Y, Adler RF, Huffman GJ (2007) Use of satellite remote sensing data in the support vector machine: a case study on natural slopes of Hong Kong, mapping of global landslide susceptibility. Nat Hazards 43(2):245–256 China. Geomorphology 101:572–582 Huang F, Zhang J, Zhou C, Wang Y, Huang J, Zhu L (2019) A deep learning Yeon Y, Han J, Ryu K (2010) Landslide susceptibility mapping in Injae, Korea, algorithm using a fully connected sparse autoencoder neural network for using a decision tree. Eng Geol 116:274–283 landslide susceptibility prediction. Landslides 17:217–229 Yilmaz C, Topal T, Süzen ML (2012) GIS-based landslide susceptibility mapping Huang L, Xiang LY (2018) Method for meteorological early warning of using bivariate statistical analysis in Devrek (Zonguldak-Turkey). Environ Earth precipitation-induced landslides based on deep neural network. Neural Sci 65(7):2161–2178 Process Lett 48(2):1243–1260 Yilmaz I (2010) Comparison of landslide susceptibility mapping methodologies Huang Y, Zhao L (2018) Review on landslide susceptibility mapping using for Koyulhisar, Turkey: conditional probability, logistic regression, artificial support vector machines. Catena 165:520–529 neural networks, and support vector machine. Environ Earth Sci 61:821–836 LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436 Yu S, Príncipe JC (2019) Understanding autoencoders with information theoretic Lee S, Talib JA (2005) Probabilistic landslide susceptibility and factor effect concepts. Neural Netw 117:104–123 analysis. Environ Geol 47(7):982–990 Zhang R, Isola P, Efros AA (2017) Split-brain autoencoders: unsupervised learning Lin GF, Chang MJ, Huang YC, Ho JY (2017) Assessment of susceptibility to rainfall by cross-channel prediction. Paper presented at the Computer Vision & induced landslides using improved self-organizing linear output map, Pattern Recognition support vector machine, and logistic regression. Eng Geol 224:62–74 Zhang S, Wang FW (2019) Three-dimensional seismic slope stability assessment Luo X, Lin F, Zhu S, Yu M, Zhang Z, Meng L, Peng J (2019) Mine landslide with the application of Scoops3D and GIS: a case study in Atsuma, Hokkaido. susceptibility assessment using IVM, ANN and SVM models considering the Geoenvironmental Disasters 6:1–14 contribution of affecting factors. Public Libr Sci 14(4):1–18 Zhu X, Miao Y, Yang L, Bai S, Liu J, Hong H (2018) Comparison of the presence- Meten M, Prakash B, Yatabe R (2015) Effect of landslide factor combinations on only method and presence-absence method in landslide susceptibility the prediction accuracy of landslide susceptibility maps in the Blue Nile mapping. Catena 171:222–233 gorge of Central Ethiopia. Geoenvironmental Disasters 2:1–17 Park I, Choi J, Lee M, Lee S (2012) Application of an adaptive neuro-fuzzy inference system to ground subsidence hazard mapping. Comput Geosci 48:228–238 Publisher’sNote Park S, Hamm S, Kim J (2019) Performance evaluation of the GIS-based data- Springer Nature remains neutral with regard to jurisdictional claims in mining techniques decision tree, random forest, and rotation forest for published maps and institutional affiliations. landslide susceptibility modeling. Sustainability 11(20):5659 Park S, Kim J (2019) Landslide susceptibility mapping based on random forest and boosted regression tree models, and a comparison of their performance. Appl Sci 9(5):942 Pradhan B, Lee S, Buchroithner MF (2010) GIS-based back-propagation neural network model and its cross-application and validation for landslide susceptibility analyses. Comput Environ Urban Syst 34:216–235 Reichenbach P, Rossi M, Malamud BD, Mihir M, Guzzetti F (2018) A review of statistically-based landslide susceptibility models. Earth Sci Rev 14:60–91 Roy J, Saha S (2019) Landslide susceptibility mapping using knowledge driven statistical models in Darjeeling District, West Bengal, India. Geoenvironmental Disasters 6:1–18 Sabokbar HF, Roodposhti MS, Tazik E (2014) Landslide susceptibility mapping using geographically-weighted principal component analysis. Geomorphology 226:15–24 Saito H, Nakayama D, Matsuyama H (2009) Comparison of landslide susceptibility based on a decision-tree model and actual landslide occurrence: the Akaishi Mountains, Japan. Geomorphology 109(3):108–121 http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Geoenvironmental Disasters Springer Journals

An extreme rainfall-induced landslide susceptibility assessment using autoencoder combined with random forest in Shimane Prefecture, Japan

Geoenvironmental Disasters , Volume 7 (1) – Jan 30, 2020

Loading next page...
 
/lp/springer-journals/an-extreme-rainfall-induced-landslide-susceptibility-assessment-using-gU4qYawoiV

References (71)

Publisher
Springer Journals
Copyright
Copyright © The Author(s). 2020
eISSN
2197-8670
DOI
10.1186/s40677-020-0143-7
Publisher site
See Article on Publisher Site

Abstract

Background: Landslide-affecting factors are uncorrelated or non-linearly correlated, limiting the predictive performance of traditional machine learning methods for landslide susceptibility assessment. Deep learning methods can take advantage of the high-level representation and reconstruction of information from landslide-affecting factors. In this paper, a novel deep learning-based algorithm that combine classifiers of both deep learning and machine learning is proposed for landslide susceptibility assessment. A stacked autoencoder (StAE) and a sparse autoencoder (SpAE) both consist of an input layer for raw data, hidden layer for feature extraction, and output layer for classification and prediction. As a study case, Oda City and Gotsu City in Shimane Prefecture, southwestern Japan, were used for susceptibility assessment and prediction of landslides triggered by extreme rainfall. Results: The prediction performance was compared by analyzing real landslide and non-landslide data. The prediction performance of random forest (RF) was evaluated as better than that of a support vector machine (SVM) in traditional machine learning, so RF was combined with both StAE and SpAE. The results show that the prediction ratio of the combined classifiers was 93.2% for StAE combined with RF model and 92.5% for SpAE combined with RF model, which were higher than those of the SVM (87.4%), RF (89.7%), StAE (84.2%), and SpAE (88.2%). Conclusions: This study provides an example of combined classifiers giving a better predictive ratio than a single classifier. The asymmetric and unsupervised autoencoder combined with RF can exploit optimal non-linear features from landslide-affecting factors successfully, outperforms some conventional machine learning methods, and is promising for landslide susceptibility assessment. Keywords: Stacked autoencoder, Sparse autoencoder, Support vector machine, Random forest, Landslide susceptibility Introduction remote sensing are more accurate than statistical ap- Landslide susceptibility assessment is a cogent research proaches (Alexakis et al. 2014; Ciampalini et al. 2015;Di topic intended to determine the spatial probability of land- Martire et al. 2016), while physical methods are not suitable slide occurrence since landslides continuously result in for large areas (Tien Bui et al. 2016). Therefore, statistical damages and casualties worldwide (Corominas et al. 2013). approaches have received much attention because it is Spatial occurrence is called susceptibility, and landslide sus- efficient for fast recognizing landslides in large areas (Chen ceptibility maps generated from landslide-affecting factors et al. 2018b). It is necessary for decision makers to fast using statistical approaches subdivide areas into different recognize large areas where landslides are expected to re- terrains that are likely to cause certain types of landslides sult in land use planning and disaster control. Landslide (Segoni et al. 2018). Physical methods using GIS and susceptibility prediction based on statistical approaches can achieve this goal efficiently (Borrelli et al. 2018;Huang et al. 2019). Most of the quantitative methods for produ- * Correspondence: geonamsoil@gmail.com cing landslide susceptibility maps refer to regression or Department of Earth Science, Shimane University, 1060 Nishikawatsu-cho, Matsue, Shimane 690-8504, Japan © The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 2 of 16 classification approaches between real landslide data and stacked autoencoder and dropout by sparse autoencoder artificially created non-landslide data (Fell et al. 2008). The for non-linear correlations of the landslide-affecting factors quantitative methods most widely used for landslide sus- and gives better feature descriptions than the original data. ceptibility mapping are such as logistic regression (Lee and It does not require any additional methods which are Talib 2005; Ayalew and Yamagishi 2005; Bai et al. 2010; required for appropriate training data. In summary, this Aditian et al. 2018), naïve Bayes (Tien Bui et al. 2012;Tsan- study proposes the combined method of the advantage of garatos and Ilia 2016), artificial neural networks (Pradhan deep learning and the benefits of machine learning for et al. 2010;Arnoneetal. 2016), support vector machines landslide susceptibility assessment. The landslides in Oda (Yao et al. 2008; Yilmaz 2010; Ballabio and Sterlacchini City and Gotsu City in Shimane Prefecture, southwestern 2012;Xuet al. 2012), decision trees (Saito et al. 2009;Yeon Japan, are used as case study. A stacked autoencoder and et al. 2010), and random forest (Alessandro et al. 2015; sparse autoencoder are combined with random forest ac- Trigila et al. 2015;Hongetal. 2016;Chenetal. 2019b;Park quired from the results of a better predictive performance et al. 2019) in machine learning techniques. between support vector machine and random forest. Recently, deep learning algorithms have made a series of revolutions in the field of machine learning (Huang et al. Study area 2019) since the classification capability of a neural network The study area is located in Oda City and Gotsu City, to fit a decision boundary plane has become significantly Shimane Prefecture, southwestern Japan (Fig. 1). The more reliable (LeCun et al. 2015) which can successfully elevation varies from sea level to 1123 m (Table 1). The learn and extract patterns and unique features from big average annual rainfall recorded from the rainfall sta- data (Ayinde et al. 2019). Deep learning also can effectively tions at Fukumitsu, Oda, and Sakurae are 1657 mm, avoid local optimization and eliminates the need to set 1786 mm, and 2011 mm from 2008 to 2018 (Fig. 2). The model parameters because of autonomous processes cumulative rainfall for 2013 recorded from the rainfall (Zhang et al. 2017). At the moment, the core techniques of stations at Fukumitsu, Oda, and Sakurae are 2270 mm, deep learning are neural networks that have two or more 2102 mm, and 2656 mm, respectively (http://www.jma. hidden layers, including the following techniques: the adap- go.jp/jma/index.html). In this study, a total of 90 land- tive neuro-fuzzy inference system (Park et al. 2012); recur- slides were caused by extreme rainfall from May to rent neural networks (Chen et al. 2015); deep belief October 2013 (Table 2), and 69 of the landslides were networks (Huang and Xiang 2018); long short-term mem- triggered by extreme rainfall in August 2013. These ory (Xiao et al. 2018;Yangetal. 2019); and convolutional landslides can be described as shallow landslides that neural networks (Wang et al. 2019). Deep learning-based were determined based on field investigation. autoencoder is a semi-unsupervised learning method with no prior knowledge, such as landslide inventory, which Spatial data setting means that landslide and non-landslide labels and linear Landslide susceptibility prediction can be evaluated as a and non-linear correlation assumptions are not needed binary classification problem between landslides and non- (Huang et al. 2019). For landslide susceptibility assessment, landslides. A spatial database setting including landslide traditional methods for de-correlation are based on the pixel grid, non-landslide pixel grid, and related landslide- prior assumption that there are linear correlations between affecting factors is needed for statistical analysis (Huang landslides and non-landslides. However, landslide-affecting et al. 2019). This spatial database was divided into a train- factors are usually non-linear in practical applications. The ing dataset and a validation dataset. autoencoder driven by data rather than prior knowledge These real 90 of landslides and 90 of non-landslides arti- can transform raw data into non-linear correlated features. ficially generated from ArcGIS software were randomly In this paper, novel deep learning algorithms, namely, split into two parts with a ratio of 70% and 30%. Seventy both stacked autoencoder and sparse autoencoder com- percent of the landslide and non-landslide grid cells were bined with traditional machine learning, are proposed for selected for the training model, and the remaining 30% landslide susceptibility prediction. StAE and SpAE are un- were used for the validation model. Furthermore, the supervised learning as it does not require external labels landslide (event) and non-landslide (non - event) grid cells on landslides information. The encoding and decoding were set to 1 and 0, respectively, and the values of 1 and 0 process all happen in the dataset. The input and output were used for classification and prediction as the output data have the same number of dimensions, and the hidden variables of the landslide susceptibility prediction models. layer has fewer dimensions. Autoencoders are learned Thereafter, the calculated frequency ratio (FR) values were automatically from dataset, which is easy to train special- considered as numeric input variables of landslide suscep- ized instances of the algorithm that will perform well on a tibility prediction models. specific type of landslide-affecting factors. The autoencoder The landslide-affecting factors in study area are com- technique takes advantage of dimension reduction by plex, and it is difficult to confirm which affecting factors Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 3 of 16 Fig. 1 Study area and landslide inventory of Oda Citiy and Gotsu City in Shimane Prefecture, southwestern Japan (RGB color of Sentinel-2 satellite) Table 1 Description and frequency ratio (FR) of topographical and distance to factors in the study area Factors Values No. of Landslides FR Factors Values No. of Landslides FR Altitude (m) 0–105.77 59 2.15 Profile curvature −37.55 - (− 3.71) 3 0.84 105.78–215.95 21 0.80 −3.72 - (−1.18) 6 0.43 215.96–339.36 10 0.43 −1.19 - 1.03 39 0.90 339.37–550.90 0 0 1.04–3.87 36 1.53 550.91–1123.84 0 0 3.88–43.08 6 1.06 Slope (degree) 0–9.50 2 0.78 Dis. to stream < 101 58 1.64 9.51–19.00 30 1.11 101–200 21 0.73 19.01–28.21 29 1.08 201–300 8 0.44 28.22–38.00 24 0.92 301–400 3 0.46 38.01–73.40 5 0.71 > 401 0 0 Plan curvature −49.05 - (−3.81) 2 0.78 Dis. to road < 200 39 2.91 −3.82 - (−1.11) 12 1.11 201–400 17 1.38 −1.12 - 0.57 47 1.08 401–600 7 0.64 0.58–2.60 24 0.92 601–800 5 0.52 2.61–37.03 5 0.71 > 801 22 0.50 Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 4 of 16 Fig. 2 Annual rainfall from 2008 to 2018, and monthly rainfall of 2013 in the study area are the most important and necessary among the topo- The topographic factors were acquired and calculated graphic, geological, hydrological, distance to stream and based on the digital elevation model (DEM), with a spatial distance to road. In landslide susceptibility modeling, resolution of 10 m, including altitude, slope angle, plan landslides may reoccur under conditions similar to those curvature, profile curvature (Yilmaz et al. 2012), distance of past landslides (Westen et al. 2003; Lee and Talib 2005; to stream (Devkota et al. 2013; Guo et al. 2015), stream Dagdelenler et al. 2016). A total of 14 affecting factors power index (SPI) (Park and Kim 2019), and topographic were acquired and chosen as input variables for landslide wetness index (TWI) (Althuwaynee et al. 2016;Colkesen susceptibility models (Figs. 3, 4 and 5). et al. 2016). The distance to road (Alexakis et al. 2014;Roy Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 5 of 16 Table 2 Description and frequency ratio (FR) of remote sensing - derived index and hydrological factors in the study area Factors Values No. of Landslides FR Factors Values No. of Landslides FR NDVI −0.242 - 0.143 15 7.26 SPI −13.816 - (−8.806) 3 0.65 0.144–0.255 19 3.15 −8.805 - (−4.352) 14 0.83 0.256–0.332 30 1.40 −4.351 - 0.101 20 0.76 0.333–0.391 26 0.75 0.102–2.773 46 1.28 0.392–0.650 0 0 2.774–14.574 7 1.04 NDWI −0.324 - 0.046 1 0.22 TWI −7.969 - (−2.240) 17 0.82 0.047–0.125 22 2.19 −2.239 - 2.114 23 0.75 0.126–0.179 36 1.75 2.115–4.520 38 1.43 0.180–0.223 25 0.79 4.521–8.416 12 1.22 0.224–0.483 6 0.26 8.417–21.250 0 0 BI 0.248–0.312 4 0.86 Rainfall (mm) 1338.5–1424 13 0.54 0.313–0.322 10 0.45 (from May to Oct. in 2013) 1424.1–1528.5 5 0.36 0.323–0.329 33 0.97 1528.6–1633 2 0.17 0.330–0.339 34 1.50 1633.1–1718.5 47 1.69 0.340–0.436 9 1.41 1718.6–1823 23 1.82 and Saha 2019) was acquired from Geospatial Information provide valuable information for producing landslide sus- Authority of Japan (https://fgd.gsi.go.jp/download/menu. ceptibility maps, as quantitative measurement determined php). Normalized difference vegetation index (NDVI) by frequency ratio. Therefore, all 14 landslide-affecting (Chen et al. 2019a), normalized difference water index factors are utilized as input variables in the model to (NDWI) (Luo et al. 2019), and bare soil index (BI) (Huang evaluate their capabilities in performance and feature ex- et al. 2019) were derived from the Landsat TM 8 image traction for the landslide susceptibility assessment. data, resampled with a 10 m resolution (Zhu et al. 2018). The geological factors were derived from the 1:200,000 Methodology scale geological map, which was obtained from the Geo- This study was performed using the following main logical Survey of Japan, AIST (https://www.gsj.jp/en/). steps (Fig. 6): (1) correlation analysis between landslide These landslide-affecting factors were reflected using the inventory and landslide-affecting factors using frequency raster format with a spatial resolution of 10 × 10 m, which ratio, (2) landslide susceptibility prediction using SVM results in raster format that has the advantages of regular and RF models in machine learning, (3) landslide sus- shape, quick subdivision, and high modeling efficiency ceptibility prediction using StAE and SpAE employing (Huang et al. 2019). back propagation neural network in deep learning, (4) For continuous affecting factors, the Jenks natural break evaluation of StAE and SpAE combined with machine method was used to divide each continuous affecting learning acquired from a better prediction ratio between factor into five classes. Then the frequency ratio of all sub- SVM and RF, and (5) validation and comparison of pre- classes of each landslide affecting factor was calculated as dictive performance from the area under the curves and shown in Tables 1, 2 and 3. The frequency ratio allows landslide susceptibility maps produced by six models. that all 14 landslide-affecting factors have significant influ- The landslide samples were created after collecting and ences on landslide occurrence. Some studies have sug- preparing the landslide inventory map, the DEM derived gested that the correlations between affecting factors factors, and remote sensing and geological factors. The should be eliminated to reduce model noise for the land- landslide inventory samples were counted and used to slide susceptibility assessment (Hong et al. 2017; Lin et al. randomly generate non-landslide samples. The final data 2017;Chenet al. 2018a). However, the number of input combined the landslides and non-landslides samples variables of the deep learning algorithm is generally hun- with a defined label (1 and 0, respectively) for each sam- dreds or thousands due to their strong feature extraction ple. Fourteen landslide-affecting factors were prepared ability, and 14 input variables will not result in informa- from a spatial database. The values of the landslide- tion redundancy. On the other hand, some collinearity affecting factors at each sample location were utilized, phenomena between landslide-affecting factors can be tol- and the derived information was prepared using RStudio. erated by the fast-developed machine learning models The dependent variable was converted with one-hot en- (Huang et al. 2019). These 14 landslide-affecting factors coding. The data were then categorized into subsets: for Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 6 of 16 Fig. 3 Thematic maps of topographic factors (a-d) and distance to factors (e and f) considered in this study: a elevation (m), b slope angle (degree), c plan curvature, d profile curvature, e distance to stream (m), and f distance to road (m) Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 7 of 16 Fig. 4 Thematic maps of remote sensing - derived index (a-c) and hydrological factors (d-f) considered in this study: a NDVI, b NDWI, c BI, d SPI, e TWI, and f cumulative rainfall from March to October in 2013 (mm) Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 8 of 16 Fig. 5 Thematic maps of geological factors considered in this study: a geological age and b lithology training (70%) and validation (30%). The StAE and SpAE prediction rate than SVM model. In this study, the valid- model was trained in an unsupervised manner for feature ation of the proposed models was based on a well-known extraction, and a set of new features was generated. These area under the receiver operating characteristic curve. Par- new features were used to train StAE-bpnn and SpAE- ameter tuning was also utilized to assess better accuracy. bpnn in deep learning, and anomaly detection based StAE Finally, landslide susceptibility maps were generated using with RF and SpAE with RF which is selected as better equal interval function in ArcGIS 10.6 software. Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 9 of 16 Table 3 Description and frequency ratio (FR) of geological factors in the study area Factors Values No. of Ls FR Factors Values No. of Ls FR Geological age Unknown age 22 1.31 Lithology Felsic plutonic rocks 9 0.93 Triassic to Jurassic 8 0.41 Gabbro and diorite in accretionary complex 13 1.12 Present 14 3.09 Higher terrace 27 0.85 Permian 13 1.53 Lower terrace 16 0.82 Paleocene to Early Eocene 2 0.36 Mafic plutonic rocks 10 1.50 Middle to Late Miocene 2 1.37 Marine and non-marine sediments 13 3.94 Middle to Late Miocene 10 9.15 Non-alkaline felsic volcanic intrusive rocks 2 0.64 Middle Eocene 1 0.15 Non-alkaline felsic volcanic rocks 0 0 Late Pleistocene to Holocene 18 1.41 Non-alkaline mafic volcanic rocks 0 0 Late Pleistocene 0 0 Non-alkaline pyroclastic flow volcanic rocks 0 0 Late Miocene to Holocene 0 0 Sand dune deposits 0 0 Late Eocene to Early Oligocene 0 0 Schist 0 0 Late Cretaceous 0 0 Ultramafic rocks 0 0 Early to Middle Miocene 0 0 Volcanic debris 0 0 Early Pleistocene 0 0 Water 0 0 Early Miocene to Middle Miocene 0 0 0 0 Frequency ratio (FR) the ratio is less than 1, then the relationship will be The number of landslide pixel grids in each class is eval- weak. If the value is 1, it means an average correlation uated, and the frequency ratio for each factor class is (Meten et al. 2015). assigned by dividing the landslide ratio by the area ratio. The frequency ratio shows the correlation between land- Support vector machine (SVM) slides and affecting factors in a specific area. If this ratio Two main principles of SVM are the optimal classification is greater than 1, then the relationship between a land- hyperplane and the use of kernel features. The purpose of slide and the affecting factor’s class will be strong but if optimal sorting hyperplanes is to accurately distinguish Fig. 6 Flow chart of the research process Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 10 of 16 the two types of samples between landslides and non- can effectively prevent overfitting and gradient disappear- landslides while maximizing the sorting margin. Deter- ance (Huang et al. 2019). To initially achieve de- mining kernel function and optimal parameters are critical correlation among the 14 landslide-affecting factors, drop- for evaluating landslide susceptibility using SVM. Polyno- out was added to the input layer. mial kernels and radial basis function are the most com- The process of StAE is as follows. First, some of the monly used kernels in the literature (Huang and Zhao neurons in the network are randomly dropped in the 2018). To optimize two parameters, both penalty coeffi- mini-batch training samples and the remaining neurons cient C and kernel function parameters are needed in the are fed to the next layer. After obtaining this mini-batch SVM model. training sample, the deleted neurons are recovered and some neurons in the network are randomly deleted once Random forest (RF) again. The corresponding parameters are updated based The random forest, a classification tree algorithm with on the stochastic gradient descent method, performed repeated dichotomy data, can significantly reduce the on the neurons that have not been removed. computations required for classification and regression. In RF algorithms, predictive models are established by Results utilizing many decision trees. Based on randomly se- Landslide susceptibility modelling using the six models lected variables and samples, these trees and their deci- All models based on the deep learning and machine sions are generated. Once the model is established, the learning were coded in R language on RStudio. For the samples are first sorted individually according to all de- SVM model and RF model, parameters were determined cision trees in the model, and then by all trees (Huang using a 10-fold cross-validation approach. With radial and Zhao 2018). The proportion of decision tree esti- basis function, SVM model was acquired from grid mates and generates landslide susceptibility indexes, search for SVM parameter tuning. For RF model, it was which can predict landslide occurrence between all deci- composed of ‘mtry’ and ‘tree’, which were 3 and 300, re- sion trees in the RF model (Goetz et al. 2015). spectively. The autoencoder models based on the deep neural network were coded in R language on RStudio Stacked autoencoder (StAE) using H2O packages. These algorithms were performed The StAE is an artificial neural network, which is a spe- using hyperbolic tangent function (i.e., the tanh func- cial type of multi-layer perceptron. It is a type of un- tion) in every hidden layer which was used to encode supervised learning algorithm with an asymmetric and decode the input to the output in the undercom- structure, in which the middle layer represents the en- plete autoencoder. In the H2O library, five hidden layers coding of the input data in the bottleneck layer (Yu and with encoders and decoders were designed by using the Príncipe 2019). The bottleneck constrains the amount of tanh activation function in each layer. Stacked autoenco- information that can traverse the full network, forcing ders (StAE) were constructed by organizing autoencoder the learned compression of the input data. The StAE is on top of each other also known as deep autoencoder. trained to reconstruct the input of landslide-affecting StAE consists of multiple autoencoder stacked into mul- factors onto the output layer for feature representation, tiple layers where the output of each layer was wired to which prevents the simple copying of the data and the the inputs of the successive layers, as seen in Fig. 7, network. The middle layer has a lower dimension to which was composed of 80–50–2-50-80. To obtain good avoid overfitting, which can either select a subset of fea- parameters, StAE employed greedy layer-wise training. tures with the highest importance or apply some dimen- The benefit of StAE was that it can evaluate the benefits sion reduction techniques (Hinton and Salakhutdinov of deep network, which has greater expressive power. 2006; Charte et al. 2018). In this study, the StAE com- Furthermore, it usually can capture useful hierarchical bined with back propagation neural network was proc- grouping of the input. Finally, model construction was essed for a lower dimension of features than the input determined by the majority vote among all trees using data have, which can be used for learning the most im- RF models. The aim of sparse autoencoder (SpAE) was portant features of the data. to make a large number of neurons to have low average output so that neurons may be inactive most of the time. Sparse autoencoder (SpAE) The limitation of autoencoders to have only small num- The SpAE consists of an input layer, hidden layers, and an bers of hidden units can be overcome by adding a spars- output layer. Each layer in this neural network contains a ity constraint, where a large number of hidden units can sufficient number of neurons. Dropout can randomly clas- be utilized usually more than one input. Three hidden sify the weight of some implicit layer nodes and reduce layers with encoders and decoders were designed by the mutual dependence between nodes to realize the using the tanh activation function in each layer in the normalization of neural networks. Additionally, dropout H2O library. Sparsity can be achieved by introducing a Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 11 of 16 Fig. 7 Architecture of stacked autoencoder combined with random forest employing reconstruction error value by anomaly detection loss function during training or manually zeroing few low (0.0–0.2) index of the susceptibility map (Fig. 9d, e, f strongest hidden unit activations, which was composed and Table 4). of 200–200-200 (Fig. 8). For classification, model class was constructed by RF model by means of the majority Discussion vote among all trees. Reconstruction error value employ- Validation of prediction performance ing mean square error was used by means of anomaly The landslide susceptibility assessment was verified detection in both StAE and SpAE, which were 0.068 and using the area under the curve on the validation dataset 0.088, respectively. for six models. The predictive ratio for landslide suscep- tibility assessment is mainly calculated by confusion matrix. The true positive rate (TPR) is defined as the ra- Landslide susceptibility maps produced by the six models tio of true positive to the sum of true positive and false The landslide susceptibility maps were derived from negative, and the false positive rate (FPR) is defined as SVM, RF, StAE, SpAE, StAE with RF, and SpAE with RF the ratio of false positive to the sum of false positive and in the ArcGIS 10.6 software (Fig. 9). For better true negative to the number of validation samples visualization and comparison, the indices were reclassi- (Zhang and Wang 2019). In general, the true positive de- fied into five classes using the equal interval function: fines the landslide grid cells that are predictive as land- very low (0–0.2), low (0.2–0.4), moderate (0.4–0.6), high slides, true negative means non-landslide grid cells that (0.6–0.8), and very high (0.8–1). The susceptibility class are predictive as non-landslides, false-positive reflects area of the StAE model as the best performance (Table 4) non-landslide grid cells that are predictive as landslides, were 6.31%, 13.58%, 33.04%, 36.81%, and 10.26%, re- and false negative means landslide grid cells that are pre- spectively. The susceptibility class area of the RF model dictive as non-landslides (Huang et al. 2019). The area (Fig. 9b) and StAE model (Fig. 9c) has very high value. under the curve was applied to assess the prediction per- The susceptibility index value of the SVM model (Fig. formance of landslide susceptibility index values on the 9a) and StAE models (Fig. 9c) were prominent near the validation dataset. The prediction rate values of SVM, road (Fig. 3f). SpAE and SpAE with RF have lower values RF, StAE, SpAE, StAE with RF and SpAE with RF model of class area percentage for a very high (0.8–1.0) index are obtained by calculating the area under the prediction of the susceptibility map. RF and StAE have lower values rate curves. The StAE with RF and SpAE model of com- of class area percentage for a moderate (0.4–0.6) index bined classifier have relatively higher prediction rates of the susceptibility map. StAE with RF and SpAE with than using SVM, RF, StAE, and SpAE model of single RF have lower values of class area percentage for a very classifier (Fig. 10). This means that the classifiers Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 12 of 16 Fig. 8 Architecture of sparse autoencoder combined with random forest employing reconstruction error value by anomaly detection combined with both autoencoder and traditional ma- the result of the statistical analysis, as an increase in chine learning are better than using a single classifier. sample size, the result would be more acceptable. Ac- Autoencoder is unsupervised learning as it does not cording to Demoulin and Chung (2007), in spite of the require external labels on landslide information. The limited sample size using ten landslides in about 15 × encoding and decoding process all happen within the 15 km scale, Bayesian method in machine learning deliv- dataset. The input and output data have the same ered satisfying prediction rates. Heckmann et al. (2014) number of dimensions, and the hidden layer has state that small samples result in large standard errors fewer dimensions. Thus, it contains compressed infor- and wide confidence intervals for the population param- mation of the input layer, which is why it acts as a eters. In the case of regression parameters, small samples dimension reduction for the original input layer. From cause the estimation to be uncertain, and there is a the hidden layer, the neural network is able to decode higher risk of coefficients being insignificant when the the information to its original dimensions. Autoenco- respective confidence interval includes zero. With re- ders are learned automatically from data examples, spect to replicate sampling and model selection, it is ex- which is a useful property. It means that it is easy to pected that the diversity of models. However, increasing train specialized instances of the algorithm that will sample sizes causes standard errors and confidence in- perform well on a specific type of input. It does not tervals in parameter estimation to decrease. In a require any additional methods which are required for significance-based stepwise model selection, very large appropriate training data. samples are expected to facilitate the inclusion of more and more variables. Reichenbach et al. (2018) present Sample size that some articles did not use any landslide inventory, One of the challenges for landslide susceptibility map- which are based on the relative importance of the the- ping is to suggest the sample size on the number of matic maps as landslide-affecting factors (Adler and landslide inventories. Several articles have been reported Huffman 2007). In this study, all models obtained from to address adequate numbers of landslide inventories 84% to 93% prediction rate using 90 landslides (about needed to make acceptable landslide susceptibility map- 20 km square), which is similar to previous study ping where sample size varies from 0 to several thousand (Sabokbar et al. 2014) of different study area where 82 in different scales of study areas. The sample size affects landslides were used (about 24 km square). Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 13 of 16 Fig. 9 Landslide susceptibility maps of Shimane prefecture using (a) support vector machine, (b) random forest, (c) stacked autoencoder, (d) sparse autoencoder, (e) stacked autoencoder combined with RF, and (f) sparse autoencoder combined with RF Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 14 of 16 Table 4 Number of landslides occurred and percentage of landslide susceptibility class area Landslide SVM RF StAE SpAE StAE with RF SpAE with RF susceptibility No. % No. % No. % No. % No. % No. % class(index value) Very low (0–0.2) 15 17.00 1 11.80 5 35.65 7 30.16 2 6.31 3 5.82 Low (0.2–0.4) 4 19.61 2 10.45 1 11.09 10 19.07 6 13.58 8 14.38 Moderate (0.4–0.6) 7 25.83 4 8.43 3 10.52 15 24.73 16 33.04 13 35.96 High (0.6–0.8) 17 24.94 13 33.43 4 13.07 44 22.60 66 36.81 66 33.25 Very high (0.8–1.0) 47 12.62 70 35.89 77 29.67 14 3.44 0 10.26 0 10.59 Sum 90 100 90 100 90 100 90 100 90 100 90 100 Study limitation strategy to construct non-landslide for regression and In this study, all landslide points were obtained through classification, 3) scale of study area, 4) resolution of DEM, GPS by field investigation from May to October in 2013 5) relatively equal scatter distribution of landslide inven- without the aid of satellite imagery or unnamed aerial ve- tory in study area 6) setting boundary of study area to hicle (UAV). As seen in Fig. 2f, most landslide points were construct landslide-affecting factors, 7) reasonable selec- in the vicinity of human activity near the roads in the tion of landslide-affecting factors. To construct distinct mountains, not inside the mountainous area. The landslide landslide inventory with distinguishing landslide triggering inventory near the roads may affect landslide susceptibility factors between rainfall and earthquake is considered the maps (Fig. 9), which results in landslide susceptibility index most important key step than using any advanced classi- value near the roads higher than in other areas. fier for landslide susceptibility mapping. Landslide susceptibility mapping is based on the prob- ability of reoccurrence at the area where landslides already Conclusion occurred, unlike mapping physically based on modeling, In this study, the classifiers combined with both deep which relies on as follows: 1) the number of abundant learning and traditional machine learning, StAE with RF landslide inventories for statistical analysis, 2) sampling and SpAE with RF models, are proposed for landslide Fig. 10 The area under the curves for prediction ratio and validation of landslide susceptibility maps produced by the six models Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 15 of 16 susceptibility prediction. The autoencoder consists of in- neural network in a tertiary region of Ambon, Indonesia. Geomorphology 318:101–111 put layers for raw data, hidden layers for feature extrac- Alessandro T, Carla I, Carlo E, Gabriele SM (2015) Comparison of logistic tion, and output layers for landslide susceptibility regression and random forests techniques for shallow landslide susceptibility prediction. The combined classifiers have the advantage assessment in Giampilieri (NE Sicily, Italy). Geomorphology 249:119–136 Alexakis D, Agapiou A, Tzouvaras M, Themistocleous K, Neocleous K, Michaelides of both machine learning and deep learning, i.e., dimen- S, Hadjimitsis D (2014) Integrated use of GIS and remote sensing for sion reduction of the StAE model and dropout of the monitoring landslides in transportation pavements: the case study of Paphos SpAE model for feature extraction. area in Cyprus. Nat Hazards 72:119–141 Althuwaynee OF, Pradhan B, Lee S (2016) A novel integrated model for assessing The six models were applied in Oda City and Gotsu landslide susceptibility mapping using CHAID and AHP pair-wise comparison. City, Shimane Prefecture, southwestern Japan. The cor- Int J Remote Sens 37(5):1190–1209 relation between landslides and landslide-affecting fac- Arnone E, Francipane A, Scarbaci A, Puglisi C, Noto LV (2016) Effect of raster resolution and polygon-conversion algorithm on landslide susceptibility tors using frequency ratio was high in NDVI, distance to mapping. Environ Model Softw 84:467–481 road, and altitude. Performance assessment was carried Ayalew L, Yamagishi H (2005) The application of GIS-based logistic regression for out with the SVM, RF, StAE, SpAE, StAE with RF, and landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan. Geomorphology 65:15–31 SpAE with RF models. The results show that the pro- Ayinde BO, Inanc T, Zurada JM (2019) Regularizing deep neural networks by posed StAE with RF and SpAE with RF models have a enhancing diversity in feature extraction. IEEE Transa Neural Netw Learn Syst relatively better prediction rate than a single classifier 30(9):1–12 Bai S, Wang J, Lü G, Zhou P, Hou S, Xu S (2010) GIS-based logistic regression for such as SVM, RF, StAE and SpAE models. In conclusion, landslide susceptibility mapping of the Zhongxian segment in the three the proposed combined classifier is promising for classi- gorges area, China. Geomorphology 115:23–31 fication between landslide and non-landslide following Ballabio C, Sterlacchini S (2012) Support vector machines for landslide susceptibility mapping: the Staffora River basin case study, Italy. Math Geosci landslide susceptibility prediction because it can over- 44(1):47–70 come the limitations of conventional machine learning Borrelli L, Ciurleo M, Gullà G (2018) Shallow landslide susceptibility assessment in algorithms, extract features and pattern recognition, re- granitic rocks using GIS-based statistical methods: the contribution of the duce computations, and improve performance. weathering grade map. Landslides 15(6):1127–1142 Charte D, Charte F, García S, Jesus MJ, Herrera F (2018) A practical tutorial on autoencoders for nonlinear feature fusion: taxonomy, models, software and Acknowledgements guidelines. Inf Fusion 44:78–96 The authors express their sincere gratitude to Zili DAI (Shimane University, Chen H, Zeng Z, Tang H (2015) Landslide deformation prediction based on Matsue, Japan), Prakash DHUNGANA (Shimane University, Matsue, Japan), recurrent neural network. Neural Process Lett 41(2):169–178 Ran LI (Shimane University, Matsue, Japan), Rong Zhou (Shimane University, Chen W, Panahi M, Tsangaratos P, Shahabi H, Ilia I, Panahi S, Li S, Jaafari A, Ahmadg Matsue, Japan), and Akinori IIO (Shimane University, Matsue, Japan), for their BB (2019a) Applying population-based evolutionary algorithms and a neuro- kind assistance. The authors also acknowledge financial support from fuzzy system for modeling landslide susceptibility. Catena 172:212–231 Shimane University, Japan. Chen W, Peng J, Hong H, Shahabi H, Pradhan B, Liu J, Zhu AX, Pei X, Duan Z (2018a) Landslide susceptibility modelling using GIS-based machine learning Authors’ contributions techniques for Chongren county, Jiangxi province, China. Sci Total Environ FW conducted field investigation in 2013 and provided guidance in the 626:1121–1135 study area of landslides triggered by extreme rainfall in Shimane Prefecture, Chen W, Sun Z, Han J (2019b) Landslide susceptibility modeling using integrated southwestern Japan. KN carried out the landslide susceptibility assessment ensemble weights of evidence with logistic regression and random forest and produced landslide susceptibility maps using deep learning combined models. Appl Sci 9(1):171 with machine learning. Both authors read and approved the final Chen W, Zhang S, Li R, Shahabi H (2018b) Performance evaluation of the GIS- manuscript. based data mining techniques of best-first decision tree, random forest, and naïve Bayes tree for landslide susceptibility modeling. Sci Total Environ 644: Funding 1006–1018 The study was financially supported by funding awarded under the study, Ciampalini A, Raspini F, Bianchini S, Frodella W, Bardi F, Lagomarsino D, Traglia F, “Initiation and motion mechanisms of long runout landslides due to rainfall Moretti S, Proietti C, Pagliara P, Onori R, Corazza A, Duro A, Basile G, Casagli N and earthquake in the falling pyroclastic deposit slope area” (JSPS-B- (2015) Remote sensing as tool for development of landslide databases: the case 19H01980, Principal Investigator: Fawu Wang). of the Messina Province (Italy) geodatabase. Geomorphology 249:103–118 Colkesen I, Sahin EK, Kavzoglu T (2016) Susceptibility mapping of shallow Availability of data and materials landslides using kernel-based Gaussian process, support vector machines and The DEM data utilized in this study are freely available from the Geospatial logistic regression. J Afr Earth Sci 118:53–64 Information Authority of Japan (https://fgd.gsi.go.jp/download/menu.php). Corominas J, Van Westen C, Frattini P, Cascini L, Malet J, Fotopoulou S, Catani F, The Landsat 8 satellite image are acquired from the United States Geological Van Den Eeckhaut M, Mavrouli O, Agliardi F, Pitilakis K, Winter M, Pastor M, Survey (USGS) (https://earthexplorer.usgs.gov). All six models for statistical Ferlisi S, Tofani V, Hervás J, Smith J (2013) Recommendations for the computing and data visualization are coded by R (ver. 3.6.1) open-source quantitative analysis of landslide risk. Bull Eng Geol Environ 73(2):209–263 software (https://www.r-project.org/). Dagdelenler G, Nefeslioglu HA, Gokceoglu C (2016) Modification of seed cell sampling strategy for landslide susceptibility mapping: an application from Competing interests the eastern part of the Gallipoli peninsula (Canakkale, Turkey). Bull Eng Geol The authors declare that they have no competing interests. Environ 75(2):575–590 Demoulin A, Chung C (2007) Mapping landslide susceptibility from small datasets: a Received: 13 November 2019 Accepted: 21 January 2020 case study in the pays de Herve (E Belgium). Geomorphology 89:391–404 Devkota KC, Regmi AD, Pourghasemi HR, Yoshida K, Pradhan B, Ryu IC, Dhital MR, Althuwaynee OF (2013) Landslide susceptibility mapping using certainty References factor, index of entropy and logistic regression models in GIS and their Aditian A, Kubota T, Shinohara Y (2018) Comparison of GIS-based landslide comparison at Mugling-Narayanghat road section in Nepal Himalaya. Nat susceptibility models using frequency ratio, logistic regression, and artificial Hazards 65(1):135–165 Nam and Wang Geoenvironmental Disasters (2020) 7:6 Page 16 of 16 Di Martire D, Tessitore S, Brancato D, Ciminelli MG, Costabile S, Costantini M, Segoni S, Tofani V, Rosi A, Catani F, Casagli N (2018) Combination of rainfall Graziano GV, Minati F, Ramondini M, Calcaterra D (2016) Landslide detection thresholds and susceptibility maps for dynamic landslide hazard assessment integrated system (LaDIS) based on in-situ and satellite SAR interferometry at regional scale. Front Earth Sci 6(85):1–11 measurements. Catena 137:406–421 Tien Bui D, Ho TC, Pradhan B, Pham BT, Nhu VH, Revhaug I (2016) GIS-based modeling Fell R, Corominas J, Bonnard C, Cascini L, Leroi E, Savage W (2008) Guidelines for of rainfall-induced landslides using data mining-based functional trees classifier with landslide susceptibility, hazard and risk zoning for land use planning. Eng AdaBoost, bagging, and MultiBoost ensemble frameworks. Environ Earth Sci 75:1–22 Geol 102(3–4):85–98 Tien Bui D, Pradhan B, Lofman O, Revhaug I (2012) Landslide susceptibility assessment in Vietnam using support vector machines, decision tree, and Goetz JN, Brenning A, Petschko H, Leopold P (2015) Evaluating machine learning Naïve Bayes models. Math Probl Eng 2012:1–26 and statistical prediction techniques for landslide susceptibility modeling. Tsangaratos P, Ilia I (2016) Comparison of a logistic regression and naïve Bayes Comput Geosci 81:1–11 classifier in landslide susceptibility assessments: the influence of models Guo C, David RM, Zhang Y, Wang K, Yang Z (2015) Quantitative assessment of complexity and training dataset size. Catena 145:164–179 landslide susceptibility along the Xianshuihe fault zone, Tibetan plateau, Wang Y, Fang Z, Hong H (2019) Comparison of convolutional neural networks for China. Geomorphology 248:93–110 landslide susceptibility mapping in Yanshan County, China. Sci Total Environ Heckmann T, Gegg K, Gegg A, Becht M (2014) Sample size matters: investigating 666:975–993 the effect of sample size on a logistic regression susceptibility model for Westen CJV, Rengers N, Soeters R (2003) Use of geomorphological information in debris flows. Nat Hazards Earth Syst Sci 180:259–278 indirect landslide susceptibility assessment. Nat Hazards 30(3):399–419 Hinton GE, Salakhutdinov RR (2006) Reducing the dimensionality of data with Xiao L, Zhang Y, Peng G (2018) Landslide susceptibility assessment using integrated neural networks. Science 313:504–507 deep learning algorithm along the China-Nepal highway. Sensors 18:1–13 Hong H, Ilia I, Tsangaratos P, Chen W, Xu C (2017) A hybrid fuzzy weight of Xu C, Dai F, Xu X, Lee YH (2012) GIS-based support vector machine modeling of evidence method in landslide susceptibility analysis on the Wuyuan area, earthquake-triggered landslide susceptibility in the Jianjiang River watershed, China. Geomorphology 290:1–16 China. Geomorphology 145-146:70–80 Hong H, Pourghasemi HR, Pourtaghi ZS (2016) Landslide susceptibility Yang BB, Yin KL, Lacasse S, Liu ZQ (2019) Time series analysis and long short- assessment in Lianhua County (China): a comparison between a random term memory neural network to predict landslide displacement. Landslides forest data mining technique and bivariate and multivariate statistical 16(4):677–694 models. Geomorphology 259:105–118 Yao X, Tham LG, Dai FC (2008) Landslide susceptibility mapping based on Hong Y, Adler RF, Huffman GJ (2007) Use of satellite remote sensing data in the support vector machine: a case study on natural slopes of Hong Kong, mapping of global landslide susceptibility. Nat Hazards 43(2):245–256 China. Geomorphology 101:572–582 Huang F, Zhang J, Zhou C, Wang Y, Huang J, Zhu L (2019) A deep learning Yeon Y, Han J, Ryu K (2010) Landslide susceptibility mapping in Injae, Korea, algorithm using a fully connected sparse autoencoder neural network for using a decision tree. Eng Geol 116:274–283 landslide susceptibility prediction. Landslides 17:217–229 Yilmaz C, Topal T, Süzen ML (2012) GIS-based landslide susceptibility mapping Huang L, Xiang LY (2018) Method for meteorological early warning of using bivariate statistical analysis in Devrek (Zonguldak-Turkey). Environ Earth precipitation-induced landslides based on deep neural network. Neural Sci 65(7):2161–2178 Process Lett 48(2):1243–1260 Yilmaz I (2010) Comparison of landslide susceptibility mapping methodologies Huang Y, Zhao L (2018) Review on landslide susceptibility mapping using for Koyulhisar, Turkey: conditional probability, logistic regression, artificial support vector machines. Catena 165:520–529 neural networks, and support vector machine. Environ Earth Sci 61:821–836 LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436 Yu S, Príncipe JC (2019) Understanding autoencoders with information theoretic Lee S, Talib JA (2005) Probabilistic landslide susceptibility and factor effect concepts. Neural Netw 117:104–123 analysis. Environ Geol 47(7):982–990 Zhang R, Isola P, Efros AA (2017) Split-brain autoencoders: unsupervised learning Lin GF, Chang MJ, Huang YC, Ho JY (2017) Assessment of susceptibility to rainfall by cross-channel prediction. Paper presented at the Computer Vision & induced landslides using improved self-organizing linear output map, Pattern Recognition support vector machine, and logistic regression. Eng Geol 224:62–74 Zhang S, Wang FW (2019) Three-dimensional seismic slope stability assessment Luo X, Lin F, Zhu S, Yu M, Zhang Z, Meng L, Peng J (2019) Mine landslide with the application of Scoops3D and GIS: a case study in Atsuma, Hokkaido. susceptibility assessment using IVM, ANN and SVM models considering the Geoenvironmental Disasters 6:1–14 contribution of affecting factors. Public Libr Sci 14(4):1–18 Zhu X, Miao Y, Yang L, Bai S, Liu J, Hong H (2018) Comparison of the presence- Meten M, Prakash B, Yatabe R (2015) Effect of landslide factor combinations on only method and presence-absence method in landslide susceptibility the prediction accuracy of landslide susceptibility maps in the Blue Nile mapping. Catena 171:222–233 gorge of Central Ethiopia. Geoenvironmental Disasters 2:1–17 Park I, Choi J, Lee M, Lee S (2012) Application of an adaptive neuro-fuzzy inference system to ground subsidence hazard mapping. Comput Geosci 48:228–238 Publisher’sNote Park S, Hamm S, Kim J (2019) Performance evaluation of the GIS-based data- Springer Nature remains neutral with regard to jurisdictional claims in mining techniques decision tree, random forest, and rotation forest for published maps and institutional affiliations. landslide susceptibility modeling. Sustainability 11(20):5659 Park S, Kim J (2019) Landslide susceptibility mapping based on random forest and boosted regression tree models, and a comparison of their performance. Appl Sci 9(5):942 Pradhan B, Lee S, Buchroithner MF (2010) GIS-based back-propagation neural network model and its cross-application and validation for landslide susceptibility analyses. Comput Environ Urban Syst 34:216–235 Reichenbach P, Rossi M, Malamud BD, Mihir M, Guzzetti F (2018) A review of statistically-based landslide susceptibility models. Earth Sci Rev 14:60–91 Roy J, Saha S (2019) Landslide susceptibility mapping using knowledge driven statistical models in Darjeeling District, West Bengal, India. Geoenvironmental Disasters 6:1–18 Sabokbar HF, Roodposhti MS, Tazik E (2014) Landslide susceptibility mapping using geographically-weighted principal component analysis. Geomorphology 226:15–24 Saito H, Nakayama D, Matsuyama H (2009) Comparison of landslide susceptibility based on a decision-tree model and actual landslide occurrence: the Akaishi Mountains, Japan. Geomorphology 109(3):108–121

Journal

Geoenvironmental DisastersSpringer Journals

Published: Jan 30, 2020

There are no references for this article.