An Optimization of Statistical Index Method Based on Gaussian Process Regression and GeoDetector, for Higher Accurate Landslide Susceptibility Modeling
Cheng, Cen;Yang, Yang;Zhong, Fengcheng;Song, Chao;Zhen, Yan
2022-10-11 00:00:00
Applied Sciences, Article

Cen Cheng 1,2,3, Yang Yang 1,2,3,*, Fengcheng Zhong 3,4, Chao Song 5 and Yan Zhen 1,2

1 State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation, Southwest Petroleum University, Chengdu 610500, China
2 School of Geosciences and Technology, Southwest Petroleum University, Chengdu 610500, China
3 Spatial Information Technology and Big Data Mining Research Center, Southwest Petroleum University, Chengdu 610500, China
4 Sichuan Xinyang Anchuang Technology Co., Ltd., Chengdu 610500, China
5 HEOA Group, West China School of Public Health and West China Fourth Hospital, Sichuan University, Chengdu 610044, China
* Correspondence: yy_swpu@swpu.edu.cn; Tel.: +86-18681279063

Abstract: Landslide susceptibility assessment is an effective non-engineering landslide prevention at the regional scale. This study aims to improve the accuracy of landslide susceptibility assessment by using an optimized statistical index (SI) method. A landslide inventory containing 493 historical landslides was established, and 20 initial influencing factors were selected for modeling. First, a combination of GeoDetector and recursive feature elimination was used to eliminate the redundant factors. Then, an optimization method for weights of SI was adopted based on Gaussian process regression (GPR). Finally, the predictive abilities of the original SI model, the SI model with optimized factors (GD-SI), and the SI model with optimized factors and weights (GD-GPR-SI) were compared and evaluated by the area under the receiver operating characteristic curve (AUC) on the testing datasets. The GD-GPR-SI model has the highest AUC value (0.943), and the GD-SI model (0.936) also has a higher value than the SI model (0.931). The results highlight the necessity of factor screening and weight optimization. The factor screening method used in this study can effectively eliminate factors that negatively affect the SI model. Furthermore, by optimizing the SI weights through GPR, more reasonable weights can be obtained for model performance improvement.

Keywords: landslide susceptibility; statistical index; Gaussian process regression; GeoDetector; recursive feature elimination

Citation: Cheng, C.; Yang, Y.; Zhong, F.; Song, C.; Zhen, Y. An Optimization of Statistical Index Method Based on Gaussian Process Regression and GeoDetector, for Higher Accurate Landslide Susceptibility Modeling. Appl. Sci. 2022, 12, 10196. https://doi.org/10.3390/app122010196

Academic Editor: Fernando Rocha
Received: 18 August 2022
Accepted: 3 October 2022
Published: 11 October 2022

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Landslide is a natural disaster that can be defined as the movement of rock, dirt, or debris down a slope [1]. Landslides are common around the world and commonly occur in mountainous areas, posing varying degrees of threat to people's life and property safety [2]. Froude and Petley [3] conducted a temporal and spatial analysis of the global data set of fatal non-seismic landslides from January 2004 to December 2016. Their data showed that 55,997 people were killed in 4862 different landslide events, with Asia being the major region suffering from landslide disasters. In addition, the number of landslides caused by human activities is increasing. Landslide susceptibility mapping (LSM) is an effective risk assessment method used for landslide prevention and control. In recent years, various models have been applied to landslide susceptibility mapping. Improving or innovating these models to obtain more accurate mapping is a major difficulty in landslide susceptibility assessment studies [4].

At present, quantitative models applied to landslide susceptibility assessment can be divided into four categories: physical-based models, opinion-driven models, bivariate statistical models, and machine learning-based models [5]. Physical-based models are suitable for local-area scale mapping and analysis and have a strong landslide warning ability [6]. However, because of the large amount of required field survey data, the evaluation process is complicated and expensive, making it unsuitable for landslide risk evaluation in large-scale areas [7]. Opinion-driven models such as the analytical hierarchy process [8], step-wise assessment ratio analysis [9], and analytical network process [10] have also been applied in numerous landslide susceptibility studies. In these models, evaluation is based on existing expert knowledge, and the evaluation process does not follow a consistent standard, making quantifying the results difficult. Bivariate statistical models are information mining methods based on statistics, such as frequency ratio (FR) [11,12], statistical index (SI) [13], certainty factor (CF) [14], and evidence weight [15]. This type of model is straightforward to implement, easy to understand, and has satisfactory prediction performance. More recently, due to the growing development and maturity of big data mining techniques, machine learning has become a hotspot in the field of landslide susceptibility research owing to its powerful data analysis and prediction abilities.
In essence, machine learning and multivariate statistical analysis intersect. Algorithms such as logistic regression (LR) [16,17], random forest [18–20], support vector machine [21,22], artificial neural network [23,24], and other algorithms have been applied in landslide susceptibility assessment with advanced prediction performance. In addition to the above models, hybrid methods utilizing multiple model types have also achieved excellent performance [25,26].

Statistical models and machine learning-based models are the most widely used quantitative analysis models. However, both types of models have specific disadvantages. Although machine learning-based algorithms have high predictive accuracy, their underlying rules are complicated and difficult to express intuitively. Hence, they are not conducive to analyzing the relationship between landslides and factors [27,28]. Bivariate statistical models overcome this problem [29,30], but they distribute weights in a partly unreasonable way, which decreases their predictive accuracy. According to Tobler's first law of geography, objects that are close to each other in geographical space are also more closely related [31]. In models such as SI and FR, however, all factor values within a class receive the same weight. For continuous factors such as altitude, this leads to sudden changes in the weights at class boundaries: similar factor values on either side of a boundary can have completely different weights, while different factor values within the same class have identical weights, which is unreasonable [9,32,33].

Model quality is directly related to the accuracy of its evaluation, but the selection of influencing factors also affects landslide susceptibility evaluation results [34].
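
The class-weight discontinuity described above for SI can be illustrated with a minimal sketch. It assumes the usual definition of the statistical index, namely the natural logarithm of the landslide density in a class divided by the landslide density of the whole area; the formula used in this study may be stated differently elsewhere in the paper. The landslide and cell counts, the two class breaks, and the function names `statistical_index` and `weight_for` are hypothetical and chosen only for illustration.

```python
import bisect
import math


def statistical_index(landslides_per_class, cells_per_class):
    """Return ln(class landslide density / overall landslide density) per class."""
    overall_density = sum(landslides_per_class) / sum(cells_per_class)
    weights = []
    for n, s in zip(landslides_per_class, cells_per_class):
        if n == 0:
            # ln(0) is undefined; how empty classes are treated is a modeling choice.
            weights.append(float("-inf"))
        else:
            weights.append(math.log((n / s) / overall_density))
    return weights


# Hypothetical data for three altitude classes (not taken from the paper).
altitude_breaks = [1000, 1400]   # upper bounds of classes 1 and 2, in metres
landslides = [30, 60, 10]        # landslide cells per class
cells = [1000, 1000, 2000]       # total cells per class
weights = statistical_index(landslides, cells)


def weight_for(altitude):
    """Look up the SI weight of the class that contains the given altitude."""
    return weights[bisect.bisect_right(altitude_breaks, altitude)]


for altitude in (999, 1001, 1399):
    print(altitude, round(weight_for(altitude), 3))
```

With these made-up counts, 999 m and 1001 m fall on opposite sides of a class break and receive different weights, whereas 1001 m and 1399 m share one weight despite being almost 400 m apart. This is the step behavior that weight optimization with a continuous regression model is meant to smooth.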
At present, popular factor screening methods include the information gain ratio [35], variance inflation factors [36], recursive feature elimination (RFE) [37], rough set [38], principal component analysis [39], Pearson correlation coefficient [40], and Spearman correlation coefficient [41]. In addition, the GeoDetector method proposed by Wang et al. (2010) effectively uses spatial information of data to identify the primary factors affecting a certain phenomenon [42,43]. This has been innovatively applied to landslide susceptibility analysis [44,45].

This study aims to develop a hybrid optimization method for the SI model. This method optimizes the SI weight through GPR, which can avoid the irrationality of the bivariate statistical model mentioned above and improve the accuracy of landslide susceptibility assessment. In addition, the integration of GeoDetector and RFE is used to further optimize landslide influencing factors used for modeling. The area along Duwen highway in Sichuan Province, China, was used as the study area. A landslide inventory was created, and the overall performance of the SI model, SI model with optimized factors (GD-SI), and SI model with optimized factors and weights (GD-GPR-SI) were compared and analyzed.

2. Materials

2.1. Study Area

The study area stretches along the Duwen Highway (see Figure 1), located in Sichuan Province, China. Its geographic coverage is 103°36′ E–103°64′ E longitude and 30°94′ N–31°52′ N latitude, with an area of 922 km². The Minjiang River, an important branch of the upper reaches of the Yangtze River, is the main river in the study area. Many hydropower structures have been built along this river to provide energy for nearby areas. The Duwen Highway is built along the basin. In addition, many roads are distributed throughout the study area. On 12 May 2008, an earthquake with a magnitude of Ms 8.0 occurred in the study area, leading to a large number of secondary disasters, including a large number of landslides [46].

Figure 1. Landslide inventory map and location of the study area: (a) location of Sichuan Province in China; (b) location of the study area; (c) study area and landslide inventory map.

The altitude in the study area varies significantly. The lowest altitude is ~734 m, and the highest altitude is ~5280 m, providing favorable conditions for landslide formation [47]. The study area has a continental monsoon climate. The annual rainfall is 800–1300 mm [45]. There is a wide range of stratigraphic outcrops in the study area, primarily Triassic in age. The area has good vegetation coverage and is primarily covered with forests. Hard rocks are mainly distributed in the north and middle of the study area, while soft rocks are primarily distributed in the southern regions. In addition, the exposed bedrock is primarily composed of granite, diorite, limestone, phyllite, sandstone, and granite [48].

2.2. Landslide Inventory

An accurate landslide inventory map is the basis for effective landslide susceptibility assessment [35]. Landslide data in this study originates from a 0.5 m resolution multi-band remote sensing image obtained by the Pleiades satellite in 2014. Based on remote sensing image interpretation and field investigation verification, 493 historical landslides were identified in the study area. According to the Varnes classification system [49], the landslides in the study area mainly belong to rock fall, and a small part of them belong to debris fall and debris flow. The total landslide area is 15.6 km², accounting for 1.69% of the study area. The average area, maximum area, and minimum area of landslide are 0.032 km², 0.991 km², and 0.00041 km², respectively. Roads in the study area are the main infrastructure that suffers from landslide damage, causing enormous economic losses. In this study, the geometric center of the landslide surface is taken as the landslide point. According to the data and prior knowledge, a 30 m × 30 m grid was selected as the basic evaluation unit.
Consequently, 1,024,455 grid cells were created for the study area, and the 493 landslide points fell in 493 distinct evaluation units, giving 493 landslide units in total. By random sampling, 70% of the landslides (345) were used as training data for modeling, while the other 30% (148) were used for testing. A landslide inventory map was established using these data (see Figure 1a).

2.3. Landslide Influencing Factors

The selection of influencing factors is a key step in landslide susceptibility modeling [30]. The formation mechanism of a landslide is complicated, and its occurrence is the result of numerous factors [36,50]. The factors affecting the emergence of a landslide vary between study areas. Therefore, at present, there is no definite rule for the selection of landslide influencing factors [33,51]. According to previous studies [5,44,45,47,52] and data availability, the landslide influencing factors in the study area were divided into four categories, and 20 factors were selected as the initial factors. These include topographic factors (altitude, slope, aspect, plan curvature, profile curvature, degree of relief, and topographic wetness index (TWI)), geological factors (lithology, seismic intensity, distance from fault zones, and stratigraphy), ecological factors (distance from main rivers, distance from streams, annual rainfall, normalized difference vegetation index (NDVI), land cover, and soil erosion intensity), and factors related to human engineering activities (distance from roads, residential kernel density, and distance from hydropower stations). Land cover data originates from GlobeLand30 (http://www.globallandcover.com/, accessed on 21 April 2021), and the NDVI data originates from Geospatial Data Cloud (http://www.gscloud.cn/, accessed on 7 August 2021).
Topographic factors, including altitude, plan curvature, profile curvature, slope, aspect, degree of relief, and TWI, were derived from a digital elevation model (DEM) with a 30 m resolution. All other factor data, including the DEM, were provided by the Sichuan Province Bureau of Surveying, Mapping, and Geoinformation, China.

In this study, ArcGIS (version 10.7.1, ESRI, Redlands, CA, USA) software was used to overlay all factor layers with the landslide inventory map and then calculate the distance from roads, rivers, faults, and hydropower stations to each grid. Subsequently, all continuous factors were reclassified according to previous studies and prior knowledge. The equal interval method was used to classify distance factors (such as distance from rivers and roads); it was also applied to annual rainfall because of the availability of data. Specific factors, including plan curvature, profile curvature, and aspect, were classified based on the experience provided by previous studies [9,30,53]. Other factors were classified using the Jenks natural breaks method. Table 1 shows the specific classification of each factor, and Figure 2 shows the reclassified factor layers.

Table 1. Classification of landslide influencing factors.

Category | Factor | Data Type | Class Attribution | Reclassification Method
--- | --- | --- | --- | ---
Topographic | Altitude (m) | Continuous | 1. 734–1000; 2. 1000–1400; 3. 1400–1800; 4. 1800–2200; 5. 2200–2600; 6. >2600 | Equal interval
Topographic | Slope (°) | Continuous | 1. 0–12.58; 2. 12.58–27.06; 3. 27.06–36.79; 4. 36.79–44.57; 5. 44.57–52.98; 6. >52.98 | Jenks natural breaks
Topographic | Aspect | Continuous | 1. Flat; 2. North; 3. Northeast; 4. East; 5. Southeast; 6. South; 7. Southwest; 8. West; 9. Northwest | Expert knowledge
1. <