Table of Contents
- Abstract
- Introduction
- Research method
- Generalized Method of Moments Estimation for Spatial Autoregressive Model
- Geographically Weighted Regression
- Logistic Regression
- Multivariate Linear Regression
- Evaluation methods
- Selection of optimal factors using the improved PSO algorithm
- Study Area Location
- Effective information layers
- Results
- Logistic Regression
- Discussion
- Conclusion
- References
Using GMM-SAR model in combination with improved PSO algorithm for identifying and analyzing spatially effective factors in landslide susceptibility mapping: a case study in Padena, Semirom, Isfahan, Iran
Abstract
In the present study, the Generalized Method of Moments estimation for the Spatial Autoregressive model (GMM-SAR), Geographically Weighted Regression (GWR) with a tri-cube distance weighting kernel, the Logistic Regression (LR) model, and the Multivariate Linear Regression (MLR) model, each in combination with an improved Particle Swarm Optimization (PSO) algorithm, were compared for their accuracy and performance in landslide susceptibility mapping. Eleven information layers (slope, aspect, plan curvature, profile curvature, distance from faults, distance from residential areas, distance from roads, distance from rivers, lithology, land use, and rainfall) were used as the factors affecting landslide occurrence in the study area.
A total of 68 landslides were identified in the study area using historical reports, interpretation of aerial photographs, and field surveys. Using the combination of the above-mentioned algorithms with the improved PSO algorithm, landslide susceptibility maps were prepared for the study area. The accuracy of the prepared models was evaluated using the Receiver Operating Characteristic (ROC) curve. The coefficient of determination ($R^2$), the Root Mean Square Error (RMSE), and the Normalized Root Mean Square Error (NRMSE) were calculated for all methods.
The results showed that the GWR algorithm with the tri-cube distance weighting kernel exhibited the highest $R^2$ (0.9556), followed by the GMM-SAR (0.8395), LR (0.5373), and MLR (0.5052) models. The ROC plots and RMSE values gave a similar ranking of the prediction rates. Therefore, it can be concluded that, in this study, the GWR model with the tri-cube distance weighting kernel exhibits the best performance in landslide susceptibility mapping.
Keywords: Landslide, Generalized method of moments estimation, Spatial autoregressive model, Geographically weighted regression, Logistic regression, Multivariate linear regression
Introduction
Landslides are among the most complex natural phenomena that threaten human lives and property around the world. About 60 percent of Iran’s area is covered by mountains and hills. This mainly mountainous topography is prone to landslides and, together with active tectonics, seismicity, and diverse geological and climatic conditions, provides highly suitable conditions for landslide occurrence in the country.
To reduce landslide risk, various researchers have attempted to zone the landslide hazard using different methods. In landslide susceptibility mapping, the earth surface is divided into separate areas according to their actual or potential degree of risk, ranging from no risk to high risk. The landslide zonation map can be the basis for future actions and development planning at regional and local levels. The process of preparing a zonation map is based on the recognition of natural features and on quantitative modeling of the data of the study area.
Although the use of remote sensing and GIS in forecasting landslides dates back to the 1970s, statistical methods have been used in landslide hazard prediction since the 1990s. The methods used to evaluate and prepare a landslide hazard zonation map can be divided into four categories (Youssef et al. 2016): heuristic, deterministic, statistical, and combinations of these methods.
Heuristic techniques often require specialist knowledge and judgment to assign weights and determine the impact of different factors. These methods usually depend on the study area and vary according to expert opinion from one region to another. Deterministic methods usually depend on physical laws, in which the driving and resisting forces acting on the soil mass are calculated.
The most important problem with deterministic methods is that they require high-resolution spatial data, which means they can only be applied to small areas, because acquiring data at such high spatial resolution is expensive (Yilmaz 2009). Some statistical methods such as logistic regression, neural networks, frequency ratio, and general linear models have been used in landslide hazard zonation (Youssef et al.; Remondo et al. 2005; Lee and Pradhan 2007; Castellanos and Westen 2007; Mathew et al. 2009; Pradhan and Lee 2009; Bai et al. 2010; Schleier et al. 2014; Ahmed 2015; Milaghardan et al. 2016; Jiang et al. 2017).
The preparation of a landslide zoning map depends on several spatial layers, whose selection differs between regions of the country according to the available information and expert opinion. Some papers divide the layers affecting landslides into internal and external factors (Milaghardan et al. 2016). Internal factors such as slope, aspect, soil type, and elevation are constant and do not change over time. External factors, on the other hand, depend on time and are therefore neither constant nor predictable; they include rainfall, earthquakes, and land use.
Pourghasemi et al. (2012) predicted landslides using a conditional probability model and the index of entropy for the Safarood region with good accuracy. The results of the index of entropy indicate that the most important factors in landslide risk zoning are height and land use. The conditional probability model is able to zone the landslide hazard even with a small number of layers. Its drawback is that the effective layers must be independent; its advantage is that it can determine the impact of each layer separately. The index of entropy has a higher accuracy than the conditional probability model (Pourghasemi et al. 2012).
Elkadiri et al. (2014) used an artificial neural network and a logistic regression model to prepare a landslide risk map. The artificial neural network was affected by more factors than the logistic regression (Elkadiri et al., 2014).
Catani et al. (2013) studied the effect of map scale and accuracy on landslide hazard zonation maps produced by the random forest technique (Catani et al. 2013). In addition, Paudel and Oguchi (2014) analyzed a landslide in Niigata, Japan, using the random forest technique (Paudel and Oguchi 2014).
Using two-class kernel logistic regression, alternating decision tree, and support vector machines, Hong et al. (2015) analyzed the landslide risk in the Yihuang area of China. The alternating decision tree provided much better results than the two-class kernel logistic regression and support vector machines. In all three methods, the slope layer had the greatest effect on the landslide hazard zonation (Hong et al. 2015).
Pradhan and Kim (2016) combined spatial multi-criteria evaluation with a deterministic model to assess landslide susceptibility. First, two landslide susceptibility maps were prepared separately, one with a deterministic model called shallow landslide stability and one with a probabilistic model called spatial multi-criteria evaluation. Afterward, these two maps were combined using the frequency ratio method to increase the accuracy (Pradhan and Kim 2016).
Arif Basofi et al. (2016) modeled a landslide in a region of Indonesia using fuzzy logic and linguistic variables quantified by experts. Previously, landslide hazard zoning had been carried out in this area with the Analytic Hierarchy Process (AHP) method. The authors classified the landslide risk using fuzzy logic, analyzed the output with a chi-square test, and concluded that the AHP method has a higher accuracy (Basofi et al. 2016).
Milaghardan et al. (2016) utilized a geographic information system (GIS) and the Dempster-Shafer theory to investigate and analyze the uncertainty in the landslide forecasting model. In that paper, landslide occurrence was predicted using only the internal factors, because external factors require complete information on their spatial and temporal distribution. The advantage of this method is that the uncertainty is taken into account at the same time (Milaghardan et al. 2016).
Haoyuan Hong et al. (2017) improved the accuracy of the landslide susceptibility mapping using a novel region-partitioning approach. They showed that the partitioning of a study area into two regions through their proposed method improved the prediction rate from 0.77 to 0.85 when the support vector machine was used, and from 0.87 to 0.88 when the logistic regression model was utilized. They also used the Geographically Weighted Regression (GWR) method to predict the landslide, because the GWR considers the spatial distribution of the landslide locations in the regression analysis. In their study, the LR and SVM models performed better than the GWR model, because multicollinearity is substantially stronger in the GWR model than in global regression models (Hong et al. 2018).
Mitra et al. (2018) assessed the landslide risk and sensitivity using the multi-criteria decision support system and the Bayesian Network Approach for the Darjeeling hills in Bengal, India. The Bayesian theory is widely used for inference in uncertainty situations in intelligent systems (Mitra et al. 2018).
Nicu and Asăndulesei (2018) used the frequency ratio, statistical index, and analytic hierarchy process to compare the detection power of the mentioned methods in predicting the landslide sensitivity. Using the accuracy assessing methods such as the Area Under the Receiver Operating Characteristic (ROC) Curve and the seed cell area index (SCAI), they concluded that the statistical index method had a better performance in predicting the landslide susceptibility (Nicu and Asăndulesei 2018).
A-Xing Zhu et al. (2018) assessed the landslide risk for two regions in China. They used three different methods and studied the accuracy of these methods. One of the methods was based on the expert knowledge and the other two methods included a logistic regression model and an artificial neural network.
The prediction accuracy of the two data-driven methods was better than that of the expert-knowledge-based method, because in the data-driven methods the dataset used to train the algorithm was also used to validate its results. In the expert-knowledge-based method, data are not used for training and are used only for validation. However, a comparison of the landslides predicted by the algorithms shows that the expert-based methodology has a much higher accuracy in zoning landslides in the high-risk areas. The results of the three methods in the two regions showed that the logistic regression and artificial neural network methods were less stable than the expert-knowledge-based method (Zhu et al. 2018).
Wei Chen et al. (2018) evaluated and compared four advanced machine learning techniques: Bayes’ net, Radial Basis Function (RBF) Classifier, Logistic Model Tree (LMT), and Random Forest. They divided the landslide data into two parts: 30% for training the algorithms and 70% for evaluating the results. The random forest method had the highest sensitivity, specificity, and accuracy. Its ROC curve also had the greatest Area Under the Curve (AUC), which further indicated that it was more accurate than the other methods (Chen et al. 2018).
Mukhiddin Juliev et al. (2019) used and compared the Statistical Index (SI), Frequency Ratio (FR), and Certainty Factor (CF) methods to produce a landslide susceptibility map for the north-eastern part of Uzbekistan. Seventy percent of the landslide areas were used for training the methods, whereas 30 percent were used for evaluating the results. The area under the ROC curve was used to assess accuracy. The training accuracies were 82.1%, 74.3%, and 74%, while the prediction accuracies were 80%, 70%, and 71% for the SI, FR, and CF methods, respectively. They therefore concluded that SI had the best performance in their study (Juliev et al., 2019).
Jean Baptiste Nsengiyumva et al. (2019) employed and compared different statistical and probabilistic methods. They used weights of evidence, logistic regression, frequency ratio, and statistical index to produce susceptibility maps for Rwanda in east-central Africa. The generated susceptibility maps were validated using receiver operating characteristic curves. The prediction rates were 92.7%, 86.9%, 81.2%, and 79.5% for the weights of evidence, logistic regression, frequency ratio, and statistical index methods, respectively. The weights of evidence method attained the highest AUC value, while the statistical index method had the lowest (Nsengiyumva et al., 2019).
Qingfeng He et al. (2019) compared three machine learning algorithms, namely Naïve Bayes, radial basis function Classifier, and RBF Network, for landslide susceptibility mapping in Longhai, China. The three models were validated and compared using the area under the receiver operating characteristic curve and the Friedman and Wilcoxon signed-rank tests as statistical metrics. The study showed that the RBF Classifier model is a capable method for the spatial prediction of landslides worldwide (He et al., 2019).
The main aim of this paper is to implement and compare four data-driven models. A review of the literature on landslide susceptibility evaluation shows that no paper has yet evaluated the performance of the GMM-SAR model in landslide susceptibility mapping. Therefore, the main difference between the present study and the methods described in the aforementioned literature is that the GMM-SAR model, together with three other models, was applied to landslide susceptibility mapping in the study area. Furthermore, a validation analysis was carried out to compare the accuracy and prediction ability of the models.
Research method
Considering the existence of multiple layers in determining a landslide susceptibility map, most researchers use all the available data for landslide modeling (Lee and Pradhan 2007), but some researchers first identify more effective layers between the available effective layers and then use those layers to model the landslide susceptibility (Kavzoglu et al. 2015). In this research, we will use an intelligent method to determine the effect of each factor in order to increase the modeling accuracy and also to identify the most effective layers.
The improved PSO algorithm is proposed to select the optimal spatial factors among the effective factors, that is, to select the best subset of effective factors. To identify the effect of the spatial layers on landslides, the fitness function in the improved PSO algorithm is set to $1 - R^2$, where $R^2$ is the coefficient of determination of the regression algorithm. Because four regression algorithms are considered in this research, this process is performed for each of them.
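The fitness function described above can be sketched as follows. This is a minimal illustration, assuming an ordinary least squares fit with an intercept stands in for the regression algorithm; the function names `r_squared` and `fitness`, the synthetic data, and the choice to give an empty factor subset the worst score are assumptions made for the example and are not taken from the study.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fitness(X, y, mask):
    """Return 1 - R^2 of a least squares fit using only the selected columns."""
    cols = np.flatnonzero(np.asarray(mask, dtype=bool))
    if cols.size == 0:
        return 1.0  # assumption: an empty subset receives the worst score
    A = np.column_stack([np.ones(len(y)), X[:, cols]])  # intercept + factors
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1.0 - r_squared(y, A @ beta)

# Synthetic data (illustrative only): y depends on factors 0 and 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + 0.1 * rng.normal(size=60)

good = fitness(X, y, [True, False, True, False])   # informative subset
bad = fitness(X, y, [False, True, False, True])    # uninformative subset
print(good, bad)
```

A lower fitness value means that the selected subset explains more of the variance, so the optimizer minimizes this quantity.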
Generalized Method of Moments Estimation for Spatial Autoregressive Model
Regression methods are used to model and quantify the relationship between a dependent variable and one or more independent variables. Ordinary regression methods assume a constant relationship between the variables and solve the problem assuming that the data are completely independent and that the environment is homogeneous. Spatial data, however, have two distinctive features:
- Spatial autocorrelation, which is based on Tobler’s law (Tobler 1970) and states that the dependence between observations decreases as the distance between them increases.
- Spatial heterogeneity that expresses the change in spatial autocorrelation in space and the heterogeneity of the environment. The spatial correlation between the spatial data and the spatial effect of each data on its adjacent data causes the relationship between independent and dependent variables to be positive in a part of the study area and negative in the other part.
Therefore, regular regression methods are not able to distinguish exact relationships between independent and dependent variables. In 1979, these concepts led Paelinck and Klaassen to the notion of spatial econometrics. Spatial econometrics is a branch of econometrics that deals with spatial autocorrelation and spatial heterogeneity in regression methods in flat and 3D data (Paelinck 1978). In the 1980s, Anselin developed the concept of spatial econometrics by introducing the spatial models. Anselin provided several spatial regression models. In this research, a spatial autoregressive model is used (Anselin 1980). In general, in a standard regression model, the spatial dependence can be considered in two ways:
- Through an additional regressor in the form of a spatially lagged dependent variable, which enters the equation as $Wy$.
- Through the error term, in which case the spatial variable does not enter the equation directly and is instead treated as part of the residual, or model error.
In the first case, the model is called the spatial lag model (Anselin 1980). This model is more efficient when the focus is on the importance of spatial data and their effects on the interpolation.
Typically, a spatial lag model is expressed in a form similar to that of the spatial autoregressive equation:
$$y = \rho W y + X\beta + \varepsilon \qquad (1)$$
In the above equation, $\rho$ is the spatial autoregressive coefficient, $\varepsilon$ is the vector of errors, $W$ is a spatial weight matrix based on the location of the observations, $y$ is the dependent variable, $X$ is the matrix of independent variables, and $\beta$ is the vector of regression coefficients. $\rho$ should be chosen so that $(I - \rho W)$ is invertible. $W$ specifies that, for any position in the system, other positions near it affect the value of the dependent variable at that position. Spatial weights depend on the type of neighborhood and can vary in magnitude with it. For the $i$th observation of $y$, the spatial lag value is calculated with the following equation:
$$[Wy]_i = \sum_{j=1}^{n} w_{ij}\, y_j$$
Unlike time-series data, in which a variable has a time lag at different time intervals, the spatial lag is obtained from the relationship of a location with its neighboring regions.
Equation (1) can also be written as follows:
$$y = (I - \rho W)^{-1}(X\beta + \varepsilon)$$
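To make the relationship between the spatial lag and the two forms of Equation (1) concrete, the following sketch generates data from the reduced form and checks that it satisfies the structural form. The ring-shaped neighborhood, the value of the autoregressive coefficient, and the regression coefficients are hypothetical values chosen only for illustration; the sketch does not perform the GMM estimation itself.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

# Hypothetical neighbourhood: each point has its two ring neighbours.
A = np.zeros((n, n))
for i in range(n):
    A[i, (i - 1) % n] = 1.0
    A[i, (i + 1) % n] = 1.0
W = A / A.sum(axis=1, keepdims=True)      # row-standardised weights

rho = 0.4                                  # illustrative autoregressive coefficient
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])                # illustrative regression coefficients
eps = 0.1 * rng.normal(size=n)

# Reduced form: y = (I - rho W)^(-1) (X beta + eps)
y = np.linalg.solve(np.eye(n) - rho * W, X @ beta + eps)

# Spatial lag: weighted average of the neighbouring y values.
spatial_lag = W @ y

# The same y must satisfy the structural form y = rho W y + X beta + eps.
residual = y - (rho * spatial_lag + X @ beta + eps)
print(np.max(np.abs(residual)))
```

With row-standardised weights, each spatial lag value is simply the mean of the dependent variable over the neighbors of that point.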
In this study, the weight matrix is obtained from the coordinates of the points using the following relations. First, we obtain the variance matrix of the y values. Then, we create a diagonal matrix whose entries on the main diagonal are the row sums of the variance matrix of the y values:
$$D_{ii} = \sum_{j} C_{ij}, \qquad D_{ij} = 0 \ \text{for } i \neq j$$
In the above matrix, C is the variance matrix. The weight matrix used in this study is obtained by the following equation:
In the above relation, A is an adjacency matrix calculated using the Delaunay triangulation and the Voronoi tessellation. In the adjacency matrix, the entry for two points is 1 if the points are neighbors and 0 otherwise. The weight matrix can also be calculated with the following equation, which yields a symmetric matrix:
The two weight matrices have the same eigenvalues.
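The statement that two differently normalized weight matrices share the same eigenvalues can be illustrated with a common construction. The sketch below assumes that the diagonal matrix is built from the row sums of the adjacency matrix itself and that the two weight matrices are the row-standardised form $D^{-1}A$ and the symmetric form $D^{-1/2} A D^{-1/2}$; these forms and the small hand-written adjacency matrix are assumptions for illustration and may differ from the construction used in the study.

```python
import numpy as np

# Hypothetical adjacency matrix for four points (1 = neighbours, 0 = not).
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
], dtype=float)

d = A.sum(axis=1)                       # row sums (diagonal of D)

W_row = A / d[:, None]                  # D^(-1) A, each row sums to one
W_sym = A / np.sqrt(np.outer(d, d))     # D^(-1/2) A D^(-1/2), symmetric

# The two matrices are similar (W_sym = D^(1/2) W_row D^(-1/2)),
# so their eigenvalues coincide.
eig_row = np.sort(np.linalg.eigvals(W_row).real)
eig_sym = np.sort(np.linalg.eigvalsh(W_sym))
print(eig_row)
print(eig_sym)
```

The symmetric form is often convenient numerically because its eigenvalues can be computed with a symmetric solver.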
Geographically Weighted Regression
The main advantage of the geographically weighted regression method over ordinary regression is its ability to investigate the spatial effects of the variables, so that the relationship between the variables can vary by location (Brunsdon et al. 1998). In GWR, the regression relation estimated in each region differs from that of the other regions. GWR is a local regression method that significantly improves on ordinary regression for spatial data.
The spatial correlation between spatial data and the spatial effect of each observation on its adjacent observations can make the relationship between independent and dependent variables positive in one part of the study area and negative in another. Therefore, non-spatial regression models are unable to determine the exact relationship between independent and dependent variables (Zhu et al. 2018). To address the correlation between spatial data, GWR assigns a weight to each observation based on its distance from the estimation location, which increases precision. The general geographically weighted regression relationship is as follows:
$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{P} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i$$
In the above relation, $y_i$ is the dependent variable, $x_{ik}$ is the $k$th independent variable, $P$ is the number of independent variables, $\varepsilon_i$ is the residual of the model, and $\beta_k(u_i, v_i)$ is the regression coefficient, which is a function of the position $(u_i, v_i)$ of the observation points. The regression coefficients are obtained with the following equation:
$$\hat{\beta}(u_i, v_i) = \left(X^{T} W(u_i, v_i)\, X\right)^{-1} X^{T} W(u_i, v_i)\, y$$
In the above relation, $W(u_i, v_i)$ is the weight matrix of the observations, produced from the position of the points.
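The local estimation step can be sketched as a weighted least squares solve at a single location. The function name `gwr_local_coefficients` and the synthetic data are assumptions for illustration; the weights are passed in as a vector (the diagonal of the weight matrix), and how they are derived from distances is left to the kernel.

```python
import numpy as np

def gwr_local_coefficients(X, y, w):
    """Weighted least squares at one location.

    X : (n, p) design matrix, including a column of ones for the intercept
    y : (n,) dependent variable
    w : (n,) weights of the observations for this location (diagonal of W)
    """
    Xw = X * w[:, None]                      # W X, with W = diag(w)
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)   # (X^T W X)^(-1) X^T W y

# Synthetic, noise-free example: y = 1 + 2 x, so any positive weights
# must recover the same coefficients.
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = X @ np.array([1.0, 2.0])
w = rng.uniform(0.1, 1.0, size=30)

beta_local = gwr_local_coefficients(X, y, w)
beta_global = gwr_local_coefficients(X, y, np.ones(30))  # equal weights = OLS
print(beta_local, beta_global)
```

Repeating this solve for every location, each with its own weight vector, yields the spatially varying coefficient surfaces.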
Assigning weights is very important in the GWR method. There are several ways to do this; for example, the tri-cube kernel has shown good performance. In this study, the tri-cube kernel was used:
$$w_{ij} = \begin{cases} \left(1 - (d_{ij}/h)^3\right)^3 & d_{ij} < h \\ 0 & \text{otherwise} \end{cases}$$
where $w_{ij}$ is the weight of the $j$th observation at the $i$th point, $d_{ij}$ is the Euclidean distance between points $i$ and $j$, and $h$ is the bandwidth parameter. Choosing the right bandwidth is very important. If the bandwidth is small, the variance of the results increases, and if the bandwidth is large, the results of the GWR method approach those of ordinary regression. In this paper, the cross-validation method is used to determine the optimal bandwidth:
$$CV(h) = \sum_{i=1}^{n} \left(y_i - \hat{y}_{\neq i}(h)\right)^2 \qquad (11)$$
$\hat{y}_{\neq i}(h)$ is the estimated value of $y_i$ obtained with bandwidth $h$ when observation $i$ is omitted from the calibration. The bandwidth that minimizes Eq. (11) is considered the optimal bandwidth.
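The kernel and the bandwidth search can be sketched together. The code below is a minimal illustration under stated assumptions: the synthetic coordinates, the spatially varying slope, and the list of candidate bandwidths are invented for the example, and a simple search over a few candidates replaces whatever optimizer was actually used.

```python
import numpy as np

def tricube(d, h):
    """Tri-cube weight: (1 - (d/h)^3)^3 inside the bandwidth, 0 outside."""
    u = np.clip(np.asarray(d, dtype=float) / h, 0.0, 1.0)
    return (1.0 - u ** 3) ** 3

def cv_score(coords, X, y, h):
    """Leave-one-out cross-validation score of a GWR fit with bandwidth h."""
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    total = 0.0
    for i in range(len(y)):
        w = tricube(dist[i], h)
        w[i] = 0.0                                 # leave observation i out
        Xw = X * w[:, None]
        # lstsq tolerates a near-singular system when few neighbours remain
        beta = np.linalg.lstsq(Xw.T @ X, Xw.T @ y, rcond=None)[0]
        total += (y[i] - X[i] @ beta) ** 2
    return total

# Synthetic data: the slope of x changes from west to east.
rng = np.random.default_rng(3)
coords = rng.uniform(0.0, 1.0, size=(80, 2))
x = rng.normal(size=80)
X = np.column_stack([np.ones(80), x])
y = (1.0 + 2.0 * coords[:, 0]) * x + 0.05 * rng.normal(size=80)

candidates = [0.2, 0.4, 0.8, 1.6]                  # hypothetical bandwidths
scores = {h: cv_score(coords, X, y, h) for h in candidates}
best_h = min(scores, key=scores.get)
print(scores, best_h)
```

Setting the weight of the left-out observation to zero is what prevents a very small bandwidth from trivially reproducing each observation.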
Logistic Regression
Logistic regression was developed by David Cox in 1958 (Cox 1958). In logistic regression, the dependent variable is discrete and categorical. Logistic regression measures the relationship between a dependent variable and one or more independent variables by estimating probabilities using a logistic function, which is the cumulative logistic distribution. In logistic regression, the conditional distribution $y|x$ is a Bernoulli distribution, because the dependent variable is binary, and the predicted probabilities are restricted by the logistic distribution function to the interval [0, 1]. The following equations show how to use this model:
$$z = b_0 + \sum_{i=1}^{n} b_i x_i$$
$$p = \frac{1}{1 + e^{-z}}$$
$z$ is the dependent variable, $b_0$ is the model intercept, $n$ is the number of landslide effective factors, $b_i$ is the weight of each factor, $x_i$ is the landslide effective (independent) factor, and $p$ is the probability of landslide occurrence.
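As a small worked example of the logistic model, the sketch below evaluates the probability of landslide occurrence for given factor values and weights. The function name `landslide_probability` and all numerical values are hypothetical; fitting the weights from data is not shown.

```python
import math

def landslide_probability(factors, weights, intercept):
    """Logistic model: p = 1 / (1 + exp(-z)), z = b0 + sum(b_i * x_i)."""
    z = intercept + sum(b * x for b, x in zip(weights, factors))
    # Numerically stable evaluation for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical normalised factor values and weights for one map cell.
factors = [0.8, 0.3, 0.5]
weights = [1.5, -0.7, 0.4]
p = landslide_probability(factors, weights, intercept=-0.5)
print(p)
```

Whatever the value of the linear predictor, the output stays strictly between 0 and 1, which is what allows it to be read as a probability.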
Multivariate Linear Regression
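In multivariate linear regression, the least squares method can be used to estimate the regression coefficients. It is assumed that the number of observations is greater than the number of variables. The set of observations is as follows (Lee and Pradhan 2007):
$$(x_{i1}, x_{i2}, \dots, x_{ik}, y_i), \qquad i = 1, 2, \dots, n, \quad n > k$$
The multivariate linear regression model is as follows:
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_k x_{ik} + \varepsilon_i$$
It can also be written in the following way:
$$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i$$
The least squares function is obtained according to the following equation:
$$L = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \Big)^2$$
The goal is to minimize the above function, which is achieved by selecting appropriate values for the βs.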
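Minimizing the least squares function leads to the normal equations, which can be solved directly. The sketch below is a minimal illustration on noise-free synthetic data; the function name `ols` and the data are assumptions for the example.

```python
import numpy as np

def ols(X, y):
    """Least squares coefficients from the normal equations.

    A column of ones is prepended so that the first coefficient
    is the intercept.
    """
    A = np.column_stack([np.ones(len(y)), X])
    return np.linalg.solve(A.T @ A, A.T @ y)

# Noise-free synthetic observations: y = 3 + 2*x1 - x2.
rng = np.random.default_rng(4)
X = rng.normal(size=(10, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1]

beta_hat = ols(X, y)
print(beta_hat)
```

The requirement that there be more observations than variables is what keeps the matrix in the normal equations invertible in practice.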
Evaluation methods
It is necessary to use quantitative methods to evaluate the landslide modeling. The accuracy of regression results is usually assessed with the coefficient of determination, which is calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
In the above relation, $n$ is the number of observations, $y_i$ is the $i$th observation, $\hat{y}_i$ is the estimated value for the $i$th observation, and $\bar{y}$ is the mean of the observations. Other measures that can indicate the accuracy of the regression method are
the Root Mean Square Error (RMSE) and the Normalized Root Mean Square Error (NRMSE).
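These three measures can be computed in a few lines. In the sketch below, the normalization of the NRMSE by the range of the observations is an assumption (normalizing by the mean is another common convention), and the observed and predicted values are invented for illustration.

```python
import math

def r2(obs, pred):
    """Coefficient of determination."""
    mean = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def nrmse(obs, pred):
    """RMSE divided by the range of the observations (assumed convention)."""
    return rmse(obs, pred) / (max(obs) - min(obs))

obs = [0.0, 1.0, 1.0, 0.0]      # illustrative landslide / no-landslide values
pred = [0.1, 0.9, 0.8, 0.2]     # illustrative model estimates

print(r2(obs, pred), rmse(obs, pred), nrmse(obs, pred))
```

For these values the squared errors sum to 0.10 against a total sum of squares of 1.0, which gives a coefficient of determination of 0.9.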
Another statistic that is recommended for assessing the accuracy of interpolation is Moran’s I, which is as follows:
$$I = \frac{n}{S_0} \cdot \frac{e^{T} W e}{e^{T} e}$$
Here, $e$ is the vector of residuals and $S_0$ is the sum of all weights. This statistic measures the spatial autocorrelation that remains in the residuals of the model. The closer its value is to zero, the higher the accuracy of the modeling by the regression algorithm and the more uniformly the residuals are distributed.
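A minimal sketch of Moran’s I for regression residuals is given below, assuming the common form in which the statistic is the number of observations divided by the sum of the weights, multiplied by the ratio of the quadratic form of the residuals with the weight matrix to their sum of squares. The four-point chain neighborhood and the residual vectors are hypothetical.

```python
import numpy as np

def morans_i(e, W):
    """Moran's I of a residual vector e for a spatial weight matrix W."""
    e = np.asarray(e, dtype=float)
    W = np.asarray(W, dtype=float)
    n = e.size
    s0 = W.sum()                       # sum of all weights
    return (n / s0) * (e @ W @ e) / (e @ e)

# Hypothetical chain of four points: 0-1, 1-2, 2-3 are neighbours.
W = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

alternating = morans_i([1, -1, 1, -1], W)   # neighbours disagree
clustered = morans_i([1, 1, -1, -1], W)     # neighbours mostly agree
print(alternating, clustered)
```

Residuals that alternate in sign between neighbors give a negative value, clustered residuals give a positive value, and residuals without spatial structure give a value near zero.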
Selection of optimal factors using the improved PSO algorithm
As stated above, many factors affect landslide hazard zonation, but some are more effective than others, and the right combination of factors results in better accuracy. Finding the most important factors is an iterative process in which various combinations of effective layers must be investigated and their accuracies compared. Trying out all the different combinations is very time consuming and sometimes impossible; therefore, optimization algorithms should be used for this purpose. In this research, an improved PSO optimization algorithm is used.
Like a typical PSO, the improved PSO algorithm used in this paper initially assigns a random position to each particle. For the kth particle in a d-dimensional space, where d is the number of effective factors in landslide susceptibility mapping, the position vector is as follows:
In the above equation, each component is constrained by the lower and upper bounds of the corresponding factor, and n denotes the number of particles in iteration t. The velocity vector of each particle is structured as follows:
In the above equation, each velocity component is likewise constrained by its lower and upper bounds. In the improved PSO algorithm, each component of the position vector is mapped to the interval [0, 1] using a generalized normalization equation.
Also, each part of the velocity vector is initialized using the normal distribution in the interval
Therefore, the velocity of each particle for the next iteration is changed as follows:
$w(t)$ is the inertia weight, which directly affects the convergence of the improved PSO algorithm by creating a better balance between local search and global search. $c_1$ is the learning coefficient associated with the personal experience of each particle, and $c_2$ is the learning coefficient associated with the experience of all particles. The values of $r$ are selected randomly from the interval [0, 1]. $w(t)$ is calculated as follows:
where $w_{min}$ and $w_{max}$ are the preset minimum and maximum values of $w$. A threshold, $S_{th}$, was defined to select the spatial factors encoded in the position vector of a particle. The selected factors of each particle were then used to calculate the fitness function of that particle in iteration t. Afterwards, the best positions are calculated as follows:
These use the normalized position vector of the best particle in iteration t, the normalized position vector of the best particle in iteration t-1, and the normalized position vector of the best particle over iterations 1 to t. Then, the new position vector of each particle is obtained using the following equation:
The above procedure is repeated until the termination condition is met.
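The overall loop can be sketched with a standard global-best PSO adapted to factor selection. This is not the improved variant used in the study: the linearly decreasing inertia weight, the velocity clamp, the rule that a factor is selected when its position component is at least the threshold, and all default parameter values other than the learning coefficients and the threshold are assumptions made for illustration. The toy fitness function stands in for the regression-based fitness.

```python
import random

def pso_select(fitness, n_factors, n_particles=20, n_iter=50,
               c1=2.0, c2=2.0, w_max=0.9, w_min=0.4,
               s_th=0.6, v_max=0.2, seed=0):
    """Standard global-best PSO over positions in [0, 1]^n_factors.

    A factor counts as selected when its position component >= s_th
    (assumed rule). `fitness` takes a list of booleans and is minimised.
    """
    rng = random.Random(seed)

    def to_mask(x):
        return [xi >= s_th for xi in x]

    pos = [[rng.random() for _ in range(n_factors)] for _ in range(n_particles)]
    vel = [[rng.uniform(-v_max, v_max) for _ in range(n_factors)]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(to_mask(p)) for p in pos]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for t in range(n_iter):
        # Linearly decreasing inertia weight (assumed schedule).
        w = w_max - (w_max - w_min) * t / max(1, n_iter - 1)
        for k in range(n_particles):
            for j in range(n_factors):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[k][j]
                     + c1 * r1 * (pbest[k][j] - pos[k][j])
                     + c2 * r2 * (gbest[j] - pos[k][j]))
                vel[k][j] = max(-v_max, min(v_max, v))
                pos[k][j] = max(0.0, min(1.0, pos[k][j] + vel[k][j]))
            f = fitness(to_mask(pos[k]))
            if f < pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], f
                if f < gbest_fit:
                    gbest, gbest_fit = pos[k][:], f
    return to_mask(gbest), gbest_fit

# Toy fitness: fraction of factors that differ from a hypothetical target subset.
target = [True, False, True, False, False, True]

def toy_fitness(mask):
    return sum(m != t for m, t in zip(mask, target)) / len(target)

best_mask, best_fit = pso_select(toy_fitness, n_factors=len(target))
print(best_mask, best_fit)
```

In the setting of this paper, the toy fitness would be replaced by one minus the coefficient of determination of the chosen regression model fitted on the selected layers.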
Study Area Location
The study area is located between 30° 44′ 59″ and 31° 7′ 30″ North latitude and 51° 29′ 52″ and 51° 45′ 10″ East longitude, about 60 km from Semirom county in Iran. The area of this region, which includes the Marbor river basin, is about 990 square kilometers. The region is part of the Zagros mountain chain and has an elevation range of about 2,300 meters.
Effective information layers
Information layers of rivers and streams, residential areas, plant cover, land use, and roads were prepared using 1:25,000 maps of the National Cartographic Center of Iran (NCC) for the region of interest. The slope, aspect, plan curvature, and profile curvature layers were derived from the digital elevation model with a 30-meter spatial resolution. Lithology and fault information for the region was prepared from the 1:100,000 geological map of the Geological Survey of Iran (GSI). The map of landslides in the region was also obtained from the GSI. Then, using the Euclidean distance, the maps of distance from faults, distance from rivers and streams, and distance from main roads were prepared.
To determine the rainfall in the region, the average rainfall data of the Iranian Meteorological Organization for the last decade from 19 stations around the study area were used. It is not feasible to determine the amount of precipitation at every location in the region by observation and data collection. Hence, the average rainfall data of the last decade and the position of each station were collected. Then, the precipitation at each point in the study area was interpolated using the general kriging interpolation with an exponential semivariogram model and a 30-meter resolution.
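The rainfall interpolation step can be illustrated with a small kriging sketch. It assumes ordinary kriging (constant unknown mean) with an exponential semivariogram of the form $\gamma(h) = c_0 + c\,(1 - e^{-h/a})$; the station coordinates, rainfall values, sill, and range below are invented for illustration and are not the values used in the study.

```python
import numpy as np

def exp_semivariogram(h, sill=1.0, corr_range=1.0, nugget=0.0):
    """Exponential semivariogram; gamma(0) is defined as 0."""
    h = np.asarray(h, dtype=float)
    g = nugget + sill * (1.0 - np.exp(-h / corr_range))
    return np.where(h > 0, g, 0.0)

def ordinary_kriging(coords, values, target, **vg):
    """Ordinary kriging prediction at one target point."""
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    # Kriging system: semivariances between stations plus the
    # unbiasedness constraint (weights sum to one).
    K = np.ones((n + 1, n + 1))
    K[:n, :n] = exp_semivariogram(d, **vg)
    K[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = exp_semivariogram(np.linalg.norm(coords - target, axis=1), **vg)
    sol = np.linalg.solve(K, rhs)
    return float(sol[:n] @ values)      # weighted sum of station values

# Hypothetical stations (km) and mean annual rainfall (mm).
coords = np.array([[0.0, 0.0], [10.0, 2.0], [4.0, 9.0], [12.0, 12.0], [6.0, 5.0]])
rain = np.array([420.0, 510.0, 465.0, 580.0, 490.0])

estimate = ordinary_kriging(coords, rain, np.array([7.0, 7.0]),
                            sill=3000.0, corr_range=20.0)
at_station = ordinary_kriging(coords, rain, coords[2],
                              sill=3000.0, corr_range=20.0)
print(estimate, at_station)
```

Because the semivariogram is zero at zero distance, the prediction at a station location reproduces the station value; evaluating the function on a 30-meter grid would produce the raster layer.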
It should be noted that the information layers of distance to faults and rainfall extend beyond the selected area, because faults around the study area also have a significant effect on landslides, and rainfall cannot be calculated without the data of the surrounding stations. The lithological layers were categorized into three groups of soil, soft rock, and hard rock based on the research of Kamranzad et al. (2016).
Results
To implement this research, in the first step, the correlation between the information layers was studied. The following relationships were used for calculating the correlation:
In the above relationships, Cov(X,Y) is the covariance between the two sets of data x and y, $\bar{x}$ and $\bar{y}$ are the means of these two sets, n is the number of data in each set, and r is the correlation coefficient between the two sets of data with standard deviations $\sigma_X$ and $\sigma_Y$. Typically, a correlation coefficient between -0.6 and 0.6 indicates a low correlation between the factors, but it is not possible to set a definite threshold that separates correlated from uncorrelated factors and to eliminate the highly correlated factors on that basis.
This study used the layers whose correlation coefficients were between -0.6 and 0.6. Evolutionary algorithms can usually account for the relationships between the effective layers, so the analysis could proceed without this step; in this research, however, the layers with minimum correlation were selected to increase the accuracy.
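The correlation screening can be sketched as follows. The Pearson coefficient is computed from the covariance and the standard deviations; the greedy rule that keeps a layer only if its absolute correlation with every layer already kept is below the threshold is an assumption made for illustration, as are the layer names and values.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance divided by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def drop_correlated(layers, threshold=0.6):
    """Keep a layer only if |r| < threshold with every layer kept so far."""
    kept = []
    for name in layers:
        if all(abs(pearson_r(layers[name], layers[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical sampled values of three layers at five locations.
layers = {
    "slope": [1.0, 2.0, 3.0, 4.0, 5.0],
    "slope_rescaled": [2.0, 4.0, 6.0, 8.0, 10.5],
    "aspect": [3.0, 1.0, 4.0, 1.0, 5.0],
}

kept = drop_correlated(layers)
print(kept)
```

The result of a greedy rule depends on the order in which the layers are examined, which is one reason a fixed threshold cannot be treated as definitive.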
To measure the dependence of landslides on the factors in Table (1), each of the GMM-SAR, GWR with tri-cube kernel, LR, and MLR models was combined in turn with the improved PSO algorithm to find the optimal combination of effective factors for that regression.
As explained in the description of the improved PSO algorithm, in this research the $c_1$ and $c_2$ values were set to 2 based on previous studies (Djerou et al. 2007). Also, and Sth = 0.6 were selected. The weight values were obtained by the method proposed by Iwasaki et al. (2006). The number of variables was 11 and the population size was 100. The maximum number of iterations was 200.
The following table shows a comparison between the accuracy of different methods.
Logistic Regression
Because Multivariate Linear Regression and Logistic Regression do not have a spatial weight matrix, Moran’s I cannot be calculated for these methods.
Based on the estimated values of the observations and residuals as well as the regression coefficients, the landslide susceptibility zonation maps were obtained using the kriging interpolation method with an exponential semivariogram model and a resolution of 30 meters. The predicted landslide values were divided into five categories using the natural breaks method: very low risk, low risk, moderate risk, high risk, and very high risk areas.
Discussion
The application of the GMM-SAR model, and its comparison with other models, is relatively new in landslide susceptibility mapping studies. The results of the current study showed that the GWR model had a slightly higher prediction performance than the GMM-SAR model, and that these two spatial models had significantly higher prediction performance than the two non-spatial models. In contrast to GMM-SAR, GWR has already shown acceptable performance in different fields of study, including landslides (Faraji Sabokbar et al. 2014), allocation (Huang et al. 2010), and other predictions (Wang et al. 2013).
As shown in Figs. 7 and 8, the estimated and observed landslide values are very close to each other for the GWR and GMM-SAR algorithms, indicating the accuracy of these results; this is not the case for the two non-spatial algorithms.
The RMSE and NRMSE values indicate the accuracy of these methods on the data of this study; to examine the accuracy further, the Receiver Operating Characteristic (ROC) curve and the area under it were used. The ROC curve is a graphical representation of the trade-off between the false negative and false positive rates for each possible cut-off value.
The area under the ROC curve, called the Area Under the Curve (AUC), represents the predictive value of the system by describing its ability to correctly estimate the occurrence of the event. The larger the area under the curve, the higher the accuracy of the modeling and prediction. An area of 0.5 indicates that the predictive accuracy is no better than chance. In this research, the ROC curve was plotted in software using the observed and predicted data.
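The construction of the ROC curve and its area can be sketched in a few lines of Python. This is a generic illustration rather than the software used in the study: the function names are hypothetical, the labels (1 for an observed landslide, 0 otherwise) and the scores are made-up values, and the area is computed with the trapezoidal rule.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists obtained by lowering the cut-off through the scores."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for k, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score (handles ties).
        if k + 1 == len(ranked) or ranked[k + 1][0] != score:
            fpr.append(fp / n_neg)
            tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))


# Hypothetical observations (1 = landslide) and predicted susceptibility scores.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
fpr, tpr = roc_points(labels, scores)
print(auc(fpr, tpr))
```

A model that ranks every landslide cell above every non-landslide cell reaches an area of 1, while scores that carry no information give an area close to 0.5.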
As shown in Table (3), because the spatial relationship between variables is also considered, the spatial algorithms proposed in this study give far better results than the non-spatial algorithms.
The following table compares the percentage of the area falling in each class of the zoning maps obtained by the different methods. As can be seen, the GWR and GMM-SAR algorithms give results that are closer to each other, indicating that these two algorithms perform better than the two non-spatial algorithms.
Determining the significance of effective factors in the landslide susceptibility mapping is important in landslide studies. Table (5) shows the most important effective factors identified by each regression algorithm.
We also noticed that land use and plan curvature have higher importance according to the frequency ratio in the different models. Youssef et al. (2015a) concluded that slope angle, land use, and altitude have higher importance in landslide occurrence, which is consistent with the results of the current study.
Conclusion
Landslides are among the most important natural disasters in the world, annually destroying significant areas of productive land, inhabited areas, and vital infrastructure. Determining the factors that affect landslides and forecasting the probability of occurrence through landslide susceptibility modeling can therefore help minimize the destruction they cause. In the present research, the first step was to determine the optimal effective factors in landslide occurrence for Semirom in Isfahan province, Iran. To this end, four regression algorithms were each combined with the improved PSO algorithm. In each of these four cases, some of the 11 layers were identified as the most effective factors in landslide occurrence.
The accuracy measurements for the GMM-SAR model revealed the most important effective factors to be plan curvature, profile curvature, aspect, and land use. For the GWR model with a tri-cube kernel, the most important effective factors were distance to residential places, distance to rivers, distance to roads, slope, distance to faults, and land use. With the multivariate linear regression algorithm, plan curvature, profile curvature, aspect, distance to faults, and rainfall were found to be the most effective in landslide occurrence. In the logistic regression algorithm, the layers of distance to residential places, lithology, plan curvature, and land use had the greatest impact. Using the effective layers identified by each method, the risk of landslide occurrence was modeled for that method. Because the spatial regression methods take the spatial relationships between observations into account, they agreed very well with the spatial data, and, as expected, the accuracy results of this research confirm this.
Therefore, the results demonstrate that the GWR model with the tri-cube kernel is the best-performing model in this study, followed by the GMM-SAR model with a small difference in accuracy. The GWR model can be considered a capable method for landslide susceptibility mapping in similar landslide-prone areas around the world. It is recommended that further research use more information layers to increase the accuracy of landslide hazard zoning, for example altitude, soil thickness, groundwater depth, volcanic map, stream power index, slope length, and topographic wetness index. This information was not available for the present research.
References
- Ahmed B (2015) Landslide Susceptibility Modelling Applying User-Defined Weighting and Data-Driven Statistical Techniques in Cox’s Bazar Municipality, Bangladesh. Nat Hazards 79:1707–1737. doi: 10.1007/s11069-015-1922-4
- Anselin L (1980) Estimation methods for spatial autoregressive structures: a study in spatial econometrics. Thesis, Ithaca, N.Y. (209 West Sibley Hall, Cornell University, Ithaca, N.Y. 14853): Program in Urban and Regional Studies, Cornell University
- Bai S, Wang J, G.N L, et al (2010) GIS-based and logistic regression for landslide susceptibility mapping of Zhongxian segment in the Three Gorge area, China. Geomorphology 115:23–31. doi: 10.1016/j.geomorph.2009.09.025
- Basofi A, Fariza A, Dzulkarnain MR (2016) Landslides susceptibility mapping using fuzzy logic: A case study in Ponorogo, East Java, Indonesia. In: 2016 International Conference on Data and Software Engineering (ICoDSE). pp 1–7
- Brunsdon C, Fotheringham S, Charlton M (1998) Geographically Weighted Regression. J R Stat Soc Ser Stat 47:431–443. doi: 10.1111/1467-9884.00145
- Castellanos E, Westen CJ (2007) Generation of a landslide risk index map for Cuba using spatial multi-criteria evaluation. Landslides 4:311–325. doi: 10.1007/s10346-007-0087-y
- Catani F, Lagomarsino D, Segoni S, Tofani V (2013) Landslide susceptibility estimation by random forests technique: sensitivity and scaling issues. Nat Hazards Earth Syst Sci 13:2815–2831. doi: 10.5194/nhess-13-2815-2013
- Chen W, Peng J, Hong H, et al (2018) Landslide susceptibility modelling using GIS-based machine learning techniques for Chongren County, Jiangxi Province, China. Sci Total Environ 626:1121–1135. doi: 10.1016/j.scitotenv.2018.01.124
- Cox DR (1958) The Regression Analysis of Binary Sequences. J R Stat Soc Ser B Methodol 20:215–242
- Djerou L, et al (2007) Towards the best points of interpolation using particles swarm optimization approach. In: IEEE Congress on Evolutionary Computation (CEC), 25-28, Singapore
- Elkadiri R, Sultan M, Youssef AM, et al A Remote Sensing-Based Approach for Debris-Flow Susceptibility Assessment Using Artificial Neural Networks and Logistic Regression Modeling. IEEE J Sel Top Appl Earth Obs Remote Sens 7:4818–4835
- Faraji Sabokbar H, Shadman Roodposhti M, Tazik E (2014) Landslide susceptibility mapping using geographically-weighted principal component analysis. Geomorphology 226:15–24. doi: 10.1016/j.geomorph.2014.07.026
- He Q, Shahabi H, Shirzadi A, Li S, Chen W, Wang N, Chai H, Bian H, Ma J, Chen Y, Wang X, Chapi K, Ahmad BB (2019) Landslide spatial modelling using novel bivariate statistical based Naïve Bayes, RBF Classifier, and RBF Network machine learning algorithms. Sci Total Environ 663:1–15. doi: 10.1016/j.scitotenv.2019.01.329
- Hong H, Pradhan B, Sameen MI, et al (2018) Improving the accuracy of landslide susceptibility model using a novel region-partitioning approach. Landslides 15:753–772. doi: 10.1007/s10346-017-0906-8
- Hong H, Pradhan B, Xu C, Tien Bui D (2015) Spatial prediction of landslide hazard at the Yihuang area (China) using two-class kernel logistic regression, alternating decision tree and support vector machines. CATENA 133:266–281. doi: 10.1016/j.catena.2015.05.019
- Huang B, Wu B, Barry M (2010) Geographically and temporally weighted regression for modeling spatio-temporal variation in house prices. Int J Geogr Inf Sci 24:383–401. doi: 10.1080/13658810802672469
- Iwasaki N, Yasuda K, Ueno G (2006) Dynamic parameter tuning of particle swarm optimization. IEEJ Transactions 1:353–363
- Jiang W, Rao P, Cao R, et al (2017) Comparative evaluation of geological disaster susceptibility using multi-regression methods and spatial accuracy validation. J Geogr Sci 27:439–462. doi: 10.1007/s11442-017-1386-4
- Juliev M, Mergili M, Mondal I, Nurtaev B, Pulatov A, Hübl J (2019) Comparative analysis of statistical methods for landslide susceptibility mapping in the Bostanlik District, Uzbekistan. Sci Total Environ 653:801–814. doi: 10.1016/j.scitotenv.2018.10.431
- Kamranzad F, Mohasel Afshar E, Mojarab M, Memarian H (2016) Landslide Hazard Zonation in Tehran Province Using Data-Driven and AHP Methods. J Geosci 25:101–114. doi: 10.22071/gsj.2015.41372
- Kavzoglu T, Kutlug Sahin E, Colkesen I (2015) Selecting optimal conditioning factors in shallow translational landslide susceptibility mapping using genetic algorithm. Eng Geol 192:101–112. doi: 10.1016/j.enggeo.2015.04.004
- Kennedy J, Eberhart R (1995) Particle swarm optimization. In: Proceedings of ICNN’95 – International Conference on Neural Networks. pp 1942–1948 vol.4
- Lee S, Pradhan B (2007) Landslide hazard mapping at Selangor, Malaysia using frequency ratio and logistic regression models. Landslides 4:33–41. doi: 10.1007/s10346-006-0047-y
- Mathew J, Jha VK, Rawat GS (2009) Landslide susceptibility zonation mapping and its validation in part of Garhwal Lesser Himalaya, India, using binary logistic regression analysis and receiver operating characteristic curve method. Landslides 6:17–26. doi: 10.1007/s10346-008-0138-z
- Milaghardan AH, Delavar M, Chehreghan A (2016) Uncertainty in landslide occurrence prediction using Dempster–Shafer theory. Model Earth Syst Environ 2:. doi: 10.1007/s40808-016-0240-5
- Mitra D, Bhandery C, Mukhopadhyay A, et al (2018) Landslide Risk Assessment in Darjeeling Hills Using Multi-criteria Decision Support System: A Bayesian Network Approach. In: Disaster Risk Governance in India and Cross Cutting Issues. Springer, Singapore, pp 361–386
- Nicu IC, Asăndulesei A (2018) GIS-based evaluation of diagnostic areas in landslide susceptibility analysis of Bahluieț River Basin (Moldavian Plateau, NE Romania). Are Neolithic sites in danger? Geomorphology 314:27–41. doi: 10.1016/j.geomorph.2018.04.010
- Nsengiyumva JB, Luo G, Amanambu AC, Mind’je R, Habiyaremye G, Karamage F, Ochege FU, Mupenzi C (2019) Comparing probabilistic and statistical methods in landslide susceptibility modeling in Rwanda/Centre-Eastern Africa. Sci Total Environ 659:1457–1472. doi: 10.1016/j.scitotenv.2018.12.248
- Paelinck J (1978) Spatial econometrics. Econ Lett 1:59–63
- Paudel U, Oguchi T (2014) Implementation of Random Forest in landslide susceptibility study, a case study of the Tokamachi area, Niigata, Japan
- Pourghasemi HR, Mohammady M, Pradhan B (2012) Landslide susceptibility mapping using index of entropy and conditional probability models in GIS: Safarood Basin, Iran. CATENA 97:71–84. doi: 10.1016/j.catena.2012.05.005
- Pradhan A, Kim Y-T (2016) Evaluation of a combined spatial multi-criteria evaluation model and deterministic model for landslide susceptibility mapping. CATENA 140:125–139. doi: 10.1016/j.catena.2016.01.022
- Pradhan B, Lee S (2009) Regional landslide susceptibility analysis using back-propagation neural network model at Cameron Highland, Malaysia.