TY - JOUR
T1 - Optimizing landslide susceptibility mapping using machine learning and geospatial techniques
AU - Agboola, Gazali
AU - Beni, Leila Hashemi
AU - Elbayoumi, Tamer M
AU - Thompson, Gary
PY - 2024/7/1
Y1 - 2024/7/1
N2 - Landslides present a substantial risk to human lives, the environment, and infrastructure. Consequently, it is crucial to highlight the regions prone to future landslides by examining the correlation between past landslides and various geo-environmental factors. This study aims to investigate the optimal data selection and machine learning model, or ensemble technique, for evaluating the vulnerability of areas to landslides and determining the most accurate approach. To attain our objectives, we considered two different scenarios for selecting landslide-free random points (a slope threshold and a buffer-based approach) and performed a comparative analysis of five machine learning models for landslide susceptibility mapping, namely: Support Vector Machine (SVM), Logistic Regression (LR), Linear Discriminant Analysis (LDA), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The study area for this research is an area in Polk County in Western North Carolina that has experienced fatal landslides, leading to casualties and significant damage to infrastructure, properties, and road networks. The model construction process involves the utilization of a dataset comprising 1215 historical landslide occurrences and 1215 non-landslide points. We integrated a total of fourteen geospatial data layers, consisting of topographic variables, soil data, geological data, and land cover attributes. We use various metrics to assess the models' performance, including accuracy, F1-score, Kappa score, and AUC-ROC. In addition, we used the seeded-cell area index (SCAI) to evaluate map consistency. The ensemble of the five models using Weighted Average produces outstanding results, with an AUC-ROC of 99.4% for the slope threshold scenario and 91.8% for the buffer-based scenario. Our findings emphasize the significant impact of non-landslide random sampling on model performance in landslide susceptibility mapping. Furthermore, by optimally identifying landslide-prone regions and hotspots that need urgent risk management and land use planning, our study demonstrates the effectiveness of machine learning models in analyzing landslide susceptibility and providing valuable insights for informed decision-making and disaster risk reduction initiatives.
AB - Landslides present a substantial risk to human lives, the environment, and infrastructure. Consequently, it is crucial to highlight the regions prone to future landslides by examining the correlation between past landslides and various geo-environmental factors. This study aims to investigate the optimal data selection and machine learning model, or ensemble technique, for evaluating the vulnerability of areas to landslides and determining the most accurate approach. To attain our objectives, we considered two different scenarios for selecting landslide-free random points (a slope threshold and a buffer-based approach) and performed a comparative analysis of five machine learning models for landslide susceptibility mapping, namely: Support Vector Machine (SVM), Logistic Regression (LR), Linear Discriminant Analysis (LDA), Random Forest (RF), and Extreme Gradient Boosting (XGBoost). The study area for this research is an area in Polk County in Western North Carolina that has experienced fatal landslides, leading to casualties and significant damage to infrastructure, properties, and road networks. The model construction process involves the utilization of a dataset comprising 1215 historical landslide occurrences and 1215 non-landslide points. We integrated a total of fourteen geospatial data layers, consisting of topographic variables, soil data, geological data, and land cover attributes. We use various metrics to assess the models' performance, including accuracy, F1-score, Kappa score, and AUC-ROC. In addition, we used the seeded-cell area index (SCAI) to evaluate map consistency. The ensemble of the five models using Weighted Average produces outstanding results, with an AUC-ROC of 99.4% for the slope threshold scenario and 91.8% for the buffer-based scenario. Our findings emphasize the significant impact of non-landslide random sampling on model performance in landslide susceptibility mapping. Furthermore, by optimally identifying landslide-prone regions and hotspots that need urgent risk management and land use planning, our study demonstrates the effectiveness of machine learning models in analyzing landslide susceptibility and providing valuable insights for informed decision-making and disaster risk reduction initiatives.
KW - Data driven
KW - Landslide susceptibility
KW - Machine learning
KW - Natural disaster
KW - Remote sensing
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85189608112&origin=inward
UR - https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85189608112&origin=inward
U2 - 10.1016/j.ecoinf.2024.102583
DO - 10.1016/j.ecoinf.2024.102583
M3 - Article
SN - 1574-9541
VL - 81
JO - Ecological Informatics
JF - Ecological Informatics
M1 - 102583
ER -