Machine learning-based landslide susceptibility assessment with optimized ratio of landslide to non-landslide samples
Published in | Gondwana research Vol. 123; pp. 198 - 216 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | Elsevier B.V., 01.11.2023 |
Subjects | |
Summary: |
• A new framework is proposed for landslide susceptibility analysis.
• Bayesian algorithm is used to optimize the proportion of landslide samples.
• The framework is validated by a case study, and RF and GBDT outperform SVM.
Machine learning models have been widely used for landslide susceptibility assessment (LSA) in recent years. The accuracy of machine learning-based LSA often hinges on the ratio of landslide to non-landslide (positive/negative, P/N) samples. A proper P/N sample ratio can significantly improve the performance of machine learning-based LSA, whereas an improper ratio can cause inadequate training or data pollution. Conventionally, the P/N sample ratio is determined by experience or by trial and error, which introduces substantial uncertainty. This paper proposes a Bayesian optimization method to optimize the P/N sample ratio for machine learning models. First, Anhua County in Hunan Province, China, is selected as the study area because of the numerous landslide disasters that have occurred there in recent years. Second, three representative machine learning models, the support vector machine (SVM), the random forest (RF) and the gradient boosting decision tree (GBDT), are adopted to assess landslide susceptibility. Subsequently, a Bayesian optimization algorithm is used to obtain the optimal P/N sample ratio, considering the effects of various training/test set ratios. Finally, the improved models and the corresponding landslide susceptibility maps are established using the obtained optimal P/N sample ratio. The results show that the performance of SVM, RF and GBDT is improved in every case with the optimized P/N sample ratio. The RF model achieves the highest AUC value (0.840, improved by 1.3%), followed by GBDT (0.831, improved by 1.3%) and SVM (0.775, improved by 0.7%). RF and GBDT are also more suitable than SVM for addressing sample imbalance in LSA. The Bayesian optimization algorithm is therefore recommended for optimizing the P/N sample ratio in machine learning-based LSA models. |
---|---|
ISSN: | 1342-937X 1878-0571 |
DOI: | 10.1016/j.gr.2022.05.012 |
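
The summary describes using Bayesian optimization to choose the P/N sample ratio. The sketch below illustrates that general idea only and is not the authors' implementation: it fits a small Gaussian-process surrogate to observed (ratio, AUC) pairs and picks the next ratio by expected improvement. Everything named here is an assumption made for illustration: the ratio is parameterised as non-landslide samples per landslide sample, the search range 0.5 to 5.0 is arbitrary, and `toy_auc` is a hypothetical stand-in for training a model (for example RF) and scoring it on a test set.

```python
import math
import numpy as np


def rbf_kernel(a, b, length_scale=0.15):
    """Squared-exponential kernel on inputs rescaled to [0, 1]."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    K = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_new)
    mu = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement acquisition for a maximisation problem."""
    gap = mu - best - xi
    z = gap / sigma
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return gap * cdf + sigma * pdf


def optimize_ratio(evaluate_auc, low, high, n_init=3, n_iter=10, n_grid=200):
    """Search [low, high] for the ratio that maximises evaluate_auc(ratio)."""
    grid = np.linspace(low, high, n_grid)
    # Start from a few evenly spaced ratios so the run is deterministic.
    ratios = list(np.linspace(low, high, n_init))
    aucs = [evaluate_auc(r) for r in ratios]
    for _ in range(n_iter):
        x = (np.array(ratios) - low) / (high - low)
        y = np.array(aucs)
        # Standardise scores so the unit-variance GP prior is reasonable.
        y_std = (y - y.mean()) / (y.std() + 1e-12)
        mu, sigma = gp_posterior(x, y_std, (grid - low) / (high - low))
        ei = expected_improvement(mu, sigma, y_std.max())
        candidate = float(grid[int(np.argmax(ei))])
        ratios.append(candidate)
        aucs.append(evaluate_auc(candidate))
    best = int(np.argmax(aucs))
    return float(ratios[best]), float(aucs[best])


def toy_auc(ratio):
    """Hypothetical stand-in objective with an arbitrary peak at ratio 2.0.

    A real objective would resample non-landslide points at this ratio,
    train a classifier, and return its AUC on a held-out test set.
    """
    return 0.8 - 0.01 * (ratio - 2.0) ** 2


best_ratio, best_auc = optimize_ratio(toy_auc, low=0.5, high=5.0)
print(f"best ratio: {best_ratio:.2f}, AUC: {best_auc:.4f}")
```

In practice each call to the objective is expensive (one full train/test cycle), which is the usual motivation for a surrogate-based search over a grid or trial-and-error sweep. The toy numbers above are arbitrary and do not reproduce any result reported in the article.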