Potential of random forest machine learning algorithm for geological mapping using PALSAR and Sentinel-2A remote sensing data: A case study of Tsagaan-uul area, southern Mongolia

Bibliographic Details
Published in: Journal of Asian Earth Sciences: X, Vol. 14, p. 100204
Main Authors: Badrakh, Munkhsuren; Tserendash, Narantsetseg; Choindonjamts, Erdenejargal; Albert, Gáspár
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.12.2025

More Information
Summary:
•Higher train-test split ratios increased variable influence in Random Forest models.
•ALOS PALSAR DEM data showed the highest Mean Decrease Gini index, aiding geological mapping.
•The data split ratio affected model performance more than the number of decision trees did.
•In the rock-sample experiment, the stratification method influenced results more than ntree and mtry.

Geological mapping in remote and geologically complex regions can be substantially improved by integrating remote sensing data with machine learning algorithms. This study evaluates the effectiveness of the Random Forest algorithm for geological mapping in the Tsagaan-uul area of the Khatanbulag ancient massif, Mongolia, a region characterized by limited accessibility and sparse field data. A comprehensive set of predictor variables was used, including Sentinel-2A spectral bands and indices, an ALOS PALSAR digital elevation model (DEM), and terrain morphometric features. Two distinct training strategies were employed: (1) training data derived from a geological map and (2) training data derived from field-collected rock samples from two lithologically diverse formations. Variable importance was assessed with the Mean Decrease Gini index, and classification performance was measured by overall accuracy, precision, recall, F1-score, and the Kappa coefficient. In the first experiment, the ALOS PALSAR DEM and the Terrain Ruggedness Index were the most influential predictors. Overall accuracy across all nine models ranged from 59.9 % to 64.4 %, with Kappa coefficients between 0.508 and 0.562. Model 1, which used a 90–10 % train-test split, achieved the highest performance, while Model 4 recorded the lowest. These results suggest that the data split ratio had a greater impact on model accuracy than the number of decision trees. In the second experiment, varying the number of trees (ntree) and the number of variables tried at each split (mtry) had minimal effect, whereas the choice of stratification method significantly affected model outcomes.
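The summary reports overall accuracy, precision, recall, F1-score, and the Kappa coefficient without defining them. As a hedged illustration, the minimal plain-Python sketch below computes all five from paired true and predicted class labels using the standard confusion-matrix definitions: per-class precision and recall, and Cohen's kappa from the row and column totals. The function name and the lithology labels in the example are hypothetical, and the study's own averaging choices (for example, per-class versus macro-averaged F1) are not stated in the summary.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Overall accuracy, Cohen's kappa and per-class precision/recall/F1.

    Hypothetical helper for illustration; labels can be any hashable,
    sortable values (e.g. map-unit names).
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    cm = Counter(zip(y_true, y_pred))  # confusion matrix as (true, pred) -> count

    # Overall accuracy: share of samples on the confusion-matrix diagonal.
    observed = sum(cm[(c, c)] for c in labels) / n

    per_class = {}
    for c in labels:
        tp = cm[(c, c)]
        predicted_c = sum(cm[(t, c)] for t in labels)  # column total
        actual_c = sum(cm[(c, p)] for p in labels)     # row total
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[c] = {"precision": precision, "recall": recall, "f1": f1}

    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from the row and column marginals.
    row, col = Counter(y_true), Counter(y_pred)
    expected = sum(row[c] * col[c] for c in labels) / (n * n)
    kappa = (observed - expected) / (1 - expected) if expected != 1 else 1.0
    return {"overall_accuracy": observed, "kappa": kappa, "per_class": per_class}


if __name__ == "__main__":
    # Hypothetical labels, not data from the study.
    truth = ["granite", "granite", "schist", "marble", "schist", "granite"]
    pred = ["granite", "schist", "schist", "marble", "schist", "granite"]
    m = classification_metrics(truth, pred)
    print(round(m["overall_accuracy"], 3), round(m["kappa"], 3))  # 0.833 0.739
```

In this toy example five of six predictions match, so overall accuracy is 5/6, yet kappa is only 17/23 because some of that agreement is expected by chance from the class frequencies. This is why the study reports Kappa values (0.508 to 0.562) alongside overall accuracy.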
Overall, the findings emphasize the critical role of dataset configuration, such as class balance and representative sampling, in optimizing Random Forest-based geological mapping.
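The point about class balance and representative sampling can be made concrete with a proportional (stratified) train-test split, in which every class is divided at the same ratio so that rare units are not left out of the test set by chance. The sketch below is an assumption-laden illustration in plain Python, not the authors' procedure: the summary does not say which stratification methods were compared, and `stratified_split`, its parameters, and the class names are hypothetical. The 0.1 test fraction mirrors the 90–10 % split mentioned for Model 1.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.1, seed=0):
    """Split sample indices so each class keeps roughly the same train/test ratio.

    Hypothetical helper; returns (train_indices, test_indices), both sorted.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)  # seeded so the split is reproducible

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train, test = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        # Per-class test size; always leave at least one sample for training.
        n_test = min(round(len(idx) * test_fraction), len(idx) - 1)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    # Hypothetical, deliberately imbalanced class labels (not study data).
    labels = ["unit_A"] * 50 + ["unit_B"] * 30 + ["unit_C"] * 20
    train, test = stratified_split(labels, test_fraction=0.1, seed=42)
    print(len(train), len(test))  # 90 10
    print({c: sum(labels[i] == c for i in test) for c in ("unit_A", "unit_B", "unit_C")})
```

Each class contributes about 10 % of its samples to the test set (5, 3 and 2 here), so the test set keeps the class proportions of the full dataset. In scikit-learn, `train_test_split` with its `stratify` argument provides comparable behaviour; the hand-written version simply makes the per-class rounding explicit.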
ISSN: 2590-0560
DOI: 10.1016/j.jaesx.2025.100204