GeoRF: a geospatial random forest
The geospatial domain increasingly relies on data-driven methodologies to extract actionable insights from the growing volume of available data. Despite the effectiveness of tree-based models in capturing complex relationships between features and targets, they fall short when it comes to considerin...
Saved in:
Published in | Data mining and knowledge discovery Vol. 38; no. 6; pp. 3414 - 3448 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published |
New York
Springer US
01.11.2024
Springer Nature B.V |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | The geospatial domain increasingly relies on data-driven methodologies to extract actionable insights from the growing volume of available data. Despite the effectiveness of tree-based models in capturing complex relationships between features and targets, they fall short when it comes to considering spatial factors. This limitation arises from their reliance on univariate, axis-parallel splits that result in rectangular areas on a map. To address this issue and enhance both performance and interpretability, we propose a solution that introduces two novel bivariate splits: an oblique and Gaussian split designed specifically for geographic coordinates. Our innovation, called Geospatial Random Forest (geoRF), builds upon Geospatial Regression Trees (GeoTrees) to effectively incorporate geographic features and extract maximum spatial insights. Through an extensive benchmark, we show that our geoRF model outperforms traditional spatial statistical models, other spatial RF variations, machine learning and deep learning methods across a range of geospatial tasks. Furthermore, we contextualize our method’s computational time complexity relative to baseline approaches. Our prediction maps illustrate that geoRF produces more robust and intuitive decision boundaries compared to conventional tree-based models. Utilizing impurity-based feature importance measures, we validate geoRF’s effectiveness in highlighting the significance of geographic coordinates, especially in data sets exhibiting pronounced spatial patterns. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
ISSN: | 1384-5810 1573-756X |
DOI: | 10.1007/s10618-024-01046-7 |