Explainable Machine Learning for Mapping Minerals From CRISM Hyperspectral Data
Published in | Journal of geophysical research. Machine learning and computation Vol. 2; no. 2 |
---|---|
Format | Journal Article |
Language | English |
Published | Wiley, 01.06.2025 |
Summary: | This paper addresses some of the challenges in automating mineral mapping from CRISM hyperspectral data using two ML algorithms: Random Forest (RF) and Gradient Boosted Trees (GBTree). An interpretable framework using tree‐ensemble classification and SHapley Additive exPlanations (SHAP) is implemented to interpret the model decisions. SHAP explanations quantify the influence of diagnostic absorption features, demonstrating that the classifiers rely on physically significant spectral features rather than artifacts. Three novel metrics, “Physically Significant Precision,” “Physically Significant Recall,” and “Physically Significant F‐measure,” quantify the classifier's expected performance on unseen data. RF outperforms GBTree and is therefore used to develop a novel framework for mineral mapping from CRISM data, demonstrated on five CRISM datacubes. This combination of Random Forest and SHAP addresses limitations of existing CRISM classification methods, offering stability during training, reduced manual intervention, and interpretability while achieving a Kappa ($\kappa$) of 0.91 over the CRISM Machine Learning Toolkit's mineral data set with $\sim$470,000 labeled spectra.
Plain Language Summary
This paper describes a method for automating mineral detection from CRISM hyperspectral data, comparing the Random Forest (RF) and Gradient Boosted Trees (GBTree) algorithms. It uses tree‐ensemble algorithms for classification and SHapley Additive exPlanations (SHAP) for interpretability. SHAP explanations show that the classifiers prioritize physically significant spectral features over artifacts. Three novel metrics, “Physically Significant Precision,” “Physically Significant Recall,” and “Physically Significant F‐measure,” gauge the classifiers' expected performance on unseen data. RF outperforms GBTree and is therefore used to develop a new framework for mineral mapping from CRISM data, which is demonstrated on five CRISM datacubes. The combination of Random Forest and SHAP addresses limitations of existing CRISM classification methods, providing stability, reduced manual intervention, and interpretability. It achieves a Kappa ($\kappa$) of 0.91 over $\sim$470,000 labeled spectra from the CRISM Machine Learning Toolkit's mineral data set.
Key Points
The effectiveness of the Random Forest and gradient‐boosted tree classifiers for identifying minerals from hyperspectral data is evaluated
A novel metric, based on SHapley Additive exPlanations, to estimate a trained classifier's expected performance on unseen data is introduced
A novel explainable framework for mapping minerals from CRISM hyperspectral data is introduced |
---|---|
ISSN: | 2993-5210 |
DOI: | 10.1029/2024JH000391 |
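
The summary reports agreement as a Kappa ($\kappa$) score. As a reading aid, the sketch below shows how Cohen's kappa is commonly computed from true and predicted class labels: observed agreement corrected for the agreement expected by chance. It assumes the reported value is the standard Cohen's kappa, which this record does not state explicitly. The function name `cohen_kappa` and the mineral labels are hypothetical, chosen for illustration only, and are not taken from the paper or its data set.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: label agreement corrected for chance agreement."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of samples where the two labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case (a single class in both labelings); returning 1.0
        # is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels for six spectra (illustration only).
y_true = ["olivine", "olivine", "pyroxene", "pyroxene", "carbonate", "carbonate"]
y_pred = ["olivine", "olivine", "pyroxene", "carbonate", "carbonate", "carbonate"]
kappa = cohen_kappa(y_true, y_pred)
print(f"kappa = {kappa:.2f}")
```

In this toy case five of six labels agree (observed agreement 5/6) and the chance agreement is 1/3, so kappa comes out lower than the raw accuracy. That gap is why kappa is often preferred over plain accuracy for multi-class problems with uneven class frequencies.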