Prediction of Distributed River Sediment Respiration Rates Using Community‐Generated Data and Machine Learning



Bibliographic Details
Published in: Journal of Geophysical Research: Machine Learning and Computation, Vol. 1, no. 3
Main Authors: Gary, Stefan F.; Scheibe, Timothy D.; Rexer, Em; Torreira, Alvaro Vidal; Garayburu‐Caruso, Vanessa A.; Goldman, Amy; Stegen, James C.
Format: Journal Article
Language: English
Published: United States, Wiley, 01.09.2024

Summary: River sediment microbial respiration is a key indicator of ecosystem functioning, and the biogeochemical fluxes across this critical zone link surface and subsurface waters. As such, there is tremendous interest in measuring and mapping these respiration rates. Respiration observations are expensive and labor intensive, so limited data are available to the community. An open science, collaborative initiative is collecting samples for respiration rate analysis and multi‐scale metadata; this evolving data set is being used to make machine learning (ML) predictions at unsampled sites to help inform continued community engagement. However, it is a challenge to find an optimum configuration for ML models to work with this feature‐rich (i.e., 100+ possible input variables) data set. Here, we present results from a two‐tiered approach to managing the analysis of this complex data set: (a) a stacked ensemble of models that automatically optimizes hyperparameters and manages the training of many models and (b) feature permutation importance to detect the most important features in the models. The major elements of this workflow are modular, portable, open, and cloud‐based, making this implementation a potential template for other applications. The models developed here indicate that sediment organic matter chemistry is one of the most important features for predicting sediment respiration rate. Other important, larger‐scale features fall into the categories of climatic, ecological, geological, and fluvial settings. Leveraging these larger‐scale features to generate data‐driven estimates of river sediment respiration rates reveals spatially consistent but heterogeneous patterns across the river network of the Columbia River Basin.

Plain Language Summary: We want to determine the environmental factors that affect the amount of oxygen and nutrients used by microbes in river sediments. River sediment oxygen and nutrient use are important to river ecosystems but vary a lot between locations. The number of measurements has been limited but is increasing thanks to volunteers participating in an open science project. Here, we use machine learning (ML) with existing data to predict river sediment microbial oxygen consumption. The resulting ML models, and their predictions, are then used to estimate which aspects of the environment are most important for making good predictions. The presence or absence of different kinds of nutrients for the microbes appears to be the most important factor in predicting oxygen consumption in sediment. Larger‐scale factors, especially the local climate, geography, and ecology of the river, also play important roles. Finally, we use these models to map estimated oxygen consumption in river sediments across the Columbia River Basin. Maps like ours can be combined with river flow models to gain a holistic understanding of river systems and to guide future sampling efforts that reduce uncertainty in the model predictions.

Key Points:
- Machine learning models can estimate spatially variable river sediment oxygen consumption and explain up to 65 percent of the variance
- Sediment organic matter chemistry is one of the most important features for predicting variations in respiration rates
- Large‐scale climatological features are also important for prediction and can be used to map respiration rates and estimated uncertainty
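
The feature permutation importance step in tier (b) of the summary can be illustrated with a short, model-agnostic sketch. It is not the authors' implementation: the function names `r2_score` and `permutation_importance`, the choice of R² as the score (picked here only because the Key Points report variance explained), the default of 10 shuffles per feature, and the synthetic data are all illustrative assumptions. Each feature column is shuffled in turn, breaking its link to the target, and the mean drop in R² from the unshuffled baseline is taken as that feature's importance.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence

Row = List[float]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: fraction of variance in y_true explained by y_pred."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # assumes y_true is not constant
    return 1.0 - ss_res / ss_tot


def permutation_importance(
    predict: Callable[[List[Row]], List[float]],
    X: List[Row],
    y: List[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Hypothetical helper: mean drop in R^2 when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = r2_score(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing only feature j with its shuffled value.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - r2_score(y, predict(X_perm)))
        importances.append(mean(drops))
    return importances


if __name__ == "__main__":
    data_rng = random.Random(42)
    # Synthetic, made-up data: 3 features, only the first two affect the target.
    X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
    y = [3.0 * a + 1.0 * b + data_rng.gauss(0, 0.5) for a, b, _ in X]

    def linear_model(rows: List[Row]) -> List[float]:
        # Stand-in for a trained model (e.g., a stacked ensemble's predict method).
        return [3.0 * a + 1.0 * b for a, b, _ in rows]

    print([round(v, 3) for v in permutation_importance(linear_model, X, y)])
```

A feature the model ignores scores exactly zero, while a feature with a larger effect on the target produces a larger drop. Because any fitted model can be passed in as `predict`, the same procedure applies unchanged to an ensemble; in practice it is usually computed on held-out data so that importance reflects predictive skill rather than fit to the training set.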

Bibliography: AC05-76RL01830; SC0020464
USDOE Office of Science (SC), Biological and Environmental Research (BER)
PNNL-SA--196211
ISSN: 2993-5210
DOI: 10.1029/2024JH000199