Predicting in-stream water quality constituents at the watershed scale using machine learning

Bibliographic Details
Published in	Journal of contaminant hydrology Vol. 251; p. 104078
Main Authors	Adedeji, Itunu C., Ahmadisharaf, Ebrahim, Sun, Yanshuo
Format	Journal Article
Language	English
Published	Netherlands Elsevier B.V. 01.12.2022
Subjects	Environmental Monitoring - methods; In-stream water quality; Machine learning; Nitrogen - analysis; Oxygen - analysis; Phosphorus - analysis; Rivers; Seasonality; Uncertainty quantification; Water Quality
Summary:	Predicting in-stream water quality is necessary to support decisions about protecting healthy waterbodies and restoring impaired ones. Data-driven modeling is an efficient technique for supporting such efforts. Our objective was to determine whether in-stream concentrations of contaminants, nutrients (total phosphorus [TP] and total nitrogen [TN]), total suspended solids (TSS), dissolved oxygen (DO), and fecal coliform bacteria (FC) can be predicted satisfactorily using machine learning (ML) algorithms based on publicly available datasets. To achieve this objective, we evaluated four modeling scenarios that differed in their required inputs: publicly available datasets (e.g., land use/land cover), antecedent conditions, and additional in-stream water quality observations (e.g., pH and turbidity). We implemented five ML algorithms (Support Vector Machines, Random Forest [RF], eXtreme Gradient Boost [XGB], ensemble RF-XGB, and Artificial Neural Network [ANN]) and demonstrated our modeling framework on an inland stream, Bullfrog Creek, located near Tampa, Florida. The results showed that, while including additional water quality drivers improved overall model performance for all target constituents, TP, TN, DO, and TSS could still be predicted satisfactorily using only publicly available datasets (Nash-Sutcliffe efficiency [NSE] > 0.75 and percent bias [PBIAS] < 10%), whereas FC could not (NSE < 0.49 and PBIAS > 25%). Additionally, antecedent conditions slightly improved predictions and reduced the predictive uncertainty, particularly when paired with other water quality observations (6.9% increase in NSE for FC, and 2.7% for TP, TN, DO, and TSS). Comparable model performance for all water quality constituents in wet and dry seasons suggests minimal seasonal dependence of the predictions (< 4% difference in NSE and < 10% difference in PBIAS). Our modeling framework is generic and can serve as a complementary tool for monitoring and predicting in-stream water quality constituents.
• In-stream water quality constituents were predicted at the watershed scale using machine learning.
• Publicly available data, antecedent conditions, and other water quality constituents were used for predictions.
• TN, TP, TSS, and DO were predicted satisfactorily with only publicly available data.
• FC could not be predicted satisfactorily with only publicly available datasets.
• Including antecedent conditions and other water quality constituents improved the predictions.
ISSN:	0169-7722 1873-6009
DOI:	10.1016/j.jconhyd.2022.104078
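
The summary reports model skill as Nash-Sutcliffe efficiency (NSE) and percent bias (PBIAS). As a reading aid, the sketch below shows how these two metrics are commonly computed from paired observed and predicted concentrations. It is not taken from the article: the function names `nse` and `pbias` are illustrative, and the PBIAS sign convention (positive values meaning the model underestimates on average) is an assumption based on common hydrologic practice, so the authors' exact formulation may differ.

```python
from typing import Sequence


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 minus the ratio of squared model error
    to the variance of the observations about their mean.
    1.0 is a perfect fit; 0.0 is no better than predicting the observed mean."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("NSE is undefined when all observations are identical")
    return 1.0 - squared_error / spread


def pbias(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Percent bias, in percent. Assumed convention: positive values mean
    the model underestimates the observations on average."""
    if len(observed) != len(simulated) or len(observed) == 0:
        raise ValueError("observed and simulated must be non-empty and equal in length")
    total_obs = sum(observed)
    if total_obs == 0:
        raise ValueError("PBIAS is undefined when the observations sum to zero")
    return 100.0 * sum(o - s for o, s in zip(observed, simulated)) / total_obs


if __name__ == "__main__":
    # Hypothetical concentrations for illustration only (not data from the study).
    obs = [0.21, 0.35, 0.28, 0.40, 0.31]
    sim = [0.23, 0.33, 0.27, 0.37, 0.32]
    print(f"NSE = {nse(obs, sim):.3f}, PBIAS = {pbias(obs, sim):.2f}%")
```

Under these definitions, the summary's ranges can be read as follows: an NSE above 0.75 means the model explains most of the variance in the observations, and a PBIAS magnitude below 10% means the predictions are, on average, within 10% of the observed total.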