Interpretable CEEMDAN-FE-LSTM-transformer hybrid model for predicting total phosphorus concentrations in surface water

Bibliographic Details
Published in Journal of hydrology (Amsterdam) Vol. 629; p. 130609
Main Authors Yao, Jiefu; Chen, Shuai; Ruan, Xiaohong
Format Journal Article
Language English
Published Elsevier B.V. 01.02.2024
Summary:
•A novel data-driven CF-LT model for total phosphorus (TP) prediction is presented.
•The proposed CF-LT model exhibits excellent overall and peak prediction results.
•Turbidity and total nitrogen have the greatest influence on TP predictions.

The complexity of the biogeochemical cycle of phosphorus in lakes makes it challenging to produce efficient and accurate predictions of total phosphorus (TP) concentrations. In this study, a hybrid model is developed for TP prediction. The model, referred to as CF-LT, combines complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN), fuzzy entropy (FE), long short-term memory (LSTM), and a Transformer. The introduction of data split-frequency reconstruction effectively addresses the over- and underfitting that previous machine learning models suffer when faced with high-dimensional data, while an attention mechanism overcomes the inability of these models to establish long-term dependencies between data when making long-term predictions. The CF-LT model is applied to predict TP concentrations from January 1, 2015, to December 31, 2020, at Yaoxiangqiao, Zhihugang, and Guanduqiao, three national water quality monitoring stations at the inlet of Taihu Lake, China. Moreover, Shapley additive explanations (SHAP) are used to interpret the CF-LT model and identify the essential input features. The prediction results demonstrate that the CF-LT model achieves a coefficient of determination (R²) of 0.37–0.87 on the test dataset, an improvement of 0.05–0.17 (6%–85%) over the control models. In addition, the CF-LT model provides the best peak value predictions. The model interpretation results indicate that turbidity and total nitrogen are the essential factors influencing TP predictions. This demonstrates that the TP concentrations at the inlet of Taihu Lake are closely related to non-point pollution discharge and the status of aquatic plants. Notably, these two indicators exert a stronger influence on TP predictions during the wet season. This work provides a viable modeling strategy for predicting TP concentrations and guidance for early warning and treatment of surface water eutrophication in the Taihu Lake basin.
ISSN:0022-1694
1879-2707
DOI:10.1016/j.jhydrol.2024.130609
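
Illustrative note on the fuzzy entropy step named in the summary: the record does not say how the authors compute fuzzy entropy or how they use it to regroup decomposed components, so the following is only a minimal sketch of one common formulation (mean-removed templates, Chebyshev distance, exponential membership). The function name `fuzzy_entropy`, the defaults `m=2`, `r=0.2`, `n=2`, and the two synthetic series are assumptions made for illustration and are not taken from the paper. In decomposition-based pipelines, a score like this is typically used to rank components by irregularity before regrouping them.

```python
import math
import random


def fuzzy_entropy(series, m=2, r=0.2, n=2):
    """Fuzzy entropy of a 1-D series; larger values mean a more irregular signal.

    m: embedding dimension, r: tolerance as a fraction of the standard
    deviation, n: exponent of the membership function (all assumed defaults).
    """
    x = [float(v) for v in series]
    size = len(x)
    mean = sum(x) / size
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / size)
    if std == 0.0:
        raise ValueError("fuzzy entropy is undefined for a constant series")
    tol = r * std
    count = size - m  # same number of templates for dimensions m and m + 1

    def phi(dim):
        # Build templates of length `dim` and remove each template's own mean.
        templates = []
        for i in range(count):
            window = x[i:i + dim]
            base = sum(window) / dim
            templates.append([v - base for v in window])
        # Average fuzzy similarity over all ordered pairs of distinct templates.
        total = 0.0
        for i in range(count):
            for j in range(count):
                if i == j:
                    continue
                d = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                total += math.exp(-(d ** n) / tol)
        return total / (count * (count - 1))

    return math.log(phi(m)) - math.log(phi(m + 1))


# Synthetic stand-ins for a low-frequency and a high-frequency component.
rng = random.Random(0)
smooth = [math.sin(2 * math.pi * i / 50) for i in range(200)]
noisy = [rng.gauss(0.0, 1.0) for _ in range(200)]

print("smooth component:", fuzzy_entropy(smooth))
print("noisy component: ", fuzzy_entropy(noisy))
```

The noisy series scores higher than the smooth one, which is the property that makes an entropy score usable for separating high-frequency from low-frequency components. The pairwise loop is quadratic in the series length, so a vectorized implementation would be preferable for long monitoring records.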