Prediction modelling framework comparative analysis of dissolved oxygen concentration variations using support vector regression coupled with multiple feature engineering and optimization methods: A case study in China

Bibliographic Details
Published in: Ecological indicators, Vol. 146, p. 109845
Main Authors: Nong, Xizhi; Lai, Cheng; Chen, Lihua; Shao, Dongguo; Zhang, Chi; Liang, Jiankui
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.02.2023
Summary:
• A comprehensive SVR-based prediction framework for DO concentration was proposed.
• Hybrid pathways to improve the SVR model using intelligent approaches were provided.
• Water temperature and the CSPAR were the fixed predictors for DO in three stations.
• Indiscriminate use of feature selection may cause overfitting after data denoising.
• Combination of feature selection and HPO can effectively improve model robustness.

Dissolved oxygen (DO) is an essential indicator for assessing water quality and managing aquatic environments, but accurately understanding and predicting the spatiotemporal variation of DO concentrations under the complex effects of different environmental factors remains challenging. In this study, a practical prediction framework for DO concentrations was proposed, based on a support vector regression (SVR) model coupled with multiple intelligent techniques (i.e., four data denoising techniques, three feature selection rules, and four hyperparameter optimization methods). The framework was tested using a data matrix (17,532 observations in total) of 12 indicators from three vital water quality monitoring stations of the longest inter-basin water diversion project in the world (i.e., the Middle-Route of the South-to-North Water Diversion Project of China) from 2017 to 2020. The results showed that the proposed framework could accurately predict DO concentration variations in different geographical locations. The model using the “wavelet analysis–LASSO regression–random search–SVR” combination at the Waihuanhe station had the best prediction performance, with Root Mean Square Error (RMSE), Mean Square Error (MSE), Mean Absolute Error (MAE), and coefficient of determination (R²) values of 0.251, 0.063, 0.190, and 0.911, respectively. Combining feature selection and hyperparameter optimization techniques can significantly improve the robustness and accuracy of the prediction model and can provide a new universal and practical way of investigating and understanding the environmental drivers of DO concentration variations. For water quality management departments, the proposed framework can also identify the key parameters that should be monitored as environmental factors change. More studies assessing potential integrated water quality risk using multiple indicators in mega water diversion projects and/or similar water bodies are required in the future.
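
The four metrics reported in the summary are related: RMSE is the square root of MSE, which is why the reported values of 0.251 and 0.063 agree (0.251² ≈ 0.063). The following is a minimal Python sketch of the standard definitions of these metrics. The function name `regression_metrics` and the sample values are illustrative assumptions, not code or data from the study.

```python
import math


def regression_metrics(observed, predicted):
    """Return RMSE, MSE, MAE and R^2 using their standard definitions."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    errors = [o - p for o, p in zip(observed, predicted)]

    mse = sum(e * e for e in errors) / n       # mean of squared errors
    mae = sum(abs(e) for e in errors) / n      # mean of absolute errors

    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observations are equal")
    ss_res = sum(e * e for e in errors)                  # residual sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"rmse": math.sqrt(mse), "mse": mse, "mae": mae, "r2": r2}


# Hypothetical DO values in mg/L, for illustration only.
observed = [8.0, 9.0, 10.0, 11.0]
predicted = [8.5, 9.0, 9.5, 11.0]
metrics = regression_metrics(observed, predicted)
```

Because RMSE and MSE carry the same information, a record that reports both can be sanity-checked by squaring the RMSE, as done above for the Waihuanhe result.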
ISSN: 1470-160X, 1872-7034
DOI: 10.1016/j.ecolind.2022.109845