Predicting Water Quality Index (WQI) by feature selection and machine learning: A case study of An Kim Hai irrigation system

Bibliographic Details
Published in Ecological Informatics Vol. 74; p. 101991
Main Authors Lap, Bui Quoc, Phan, Thi-Thu-Hong, Nguyen, Huu Du, Quang, Le Xuan, Hang, Phi Thi, Phi, Nguyen Quang, Hoang, Vinh Truong, Linh, Pham Gia, Hang, Bui Thi Thanh
Format Journal Article
Language English
Published Elsevier B.V. 01.05.2023
Summary:
• Propose a novel machine learning (ML) approach that combines feature selection (FS) and ML methods to estimate the WQI
• Consider the advantages of FS methods for selecting the key water quality (WQ) parameters fed to ML models
• Significantly reduce the number of WQ parameters needed to predict WQI values
• Evaluate WQI values accurately while considerably reducing time and analytical costs

A variety of water quality indices have been used to assess the state of waterbodies all over the world. Traditional methods of calculating a Water Quality Index (WQI) require the evaluation of many water quality parameters, which makes them costly and time-consuming. In recent years, machine learning (ML) algorithms have emerged as an effective tool for solving many environmental problems, including water quality management. In this study, we investigate the performance of ML-based methods in calculating the WQI. We apply several feature selection techniques to select the key parameters fed to the ML models. Experiments are carried out to evaluate the WQI on a dataset collected from 2007 to 2020 in the An Kim Hai system, one of the most important irrigation systems in the north of Vietnam. The results show that applying feature selection methods significantly reduces the number of water quality parameters fed to the ML models without loss of accuracy. In particular, using the embedded method, we identify four parameters (Coliform, DO, Turbidity, and TSS) that have the greatest impact on water quality. Based on these parameters, the Random Forest model provides the best accuracy in predicting the WQI values of the An Kim Hai system, with a Similarity of 0.94. The combination of feature selection and ML methods can therefore be considered an effective alternative for calculating the WQI, giving good predictive performance with fewer input parameters. This makes water quality monitoring less costly and reduces the effort and time it requires.
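As an illustration only of how a feature selection step can shrink the set of inputs before a model is trained, the sketch below ranks candidate parameters by their absolute correlation with a target index and keeps the strongest four. This is a simple filter-style selection on synthetic data, not the embedded method or the Random Forest model reported in the study. The parameter names p0 to p7, the sample size, the way the synthetic index is built, and the helper functions pearson and select_top_k are all hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def select_top_k(columns, target, k):
    """Keep the k features with the largest absolute correlation to the target."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Synthetic stand-in data (random numbers, not measurements from the study).
rng = random.Random(0)
n_samples = 500
columns = {f"p{i}": [rng.gauss(0.0, 1.0) for _ in range(n_samples)] for i in range(8)}

# Hypothetical index: depends only on p0..p3, plus a little noise.
index = [
    sum(columns[f"p{i}"][j] for i in range(4)) + rng.gauss(0.0, 0.1)
    for j in range(n_samples)
]

selected = select_top_k(columns, index, k=4)
print(sorted(selected))
```

In a real workflow the reduced parameter set would then be passed to a regression model, and the prediction quality with and without selection would be compared on held-out samples.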
ISSN:1574-9541
DOI:10.1016/j.ecoinf.2023.101991