Water quality prediction using machine learning models based on grid search method

Bibliographic Details
Published in	Multimedia Tools and Applications, Vol. 83, no. 12, pp. 35307-35334
Main Authors	Shams, Mahmoud Y.; Elshewey, Ahmed M.; El-kenawy, El-Sayed M.; Ibrahim, Abdelhameed; Talaat, Fatma M.; Tarek, Zahraa
Format	Journal Article
Language	English
Published	New York: Springer US, 01.04.2024; Springer Nature B.V.
Subjects	Accuracy; Classification; Computation; Computer Communication Networks; Computer Science; Correlation coefficients; Data Structures and Information Theory; Decision trees; Errors; Grid search; Machine learning; Machine learning models; Multilayer perceptrons; Multilayers; Multimedia Information Systems; Parameters; Regression models; Search methods; Special Purpose and Application-Based Systems; Tuning; Water quality; Water quality classification; Water quality index
Summary:	Water quality is vital for humans, animals, plants, industry, and the environment. In recent decades, it has been degraded by contamination and pollution. This paper addresses the prediction of the Water Quality Index (WQI), a key indicator of water suitability, and of Water Quality Classification (WQC). Parameter optimization and tuning are used to improve the accuracy of several machine learning models that predict WQI and WQC. Grid search is used to tune the parameters of four classification models and four regression models. The random forest (RF), Extreme Gradient Boosting (XGBoost), Gradient Boosting (GB), and Adaptive Boosting (AdaBoost) models are used as classifiers for predicting WQC. The K-nearest neighbor (KNN) regressor, decision tree (DT) regressor, support vector regressor (SVR), and multi-layer perceptron (MLP) regressor are used for predicting WQI. A preprocessing step, comprising data imputation (mean imputation) and data normalization, prepares the data for further processing. The dataset used in this study includes 7 features and 1991 instances. To examine the efficacy of the classification approaches, five metrics were computed: accuracy, recall, precision, Matthews correlation coefficient (MCC), and F1 score. To assess the regression models, four metrics were computed: Mean Absolute Error (MAE), Median Absolute Error (MedAE), Mean Square Error (MSE), and the coefficient of determination (R²). In classification, the GB model produced the best test results, with an accuracy of 99.50% when predicting WQC. In regression, the MLP regressor outperformed the other models, achieving an R² of 99.8% when predicting WQI.
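The workflow the summary describes (mean imputation, normalization, grid search over model parameters, then regression metrics) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the data are synthetic, the parameter grid (`k`, `weighted`), the 3-fold split, and all function names are assumptions chosen for illustration, and only the KNN regressor is shown.

```python
import itertools
import math
import random
import statistics


def fit_preprocessor(rows):
    """Learn per-column mean, min and max from the observed (non-None) values."""
    cols = [[v for v in col if v is not None] for col in zip(*rows)]
    return ([statistics.fmean(c) for c in cols],
            [min(c) for c in cols],
            [max(c) for c in cols])


def transform(rows, means, lo, hi):
    """Mean-impute missing values, then min-max scale with the fitted statistics."""
    return [[((means[j] if v is None else v) - lo[j]) / ((hi[j] - lo[j]) or 1.0)
             for j, v in enumerate(row)] for row in rows]


def knn_predict(train_X, train_y, x, k, weighted):
    """Predict by averaging the targets of the k nearest training rows."""
    nearest = sorted((math.dist(a, x), y) for a, y in zip(train_X, train_y))[:k]
    if weighted:  # closer neighbours get larger (inverse-distance) weights
        w = [1.0 / (d + 1e-9) for d, _ in nearest]
        return sum(wi * y for wi, (_, y) in zip(w, nearest)) / sum(w)
    return statistics.fmean(y for _, y in nearest)


def regression_metrics(y_true, y_pred):
    """Compute the four regression metrics named in the summary."""
    errs = [abs(t - p) for t, p in zip(y_true, y_pred)]
    mean_t = statistics.fmean(y_true)
    ss_res = sum(e * e for e in errs)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return {"MAE": statistics.fmean(errs), "MedAE": statistics.median(errs),
            "MSE": ss_res / len(errs), "R2": 1 - ss_res / ss_tot}


def grid_search(X, y, grid, folds=3):
    """Try every parameter combination; keep the one with the lowest mean CV MSE."""
    best_score, best_params = math.inf, None
    n = len(X)
    for k, weighted in itertools.product(grid["k"], grid["weighted"]):
        fold_mse = []
        for f in range(folds):
            val = list(range(f, n, folds))  # every folds-th row is held out
            held_out = set(val)
            tr = [i for i in range(n) if i not in held_out]
            tr_X, tr_y = [X[i] for i in tr], [y[i] for i in tr]
            preds = [knn_predict(tr_X, tr_y, X[i], k, weighted) for i in val]
            fold_mse.append(regression_metrics([y[i] for i in val], preds)["MSE"])
        score = statistics.fmean(fold_mse)
        if score < best_score:
            best_score, best_params = score, {"k": k, "weighted": weighted}
    return best_params, best_score


# Synthetic stand-in data (NOT the paper's dataset): 7 features, ~5% missing values.
random.seed(0)
coef = [0.3, 0.2, 0.15, 0.1, 0.1, 0.1, 0.05]  # hypothetical weights
raw, target = [], []
for _ in range(200):
    row = [random.uniform(0, 10) for _ in coef]
    target.append(sum(c * v for c, v in zip(coef, row)) + random.gauss(0, 0.1))
    raw.append([None if random.random() < 0.05 else v for v in row])

# Fit preprocessing on the training split only, then apply it to both splits.
train_raw, test_raw = raw[:150], raw[150:]
y_train, y_test = target[:150], target[150:]
stats = fit_preprocessor(train_raw)
X_train, X_test = transform(train_raw, *stats), transform(test_raw, *stats)

grid = {"k": [1, 3, 5, 9], "weighted": [False, True]}  # assumed grid
best_params, cv_mse = grid_search(X_train, y_train, grid)
test_pred = [knn_predict(X_train, y_train, x, **best_params) for x in X_test]
metrics = regression_metrics(y_test, test_pred)
print(best_params, metrics)
```

Grid search is exhaustive, so its cost grows with the product of the candidate counts per parameter; the sketch evaluates 4 × 2 = 8 combinations, each over three folds. Fitting the imputation and scaling statistics on the training split alone keeps information from the test rows out of the tuning step.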
ISSN:	1573-7721, 1380-7501
DOI:	10.1007/s11042-023-16737-4