Data augmentation using conditional generative adversarial network (cGAN): applications for sewer condition classification and testing using different machine learning techniques

Bibliographic Details
Published in: Journal of hydroinformatics Vol. 26; no. 7; pp. 1471-1489
Main Authors: Woldesellasse, Haile; Tesfamariam, Solomon
Format: Journal Article
Language: English
Published: 01.07.2024
Summary: The increasing availability of condition assessment data highlights the challenge of managing data imbalance in the asset management of aging infrastructure. Aging sewer pipes pose significant threats to health and the environment, underscoring the importance of proactive management practices to enhance asset maintenance and mitigate associated risks. While machine learning (ML) models are widely employed to model the complex deterioration process of sewer pipes, they face performance limitations when trained on imbalanced condition grade data. This paper addresses this issue by proposing a novel approach using a conditional generative adversarial network (cGAN) for data augmentation. By generating synthetic data for minority classes, the skewed distribution of the sewer dataset is balanced, facilitating more robust and accurate predictive models. The utility of the proposed method is evaluated by training different ML classifiers, including neural network (NN), decision tree, quadratic discriminant analysis, Naïve Bayes, support vector machine (SVM), and K-nearest neighbor. Quadratic discriminant, Naïve Bayes, NN, and SVM classifiers demonstrated improvement. The cGAN-based data augmentation method also outperformed two other data imbalance handling techniques: random under-sampling and cost-sensitive NN. Consequently, data generated by cGAN can effectively aid asset management by developing proactive classifiers that accurately predict pipes at a high risk of failure.
ISSN: 1464-7141, 1465-1734
DOI: 10.2166/hydro.2024.135
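
For readers unfamiliar with the imbalance-handling baselines named in the summary, the following is a minimal sketch of random under-sampling, one of the two techniques the cGAN approach is compared against. It is not the authors' implementation: the function name `random_undersample`, the toy condition grades, and the single-value features are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_undersample(features, labels, seed=0):
    """Randomly drop samples so every class keeps as many samples
    as the rarest class (hypothetical helper, standard library only)."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # The rarest class sets the per-class sample budget.
    target = min(len(idxs) for idxs in by_class.values())

    # Draw `target` indices without replacement from each class.
    kept = []
    for idxs in by_class.values():
        kept.extend(rng.sample(idxs, target))
    kept.sort()

    return [features[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy data: three condition grades with a skewed distribution.
labels = [1] * 8 + [2] * 3 + [3] * 2
features = [[float(i)] for i in range(len(labels))]

X_bal, y_bal = random_undersample(features, labels)
print(Counter(labels))  # class counts before balancing
print(Counter(y_bal))   # class counts after balancing
```

Under-sampling balances the classes by discarding majority-class records. The cGAN-based augmentation described in the summary works in the opposite direction, adding synthetic records to the minority classes instead.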