Semi-supervised multitask learning using convolutional autoencoder for faulty code detection with limited data

Bibliographic Details
Published in: Applied Intelligence (Dordrecht, Netherlands), Vol. 53, No. 4, pp. 3877-3888
Main Authors: Phan, Anh Viet; Nguyen, Khanh Duy Tung; Bui, Lam Thu
Format: Journal Article
Language: English
Published: New York: Springer US, 01.02.2023 (Springer Nature B.V.)

Summary: Detecting faults in source code so that they can be fixed is an important task in software quality assurance. Building automated detectors using machine learning faces two major challenges: data imbalance and data shortage. To address these issues, this paper proposes a deep neural network and training procedures that allow learning with limited annotated data. The network is composed of an unsupervised auto-encoder and a supervised classifier. The two components share their first layers, which act as a program feature extractor. Notably, a large amount of unlabeled data from various sources can be leveraged to train the auto-encoder independently before transferring it to the target domain. Additionally, sharing layers and jointly training the reconstruction and classification tasks stimulate the generation of sophisticated features. We conducted experiments on four real datasets with different amounts of labeled data and with additional unlabeled data. The results confirm that the multi-task model outperforms single-task ones and that leveraging unlabeled data is beneficial. Specifically, when the labeled data is reduced from 100% to 75%, 50%, and 25%, the performance of several deep networks drops sharply, while that of our model decreases gradually.
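
The architecture described in the summary lends itself to a compact sketch: a shared convolutional encoder feeds both a reconstruction decoder (unsupervised) and a fault classifier (supervised), and the two losses are minimized jointly. The following PyTorch snippet is a minimal, illustrative version of that idea only; the class name, layer sizes, token-embedding setup, and loss weight alpha are assumptions for illustration, not the paper's actual configuration.

# Minimal sketch of the semi-supervised multitask idea from the abstract.
# All hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn

class MultiTaskFaultDetector(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Shared first layers: act as the program feature extractor.
        self.encoder = nn.Sequential(
            nn.Conv1d(embed_dim, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(128, 64, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        # Unsupervised head: reconstructs the embedded token sequence.
        self.decoder = nn.Sequential(
            nn.Conv1d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(128, embed_dim, kernel_size=3, padding=1),
        )
        # Supervised head: predicts faulty vs. clean code.
        self.classifier = nn.Sequential(
            nn.AdaptiveMaxPool1d(1),
            nn.Flatten(),
            nn.Linear(64, num_classes),
        )

    def forward(self, tokens):
        x = self.embed(tokens).transpose(1, 2)  # (batch, embed_dim, seq_len)
        z = self.encoder(x)                     # shared features
        return self.decoder(z), self.classifier(z), x

model = MultiTaskFaultDetector()
recon_loss, cls_loss = nn.MSELoss(), nn.CrossEntropyLoss()
alpha = 0.5  # assumed weighting between the two tasks

# Joint training step on a labeled batch (dummy data). Unlabeled batches
# would update only the reconstruction term, and the encoder can first be
# pre-trained on unlabeled code from other sources and then transferred,
# as the summary describes.
tokens = torch.randint(0, 5000, (8, 200))  # dummy token ids
labels = torch.randint(0, 2, (8,))         # dummy fault labels
recon, logits, target = model(tokens)
loss = alpha * recon_loss(recon, target.detach()) + (1 - alpha) * cls_loss(logits, labels)
loss.backward()

Because the encoder parameters receive gradients from both heads, the reconstruction objective acts as a regularizer on the features used for classification, which is the mechanism the summary credits for the model's robustness when labeled data is scarce.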
ISSN: 0924-669X; 1573-7497
DOI: 10.1007/s10489-022-03663-5