Semi-supervised multitask learning using convolutional autoencoder for faulty code detection with limited data
Published in | Applied intelligence (Dordrecht, Netherlands) Vol. 53; no. 4; pp. 3877 - 3888 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | New York; Springer US; 01.02.2023; Springer Nature B.V |
Subjects | |
Summary: | Detecting faults in source code so that they can be fixed is an important task in software quality assurance. Building automated detectors with machine learning faces two major challenges: data imbalance and data shortage. To address these issues, this paper proposes a deep neural network and training procedures that allow learning with limited annotated data. The network is composed of an unsupervised auto-encoder and a supervised classifier. The two components share some of their first layers, which act as a program feature extractor. A large amount of unlabeled data from various sources can be leveraged to train the auto-encoder independently, and the trained layers can then be transferred to the target domain. Additionally, sharing layers and jointly training the reconstruction and classification tasks encourages the generation of more sophisticated features. We conducted experiments on four real datasets with different amounts of labeled data and with additional unlabeled data. The results confirm that the multi-task model outperforms the single-task model and that leveraging the unlabeled data is beneficial. Specifically, when the labeled data is reduced from 100% to 75%, 50%, and 25%, the performance of several deep networks drops sharply, while the performance of our model declines gradually. |
---|---|
ISSN: | 0924-669X 1573-7497 |
DOI: | 10.1007/s10489-022-03663-5 |
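Illustration (not part of the record): the joint training idea in the Summary, a shared feature extractor feeding both a reconstruction head and a classification head, can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: plain linear layers stand in for the convolutional layers, and every name here (`encode`, `decode`, `classify`, `multitask_loss`, `recon_weight`) is hypothetical. The record does not say how the two losses are weighted, so the weighting shown is an assumption.

```python
import math

def encode(x, w_enc):
    # Shared feature extractor: one linear layer followed by ReLU.
    # (Assumption: stands in for the shared convolutional layers.)
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w_enc]

def decode(h, w_dec):
    # Reconstruction head: linear map from features back to input size.
    return [sum(w * hi for w, hi in zip(row, h)) for row in w_dec]

def classify(h, w_cls):
    # Classification head: probability that the input is faulty.
    z = sum(w * hi for w, hi in zip(w_cls, h))
    return 1.0 / (1.0 + math.exp(-z))

def reconstruction_loss(x, x_hat):
    # Mean squared error between the input and its reconstruction.
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def classification_loss(p, y):
    # Binary cross-entropy for one sample; eps guards against log(0).
    eps = 1e-12
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def multitask_loss(x, y, w_enc, w_dec, w_cls, recon_weight=0.5):
    # Both heads read the same shared features h.
    # Unlabeled samples (y is None) contribute only the reconstruction
    # term, which is how unlabeled data can still train the shared layers.
    # recon_weight is a hypothetical hyperparameter.
    h = encode(x, w_enc)
    loss = recon_weight * reconstruction_loss(x, decode(h, w_dec))
    if y is not None:
        loss += classification_loss(classify(h, w_cls), y)
    return loss

# Toy weights and a single 3-dimensional input, for illustration only.
W_ENC = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.4]]
W_DEC = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W_CLS = [0.4, -0.6]
x = [1.0, 0.0, 2.0]

unlabeled_loss = multitask_loss(x, None, W_ENC, W_DEC, W_CLS)
labeled_loss = multitask_loss(x, 1, W_ENC, W_DEC, W_CLS)
```

In this sketch a labeled sample adds a classification term on top of the reconstruction term, so `labeled_loss` exceeds `unlabeled_loss` for the same input. A real implementation would compute gradients of this combined loss with respect to the shared and head-specific weights.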