Grey Relational Analysis Based k Nearest Neighbor Missing Data Imputation for Software Quality Datasets

Bibliographic Details
Published in	2016 IEEE International Conference on Software Quality, Reliability and Security (QRS) pp. 86 - 91
Main Authors	Jianglin Huang, Hongyi Sun
Format	Conference Proceeding
Language	English
Published	IEEE 01.08.2016
Subjects	Data models; empirical software engineering; estimation; Estimation; imputation; kNN; Mars; Measurement; missing data; Nickel; Software engineering; Software quality
Summary:	Software quality estimation is important yet difficult in software engineering studies. Historical quality datasets are used to build classification models for estimating fault-proneness. However, missing values in these datasets severely affect estimation ability and therefore cause inconclusive decision-making. Among the single imputation approaches, k nearest neighbor (kNN) imputation is popular in empirical studies due to its relatively high accuracy. However, researchers are still calling for an optimal parameter setting for kNN imputation. In this study, a novel grey relational analysis based incomplete-instance kNN imputation is proposed for software quality data. An evaluation is conducted on four quality datasets with different simulated missingness scenarios to analyze the performance of the proposed imputation. The empirical results show that the proposed approach is superior to traditional kNN imputation and mean imputation in most cases. Moreover, classification accuracy can be maintained or even improved by using this approach in classification tasks.
DOI:	10.1109/QRS.2016.20
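
Illustrative sketch:	This record does not give the details of the authors' algorithm, so the following is only a minimal sketch of the general idea named in the summary: kNN imputation in which neighbors are ranked by grey relational grade rather than by Euclidean distance, and in which incomplete instances may also serve as neighbors. It assumes numeric features already scaled to a comparable range, the textbook grey relational coefficient with a distinguishing coefficient of 0.5, and the plain mean of the k donors as the imputed value. The function names `grey_relational_grades` and `gra_knn_impute` are hypothetical and do not come from the paper.

```python
import numpy as np


def grey_relational_grades(X, i, zeta=0.5):
    """Grey relational grade of every row relative to row i.

    Only attributes observed in both rows contribute, so incomplete
    rows can still be compared. Rows with no shared attribute (and
    row i itself) get a grade of -inf.
    """
    diff = np.abs(X - X[i])            # NaN wherever either value is missing
    shared = ~np.isnan(diff)
    shared[i] = False                  # exclude the reference row itself
    if not shared.any():
        return np.full(len(X), -np.inf)
    d_min = diff[shared].min()
    d_max = diff[shared].max()
    if d_max == 0:
        coef = np.ones_like(diff)      # all shared values are identical
    else:
        safe_diff = np.where(shared, diff, 0.0)
        coef = (d_min + zeta * d_max) / (safe_diff + zeta * d_max)
    counts = shared.sum(axis=1)
    sums = np.where(shared, coef, 0.0).sum(axis=1)
    return np.where(counts > 0, sums / np.maximum(counts, 1), -np.inf)


def gra_knn_impute(X, k=3, zeta=0.5):
    """Fill each missing cell with the mean of its k most related donors."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    missing = np.isnan(X)
    for i in np.flatnonzero(missing.any(axis=1)):
        grades = grey_relational_grades(X, i, zeta)
        for j in np.flatnonzero(missing[i]):
            # Donors must have attribute j observed and be comparable to row i.
            donors = np.flatnonzero(~missing[:, j] & np.isfinite(grades))
            if donors.size == 0:
                continue               # no usable neighbor; leave the cell missing
            order = np.argsort(-grades[donors], kind="stable")[:k]
            out[i, j] = X[donors[order], j].mean()
    return out


if __name__ == "__main__":
    data = np.array([
        [0.00, 0.0, 0.2],
        [0.05, 0.0, np.nan],
        [1.00, 1.0, 0.9],
        [0.90, 1.0, 0.8],
    ])
    print(gra_knn_impute(data, k=1))
```

Donor values are always read from the original data rather than from previously imputed cells, so the result does not depend on the order in which rows are processed. The choice of k and of the distinguishing coefficient is left as a parameter here, since the summary identifies parameter setting as an open question.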