Extension of the Hybrid Method for Efficient Imputation of Records with Several Missing Attributes
The treatment of records with several discrete missing values present in the databases is still a delicate problem. Indeed, these records can bias the results of data mining algorithms, thus invalidating the results. In this paper, we present an extension of the Hybrid Method for Efficient Imputatio...
Saved in:
Published in | e-Infrastructure and e-Services for Developing Countries pp. 264 - 280 |
---|---|
Main Authors | , , |
Format | Book Chapter |
Language | English |
Published |
Cham
Springer International Publishing
|
Series | Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | The treatment of records with several discrete missing values present in the databases is still a delicate problem. Indeed, these records can bias the results of data mining algorithms, thus invalidating the results. In this paper, we present an extension of the Hybrid Method for Efficient Imputation of Discrete Missing Attributes (HMID) to effectively handle these records. The method consists of partitioning the database into two subsets, one containing complete records and the other incomplete records. From the complete set, decision trees for all missing discrete attributes are created. The multiple missing records can be in the same leaf or in different leaves. In the same leaf, they are estimated directly by the HMID method. Otherwise, the sheets containing them are merged into a horizontal segment to determine the dominant modality of the complete attributes. In which case, multiple records are estimated. We evaluate our algorithm using two databases. The Adult dataset extracted from the UCI Machine Learning database and SH_CDI_Single extracted from the World Bank database. Finally, we compare our algorithm with four imputation methods using the accuracy of missing value estimation and RMSE. Our results indicate that the proposed method performs better than the existing algorithms we compared. |
---|---|
ISBN: | 9783031063732 3031063732 |
ISSN: | 1867-8211 1867-822X |
DOI: | 10.1007/978-3-031-06374-9_17 |