Extension of the Hybrid Method for Efficient Imputation of Records with Several Missing Attributes

Bibliographic Details
Published in: e-Infrastructure and e-Services for Developing Countries, pp. 264-280
Main Authors: Dramane, Kone; Prosper, Kimou Kouadio; Tra, Goore Bi
Format: Book Chapter
Language: English
Published: Cham: Springer International Publishing
Series: Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Summary: Handling database records that have several missing discrete values remains a delicate problem: such records can bias the output of data mining algorithms and thereby invalidate their results. In this paper, we present an extension of the Hybrid Method for Efficient Imputation of Discrete Missing Attributes (HMID) that handles these records effectively. The method partitions the database into two subsets, one containing the complete records and the other the incomplete records. From the complete subset, a decision tree is built for each missing discrete attribute. A record with multiple missing values can fall into the same leaf or into different leaves. If it falls into the same leaf, its missing values are estimated directly by the HMID method. Otherwise, the leaves containing it are merged into a horizontal segment, from which the dominant modality (most frequent value) of the complete attributes is determined; in this way, records with multiple missing values are estimated. We evaluate our algorithm on two databases: the Adult dataset from the UCI Machine Learning Repository and SH_CDI_Single from the World Bank database. Finally, we compare our algorithm with four imputation methods in terms of missing-value estimation accuracy and RMSE. Our results indicate that the proposed method outperforms the existing algorithms we compared it with.
ISBN: 9783031063732, 3031063732
ISSN: 1867-8211, 1867-822X
DOI: 10.1007/978-3-031-06374-9_17
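
The partitioning and leaf-merging steps described in the summary above can be illustrated with a small sketch. This is not the authors' HMID extension, whose estimation details the summary does not give. Several assumptions are made here. The hypothetical `leaf_of` and `leaf_members` mappings stand in for trained decision trees. A plain mode over the (possibly merged) leaf replaces the HMID estimate. The helper names `partition_records`, `dominant_modality` and `impute_record` are illustrative only.

```python
from collections import Counter


def partition_records(records):
    """Split records into complete ones and ones with at least one missing (None) value."""
    complete = [r for r in records if None not in r.values()]
    incomplete = [r for r in records if None in r.values()]
    return complete, incomplete


def dominant_modality(rows, attribute):
    """Most frequent value of `attribute` among `rows` (ties: first value encountered)."""
    return Counter(r[attribute] for r in rows).most_common(1)[0][0]


def impute_record(record, complete, leaf_members, leaf_of):
    """Fill every missing attribute of `record` from the complete records in its leaves.

    Hypothetical inputs (stand-ins for trained decision trees):
      leaf_of[attr]     -> key of the leaf the record reaches in the tree for `attr`
      leaf_members[key] -> indices into `complete` of the records in that leaf
    A plain mode is used as the estimate; this is an assumption, not HMID itself.
    """
    missing = [a for a, v in record.items() if v is None]
    keys = {leaf_of[a] for a in missing}
    # One shared leaf: use it directly. Several leaves: merge them into one segment,
    # taking the union of member indices so no complete record is counted twice.
    members = set().union(*(leaf_members[k] for k in keys))
    segment = [complete[i] for i in sorted(members)]
    filled = dict(record)  # leave the input record unchanged
    for a in missing:
        filled[a] = dominant_modality(segment, a)
    return filled


if __name__ == "__main__":
    # Toy records using Adult-style attribute names; the values are illustrative only.
    records = [
        {"workclass": "Private", "sex": "Male", "income": "<=50K"},
        {"workclass": "Private", "sex": "Female", "income": "<=50K"},
        {"workclass": "Self-emp-inc", "sex": "Male", "income": ">50K"},
        {"workclass": None, "sex": "Male", "income": None},
    ]
    complete, incomplete = partition_records(records)
    leaf_members = {"workclass/L1": [0, 1], "income/L2": [0, 2]}
    leaf_of = {"workclass": "workclass/L1", "income": "income/L2"}
    print(impute_record(incomplete[0], complete, leaf_members, leaf_of))
    # prints {'workclass': 'Private', 'sex': 'Male', 'income': '<=50K'}
```

Here the two missing attributes reach different leaves, so their member sets are merged before the most frequent values are taken. In practice the leaf keys could come from a fitted tree, for example scikit-learn's `DecisionTreeClassifier.apply`, which returns the index of the leaf each sample reaches.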