iDMI: A novel technique for missing value imputation using a decision tree and expectation-maximization algorithm


Bibliographic Details
Published in 16th Int'l Conf. Computer and Information Technology, pp. 496-501
Main Authors Rahman, Md Geaur; Islam, Md Zahidul
Format Conference Proceeding
Language English
Published IEEE 01.03.2014
DOI 10.1109/ICCITechn.2014.6997351

Abstract In this paper we present a novel technique called iDMI that imputes the missing values of a data set by combining a decision tree (DT) algorithm and an expectation-maximization (EMI) algorithm. We first divide a data set into horizontal segments by applying a DT algorithm such as C4.5, and then apply an EMI algorithm to each segment in order to impute the missing values that belong to the segment. If all numerical attribute values of a record are missing, we impute them with the mean values of the attributes over the records belonging to the segment in which the record falls. This reduces the computational time complexity of iDMI compared to an existing technique called DMI, which calculates the mean value of an attribute using all records of a data set. We evaluate the performance of iDMI against three high-quality existing techniques on two real data sets in terms of four evaluation criteria. Our initial experimental results, including several statistical significance analyses, indicate the superiority of iDMI over the existing techniques.
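The fallback described in the abstract, for records whose numerical values are all missing, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the segment of each record is already known (for example, the leaf a decision tree assigns it to) and is passed in as a hypothetical `segment_ids` list. The function name `impute_fully_missing` and the sample data are also illustrative assumptions. The per-segment EMI step for partially missing records is not shown.

```python
from collections import defaultdict
from statistics import mean


def impute_fully_missing(records, segment_ids):
    """Fill records whose values are ALL missing with per-segment attribute means.

    records:     list of equal-length lists of floats, with None for a missing value
    segment_ids: parallel list of segment labels (assumed to come from DT leaves)
    Partially missing records are returned unchanged; in iDMI those would be
    handled by the per-segment EMI step, which this sketch omits.
    """
    n_attrs = len(records[0])

    # Collect the observed values of every attribute, separately per segment.
    observed = defaultdict(lambda: [[] for _ in range(n_attrs)])
    for rec, seg in zip(records, segment_ids):
        for j, value in enumerate(rec):
            if value is not None:
                observed[seg][j].append(value)

    imputed = []
    for rec, seg in zip(records, segment_ids):
        if all(value is None for value in rec):
            # Use the segment-level mean; stay None if the segment has no
            # observed value for that attribute.
            rec = [mean(col) if col else None for col in observed[seg]]
        else:
            rec = list(rec)  # copy so the input is not mutated
        imputed.append(rec)
    return imputed


# Hypothetical data: two segments "A" and "B", two numerical attributes.
records = [[1.0, 10.0], [3.0, None], [None, None], [100.0, 200.0], [None, None]]
segments = ["A", "A", "A", "B", "B"]
result = impute_fully_missing(records, segments)
print(result[2])  # [2.0, 10.0]   (means over segment "A" only)
print(result[4])  # [100.0, 200.0] (means over segment "B" only)
```

Because each mean is computed only over the records in one segment rather than over the whole data set, the work per imputed record scales with the segment size, which is the saving relative to DMI that the abstract describes.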
Author_xml – sequence: 1
  givenname: Md Geaur
  surname: Rahman
  fullname: Rahman, Md Geaur
  email: grahman@csu.edu.au
  organization: Center for Res. in Complex Syst. (CRiCS), Charles Sturt Univ., Bathurst, NSW, Australia
– sequence: 2
  givenname: Md Zahidul
  surname: Islam
  fullname: Islam, Md Zahidul
  email: zislam@csu.edu.au
  organization: Center for Res. in Complex Syst. (CRiCS), Charles Sturt Univ., Bathurst, NSW, Australia
ContentType Conference Proceeding
EISBN 9781479934973
1479934976
EndPage 501
ExternalDocumentID 6997351
Genre orig-research
IsPeerReviewed false
IsScholarly false
Language English
PageCount 6
PublicationCentury 2000
PublicationDate 2014-March
PublicationDateYYYYMMDD 2014-03-01
PublicationDate_xml – month: 03
  year: 2014
  text: 2014-March
PublicationDecade 2010
PublicationTitle 16th Int'l Conf. Computer and Information Technology
PublicationTitleAbbrev ICCITechn
PublicationYear 2014
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 496
SubjectTerms Accuracy
Computers
Correlation
data cleansing
Data pre-processing
Decision trees
Electromagnetic interference
EM algorithm
Information technology
missing value imputation
Remuneration
Title iDMI: A novel technique for missing value imputation using a decision tree and expectation-maximization algorithm
URI https://ieeexplore.ieee.org/document/6997351