iDMI: A novel technique for missing value imputation using a decision tree and expectation-maximization algorithm
Published in | 16th Int'l Conf. Computer and Information Technology, pp. 496-501 |
---|---|
Main Authors | Rahman, Md Geaur; Islam, Md Zahidul |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.03.2014 |
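Subjects | Accuracy; Computers; Correlation; data cleansing; Data pre-processing; Decision trees; Electromagnetic interference; EM algorithm; Information technology; missing value imputation; Remuneration |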
Online Access | Get full text |
DOI | 10.1109/ICCITechn.2014.6997351 |
Abstract | In this paper we present a novel technique called iDMI that imputes the missing values of a data set by combining a decision tree (DT) algorithm and an expectation-maximization (EMI) algorithm. We first divide a data set into horizontal segments by applying a DT algorithm such as C4.5, and then apply an EMI algorithm to each segment in order to impute the missing values belonging to that segment. If all numerical attribute values of a record are missing, we impute them with the attribute means computed over the records of the segment in which the record falls. This reduces the computational time complexity of iDMI compared to an existing technique called DMI, which calculates the mean value of an attribute using all records of the data set. We evaluate the performance of iDMI against three high-quality existing techniques on two real data sets in terms of four evaluation criteria. Our initial experimental results, including several statistical significance analyses, indicate the superiority of iDMI over the existing techniques. |
---|---|
Author | Rahman, Md Geaur; Islam, Md Zahidul |
Author details | 1. Rahman, Md Geaur (grahman@csu.edu.au), Center for Res. in Complex Syst. (CRiCS), Charles Sturt Univ., Bathurst, NSW, Australia; 2. Islam, Md Zahidul (zislam@csu.edu.au), Center for Res. in Complex Syst. (CRiCS), Charles Sturt Univ., Bathurst, NSW, Australia |
EISBN | 9781479934973 1479934976 |
PageCount | 6 |
SubjectTerms | Accuracy; Computers; Correlation; data cleansing; Data pre-processing; Decision trees; Electromagnetic interference; EM algorithm; Information technology; missing value imputation; Remuneration |
URI | https://ieeexplore.ieee.org/document/6997351 |
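
The abstract's fallback rule for records whose numerical values are all missing can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the segment label of each record (for example, a decision tree leaf) has already been computed, it does not implement the DT or EMI steps, and the names `segment_means`, `impute_fully_missing`, `leaf_a`, and `leaf_b` are hypothetical. Missing values are represented as `None`.

```python
from collections import defaultdict
from statistics import mean


def segment_means(records, segments):
    """Per-segment mean of each numerical attribute, ignoring missing (None) values."""
    values = defaultdict(lambda: defaultdict(list))
    for rec, seg in zip(records, segments):
        for j, v in enumerate(rec):
            if v is not None:
                values[seg][j].append(v)
    return {seg: {j: mean(vs) for j, vs in cols.items()}
            for seg, cols in values.items()}


def impute_fully_missing(records, segments):
    """Fill records whose values are all missing with the means of their own segment.

    Partially missing records are returned unchanged; in the technique described
    in the abstract they would be handled by the per-segment EMI step instead.
    """
    means = segment_means(records, segments)
    imputed = []
    for rec, seg in zip(records, segments):
        if all(v is None for v in rec):
            # Assumption: if a segment has no observed value for an attribute,
            # the value stays None rather than falling back to a global mean.
            imputed.append([means.get(seg, {}).get(j) for j in range(len(rec))])
        else:
            imputed.append(list(rec))
    return imputed


# Toy data: two numerical attributes, segment labels assumed to come from a tree.
records = [[1.0, 10.0], [3.0, 30.0], [None, None], [100.0, 200.0], [None, None]]
segments = ["leaf_a", "leaf_a", "leaf_a", "leaf_b", "leaf_b"]
imputed = impute_fully_missing(records, segments)
```

Because each mean is computed only over the records of one segment, the third record is filled from `leaf_a` alone and the fifth from `leaf_b` alone, rather than from a mean taken over the whole data set as the abstract attributes to DMI.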