Discretization Using Combination of Heuristics for High Accuracy With Huge Noise Reduction


Bibliographic Details
Published in: IEEE Transactions on Knowledge and Data Engineering, Vol. 34, no. 4, pp. 1710-1722
Main Authors: Pal, Somnath; Ghosh, Saptarshi; Biswas, Himika; Patwari, Mitesh
Format: Journal Article
Language: English
Published: New York, IEEE, 01.04.2022
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Over the years, several discretization algorithms have been devised, but efficient, accurate discretization remains an open problem. This paper proposes a novel discretization algorithm, called SPID5, based on a combination of two supervised heuristics, one local and one global, whose combination yields a significant synergy. The local heuristic is the well-known information gain of the continuous attributes; the global heuristic is a novel concept, the iterative reduction of noise in the data set. Noise is reduced by successively reducing the pseudo deletion count of the data set being discretized. The performance of SPID5 is compared with that of three well-known, time-tested discretization algorithms, using six state-of-the-art classifiers and 35 real-world data sets from the standard UCI data repository. SPID5 compares favorably with all three existing algorithms, in terms of both classification accuracy and noise reduction in the data sets.
ISSN: 1041-4347, 1558-2191
DOI: 10.1109/TKDE.2020.2997719
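
The summary names information gain of a continuous attribute as the local heuristic. As a rough illustration of that one ingredient only, the sketch below scores candidate cut points on a single numeric attribute by the standard entropy-based information gain. It is not SPID5: the global heuristic (pseudo deletion count) is not defined in this record and is not modeled here. The function names `entropy`, `information_gain`, and `best_cut`, the use of midpoints between distinct values as candidates, and the single binary split are all assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels, cut):
    """Information gain from splitting one attribute at `cut` (<= cut vs. > cut)."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    # Weighted entropy of the two intervals produced by the cut.
    remainder = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - remainder


def best_cut(values, labels):
    """Return the candidate cut with the highest information gain, or None.

    Candidates are midpoints between consecutive distinct values
    (an assumption of this sketch, not taken from the paper).
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None
    return max(candidates, key=lambda c: information_gain(values, labels, c))
```

For example, with values `[1.0, 2.0, 3.0, 4.0]` and labels `["a", "a", "b", "b"]`, the midpoint between 2.0 and 3.0 separates the two classes perfectly, so it is the cut selected. A full supervised discretizer would apply such a criterion repeatedly and pair it with a stopping rule, which this sketch leaves out.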