Discretization Using Combination of Heuristics for High Accuracy With Huge Noise Reduction
Published in | IEEE transactions on knowledge and data engineering Vol. 34; no. 4; pp. 1710 - 1722 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | New York, IEEE, 01.04.2022, The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Summary: | Although several discretization algorithms have been devised over the years, efficient and accurate discretization remains an open problem. This paper proposes a novel discretization algorithm, called SPID5, based on a combination of two heuristics, one local and the other global; both are supervised, and their combination yields significant synergy. The local heuristic is the well-known information gain of the continuous attributes, and the global heuristic is a novel concept of iterative noise reduction in the data set. Noise is reduced by successively lowering the pseudo deletion count of the data set being discretized. The performance of the SPID5 algorithm is compared with that of three well-known, time-tested discretization algorithms, using six state-of-the-art classifiers and 35 real-world data sets from the standard UCI data repository. SPID5 compares favorably with all three existing algorithms, not only in classification accuracy but also in noise reduction in the data sets. |
---|---|
ISSN: | 1041-4347 1558-2191 |
DOI: | 10.1109/TKDE.2020.2997719 |
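
The summary names information gain of a continuous attribute as the local heuristic. As a rough illustration of that general idea only, the sketch below scores candidate cut points of one attribute by the standard entropy-based information gain of a binary split. It is not the SPID5 algorithm: the function names (`entropy`, `information_gain`, `best_cut`), the midpoint candidate rule, and the toy data are assumptions made here, and the global pseudo-deletion-count heuristic is not modeled because this record does not define it.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(values, labels, cut):
    """Entropy reduction from splitting at `cut` (value <= cut vs. value > cut)."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_cut(values, labels):
    """Return (cut, gain) with the highest gain among midpoints of distinct values.

    Hypothetical helper: the midpoint rule is a common convention,
    not something taken from the paper.
    """
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None, 0.0
    scored = ((c, information_gain(values, labels, c)) for c in candidates)
    return max(scored, key=lambda pair: pair[1])


# Toy data (invented for illustration): the classes separate cleanly
# between 2.0 and 3.0, so the midpoint between them scores highest.
values = [1.0, 2.0, 3.0, 4.0]
labels = ["a", "a", "b", "b"]
cut, gain = best_cut(values, labels)
```

A purely local criterion like this evaluates each attribute in isolation, which is presumably why the paper pairs it with a global, data-set-level heuristic.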