Complement-Class Harmonized Naïve Bayes Classifier

Bibliographic Details
Published in: Applied Sciences, Vol. 13, No. 8, p. 4852
Main Authors: Alenazi, Fahad S.; El Hindi, Khalil; AsSadhan, Basil
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 01.04.2023
Summary: Naïve Bayes (NB) classification performance degrades if the conditional independence assumption is not satisfied or if the conditional probability estimates are unrealistic, due to attribute correlation and scarce data, respectively. Many works address these two problems, but few tackle them simultaneously. Existing methods heuristically employ information theory or apply gradient optimization to enhance NB classification performance; however, to the best of our knowledge, the enhanced models' generalization capability deteriorates, especially on scant data. In this work, we propose a fine-grained boosting of the NB classifier that identifies hidden, potentially discriminative attribute values that lead the NB model to underfit or overfit the training data, and that enhances their predictive power. We employ the complement harmonic average of the conditional probability terms to measure their distribution divergence and their impact on classification performance for each attribute value. The proposed method is subtle yet effective in capturing the attribute values' inter-correlation (between classes) and intra-correlation (within a class), and in measuring their impact on the model's performance. We compare our proposed complement-class harmonized Naïve Bayes classifier (CHNB) with state-of-the-art Naïve Bayes methods on general machine-learning benchmark datasets and with imbalanced ensemble boosting methods on imbalanced benchmark datasets. The empirical results demonstrate that CHNB significantly outperforms the compared methods.
ISSN: 2076-3417
DOI: 10.3390/app13084852
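
Illustrative sketch (not from the article): the summary above describes CHNB only at a high level, and the exact weighting scheme is given in the paper (see the DOI above). The Python sketch below is an assumption-laden illustration of the general idea: it trains a categorical Naïve Bayes with Laplace smoothing and, at prediction time, adds a bonus term that compares P(value | class) against the harmonic mean of that value's probability under the complement classes, rewarding attribute values that discriminate a class from the rest. The class name ComplementHarmonicNB, the bonus strength alpha, and the exact bonus formula are hypothetical and do not come from the paper.

import math
from collections import defaultdict


class ComplementHarmonicNB:
    def __init__(self, smoothing=1.0, alpha=0.5):
        self.smoothing = smoothing
        self.alpha = alpha  # strength of the complement-harmonic bonus (hypothetical)

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n = len(y)
        class_totals = {c: sum(1 for t in y if t == c) for c in self.classes}
        # Laplace-smoothed class priors P(c).
        self.priors = {c: (class_totals[c] + self.smoothing)
                          / (n + self.smoothing * len(self.classes))
                       for c in self.classes}
        counts = {c: defaultdict(lambda: defaultdict(int)) for c in self.classes}
        attr_values = defaultdict(set)
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                counts[label][j][v] += 1
                attr_values[j].add(v)
        # Laplace-smoothed conditionals P(x_j = v | c); smoothing keeps every term > 0.
        self.cond = {
            c: {j: {v: (counts[c][j][v] + self.smoothing)
                       / (class_totals[c] + self.smoothing * len(vals))
                    for v in vals}
                for j, vals in attr_values.items()}
            for c in self.classes
        }
        return self

    def _complement_harmonic(self, c, j, v):
        # Harmonic mean of P(x_j = v | c') over the complement classes c' != c.
        others = [self.cond[k][j][v] for k in self.classes if k != c]
        return len(others) / sum(1.0 / p for p in others)

    def predict(self, row):
        scores = {}
        for c in self.classes:
            score = math.log(self.priors[c])
            for j, v in enumerate(row):
                p = self.cond[c][j].get(v)
                if p is None:  # value never seen in training: skip the attribute
                    continue
                score += math.log(p)
                # Bonus for attribute values whose probability under c diverges from
                # the complement-class harmonic average (illustrative heuristic only,
                # not the paper's formula).
                score += self.alpha * math.log(p / self._complement_harmonic(c, j, v))
            scores[c] = score
        return max(scores, key=scores.get)


# Toy usage on categorical weather data.
X = [["sunny", "hot"], ["sunny", "mild"], ["rain", "mild"], ["rain", "cool"]]
y = ["no", "no", "yes", "yes"]
model = ComplementHarmonicNB().fit(X, y)
print(model.predict(["rain", "mild"]))  # -> "yes"

In this sketch, with two classes the complement harmonic average reduces to the other class's conditional probability, so the bonus is a scaled log-odds term; with more classes the harmonic mean is pulled toward the smallest complement-class probability, so an attribute value earns a large bonus for a class whenever even one complement class rarely exhibits it.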