Rethinking Fano's Inequality in Ensemble Learning

Bibliographic Details
Published in: arXiv.org
Main Authors: Morishita, Terufumi; Morio, Gaku; Horiguchi, Shota; Ozaki, Hiroaki; Nukaga, Nobuo
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 16.11.2023
Summary: We propose a fundamental theory of ensemble learning that answers the central question: what factors make an ensemble system good or bad? Previous studies used a variant of Fano's inequality from information theory to derive a lower bound on the classification error rate in terms of the \(\textit{accuracy}\) and \(\textit{diversity}\) of models. We revisit the original Fano's inequality and argue that those studies did not take into account the information lost when multiple model predictions are combined into a final prediction. To address this issue, we generalize the previous theory to incorporate this information loss, which we name \(\textit{combination loss}\). Further, we empirically validate and demonstrate the proposed theory through extensive experiments on actual systems. The theory reveals the strengths and weaknesses of systems on each metric, which will advance the theoretical understanding of ensemble learning and offer insights into designing systems.
ISSN: 2331-8422