A subspace ensemble framework for classification with high dimensional missing data

Real world classification tasks may involve high dimensional missing data. The traditional approach to handling the missing data is to impute the data first, and then apply the traditional classification algorithms on the imputed data. This method first assumes that there exist a distribution or fea...

Full description

Saved in:

Bibliographic Details
Published in	Multidimensional systems and signal processing Vol. 28; no. 4; pp. 1309 - 1324
Main Authors	Gao, Hang, Jian, Songlei, Peng, Yuxing, Liu, Xinwang
Format	Journal Article
Language	English
Published	New York Springer US 01.10.2017 Springer Nature B.V
Subjects	Artificial Intelligence Circuits and Systems Classification Classifiers Datasets Electrical Engineering Engineering Missing data Partitions Signal,Image and Speech Processing Subspaces Missing data High dimensional data Extreme learning machine Subspace ensemble
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Real world classification tasks may involve high dimensional missing data. The traditional approach to handling the missing data is to impute the data first, and then apply the traditional classification algorithms on the imputed data. This method first assumes that there exist a distribution or feature relations among the data, and then estimates missing items with existing observed values. A reasonable assumption is a necessary guarantee for accurate imputation. The distribution or feature relations of data, however, is often complex or even impossible to be captured in high dimensional data sets, leading to inaccurate imputation. In this paper, we propose a complete-case projection subspace ensemble framework, where two alternative partition strategies, namely bootstrap subspace partition and missing pattern-sensitive subspace partition, are developed for incomplete datasets with even missing patterns and uneven missing patterns, respectively. Multiple component classifiers are then separately trained in these subspaces. After that, a final ensemble classifier is constructed by a weighted majority vote of component classifiers. In the experiments, we demonstrate the effectiveness of the proposed framework over eight high dimensional UCI datasets. Meanwhile, we apply the two proposed partition strategies over data sets with different missing patterns. As indicated, the proposed algorithm significantly outperforms existing imputation methods in most cases.
ISSN:	0923-6082 1573-0824
DOI:	10.1007/s11045-016-0393-4