Matrix representation of the conditional entropy for incremental feature selection on multi-source data

Bibliographic Details
Published in: Information Sciences, Vol. 591, pp. 263-286
Main Authors: Huang, Yanyong; Guo, Kejun; Yi, Xiuwen; Li, Zhong; Li, Tianrui
Format: Journal Article
Language: English
Published: Elsevier Inc, 01.04.2022
Summary: In many real-world applications, data are collected from different information sources and evolve over time. Such data are referred to as dynamic multi-source data. Efficiently selecting informative features from dynamic multi-source data is a challenging problem in data mining. Incremental feature selection with rough sets is an effective way to select features from dynamic data. However, existing methods focus on single-source data and are not suitable for dynamic multi-source data in which the set of data sources varies. To address this issue, we present an incremental feature selection method based on the matrix representation of the conditional entropy. We first propose a novel conditional entropy for multi-source data and discuss its properties, including monotonicity and boundedness. Then, a matrix characterization of the conditional entropy is presented, using the condition and decision relation matrices together with some matrix operators. Finally, considering the addition and deletion of data sources in multi-source data, we use the matrix approach to investigate incremental mechanisms for computing the conditional entropy and develop the corresponding incremental feature selection algorithms. Extensive comparative experiments verify the effectiveness and efficiency of the proposed method.
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2022.01.037
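
Illustrative note: the summary mentions computing a conditional entropy from condition and decision relation matrices. The record does not give the paper's multi-source definition or its incremental update rules, so the sketch below is not the authors' method. It assumes a single data table, equivalence relations (two objects are related when they agree on the chosen features), and the classical Shannon conditional entropy from rough set theory. The function names `relation_matrix` and `conditional_entropy` and the toy table are hypothetical.

```python
import numpy as np


def relation_matrix(data, columns):
    """Boolean n x n matrix: entry (i, j) is True when rows i and j
    agree on every listed column (an equivalence relation)."""
    sub = data[:, columns]
    return (sub[:, None, :] == sub[None, :, :]).all(axis=2)


def conditional_entropy(cond_rel, dec_rel):
    """Shannon conditional entropy H(D | B) from two relation matrices.

    Row sums of cond_rel give the size of each object's condition class;
    row sums of the element-wise AND give the part of that class sharing
    the object's decision value. This is the classical single-table
    definition, assumed here for illustration only.
    """
    both = np.logical_and(cond_rel, dec_rel).sum(axis=1)
    cond = cond_rel.sum(axis=1)
    return float(-np.mean(np.log2(both / cond)))


# Toy table (assumed data): columns are feature a, feature b, decision d.
table = np.array([
    [0, 0, 0],
    [0, 0, 0],
    [0, 1, 1],
    [0, 1, 1],
    [1, 0, 1],
    [1, 1, 1],
])

dec_rel = relation_matrix(table, [2])
h_a = conditional_entropy(relation_matrix(table, [0]), dec_rel)
h_ab = conditional_entropy(relation_matrix(table, [0, 1]), dec_rel)
```

In this toy table, adding feature b to {a} does not increase the entropy, which mirrors the monotonicity property the summary refers to. A greedy selector could use such a measure to rank candidate features.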