Measures of uncertainty for partially labeled categorical data based on an indiscernibility relation: an application in semi-supervised attribute reduction

Bibliographic Details
Published in	Applied intelligence (Dordrecht, Netherlands) Vol. 53; no. 23; pp. 29486 - 29513
Main Authors	He, Jiali; Zhang, Gangqiang; Huang, Dan; Wang, Pei; Yu, Guangji
Format	Journal Article
Language	English
Published	New York: Springer US, 01.12.2023; Springer Nature B.V.
Subjects	Algorithms; Artificial Intelligence; Computer Science; Information systems; Labels; Machine learning; Machines; Manufacturing; Mechanical Engineering; Processes; Statistical tests; Uncertainty; p-CDIS; Semi-supervised attribute reduction; Indiscernibility relation; Uncertainty measurement; Conditional information amount; Conditional information entropy
Summary:	In many practical applications of machine learning, only part of the data is labeled because the cost of assessing class labels is relatively high. This paper concentrates on measures of uncertainty for a partial label categorical decision information system (p-CDIS) and considers an application to semi-supervised attribute reduction. First, a p-CDIS (U, C, d) induces two decision information systems (DISs): one for the labeled categorical data, (U_l, C, d), and one for the unlabeled categorical data, (U_u, C, d). The missing rate of labels in (U, C, d) is also introduced. For partially labeled data, existing research did not take the missing rate of labels into account and considered only one importance measure for each attribute subset. Here, four importance measures of an attribute subset P ⊆ C in (U, C, d) are defined based on an indiscernibility relation. Each is a weighted sum of the importance of P in (U_l, C, d) and in (U_u, C, d), with weights determined by the missing rate of labels. These four importance measures can be regarded as four uncertainty measurements (UMs) for (U, P, d). Next, numerical experiments and statistical tests are carried out on 15 UCI datasets to demonstrate the advantages and disadvantages of the four UMs. Finally, as an application of UMs in a p-CDIS, the two better UMs are used for semi-supervised attribute reduction, and two corresponding algorithms are designed that automatically adapt to different missing rates of labels. The experimental results show the feasibility and superiority of the designed algorithms.
ISSN:	0924-669X 1573-7497
DOI:	10.1007/s10489-023-05078-2
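
Illustration:	The summary describes an importance of an attribute subset P as a weighted sum of its importance on the labeled and unlabeled parts, with weights set by the missing rate of labels. The record does not give the paper's four definitions, so the Python sketch below is only a rough illustration under stated assumptions: the labeled-part measure is taken to be the classical rough-set dependency degree, the unlabeled-part measure is taken to be the information amount of the partition, and the weights are assumed to be (1 - missing rate) and the missing rate. All function names and the toy data are hypothetical.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Indiscernibility classes: indices of rows that agree on every attribute in attrs."""
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())

def missing_rate(labels):
    """Fraction of objects whose decision label is missing (None)."""
    return sum(label is None for label in labels) / len(labels)

def dependency(rows, labels, attrs):
    """ASSUMED labeled-part measure: share of objects lying in label-pure classes."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)

def information_amount(rows, attrs):
    """ASSUMED unlabeled-part measure: 1 - sum(|X|^2) / |U|^2 over classes X."""
    if not rows:
        return 0.0
    n = len(rows)
    return 1.0 - sum(len(block) ** 2 for block in partition(rows, attrs)) / n ** 2

def weighted_importance(rows, labels, attrs):
    """ASSUMED weighting: (1 - rate) * labeled measure + rate * unlabeled measure."""
    rate = missing_rate(labels)
    labeled_rows = [r for r, y in zip(rows, labels) if y is not None]
    labeled_y = [y for y in labels if y is not None]
    unlabeled_rows = [r for r, y in zip(rows, labels) if y is None]
    return ((1.0 - rate) * dependency(labeled_rows, labeled_y, attrs)
            + rate * information_amount(unlabeled_rows, attrs))

# Toy p-CDIS: two categorical attributes, decision label None when missing.
rows = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y"), ("a", "x"), ("b", "y")]
labels = ["yes", "no", "no", "no", None, None]

print(missing_rate(labels))
print(weighted_importance(rows, labels, [0]))
print(weighted_importance(rows, labels, [0, 1]))
```

In this toy example, adding the second attribute raises the combined score because the labeled objects become fully separable by label. How the paper's actual measures behave may differ from this sketch.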