A lightweight filter based feature selection approach for multi-label text classification

Bibliographic Details
Published in	Journal of ambient intelligence and humanized computing Vol. 14; no. 9; pp. 12345 - 12357
Main Authors	Dhal, Pradip, Azad, Chandrashekhar
Format	Journal Article
Language	English
Published	Berlin/Heidelberg: Springer Berlin Heidelberg, 01.09.2023; Springer Nature B.V
Subjects	Accuracy; Algorithms; Artificial Intelligence; Classification; Computational Intelligence; Engineering; Feature extraction; Labels; Lightweight; Machine learning; Multilayer perceptrons; Multilayers; Natural language processing; Optimization; Original Research; Robotics and Automation; Set theory; Text categorization; User Interfaces and Human Computer Interaction; Weighting; Multi-label text classification; multi-layer perceptron; chi-square based feature selection

More Information
Summary:	Multi-label Text Classification (MTC) is a challenging task in Natural Language Processing (NLP) whose goal is to assign a set of labels to a document. Applying term weighting schemes in MTC produces a high-dimensional feature space, which creates substantial problems for multi-label learning algorithms. Feature Selection (FS) approaches are an effective way to address this. This paper proposes a Lightweight Term-weighting FS (LwTwFS) approach based on a modified Chi-square (CHI) filter-based FS method. The modified CHI approach accounts for Inter-Class Concentration (ICC) and Intra-Class Dispersion (ICD), and it is strengthened by incorporating positive and negative correlations. A novel modified equation is introduced to distribute the features among the categories (here, the multiple labels) in the corpus. The modified CHI-based FS approach operates on the output of a term weighting-based Feature Extraction (FE) approach. A Multi-Layer Perceptron (MLP) is used in the classification phase because of its adaptive learning property, that is, its ability to learn how to perform tasks from the data provided during training or from prior experience. Two publicly available multi-label corpora are used for experimental verification: the Arxiv Academic Paper Dataset (AAPD) and the Reuters Corpus Volume I (RCV1-V2). According to the results, LwTwFS combined with the MLP classifier surpasses other combinations in terms of Jaccard Score (JS), Hamming Loss (HL), Ranking Loss (RL), Precision (Pr), Recall (Re), F-micro, and F-macro. For the AAPD corpus, the LwTwFS method achieves the best JS, HL, RL, Pr, F-micro, and F-macro values of 0.9636, 0.0121, 0.0303, 0.9636, 0.9882, and 0.9894, respectively. For the RCV1-V2 corpus, the LwTwFS method achieves the best JS, Pr, Re, F-micro, and F-macro values of 1.0000, and HL and RL values of 0.0000.
Empirical results on two widely used benchmark multi-label text corpora show that LwTwFS achieves competitive performance, especially when labels are limited.
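To make the filter-based idea in the summary concrete, the following is a minimal sketch of chi-square feature selection for multi-label text, together with two of the reported metrics (Hamming Loss and sample-averaged Jaccard Score). It uses the standard chi-square statistic on a 2x2 term/label contingency table and keeps each term's maximum score over all labels; it is not the paper's modified CHI equation, and it omits the term weighting and MLP stages. The function names, the max-over-labels aggregation, and the toy corpus are assumptions made purely for illustration.

```python
def chi_square(a, b, c, d):
    """Standard chi-square for a 2x2 term/label contingency table.

    a: docs containing the term and carrying the label
    b: docs containing the term without the label
    c: docs carrying the label without the term
    d: docs with neither
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def select_terms(docs, labels, k):
    """Score each term by its max chi-square over labels; keep the top k.

    docs: list of token sets; labels: list of label sets (same length).
    The max-over-labels aggregation is an illustrative assumption.
    """
    vocab = sorted(set().union(*docs))
    label_names = sorted(set().union(*labels))
    n = len(docs)
    scores = {}
    for term in vocab:
        best = 0.0
        for label in label_names:
            a = sum(1 for doc, y in zip(docs, labels) if term in doc and label in y)
            b = sum(1 for doc, y in zip(docs, labels) if term in doc and label not in y)
            c = sum(1 for doc, y in zip(docs, labels) if term not in doc and label in y)
            best = max(best, chi_square(a, b, c, n - a - b - c))
        scores[term] = best
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return ranked[:k], scores


def hamming_loss(y_true, y_pred, n_labels):
    """Fraction of label slots predicted incorrectly (lower is better)."""
    wrong = sum(len(t ^ p) for t, p in zip(y_true, y_pred))
    return wrong / (len(y_true) * n_labels)


def jaccard_score(y_true, y_pred):
    """Sample-averaged |intersection| / |union| of label sets (higher is better)."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        union = t | p
        total += len(t & p) / len(union) if union else 1.0
    return total / len(y_true)


# Hypothetical toy corpus: token sets and their label sets.
docs = [
    {"neural", "network", "training"},
    {"neural", "proof", "theorem"},
    {"proof", "theorem", "lemma"},
    {"network", "training", "data"},
]
labels = [{"cs.LG"}, {"cs.LG", "math.LO"}, {"math.LO"}, {"cs.LG"}]

top_terms, scores = select_terms(docs, labels, k=2)
print(top_terms, scores)
```

Because the plain chi-square statistic squares the difference `a * d - b * c`, it scores a term that is strongly absent from a label as highly as one that is strongly present. That is one motivation for treating positive and negative correlations explicitly, as the summary says the modified CHI does.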
ISSN:	1868-5137 1868-5145
DOI:	10.1007/s12652-022-04335-5