Deep neural network for hierarchical extreme multi-label text classification

Bibliographic Details
Published in: Applied Soft Computing, Vol. 79, pp. 125-138
Main Authors: Gargiulo, Francesco; Silvestri, Stefano; Ciampi, Mario; De Pietro, Giuseppe
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.06.2019
Summary: The classification of natural language texts has gained growing importance in many real-world applications because of its significant implications for crucial tasks such as Information Retrieval, Question Answering, Text Summarization and Natural Language Understanding. In this paper we present an analysis of a Deep Learning architecture devoted to text classification, considering the extreme multi-class and multi-label text classification problem when a hierarchical label set is defined. The paper presents a methodology named Hierarchical Label Set Expansion (HLSE), used to regularize the data labels, and an analysis of the impact of different Word Embedding (WE) models that explicitly incorporate grammatical and syntactic features. We evaluate these methodologies on the PubMed collection of scientific articles, where a multi-class and multi-label text classification problem is defined over the Medical Subject Headings (MeSH) label set, a hierarchical set of 27,775 classes. The experimental assessment demonstrates the usefulness of the proposed HLSE methodology and also provides some interesting results on the impact of different uses and combinations of WE models as input to the neural network in this kind of application.

Highlights:
• Deep Neural Network architecture for extreme multi-label text classification.
• Multi-label classification problem with a huge, hierarchically organized label space.
• Comparison among different word-embedding methods for text representation.
• Definition of a method for label set expansion exploiting the label hierarchy.
• Experimental assessment based on flat and hierarchical measures.
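The summary names HLSE, a label set expansion that exploits the label hierarchy, and an evaluation based on hierarchical measures, but gives no details of either. The sketch below is a minimal illustration under stated assumptions: it assumes that "expansion" means adding every ancestor of each assigned label to a document's label set (a common way to exploit a label hierarchy, not necessarily the authors' exact HLSE procedure), and it computes one standard hierarchical measure, precision/recall/F1 over ancestor-expanded sets, which the paper may or may not use. The toy hierarchy `PARENTS` and the function names are hypothetical; `PARENTS` is a simplified fragment loosely modeled on MeSH, not the real MeSH tree.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical, simplified toy hierarchy (child -> parents). It is NOT the real
# MeSH tree; it only mimics the fact that a MeSH descriptor may have several
# parents, so the hierarchy is a DAG rather than a tree.
PARENTS: Dict[str, List[str]] = {
    "Neoplasms": [],
    "Lung Neoplasms": ["Neoplasms", "Lung Diseases"],
    "Lung Diseases": ["Respiratory Tract Diseases"],
    "Respiratory Tract Diseases": [],
}


def ancestors(label: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return every ancestor of `label` (the label itself is excluded)."""
    seen: Set[str] = set()
    stack = list(parents.get(label, []))
    while stack:  # iterative DFS over parent links
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def expand_labels(labels: Iterable[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Assumed expansion: keep each label and add all of its ancestors."""
    expanded: Set[str] = set()
    for label in labels:
        expanded.add(label)
        expanded |= ancestors(label, parents)
    return expanded


def hierarchical_prf(predicted: Iterable[str], gold: Iterable[str],
                     parents: Dict[str, List[str]]) -> Tuple[float, float, float]:
    """Hierarchical precision, recall and F1 on ancestor-expanded label sets."""
    p = expand_labels(predicted, parents)
    g = expand_labels(gold, parents)
    overlap = len(p & g)
    hp = overlap / len(p) if p else 0.0
    hr = overlap / len(g) if g else 0.0
    hf = 2 * hp * hr / (hp + hr) if hp + hr else 0.0
    return hp, hr, hf
```

For example, `expand_labels({"Lung Neoplasms"}, PARENTS)` adds "Neoplasms", "Lung Diseases" and "Respiratory Tract Diseases". Predicting only the more general "Lung Diseases" for a document labeled "Lung Neoplasms" yields hierarchical precision 1.0 and recall 0.5: the prediction is partly right, whereas a flat measure would count it as a complete miss. Crediting near-misses in this way is the usual motivation for reporting hierarchical measures alongside flat ones.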
ISSN: 1568-4946; 1872-9681
DOI: 10.1016/j.asoc.2019.03.041