Knowledge-enhanced document embeddings for text classification

Accurate semantic representation models are essential in text mining applications. For a successful application of the text mining process, the text representation adopted must keep the interesting patterns to be discovered. Although competitive results for automatic text classification may be achie...

Full description

Saved in:

Bibliographic Details
Published in	Knowledge-based systems Vol. 163; pp. 955 - 971
Main Authors	Sinoara, Roberta A., Camacho-Collados, Jose, Rossi, Rafael G., Navigli, Roberto, Rezende, Solange O.
Format	Journal Article
Language	English
Published	Amsterdam Elsevier B.V 01.01.2019 Elsevier Science Ltd
Subjects	Automatic classification Classification Data mining Document embeddings Representations Semantic representation Semantics Text analysis Text classification Text mining Text mining Document embeddings Semantic representation Text classification
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Accurate semantic representation models are essential in text mining applications. For a successful application of the text mining process, the text representation adopted must keep the interesting patterns to be discovered. Although competitive results for automatic text classification may be achieved with traditional bag of words, such representation model cannot provide satisfactory classification performances on hard settings where richer text representations are required. In this paper, we present an approach to represent document collections based on embedded representations of words and word senses. We bring together the power of word sense disambiguation and the semantic richness of word- and word-sense embedded vectors to construct embedded representations of document collections. Our approach results in semantically enhanced and low-dimensional representations. We overcome the lack of interpretability of embedded vectors, which is a drawback of this kind of representation, with the use of word sense embedded vectors. Moreover, the experimental evaluation indicates that the use of the proposed representations provides stable classifiers with strong quantitative results, especially in semantically-complex classification scenarios.
ISSN:	0950-7051 1872-7409
DOI:	10.1016/j.knosys.2018.10.026