Optimizing the class information divergence for transductive classification of texts using propagation in bipartite graphs

Bibliographic Details
Published in	Pattern Recognition Letters, Vol. 87, pp. 127-138
Main Authors	de Paulo Faleiros, Thiago; Geraldeli Rossi, Rafael; de Andrade Lopes, Alneu
Format	Journal Article
Language	English
Published	Amsterdam: Elsevier B.V., 01.02.2017; Elsevier Science Ltd
Subjects	Algorithms; Bipartite graphs; Classification; Collection; Divergence; Graph theory; Graph-based learning; Graphical representations; Graphs; Label propagation; Propagation; Text analysis; Text categorization; Text classification; Text mining; Transductive learning

Summary:	• Scalable algorithm based on bipartite graphs to perform transductive learning.
• Label propagation procedure that uses class information associated with vertices and edges.
• Better performance than state-of-the-art algorithms based on vector space or graphs.
• Comprehensive evaluation showing the performance of the proposal with few labeled instances.
• Optimization process using KL-Divergence.
Transductive classification is a useful way to classify a collection of unlabeled textual documents when only a small fraction of the collection can be manually labeled. Graph-based algorithms have attracted considerable interest in recent years for transductive classification, since the graph-based representation facilitates label propagation through the graph edges. In a bipartite graph representation, nodes represent objects of two types, here documents and terms, and the edges between documents and terms represent the occurrences of the terms in the documents. In this context, label propagation is performed iteratively from documents to terms and then from terms to documents. In this paper we propose a new graph-based transductive algorithm that uses the bipartite graph structure to associate the available class information of labeled documents and then propagates this class information to assign labels to unlabeled documents. By associating the class information with the edges linking documents to terms, we guarantee that a single term can propagate different class information to its distinct neighbors. We also demonstrate that the proposed method surpasses algorithms for transductive classification based on the vector space model or on graphs when only a small number of labeled documents is available.
ISSN:	0167-8655 1872-7344
DOI:	10.1016/j.patrec.2016.04.006
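
The propagation scheme outlined in the summary (documents to terms, then terms to documents, with class information kept on the edges) can be illustrated with a small sketch. This is not the authors' algorithm: the record does not give the update rules or the KL-divergence objective, so the multiplicative edge update below, the function name `propagate_bipartite`, its parameters, and the toy document-term matrix are all assumptions made for illustration only.

```python
import numpy as np


def propagate_bipartite(W, labels, n_classes, n_iter=50, eps=1e-12):
    """Illustrative label propagation on a document-term bipartite graph.

    W        : (n_docs, n_terms) non-negative document-term weights.
    labels   : dict mapping a labeled document index to its class index.
    Returns (F, G): class distributions for documents and for terms.
    NOTE: hypothetical update rule, not the method from the paper.
    """
    n_docs, n_terms = W.shape
    F = np.full((n_docs, n_classes), 1.0 / n_classes)   # document class info
    G = np.full((n_terms, n_classes), 1.0 / n_classes)  # term class info
    one_hot = np.eye(n_classes)
    for d, c in labels.items():
        F[d] = one_hot[c]

    for _ in range(n_iter):
        # Class information on each edge (d, t): proportional to the
        # element-wise product of the two endpoint distributions, so the
        # same term can carry different information on different edges.
        E = F[:, None, :] * G[None, :, :]
        E /= E.sum(axis=2, keepdims=True) + eps
        E *= W[:, :, None]  # weight by term occurrence; absent edges vanish

        # Documents -> terms: each term aggregates its incident edges.
        G = E.sum(axis=0)
        G /= G.sum(axis=1, keepdims=True) + eps

        # Terms -> documents: each document aggregates its incident edges.
        F = E.sum(axis=1)
        F /= F.sum(axis=1, keepdims=True) + eps

        # Clamp the labeled documents to their known class.
        for d, c in labels.items():
            F[d] = one_hot[c]
    return F, G


# Toy collection (assumed data): terms 0-1 occur in documents 0 and 2,
# terms 2-3 in documents 1 and 3, and term 4 occurs in every document.
W = np.array([
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 1, 1, 1],
], dtype=float)
labels = {0: 0, 1: 1}  # only documents 0 and 1 are labeled

F, G = propagate_bipartite(W, labels, n_classes=2)
predicted = F.argmax(axis=1)
print(predicted)
```

In this toy setting, each unlabeled document shares its distinctive terms with exactly one labeled document, so its class distribution should lean towards that document's class. The shared term 4 stays ambiguous as a vertex, yet its edges carry different class information depending on the document at the other end, which is the property the summary attributes to edge-level class information.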