Optimizing the class information divergence for transductive classification of texts using propagation in bipartite graphs

Bibliographic Details
Published in: Pattern Recognition Letters, Vol. 87, pp. 127-138
Main Authors: de Paulo Faleiros, Thiago; Geraldeli Rossi, Rafael; de Andrade Lopes, Alneu
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier B.V., 01.02.2017; Elsevier Science Ltd

More Information
Summary:
•Scalable algorithm based on bipartite graphs to perform transductive learning.
•Label propagation procedure that uses class information associated with vertices and edges.
•Better performance than state-of-the-art algorithms based on the vector space model or on graphs.
•Comprehensive evaluation showing the performance of the proposal with few labeled instances.
•Optimization process using KL divergence.

Transductive classification is a useful way to classify a collection of unlabeled textual documents when only a small fraction of the collection can be manually labeled. Graph-based algorithms have attracted considerable interest in recent years for transductive classification, since a graph-based representation facilitates label propagation through the graph edges. In a bipartite graph representation, nodes represent objects of two types, here documents and terms, and the edges between documents and terms represent the occurrences of the terms in the documents. In this setting, label propagation is performed iteratively, from documents to terms and then from terms back to documents. In this paper we propose a new graph-based transductive algorithm that uses the bipartite graph structure to associate the available class information of labeled documents and then propagates this class information to assign labels to unlabeled documents. By associating the class information with the edges linking documents to terms, we guarantee that a single term can propagate different class information to each of its distinct neighbors. We also demonstrate that the proposed method outperforms algorithms for transductive classification based on the vector space model or on graphs when only a small number of labeled documents is available.
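For readers unfamiliar with the document-to-term-to-document scheme described in the summary, the following is a minimal sketch of a generic bipartite label propagation baseline. It is not the authors' algorithm: it keeps class information only on vertices (not on edges) and does not optimize the KL-divergence objective from the paper. KL divergence appears here only as an assumed convergence test between successive iterations. All names (`propagate`, `normalize`, `kl_divergence`, `max_iter`, `tol`) are hypothetical and chosen for illustration.

```python
import math


def normalize(vec):
    """Scale a non-negative vector so it sums to 1 (uniform if all zeros)."""
    total = sum(vec)
    if total == 0:
        return [1.0 / len(vec)] * len(vec)
    return [v / total for v in vec]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def propagate(weights, labels, n_classes, max_iter=100, tol=1e-6):
    """Hypothetical baseline: vertex-level bipartite document-term propagation.

    weights[d][t] -- weight of the edge between document d and term t
    labels[d]     -- class index for a labeled document, None if unlabeled
    Returns one class distribution per document.
    """
    n_docs, n_terms = len(weights), len(weights[0])
    # Labeled documents start (and stay) one-hot; unlabeled ones start uniform.
    docs = [
        [1.0 if c == labels[d] else 0.0 for c in range(n_classes)]
        if labels[d] is not None else [1.0 / n_classes] * n_classes
        for d in range(n_docs)
    ]
    for _ in range(max_iter):
        # Step 1: documents -> terms (edge-weighted average of neighbouring documents).
        terms = [
            normalize([sum(weights[d][t] * docs[d][c] for d in range(n_docs))
                       for c in range(n_classes)])
            for t in range(n_terms)
        ]
        # Step 2: terms -> documents, updating only unlabeled documents.
        new_docs = []
        for d in range(n_docs):
            if labels[d] is not None:
                new_docs.append(docs[d])  # clamp labeled documents
            else:
                new_docs.append(normalize([
                    sum(weights[d][t] * terms[t][c] for t in range(n_terms))
                    for c in range(n_classes)]))
        # Assumed stopping rule: largest per-document KL change below tol.
        change = max(kl_divergence(new, old) for new, old in zip(new_docs, docs))
        docs = new_docs
        if change < tol:
            break
    return docs


# Toy example: doc2 shares term 0 with doc0 (class 0), doc3 shares term 2 with doc1 (class 1).
weights = [
    [2, 1, 0],
    [0, 1, 2],
    [3, 0, 0],
    [0, 0, 3],
]
labels = [0, 1, None, None]
result = propagate(weights, labels, n_classes=2)
predicted = [max(range(2), key=lambda c: dist[c]) for dist in result]
print(predicted)  # → [0, 1, 0, 1]
```

Because every term here holds a single class distribution, a term shared by documents of different classes sends the same mixture to all its neighbours. Attaching class information to the edges, as the summary describes, is what allows a term to send different class information to different neighbours; the sketch deliberately omits that step.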
ISSN: 0167-8655, 1872-7344
DOI: 10.1016/j.patrec.2016.04.006