Learning on Big Graph: Label Inference and Regularization with Anchor Hierarchy

Bibliographic Details
Published in	IEEE Transactions on Knowledge and Data Engineering, Vol. 29, no. 5, pp. 1101-1114
Main Authors	Wang, Meng; Fu, Weijie; Hao, Shijie; Liu, Hengchang; Wu, Xindong
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.05.2017
Subjects	Anchors; Computational efficiency; Computational modeling; Data models; graph-based learning; label inference; label smoothness regularization; Laplace equations; Manifolds; Optimization; Regularization; Semi-supervised learning; Semisupervised learning; Smoothness
Summary:	Several models, such as Anchor Graph Regularization (AGR), have been proposed to cope with the rapidly increasing size of data. AGR significantly accelerates graph-based learning by exploiting a set of anchors. However, as a dataset grows much larger, AGR still has to handle a big graph, which brings dramatically increasing computational costs. To overcome this issue, we propose a novel Hierarchical Anchor Graph Regularization (HAGR) approach that explores multiple layers of anchors in a pyramid-style structure. In HAGR, the labels of datapoints are inferred from the coarsest anchors, layer by layer, in a coarse-to-fine manner. Label smoothness regularization is performed on all datapoints, and we demonstrate that the optimization process involves only a small reduced Laplacian matrix. We also introduce a fast approach to constructing the hierarchical anchor graph based on an approximate nearest neighbor search technique. Experiments on million-scale datasets demonstrate the effectiveness and efficiency of HAGR over existing methods. Results show that HAGR achieves good performance within 3 minutes on an 8-million-example classification task.
ISSN:	1041-4347, 1558-2191
DOI:	10.1109/TKDE.2017.2654445
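
Illustration: the summary describes coarse-to-fine label inference through layers of anchors, with an optimization that only touches a small reduced Laplacian. The following is a minimal Python sketch of that idea with two anchor layers, not the authors' implementation. This record gives no formulas, so everything here is an assumption: the function names (anchor_weights, hagr_sketch), the Gaussian weighting, the bandwidth choice, the adjacency form W = Z1 diag(1/colsum) Z1^T borrowed from standard anchor-graph methods, the tiny ridge term, and the use of exact brute-force neighbor search and subsampled anchors in place of an approximate nearest neighbor search.

```python
import numpy as np

def anchor_weights(points, anchors, s=3):
    """Row-normalized Gaussian weights from each point to its s nearest anchors."""
    d2 = ((points[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :s]             # exact search (assumption: no ANN here)
    near = np.take_along_axis(d2, idx, axis=1)
    sigma2 = near.mean() + 1e-12                    # hypothetical bandwidth choice
    Z = np.zeros_like(d2)
    np.put_along_axis(Z, idx, np.exp(-near / sigma2), axis=1)
    return Z / Z.sum(axis=1, keepdims=True)

def hagr_sketch(X, Y_labeled, labeled_idx, fine, coarse, gamma=0.01, s=3):
    """Two-layer sketch: infer labels on coarse anchors, then propagate to all points."""
    Z1 = anchor_weights(X, fine, s)                 # datapoints -> fine anchors
    Z2 = anchor_weights(fine, coarse, s)            # fine anchors -> coarse anchors
    ZH = Z1 @ Z2                                    # datapoints -> coarsest anchors
    # Reduced Laplacian ZH^T (I - W) ZH with W = Z1 diag(1/lam) Z1^T,
    # computed without ever forming the n-by-n matrix W.
    lam = np.maximum(Z1.sum(axis=0), 1e-12)
    P = Z1.T @ ZH
    L_red = ZH.T @ ZH - P.T @ (P / lam[:, None])
    ZL = ZH[labeled_idx]
    k = coarse.shape[0]
    # Regularized least squares on the coarsest anchors; the ridge is a numerical safeguard.
    M = ZL.T @ ZL + gamma * L_red + 1e-8 * np.eye(k)
    A = np.linalg.solve(M, ZL.T @ Y_labeled)        # k-by-k system only
    return (ZH @ A).argmax(axis=1)                  # coarse-to-fine label inference

# Toy usage: two well-separated blobs, three labeled points per class.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (200, 2)), rng.normal(10.0, 0.5, (200, 2))])
y_true = np.repeat([0, 1], 200)
fine = X[::10]                                      # stand-in for learned fine anchors
coarse = fine[::5]                                  # stand-in for learned coarse anchors
labeled_idx = np.array([0, 1, 2, 200, 201, 202])
Y_labeled = np.eye(2)[y_true[labeled_idx]]
pred = hagr_sketch(X, Y_labeled, labeled_idx, fine, coarse)
print("accuracy:", (pred == y_true).mean())
```

In this sketch the only linear system solved has the size of the coarsest anchor layer (8 by 8 in the toy example), regardless of how many datapoints there are, which is the property the summary highlights.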