Scaling Stratified Stochastic Gradient Descent for Distributed Matrix Completion

Bibliographic Details
Published in	IEEE transactions on knowledge and data engineering Vol. 35; no. 10; pp. 1 - 13
Main Authors	Abubaker, Nabil, Karsavuran, M. Ozan, Aykanat, Cevdet
Format	Journal Article
Language	English
Published	New York IEEE 01.10.2023 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Algorithms; Bandwidth; Bandwidth cost; Collaboration; Collaborative Filtering; Combinatorial algorithms; Communication; Communication cost minimization; Convergence; Costs; Dynamic programming; HPC; Hypergraph partitioning; Latency cost; Matrix Completion; Messages; Microprocessors; Processor scheduling; Recommender Systems; Scalability; SGD; Sparse matrices; Upper bounds

Summary:	Stratified SGD (SSGD) is the primary approach for achieving serializable parallel SGD for matrix completion. State-of-the-art parallelizations of SSGD fail to scale due to large communication overhead. During an SGD epoch, these methods send data proportional to one of the dimensions of the rating matrix. We propose a framework for scalable SSGD through significantly reducing the communication overhead via exchanging point-to-point messages utilizing the sparsity of the rating matrix. We provide formulas to represent the essential communication for correctly performing parallel SSGD and we propose a dynamic programming algorithm for efficiently computing them to establish the point-to-point message schedules. This scheme, however, significantly increases the number of messages sent by a processor per epoch from $\mathcal{O}(K)$ to $\mathcal{O}(K^{2})$ for a $K$-processor system which might limit the scalability. To remedy this, we propose a Hold-and-Combine strategy to limit the upper-bound on the number of messages sent per processor to $\mathcal{O}(K \lg K)$. We also propose a hypergraph partitioning model that correctly encapsulates reducing the communication volume. Experimental results show that the framework successfully achieves a scalable distributed SSGD through significantly reducing the communication overhead. Our code is publicly available at: github.com/nfabubaker/CESSGD
ISSN:	1041-4347 1558-2191
DOI:	10.1109/TKDE.2023.3253791
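
Illustrative note: the summary describes SSGD as a serializable parallel SGD on a K-processor system. The sketch below shows one common way such a stratification is scheduled, assuming the rating matrix is split into a K x K grid of blocks and that processor p works on block (p, (p + s) mod K) in sub-epoch s. This ring-style schedule and the function names (stratum_schedule, is_conflict_free, covers_all_blocks) are assumptions made for illustration; they are not taken from the paper or from the CESSGD code, and the sketch does not model the point-to-point messaging or the Hold-and-Combine strategy.

```python
def stratum_schedule(K):
    """Hypothetical ring schedule: in sub-epoch s, processor p gets block (p, (p + s) % K)."""
    return [[(p, (p + s) % K) for p in range(K)] for s in range(K)]


def is_conflict_free(stratum):
    """True if no two blocks in the stratum share a row block or a column block.

    Blocks that share neither can be updated concurrently without touching the
    same factor rows, which is what makes the parallel run serializable.
    """
    rows = {r for r, _ in stratum}
    cols = {c for _, c in stratum}
    return len(rows) == len(stratum) and len(cols) == len(stratum)


def covers_all_blocks(schedule, K):
    """True if the K sub-epochs together visit every block of the K x K grid."""
    seen = {block for stratum in schedule for block in stratum}
    return seen == {(r, c) for r in range(K) for c in range(K)}


if __name__ == "__main__":
    K = 4
    for s, stratum in enumerate(stratum_schedule(K)):
        print(f"sub-epoch {s}: {stratum}")
```

Under this assumed schedule, each processor hands its column-factor block to a neighbor after every sub-epoch, which corresponds to the O(K) messages per processor per epoch mentioned in the summary for block-wise schemes.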