A Unified Treatment of Partial Stragglers and Sparse Matrices in Coded Matrix Computation

Bibliographic Details
Published in: 2021 IEEE Information Theory Workshop (ITW), pp. 1-6
Main Authors: Das, Anindya Bijoy; Ramamoorthy, Aditya
Format: Conference Proceeding
Language: English
Published: IEEE, 17.10.2021
Summary: The overall execution time of distributed matrix computations is often dominated by slow worker nodes (stragglers) in the cluster. Recently, different coding techniques have been used to mitigate the effect of stragglers, where worker nodes are assigned the task of processing encoded submatrices of the original matrices. In many machine learning or optimization problems, the relevant matrices are often sparse. Several coded computation methods operate with dense linear combinations of the original submatrices; this can significantly increase the worker node computation times and consequently the overall job execution time. Moreover, several existing techniques treat the stragglers as failures (erasures) and discard their computations. In this work, we present a coding approach that operates with limited encoding of the original submatrices and utilizes the partial computations done by the slower workers. Our scheme retains the optimal threshold of prior work. Extensive numerical experiments on an AWS (Amazon Web Services) cluster confirm that the proposed approach significantly speeds up the worker computations (and thus the whole process).
DOI:10.1109/ITW48936.2021.9611400