Speeding Up Distributed Machine Learning Using Codes


Bibliographic Details
Published in: IEEE Transactions on Information Theory, Vol. 64, No. 3, pp. 1514-1529
Main Authors: Lee, Kangwook; Lam, Maximilian; Pedarsani, Ramtin; Papailiopoulos, Dimitris; Ramchandran, Kannan
Format: Journal Article
Language: English
Published: New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.03.2018

Abstract Codes are widely used in many engineering applications to offer robustness against noise. In large-scale systems, several types of noise can degrade the performance of distributed machine learning algorithms (straggler nodes, system failures, and communication bottlenecks), yet there has been little interaction cutting across codes, machine learning, and distributed systems. In this paper, we provide theoretical insights on how coded solutions can achieve significant gains compared with uncoded ones. We focus on two of the most basic building blocks of distributed learning algorithms: matrix multiplication and data shuffling. For matrix multiplication, we use codes to alleviate the effect of stragglers and show that if the number of homogeneous workers is <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> and the runtime of each subtask has an exponential tail, coded computation can speed up distributed matrix multiplication by a factor of <inline-formula> <tex-math notation="LaTeX">\log n </tex-math></inline-formula>. For data shuffling, we use codes to reduce communication bottlenecks by exploiting the excess in storage.
We show that when a constant fraction <inline-formula> <tex-math notation="LaTeX">\alpha </tex-math></inline-formula> of the data matrix can be cached at each worker and <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> is the number of workers, coded shuffling reduces the communication cost by a factor of <inline-formula> <tex-math notation="LaTeX">\left({\alpha + \frac {1}{n}}\right)\gamma (n) </tex-math></inline-formula> compared with uncoded shuffling, where <inline-formula> <tex-math notation="LaTeX">\gamma (n) </tex-math></inline-formula> is the ratio of the cost of unicasting <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> messages to <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> users to that of multicasting a common message (of the same size) to <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> users. For instance, <inline-formula> <tex-math notation="LaTeX">\gamma (n) \simeq n </tex-math></inline-formula> if multicasting a message to <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> users is as cheap as unicasting a message to one user. We also provide experimental results corroborating the theoretical gains of our coded algorithms.
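The abstract's two quantitative claims can be illustrated with a small simulation. This is a hedged sketch, not the paper's implementation: the function names `simulate` and `shuffling_gain` are illustrative, the shifted-exponential runtime model is an assumption consistent with the abstract's "exponential tail", and the (n, k) MDS property is modeled abstractly as "any k of n coded results suffice to decode".

```python
import random

def simulate(n, k, trials=2000, seed=0):
    """Compare uncoded vs. (n, k) MDS-coded distributed matrix multiplication.

    Assumed runtime model: each worker's subtask takes a shifted-exponential
    time proportional to its subtask size, matching the abstract's
    "exponential tail". Total computation is normalized in both schemes.
    """
    rng = random.Random(seed)
    uncoded_total, coded_total = 0.0, 0.0
    for _ in range(trials):
        times = [1.0 + rng.expovariate(1.0) for _ in range(n)]
        # Uncoded: the matrix is split into n parts, one per worker;
        # the product is ready only when the slowest worker finishes.
        uncoded_total += max(t / n for t in times)
        # Coded: each worker computes a 1/k-sized coded subtask; any k of
        # the n results decode the product, so completion time is set by
        # the k-th fastest worker (an order statistic, not the maximum).
        coded_total += sorted(t / k for t in times)[k - 1]
    return uncoded_total / trials, coded_total / trials

def shuffling_gain(alpha, n, gamma_n):
    """Communication-cost reduction of coded shuffling: (alpha + 1/n) * gamma(n)."""
    return (alpha + 1.0 / n) * gamma_n
```

For example, with `n = 100` workers and `k = 50`, the coded scheme's average completion time comes out noticeably smaller than the uncoded one's, since it waits for an order statistic rather than the slowest of 100 workers; and `shuffling_gain(0.2, 20, 20.0)` evaluates to 5.0, the case gamma(n) = n where multicasting to n users is as cheap as one unicast.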
Author_xml – sequence: 1
  givenname: Kangwook
  orcidid: 0000-0002-3360-9678
  surname: Lee
  fullname: Lee, Kangwook
  email: kw1jjang@kaist.ac.kr
  organization: School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
– sequence: 2
  givenname: Maximilian
  surname: Lam
  fullname: Lam, Maximilian
  email: agnusmaximus@berkeley.edu
  organization: Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA
– sequence: 3
  givenname: Ramtin
  orcidid: 0000-0002-1126-0292
  surname: Pedarsani
  fullname: Pedarsani, Ramtin
  email: ramtin@ece.ucsb.edu
  organization: Department of Electrical and Computer Engineering, University of California at Santa Barbara, Santa Barbara, CA, USA
– sequence: 4
  givenname: Dimitris
  surname: Papailiopoulos
  fullname: Papailiopoulos, Dimitris
  email: dimitris@papail.io
  organization: Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI, USA
– sequence: 5
  givenname: Kannan
  surname: Ramchandran
  fullname: Ramchandran, Kannan
  email: kannanr@eecs.berkeley.edu
  organization: Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA
CODEN IETTAW
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
DOI 10.1109/TIT.2017.2736066
Discipline Engineering
Computer Science
EISSN 1557-9654
EndPage 1529
ExternalDocumentID 10_1109_TIT_2017_2736066
8002642
Genre orig-research
GrantInformation_xml – fundername: Korea government(MSIT) (Coding for High-Speed Distributed Networks)
  grantid: 2017-0-00694
– fundername: Brain Korea 21 Plus Project, and NSF CIF grant (Foundations of coding for modern distributed computing)
  grantid: 1703678
– fundername: Institute for Information & communications Technology Promotion(IITP) grant
ISSN 0018-9448
IsPeerReviewed true
IsScholarly true
Issue 3
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0000-0002-1126-0292
0000-0002-3360-9678
PageCount 16
PublicationCentury 2000
PublicationDate 2018-03-01
PublicationPlace New York
PublicationTitle IEEE transactions on information theory
PublicationTitleAbbrev TIT
PublicationYear 2018
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
– start-page: 265
  year: 2010
  ident: ref26
  article-title: Reining in the outliers in Map-Reduce clusters using Mantri
  publication-title: Proc 9th USENIX Conf Oper Syst Des Implement (OSDI)
– start-page: 29
  year: 2008
  ident: ref27
  article-title: Improving MapReduce performance in heterogeneous environments
  publication-title: Proc 9th USENIX Conf Oper Syst Des Implement (OSDI)
– ident: ref1
  doi: 10.1109/ISIT.2016.7541478
– year: 2015
  ident: ref83
  publication-title: Open MPI Open Source High Performance Computing
– ident: ref61
  doi: 10.1007/s12532-013-0053-8
– ident: ref12
  doi: 10.1109/TIT.2012.2208937
– ident: ref73
  doi: 10.1109/TIT.2010.2103753
– ident: ref63
  doi: 10.14778/2732977.2733001
– year: 2015
  ident: ref84
  publication-title: StarCluster
– year: 2016
  ident: ref45
  publication-title: Gradient coding
– ident: ref71
  doi: 10.1109/ALLERTON.2015.7447112
– ident: ref35
  doi: 10.1109/ALLERTON.2015.7446991
Snippet Codes are widely used in many engineering applications to offer robustness against noise. In large-scale systems, there are several types of noise that can...
StartPage 1514
SubjectTerms Algorithm design and analysis
Algorithms
Artificial intelligence
channel coding
Codes
Communication
Communications systems
Computer networks
distributed computing
Distributed databases
Encoding
Machine learning
Machine learning algorithms
Machine tools
Multicast communication
Multicasting
Multiplication
Multiplication & division
Robustness
Runtime
System failures
Title Speeding Up Distributed Machine Learning Using Codes
URI https://ieeexplore.ieee.org/document/8002642
https://www.proquest.com/docview/2006249929
Volume 64
ISSN 0018-9448
EISSN 1557-9654
DOI 10.1109/TIT.2017.2736066