Speeding Up Distributed Machine Learning Using Codes


Bibliographic Details
Published in: IEEE Transactions on Information Theory, Vol. 64, No. 3, pp. 1514-1529
Main Authors: Lee, Kangwook; Lam, Maximilian; Pedarsani, Ramtin; Papailiopoulos, Dimitris; Ramchandran, Kannan
Format: Journal Article
Language: English
Published: New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.03.2018

Abstract Codes are widely used in many engineering applications to offer robustness against noise. In large-scale systems, several types of noise can degrade the performance of distributed machine learning algorithms (straggler nodes, system failures, and communication bottlenecks), yet there has been little interaction cutting across codes, machine learning, and distributed systems. In this paper, we provide theoretical insights on how coded solutions can achieve significant gains compared with uncoded ones. We focus on two of the most basic building blocks of distributed learning algorithms: matrix multiplication and data shuffling. For matrix multiplication, we use codes to alleviate the effect of stragglers and show that if the number of homogeneous workers is <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> and the runtime of each subtask has an exponential tail, coded computation can speed up distributed matrix multiplication by a factor of <inline-formula> <tex-math notation="LaTeX">\log n </tex-math></inline-formula>. For data shuffling, we use codes to reduce communication bottlenecks by exploiting the excess in storage.
We show that when a constant fraction <inline-formula> <tex-math notation="LaTeX">\alpha </tex-math></inline-formula> of the data matrix can be cached at each worker and <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> is the number of workers, coded shuffling reduces the communication cost by a factor of <inline-formula> <tex-math notation="LaTeX">\left({\alpha + \frac {1}{n}}\right)\gamma (n) </tex-math></inline-formula> compared with uncoded shuffling, where <inline-formula> <tex-math notation="LaTeX">\gamma (n) </tex-math></inline-formula> is the ratio of the cost of unicasting <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> messages to <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> users to that of multicasting a common message (of the same size) to <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> users. For instance, <inline-formula> <tex-math notation="LaTeX">\gamma (n) \simeq n </tex-math></inline-formula> if multicasting a message to <inline-formula> <tex-math notation="LaTeX">n </tex-math></inline-formula> users is as cheap as unicasting a message to one user. We also provide experimental results corroborating the theoretical gains of our coded algorithms.
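The abstract's two quantitative claims can be illustrated with a small simulation. This is a hedged sketch, not the paper's implementation: the function names `simulate` and `shuffling_gain` are illustrative, the shifted-exponential runtime model is an assumption consistent with the abstract's "exponential tail", and the (n, k) MDS property is modeled abstractly as "any k of n coded results suffice to decode".

```python
import random

def simulate(n, k, trials=2000, seed=0):
    """Compare uncoded vs. (n, k) MDS-coded distributed matrix multiplication.

    Assumed runtime model: each worker's subtask takes a shifted-exponential
    time proportional to its subtask size, matching the abstract's
    "exponential tail". Total computation is normalized in both schemes.
    """
    rng = random.Random(seed)
    uncoded_total, coded_total = 0.0, 0.0
    for _ in range(trials):
        times = [1.0 + rng.expovariate(1.0) for _ in range(n)]
        # Uncoded: the matrix is split into n parts, one per worker;
        # the product is ready only when the slowest worker finishes.
        uncoded_total += max(t / n for t in times)
        # Coded: each worker computes a 1/k-sized coded subtask; any k of
        # the n results decode the product, so completion time is set by
        # the k-th fastest worker (an order statistic, not the maximum).
        coded_total += sorted(t / k for t in times)[k - 1]
    return uncoded_total / trials, coded_total / trials

def shuffling_gain(alpha, n, gamma_n):
    """Communication-cost reduction of coded shuffling: (alpha + 1/n) * gamma(n)."""
    return (alpha + 1.0 / n) * gamma_n
```

For example, with `n = 100` workers and `k = 50`, the coded scheme's average completion time comes out noticeably smaller than the uncoded one's, since it waits for an order statistic rather than the slowest of 100 workers; and `shuffling_gain(0.2, 20, 20.0)` evaluates to 5.0, the case gamma(n) = n where multicasting to n users is as cheap as one unicast.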
Author_xml – sequence: 1
  givenname: Kangwook
  orcidid: 0000-0002-3360-9678
  surname: Lee
  fullname: Lee, Kangwook
  email: kw1jjang@kaist.ac.kr
  organization: School of Electrical Engineering, Korea Advanced Institute of Science and Technology, Daejeon, South Korea
– sequence: 2
  givenname: Maximilian
  surname: Lam
  fullname: Lam, Maximilian
  email: agnusmaximus@berkeley.edu
  organization: Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA
– sequence: 3
  givenname: Ramtin
  orcidid: 0000-0002-1126-0292
  surname: Pedarsani
  fullname: Pedarsani, Ramtin
  email: ramtin@ece.ucsb.edu
  organization: Department of Electrical and Computer Engineering, University of California at Santa Barbara, Santa Barbara, CA, USA
– sequence: 4
  givenname: Dimitris
  surname: Papailiopoulos
  fullname: Papailiopoulos, Dimitris
  email: dimitris@papail.io
  organization: Department of Electrical and Computer Engineering, University of Wisconsin-Madison, Madison, WI, USA
– sequence: 5
  givenname: Kannan
  surname: Ramchandran
  fullname: Ramchandran, Kannan
  email: kannanr@eecs.berkeley.edu
  organization: Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, Berkeley, CA, USA
CODEN IETTAW
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
DOI 10.1109/TIT.2017.2736066
Discipline Engineering
Computer Science
EISSN 1557-9654
EndPage 1529
ExternalDocumentID 10_1109_TIT_2017_2736066
8002642
Genre orig-research
GrantInformation_xml – fundername: Korea government(MSIT) (Coding for High-Speed Distributed Networks)
  grantid: 2017-0-00694
– fundername: Brain Korea 21 Plus Project, and NSF CIF grant (Foundations of coding for modern distributed computing)
  grantid: 1703678
– fundername: Institute for Information & communications Technology Promotion(IITP) grant
ISSN 0018-9448
IsPeerReviewed true
IsScholarly true
Issue 3
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
ORCID 0000-0002-1126-0292
0000-0002-3360-9678
PageCount 16
PublicationCentury 2000
PublicationDate 2018-03-01
PublicationPlace New York
PublicationTitle IEEE transactions on information theory
PublicationTitleAbbrev TIT
PublicationYear 2018
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
– start-page: 265
  year: 2010
  ident: ref26
  article-title: Reining in the outliers in Map-Reduce clusters using Mantri
  publication-title: Proc 9th USENIX Conf Oper Syst Des Implement (OSDI)
– start-page: 29
  year: 2008
  ident: ref27
  article-title: Improving MapReduce performance in heterogeneous environments
  publication-title: Proc 9th USENIX Conf Oper Syst Des Implement (OSDI)
– ident: ref1
  doi: 10.1109/ISIT.2016.7541478
– year: 2015
  ident: ref83
  publication-title: Open MPI Open Source High Performance Computing
– ident: ref61
  doi: 10.1007/s12532-013-0053-8
– ident: ref12
  doi: 10.1109/TIT.2012.2208937
– ident: ref73
  doi: 10.1109/TIT.2010.2103753
– ident: ref63
  doi: 10.14778/2732977.2733001
– year: 2015
  ident: ref84
  publication-title: StarCluster
– year: 2016
  ident: ref45
  publication-title: Gradient coding
– ident: ref71
  doi: 10.1109/ALLERTON.2015.7447112
– ident: ref35
  doi: 10.1109/ALLERTON.2015.7446991
Snippet Codes are widely used in many engineering applications to offer robustness against noise. In large-scale systems, there are several types of noise that can...
StartPage 1514
SubjectTerms Algorithm design and analysis
Algorithms
Artificial intelligence
channel coding
Codes
Communication
Communications systems
Computer networks
distributed computing
Distributed databases
Encoding
Machine learning
Machine learning algorithms
Machine tools
Multicast communication
Multicasting
Multiplication
Multiplication & division
Robustness
Runtime
System failures
Title Speeding Up Distributed Machine Learning Using Codes
URI https://ieeexplore.ieee.org/document/8002642
https://www.proquest.com/docview/2006249929
Volume 64
ISSN 0018-9448
EISSN 1557-9654
DOI 10.1109/TIT.2017.2736066