Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing

Bibliographic Details
Published in: Future Generation Computer Systems, Vol. 108, No. C
Main Authors: Gawande, Nitin A.; Daily, Jeff A.; Siegel, Charles; Tallent, Nathan R.; Vishnu, Abhinav
Format: Journal Article
Language: English
Published: United States: Elsevier, 05.05.2018

Summary: Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors—including NVIDIA, Intel, AMD, and IBM—have architectural road maps influenced by DL workloads. Furthermore, several vendors have recently advertised new computing products as accelerating large DL workloads. Unfortunately, it is difficult for data scientists to quantify the potential of these different products. This article provides a performance and power analysis of important DL workloads on two major parallel architectures: NVIDIA DGX-1 (eight Pascal P100 GPUs interconnected with NVLink) and Intel Knights Landing (KNL) CPUs interconnected with Intel Omni-Path or Cray Aries. Our evaluation consists of a cross-section of convolutional neural network workloads: the CifarNet, AlexNet, GoogLeNet, and ResNet50 topologies using the Cifar10 and ImageNet datasets. The workloads are vendor-optimized for each architecture. We use sequentially equivalent implementations to maintain iso-accuracy between parallel and sequential DL models. Our analysis indicates that although GPUs provide the highest overall performance, the gap can close for some convolutional networks, and the KNL can be competitive in performance/watt. We find that NVLink facilitates scaling efficiency on GPUs, but its importance is heavily dependent on neural network architecture. Furthermore, for weak scaling—sometimes encouraged by restricted GPU memory—NVLink is less important.
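
The summary invokes three metrics without defining them: strong-scaling speedup, weak-scaling efficiency, and performance per watt. The following minimal Python sketch spells out the standard formulas, assuming wall-clock time per training iteration as the measured quantity; the function names and example numbers are illustrative assumptions, not values from the paper.

# Standard scaling metrics for the kind of multi-device DL benchmark the
# article describes. Names and numbers below are illustrative assumptions.

def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup of an n-device run over the single-device baseline."""
    return t_serial / t_parallel

def strong_scaling_efficiency(t_serial: float, t_parallel: float, n: int) -> float:
    """Fixed global problem size: ideal speedup is n, so efficiency = speedup / n."""
    return speedup(t_serial, t_parallel) / n

def weak_scaling_efficiency(t_serial: float, t_parallel: float) -> float:
    """Per-device problem size held fixed (e.g. a constant per-GPU mini-batch,
    as when restricted GPU memory encourages weak scaling): the ideal
    parallel time equals the serial time."""
    return t_serial / t_parallel

def performance_per_watt(images_per_second: float, avg_watts: float) -> float:
    """Throughput normalized by average power draw, in images/s/W."""
    return images_per_second / avg_watts

if __name__ == "__main__":
    # Illustrative numbers only: 8 devices, 100 s serial iteration time.
    print(f"strong-scaling efficiency: {strong_scaling_efficiency(100.0, 15.0, 8):.2f}")
    print(f"weak-scaling efficiency:   {weak_scaling_efficiency(100.0, 110.0):.2f}")
    print(f"performance per watt:      {performance_per_watt(2000.0, 400.0):.2f} img/s/W")

Under these definitions an efficiency of 1.0 is ideal; the article's finding that NVLink matters less for weak scaling reflects the weaker communication demands of that regime.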
Bibliography: PNNL-SA-134513
USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR)
AC05-76RL01830
ISSN: 0167-739X, 1872-7115