Reveal training performance mystery between TensorFlow and PyTorch in the single GPU environment

Deep learning has gained tremendous success in various fields while training deep neural networks (DNNs) is very compute-intensive, which results in numerous deep learning frameworks that aim to offer better usability and higher performance to deep learning practitioners. TensorFlow and PyTorch are...

Full description

Saved in:
Bibliographic Details
Published inScience China. Information sciences Vol. 65; no. 1; p. 112103
Main Authors Dai, Hulin, Peng, Xuan, Shi, Xuanhua, He, Ligang, Xiong, Qian, Jin, Hai
Format Journal Article
LanguageEnglish
Published Beijing Science China Press 2022
Springer Nature B.V
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Deep learning has gained tremendous success in various fields while training deep neural networks (DNNs) is very compute-intensive, which results in numerous deep learning frameworks that aim to offer better usability and higher performance to deep learning practitioners. TensorFlow and PyTorch are the two most popular frameworks. TensorFlow is more promising within the industry context, while PyTorch is more appealing in academia. However, these two frameworks differ much owing to the opposite design philosophy: static vs dynamic computation graph. TensorFlow is regarded as being more performance-friendly as it has more opportunities to perform optimizations with the full view of the computation graph. However, there are also claims that PyTorch is faster than TensorFlow sometimes, which confuses the end-users on the choice between them. In this paper, we carry out the analytical and experimental analysis to unravel the mystery of comparison in training speed on single-GPU between TensorFlow and PyTorch. To ensure that our investigation is as comprehensive as possible, we carefully select seven popular neural networks, which cover computer vision, speech recognition, and natural language processing (NLP). The contributions of this work are two-fold. First, we conduct the detailed benchmarking experiments on TensorFlow and PyTorch and analyze the reasons for their performance difference. This work provides the guidance for the end-users to choose between these two frameworks. Second, we identify some key factors that affect the performance, which can direct the end-users to write their models more efficiently.
ISSN:1674-733X
1869-1919
DOI:10.1007/s11432-020-3182-1