Bus Width Aware Off-Chip Memory Access Minimization for CNN Accelerators


Bibliographic Details
Published in: 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), pp. 240-245
Main Authors: Tewari, Saurabh; Kumar, Anshul; Paul, Kolin
Format: Conference Proceeding
Language: English
Published: IEEE, 01.07.2020

Summary: Convolutional Neural Network (CNN) accelerators have gained popularity for their ability to speed up CNN-based applications. However, the energy efficiency of these accelerators limits their adoption in energy-constrained devices, and a significant fraction of their energy consumption results from off-chip memory accesses. To achieve high throughput, these accelerators connect to off-chip memory through a wide data bus; however, accessing data whose size is not a multiple of the bus width wastes energy. We observed that off-chip memory accesses can be reduced significantly by partitioning the data so that the bus width is optimally utilized and the number of aligned accesses increases. In this work, we propose a bus-width-aware approach that determines the optimal partition of the convolution layers to reduce off-chip memory accesses. Our tool evaluates the off-chip memory accesses for different data partitions and data reuse schemes to find the optimal partition. We have experimented with two popular CNNs, VGG16 and AlexNet. Compared to the state-of-the-art approach, our method reduces the off-chip memory accesses of VGG16 by 16% and 29%, and of AlexNet by 9% and 16%, on 64-bit and 128-bit data buses, respectively.
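
The summary describes counting off-chip accesses for candidate data partitions and picking the one that best fills the bus and maximizes aligned accesses. The sketch below illustrates that idea under a deliberately simplified cost model (one burst per tile row, a single feature map, no data-reuse schemes); the function names, tile candidates, and map dimensions are illustrative assumptions, not the paper's actual tool or cost model.

import math
from itertools import product

BUS_BYTES = 8  # 64-bit data bus; use 16 for a 128-bit bus

def bus_transactions(offset, nbytes, bus=BUS_BYTES):
    # Transactions needed to fetch nbytes starting at byte `offset`.
    # Accesses that are unaligned, or whose size is not a multiple of
    # the bus width, fetch extra (wasted) bus words.
    start = offset - (offset % bus)                  # round start down
    end = math.ceil((offset + nbytes) / bus) * bus   # round end up
    return (end - start) // bus

def layer_accesses(rows, row_bytes, tile_rows, tile_cols_bytes, bus=BUS_BYTES):
    # Total transactions to stream one feature map tile by tile,
    # assuming each tile row is fetched with a separate burst.
    total = 0
    for r0 in range(0, rows, tile_rows):
        for c0 in range(0, row_bytes, tile_cols_bytes):
            for r in range(r0, min(r0 + tile_rows, rows)):
                offset = r * row_bytes + c0
                nbytes = min(tile_cols_bytes, row_bytes - c0)
                total += bus_transactions(offset, nbytes, bus)
    return total

def best_partition(rows, row_bytes, candidates, bus=BUS_BYTES):
    # Exhaustively score candidate (tile_rows, tile_cols_bytes) partitions
    # and return the one needing the fewest off-chip transactions.
    return min(candidates,
               key=lambda t: layer_accesses(rows, row_bytes, *t, bus))

if __name__ == "__main__":
    rows, row_bytes = 227, 227 * 2  # e.g. a 227x227 map of 16-bit values
    candidates = list(product([7, 14, 28], [56, 64, 112, 128]))
    best = best_partition(rows, row_bytes, candidates)
    print("best tile (rows, row-bytes):", best,
          "-> transactions:", layer_accesses(rows, row_bytes, *best))

On this toy model, partitions whose byte width is a multiple of the bus width and whose row offsets land on bus boundaries incur fewer transactions, mirroring the paper's observation that bus-width-aware partitioning increases aligned accesses.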
ISSN: 2159-3477
DOI: 10.1109/ISVLSI49217.2020.00051