iPIM: Programmable In-Memory Image Processing Accelerator Using Near-Bank Architecture

Bibliographic Details
Published in: 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA), pp. 804-817
Main Authors: Gu, Peng; Xie, Xinfeng; Ding, Yufei; Chen, Guoyang; Zhang, Weifeng; Niu, Dimin; Xie, Yuan
Format: Conference Proceeding
Language: English
Published: IEEE, 01.05.2020
Summary: Image processing is becoming an increasingly important domain for many workstation and datacenter applications that require accelerators for high performance and energy efficiency. The GPU, the state-of-the-art accelerator for image processing, suffers from a memory bandwidth bottleneck. Near-bank architecture provides a promising solution to this bottleneck due to its enormous bank-internal bandwidth and low-energy memory access. However, previous work lacks hardware programmability, while image processing workloads contain numerous heterogeneous pipeline stages with diverse computation and memory access patterns. Enabling a programmable near-bank architecture with low hardware overhead remains challenging. This work proposes iPIM, the first programmable in-memory image processing accelerator using near-bank architecture. We first design a decoupled control-execution architecture to provide lightweight programmability support. Second, we propose the SIMB (Single-Instruction-Multiple-Bank) ISA to enable flexible control flow and data access. Third, we present an end-to-end compilation flow based on Halide that supports a wide range of image processing applications and maps them to our SIMB ISA. We further develop iPIM-aware compiler optimizations, including register allocation, instruction reordering, and memory order enforcement, to improve performance. We evaluate a set of representative image processing applications on iPIM and demonstrate that iPIM obtains, on average, 11.02× acceleration and 79.49% energy saving over an NVIDIA Tesla V100 GPU. Further analysis shows that our compiler optimizations contribute a 3.19× speedup over the unoptimized baseline.
DOI: 10.1109/ISCA45697.2020.00071
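
For context on the Halide-based compilation flow mentioned in the abstract, the sketch below shows a minimal two-stage Halide pipeline (a separable 3x3 box blur) written against the public Halide C++ API; this is the kind of multi-stage image processing kernel the paper's flow would lower to the SIMB ISA. The code is not from the paper: all names and the CPU-style schedule are illustrative assumptions.

// Minimal Halide pipeline sketch: separable 3x3 box blur.
// Illustrative only; names and schedule are assumptions, not from the paper.
#include "Halide.h"
using namespace Halide;

int main() {
    ImageParam input(UInt(16), 2, "input");   // 2-D 16-bit grayscale image
    Var x("x"), y("y");
    Func blur_x("blur_x"), blur_y("blur_y");

    // Stage 1: horizontal blur; Stage 2: vertical blur over stage 1's output.
    blur_x(x, y) = (input(x - 1, y) + input(x, y) + input(x + 1, y)) / 3;
    blur_y(x, y) = (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3;

    // A simple CPU schedule; an iPIM-aware backend would instead make
    // bank-level mapping and instruction-ordering decisions at this point.
    blur_x.compute_root().vectorize(x, 8).parallel(y);
    blur_y.vectorize(x, 8).parallel(y);

    blur_y.compile_jit();   // compile the two-stage pipeline
    return 0;
}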