An Energy-Efficient CNN/Transformer Hybrid Neural Semantic Segmentation Processor With Chunk-Based Bit Plane Data Compression and Similarity-Based Token-Level Skipping Exploitation

Bibliographic Details
Published in: IEEE transactions on circuits and systems. I, Regular papers, pp. 1-12
Main Authors: Park, Jongjun; Kim, Seryeong; Park, Wonhoon; Song, Seokchan; Yoo, Hoi-Jun
Format: Journal Article
Language: English
Published: IEEE, 16.10.2024
Summary: A novel energy-efficient semantic segmentation (SS) processor is proposed for achieving high system energy efficiency on mobile devices. Two factors hinder energy-efficient SS acceleration: 1) excessive external memory access and 2) a large amount of redundant computation. Three key features enable real-time, energy-efficient CNN/ViT hybrid SS. A new compression method named Chunk-based Bit Plane Compression (CBPC) reduces the memory footprint and the energy consumed by external memory access. CBPC enhances the compression ratio by leveraging the high inter-token similarity of feature maps and applying bit plane compression in sign-magnitude data representation, using a chunk-wise low-bit plane shared bias. The proposed CBPC encoder/decoder supports CBPC with minimum area overhead. Additionally, the Similar Token Coarse Skipping (STCS) Core enhances throughput and reduces computation power by eliminating redundant computations. The STCS Core employs Row-wise Line Gating for low-power computation and Array-wise Coarse Skipping to minimize redundant computation. As a result, the proposed processor reduces external memory access energy by 67.6% and achieves a core energy efficiency of 19.24 TOPS/W. It achieves a system-level energy efficiency of 3.55 mJ/frame, which is 79.7% higher than that of the previous SOTA SS processor.
ISSN: 1549-8328, 1558-0806
DOI: 10.1109/TCSI.2024.3446662
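
Illustrative note: the summary says that CBPC applies bit plane compression in sign-magnitude data representation. The sketch below is not the paper's CBPC algorithm and does not model the chunk-wise shared bias. It only illustrates, under stated assumptions, why sign-magnitude tends to suit bit-plane schemes: small negative values keep their high-order bits at zero, whereas two's complement sets them to one. The 8-bit width, the sample chunk, and all function names are hypothetical choices made for this example.

```python
def to_sign_magnitude(value, bits=8):
    """Encode a signed integer with the top bit as sign and the rest as magnitude."""
    magnitude = abs(value)
    if magnitude >= 1 << (bits - 1):
        raise ValueError("value does not fit in the given width")
    sign = 1 if value < 0 else 0
    return (sign << (bits - 1)) | magnitude


def to_twos_complement(value, bits=8):
    """Encode a signed integer as an unsigned two's-complement bit pattern."""
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError("value does not fit in the given width")
    return value & ((1 << bits) - 1)


def bit_planes(codes, bits=8):
    """Slice a chunk of codes into bit planes, most significant plane first."""
    return [[(code >> b) & 1 for code in codes] for b in range(bits - 1, -1, -1)]


def count_zero_planes(planes):
    """Count planes that contain only zeros (trivially compressible)."""
    return sum(1 for plane in planes if not any(plane))


# Hypothetical chunk of small signed activations (assumed values, not paper data).
chunk = [3, -2, 1, -4, 0, 2, -1, 3]

sm_planes = bit_planes([to_sign_magnitude(v) for v in chunk])
tc_planes = bit_planes([to_twos_complement(v) for v in chunk])

# Sign-magnitude leaves bit planes 6..3 all zero; two's complement leaves none.
print(count_zero_planes(sm_planes), count_zero_planes(tc_planes))  # prints: 4 0
```

In this toy chunk, four of the eight sign-magnitude planes are entirely zero, while every two's-complement plane contains at least one set bit because the negative values carry leading ones. How the published design encodes the remaining planes is described in the full text, not here.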