A 0.8 V Intelligent Vision Sensor With Tiny Convolutional Neural Network and Programmable Weights Using Mixed-Mode Processing-in-Sensor Technique for Image Classification


Bibliographic Details
Published in: IEEE Journal of Solid-State Circuits, Vol. 58, no. 11, pp. 1-9
Main Authors: Hsu, Tzu-Hsiang; Chen, Guan-Cheng; Chen, Yi-Ren; Liu, Ren-Shuo; Lo, Chung-Chuan; Tang, Kea-Tiong; Chang, Meng-Fan; Hsieh, Chih-Cheng
Format: Journal Article
Language: English
Published: New York, IEEE, 01.11.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: This article presents an intelligent vision sensor (IVS) with an embedded tiny convolutional neural network (CNN) model and a programmable processing-in-sensor (PIS) circuit for real-time inference applications on low-power edge devices. The proposed imager realizes the full computing functions of a customized three-layer tiny network, which includes a 3 × 3 convolution layer (stride = 3) with a rectified linear unit (ReLU) activation function, a 2 × 2 maximum pooling (MP) layer (stride = 2), and a 1 × 1 fully connected (FC) layer for inference. A 0.8 V 128 × 128 IVS prototype was fabricated and verified in TSMC 0.18 μm standard CMOS technology. In normal image mode, it consumed 76.4 μW with full-resolution (126 × 126 active resolution) image output at 125 f/s. In CNN mode, it consumed 134.5 μW at 250 f/s and achieved an iFoMs of 33.8 pJ/pixel·frame. Using the proposed mixed-mode PIS circuits, the prototype is configured to demonstrate a "human face or not detection" task with an achieved accuracy of 93.6%.
ISSN: 0018-9200, 1558-173X
DOI: 10.1109/JSSC.2023.3285734
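
The layer sequence described in the summary can be traced in software to see how the feature map shrinks from the 126 × 126 active array. The following is only a minimal sketch and not the authors' mixed-mode circuit: the single kernel, the random weights, the modeling of the FC layer as one weighted sum, and all function names are assumptions made for illustration. The last lines divide the reported CNN-mode power by frame rate and pixel count; assuming the 126 × 126 active resolution is the pixel count, the result comes out at about 33.9 pJ/pixel·frame, close to the reported 33.8.

```python
import random


def conv3x3_relu(img, kernel, stride=3):
    """3x3 convolution with the given stride, followed by ReLU."""
    n = (len(img) - 3) // stride + 1
    out = []
    for i in range(n):
        row = []
        for j in range(n):
            acc = sum(
                img[i * stride + di][j * stride + dj] * kernel[di][dj]
                for di in range(3)
                for dj in range(3)
            )
            row.append(max(0.0, acc))  # ReLU clamps negative sums to zero
        out.append(row)
    return out


def max_pool2x2(fmap, stride=2):
    """2x2 maximum pooling with the given stride."""
    n = (len(fmap) - 2) // stride + 1
    return [
        [
            max(
                fmap[i * stride + di][j * stride + dj]
                for di in range(2)
                for dj in range(2)
            )
            for j in range(n)
        ]
        for i in range(n)
    ]


def fully_connected(fmap, weights, bias=0.0):
    """Single-output FC layer (assumed form): weighted sum of the pooled map."""
    flat = [v for row in fmap for v in row]
    return sum(v * w for v, w in zip(flat, weights)) + bias


def energy_per_pixel_frame(power_w, fps, pixels):
    """Energy in joules spent per pixel per frame."""
    return power_w / (fps * pixels)


# Hypothetical input and weights; a seeded generator keeps the run repeatable.
rng = random.Random(0)
img = [[rng.random() for _ in range(126)] for _ in range(126)]
kernel = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]

conv = conv3x3_relu(img, kernel)   # 126x126 -> 42x42
pooled = max_pool2x2(conv)         # 42x42 -> 21x21
fc_weights = [rng.uniform(-1.0, 1.0) for _ in range(21 * 21)]
score = fully_connected(pooled, fc_weights)

# Reported CNN-mode figures: 134.5 uW at 250 f/s; pixel count is an assumption.
energy_pj = energy_per_pixel_frame(134.5e-6, 250, 126 * 126) * 1e12
print(len(conv), len(pooled), round(energy_pj, 1))
```

Under these assumptions, the stride-3 convolution and the stride-2 pooling together reduce the data entering the final layer from 15,876 pixel values to 441.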