Tightly Coupled Machine Learning Coprocessor Architecture With Analog In-Memory Computing for Instruction-Level Acceleration

Bibliographic Details
Published in: IEEE Journal on Emerging and Selected Topics in Circuits and Systems, Vol. 9, No. 3, pp. 544-561
Main Authors: Chung, SungWon; Wang, Jiemi
Format: Journal Article
Language: English
Published: Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.09.2019
Summary: Low-profile mobile computing platforms often need to execute a variety of machine learning algorithms with limited memory and processing power. To address this challenge, this work presents Coara, an instruction-level processor acceleration architecture that efficiently integrates an approximate analog in-memory computing coprocessor for accelerating general machine learning applications by exploiting an analog register file cache. Instruction-level acceleration offers true programmability beyond the degree of freedom provided by reconfigurable machine learning accelerators, and it allows the code-generation stage of a compiler back end to control coprocessor execution and data flow, so applications do not need high-level machine learning software frameworks with a large memory footprint. Conventional analog and mixed-signal accelerators suffer from the overhead of frequent data conversion between analog and digital signals. To solve this classical problem, Coara uses an analog register file cache, which interfaces the analog in-memory computing coprocessor with the digital register file of the processor core. As a result, more than 90% of the ADC/DAC data conversion overhead can be eliminated by temporarily storing the result of an analog computation in a switched-capacitor analog memory cell until a data dependency occurs. A cycle-accurate Verilog RTL model of the proposed architecture is evaluated with 45 nm CMOS technology parameters while executing machine learning benchmark codes generated by a customized cross-compiler, without using machine learning software frameworks.
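The deferred-conversion scheme in the abstract can be illustrated with a toy model. The Python sketch below (all class, method, and register names are invented for illustration; this is not the authors' RTL or any published API) counts ADC conversions over a synthetic instruction trace: an eager mixed-signal design digitizes every analog result, while the deferred design converts a value only when the digital core actually reads it.

```python
# Toy trace-driven model of the deferred data-conversion idea from the
# abstract. All names here are hypothetical, not the Coara implementation.
from dataclasses import dataclass, field

@dataclass
class AnalogRegFileCache:
    """Tracks analog results that have not yet been digitized."""
    pending: set = field(default_factory=set)
    eager_conversions: int = 0     # ADC uses if every result were digitized
    deferred_conversions: int = 0  # ADC uses only on digital-side reads

    def analog_write(self, reg: str) -> None:
        # An eager design would run the ADC here; the deferred design
        # just keeps the value resident in analog form.
        self.eager_conversions += 1
        self.pending.add(reg)

    def digital_read(self, reg: str) -> None:
        # Data dependency: the digital register file needs the value,
        # so the deferred design converts it now (exactly once).
        if reg in self.pending:
            self.deferred_conversions += 1
            self.pending.discard(reg)

# Synthetic trace: 20 analog results, of which only 2 are ever read by
# the digital core; the rest feed later analog operations.
cache = AnalogRegFileCache()
for i in range(20):
    cache.analog_write(f"a{i}")
cache.digital_read("a3")
cache.digital_read("a17")

saved = 1 - cache.deferred_conversions / cache.eager_conversions
print(f"conversions avoided: {saved:.0%}")  # -> 90% on this toy trace
```

On this synthetic trace, 18 of the 20 analog results are consumed purely on the analog side, so 90% of conversions are skipped; the paper's >90% figure comes from its own benchmark evaluation, not from this toy model.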
ISSN: 2156-3357, 2156-3365
DOI: 10.1109/JETCAS.2019.2934929