Using Quantized Neural Network for Speaker Recognition on Edge Computing Devices

Bibliographic Details
Published in: Journal of Physics: Conference Series, Vol. 1992, no. 2, pp. 22177 - 22184
Main Author: Dai, Tongwei
Format: Journal Article
Language: English
Published: Bristol, IOP Publishing, 01.08.2021
Summary: Most successful CNN architectures are deep networks, and their intensive memory and processing requirements make them difficult to deploy on microcontrollers and other real-time systems where memory footprint and power consumption cannot be neglected. Consequently, many efforts have been made to adapt deep networks to such contexts, either through hardware specialization and optimization or through architectural modifications. This paper concentrates on the application of CNNs to speaker recognition and verification, building on the VGGNet architecture proposed by Simonyan and Zisserman. As that model is commonly trained on the audio-visual VoxCeleb dataset, this paper further evaluates and improves its performance on a challenging Chinese-speaking audio dataset collected from various media with little preprocessing. Quantization is used to reduce the model's memory usage and to gear it towards edge-computing applications.
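The quantization mentioned in the summary can be illustrated with a minimal sketch of symmetric 8-bit post-training weight quantization, one common way to shrink a CNN's memory footprint for edge devices. This is an illustrative assumption about the general technique, not the paper's exact scheme; the function names are hypothetical.

```python
def quantize_int8(weights):
    """Map float weights to int8 values plus a shared scale factor.

    Storing int8 (1 byte/weight) instead of float32 (4 bytes/weight)
    cuts weight memory roughly 4x, at the cost of rounding error
    bounded by scale/2 per weight.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 representation."""
    return [v * scale for v in q]

# Example: quantize a tiny weight vector and inspect the reconstruction.
weights = [0.5, -1.2, 0.03, 0.9]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

Real deployments would apply this per layer (or per channel) and also quantize activations, but the memory-versus-accuracy trade-off is the same idea shown here.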
ISSN: 1742-6588, 1742-6596
DOI: 10.1088/1742-6596/1992/2/022177