ADAPTIVE FRAME BATCHING TO REDUCE SPEECH RECOGNITION LATENCY

Bibliographic Details
Main Authors	GONG, YIFAN; PATHAK, SAYAN; STOIMENOV, EMILIAN Y; KHALIL, HOSAM A; LIU, CHAOJUN; AGARWAL, AMIT K; BASOGLU, CHRISTOPHER H; PARIHAR, NAVEEN
Format	Patent
Language	English, French
Published	22.07.2021
Subjects	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION
Summary:	Embodiments may include collecting a first batch of acoustic feature frames of an audio signal, where the number of frames in the first batch equals a first batch size, and inputting the first batch to a speech recognition network. Embodiments may further include, in response to detecting a word hypothesis output by the speech recognition network, collecting a second batch of acoustic feature frames of the audio signal, where the number of frames in the second batch equals a second batch size greater than the first batch size, and inputting the second batch to the speech recognition network.
Bibliography:	Application Number: CA20203166381
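
The batching scheme described in the summary can be illustrated with a minimal Python sketch. It is not taken from the patent: the function name `run_adaptive_batching`, the `recognize` callback (assumed to return a word hypothesis or `None`), the example batch sizes, and the choice to flush a trailing partial batch are all hypothetical and serve only to show the switch from a smaller first batch size to a larger second batch size once a word hypothesis is detected.

```python
from typing import Callable, Iterable, List, Optional, Sequence

# A frame is assumed here to be a vector of acoustic features (hypothetical type).
Frame = Sequence[float]


def run_adaptive_batching(
    frames: Iterable[Frame],
    recognize: Callable[[List[Frame]], Optional[str]],
    first_batch_size: int = 4,
    second_batch_size: int = 16,
) -> List[int]:
    """Feed frames to `recognize` in batches and return the batch sizes used.

    Batches start at `first_batch_size`. Once `recognize` returns a word
    hypothesis (any value other than None), later batches are collected
    at the larger `second_batch_size`.
    """
    if second_batch_size <= first_batch_size:
        raise ValueError("second_batch_size must exceed first_batch_size")

    batch_size = first_batch_size
    batch: List[Frame] = []
    sizes: List[int] = []

    for frame in frames:
        batch.append(frame)
        if len(batch) == batch_size:
            hypothesis = recognize(batch)
            sizes.append(len(batch))
            # Assumption: any non-None result counts as a detected word hypothesis.
            if hypothesis is not None:
                batch_size = second_batch_size
            batch = []

    # Assumption: leftover frames at the end of the signal are sent as a short batch.
    if batch:
        recognize(batch)
        sizes.append(len(batch))

    return sizes
```

In this sketch a small first batch lets the network see the start of the audio signal sooner, while the larger second batch reduces the number of network invocations after the first word hypothesis has appeared. Actual batch sizes and the detection criterion would depend on the embodiment.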