Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems

Bibliographic Details
Published in: ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 10176-10180
Main Authors: Udagawa, Takuma; Suzuki, Masayuki; Kurata, Gakuto; Muraoka, Masayasu; Saon, George
Format: Conference Proceeding
Language: English
Published: IEEE, 14.04.2024

Summary: Transferring the knowledge of large language models (LLMs) is a promising technique for incorporating linguistic knowledge into end-to-end automatic speech recognition (ASR) systems. However, existing works transfer only a single representation of the LLM (e.g., the last layer of pretrained BERT), whereas the representation of a text is inherently non-unique and can be obtained in various ways from different layers, contexts, and models. In this work, we explore a wide range of techniques for obtaining and transferring multiple representations of LLMs into a transducer-based ASR system. While conceptually simple, we show that transferring multiple representations of LLMs can be an effective alternative to transferring only a single LLM representation.
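
To make concrete what "multiple representations" of an LLM means here, the following is a minimal Python sketch (using the Hugging Face transformers library) of extracting the per-layer representations of a text from pretrained BERT and combining them with a learnable weighted sum. The model name, the softmax-weighted aggregation, and all variable names are illustrative assumptions for this sketch, not the paper's exact configuration, which also considers representations from different contexts and models.

import torch
from transformers import AutoModel, AutoTokenizer

# Load a pretrained BERT and ask it to return every layer's hidden states.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

inputs = tokenizer("an example recognition hypothesis", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# For bert-base, out.hidden_states is a tuple of 13 tensors: the embedding
# output plus each of the 12 transformer layers, each of shape
# (batch, seq_len, 768). Each one is a different representation of the text.
hidden_states = out.hidden_states

# Hypothetical aggregation: a learnable softmax-weighted sum over layers,
# producing one combined teacher representation that mixes all layers.
layer_weights = torch.nn.Parameter(torch.zeros(len(hidden_states)))
stacked = torch.stack(hidden_states)                    # (13, batch, seq, 768)
weights = torch.softmax(layer_weights, dim=0)
combined = (weights[:, None, None, None] * stacked).sum(dim=0)
print(combined.shape)                                   # (1, seq_len, 768)

In a knowledge-transfer setup, such per-layer (or combined) representations would serve as teacher targets that the transducer-based ASR system's internal representations are trained to match, in place of the single last-layer target used in prior work.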
ISSN: 2379-190X
DOI: 10.1109/ICASSP48485.2024.10448022