Speech Perception Improvement Algorithm Based on a Dual-Path Long Short-Term Memory Network

Bibliographic Details
Published in	Bioengineering (Basel) Vol. 10; no. 11; p. 1325
Main Authors	Koh, Hyeong Il; Na, Sungdae; Kim, Myoung Nam
Format	Journal Article
Language	English
Published	Basel: MDPI AG, 01.11.2023
Subjects	Algorithms; Bioengineering; Datasets; Deep learning; dual-path network; encoder–decoder structure; Fourier transforms; Frequency domain analysis; Intelligibility; Long short-term memory; LSTM; Neural networks; Semantics; Signal processing; spectral extension block; Speech; speech enhancement; Speech perception; Speech processing; STFT; Time-frequency analysis; Waveforms; South Korea
Summary:	Current deep learning-based speech enhancement methods focus on enhancing the time–frequency representation of the signal. However, conventional methods can damage speech because of a resolution mismatch: they emphasize only specific information in either the time or the frequency domain. To address these challenges, this paper introduces a speech enhancement model with a dual-path structure that identifies key speech characteristics in both the time and time–frequency domains. Specifically, the time path aims to model semantic features hidden in the waveform, while the time–frequency path attempts to compensate for spectral details via a spectral extension block. The two paths enhance temporal and spectral features, respectively, via mask functions modeled as LSTMs. Experimental results show that the proposed dual-path LSTM network consistently outperforms conventional single-domain speech enhancement methods in terms of speech quality and intelligibility.
ISSN:	2306-5354
DOI:	10.3390/bioengineering10111325
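
The dual-path masking idea in the summary can be illustrated with a minimal NumPy sketch. It is not the authors' model. The function names (`lstm_mask`, `frame`, `overlap_add`, `enhance`), the frame and hidden sizes, the untrained random weights, and the plain averaging of the two path outputs are all assumptions made for illustration. The spectral extension block and the encoder–decoder structure listed in the record are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_mask(features, hidden_size, rng):
    """Run a single-layer LSTM over (frames, dim) features and return a
    sigmoid mask of the same shape. Weights are random, i.e. untrained
    (assumption: a real system would learn them)."""
    n_frames, dim = features.shape
    w = rng.standard_normal((dim + hidden_size, 4 * hidden_size)) * 0.1
    b = np.zeros(4 * hidden_size)
    w_out = rng.standard_normal((hidden_size, dim)) * 0.1
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    mask = np.empty((n_frames, dim))
    for t in range(n_frames):
        gates = np.concatenate([features[t], h]) @ w + b
        i, f, g, o = np.split(gates, 4)  # input, forget, candidate, output
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        mask[t] = sigmoid(h @ w_out)
    return mask

def frame(signal, frame_len, hop):
    """Slice a 1-D signal into overlapping frames of shape (n_frames, frame_len)."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def overlap_add(frames, hop, length):
    """Sum overlapping frames back into a 1-D signal of the given length."""
    frame_len = frames.shape[1]
    out = np.zeros(length)
    for t, fr in enumerate(frames):
        out[t * hop : t * hop + frame_len] += fr
    return out

def enhance(noisy, frame_len=64, hop=32, hidden_size=16, seed=0):
    rng = np.random.default_rng(seed)
    n = np.arange(frame_len)
    # Periodic Hann window: overlapping copies sum to 1 at 50% overlap.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)
    frames = frame(noisy, frame_len, hop) * window

    # Time path: mask the windowed waveform frames directly.
    time_mask = lstm_mask(frames, hidden_size, rng)
    time_out = overlap_add(frames * time_mask, hop, len(noisy))

    # Time-frequency path: mask the STFT magnitude and keep the noisy phase.
    spec = np.fft.rfft(frames, axis=1)
    tf_mask = lstm_mask(np.abs(spec), hidden_size, rng)
    tf_frames = np.fft.irfft(spec * tf_mask, n=frame_len, axis=1)
    tf_out = overlap_add(tf_frames, hop, len(noisy))

    # Fusion by averaging is an assumption; the summary does not specify it.
    return 0.5 * (time_out + tf_out)

# Usage: a noisy 440 Hz tone sampled at 8 kHz (synthetic, illustrative only).
noise_rng = np.random.default_rng(1)
t = np.arange(1024) / 8000.0
noisy = np.sin(2 * np.pi * 440.0 * t) + 0.3 * noise_rng.standard_normal(1024)
enhanced = enhance(noisy)
print(enhanced.shape)  # (1024,)
```

Because the weights are random, the output is not actually denoised. The sketch only shows how a mask in the waveform domain and a mask in the time–frequency domain can be produced by LSTMs and applied side by side.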