A mobile Deep Sparse Wavelet autoencoder for Arabic acoustic unit modeling and recognition


Bibliographic Details
Published in: Heliyon, Vol. 10, No. 4, p. e26583
Main Authors: Alzakari, Sarah A.; Hassairi, Salima; Ali Alhussan, Amel; Ejbali, Ridha
Format: Journal Article
Language: English
Published: England: Elsevier Ltd, 29.02.2024

Summary: In this manuscript, we introduce a novel methodology for modeling acoustic units within a mobile architecture, employing a synergistic combination of several complementary techniques: deep learning, sparse coding, and wavelet networks. The core concept is to construct a Deep Sparse Wavelet Network (DSWN) by integrating stacked wavelet autoencoders. The DSWN is designed to classify a specific class and discern it from the other classes within a dataset of acoustic units. Mel-frequency cepstral coefficients (MFCC) and perceptual linear predictive (PLP) features are used to encode the speech units. The approach is tailored to the computational capabilities of mobile devices: the deep networks are built with minimal connections, which directly reduces computational overhead. The experimental findings demonstrate the efficacy of our system when applied to a segmented corpus of Arabic words. Notwithstanding these promising results, our methodology has limitations. One limitation concerns the use of a specific dataset of Arabic words; the generalizability of the Deep Sparse Wavelet Network (DSWN) to other contexts requires further investigation. We will also evaluate the impact of speech variations, such as accents, on the performance of our model for a more nuanced understanding.
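The stacked-autoencoder idea behind the DSWN can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: it trains a single sparse autoencoder layer in plain NumPy, using a sigmoid hidden layer in place of the paper's wavelet activations, tied encoder/decoder weights, and a KL-divergence sparsity penalty that pushes mean hidden activations toward a small target `rho`. All class and parameter names are hypothetical; in the paper, layers like this would be trained on MFCC/PLP feature vectors and stacked to form the deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class SparseAELayer:
    """One sparse autoencoder layer with tied weights (illustrative only)."""

    def __init__(self, n_in, n_hid, rho=0.05, beta=0.1, lr=0.5):
        self.W = rng.normal(0.0, 0.1, (n_in, n_hid))  # tied encoder/decoder weights
        self.b = np.zeros(n_hid)                      # encoder bias
        self.c = np.zeros(n_in)                       # decoder bias
        self.rho, self.beta, self.lr = rho, beta, lr

    def encode(self, X):
        return sigmoid(X @ self.W + self.b)

    def step(self, X):
        """One gradient step on reconstruction loss + KL sparsity penalty."""
        H = self.encode(X)                 # hidden activations, shape (n, n_hid)
        Xh = H @ self.W.T + self.c         # linear reconstruction
        err = Xh - X                       # reconstruction error
        rho_hat = H.mean(axis=0)           # mean activation of each hidden unit
        # Gradient of the KL sparsity penalty w.r.t. hidden activations:
        sparse = self.beta * (-self.rho / rho_hat + (1 - self.rho) / (1 - rho_hat))
        dH = (err @ self.W + sparse) * H * (1 - H)   # back through the sigmoid
        gW = X.T @ dH + err.T @ H          # encoder + decoder paths (tied weights)
        n = len(X)
        self.W -= self.lr * gW / n
        self.b -= self.lr * dH.mean(axis=0)
        self.c -= self.lr * err.mean(axis=0)
        return float((err ** 2).mean())    # mean squared reconstruction error

# Toy run on random 20-dim "feature vectors" standing in for MFCC frames.
X = rng.random((64, 20))
layer = SparseAELayer(n_in=20, n_hid=8)
losses = [layer.step(X) for _ in range(200)]
print(losses[0] > losses[-1])  # reconstruction error falls during training
```

Stacking follows the usual scheme: once a layer is trained, `layer.encode(X)` produces the sparse codes that serve as input to the next layer, and the trained layers are then fine-tuned together with a classifier on top.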
ISSN: 2405-8440
DOI: 10.1016/j.heliyon.2024.e26583