Textless NLP -- Zero Resource Challenge with Low Resource Compute
Main Authors | Ramadass, Krithiga; Singh, Abrit Pal; J, Srihari; Kalyani, Sheetal |
---|---|
Format | Journal Article (preprint) |
Language | English |
Published | 24.09.2024 |
Subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Learning; Computer Science - Sound |
DOI | 10.48550/arxiv.2409.19015 |
Copyright | http://creativecommons.org/licenses/by-nc-sa/4.0 |
Online Access | https://arxiv.org/abs/2409.19015 |
Abstract | This work addresses the persistent challenges of substantial training time
and GPU resource requirements even when training lightweight encoder-vocoder
models for Textless NLP. We reduce training steps significantly while improving
performance by (a) leveraging learning rate schedulers for efficient and faster
convergence, (b) optimizing hop length, and (c) tuning the interpolation scale
factors for better audio quality. Additionally, we explore the latent space
representation of Indian languages such as Tamil and Bengali for the acoustic
unit discovery and voice conversion tasks. Our approach leverages a quantized
encoder architecture in conjunction with a vocoder that utilizes the proposed
mixture of optimized hop length, tuned interpolation scale factors, and a cyclic
learning rate scheduler. We obtain consistently good results across English,
Tamil, and Bengali datasets. The proposed method excels at capturing complex
linguistic patterns, resulting in clear reconstructed audio during voice
conversion with significantly reduced training time. |
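The three levers named in the abstract (a cyclic learning rate schedule, the hop length, and the interpolation scale factors) correspond to standard knobs in vocoder training. The sketch below is illustrative only and assumes PyTorch; `TinyVocoder`, `HOP_LENGTH`, and `SCALE_FACTORS` are hypothetical stand-ins, not the authors' architecture or values. Note how the product of the interpolation scale factors is chosen to match the hop length, so the vocoder's upsampled output lands back at waveform rate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

HOP_LENGTH = 256            # hypothetical "optimized" hop length
SCALE_FACTORS = (8, 8, 4)   # hypothetical "tuned" interpolation factors;
                            # their product (8 * 8 * 4 = 256) matches HOP_LENGTH

class TinyVocoder(nn.Module):
    """Stand-in vocoder: upsamples mel frames to waveform rate by repeated
    linear interpolation with the tuned scale factors."""
    def __init__(self, n_mels: int = 80):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(n_mels, n_mels, kernel_size=3, padding=1)
            for _ in SCALE_FACTORS
        )
        self.out = nn.Conv1d(n_mels, 1, kernel_size=3, padding=1)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        # mel: (batch, n_mels, frames) -> (batch, 1, frames * HOP_LENGTH)
        x = mel
        for conv, s in zip(self.convs, SCALE_FACTORS):
            x = F.interpolate(x, scale_factor=float(s), mode="linear")
            x = torch.tanh(conv(x))
        return self.out(x)

model = TinyVocoder()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
# Cyclic LR oscillates between base_lr and max_lr, one common way to get the
# faster convergence the abstract attributes to learning rate schedulers.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2, step_size_up=2000)

mel = torch.randn(4, 80, 32)                 # placeholder mel-spectrogram batch
target = torch.randn(4, 1, 32 * HOP_LENGTH)  # placeholder waveform target
for step in range(100):
    loss = F.l1_loss(model(mel), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()    # advance the cyclic schedule once per batch
```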