Investigation of Transfer Learning for End-to-End Russian Speech Recognition

Bibliographic Details
Published in	Speech and Computer Vol. 13721; pp. 349 - 357
Main Author	Kipyatkova, Irina
Format	Book Chapter
Language	English
Published	Switzerland: Springer International Publishing AG, 2022
Series	Lecture Notes in Computer Science
Subjects	Encoder-decoder; End-to-end speech recognition; Russian speech; Transfer learning
ISBN	3031209796 9783031209796
ISSN	0302-9743 1611-3349
DOI	10.1007/978-3-031-20980-2_30

Abstract	End-to-end speech recognition systems reduce speech decoding time and memory requirements compared with standard systems. However, they need much more training data, which complicates building such systems for low-resourced languages. One way to improve the performance of an end-to-end speech recognition system for a low-resourced language is to pre-train the model by transfer learning, that is, to train the model on non-target data and then transfer the trained parameters to the target model. The aim of the current research was to investigate the application of transfer learning to training an end-to-end Russian speech recognition system in low-resourced conditions. We used several speech corpora in different languages for pre-training. The end-to-end model was then fine-tuned on a small Russian speech corpus of 60 h. We conducted experiments on applying transfer learning to different parts of the model (feature extraction block, encoder, and attention mechanism) as well as on freezing the lower layers. We achieved a 24.53% relative word error rate reduction compared with the baseline system trained without transfer learning.
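The transfer step described in the abstract (copying pre-trained parameters into selected parts of the model, then optionally freezing the lower layers during fine-tuning) can be illustrated with a minimal sketch. This is not the chapter's implementation. The parameter names, the block prefixes (`feature_extractor`, `encoder`, `attention`, `decoder`), and the helper functions are hypothetical, and parameters are modeled as plain Python lists rather than tensors of any particular toolkit.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical stand-in for a model's parameters: name -> flat list of values.
Params = Dict[str, List[float]]


def transfer_parameters(pretrained: Params, target: Params,
                        blocks: Iterable[str]) -> Set[str]:
    """Copy pre-trained values into `target` for the chosen blocks only.

    A parameter is copied when its name starts with "<block>." and the
    target holds a parameter of the same name and length.
    Returns the set of parameter names that were overwritten.
    """
    prefixes = tuple(f"{block}." for block in blocks)
    copied: Set[str] = set()
    for name, values in pretrained.items():
        if (name.startswith(prefixes) and name in target
                and len(target[name]) == len(values)):
            target[name] = list(values)  # copy, do not alias the source list
            copied.add(name)
    return copied


def frozen_names(target: Params, frozen_blocks: Iterable[str]) -> Set[str]:
    """Names of parameters a fine-tuning loop would skip when updating."""
    prefixes = tuple(f"{block}." for block in frozen_blocks)
    return {name for name in target if name.startswith(prefixes)}


# Illustrative values only; they are not taken from the chapter.
pretrained = {
    "feature_extractor.conv1": [0.5, 0.5],
    "encoder.layer1": [1.0],
    "attention.w": [2.0],
    "decoder.out": [3.0],
}
target = {name: [0.0] * len(values) for name, values in pretrained.items()}

# Transfer only the feature extraction block and the encoder,
# then mark the lowest block as frozen for fine-tuning.
copied = transfer_parameters(pretrained, target, ["feature_extractor", "encoder"])
frozen = frozen_names(target, ["feature_extractor"])
```

In this sketch the choice of `blocks` corresponds to the "different parts of the model" varied in the experiments, and `frozen_names` stands in for the freezing of lower layers. A real system would apply the same idea to the parameter containers of its own toolkit.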
Author_xml	– sequence: 1 givenname: Irina orcidid: 0000-0002-1264-4458 surname: Kipyatkova fullname: Kipyatkova, Irina email: kipyatkova@iias.spb.su
ContentType	Book Chapter
Copyright	Springer Nature Switzerland AG 2022
DEWEY	006.35
Discipline	Computer Science
EISBN	303120980X 9783031209802
EISSN	1611-3349
Editor	Agrawal, Shyam S; Karpov, Alexey; Samudravijaya, K; Prasanna, S. R. Mahadeva
EndPage	357
IsPeerReviewed	true
IsScholarly	true
LCCallNum	Q334-342
Language	English
OCLC	1350790704
ORCID	0000-0002-1264-4458
PageCount	9
PublicationCentury	2000
PublicationDate	2022 20221110
PublicationDateYYYYMMDD	2022-01-01 2022-11-10
PublicationDate_xml	– year: 2022 text: 2022
PublicationDecade	2020
PublicationPlace	Switzerland
PublicationPlace_xml	– name: Switzerland – name: Cham
PublicationSeriesSubtitle	Lecture Notes in Artificial Intelligence
PublicationSeriesTitle	Lecture Notes in Computer Science
PublicationSeriesTitleAlternate	Lect.Notes Computer
PublicationSubtitle	24th International Conference, SPECOM 2022, Gurugram, India, November 14-16, 2022, Proceedings
PublicationTitle	Speech and Computer
PublicationYear	2022
Publisher	Springer International Publishing AG Springer International Publishing
Publisher_xml	– name: Springer International Publishing AG – name: Springer International Publishing
RelatedPersons	Hartmanis, Juris; Gao, Wen; Steffen, Bernhard; Bertino, Elisa; Goos, Gerhard; Yung, Moti
RelatedPersons_xml	– sequence: 1 givenname: Gerhard surname: Goos fullname: Goos, Gerhard – sequence: 2 givenname: Juris surname: Hartmanis fullname: Hartmanis, Juris – sequence: 3 givenname: Elisa surname: Bertino fullname: Bertino, Elisa – sequence: 4 givenname: Wen surname: Gao fullname: Gao, Wen – sequence: 5 givenname: Bernhard orcidid: 0000-0001-9619-1558 surname: Steffen fullname: Steffen, Bernhard – sequence: 6 givenname: Moti orcidid: 0000-0003-0848-0873 surname: Yung fullname: Yung, Moti
StartPage	349
URI	http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=7135366&ppg=366 http://link.springer.com/10.1007/978-3-031-20980-2_30
Volume	13721