Investigation of Transfer Learning for End-to-End Russian Speech Recognition

Bibliographic Details
Published in Speech and Computer Vol. 13721; pp. 349-357
Main Author Kipyatkova, Irina
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 2022
Springer International Publishing
Series Lecture Notes in Computer Science
ISBN 3031209796
9783031209796
ISSN 0302-9743
1611-3349
DOI 10.1007/978-3-031-20980-2_30

Abstract End-to-end speech recognition systems reduce speech decoding time and memory requirements compared to standard systems. However, they need much more training data, which complicates building such systems for low-resourced languages. One way to improve the performance of an end-to-end speech recognition system for a low-resourced language is to pre-train the model by transfer learning, that is, to train the model on non-target data and then transfer the trained parameters to the target model. The aim of this research was to investigate the application of transfer learning to training an end-to-end Russian speech recognition system in low-resourced conditions. We used several speech corpora in different languages for pre-training. The end-to-end model was then fine-tuned on a small Russian speech corpus of 60 hours. We conducted experiments on applying transfer learning to different parts of the model (feature extraction block, encoder, and attention mechanism) as well as on freezing the lower layers. We achieved a 24.53% relative word error rate reduction compared to the baseline system trained without transfer learning.
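
The transfer step described in the abstract (copying pre-trained parameters into selected parts of the target model and freezing the lower layers during fine-tuning) might be sketched as follows. This is only an illustrative sketch, not the chapter's implementation: the parameter names, the block prefixes (`features.`, `encoder.`, `attention.`, `decoder.`), the numeric values, and both helper functions are hypothetical, and plain floats stand in for real weight tensors.

```python
# Hypothetical sketch: names, prefixes, and values are illustrative assumptions,
# not taken from the chapter.

def transfer_parameters(source, target, blocks):
    """Copy parameters whose names start with one of `blocks` from source to target.

    Returns a new dict; neither input is modified.
    """
    prefixes = tuple(blocks)
    result = dict(target)
    for name, value in source.items():
        if name in target and name.startswith(prefixes):
            result[name] = value
    return result


def frozen_names(params, frozen_blocks):
    """Names of parameters that would be excluded from fine-tuning updates."""
    prefixes = tuple(frozen_blocks)
    return sorted(name for name in params if name.startswith(prefixes))


# Toy "models": flat name -> weight mappings standing in for real tensors.
pretrained = {
    "features.conv1": 0.5,
    "encoder.layer1": 0.7,
    "attention.w": 0.9,
    "decoder.out": 0.1,
}
target = {name: 0.0 for name in pretrained}

# Transfer only the feature extraction block and the encoder,
# then mark the lowest block as frozen for fine-tuning.
initialized = transfer_parameters(pretrained, target, blocks=["features.", "encoder."])
frozen = frozen_names(initialized, frozen_blocks=["features."])
```

Changing the `blocks` argument is how one would, under these assumptions, compare transferring different parts of the model, and changing `frozen_blocks` corresponds to varying which lower layers stay fixed.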
Author Kipyatkova, Irina (ORCID 0000-0002-1264-4458; email kipyatkova@iias.spb.su)
ContentType Book Chapter
Copyright Springer Nature Switzerland AG 2022
DEWEY 006.35
Discipline Computer Science
EISBN 303120980X
9783031209802
EISSN 1611-3349
Editor Agrawal, Shyam S
Karpov, Alexey
Samudravijaya, K
Prasanna, S. R. Mahadeva
EndPage 357
IsPeerReviewed true
IsScholarly true
LCCallNum Q334-342
OCLC 1350790704
PageCount 9
PublicationDate 2022-11-10
PublicationPlace Cham, Switzerland
PublicationSeriesSubtitle Lecture Notes in Artificial Intelligence
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect.Notes Computer
PublicationSubtitle 24th International Conference, SPECOM 2022, Gurugram, India, November 14-16, 2022, Proceedings
PublicationTitle Speech and Computer
PublicationYear 2022
Publisher Springer International Publishing AG
Springer International Publishing
RelatedPersons Hartmanis, Juris
Gao, Wen
Steffen, Bernhard
Bertino, Elisa
Goos, Gerhard
Yung, Moti
StartPage 349
SubjectTerms Encoder-decoder
End-to-end speech recognition
Russian speech
Transfer learning
Title Investigation of Transfer Learning for End-to-End Russian Speech Recognition
URI http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=7135366&ppg=366
http://link.springer.com/10.1007/978-3-031-20980-2_30
Volume 13721