Investigation of Transfer Learning for End-to-End Russian Speech Recognition
Published in | Speech and Computer Vol. 13721; pp. 349 - 357 |
---|---|
Main Author | Kipyatkova, Irina |
Format | Book Chapter |
Language | English |
Published | Switzerland: Springer International Publishing AG (Springer International Publishing), 2022 |
Series | Lecture Notes in Computer Science |
Subjects | Encoder-decoder; End-to-end speech recognition; Russian speech; Transfer learning |
ISBN | 3031209796 9783031209796 |
ISSN | 0302-9743 1611-3349 |
DOI | 10.1007/978-3-031-20980-2_30 |
Abstract | End-to-end speech recognition systems reduce speech decoding time and memory requirements compared with standard systems. However, they need much more training data, which complicates building such systems for low-resourced languages. One way to improve the performance of an end-to-end speech recognition system for a low-resourced language is to pre-train the model by transfer learning, that is, to train the model on non-target data and then transfer the trained parameters to the target model. The aim of this research was to investigate the application of transfer learning to training an end-to-end Russian speech recognition system in low-resourced conditions. We used several speech corpora in different languages for pre-training. The end-to-end model was then fine-tuned on a small Russian speech corpus of 60 hours. We conducted experiments on applying transfer learning to different parts of the model (feature extraction block, encoder, and attention mechanism) as well as on freezing the lower layers. We achieved a 24.53% relative word error rate reduction compared with the baseline system trained without transfer learning. |
---|---|
Author | Kipyatkova, Irina |
Author_xml | – sequence: 1 givenname: Irina orcidid: 0000-0002-1264-4458 surname: Kipyatkova fullname: Kipyatkova, Irina email: kipyatkova@iias.spb.su |
ContentType | Book Chapter |
Copyright | Springer Nature Switzerland AG 2022 |
Copyright_xml | – notice: Springer Nature Switzerland AG 2022 |
DEWEY | 006.35 |
Discipline | Computer Science |
EISBN | 303120980X 9783031209802 |
EISSN | 1611-3349 |
Editor | Agrawal, Shyam S; Karpov, Alexey; Samudravijaya, K; Prasanna, S. R. Mahadeva |
Editor_xml | – sequence: 1 fullname: Agrawal, Shyam S – sequence: 2 fullname: Karpov, Alexey – sequence: 3 fullname: Prasanna, S. R. Mahadeva – sequence: 4 fullname: Samudravijaya, K |
EndPage | 357 |
IsPeerReviewed | true |
IsScholarly | true |
LCCallNum | Q334-342 |
Language | English |
OCLC | 1350790704 |
ORCID | 0000-0002-1264-4458 |
PageCount | 9 |
PublicationCentury | 2000 |
PublicationDate | 2022 20221110 |
PublicationDateYYYYMMDD | 2022-01-01 2022-11-10 |
PublicationDate_xml | – year: 2022 text: 2022 |
PublicationDecade | 2020 |
PublicationPlace | Switzerland |
PublicationPlace_xml | – name: Switzerland – name: Cham |
PublicationSeriesSubtitle | Lecture Notes in Artificial Intelligence |
PublicationSeriesTitle | Lecture Notes in Computer Science |
PublicationSeriesTitleAlternate | Lect.Notes Computer |
PublicationSubtitle | 24th International Conference, SPECOM 2022, Gurugram, India, November 14-16, 2022, Proceedings |
PublicationTitle | Speech and Computer |
PublicationYear | 2022 |
Publisher | Springer International Publishing AG; Springer International Publishing |
Publisher_xml | – name: Springer International Publishing AG – name: Springer International Publishing |
RelatedPersons | Hartmanis, Juris; Gao, Wen; Steffen, Bernhard; Bertino, Elisa; Goos, Gerhard; Yung, Moti |
RelatedPersons_xml | – sequence: 1 givenname: Gerhard surname: Goos fullname: Goos, Gerhard – sequence: 2 givenname: Juris surname: Hartmanis fullname: Hartmanis, Juris – sequence: 3 givenname: Elisa surname: Bertino fullname: Bertino, Elisa – sequence: 4 givenname: Wen surname: Gao fullname: Gao, Wen – sequence: 5 givenname: Bernhard orcidid: 0000-0001-9619-1558 surname: Steffen fullname: Steffen, Bernhard – sequence: 6 givenname: Moti orcidid: 0000-0003-0848-0873 surname: Yung fullname: Yung, Moti |
StartPage | 349 |
SubjectTerms | Encoder-decoder; End-to-end speech recognition; Russian speech; Transfer learning |
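
The procedure summarized in the abstract (pre-train on non-target data, transfer the parameters of selected blocks, optionally freeze the lower layers, then fine-tune) can be illustrated with a minimal sketch. This is not the chapter's implementation: the parameter names, the block prefixes (`frontend`, `encoder`, `attention`, `decoder`), the helper functions, and all numbers are hypothetical, and plain Python dictionaries stand in for a real model. The last helper shows how a relative word error rate (WER) reduction is conventionally computed.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical stand-in for model weights: parameter name -> flat list of values.
Params = Dict[str, List[float]]


def transfer_parameters(
    pretrained: Params, target: Params, blocks: Iterable[str]
) -> Tuple[Params, Set[str]]:
    """Copy pre-trained weights into the target model for the chosen blocks only.

    A parameter is copied when its name starts with one of the block prefixes,
    it exists in the target model, and its size matches.
    """
    prefixes = tuple(f"{block}." for block in blocks)
    merged = {name: list(values) for name, values in target.items()}
    transferred: Set[str] = set()
    for name, values in pretrained.items():
        same_shape = name in merged and len(values) == len(merged[name])
        if name.startswith(prefixes) and same_shape:
            merged[name] = list(values)
            transferred.add(name)
    return merged, transferred


def trainable_names(params: Params, frozen_blocks: Iterable[str]) -> Set[str]:
    """Return the parameters that stay trainable when the given blocks are frozen."""
    prefixes = tuple(f"{block}." for block in frozen_blocks)
    return {name for name in params if not name.startswith(prefixes)}


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: 100 * (baseline - new) / baseline."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Illustrative weights only; they are not taken from the chapter.
pretrained = {
    "frontend.conv": [0.5, 0.6],
    "encoder.layer0": [0.1, 0.2],
    "attention.proj": [0.9],
    "decoder.out": [0.3, 0.3, 0.3],  # output layer sized for the source vocabulary
}
target = {
    "frontend.conv": [0.0, 0.0],
    "encoder.layer0": [0.0, 0.0],
    "attention.proj": [0.0],
    "decoder.out": [0.0, 0.0],  # different size for the target vocabulary
}

# Transfer the feature extraction block and the encoder, then freeze the
# feature extraction block (assumed here to be the "lower layers").
merged, transferred = transfer_parameters(pretrained, target, ["frontend", "encoder"])
trainable = trainable_names(merged, ["frontend"])
```

In this sketch the attention and decoder parameters keep their target-model initialization, and only the encoder, attention, and decoder parameters would be updated during fine-tuning on the target corpus. Calling `relative_wer_reduction(40.0, 30.0)` with these made-up values gives 25.0.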