Recent Advances in End-to-End Automatic Speech Recognition

Bibliographic Details
Published in APSIPA Transactions on Signal and Information Processing, Vol. 11, no. 1
Main Author Li, Jinyu
Format Journal Article
Language English
Published Boston — Delft Now Publishers 01.01.2022
Subjects Engineering; Signal Processing; Technology
ISSN 2048-7703
DOI 10.1561/116.00000050

Abstract The speech community is seeing a significant shift from deep neural network based hybrid modeling to end-to-end (E2E) modeling for automatic speech recognition (ASR). While E2E models achieve state-of-the-art ASR accuracy on most benchmarks, hybrid models are still used in a large proportion of commercial ASR systems. Many practical factors affect the decision to deploy a model in production. Traditional hybrid models, having been optimized for production for decades, usually handle these factors well. Unless E2E models offer excellent solutions to all of them, they will be hard to commercialize widely. This paper gives an overview of recent advances in E2E models, focusing on technologies that address those challenges from an industry perspective.
Author Affiliation Microsoft, USA
Copyright 2022 J. Li
License Open access, published under the terms of CC BY-NC (https://creativecommons.org/licenses/by-nc/4.0/)
Keywords End-to-end; automatic speech recognition; streaming; attention; transducer; transformer; adaptation
Open Access Link https://doaj.org/article/42329981e5bb4ac0838ea1e89de7b264
Page Count 64