Recent Advances in End-to-End Automatic Speech Recognition
Published in | APSIPA transactions on signal and information processing Vol. 11; no. 1 |
---|---|
Format | Journal Article |
Language | English |
Published | Boston / Delft: Now Publishers, 01.01.2022 |
ISSN | 2048-7703 |
DOI | 10.1561/116.00000050 |
Abstract | Recently, the speech community has seen a significant trend of moving from deep-neural-network-based hybrid modeling to end-to-end (E2E) modeling for automatic speech recognition (ASR). While E2E models achieve state-of-the-art ASR accuracy on most benchmarks, hybrid models are still used in a large proportion of commercial ASR systems at the current time. Many practical factors affect the decision to deploy a model in production. Traditional hybrid models, having been optimized for production for decades, usually handle these factors well. Unless E2E models provide excellent solutions to all of these factors, it is hard for them to be widely commercialized. In this paper, we overview the recent advances in E2E models, focusing on technologies that address those challenges from the industry's perspective. |
---|---|
Author | Li, Jinyu, Microsoft, USA (jinyli@microsoft.com) |
Copyright | 2022 J. Li |
Discipline | Engineering |
IsOpenAccess | true |
IsPeerReviewed | true |
License | Open access, published under the terms of CC BY-NC: https://creativecommons.org/licenses/by-nc/4.0/ |
Keywords | streaming; transducer; transformer; attention; adaptation; automatic speech recognition; end-to-end |
PageCount | 64 |
Subjects | Engineering; Signal Processing; Technology |
URI | https://doi.org/10.1561/116.00000050 https://doaj.org/article/42329981e5bb4ac0838ea1e89de7b264 |