Automatic speech recognition: a survey

Recently, great strides have been made in the field of automatic speech recognition (ASR) by using various deep learning techniques. In this study, we present a thorough comparison of the cutting-edge techniques currently used in this area, with a special focus on the various deep learning me...

Bibliographic Details
Published in	Multimedia Tools and Applications, Vol. 80, no. 6, pp. 9411-9457
Main Authors	Malik, Mishaim; Malik, Muhammad Kamran; Mehmood, Khawar; Makhdoom, Imran
Format	Journal Article
Language	English
Published	New York Springer US 01.03.2021 Springer Nature B.V
Subjects	Automatic speech recognition; Computer Communication Networks; Computer Science; Data Structures and Information Theory; Deep learning; Feature extraction; Language modeling; Machine learning; Multimedia Information Systems; Special Purpose and Application-Based Systems; Speech recognition; Voice recognition; ASR; Classification models; Language models
Abstract	Recently, great strides have been made in the field of automatic speech recognition (ASR) by using various deep learning techniques. In this study, we present a thorough comparison of the cutting-edge techniques currently used in this area, with a special focus on deep learning methods. The study explores different feature extraction methods and state-of-the-art classification models, and examines their impact on an ASR system. Because deep learning techniques are highly data-dependent, the speech datasets available online are also discussed in detail. Finally, the online toolkits, resources, and language models that can help in building an ASR system are presented. The study aims to cover every aspect that can affect the performance of an ASR system. We therefore believe that this work is a good starting point for academics interested in ASR research.
Author details	1. Malik, Mishaim (ORCID 0000-0002-4917-7144; mishaimmalik30@gmail.com), Punjab University College of Information Technology (PUCIT); 2. Malik, Muhammad Kamran, Faculty of Punjab University College of Information Technology (PUCIT); 3. Mehmood, Khawar, School of Engineering and Information Technology, University of New South Wales (UNSW) Canberra at ADFA; 4. Makhdoom, Imran, Faculty of Engineering and IT, University of Technology Sydney
ContentType	Journal Article
Copyright	Springer Science+Business Media, LLC, part of Springer Nature 2020
DOI	10.1007/s11042-020-10073-7
Discipline	Engineering Computer Science
EISSN	1573-7721
EndPage	9457
ISSN	1380-7501
IsPeerReviewed	true
IsScholarly	true
Issue	6
Keywords	ASR; Feature extraction; Classification models; Language models; Speech recognition; Automatic speech recognition
ORCID	0000-0002-4917-7144
PageCount	47
PublicationDateYYYYMMDD	2021-03-01
PublicationPlace	New York; Dordrecht
PublicationSubtitle	An International Journal
PublicationTitle	Multimedia tools and applications
PublicationTitleAbbrev	Multimed Tools Appl
PublicationYear	2021
Publisher	Springer US; Springer Nature B.V
Getting the best of both worldsBayesian stat2007833242433187 – reference: Islam J, Mubassira M, Islam MR, Das AK (2019) A speech recognition system for Bengali language using recurrent neural network. In 2019 IEEE 4th international conference on computer and communication systems (ICCCS) (pp. 73-76). IEEE – reference: Barker J, Watanabe S, Vincent E, Trmal J (2018) The fifth’CHiME’speech separation and recognition challenge: dataset, task and baselines. arXiv preprint arXiv:1803.10609. – reference: Rybach D, Gollan C, Heigold G, Hoffmeister B, Lööf J, Schlüter R, Ney H (2009) The RWTH Aachen University open source speech recognition system. In Tenth Annual Conference of the International Speech Communication Association – reference: AbeSAnalysis of multiclass support vector machinesThyroid20032133772 – reference: Hou X (2009) Noise robust speech recognition based on wavelet-RBF neural network. In PIAGENG 2009: intelligent information, control, and communication Technology for Agricultural Engineering (Vol. 7490, p. 74902O). International Society for Optics and Photonics – reference: RabinerLLevinsonSIsolated and connected word recognition-theory and selected applicationsIEEE Trans Commun1981295621659 – reference: Nouza J, Zdansky J, Cerva P (2010) System for automatic collection, annotation and indexing of Czech broadcast speech with full-text search. In MELECON 2010–2010 15th IEEE Mediterranean Electrotechnical Conference (pp. 202–205). IEEE – reference: Lee A, Kawahara T, Shikano K (2001) Julius---an open source real-time large vocabulary recognition engine – reference: Sukumar AR, Shah AF, Anto PB (2010) Isolated question words recognition from speech queries by using artificial neural networks. In 2010 second international conference on computing, communication and networking technologies (pp. 1-4). IEEE. 
– reference: ShanthiTSLingamCReview of feature extraction techniques in automatic speech recognitionInt J Sci Eng Technol201326479484 – reference: Zhao Y, Wakita H, Zhuang X (1991) An HMM based speaker-independent continuous speech recognition system with experiments on the TIMIT DATABASE. In acoustics, speech, and signal processing, IEEE international conference on (pp. 333-336). IEEE computer society – reference: Duan KB, Keerthi SS (2005) Which is the best multiclass SVM method? An empirical study. In international workshop on multiple classifier systems (pp. 278-285). Springer, Berlin, Heidelberg – reference: GaikwadSKGawaliBWYannawarPA review on speech recognition techniqueInt J Comput Appl20101031624 – reference: Kesarkar M P (2003) Feature extraction for speech recognition. Electronic systems, EE. Dept., IIT Bombay. – reference: BussoCBulutMLeeCCKazemzadehAMowerEKimSChangJNLeeSNarayananSSIEMOCAP: interactive emotional dyadic motion capture databaseLang Resour Eval2008424335359 – reference: HermanskyHPerceptual linear predictive (PLP) analysis of speech. TheJ Acoust Soc Am199087417381752 – reference: Polikar R (1996) The wavelet tutorial. – reference: BahlLRBrownPFde SouzaPVMercerRLA tree-based statistical language model for natural language speech recognitionIEEE Trans Acoust Speech Signal Process198937710011008 – reference: HermanskyHMorganNRASTA processing of speechIEEE Trans Speech Audio Process199424578589 – reference: Toshniwal S, Sainath T N, Weiss R J, Li B, Moreno P, Weinstein E, Rao K (2018) Multilingual speech recognition with a single end-to-end model. In 2018 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4904-4908). IEEE. – reference: Jung S, Son J, Bae K (2004) Feature extraction based on wavelet domain hidden Markov tree model for robust speech recognition. In Australasian joint conference on artificial intelligence (pp. 1154-1159). Springer, Berlin, Heidelberg. 
– reference: Gamulkiewicz B, Weeks M (2003) Wavelet based speech recognition. In 2003 46th Midwest symposium on circuits and systems (Vol. 2, pp. 678-681). IEEE. – reference: Rousseau A, Deléglise P, Esteve Y (2012) TED-LIUM: an automatic speech recognition dedicated corpus. In LREC (pp. 125-129). – reference: YegnanarayanaBVeldhuisRNExtraction of vocal-tract system characteristics from speech signalsIEEE Trans Speech Audio Process199864313327 – reference: O'ShaughnessyDLinear predictive codingIEEE potentials1988712932 – reference: Rosenblatt F (1961). Principles of neurodynamics. Perceptrons and the theory of brain mechanisms (no. VG-1196-G-8). Cornell aeronautical lab Inc Buffalo NY – reference: Makino T, Liao H, Assael Y, Shillingford B, Garcia B, Braga O, Siohan O (2019) Recurrent neural network transducer for audio-visual speech recognition. In 2019 IEEE automatic speech recognition and understanding workshop (ASRU) (pp. 905-912). IEEE – reference: NeheNSHolambeRSDWT and LPC based feature extraction methods for isolated word recognitionEURASIP J Audio Speech Music Process2012201217 – reference: Singh MT, Fayjie AR, Kachari B (2015) A survey report on speech recognition system. Int J Comput Appl 121(11) – reference: TrentinEGoriMA survey of hybrid ANN/HMM models for automatic speech recognitionNeurocomputing2001371–4911260963.68651 – reference: Umarani SD, Raviram P, Wahidabanu RSD (2009) Implementation of HMM and radial basis function for speech recognition. In 2009 international conference on Intelligent Agent & Multi-Agent Systems (pp. 1-4). IEEE – reference: KrishnanVVAntoPBFeatures of wavelet packet decomposition and discrete wavelet transform for malayalam speech recognitionInt J Recent Trends Eng20091293 – reference: Alkhaldi W, Fakhr W, Hamdy N (2002) Automatic speech/speaker recognition in noisy environments using wavelet transform, The 2002 45th Midwest Symposium on Circuits and Systems, 2002. MWSCAS-2002., Tulsa, OK, USA, pp. 
I-463, doi: https://doi.org/10.1109/MWSCAS.2002.1187258. – reference: Coifman R R, Meyer Y, Wickerhauser V (1992) Wavelet analysis and signal processing. In In Wavelets and their applications. – reference: Chang T H, Luo Z Q, Deng L, Chi C Y (2008) A convex optimization method for joint mean and variance parameter estimation of large-margin CDHMM. In 2008 IEEE international conference on acoustics, speech and signal processing (pp. 4053-4056). IEEE. – reference: RadhaVVimalaCA review on speech recognition challenges and approachesDoaj Org20122117 – reference: BirkenesOMatsuiTTanabeKSiniscalchiSMMyrvollTAJohnsenMHPenalized logistic regression with HMM log-likelihood regressors for speech recognitionIEEE Trans Audio Speech Lang Process200918614401454 – reference: Kriman S, Beliaev S, Ginsburg B, Huang J, Kuchaiev O, Lavrukhin V, ..., Zhang Y (2020) Quartznet: Deep automatic speech recognition with 1d time-channel separable convolutions. In ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 6124–6128). IEEE – reference: HemakumarGPunithaPSpeech recognition technology: a survey on Indian languagesInt J Inf Sci Intell Syst201324138 – reference: Rabiner L, Juang B H (1993) Fundamental of speech recognition prentice-hall international. – reference: Weston J, Watkins C (1998) Multi-class support vector machines (pp. 98-04). Technical report CSD-TR-98-04, Department of Computer Science, Royal Holloway, University of London, may – reference: KaurPSinghPGargVSpeech recognition system; challenges and techniquesInt J Comput Sci Inf Technol20123339893992 – reference: Cutajar M, Gatt E, Micallef J, Grech I, Casha O (2010) Digital hardware implementation of self-organising maps. In Melecon 2010-2010 15th IEEE Mediterranean Electrotechnical conference (pp. 1123-1128). IEEE – reference: Cheng O, Abdulla W, Salcic Z (2005) Performance evaluation of front-end processing for speech recognition systems. The University of Auckland. 
– reference: Dansena D K, Rathore Y A Survey Paper on Automatic Speech Recognition by Machine – reference: Juang B H, Rabiner L R (2005) Automatic speech recognition–a brief history of the technology development. Georgia Institute of Technology. Atlanta Rutgers University and the University of California. Santa Barbara, 1, 67. – reference: Sonkamble BA, Doye DD, Sonkamble S, PICT P, MMCOE P (2009) An efficient use of support vector machines for speech signal classification. In Proc eighth WSEAS Int Conf computational intelligence., man-machine systems and cybernetics (pp. 117-120) – reference: Du X P, He P L (2006) The clustering solution of speech recognition models with SOM. In international symposium on neural networks (pp. 150-157). Springer, Berlin, Heidelberg. – reference: HungJWFanHTSubband feature statistics normalization techniques based on a discrete wavelet transform for robust speech recognitionIEEE Signal Process Lett20091698068092572421 – reference: ThubthongNKijsirikulBSupport vector machines for Thai phoneme recognitionInt J Uncertainty Fuzziness Knowledge Based Syst20019068038131113.68474 – reference: Helmi N, Helmi BH (2008) Speech recognition with fuzzy neural network for discrete words. In 2008 fourth international conference on natural computation (Vol. 7, pp. 265-269). IEEE – reference: Vapnik V (2013) The nature of statistical learning theory. Springer science & business media – reference: O'ShaughnessyDInteracting with computers by voice: automatic speech recognition and synthesisProc IEEE200391912721305 – reference: SakoeHChibaSDynamic programming algorithm optimization for spoken word recognitionIEEE Trans Acoust Speech Signal Process197826143490371.68035 – reference: Li T F, Chang S C (2007) Speech recognition of mandarin syllables using both linear predict coding cepstra and Mel frequency cepstra. In ROCLING 2007 poster papers (pp. 379-390). 
– reference: Tóth L (2011) A hierarchical, context-dependent neural network architecture for improved phone recognition. In 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 5040–5043). IEEE – reference: BatuwitaRPaladeVFSVM-CIL: fuzzy support vector machines for class imbalance learningIEEE Trans Fuzzy Syst2010183558571 – reference: Tavanaei A, Manzuri M T, Sameti H (2011) Mel-scaled discrete wavelet transform and dynamic features for the Persian phoneme recognition. In 2011 international symposium on artificial intelligence and signal processing (AISP) (pp. 138-140). IEEE. – reference: AnusuyaMAKattiSKFront end analysis of speech recognition: a reviewInt J Speech Technol201114299145 – reference: VadwalaAYSutharKAKarmakarYAPandyaNSurvey paper on different speech recognition algorithm: challenges and techniquesInt J Comput Appl201717513136 – reference: BaumLEEagonJAAn inequality with applications to statistical estimation for probabilistic functions of Markov processes and to a model for ecologyBull Am Math Soc19677333603632102170157.11101 – reference: Lin CT (1996) Neural fuzzy systems: a neuro-fuzzy synergism to intelligent systems. Prentice hall PTR – reference: Lowerre BT (1976) The HARPY speech recognition system. CARNEGIE-MELLON UNIV PITTSBURGH PA DEPT OF COMPUTER SCIENCE – reference: Wang B, Yin Y, Lin H (2020) Attention-based transducer for online speech recognition. arXiv preprint arXiv:2005.08497 – reference: Tang X (2009) Hybrid hidden Markov model and artificial neural network for automatic speech recognition. In 2009 Pacific-Asia conference on circuits, communications and systems (pp. 682-685). IEEE. 
– reference: NeheNSHolambeRSNew feature extraction techniques for Marathi digit recognitionInt J Recent Trends Eng20092222 – reference: ZamaniBAkbariANasersharifBJalalvandAOptimized discriminative transformations for speech features based on minimum classification errorPattern Recogn Lett2011327948955 – reference: Hu X, Zhan L, Xue Y, Zhou W, Zhang L (2011) Spoken arabic digits recognition based on wavelet neural networks. In 2011 IEEE international conference on systems, man, and cybernetics (pp. 1481-1485). IEEE. – reference: Khan A, Sohail A, Zahoora U, Qureshi AS (2020) A survey of the recent architectures of deep convolutional neural networks. Artif Intell Rev, 1–62 – reference: Hennebert J, Hasler M, Dedieu H (1994) Neural networks in speech recognition. Department of Electrical Engineering, Swiss Federal Institute of Technology, 1015. – reference: Sivaram GS, Hermansky H (2011) Multilayer perceptron with sparse hidden outputs for phoneme recognition. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5336-5339). IEEE – reference: Hunt A, Favero R (1994) Using principal component analysis with wavelets in speech recognition. In SST Conf., ASSTA Inc., Perth (pp. 296-301). – reference: VeisiHSametiHThe integration of principal component analysis and cepstral mean subtraction in parallel model combination for robust speech recognitionDigital Signal Process20112113653 – reference: Sak H, Senior A, Rao K, Beaufays F (2015) Fast and accurate recurrent neural network acoustic models for speech recognition. arXiv preprint arXiv:1507.06947. – reference: HuangXBakerJReddyRA historical perspective of speech recognitionCommun ACM201457194103 – reference: MporasIGanchevTSiafarikasMFakotakisNComparison of speech features on the speech recognition taskJ Comput Sci200738608616 – reference: Rosenfeld R, Huang X (1992) Improvements in stochastic language modeling. 
In Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992 – reference: DavisKHBiddulphRBalashekSAutomatic recognition of spoken digitsJ Acoust Soc Am1952246637642 – reference: BesacierLBarnardEKarpovASchultzTAutomatic speech recognition for under-resourced languages: a surveySpeech Comm20145685100 – reference: Köhn A, Stegen F, Baumann T (2016) Mining the spoken wikipedia for speech data and beyond. In proceedings of the tenth international conference on language resources and evaluation (LREC’16) (pp. 4644-4647). – reference: Dumitru C O, Gavat I (2006) A comparative study of feature extraction methods applied to continuous speech recognition in romanian language. In proceedings ELMAR 2006 (pp. 115-118). IEEE. – reference: PingZLi-ZhenTDong-FengXSpeech recognition algorithm of parallel subband HMM based on wavelet analysis and neural networkInf Technol J200985796800 – reference: Hai J, Joo E M (2003) Improved linear predictive coding method for speech recognition. In fourth international conference on information, communications and signal processing, 2003 and the fourth Pacific rim conference on multimedia. Proceedings of the 2003 joint (Vol. 3, pp. 1614-1618). IEEE. – reference: JiangHLiXLiuCLarge margin hidden Markov models for speech recognitionIEEE Trans Audio Speech Lang Process200614515841595 – reference: JuangBHRabinerLRHidden Markov models for speech recognitionTechnometrics199133325127211326650762.62036 – reference: Rosenfeld R (1994) A hybrid approach to adaptive statistical language modeling. 
CARNEGIE-MELLON UNIV PITTSBURGH PA SCHOOL OF COMPUTER SCIENCE – reference: RabinerLRA tutorial on hidden Markov models and selected applications in speech recognitionProc IEEE1989772257286 – reference: WangDWangXLvSEnd-to-end mandarin speech recognition combining CNN and BLSTMSymmetry2019115644 – reference: Hannun A, Case C, Casper J, Catanzaro B, Diamos G, Elsen E, ..., Ng A Y (2014) Deep speech: Scaling up end-to-end speech recognition. arXiv preprint arXiv:1412.5567. – reference: ForgieJWForgieCDResults obtained from a vowel recognition computer programJ Acoust Soc Am1959311114801489 – reference: Chen C P, Bilmes J, Ellis D P (2005) Speech feature smoothing for robust ASR. In proceedings.(ICASSP'05). IEEE international conference on acoustics, speech, and signal processing, 2005. (Vol. 1, pp. I-525). IEEE. – reference: Illina I, Gong Y (1996) Improvement in N-best search for continuous speech recognition. In proceeding of fourth international conference on spoken language processing. ICSLP'96 (Vol. 4, pp. 2147-2150). IEEE – reference: Sabah R, Ainon RN (2009) Isolated digit speech recognition in Malay language using neuro-fuzzy approach. In 2009 third Asia international conference on Modelling & Simulation (pp. 336-340). IEEE – reference: Chiu, C. C., Sainath, T. N., Wu, Y., Prabhavalkar, R., Nguyen, P., Chen, Z., ..., Jaitly, N. (2018) State-of-the-art speech recognition with sequence-to-sequence models. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4774–4778). IEEE. – reference: Sárosi G, Mozsáry M, Mihajlik P, Fegyó T (2011) Comparison of feature extraction methods for speech recognition in noise-free and in traffic noise environment. In 2011 6th conference on speech technology and human-computer dialogue (SpeD) (pp. 1-8). IEEE. – reference: Garofolo JS (1993) TIMIT acoustic phonetic continuous speech corpus. 
Linguist Data Consortium 1993 – reference: LekshmiKRElizabethSAutomatic speech recognition using different neural network architectures – a surveyInt J Comput Sci Inf Technol20167624222427 – reference: Forsberg M (2003) Why is speech recognition difficult. Chalmers University of Technology. – reference: MaheswariNUKabilanAPVenkateshRA hybrid model of neural network approach for speaker independent word recognitionInt J Comput Theory Eng201026912 – reference: Paul AK, Das D, Kamal MM (2009) Bangla speech recognition system using LPC and ANN. In 2009 seventh international conference on advances in pattern recognition (pp. 171-174). IEEE – reference: Chow YL, Schwartz R (1989) The n-best algorithm: an efficient procedure for finding top n sentence hypotheses. In proceedings of the workshop on speech and natural language (pp. 199-202). Association for Computational Linguistics – reference: Lazli L, Sellami M (2003) Connectionist probability estimators in HMM arabic speech recognition using fuzzy logic. In international workshop on machine learning and data Mining in Pattern Recognition (pp. 379-388). Springer, Berlin, Heidelberg. – reference: NguyenPHeigoldGZweigGSpeech recognition with flat direct modelsIEEE J Sel Top Sign Proces2010469941006 – reference: CortesCVapnikVSupport-vector networksMach Learn19952032732970831.68098 – reference: Venkateswarlu R L K, Kumari R V (2011) Novel approach for speech recognition by using self—organized maps. In 2011 international conference on emerging trends in networks and computer communications (ETNCC) (pp. 215-222). IEEE. – reference: MehlaRAggarwalRAutomatic speech recognition: a surveyInt J Adv Res Comput Sci Electron Eng (IJARCSEE)2014314553 – reference: Weston J, Watkins C (1999) Support vector machines for multi-class pattern recognition. In Esann (Vol. 99, pp. 
219-224) – reference: Korba M C A, Messadeg D, Djemili R, Bourouba H (2008) Robust speech recognition using perceptual wavelet denoising and mel-frequency product spectrum cepstral coefficient features. Informatica, 32(3). – reference: LinCFWangSDFuzzy support vector machinesIEEE Trans Neural Netw2002132464471 – reference: Pallett DS, Fiscus JG, Garofolo JS (1990) DARPA resource management. In speech and natural language: proceedings of a workshop held at Hidden Valley, Pennsylvania, June 24-27, 1990 (p. 298). Morgan Kaufmann pub – reference: Messaoud Z B, Hamida A B (2010) CDHMM parameters selection for speaker-independent phone recognition in continuous speech system. In MELECON 2010-2010 15th IEEE Mediterranean Electrotechnical conference (pp. 253-258). IEEE. – reference: Sha F, Saul LK (2007) Large margin hidden Markov models for automatic speech recognition. In advances in neural information processing systems (pp. 1249-1256) – reference: WangYHanKWangDExploring monaural features for classification-based speech segregationIEEE Trans Audio Speech Lang Process2012212270279 – reference: YuHXieTPaszczynskiSWilamowskiBMAdvantages of radial basis function networks for dynamic system designIEEE Trans Ind Electron2011581254385450 – reference: Graves A, Fernández S, Gomez F, Schmidhuber J (2006) Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In proceedings of the 23rd international conference on machine learning (pp. 369-376) – reference: Chow Y, Dunham M, Kimball O, Krasner M, Kubala G, Makhoul J, ..., Schwartz R (1987) BYBLOS: The BBN continuous speech recognition system. In ICASSP'87. IEEE International Conference on Acoustics, Speech, and Signal Processing (Vol. 12, pp. 89–92). IEEE – reference: Veaux C, Yamagishi J, MacDonald K (2016) Superseded-cstr vctk corpus: English multi-speaker corpus for cstr voice cloning toolkit. 
– reference: VelichkoVMZagoruykoNGAutomatic recognition of 200 wordsInt J Man Mach Stud197023223234 – reference: SaeedTRSalmanJAliAHClassification improvement of spoken arabic language based on radial basis functionInt J Electr Comput Eng20199120888708 – reference: MallatSGA theory for multiresolution signal decomposition: the wavelet representationIEEE Trans Pattern Anal Mach Intell19891176746930709.94650 – reference: PaulsonLDSpeech recognition moves from software to hardwareComputer200639111518 – reference: Wijoyo S, Wijoyo S (2011) Speech recognition using linear predictive coding and artificial neural network for controlling movement of mobile robot. In proceedings of 2011 international conference on information and electronics engineering (ICIEE 2011) (pp. 28-29). – reference: Krüger SE, Schafföner M, Katz M, Andelic E, Wendemuth A (2005) Speech recognition with support vector machines in a hybrid system. In Ninth European Conference on Speech Communication and Technology – reference: CutajarMGattEGrechICashaOMicallefJComparative study of automatic speech recognition techniquesIET Signal Proc2013712546 – reference: Molau S, Pitz M, Schluter R, Ney H (2001) Computing mel-frequency cepstral coefficients on the power spectrum. In 2001 IEEE international conference on acoustics, speech, and signal processing. Proceedings (cat. No. 01CH37221) (Vol. 1, pp. 73-76). IEEE. – reference: Solera-Ureña R, Padrell-Sendra J, Martín-Iglesias D, Gallardo-Antolín A, Peláez-Moreno C, Díaz-de-María F (2007) Svms for automatic speech recognition: a survey. In Progress in nonlinear speech processing (pp. 190–216). Springer, Berlin, Heidelberg – reference: Fontaine V, Ris C, Leich H (1996) Nonlinear discriminant analysis with neural networks for speech recognition. In 1996 8th European signal processing conference (EUSIPCO 1996) (pp. 1-4). IEEE. 
– reference: Leung K F, Leung F H, Lam H K, Tam P K S (2003) Recognition of speech commands using a modified neural fuzzy network and an improved GA. In the 12th IEEE international conference on fuzzy systems, 2003. FUZZ’03. (Vol. 1, pp. 190-195). IEEE. – reference: Chan W, Jaitly N, Le Q, Vinyals O (2016) Listen, attend and spell: a neural network for large vocabulary conversational speech recognition. In 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 4960-4964). IEEE. – reference: KohonenTSelf-organized formation of topologically correct feature mapsBiol Cybern198243159696678890466.92002 – reference: Lawrence R (2008) Fundamentals of speech recognition. Pearson Education India. – reference: Clarkson P, Moreno PJ (1999) On the use of support vector machines for phonetic classification. In 1999 IEEE international conference on acoustics, speech, and signal processing. Proceedings. ICASSP99 (cat. No. 99CH36258) (Vol. 2, pp. 585-588). IEEE – reference: Sayers C (1991). Self organizing feature maps and their applications to robotics – reference: Abdulla W H, Kasabov N (1999) The concepts of hidden Markov model in speech recognition. – reference: Povey D, Ghoshal A, Boulianne G, Burget L, Glembek O, Goel N, ..., Silovsky J (2011) The Kaldi speech recognition toolkit. In IEEE 2011 workshop on automatic speech recognition and understanding (No. CONF). IEEE Signal Process Soc – reference: Collobert R, Puhrsch C, Synnaeve G (2016) Wav2letter: an end-to-end convnet-based speech recognition system. arXiv preprint arXiv:1609.03193. – reference: Lamere P, Kwok P, Gouvea E, Raj B, Singh R, Walker W, ..., Wolf P (2003) The CMU SPHINX-4 speech recognition system. In IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP 2003), Hong Kong (Vol. 1, pp. 2–5) – reference: Mohamadpour M, Farokhi F (2009) A new approach for Persian speech recognition. In 2009 IEEE international advance computing conference (pp. 153-158). 
IEEE – reference: Meyer Y (1993) Wavelets: Algorithms and Applications, SIAM, Philadelphia, 1993. MR 95f, 94005. – reference: ShewalkarANyavanandiDLudwigSAPerformance evaluation of deep neural networks applied to speech recognition: RNN, LSTM and GRUJ Artif Intel Soft Comput Res201994235245 – reference: Atmaja BT, Akagi M (2020) Deep multilayer Perceptrons for dimensional speech emotion recognition. arXiv preprint arXiv:2004.02355. – reference: Modic R, Lindberg B, Petek B (2003) Comparative wavelet and mfcc speech recognition experiments on the slovenian and english speechdat2. In ISCA tutorial and research workshop on non-linear speech processing – reference: Lee J Y, Hung J W (2011) Exploiting principal component analysis in modulation spectrum enhancement for robust speech recognition. In 2011 eighth international conference on fuzzy systems and knowledge discovery (FSKD) (Vol. 3, pp. 1947-1951). IEEE. – reference: HuangXAllevaFHonHWHwangMYLeeKFRosenfeldRThe SPHINX-II speech recognition system: an overviewComput Speech Lang199372137148 – reference: Bu H, Du J, Na X, Wu B, Zheng H (2017). Aishell-1: an open-source mandarin speech corpus and a speech recognition baseline. In 2017 20th conference of the oriental chapter of the international coordinating committee on speech databases and speech I/O systems and assessment (O-COCOSDA) (pp. 1-5). IEEE. – reference: Campos MM, Carpenter GA (1998) WSOM: building adaptive wavelets with self-organizing maps. In 1998 IEEE international joint conference on neural networks proceedings. IEEE world congress on computational intelligence (cat. No. 98CH36227) (Vol. 1, pp. 763-767). IEEE – reference: FriedmanJHAnother approach to polychotomous classification1996Technical ReportStatistics Department, Stanford University – reference: HardyRLMultiquadric equations of topography and other irregular surfacesJ Geophys Res197176819051915 – reference: Gupta M, Gilbert A (2001) Robust speech recognition using wavelet coefficient features. 
In IEEE workshop on automatic speech recognition and understanding, 2001. ASRU'01. (pp. 445-448). IEEE. – reference: Morgan N, Bourlard H (1990). Continuous speech recognition using multilayer perceptrons with hidden Markov models. In international conference on acoustics, speech, and signal processing (pp. 413-416). IEEE – reference: Halabi N (2016) Modern standard arabic phonetics for speech synthesis (Doctoral dissertation, University of Southampton). – reference: AnusuyaMAKattiSKComparison of different speech feature extraction techniques with and without wavelet transform to Kannada speech recognitionInt J Comput Appl20112641924 – reference: Ranjan S (2010) A discrete wavelet transform based approach to Hindi speech recognition. In 2010 international conference on signal acquisition and processing (pp. 345-348). IEEE. – reference: Bourlard H A, Morgan N (2012). Connectionist speech recognition: a hybrid approach (Vol. 247). Springer Science & Business Media. – reference: O’ShaughnessyDAutomatic speech recognition: history, methods and challengesPattern Recogn20084110296529791161.68772 – reference: Liu X (2009) A new wavelet threshold denoising algorithm in speech recognition. In 2009 Asia-Pacific conference on information processing (Vol. 2, pp. 310-313). IEEE. 
– reference: CrouseMSNowakRDBaraniukRGWavelet-based statistical signal processing using hidden Markov modelsIEEE Trans Signal Process19984648869021665651 – ident: 10073_CR8 – ident: 10073_CR174 – volume: 18 start-page: 558 issue: 3 year: 2010 ident: 10073_CR9 publication-title: IEEE Trans Fuzzy Syst doi: 10.1109/TFUZZ.2010.2042721 – ident: 10073_CR40 – volume: 12 start-page: 7 issue: 37 year: 2008 ident: 10073_CR100 publication-title: Revista Iberoamericana de Inteligencia Artificial – ident: 10073_CR19 – ident: 10073_CR24 doi: 10.3115/1075434.1075467 – ident: 10073_CR111 doi: 10.1109/MELCON.2010.5476306 – ident: 10073_CR95 – ident: 10073_CR72 – ident: 10073_CR34 – volume: 81 start-page: 1215 issue: 9 year: 1993 ident: 10073_CR119 publication-title: Proc IEEE doi: 10.1109/5.237532 – volume: 2 start-page: 1 issue: 1 year: 2012 ident: 10073_CR126 publication-title: Doaj Org – volume: 18 start-page: 1440 issue: 6 year: 2009 ident: 10073_CR13 publication-title: IEEE Trans Audio Speech Lang Process doi: 10.1109/TASL.2009.2035151 – volume: 14 start-page: 99 issue: 2 year: 2011 ident: 10073_CR4 publication-title: Int J Speech Technol doi: 10.1007/s10772-010-9088-7 – volume: 37 start-page: 91 issue: 1–4 year: 2001 ident: 10073_CR157 publication-title: Neurocomputing doi: 10.1016/S0925-2312(00)00308-8 – ident: 10073_CR65 doi: 10.21437/ICSLP.1996-544 – ident: 10073_CR141 doi: 10.7551/mitpress/7503.003.0161 – ident: 10073_CR133 doi: 10.1109/AMS.2009.101 – volume: 2 start-page: 1 issue: 4 year: 2013 ident: 10073_CR53 publication-title: Int J Inf Sci Intell Syst – volume: 6 start-page: 313 issue: 4 year: 1998 ident: 10073_CR176 publication-title: IEEE Trans Speech Audio Process doi: 10.1109/89.701359 – ident: 10073_CR98 – volume: 76 start-page: 1905 issue: 8 year: 1971 ident: 10073_CR51 publication-title: J Geophys Res doi: 10.1029/JB076i008p01905 – volume: 73 start-page: 360 issue: 3 year: 1967 ident: 10073_CR10 publication-title: Bull Am Math Soc doi: 
10.1090/S0002-9904-1967-11751-8 – volume: 7 start-page: 137 issue: 2 year: 1993 ident: 10073_CR61 publication-title: Comput Speech Lang doi: 10.1006/csla.1993.1007 – ident: 10073_CR131 – ident: 10073_CR45 – volume: 14 start-page: 1519 issue: 6 year: 2003 ident: 10073_CR158 publication-title: IEEE Trans Neural Netw doi: 10.1109/TNN.2003.820838 – ident: 10073_CR124 – volume: 8 start-page: 3 issue: 3 year: 2007 ident: 10073_CR11 publication-title: Bayesian stat – ident: 10073_CR17 doi: 10.1109/IJCNN.1998.682377 – ident: 10073_CR162 – ident: 10073_CR14 – ident: 10073_CR25 doi: 10.1109/ICASSP.1999.759734 – ident: 10073_CR128 – ident: 10073_CR49 – ident: 10073_CR73 doi: 10.1007/s10462-020-09825-6 – ident: 10073_CR139 doi: 10.1109/SPED.2011.5940729 – ident: 10073_CR23 doi: 10.1109/ICASSP.1987.1169748 – ident: 10073_CR172 – ident: 10073_CR18 doi: 10.1109/ICASSP.2016.7472621 – volume: 31 start-page: 1480 issue: 11 year: 1959 ident: 10073_CR39 publication-title: J Acoust Soc Am doi: 10.1121/1.1907653 – volume: 2012 start-page: 7 issue: 1 year: 2012 ident: 10073_CR109 publication-title: EURASIP J Audio Speech Music Process doi: 10.1186/1687-4722-2012-7 – volume: 16 start-page: 806 issue: 9 year: 2009 ident: 10073_CR63 publication-title: IEEE Signal Process Lett doi: 10.1109/LSP.2009.2024113 – volume: 29 start-page: 621 issue: 5 year: 1981 ident: 10073_CR125 publication-title: IEEE Trans Commun doi: 10.1109/TCOM.1981.1095031 – volume: 42 start-page: 335 issue: 4 year: 2008 ident: 10073_CR16 publication-title: Lang Resour Eval doi: 10.1007/s10579-008-9076-6 – ident: 10073_CR48 doi: 10.1109/ICICS.2003.1292740 – ident: 10073_CR148 doi: 10.1007/978-3-540-71505-4_11 – volume: 32 start-page: 948 issue: 7 year: 2011 ident: 10073_CR178 publication-title: Pattern Recogn Lett doi: 10.1016/j.patrec.2011.01.017 – volume: 14 start-page: 1584 issue: 5 year: 2006 ident: 10073_CR67 publication-title: IEEE Trans Audio Speech Lang Process doi: 10.1109/TASL.2006.879805 – ident: 10073_CR69 – 
volume: 37 start-page: 1001 issue: 7 year: 1989 ident: 10073_CR7 publication-title: IEEE Trans Acoust Speech Signal Process doi: 10.1109/29.32278 – volume: 2 start-page: 578 issue: 4 year: 1994 ident: 10073_CR56 publication-title: IEEE Trans Speech Audio Process doi: 10.1109/89.326616 – ident: 10073_CR159 doi: 10.1109/IAMA.2009.5228022 – ident: 10073_CR66 doi: 10.1109/CCOMS.2019.8821629 – volume: 91 start-page: 1272 issue: 9 year: 2003 ident: 10073_CR114 publication-title: Proc IEEE doi: 10.1109/JPROC.2003.817117 – ident: 10073_CR145 doi: 10.1109/ICASSP.2011.5947563 – volume: 2 start-page: 912 issue: 6 year: 2010 ident: 10073_CR93 publication-title: Int J Comput Theory Eng doi: 10.7763/IJCTE.2010.V2.262 – ident: 10073_CR117 doi: 10.1109/ICAPR.2009.80 – volume: 87 start-page: 1738 issue: 4 year: 1990 ident: 10073_CR55 publication-title: J Acoust Soc Am doi: 10.1121/1.399423 – ident: 10073_CR129 doi: 10.21236/ADA458711 – ident: 10073_CR106 doi: 10.1109/IJCNN.2006.247398 – volume: 3 start-page: 608 issue: 8 year: 2007 ident: 10073_CR105 publication-title: J Comput Sci doi: 10.3844/jcssp.2007.608.616 – ident: 10073_CR153 doi: 10.1109/AISP.2011.5960989 – ident: 10073_CR166 doi: 10.1109/ICECTECH.2011.5941788 – ident: 10073_CR27 – volume: 77 start-page: 257 issue: 2 year: 1989 ident: 10073_CR123 publication-title: Proc IEEE doi: 10.1109/5.18626 – volume: 21 start-page: 3772 issue: 3 year: 2003 ident: 10073_CR2 publication-title: Thyroid – ident: 10073_CR156 doi: 10.1109/ICASSP.2011.5947489 – ident: 10073_CR15 doi: 10.1109/ICSDA.2017.8384449 – ident: 10073_CR81 – volume: 1 start-page: 44 issue: 4 year: 2003 ident: 10073_CR167 publication-title: J Syst Cybern Inform – volume: 39 start-page: 15 issue: 11 year: 2006 ident: 10073_CR118 publication-title: Computer doi: 10.1109/MC.2006.401 – ident: 10073_CR46 doi: 10.1145/1143844.1143891 – volume: 13 start-page: 464 issue: 2 year: 2002 ident: 10073_CR90 publication-title: IEEE Trans Neural Netw doi: 10.1109/72.991432 – ident: 
10073_CR22 doi: 10.1109/ICASSP.2018.8462105 – volume: 175 start-page: 31 issue: 1 year: 2017 ident: 10073_CR160 publication-title: Int J Comput Appl – ident: 10073_CR89 – ident: 10073_CR91 doi: 10.1109/APCIP.2009.212 – ident: 10073_CR80 doi: 10.3115/100964.101006 – ident: 10073_CR3 doi: 10.1109/MWSCAS.2002.1187258 – ident: 10073_CR64 – ident: 10073_CR92 – ident: 10073_CR116 doi: 10.1109/ICASSP.2015.7178964 – ident: 10073_CR50 – ident: 10073_CR102 doi: 10.1109/IADCC.2009.4808998 – ident: 10073_CR132 doi: 10.21437/Interspeech.2009-604 – volume: 20 start-page: 23 issue: 1 year: 2011 ident: 10073_CR146 publication-title: IEEE Trans Audio Speech Lang Process doi: 10.1109/TASL.2011.2129510 – volume: 1 start-page: 93 issue: 2 year: 2009 ident: 10073_CR78 publication-title: Int J Recent Trends Eng – ident: 10073_CR6 – volume: 21 start-page: 36 issue: 1 year: 2011 ident: 10073_CR163 publication-title: Digital Signal Process doi: 10.1016/j.dsp.2010.07.004 – ident: 10073_CR175 doi: 10.1109/ICASSP.1995.479276 – ident: 10073_CR122 – volume: 20 start-page: 273 issue: 3 year: 1995 ident: 10073_CR28 publication-title: Mach Learn – volume: 11 start-page: 674 issue: 7 year: 1989 ident: 10073_CR96 publication-title: IEEE Trans Pattern Anal Mach Intell doi: 10.1109/34.192463 – ident: 10073_CR35 doi: 10.1007/11760023_23 – ident: 10073_CR79 doi: 10.21437/Interspeech.2005-237 – ident: 10073_CR82 – ident: 10073_CR99 – ident: 10073_CR76 – ident: 10073_CR47 – ident: 10073_CR43 doi: 10.1109/MWSCAS.2003.1562377 – ident: 10073_CR83 doi: 10.1007/3-540-45065-3_33 – ident: 10073_CR31 doi: 10.1109/MELCON.2010.5476361 – ident: 10073_CR38 – volume: 2 start-page: 22 issue: 2 year: 2009 ident: 10073_CR108 publication-title: Int J Recent Trends Eng – volume: 26 start-page: 19 issue: 4 year: 2011 ident: 10073_CR5 publication-title: Int J Comput Appl – ident: 10073_CR70 doi: 10.1007/978-3-540-30549-1_116 – ident: 10073_CR161 – ident: 10073_CR85 doi: 10.21437/Eurospeech.2001-396 – ident: 10073_CR37 doi: 
10.1109/ELMAR.2006.329528 – volume: 57 start-page: 94 issue: 1 year: 2014 ident: 10073_CR62 publication-title: Commun ACM doi: 10.1145/2500887 – ident: 10073_CR151 doi: 10.1109/PACCS.2009.138 – ident: 10073_CR1 – volume: 26 start-page: 43 issue: 1 year: 1978 ident: 10073_CR138 publication-title: IEEE Trans Acoust Speech Signal Process doi: 10.1109/TASSP.1978.1163055 – volume: 9 start-page: 235 issue: 4 year: 2019 ident: 10073_CR143 publication-title: J Artif Intel Soft Comput Res doi: 10.2478/jaiscr-2019-0006 – ident: 10073_CR144 doi: 10.5120/21581-4672 – volume: 58 start-page: 5438 issue: 12 year: 2011 ident: 10073_CR177 publication-title: IEEE Trans Ind Electron doi: 10.1109/TIE.2011.2164773 – ident: 10073_CR94 doi: 10.1109/ASRU46091.2019.9004036 – ident: 10073_CR130 doi: 10.3115/1075527.1075552 – volume: 21 start-page: 270 issue: 2 year: 2012 ident: 10073_CR168 publication-title: IEEE Trans Audio Speech Lang Process doi: 10.1109/TASL.2012.2221459 – ident: 10073_CR152 doi: 10.1109/ICASSP.2010.5495097 – ident: 10073_CR173 – ident: 10073_CR135 – ident: 10073_CR77 doi: 10.1109/ICASSP40776.2020.9053889 – ident: 10073_CR21 – volume: 11 start-page: 644 issue: 5 year: 2019 ident: 10073_CR170 publication-title: Symmetry doi: 10.3390/sym11050644 – volume: 33 start-page: 251 issue: 3 year: 1991 ident: 10073_CR68 publication-title: Technometrics doi: 10.1080/00401706.1991.10484833 – ident: 10073_CR52 doi: 10.1109/ICNC.2008.666 – volume: 3 start-page: 45 issue: 1 year: 2014 ident: 10073_CR97 publication-title: Int J Adv Res Comput Sci Electron Eng (IJARCSEE) – ident: 10073_CR87 – volume: 2 start-page: 223 issue: 3 year: 1970 ident: 10073_CR164 publication-title: Int J Man Mach Stud doi: 10.1016/S0020-7373(70)80008-6 – ident: 10073_CR149 – ident: 10073_CR136 doi: 10.21437/Interspeech.2019-1341 – ident: 10073_CR179 doi: 10.1109/ICASSP.1991.150344 – volume: 9 start-page: 2088 issue: 1 year: 2019 ident: 10073_CR134 publication-title: Int J Electr Comput Eng – volume: 43 
start-page: 59 issue: 1 year: 1982 ident: 10073_CR75 publication-title: Biol Cybern doi: 10.1007/BF00337288 – ident: 10073_CR147 doi: 10.1007/978-0-387-76569-3_1 – volume-title: Another approach to polychotomous classification year: 1996 ident: 10073_CR41 – ident: 10073_CR150 doi: 10.1109/ICCCNT.2010.5591733 – volume: 7 start-page: 25 issue: 1 year: 2013 ident: 10073_CR30 publication-title: IET Signal Proc doi: 10.1049/iet-spr.2012.0151 – ident: 10073_CR74 – ident: 10073_CR104 doi: 10.1109/ICASSP.1990.115720 – volume: 2 start-page: 479 issue: 6 year: 2013 ident: 10073_CR142 publication-title: Int J Sci Eng Technol – ident: 10073_CR26 – volume: 56 start-page: 85 year: 2014 ident: 10073_CR12 publication-title: Speech Comm doi: 10.1016/j.specom.2013.07.008 – ident: 10073_CR58 doi: 10.1117/12.836711 – volume: 9 start-page: 803 issue: 06 year: 2001 ident: 10073_CR154 publication-title: Int J Uncertainty Fuzziness Knowledge Based Syst doi: 10.1142/S0218488501001253 – volume: 7 start-page: 29 issue: 1 year: 1988 ident: 10073_CR113 publication-title: IEEE potentials doi: 10.1109/45.1890 – ident: 10073_CR88 – ident: 10073_CR84 doi: 10.1109/FSKD.2011.6019893 – volume: 10 start-page: 16 issue: 3 year: 2010 ident: 10073_CR42 publication-title: Int J Comput Appl – ident: 10073_CR121 – ident: 10073_CR32 – ident: 10073_CR103 doi: 10.1109/ICASSP.2001.940770 – ident: 10073_CR140 – ident: 10073_CR54 – volume: 24 start-page: 637 issue: 6 year: 1952 ident: 10073_CR33 publication-title: J Acoust Soc Am doi: 10.1121/1.1906946 – ident: 10073_CR115 doi: 10.3115/116580.116683 – ident: 10073_CR127 doi: 10.1109/ICSAP.2010.21 – ident: 10073_CR20 doi: 10.1109/ICASSP.2005.1415166 – ident: 10073_CR60 doi: 10.1109/ICSMC.2011.6083880 – ident: 10073_CR171 – volume: 46 start-page: 886 issue: 4 year: 1998 ident: 10073_CR29 publication-title: IEEE Trans Signal Process doi: 10.1109/78.668544 – volume: 13 start-page: 820 issue: 6 year: 2005 ident: 10073_CR169 publication-title: IEEE Trans Fuzzy Syst 
doi: 10.1109/TFUZZ.2005.859320 – volume: 7 start-page: 2422 issue: 6 year: 2016 ident: 10073_CR86 publication-title: Int J Comput Sci Inf Technol – ident: 10073_CR107 doi: 10.1109/NCC.2011.5734729 – ident: 10073_CR165 doi: 10.1109/ETNCC.2011.5958519 – volume: 125 start-page: EL8 issue: 1 year: 2009 ident: 10073_CR44 publication-title: J Acoust Soc Am doi: 10.1121/1.3040022 – volume: 13 start-page: 415 issue: 2 year: 2002 ident: 10073_CR59 publication-title: IEEE Trans Neural Netw doi: 10.1109/72.991427 – volume: 4 start-page: 994 issue: 6 year: 2010 ident: 10073_CR110 publication-title: IEEE J Sel Top Sign Proces doi: 10.1109/JSTSP.2010.2080812 – ident: 10073_CR155 doi: 10.1109/ICASSP.2018.8461972 – ident: 10073_CR57 doi: 10.1109/ICASSP.1992.225957 – ident: 10073_CR36 doi: 10.1007/11494683_28 – volume: 3 start-page: 3989 issue: 3 year: 2012 ident: 10073_CR71 publication-title: Int J Comput Sci Inf Technol – ident: 10073_CR101 – ident: 10073_CR137 doi: 10.21437/Interspeech.2015-350 – volume: 8 start-page: 796 issue: 5 year: 2009 ident: 10073_CR120 publication-title: Inf Technol J doi: 10.3923/itj.2009.796.800 – volume: 41 start-page: 2965 issue: 10 year: 2008 ident: 10073_CR112 publication-title: Pattern Recogn doi: 10.1016/j.patcog.2008.05.008
StartPage	9411
Title	Automatic speech recognition: a survey
URI	https://link.springer.com/article/10.1007/s11042-020-10073-7 https://www.proquest.com/docview/2494716846
Volume	80