A multi-tasking model of speaker-keyword classification for keeping human in the loop of drone-assisted inspection
Published in | Engineering applications of artificial intelligence Vol. 117; p. 105597 |
---|---|
Main Authors | Li, Yu; Parsan, Anisha; Wang, Bill; Dong, Penghao; Yao, Shanshan; Qin, Ruwen |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.01.2023 |
Subjects | Human-in-the-loop; Human–robot interaction; Infrastructure inspection; Keyword classification; Speaker recognition |
ISSN | 0952-1976 1873-6769 |
DOI | 10.1016/j.engappai.2022.105597 |
Abstract | Audio commands are a preferred communication medium to keep inspectors in the loop of civil infrastructure inspection performed by a semi-autonomous drone. To understand job-specific commands from a group of heterogeneous and dynamic inspectors, a model must be developed cost-effectively for the group and easily adapted when the group changes. This paper is motivated to build a multi-tasking deep learning model that possesses a Share–Split–Collaborate architecture. This architecture allows the two classification tasks to share the feature extractor and then split the subject-specific and keyword-specific features intertwined in the extracted features through feature projection and collaborative training. A base model for a group of five authorized subjects is trained and tested on the inspection keyword dataset collected by this study. The model achieved a 95.3% or higher mean accuracy in classifying the keywords of any authorized inspector, and a 99.2% mean accuracy in speaker classification. Thanks to the richer keyword representations that the model learns from the pooled training data, adapting the base model to a new inspector requires only a little training data from that inspector, such as five utterances per keyword. Using the speaker classification scores for inspector verification achieves a success rate of at least 93.9% in verifying authorized inspectors and 76.1% in detecting unauthorized ones. Further, the paper demonstrates the applicability of the proposed model to larger groups on a public dataset. This paper thus provides a solution to challenges facing AI-assisted human–robot interaction, including worker heterogeneity, worker dynamics, and job heterogeneity. |
• The Share–Split–Collaborate multitask learning architecture is suitable for speaker-keyword classification.
• Subject-specific and phonetic-specific features intertwined in audio data can be disentangled.
• Rich keyword representations are learned from multi-subject spoken command data.
• Small data of new speakers are sufficient for adding new classes to the speaker classifier.
• Speaker classification scores are also effective for speaker verification. |
---|---|
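The Share–Split–Collaborate architecture summarized in the abstract can be pictured as a shared feature extractor followed by two projection branches, one per task. The sketch below is illustrative only (NumPy, random untrained weights, assumed dimensions such as 40 input features, five speakers, and ten keywords); it is not the authors' implementation, which additionally relies on collaborative training to disentangle the two feature subspaces.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class ShareSplitModel:
    """Toy Share-Split-Collaborate sketch: one shared feature extractor,
    two projections that split the shared features into speaker-specific
    and keyword-specific subspaces, and one softmax head per task.
    All dimensions are illustrative assumptions."""

    def __init__(self, d_in=40, d_shared=64, d_split=32,
                 n_speakers=5, n_keywords=10):
        self.W_shared = rng.normal(0, 0.1, (d_in, d_shared))    # shared extractor
        self.P_spk = rng.normal(0, 0.1, (d_shared, d_split))    # speaker projection
        self.P_kw = rng.normal(0, 0.1, (d_shared, d_split))     # keyword projection
        self.W_spk = rng.normal(0, 0.1, (d_split, n_speakers))  # speaker head
        self.W_kw = rng.normal(0, 0.1, (d_split, n_keywords))   # keyword head

    def forward(self, x):
        h = np.maximum(x @ self.W_shared, 0.0)  # shared features (ReLU)
        h_spk = h @ self.P_spk                  # split: subject-specific part
        h_kw = h @ self.P_kw                    # split: keyword-specific part
        return softmax(h_spk @ self.W_spk), softmax(h_kw @ self.W_kw)

model = ShareSplitModel()
x = rng.normal(size=(3, 40))        # 3 fake utterance feature vectors
p_spk, p_kw = model.forward(x)
print(p_spk.shape, p_kw.shape)      # (3, 5) (3, 10)
```

In the paper's setting the two softmax heads correspond to speaker classification and keyword classification, and the projection branches are what separate subject-specific from keyword-specific information in the shared features.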
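The inspector-verification step described in the abstract reuses the speaker-classification scores: an utterance is attributed to an authorized inspector only when its best speaker score clears a decision threshold; otherwise the speaker is flagged as unauthorized. A minimal sketch of that decision rule follows (the 0.8 threshold is an assumed illustrative value, not the paper's tuned operating point):

```python
def verify_speaker(scores, threshold=0.8):
    """scores: per-speaker classification scores for one utterance.
    Returns (accepted, best_speaker_index). The 0.8 threshold is an
    illustrative assumption, not the paper's tuned value."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return scores[best] >= threshold, best

# A confident score profile is accepted as an authorized speaker ...
print(verify_speaker([0.02, 0.91, 0.03, 0.02, 0.02]))  # (True, 1)
# ... while a flat profile (no speaker clears the bar) is rejected.
print(verify_speaker([0.22, 0.18, 0.21, 0.20, 0.19]))  # (False, 0)
```

In practice the threshold trades off the two rates the abstract reports: raising it catches more unauthorized speakers at the cost of rejecting more authorized ones.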
ArticleNumber | 105597 |
Author | Wang, Bill; Dong, Penghao; Yao, Shanshan; Li, Yu; Qin, Ruwen; Parsan, Anisha |
Author_xml | 1. Li, Yu (ORCID 0000-0002-7245-0284; yu.li.5@stonybrook.edu), Department of Civil Engineering, Stony Brook University, 2427 Computer Science Building, Stony Brook, 11794, NY, United States; 2. Parsan, Anisha (ORCID 0000-0001-5999-9857), Department of Civil Engineering, Stony Brook University; 3. Wang, Bill, Department of Civil Engineering, Stony Brook University; 4. Dong, Penghao (ORCID 0000-0001-8975-3911), Department of Mechanical Engineering, Stony Brook University, 161 Light Engineering Building, Stony Brook, 11794, NY, United States; 5. Yao, Shanshan (ORCID 0000-0002-2076-162X), Department of Mechanical Engineering, Stony Brook University; 6. Qin, Ruwen (ORCID 0000-0003-2656-8705; ruwen.qin@stonybrook.edu), Department of Civil Engineering, Stony Brook University |
ContentType | Journal Article |
Copyright | 2022 Elsevier Ltd |
Discipline | Applied Sciences Computer Science |
EISSN | 1873-6769 |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | Infrastructure inspection; Human–robot interaction; Speaker recognition; Keyword classification; Human-in-the-loop |
ORCID | 0000-0003-2656-8705 0000-0001-8975-3911 0000-0001-5999-9857 0000-0002-2076-162X 0000-0002-7245-0284 |
StartPage | 105597 |
SubjectTerms | Human-in-the-loop Human–robot interaction Infrastructure inspection Keyword classification Speaker recognition |
URI | https://dx.doi.org/10.1016/j.engappai.2022.105597 |
Volume | 117 |