A multi-tasking model of speaker-keyword classification for keeping human in the loop of drone-assisted inspection

Bibliographic Details
Published in Engineering Applications of Artificial Intelligence, Vol. 117, p. 105597
Main Authors: Li, Yu; Parsan, Anisha; Wang, Bill; Dong, Penghao; Yao, Shanshan; Qin, Ruwen
Format Journal Article
Language English
Published Elsevier Ltd 01.01.2023
ISSN 0952-1976
EISSN 1873-6769
DOI 10.1016/j.engappai.2022.105597

Abstract Audio commands are a preferred communication medium to keep inspectors in the loop of civil infrastructure inspection performed by a semi-autonomous drone. To understand job-specific commands from a group of heterogeneous and dynamic inspectors, a model must be developed cost-effectively for the group and easily adapted when the group changes. This paper is motivated to build a multi-tasking deep learning model that possesses a Share–Split–Collaborate architecture. This architecture allows the two classification tasks to share the feature extractor and then split the subject-specific and keyword-specific features intertwined in the extracted features through feature projection and collaborative training. A base model for a group of five authorized subjects is trained and tested on the inspection keyword dataset collected by this study. The model achieved a 95.3% or higher mean accuracy in classifying the keywords of any authorized inspector, and a mean accuracy of 99.2% in speaker classification. Thanks to the richer keyword representations that the model learns from the pooled training data, adapting the base model to a new inspector requires only a little training data from that inspector, such as five utterances per keyword. Using the speaker classification scores for inspector verification achieves a success rate of at least 93.9% in verifying authorized inspectors and 76.1% in detecting unauthorized ones. Further, the paper demonstrates the applicability of the proposed model to larger groups on a public dataset. This paper thus provides a solution to challenges facing AI-assisted human–robot interaction, including worker heterogeneity, worker dynamics, and job heterogeneity.
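The Share–Split–Collaborate flow described in the abstract can be sketched as a tiny forward pass: a shared feature extractor feeds two projection branches that split speaker-specific from keyword-specific features, each ending in its own softmax head. All layer shapes, dimensions, and the NumPy stand-ins below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative dimensions (not from the paper): e.g. 40 acoustic
# features per utterance, 5 authorized speakers, 10 keywords.
D_IN, D_SHARED, D_SPLIT = 40, 64, 32
N_SPEAKERS, N_KEYWORDS = 5, 10

# Shared feature extractor (stands in for the real backbone network).
W_shared = rng.standard_normal((D_IN, D_SHARED)) * 0.1

# Two projection matrices "split" the shared features into
# subject-specific and keyword-specific subspaces.
W_spk_proj = rng.standard_normal((D_SHARED, D_SPLIT)) * 0.1
W_kw_proj = rng.standard_normal((D_SHARED, D_SPLIT)) * 0.1

# Task-specific classifier heads, trained collaboratively.
W_spk_head = rng.standard_normal((D_SPLIT, N_SPEAKERS)) * 0.1
W_kw_head = rng.standard_normal((D_SPLIT, N_KEYWORDS)) * 0.1

def forward(x):
    h = relu(x @ W_shared)           # shared representation
    spk_feat = relu(h @ W_spk_proj)  # speaker branch
    kw_feat = relu(h @ W_kw_proj)    # keyword branch
    return softmax(spk_feat @ W_spk_head), softmax(kw_feat @ W_kw_head)

x = rng.standard_normal((3, D_IN))   # a batch of 3 utterance features
p_spk, p_kw = forward(x)
print(p_spk.shape, p_kw.shape)       # (3, 5) (3, 10)
```

One forward pass yields both a speaker distribution and a keyword distribution per utterance, which is what lets the same model serve command understanding and inspector identification at once.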
•The Share–Split–Collaborate multitask learning architecture is suitable for speaker-keyword classification.
•Subject-specific and phonetic-specific features intertwined in audio data can be disentangled.
•Rich keyword representations are learned from multi-subject spoken command data.
•Small data of new speakers are sufficient for adding new classes to the speaker classifier.
•Speaker classification scores are also effective for speaker verification.
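The last highlight, reusing classification scores for verification, amounts to thresholding the softmax score of the claimed identity. A minimal sketch follows; the function name and threshold value are hypothetical, not taken from the paper.

```python
def verify(scores, claimed_id, threshold=0.5):
    """Accept the claimed speaker only if the classifier's softmax
    score for that identity clears the threshold; a low score is
    treated as an unauthorized (out-of-group) speaker."""
    return scores[claimed_id] >= threshold

# Illustrative softmax scores over five authorized speakers.
scores = [0.02, 0.90, 0.03, 0.03, 0.02]
print(verify(scores, claimed_id=1))  # True: the claimed identity scores high
print(verify(scores, claimed_id=0))  # False: an impostor claiming speaker 0
```

The threshold trades off the two rates reported in the abstract: raising it detects more unauthorized speakers at the cost of rejecting more authorized ones.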
ArticleNumber 105597
Author Wang, Bill
Dong, Penghao
Yao, Shanshan
Li, Yu
Qin, Ruwen
Parsan, Anisha
Author_xml – sequence: 1
  givenname: Yu
  orcidid: 0000-0002-7245-0284
  surname: Li
  fullname: Li, Yu
  email: yu.li.5@stonybrook.edu
  organization: Department of Civil Engineering, Stony Brook University, 2427 Computer Science Building, Stony Brook, 11794, NY, United States
– sequence: 2
  givenname: Anisha
  orcidid: 0000-0001-5999-9857
  surname: Parsan
  fullname: Parsan, Anisha
  organization: Department of Civil Engineering, Stony Brook University, 2427 Computer Science Building, Stony Brook, 11794, NY, United States
– sequence: 3
  givenname: Bill
  surname: Wang
  fullname: Wang, Bill
  organization: Department of Civil Engineering, Stony Brook University, 2427 Computer Science Building, Stony Brook, 11794, NY, United States
– sequence: 4
  givenname: Penghao
  orcidid: 0000-0001-8975-3911
  surname: Dong
  fullname: Dong, Penghao
  organization: Department of Mechanical Engineering, Stony Brook University, 161 Light Engineering Building, Stony Brook, 11794, NY, United States
– sequence: 5
  givenname: Shanshan
  orcidid: 0000-0002-2076-162X
  surname: Yao
  fullname: Yao, Shanshan
  organization: Department of Mechanical Engineering, Stony Brook University, 161 Light Engineering Building, Stony Brook, 11794, NY, United States
– sequence: 6
  givenname: Ruwen
  orcidid: 0000-0003-2656-8705
  surname: Qin
  fullname: Qin, Ruwen
  email: ruwen.qin@stonybrook.edu
  organization: Department of Civil Engineering, Stony Brook University, 2427 Computer Science Building, Stony Brook, 11794, NY, United States
CitedBy_id 10.1002/admt.202400990
10.1016/j.patrec.2023.10.022
10.1016/j.eswa.2023.123099
10.1039/D3MH01062G
ContentType Journal Article
Copyright 2022 Elsevier Ltd
Copyright_xml – notice: 2022 Elsevier Ltd
Discipline Applied Sciences
Computer Science
EISSN 1873-6769
ISSN 0952-1976
IsPeerReviewed true
IsScholarly true
Keywords Infrastructure inspection
Human–robot interaction
Speaker recognition
Keyword classification
Human-in-the-loop
Language English
ORCID 0000-0003-2656-8705
0000-0001-8975-3911
0000-0001-5999-9857
0000-0002-2076-162X
0000-0002-7245-0284
PublicationCentury 2000
PublicationDate January 2023
PublicationDateYYYYMMDD 2023-01-01
PublicationDate_xml – month: 01
  year: 2023
  text: January 2023
PublicationDecade 2020
PublicationTitle Engineering applications of artificial intelligence
PublicationYear 2023
Publisher Elsevier Ltd
Publisher_xml – name: Elsevier Ltd
StartPage 105597
URI https://dx.doi.org/10.1016/j.engappai.2022.105597
Volume 117