Semi-Supervised Training of Transformer and Causal Dilated Convolution Network with Applications to Speech Topic Classification

Bibliographic Details
Published in Applied Sciences, Vol. 11, no. 12, p. 5712
Main Authors Zeng, Jinxiang; Zhang, Du; Li, Zhiyi; Li, Xiaolin
Format Journal Article
Language English
Published Basel: MDPI AG, 01.06.2021
Abstract To address the audio event recognition problem in speech recognition, a decision fusion method based on the Transformer and Causal Dilated Convolutional Network (TCDCN) framework is proposed. The method can model sound events over long time spans, capture their temporal correlation, and deal effectively with the sparsity of audio data. The dataset consists of audio clips cropped from YouTube. To identify audio topics reliably and stably, we compare different features and different loss functions to find the best model configuration. Experimental results across the tested models show that the proposed TCDCN model achieves better recognition results than neural-network classifiers and other fusion methods.
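The abstract names two building blocks, causal dilated convolution and decision fusion, without giving their details. The following is a minimal sketch of both ideas in plain Python, not the authors' implementation: the function names, the zero left-padding, and the weighted-average fusion rule are all assumptions made for illustration, since the record does not state how TCDCN combines its branches.

```python
def causal_dilated_conv1d(signal, kernel, dilation=1):
    """Causal dilated 1-D convolution (illustrative, not the paper's code).

    output[t] = sum_k kernel[k] * signal[t - k * dilation]
    Samples before the start of the signal are treated as zero, so
    output[t] never depends on any input later than t.
    """
    output = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # implicit zero padding on the left
                acc += weight * signal[idx]
        output.append(acc)
    return output


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output of a stack of causal layers."""
    return 1 + (kernel_size - 1) * sum(dilations)


def fuse_decisions(probs_a, probs_b, weight_a=0.5):
    """Hypothetical decision fusion: weighted average of two class-probability vectors."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(probs_a, probs_b)]
```

With a kernel size of 2 and dilations 1, 2, 4, 8, four stacked layers cover 16 input samples, which is how dilation lets a shallow stack capture correlation over a long time span.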
Author_xml – sequence: 1
  givenname: Jinxiang
  orcidid: 0000-0002-5218-2542
  surname: Zeng
  fullname: Zeng, Jinxiang
– sequence: 2
  givenname: Du
  surname: Zhang
  fullname: Zhang, Du
– sequence: 3
  givenname: Zhiyi
  orcidid: 0000-0001-6407-2554
  surname: Li
  fullname: Li, Zhiyi
– sequence: 4
  givenname: Xiaolin
  orcidid: 0000-0001-7612-7612
  surname: Li
  fullname: Li, Xiaolin
ContentType Journal Article
Copyright 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.3390/app11125712
Discipline Engineering
Sciences (General)
EISSN 2076-3417
ISSN 2076-3417
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 12
Language English
LinkModel DirectLink
OpenAccessLink https://doaj.org/article/ed8661760e2149de8fe6c065223db76b
PublicationDate 2021-06-01
PublicationDateYYYYMMDD 2021-06-01
PublicationDate_xml – month: 06
  year: 2021
  text: 2021-06-01
  day: 01
PublicationDecade 2020
PublicationPlace Basel
PublicationPlace_xml – name: Basel
PublicationTitle Applied sciences
PublicationYear 2021
Publisher MDPI AG
Publisher_xml – name: MDPI AG
StartPage 5712
SubjectTerms Accuracy
Acoustics
Audio data
automatic speech recognition
Classification
Convolution
Deep learning
Neural networks
Noise
semi-supervised learning
semi-supervised training
Speech
Speech recognition
topic classification
Transformer and Causal Dilated Convolution Network
Voice recognition
Title Semi-Supervised Training of Transformer and Causal Dilated Convolution Network with Applications to Speech Topic Classification
URI https://www.proquest.com/docview/2544957228
https://doaj.org/article/ed8661760e2149de8fe6c065223db76b
Volume 11