Semi-Supervised Training of Transformer and Causal Dilated Convolution Network with Applications to Speech Topic Classification
Published in | Applied sciences Vol. 11; no. 12; p. 5712 |
---|---|
Main Authors | Zeng, Jinxiang; Zhang, Du; Li, Zhiyi; Li, Xiaolin |
Format | Journal Article |
Language | English |
Published | Basel: MDPI AG, 01.06.2021 |
Subjects | |
Abstract | To address the audio event recognition problem in speech recognition, we propose a decision fusion method based on the Transformer and Causal Dilated Convolutional Network (TCDCN) framework. The method can model sound events over long time spans, capture their temporal correlation, and deal effectively with the sparsity of audio data. Our dataset consists of audio clips cropped from YouTube. To identify audio topics reliably and stably, we try different features and different loss function calculations to find the best model configuration. Experimental results across the tested models show that the proposed TCDCN model achieves better recognition results than classification with neural networks and other fusion methods. |
Author | Li, Xiaolin; Zhang, Du; Li, Zhiyi; Zeng, Jinxiang |
Author_xml | 1. Zeng, Jinxiang (ORCID 0000-0002-5218-2542); 2. Zhang, Du; 3. Li, Zhiyi (ORCID 0000-0001-6407-2554); 4. Li, Xiaolin (ORCID 0000-0001-7612-7612) |
CitedBy_id | crossref_primary_10_1109_ACCESS_2023_3318015 crossref_primary_10_3390_app12125984 |
ContentType | Journal Article |
Copyright | 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.3390/app11125712 |
Discipline | Engineering Sciences (General) |
EISSN | 2076-3417 |
ISSN | 2076-3417 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 12 |
Language | English |
ORCID | 0000-0001-7612-7612 0000-0001-6407-2554 0000-0002-5218-2542 |
OpenAccessLink | https://doaj.org/article/ed8661760e2149de8fe6c065223db76b |
PublicationDate | 2021-06-01 |
PublicationPlace | Basel |
PublicationTitle | Applied sciences |
PublicationYear | 2021 |
Publisher | MDPI AG |
SubjectTerms | Accuracy Acoustics Audio data automatic speech recognition Classification Convolution Deep learning Neural networks Noise semi-supervised learning semi-supervised training Speech Speech recognition topic classification Transformer and Causal Dilated Convolution Network Voice recognition |
SummonAdditionalLinks | – databaseName: ProQuest Technology Collection dbid: 8FG link: http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwfV1LT9tAEF5BeoFDVV4iLa32wAEOFtn1Y9eniqYNUSW4JEjcLO_uLESithsnXPnrnXE2EFSJo72WbHlmdr95fcPYqQSf2MzKyKSowUmpTWQSV0Y6ITSurAND_c7XN9n4Nvl9l96FgFsbyirXe2K3UbvaUoz8gqi08lRJqb83fyOaGkXZ1TBCY5t9EFIpKunSo6uXGAtxXmoxWLXlxejdU1YYjRvVVMg3B1HH1__fdtydMaNP7GMAh_xyJc09tgXVPtvdoAzcZ3vBGFt-Fhijzw_Y8wT-zKLJsiHDb8HxaRj8wGvPp2toCnNeVo4Py2WLL_k5e0SYiZd19RTUj9-sasI5BWf55UZqmy9qPmkA7AOf1s3M8m6WJlUZdeuH7Hb0azocR2GyQmRlniwi4eNc52WZ6QEolZbSauNBCHBCglCaOhOcUS6XShvrNfU8SasMCPC5cnF8xHpVXcEx4-jhlJlBEBnrNIltYjSK2isgrjnltO-z0_VvLpoVgUaBjgdJo9iQRp_9IBG8PEKs192Nen5fBCMqwGmEEyobgETHzoH2kFnEUAhx8Gsz02cnawEWwRTb4lVxPr-__IXtSCpY6UIsJ6y3mC_hKyKOhfnWqdU_QDXX_g priority: 102 providerName: ProQuest |
Title | Semi-Supervised Training of Transformer and Causal Dilated Convolution Network with Applications to Speech Topic Classification |
URI | https://www.proquest.com/docview/2544957228 https://doaj.org/article/ed8661760e2149de8fe6c065223db76b |
Volume | 11 |
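
Note on the abstract: it names a causal dilated convolution as one building block of TCDCN, but this record gives no architectural details. The sketch below is a generic, textbook-style 1-D causal dilated convolution in plain Python, offered only to illustrate the term. It is not the authors' implementation; the function name `causal_dilated_conv1d`, the kernel values, and the dilation of 2 are all illustrative assumptions.

```python
def causal_dilated_conv1d(x, kernel, dilation=1):
    """Generic 1-D causal dilated convolution (illustrative only).

    Computes y[t] = sum_k kernel[k] * x[t - k * dilation], treating samples
    before the start of the sequence as zero, so y[t] never depends on
    samples later than t.
    """
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation  # look back k * dilation steps
            if idx >= 0:            # implicit zero padding on the left
                acc += w * x[idx]
        y.append(acc)
    return y


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
out = causal_dilated_conv1d(signal, kernel=[1.0, 1.0], dilation=2)
print(out)  # [1.0, 2.0, 4.0, 6.0, 8.0, 10.0]
```

With a two-tap kernel and dilation 2, each output is the current sample plus the sample two steps earlier. Stacking such layers with growing dilation is the usual way this kind of network widens its receptive field over long sequences, which is presumably what the abstract means by modeling sound events over long time spans.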