Knowledge Transferability Between the Speech Data of Persons With Dysarthria Speaking Different Languages for Dysarthric Speech Recognition
Published in | IEEE Access, Vol. 7, pp. 164320-164326 |
---|---|
Main Authors | Takashima, Yuki; Takashima, Ryoichi; Takiguchi, Tetsuya; Ariki, Yasuo |
Format | Journal Article |
Language | English |
Published | Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2019 |
ISSN | 2169-3536 |
DOI | 10.1109/ACCESS.2019.2951856 |
Abstract | In this paper, we present an end-to-end speech recognition system for Japanese persons with articulation disorders resulting from athetoid cerebral palsy. Because their utterance is often unstable or unclear, speech recognition systems struggle to recognize their speech. Recent deep learning-based approaches have exhibited promising performance. However, these approaches require a large amount of training data, and it is difficult to collect sufficient data from such dysarthric people. This paper proposes a transfer learning method that transfers two types of knowledge corresponding to the different datasets: the language-dependent (phonetic and linguistic) characteristic of unimpaired speech and the language-independent characteristic of dysarthric speech. The former is obtained from Japanese non-dysarthric speech data, and the latter is obtained from non-Japanese dysarthric speech data. In the proposed method, we pre-train a model using Japanese non-dysarthric speech and non-Japanese dysarthric speech, and thereafter, we fine-tune the model using the target Japanese dysarthric speech. To handle the speech data of the two different languages in one model, we employ language-specific decoder modules. Experimental results indicate that our proposed approach can significantly improve speech recognition performance compared with other approaches that do not use additional speech data. |
Author | Takashima, Yuki (ORCID 0000-0001-8489-9487, takashima@kobe-u.ac.jp); Takashima, Ryoichi (ORCID 0000-0002-9808-0250); Takiguchi, Tetsuya (ORCID 0000-0001-5005-7679); Ariki, Yasuo (ORCID 0000-0003-3473-2026) |
Affiliation | Graduate School of System Informatics, Kobe University, Kobe, Japan (all four authors) |
CODEN | IAECCG |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2019 |
Discipline | Engineering |
EISSN | 2169-3536 |
Genre | orig-research |
GrantInformation | Japan Society for the Promotion of Science, grant JP17J04380 (funder ID 10.13039/501100001691) |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
License | https://creativecommons.org/licenses/by/4.0/legalcode |
OpenAccessLink | https://doaj.org/article/57867bac0f8249a8abebb5aa8cf8cd56 |
SubjectTerms | Acoustics; Assistive technology; Data models; Decoding; deep learning; dysarthria; end-to-end model; Hidden Markov models; Knowledge management; knowledge transfer; Languages; Machine learning; multilingual; Speech; speech processing; Speech recognition; Voice recognition |
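The two-stage scheme summarized in the abstract (pre-training on Japanese non-dysarthric and non-Japanese dysarthric speech, then fine-tuning on Japanese dysarthric speech, with language-specific decoder modules) can be illustrated with a deliberately tiny sketch. This is not the authors' model: the scalar "encoder" and "decoder" weights, the `ToyModel` class, the language tags `"ja"` and `"other"`, the learning rate, and the toy data are all hypothetical stand-ins chosen only to show how a shared encoder and per-language decoders are routed during the two stages.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Toy stand-in: one shared encoder weight, one decoder weight per language."""
    encoder: float = 0.5
    decoders: dict = field(default_factory=lambda: {"ja": 0.5, "other": 0.5})

    def predict(self, x: float, lang: str) -> float:
        # The shared encoder output is passed to the decoder chosen by the language tag.
        return self.decoders[lang] * (self.encoder * x)

    def sgd_step(self, x: float, y: float, lang: str, lr: float = 0.05) -> None:
        # One gradient step on 0.5 * (prediction - y)^2.
        # Both gradients are computed before either weight is changed.
        hidden = self.encoder * x
        err = self.decoders[lang] * hidden - y
        grad_dec = err * hidden
        grad_enc = err * self.decoders[lang] * x
        self.decoders[lang] -= lr * grad_dec  # only the selected decoder moves
        self.encoder -= lr * grad_enc         # the encoder is shared by all languages


def train(model: ToyModel, batches, epochs: int = 20) -> None:
    for _ in range(epochs):
        for lang, x, y in batches:
            model.sgd_step(x, y, lang)


# Hypothetical toy data: (language tag, input, target).
pretrain = [("ja", 1.0, 1.0),      # stands in for Japanese non-dysarthric speech
            ("other", 1.0, 2.0)]   # stands in for non-Japanese dysarthric speech
finetune = [("ja", 1.0, 1.5)]      # stands in for the target Japanese dysarthric speech

model = ToyModel()
train(model, pretrain)                               # stage 1: pre-training
other_after_pretrain = model.decoders["other"]
encoder_after_pretrain = model.encoder
error_before = abs(model.predict(1.0, "ja") - 1.5)

train(model, finetune)                               # stage 2: fine-tuning
error_after = abs(model.predict(1.0, "ja") - 1.5)
```

In this sketch the encoder receives updates from both source datasets during pre-training, while fine-tuning touches only the encoder and the `"ja"` decoder; the `"other"` decoder is left as it was after stage 1.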