End-to-End Pathological Speech Detection Using Wavelet Scattering Network
Published in | IEEE signal processing letters Vol. 29; pp. 1863 - 1867 |
---|---|
Main Authors | Reddy, Mittapalle Kiran; Keerthana, Yagnavajjula Madhu; Alku, Paavo |
Format | Journal Article |
Language | English |
Published | New York: IEEE, 2022; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
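Subjects | Artificial neural networks; CNN; Convolutional neural networks; Feature extraction; MFCC; MP3 compression; Multilayer perceptrons; Multilayers; openSMILE features; pathological speech; Pathology; Scattering; Signal classification; Speech compression; Task analysis; Wavelet scattering network; Wavelet transforms; Wireless sensor networks |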
Abstract | In recent years, developing robust systems for automatic detection of pathological speech has attracted increasing interest among researchers and clinicians. This study proposes an end-to-end approach based on a wavelet scattering network (WSN) for the detection of pathological speech. In the proposed approach, the WSN (which involves no learning) extracts suitable information from the raw input speech signal, and this information is then passed through a multi-layer perceptron (MLP) to classify the speech signal as either healthy or pathological. The results show that the proposed approach outperformed a convolutional neural network (CNN)-based end-to-end system in distinguishing pathological speech from healthy speech. Furthermore, the proposed system achieved performance comparable to that of a state-of-the-art traditional system based on hand-crafted features for uncompressed speech, and it outperformed the traditional system for speech compressed at low bit rates. |
---|---|
Author | Reddy, Mittapalle Kiran; Alku, Paavo; Keerthana, Yagnavajjula Madhu |
Author_xml | 1. Reddy, Mittapalle Kiran (ORCID: 0000-0002-7987-1735; email: kiran.r.mittapalle@aalto.fi), Department of Signal Processing and Acoustics, Aalto University, Aalto, Finland; 2. Keerthana, Yagnavajjula Madhu (ORCID: 0000-0002-4244-0253; email: madhu.yagnavajjula@aalto.fi), Department of Signal Processing and Acoustics, Aalto University, Aalto, Finland; 3. Alku, Paavo (ORCID: 0000-0002-8173-9418; email: paavo.alku@aalto.fi), Department of Signal Processing and Acoustics, Aalto University, Aalto, Finland |
CODEN | ISPLEM |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022 |
DOI | 10.1109/LSP.2022.3199669 |
Discipline | Engineering |
EISSN | 1558-2361 |
EndPage | 1867 |
ExternalDocumentID | 10_1109_LSP_2022_3199669 9860049 |
Genre | orig-research |
GrantInformation_xml | 1. Academy of Finland (grant ID: 330139; funder ID: 10.13039/501100002341); 2. Aalto-Yliopisto; Aalto University (funder ID: 10.13039/501100002666) |
ISSN | 1070-9908 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Language | English |
ORCID | 0000-0002-7987-1735 0000-0002-4244-0253 0000-0002-8173-9418 |
OpenAccessLink | https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/document/9860049 |
PageCount | 5 |
PublicationCentury | 2000 |
PublicationDate | 2022 |
PublicationDateYYYYMMDD | 2022-01-01 |
PublicationDecade | 2020 |
PublicationPlace | New York |
PublicationTitle | IEEE signal processing letters |
PublicationTitleAbbrev | LSP |
PublicationYear | 2022 |
Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
StartPage | 1863 |
SubjectTerms | Artificial neural networks; CNN; Convolutional neural networks; Feature extraction; MFCC; MP3 compression; Multilayer perceptrons; Multilayers; openSMILE features; pathological speech; Pathology; Scattering; Signal classification; Speech compression; Task analysis; Wavelet scattering network; Wavelet transforms; Wireless sensor networks |
Title | End-to-End Pathological Speech Detection Using Wavelet Scattering Network |
URI | https://ieeexplore.ieee.org/document/9860049 https://www.proquest.com/docview/2711063040 |
Volume | 29 |
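
The abstract describes a two-stage pipeline: a wavelet scattering network with no learned parameters turns the raw waveform into features, and a multi-layer perceptron then labels the signal as healthy or pathological. The following is a minimal, standard-library sketch of that idea and is not the authors' implementation. Everything specific in it is an assumption made for illustration: the Gabor-style filters, the three centre frequencies, the filter width and length, the use of a whole-signal average as the low-pass step, the synthetic two-tone test signal, and the random (untrained) MLP weights. The function names `gabor_filter`, `modulus_of_convolution`, `scattering_features`, and `mlp_probability` are hypothetical.

```python
import cmath
import math
import random


def gabor_filter(center_freq, width=8.0, length=33):
    """Complex band-pass filter: Gaussian window times a complex exponential."""
    half = length // 2
    taps = range(-half, half + 1)
    window = [math.exp(-0.5 * (n / width) ** 2) for n in taps]
    norm = sum(window)  # normalise so responses never exceed the input peak
    return [w / norm * cmath.exp(1j * center_freq * n)
            for w, n in zip(window, taps)]


def modulus_of_convolution(signal, kernel):
    """Return |signal * kernel| at 'same' length, zero-padded at the edges."""
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0j
        for k, h in enumerate(kernel):
            idx = t - (k - half)
            if 0 <= idx < len(signal):
                acc += signal[idx] * h
        out.append(abs(acc))
    return out


def mean(values):
    return sum(values) / len(values)


def scattering_features(signal, center_freqs):
    """Order 0, 1 and 2 scattering coefficients, averaged over the signal.

    center_freqs must be sorted from high to low so that each second-order
    filter lies below the first-order filter of the same path.
    """
    filters = [gabor_filter(w) for w in center_freqs]
    order1, order2 = [], []
    for i, psi1 in enumerate(filters):
        u1 = modulus_of_convolution(signal, psi1)  # filter, then modulus
        order1.append(mean(u1))                    # assumed low-pass: global mean
        for psi2 in filters[i + 1:]:
            order2.append(mean(modulus_of_convolution(u1, psi2)))
    return [mean(signal)] + order1 + order2


def mlp_probability(features, w1, b1, w2, b2):
    """One hidden ReLU layer followed by a sigmoid output unit."""
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features)) + b)
              for row, b in zip(w1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


# Synthetic stand-in for a speech frame (assumption: two sinusoids).
random.seed(0)
signal = [math.sin(0.8 * n) + 0.3 * math.sin(0.2 * n) for n in range(128)]
features = scattering_features(signal, center_freqs=[1.6, 0.8, 0.4])

# Untrained, randomly initialised weights: the output is not a diagnosis.
hidden_units = 4
w1 = [[random.uniform(-0.5, 0.5) for _ in features] for _ in range(hidden_units)]
b1 = [0.0] * hidden_units
w2 = [random.uniform(-0.5, 0.5) for _ in range(hidden_units)]
probability = mlp_probability(features, w1, b1, w2, 0.0)
print(len(features), "features; untrained output:", round(probability, 3))
```

Only the MLP weights would be fitted in training; the scattering stage stays fixed, which matches the abstract's remark that the WSN involves no learning. A practical system would more likely use an FFT-based scattering library and a trained classifier rather than the direct convolution shown here.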