End-to-End Pathological Speech Detection Using Wavelet Scattering Network

Bibliographic Details
Published in IEEE Signal Processing Letters, Vol. 29, pp. 1863-1867
Main Authors Reddy, Mittapalle Kiran; Keerthana, Yagnavajjula Madhu; Alku, Paavo
Format Journal Article
Language English
Published New York, IEEE, 2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Abstract In recent years, developing robust systems for the automatic detection of pathological speech has attracted increasing interest among researchers and clinicians. This study proposes an end-to-end approach based on a wavelet scattering network (WSN) for the detection of pathological speech. In the proposed approach, the WSN (which involves no learning) extracts suitable information from the raw input speech signal, and this information is then passed through a multi-layer perceptron (MLP) to classify the speech signal as either healthy or pathological. The results show that the proposed approach outperformed a convolutional neural network (CNN) based end-to-end system in distinguishing pathological speech from healthy speech. Furthermore, for uncompressed speech the proposed system achieved performance comparable to that of a state-of-the-art traditional system based on hand-crafted features, and for speech compressed at low bit rates it performed better than the traditional system.
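The learning-free feature extraction described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it is a simplified, pure-Python scattering transform that assumes one Gabor wavelet per octave, circular convolution, and global time averaging in place of a windowed low-pass filter. The names `gabor_filter`, `wavelet_modulus`, and `scattering_features` are hypothetical, and the MLP classifier that would consume the feature vector is omitted.

```python
import cmath
import math


def gabor_filter(center_freq, sigma):
    """Complex Gabor (Morlet-like) band-pass filter, truncated at 3 sigma."""
    half_len = int(3 * sigma)
    taps = [
        math.exp(-0.5 * (m / sigma) ** 2) * cmath.exp(2j * math.pi * center_freq * m)
        for m in range(-half_len, half_len + 1)
    ]
    gain = sum(abs(tap) for tap in taps)  # gives unit gain at center_freq
    return [tap / gain for tap in taps]


def wavelet_modulus(signal, taps):
    """Circular convolution with a wavelet, followed by the modulus nonlinearity."""
    n, half_len = len(signal), len(taps) // 2
    return [
        abs(sum(h * signal[(t - (k - half_len)) % n] for k, h in enumerate(taps)))
        for t in range(n)
    ]


def scattering_features(signal, num_scales=4):
    """Return [order 0] + [order 1 per scale] + [order 2 per scale pair]."""
    # Assumed filter bank: one wavelet per octave, frequencies in cycles/sample.
    bank = [gabor_filter(0.25 / 2 ** j, 4.0 * 2 ** j) for j in range(num_scales)]

    def mean(values):  # global time averaging stands in for the low-pass filter
        return sum(values) / len(values)

    first = [wavelet_modulus(signal, taps) for taps in bank]
    # Second order: re-filter each envelope with the coarser (lower-frequency) wavelets.
    second = [
        wavelet_modulus(first[j1], bank[j2])
        for j1 in range(num_scales)
        for j2 in range(j1 + 1, num_scales)
    ]
    return [mean(signal)] + [mean(u) for u in first] + [mean(u) for u in second]


# Example input: a pure tone at 0.125 cycles/sample, matching the second wavelet.
tone = [math.cos(2 * math.pi * 0.125 * t) for t in range(256)]
features = scattering_features(tone)
```

No parameter in this sketch is trained, which mirrors the "no learning" property noted in the abstract. Because the sketch uses circular convolution and global averaging, the feature vector is also unchanged when the input is circularly shifted in time.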
Author_xml – sequence: 1
  givenname: Mittapalle Kiran
  orcidid: 0000-0002-7987-1735
  surname: Reddy
  fullname: Reddy, Mittapalle Kiran
  email: kiran.r.mittapalle@aalto.fi
  organization: Department of Signal Processing and Acoustics, Aalto University, Aalto, Finland
– sequence: 2
  givenname: Yagnavajjula Madhu
  orcidid: 0000-0002-4244-0253
  surname: Keerthana
  fullname: Keerthana, Yagnavajjula Madhu
  email: madhu.yagnavajjula@aalto.fi
  organization: Department of Signal Processing and Acoustics, Aalto University, Aalto, Finland
– sequence: 3
  givenname: Paavo
  orcidid: 0000-0002-8173-9418
  surname: Alku
  fullname: Alku, Paavo
  email: paavo.alku@aalto.fi
  organization: Department of Signal Processing and Acoustics, Aalto University, Aalto, Finland
CODEN ISPLEM
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
DOI 10.1109/LSP.2022.3199669
Discipline Engineering
EISSN 1558-2361
EndPage 1867
ExternalDocumentID 10_1109_LSP_2022_3199669
9860049
Genre orig-research
GrantInformation_xml – fundername: Academy of Finland
  grantid: 330139
  funderid: 10.13039/501100002341
– fundername: Aalto-Yliopisto; Aalto University
  funderid: 10.13039/501100002666
ISSN 1070-9908
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
OpenAccessLink https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/document/9860049
PageCount 5
SubjectTerms Artificial neural networks
CNN
Convolutional neural networks
Feature extraction
MFCC
MP3 compression
Multilayer perceptrons
Multilayers
openSMILE features
pathological speech
Pathology
Scattering
Signal classification
Speech compression
Task analysis
Wavelet scattering network
Wavelet transforms
Wireless sensor networks
URI https://ieeexplore.ieee.org/document/9860049
https://www.proquest.com/docview/2711063040