Intelligibility assessment of cleft lip and palate speech using Gaussian posteriograms based on joint spectro-temporal features

Bibliographic Details
Published in: The Journal of the Acoustical Society of America, Vol. 144, no. 4, p. 2413
Main Authors: Kalita, Sishir; Mahadeva Prasanna, S. R.; Dandapat, S.
Format: Journal Article
Language: English
Published: United States, 01.10.2018
Summary: Intelligibility is considered one of the primary measures in the speech rehabilitation of individuals with cleft lip and palate (CLP). Objective methods based on speech processing and machine learning are currently attracting increasing research interest as a way to quantify speech intelligibility. In this work, joint spectro-temporal features computed from a time-frequency representation of speech are explored to derive speech representations based on Gaussian posteriograms. A comparative framework based on dynamic time warping (DTW) is used to quantify the intelligibility of speech from children with CLP: the DTW distance serves as a sentence-level intelligibility score, which is tested for correlation with perceptual intelligibility ratings obtained from expert speech-language pathologists. A baseline DTW system using conventional Mel-frequency cepstral coefficients (MFCCs) is also developed for comparison with the proposed system. Spearman's rank correlation coefficient between the objective intelligibility scores and the perceptual intelligibility ratings is computed, and a Williams significance test is conducted to assess whether the difference in correlation between the two methods is statistically significant. The results show that the system based on joint spectro-temporal features significantly outperforms the MFCC-based system.
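The summary does not spell out how the posteriograms or the DTW distance are computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a Gaussian mixture with diagonal covariances has already been trained on per-frame feature vectors (feature extraction is not shown; any frame-level features, spectro-temporal or MFCC, could be passed in), that a test utterance is compared with a reference rendition of the same sentence, that the frame-level distance is -log of the inner product of posterior vectors (a choice often used with posteriograms), and that the accumulated cost is normalized by n + m. The function names gaussian_posteriogram and dtw_distance are hypothetical.

```python
import numpy as np

def gaussian_posteriogram(frames, means, variances, weights):
    """Posterior probability of each diagonal-covariance Gaussian component
    for every frame. Returns a (T, K) matrix whose rows sum to 1."""
    frames = np.asarray(frames, dtype=float)        # (T, D) feature frames
    means = np.asarray(means, dtype=float)          # (K, D) component means
    variances = np.asarray(variances, dtype=float)  # (K, D) diagonal variances
    weights = np.asarray(weights, dtype=float)      # (K,) mixture weights
    diff = frames[:, None, :] - means[None, :, :]   # (T, K, D)
    # Log-likelihood of each frame under each diagonal Gaussian.
    log_lik = -0.5 * np.sum(
        np.log(2 * np.pi * variances)[None] + diff**2 / variances[None], axis=2
    )                                               # (T, K)
    log_post = np.log(weights)[None, :] + log_lik
    log_post -= log_post.max(axis=1, keepdims=True) # numerical stability
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)

def dtw_distance(P, Q, eps=1e-10):
    """Length-normalized DTW distance between two posteriograms
    (assumed local distance: -log of the frame-wise inner product)."""
    local = -np.log(np.clip(P @ Q.T, eps, None))    # (n, m) frame distances
    n, m = local.shape
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Standard step pattern: insertion, deletion or match.
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m] / (n + m)
```

Under this reading, a larger distance from the reference indicates a test sentence whose frame-level posteriors diverge more, which would be expected to go with lower perceived intelligibility.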
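Spearman's rank correlation is the Pearson correlation of the ranks of the two variables, with tied values given the average of the ranks they occupy. The sketch below is a self-contained illustration with hypothetical helper names; in practice scipy.stats.spearmanr computes the same coefficient. Whether a positive or negative correlation is expected between a distance-based objective score and the perceptual ratings depends on the direction of the rating scale.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")  # stable sort keeps ties adjacent
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rho(objective_scores, perceptual_ratings):
    """Spearman's rho as the Pearson correlation of average ranks."""
    ra = average_ranks(objective_scores)
    rb = average_ranks(perceptual_ratings)
    return float(np.corrcoef(ra, rb)[0, 1])
```

Because only ranks enter the computation, the coefficient is unaffected by any monotonic rescaling of the DTW distances.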
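Williams' test compares two correlations that share one variable, here the perceptual ratings correlated with each system's scores, while accounting for the correlation between the two systems' scores. The sketch below uses the form of Williams' t statistic commonly quoted in the literature; the function name and argument layout are hypothetical, and this is not claimed to be the exact procedure used in the paper. If both correlations are negative (distance versus rating), one option is to negate the distances first so that a larger correlation means closer agreement.

```python
import math

def williams_t(r12, r13, r23, n):
    """Williams' t for H0: rho12 == rho13, with variable 1 shared.

    r12: correlation of perceptual ratings with system A scores
    r13: correlation of perceptual ratings with system B scores
    r23: correlation between system A and system B scores
    n:   number of rated sentences
    Returns (t, degrees_of_freedom).
    """
    if n <= 3:
        raise ValueError("Williams' test needs n > 3")
    # Determinant of the 3x3 correlation matrix.
    det_r = 1 - r12**2 - r13**2 - r23**2 + 2 * r12 * r13 * r23
    r_bar = (r12 + r13) / 2
    denom = 2 * (n - 1) / (n - 3) * det_r + r_bar**2 * (1 - r23) ** 3
    t = (r12 - r13) * math.sqrt((n - 1) * (1 + r23) / denom)
    return t, n - 3
```

A one-sided p-value can then be read from a Student t distribution with n - 3 degrees of freedom, for example with scipy.stats.t.sf(t, df).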
ISSN: 1520-8524
DOI: 10.1121/1.5064463