0449 Comparing Deep Feature Representations to Improve Robustness to Subject Variation in Snore Detection
Published in: Sleep (New York, N.Y.), Vol. 42, no. Supplement_1, pp. A180-A181
Format: Journal Article
Language: English
Published: Westchester: Oxford University Press, 13.04.2019
Summary:
Introduction: Snoring is an indicator of obstructive sleep apnea (OSA), which contributes to cardiovascular disease and mortality. To better study snoring, audio-based snore detection methods using different feature representations have been proposed. However, there is a gap in (1) baseline comparisons of different deep learning features and (2) analysis of the robustness of snore detection in the presence of subject variation. Through an ablation study, we quantified the effect of the feature representations. As a measure of robustness to subject variation, we employed a leave-one-subject-out scheme.
Methods: We used 1D raw signals or 2D Mel-frequency cepstral coefficients (MFCC) of the signals as inputs to fully connected, convolutional, long short-term memory (LSTM) cell-based recurrent, or very deep (VGG) networks, or combinations of them. The classifiers were support vector machines (SVM) or neural networks. The ablation study consisted of seven modular combinations of these elements. For training, we used 81,207 snore and non-snore 5 s segments from the snore channel of polysomnography (PSG) data obtained from 19 subjects. A leave-one-subject-out scheme, in which each subject is tested with a model trained on the data from the other subjects, was used to simulate subject variation. We then measured the variation in performance (F1-score) across subjects using the standard deviation (SD).
Results: Features learned by 2D convolutional, LSTM, and very deep (VGG) networks significantly improved the classification accuracy and robustness of snore detection. Applying these findings, we developed a 2D convolutional LSTM network that combines spectral and temporal features, which yielded the highest accuracy (mean F1-score = 0.8812) and the second-best robustness. The very deep convolutional network (VGG-SVM) had the most robust performance (SD of F1-score = 0.0568).
Conclusion: We provide a baseline comparison to understand the effect of feature representation on snore classification. Besides accuracy, we introduce robustness as another performance metric. Methods with the best accuracy do not necessarily give the best robustness: features extracted by the 2D convolutional LSTM network result in the best accuracy, but those from the very deep convolutional network (VGG) have the best robustness.
Support (If Any): Supported by Philips Respironics.
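The evaluation protocol described in the Methods, holding out one subject at a time and summarizing per-subject F1-scores by their mean (accuracy) and standard deviation (robustness), can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' code: the feature matrix, labels, subject identifiers, and the SVM head are assumptions standing in for any of the seven feature/classifier combinations in the ablation study.

```python
# Hedged sketch of a leave-one-subject-out (LOSO) evaluation for snore detection.
# `features`, `labels`, and `subject_ids` are hypothetical inputs: deep features of
# 5 s audio segments, binary snore/non-snore labels, and the source subject of
# each segment. The SVM classifier here is a stand-in for one of the model heads.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC
from sklearn.metrics import f1_score

def loso_f1(features, labels, subject_ids):
    """Return (mean, SD) of per-subject F1-scores under LOSO evaluation."""
    scores = []
    logo = LeaveOneGroupOut()
    for train_idx, test_idx in logo.split(features, labels, groups=subject_ids):
        clf = SVC(kernel="rbf")                       # illustrative classifier head
        clf.fit(features[train_idx], labels[train_idx])
        preds = clf.predict(features[test_idx])
        scores.append(f1_score(labels[test_idx], preds))
    scores = np.asarray(scores)
    # Mean F1 corresponds to accuracy; SD across held-out subjects corresponds to
    # the robustness-to-subject-variation metric described in the abstract.
    return scores.mean(), scores.std()
```

Under this scheme, each of the 19 subjects would be held out once, and a lower SD of the resulting F1-scores indicates performance that transfers more consistently to unseen subjects.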
ISSN: 0161-8105, 1550-9109
DOI: 10.1093/sleep/zsz067.448