0449 Comparing Deep Feature Representations to Improve Robustness to Subject Variation in Snore Detection
Published in: Sleep (New York, N.Y.), Vol. 42, no. Supplement_1, pp. A180-A181
Format: Journal Article
Language: English
Published: Westchester: Oxford University Press, 13.04.2019
Summary:
Introduction: Snoring is an indicator of obstructive sleep apnea (OSA), which contributes to cardiovascular disease and mortality. To better study snoring, audio-based snore detection methods using different feature representations have been proposed. However, there is a gap in (1) baseline comparisons of different deep learning features and (2) analysis of the robustness of snore detection in the presence of subject variation. Through an ablation study, we quantified the effect of the feature representations. As a measure of robustness to subject variation, we employed a leave-one-subject-out scheme.
Methods: We used 1D raw signals or 2D Mel-frequency cepstral coefficients (MFCC) of the signals as inputs to fully connected, convolutional, long short-term memory (LSTM) cell-based recurrent, or very deep (VGG) networks, or combinations of them. The classifiers were support vector machines (SVM) or neural networks. The ablation study consisted of seven modular combinations of these elements. For training, we used 81,207 snore and non-snore 5 s segments from the snore channel of polysomnography (PSG) data obtained from 19 subjects. A leave-one-subject-out scheme, in which each subject is tested with a model trained on the data from the other subjects, was used to simulate subject variation. We then measured the variation in performance (F1-score) across subjects using the standard deviation (SD).
Results: Features learned by 2D convolutional, LSTM, and very deep (VGG) networks significantly improved the classification accuracy and robustness of snore detection. Applying these findings, we developed a 2D convolutional LSTM network that combines spectral and temporal features, which yielded the highest accuracy (mean F1-score = 0.8812) and the second-best robustness. The very deep convolutional network (VGG-SVM) had the most robust performance (SD of F1-score = 0.0568).
Conclusion: We provide a baseline comparison to understand the effect of feature representation on snore classification. Besides accuracy, we introduce robustness as another performance metric. Methods with the best accuracy do not necessarily give the best robustness: features extracted by the 2D convolutional LSTM network result in the best accuracy, but those from the very deep convolutional network (VGG) have the best robustness.
Support (If Any): Supported by Philips Respironics.
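The evaluation protocol described in the Methods, holding out one subject at a time and summarizing per-subject F1-scores by their mean (accuracy) and standard deviation (robustness), can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' code: the feature matrix, labels, subject identifiers, and the SVM head are assumptions standing in for any of the seven feature/classifier combinations in the ablation study.

```python
# Hedged sketch of a leave-one-subject-out (LOSO) evaluation for snore detection.
# `features`, `labels`, and `subject_ids` are hypothetical inputs: deep features of
# 5 s audio segments, binary snore/non-snore labels, and the source subject of
# each segment. The SVM classifier here is a stand-in for one of the model heads.
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC
from sklearn.metrics import f1_score

def loso_f1(features, labels, subject_ids):
    """Return (mean, SD) of per-subject F1-scores under LOSO evaluation."""
    scores = []
    logo = LeaveOneGroupOut()
    for train_idx, test_idx in logo.split(features, labels, groups=subject_ids):
        clf = SVC(kernel="rbf")                       # illustrative classifier head
        clf.fit(features[train_idx], labels[train_idx])
        preds = clf.predict(features[test_idx])
        scores.append(f1_score(labels[test_idx], preds))
    scores = np.asarray(scores)
    # Mean F1 corresponds to accuracy; SD across held-out subjects corresponds to
    # the robustness-to-subject-variation metric described in the abstract.
    return scores.mean(), scores.std()
```

Under this scheme, each of the 19 subjects would be held out once, and a lower SD of the resulting F1-scores indicates performance that transfers more consistently to unseen subjects.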
ISSN: 0161-8105, 1550-9109
DOI: 10.1093/sleep/zsz067.448