Social Signal Detection by Probabilistic Sampling DNN Training

Bibliographic Details
Published in IEEE Transactions on Affective Computing Vol. 11; no. 1; pp. 164-177
Main Authors Gosztolya, Gabor; Grosz, Tamas; Toth, Laszlo
Format Journal Article
Language English
Published Piscataway IEEE 01.01.2020
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: When the task is to detect social signals such as laughter and filler events in an audio recording, the most straightforward approach is to apply a Hidden Markov Model (HMM) or a Hidden Markov Model/Deep Neural Network (HMM/DNN) hybrid, the latter being considered state of the art. In this hybrid model, the DNN component is trained on frame-level samples of the classes we are looking for. In such event detection tasks, however, the training labels are seriously imbalanced: typically only a small fraction of the training data corresponds to these social signals, while the bulk of the utterances consists of speech segments or silence. A strong imbalance of the training classes is known to cause difficulties during DNN training. To alleviate these problems, we apply a technique called probabilistic sampling, which seeks to balance the class distribution. Probabilistic sampling is a mathematically well-founded combination of upsampling and downsampling, and it was found to outperform both of these simple resampling approaches. With this strategy, we achieved a 7-8 percent relative error reduction at both the segment level and the frame level, and we reduced the DNN training times as well.
ISSN: 1949-3045
DOI: 10.1109/TAFFC.2018.2871450
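
The summary describes probabilistic sampling only as a combination of upsampling and downsampling that balances the class distribution; it does not give the formula. The sketch below assumes one common formulation: a class is chosen first, with a probability that interpolates between the empirical class prior and a uniform distribution, and a training frame is then drawn uniformly from that class. The interpolation weight `lam`, the function names, and the toy labels are all hypothetical and are not taken from the article.

```python
import random
from collections import Counter, defaultdict


def class_probabilities(labels, lam):
    """Interpolate between empirical class priors (lam=0) and uniform (lam=1).

    Assumed formulation, not quoted from the article:
    P(c) = lam * (1 / K) + (1 - lam) * prior(c)
    """
    counts = Counter(labels)
    n = len(labels)
    k = len(counts)
    return {c: lam / k + (1.0 - lam) * counts[c] / n for c in counts}


def probabilistic_sample(labels, lam, num_samples, rng):
    """Return frame indices: pick a class first, then a frame within that class."""
    by_class = defaultdict(list)
    for idx, c in enumerate(labels):
        by_class[c].append(idx)
    probs = class_probabilities(labels, lam)
    classes = sorted(probs)
    weights = [probs[c] for c in classes]
    chosen = rng.choices(classes, weights=weights, k=num_samples)
    return [rng.choice(by_class[c]) for c in chosen]


# Toy, made-up frame labels: 90% speech, 5% laughter, 5% filler.
labels = ["speech"] * 90 + ["laughter"] * 5 + ["filler"] * 5
rng = random.Random(0)

# lam=1.0 requests a fully balanced class distribution.
indices = probabilistic_sample(labels, lam=1.0, num_samples=3000, rng=rng)
sampled = Counter(labels[i] for i in indices)
```

Under this assumed formulation, `lam=0.0` reproduces the original imbalanced distribution and `lam=1.0` draws every class equally often. Intermediate values upsample the rare classes and downsample the frequent ones at the same time, which is the sense in which the technique combines both resampling approaches.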