Reducing dimensionality of spectrograms using convolutional autoencoders


Bibliographic Details
Published in: The Journal of the Acoustical Society of America, Vol. 153, No. 3, p. A178
Main Authors: Jenkins, William F.; Gerstoft, Peter; Chien, Chih-Chieh; Ozanich, Emma
Format: Journal Article
Language: English
Published: 01.03.2023

Summary: Under the “curse of dimensionality,” distance-based algorithms, such as k-means or Gaussian mixture model clustering, can lose meaning and interpretability in high-dimensional space. Acoustic data, specifically spectrograms, are subject to such limitations due to their high dimensionality: for example, a spectrogram with 100 time bins and 100 frequency bins contains 10⁴ pixels, and its vectorized form constitutes a point in 10⁴-dimensional space. In this talk, we look at four papers that used autoencoding convolutional neural networks to extract salient features of real data. The convolutional autoencoder consists of an encoder, which compresses spectrograms into a low-dimensional latent feature space, and a decoder, which seeks to reconstruct the original spectrogram from the latent feature space. The error between the original spectrogram and its reconstruction is used to train the network. Once trained, the salient features of the data are embedded in the latent space, and algorithms can be applied to the lower-dimensional latent space. We demonstrate how lower-dimensional representations result in interpretable clustering of complex physical data, which can help reduce errors in classification and clustering tasks and enable exploratory analysis of large data sets.
ISSN: 0001-4966 (print); 1520-8524 (online)
DOI: 10.1121/10.0018582
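
The abstract describes the standard convolutional-autoencoder workflow: encode each spectrogram into a low-dimensional latent vector, train the network on reconstruction error, then cluster in the latent space instead of pixel space. The following is a minimal sketch of that workflow in PyTorch with scikit-learn k-means; the layer sizes, latent dimensionality, and hyperparameters are illustrative assumptions, not the configurations used in the papers discussed in the talk.

```python
# Minimal sketch (not the authors' implementation): a convolutional
# autoencoder that compresses 100x100 spectrograms to a 32-dimensional
# latent space, trained on reconstruction error, followed by k-means
# clustering of the latent vectors. All sizes are illustrative.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

LATENT_DIM = 32  # assumed latent dimensionality, chosen for illustration

class ConvAutoencoder(nn.Module):
    def __init__(self, latent_dim=LATENT_DIM):
        super().__init__()
        # Encoder: compress a (1, 100, 100) spectrogram to a latent vector.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),   # -> (16, 50, 50)
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),  # -> (32, 25, 25)
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 25 * 25, latent_dim),
        )
        # Decoder: reconstruct the spectrogram from the latent vector.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 25 * 25),
            nn.ReLU(),
            nn.Unflatten(1, (32, 25, 25)),
            nn.ConvTranspose2d(32, 16, kernel_size=2, stride=2),  # -> (16, 50, 50)
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, kernel_size=2, stride=2),   # -> (1, 100, 100)
            nn.Sigmoid(),  # placeholder data below lies in [0, 1]
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = ConvAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()  # reconstruction error drives training

# Placeholder batch standing in for real spectrograms.
spectrograms = torch.rand(256, 1, 100, 100)

for epoch in range(10):
    recon, _ = model(spectrograms)
    loss = loss_fn(recon, spectrograms)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Once trained, cluster in the 32-dimensional latent space rather than
# the 10^4-dimensional pixel space.
with torch.no_grad():
    _, latent = model(spectrograms)
labels = KMeans(n_clusters=5, n_init=10).fit_predict(latent.numpy())
```

Because the encoder and decoder are trained jointly against the reconstruction loss, the latent vectors are forced to retain the features needed to rebuild each spectrogram, which is what makes distance-based clustering on them meaningful.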