Amplitude Spectrogram Prediction from Mel-Frequency Cepstrum Coefficients Using Deep Neural Networks


Bibliographic Details
Published in: Journal of Signal Processing, Vol. 27, No. 6, pp. 207-211
Main Authors: Kawaguchi, Shoya; Kitamura, Daichi
Format: Journal Article
Language: English, Japanese
Published: Tokyo: Research Institute of Signal Processing, Japan, 01.11.2023 (Japan Science and Technology Agency)

More Information
Summary: Timbre conversion of musical instrument sounds using deep neural networks (DNNs) has been extensively researched and continues to generate significant interest in the development of more advanced techniques. We propose a novel timbre-conversion algorithm based on a variational autoencoder. Such a system must be capable of predicting the amplitude spectrogram from the mel-frequency cepstrum coefficients (MFCCs). This research aims to build a DNN-based decoder that takes the MFCCs and the time-frame-wise total amplitude as inputs and predicts the amplitude spectrogram. Experiments conducted on a musical instrument sound dataset show that a decoder incorporating bidirectional long short-term memory yields accurate predictions of amplitude spectrograms.
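The decoder's two inputs can be illustrated by the forward computation they come from. The following is a minimal NumPy sketch of extracting MFCCs (mel filterbank on the power spectrum, log compression, then a DCT) and the time-frame-wise total amplitude from an amplitude spectrogram; parameter choices such as `sr=16000`, `n_mels=40`, and `n_mfcc=13` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
from scipy.fft import dct


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with center frequencies evenly spaced on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        left, center, right = bins[i], bins[i + 1], bins[i + 2]
        if center > left:
            fb[i, left:center] = (np.arange(left, center) - left) / (center - left)
        if right > center:
            fb[i, center:right] = (right - np.arange(center, right)) / (right - center)
    return fb


def decoder_inputs(spec, sr=16000, n_mels=40, n_mfcc=13):
    """Compute (MFCCs, time-frame-wise total amplitude) from an
    amplitude spectrogram `spec` of shape (n_fft // 2 + 1, n_frames)."""
    n_fft = 2 * (spec.shape[0] - 1)
    fb = mel_filterbank(n_mels, n_fft, sr)
    mel_power = fb @ (spec ** 2)              # mel-scale power per frame
    log_mel = np.log(mel_power + 1e-10)       # dynamic-range compression
    mfcc = dct(log_mel, type=2, axis=0, norm="ortho")[:n_mfcc]  # (n_mfcc, n_frames)
    total_amp = spec.sum(axis=0)              # (n_frames,)
    return mfcc, total_amp
```

Because the DCT truncation to `n_mfcc` coefficients discards spectral detail, inverting this mapping is ill-posed; the paper's DNN decoder learns that inverse, with the total amplitude supplying the overall per-frame energy that the MFCCs alone do not pin down.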
ISSN: 1342-6230, 1880-1013
DOI: 10.2299/jsp.27.207