An investigation of pre-upsampling generative modelling and Generative Adversarial Networks in audio super resolution
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 30.09.2021 |
Subjects | |
Summary: | There have been several successful deep learning models that perform audio
super-resolution. Many of these approaches rely on preprocessed feature
extraction, which requires considerable domain-specific signal processing
knowledge to implement. Convolutional Neural Networks (CNNs) improved upon this framework
by automatically learning filters. An example of a convolutional approach is
AudioUNet, which takes inspiration from novel methods of upsampling images. Our
paper compares the pre-upsampling AudioUNet to a new generative model that
upsamples the signal before using deep learning to transform it into a more
believable signal. Based on the EDSR network for image super-resolution, the
newly proposed model outperforms UNet with a 20% increase in log spectral
distance and a mean opinion score of 4.06 compared to 3.82 for the two times
upsampling case. AudioEDSR also has 87% fewer parameters than AudioUNet. The
paper also explores how incorporating AudioUNet into a Wasserstein GAN with
gradient penalty (WGAN-GP) structure can affect training. Finally, the effects
that artifacting has on the current state of the art are analysed, and solutions
to this problem are proposed. The methods used in this paper have broad
applications to telephony, audio recognition and audio generation tasks. |
---|---|
DOI: | 10.48550/arxiv.2109.14994 |
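
The summary reports results in terms of log spectral distance (LSD). As a rough illustration only, the sketch below computes LSD under one commonly used definition: the root-mean-square difference between log10 power spectra per frame, averaged over frames, where lower values mean the two signals are spectrally closer. The function names, the frame length, the hop size, and the `eps` floor are all assumptions made for this example; the record does not state the exact settings the paper uses.

```python
import numpy as np


def power_spectrogram(x, frame_len=2048, hop=512):
    """Hann-windowed short-time power spectrum, shape (frames, bins).

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    x = np.asarray(x, dtype=np.float64)
    if len(x) < frame_len:
        # Zero-pad short signals so that at least one full frame exists.
        x = np.pad(x, (0, frame_len - len(x)))
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def log_spectral_distance(reference, estimate, eps=1e-10, **stft_kwargs):
    """Mean over frames of the RMS difference between log10 power spectra."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    # eps is a small floor (an assumption) that avoids log10(0).
    ref = np.log10(power_spectrogram(reference, **stft_kwargs) + eps)
    est = np.log10(power_spectrogram(estimate, **stft_kwargs) + eps)
    per_frame = np.sqrt(np.mean((ref - est) ** 2, axis=1))
    return float(np.mean(per_frame))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)   # stand-in for a high-resolution signal
    degraded = 0.5 * clean               # stand-in for a model output
    print(log_spectral_distance(clean, degraded))
```

Halving the amplitude scales every power bin by 0.25, so in this toy case the distance is close to |log10(0.25)|, which makes the function easy to sanity-check by hand.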