Generating High-quality Symbolic Music Using Fine-grained Discriminators

Bibliographic Details
Main Authors	Zhang, Zhedong; Li, Liang; Zhang, Jiehua; Hu, Zhenghui; Wang, Hongkui; Yan, Chenggang; Yang, Jian; Qi, Yuankai
Format	Journal Article
Language	English
Published	03.08.2024
Subjects	Computer Science - Artificial Intelligence; Computer Science - Sound

More Information
Summary:	Existing symbolic music generation methods usually use a discriminator to improve the quality of generated music via global perception of the music. However, given the complexity of the information in music, such as rhythm and melody, a single discriminator cannot fully reflect the differences in these two primary dimensions. In this work, we propose to decouple melody and rhythm from music and design corresponding fine-grained discriminators to address this issue. Specifically, equipped with a pitch augmentation strategy, the melody discriminator discerns the melody variations presented by the generated samples. By contrast, the rhythm discriminator, enhanced with bar-level relative positional encoding, focuses on the velocity of generated notes. This design makes the generator more explicitly aware of which aspects of the generated music should be adjusted, making it easier to mimic human-composed music. Experimental results on the POP909 benchmark demonstrate the favorable performance of the proposed method compared to several state-of-the-art methods in terms of both objective and subjective metrics.
DOI:	10.48550/arxiv.2408.01696