ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning
Main Authors | , , |
---|---|
Format | Journal Article |
Language | English |
Published | 19.09.2024 |
Summary: | Modeling the natural contour of fundamental frequency (F0) plays a critical
role in music audio synthesis. However, transcribing and managing multiple F0
contours in polyphonic music is challenging, and explicit F0 contour modeling
has not yet been explored for polyphonic instrumental synthesis. In this paper,
we present ViolinDiff, a two-stage diffusion-based synthesis framework. For a
given violin MIDI file, the first stage estimates the F0 contour as pitch bend
information, and the second stage generates a mel spectrogram incorporating these
expressive details. The quantitative metrics and listening test results show
that the proposed model generates more realistic violin sounds than the model
without explicit pitch bend modeling. Audio samples are available online:
daewoung.github.io/ViolinDiff-Demo. |
---|---|
DOI: | 10.48550/arxiv.2409.12477 |
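
The summary describes expressing an F0 contour as pitch bend information relative to the notes of a MIDI file. The record does not say how ViolinDiff encodes this, so the following is only a minimal sketch of one common convention: pitch bend as the deviation, in semitones, of the measured F0 from the nominal equal-tempered pitch of the MIDI note (A4 = 440 Hz assumed). The function names `midi_to_hz`, `f0_to_pitch_bend`, and `pitch_bend_to_f0` are hypothetical and are not taken from the paper.

```python
import math

A4_HZ = 440.0   # assumed tuning reference
A4_MIDI = 69    # MIDI note number of A4


def midi_to_hz(note: float) -> float:
    """Nominal equal-tempered frequency of a MIDI note number."""
    return A4_HZ * 2.0 ** ((note - A4_MIDI) / 12.0)


def f0_to_pitch_bend(f0_hz: float, note: int) -> float:
    """Deviation of f0 from the note's nominal pitch, in semitones.

    Hypothetical helper: unvoiced frames (f0 <= 0) map to a bend of 0.
    """
    if f0_hz <= 0.0:
        return 0.0
    return 12.0 * math.log2(f0_hz / midi_to_hz(note))


def pitch_bend_to_f0(bend_semitones: float, note: int) -> float:
    """Inverse mapping: recover f0 in Hz from a note and its bend."""
    return midi_to_hz(note) * 2.0 ** (bend_semitones / 12.0)


# Illustrative input: a synthetic 6 Hz vibrato around A4, sampled at
# 100 frames per second (made-up values, not data from the paper).
frame_rate = 100
contour_hz = [
    440.0 * 2.0 ** (0.3 * math.sin(2.0 * math.pi * 6.0 * t / frame_rate) / 12.0)
    for t in range(frame_rate)
]
bends = [f0_to_pitch_bend(f, 69) for f in contour_hz]
```

Under this convention a frame-wise bend curve and the note number together determine the F0 contour, which is why a per-note bend sequence can stand in for explicit F0 when the notes are already known from the MIDI file.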