A deep-learning-based method for pitch shifting and tone complement

Bibliographic Details
Published in: 2025 4th Asia Conference on Algorithms, Computing and Machine Learning (CACML), pp. 1-6
Main Authors: Chen, Haiyong; Wang, Xiaoman; Zhang, Chaoyu; Wu, Heng; He, Chunhua; Deng, Songqing
Format: Conference Proceeding
Language: English
Published: IEEE, 28.03.2025
Summary: Most instruments have an effective range of no more than four octaves, whereas the Musical Instrument Digital Interface (MIDI) standard supports up to 128 different pitches, so the range of an instrument's timbre must be extended artificially. Experiments demonstrate that the time-domain envelope and frequency-domain characteristics of tones synthesized by sample playback differ considerably from those of the actual tones. Deep learning techniques have been widely used in timbre synthesis and music composition, but they have not previously been applied to pitch shifting and tone complement. This paper applies CNN and RNN models to the pitch deviation problem; the time-domain similarity between the synthesized and actual pitches reaches 99.99%, more than 29% higher than the traditional method. High-frequency noise is found in the synthesized tones, so the synthesized audio is filtered using Butterworth low-pass filters (LPFs) of different orders. The results show that the first-order LPF gives the best results in both the time and frequency domains, removing the high-frequency noise. The method can be used in convenient home medical devices for music therapy, saving storage space for tones while enabling tone complement.
DOI: 10.1109/CACML64929.2025.11010935
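
The summary mentions two ideas that a short example may make more concrete: the 128 MIDI pitches and first-order Butterworth low-pass filtering of synthesized audio. The sketch below is illustrative only and is not taken from the paper. The function names, the 44.1 kHz sample rate, the 4 kHz cutoff, and the 15 kHz stand-in for "high-frequency noise" are all assumptions chosen for the demonstration; the record does not state the values the authors used. The filter is a standard first-order Butterworth design obtained with the bilinear transform.

```python
import math


def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (0-127), with note 69 = A4 = 440 Hz."""
    if not 0 <= note <= 127:
        raise ValueError("MIDI note numbers run from 0 to 127")
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def butter1_lowpass(samples, cutoff_hz, sample_rate_hz):
    """First-order Butterworth low-pass filter (bilinear transform design).

    Difference equation: y[n] = b * (x[n] + x[n-1]) - a * y[n-1]
    """
    if not 0.0 < cutoff_hz < sample_rate_hz / 2.0:
        raise ValueError("cutoff must lie between 0 and the Nyquist frequency")
    k = math.tan(math.pi * cutoff_hz / sample_rate_hz)  # pre-warped cutoff
    b = k / (1.0 + k)          # feed-forward coefficient (b0 == b1)
    a = (k - 1.0) / (k + 1.0)  # feedback coefficient
    out = []
    x_prev = 0.0
    y_prev = 0.0
    for x in samples:
        y = b * (x + x_prev) - a * y_prev
        out.append(y)
        x_prev, y_prev = x, y
    return out


def sine(freq_hz, sample_rate_hz, n_samples, amplitude=1.0):
    """Sampled sine wave, used here as a stand-in for a tone or for noise."""
    step = 2.0 * math.pi * freq_hz / sample_rate_hz
    return [amplitude * math.sin(step * n) for n in range(n_samples)]


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


if __name__ == "__main__":
    fs = 44100.0      # assumed sample rate
    cutoff = 4000.0   # assumed cutoff frequency
    tone = sine(midi_to_hz(69), fs, 44100)      # A4 tone
    noise = sine(15000.0, fs, 44100, 0.3)       # assumed high-frequency noise
    # The filter is linear, so tone and noise can be measured separately.
    print("tone level kept:", rms(butter1_lowpass(tone, cutoff, fs)) / rms(tone))
    print("noise level kept:", rms(butter1_lowpass(noise, cutoff, fs)) / rms(noise))
```

With these assumed settings the low tone passes almost unchanged while the high-frequency component is strongly attenuated, which is the qualitative behaviour the summary describes. For the higher filter orders the summary compares against, a library routine such as `scipy.signal.butter` is the usual choice.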