Parody Detection Using Source-Target Attention with Teacher-Forced Lyrics
Published in | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 1151 - 1155 |
---|---|
Main Authors | , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE 14.04.2024 |
Summary: | We propose an approach to detecting parodies in singing voices by analyzing attention weights derived from an encoder-decoder automatic speech recognition (ASR) model. Here, a parody is a performance in which the existing lyrics of a song are modified and sung. Sharing such modified singing voices on the internet carries a potential risk of copyright infringement, creating a need for an automatic parody detection system. Because songs typically have fixed lyrics, a singing voice can be analyzed together with its corresponding transcription. In this work, we feed singing voices into an encoder-decoder ASR system and decode with the corresponding lyrics in a teacher-forcing manner. When the ASR model encounters a singing voice that includes a parody segment, the attention weights between the singing voice and the correct lyrics may collapse. By identifying such misalignments in the attention weights, we attempt to detect parodies in singing voices. Experimental comparisons using real karaoke singing voice data demonstrate that the developed system achieves highly accurate parody detection by effectively identifying misalignments. |
---|---|
ISSN: | 2379-190X |
DOI: | 10.1109/ICASSP48485.2024.10446577 |
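
The summary describes flagging parody segments from collapsed source-target attention but does not state the exact detection criterion. The sketch below is one possible illustration, not the authors' method. It assumes the teacher-forced attention weights are already available as a matrix of shape (lyric tokens, audio frames), and it uses normalized row entropy as a hypothetical collapse score. The function names, the 0.5 threshold, and the synthetic matrix are all assumptions made for this example.

```python
import numpy as np


def attention_entropy(attn):
    """Normalized entropy of each row of an attention matrix.

    attn has shape (num_tokens, num_frames). A value near 0 means the
    token attends to a few frames (sharp alignment); a value near 1
    means the attention is spread almost uniformly over all frames.
    """
    attn = np.asarray(attn, dtype=float)
    # Renormalize so that every row is a probability distribution.
    attn = attn / attn.sum(axis=1, keepdims=True)
    entropy = -(attn * np.log(attn + 1e-12)).sum(axis=1)
    return entropy / np.log(attn.shape[1])


def flag_misaligned_tokens(attn, threshold=0.5):
    """Mark tokens whose attention is diffuse (hypothetical criterion)."""
    return attention_entropy(attn) > threshold


# Synthetic stand-in for teacher-forced attention weights (assumed data):
# the first four lyric tokens align sharply with the audio, and the last
# two have diffuse attention, as might happen on a modified lyric segment.
num_tokens, num_frames = 6, 12
attn = np.zeros((num_tokens, num_frames))
for t in range(4):
    attn[t, 2 * t] = 1.0
attn[4:] = 1.0 / num_frames

flags = flag_misaligned_tokens(attn)
print(flags)
```

In this toy input only the last two tokens are flagged. With a real ASR model, the attention matrix would come from the decoder's source-target attention during teacher-forced decoding, and both the score and the threshold would need tuning on labeled data.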