Parody Detection Using Source-Target Attention with Teacher-Forced Lyrics

Bibliographic Details
Published in	ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 1151 - 1155
Main Authors	Ariga, Tomoki; Higuchi, Yosuke; Hayasaka, Kazutoshi; Okamoto, Naoki; Ogawa, Tetsuji
Format	Conference Proceeding
Language	English
Published	IEEE 14.04.2024
Subjects	Acoustics; Analytical models; Copyright protection; Decoding; Feeds; karaoke; parody detection; Signal processing; singing voice; Source-target attention; Speech processing

More Information
Summary:	We propose an approach to detect parodies in singing voices by analyzing attention weights derived from an encoder-decoder-based automatic speech recognition (ASR) model. Here, a parody means singing an existing song with modified lyrics. Sharing such modified singing voices on the internet carries a risk of copyright infringement, creating a need for an automatic parody detection system. Because songs typically have fixed lyrics, each singing voice can be paired with its corresponding transcription for analysis. In this work, we feed singing voices into an encoder-decoder-based ASR system and perform decoding with the corresponding lyrics in a teacher-forcing manner. When the ASR model encounters a singing voice that includes a parody segment, the attention weights between the singing voice and the correct lyrics can collapse. By identifying such misalignments in the attention weights, we attempt to detect parodies in singing voices. Experimental comparisons using real karaoke singing voice data demonstrate that the developed system achieves highly accurate parody detection by effectively identifying these misalignments.
ISSN:	2379-190X
DOI:	10.1109/ICASSP48485.2024.10446577
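The attention-collapse idea described in the summary can be illustrated with a small scoring sketch. It assumes the source-target attention matrix has already been extracted from a teacher-forced decoding pass, with one row per lyric token and one column per encoder frame, each row summing to 1. The misalignment rule used here is a hypothetical illustration and not the authors' published method. It combines normalized attention entropy with a check for backward or overly long jumps of the attention peak. The thresholds and all function names (normalized_entropy, token_misalignment, detect_parody, peaked_row) are likewise hypothetical.

```python
import math
from typing import List, Sequence

# attention[token][frame]; each row is assumed to be a softmax over frames
Matrix = List[List[float]]


def normalized_entropy(row: Sequence[float]) -> float:
    """Entropy of one attention row, scaled to [0, 1] by log(num_frames)."""
    n = len(row)
    if n < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in row if p > 0.0)
    return h / math.log(n)


def token_misalignment(attention: Matrix, max_jump: int) -> List[float]:
    """Per-token misalignment score in [0, 1] (hypothetical heuristic).

    Takes the larger of (a) how diffuse the row is (normalized entropy) and
    (b) a penalty of 1.0 when the attention peak moves backwards or jumps
    more than max_jump frames forward, breaking monotonic alignment.
    """
    scores = []
    prev_peak = None
    for row in attention:
        peak = max(range(len(row)), key=row.__getitem__)  # first argmax
        diffuse = normalized_entropy(row)
        jump_penalty = 0.0
        if prev_peak is not None:
            step = peak - prev_peak
            if step < 0 or step > max_jump:
                jump_penalty = 1.0
        scores.append(max(diffuse, jump_penalty))
        prev_peak = peak
    return scores


def detect_parody(attention: Matrix, max_jump: int = 5,
                  token_threshold: float = 0.6,
                  ratio_threshold: float = 0.2) -> bool:
    """Flag an utterance when too large a share of lyric tokens look misaligned."""
    scores = token_misalignment(attention, max_jump)
    flagged = sum(s > token_threshold for s in scores)
    return flagged / max(len(scores), 1) > ratio_threshold


def peaked_row(num_frames: int, center: int, sharpness: float = 0.9) -> List[float]:
    """Toy attention row: most mass on one frame, the rest spread evenly."""
    rest = (1.0 - sharpness) / (num_frames - 1)
    return [sharpness if f == center else rest for f in range(num_frames)]


if __name__ == "__main__":
    frames, tokens = 20, 10
    # Original lyrics: sharp, monotonic (diagonal) attention
    clean = [peaked_row(frames, 2 * t) for t in range(tokens)]
    # Parody segment: tokens 4..7 get collapsed (uniform) attention
    parody = [row[:] for row in clean]
    for t in range(4, 8):
        parody[t] = [1.0 / frames] * frames
    print(detect_parody(clean))   # expected: False
    print(detect_parody(parody))  # expected: True
```

In this toy example the clean matrix keeps a sharp, monotonic diagonal and is not flagged. The copy whose rows 4 to 7 are replaced by uniform attention is flagged. Those rows are diffuse, and the peak jumps backward into the collapsed span and forward out of it. Entropy and peak jumps are only one simple way to quantify collapsed attention, and the thresholds would need tuning on labeled data.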