Active Speaker Detection Using Audio, Visual, and Depth Modalities: A Survey

Bibliographic Details
Published in	IEEE Access, Vol. 12, pp. 96617-96634
Main Authors	Nur Aisyah Mohd Robi, Siti; Atiff Zakwan Mohd Ariffin, Muhammad; Mohd Izhar, Mohd Azri; Ahmad, Norulhusna; Mad Kaidi, Hazilah
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024
Subjects	Accuracy; Active speaker detection; Audio visual equipment; audio-visual processing; Audio-visual systems; Cameras; deep learning; Detection algorithms; Microphones; multi-modalities; Neural networks; Noise reduction; Reviews; Speech recognition; Surveys; Visualization; voice activity detection; Voice recognition

Summary:	The rapid progress of multimodal signal processing in recent years has cleared the way for novel applications in human-computer interaction, surveillance, and telecommunication. Active Speaker Detection (ASD) is a critical pre-processing step with numerous applications such as voice recognition, speaker diarization, and noise reduction. This paper comprehensively reviews ASD, including ASD methods and datasets based on three modalities: audio, visual, and depth. ASD methods are broadly divided into two categories: single-modality ASD and multi-modality ASD. The review covers the most common configurations, namely audio-based ASD (A-ASD), visual-based ASD (V-ASD), audio-visual ASD (AV-ASD), and audio-visual-depth ASD (AVD-ASD). Each is described in detail, including model-based and neural network-based approaches. Finally, challenges and future research opportunities are highlighted to support the broader use of ASD.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3426670