Deep Audio-visual Learning: A Survey

Bibliographic Details
Published in: International Journal of Automation and Computing, Vol. 18, No. 3, pp. 351-376
Main Authors: Zhu, Hao; Luo, Man-Di; Wang, Rui; Zheng, Ai-Hua; He, Ran
Format: Journal Article
Language: English
Published: Beijing: Institute of Automation, Chinese Academy of Sciences; Springer Nature B.V., 01.06.2021
ISSN: 1476-8186, 2153-182X, 1751-8520, 2153-1838
DOI: 10.1007/s11633-021-1293-0

Summary: Audio-visual learning, which aims to exploit the relationship between the audio and visual modalities, has drawn considerable attention since the success of deep learning. Researchers tend to leverage these two modalities either to improve the performance of tasks previously addressed with a single modality or to tackle new and challenging problems. In this paper, we provide a comprehensive survey of recent developments in audio-visual learning. We divide current audio-visual learning tasks into four subfields: audio-visual separation and localization, audio-visual correspondence learning, audio-visual generation, and audio-visual representation learning. State-of-the-art methods, as well as the remaining challenges of each subfield, are further discussed. Finally, we summarize the commonly used datasets and performance challenges.