MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding

3D visual grounding involves matching natural language descriptions with their corresponding objects in 3D spaces. Existing methods often face challenges with accuracy in object recognition and struggle in interpreting complex linguistic queries, particularly with descriptions that involve multiple...

Bibliographic Details
Published in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14131-14140
Main Authors Chang, Chun-Peng; Wang, Shaoxiang; Pagani, Alain; Stricker, Didier
Format Conference Proceeding
Language English
Published IEEE 16.06.2024