MiKASA: Multi-Key-Anchor & Scene-Aware Transformer for 3D Visual Grounding
3D visual grounding involves matching natural language descriptions with their corresponding objects in 3D spaces. Existing methods often face challenges with accuracy in object recognition and struggle in interpreting complex linguistic queries, particularly with descriptions that involve multiple...
Published in | 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) pp. 14131 - 14140 |
---|---|
Main Authors | , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 16.06.2024 |
Subjects | |