POP-VQA - Privacy preserving, On-device, Personalized Visual Question Answering
Published in: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 8455-8464
Format: Conference Proceeding
Language: English
Published: IEEE, 03.01.2024
Summary: The next generation of device smartness needs to go beyond understanding basic user commands. As our systems become more efficient, they need to be taught to understand user interactions and intents from all possible input modalities. This is where the recent advent of large-scale multi-modal models can form the foundation for next-generation technologies. However, the true power of such interactive systems can only be realized with privacy-preserving personalization. In this paper, we propose an on-device visual question answering system that generates personalized answers using an on-device user knowledge graph. Such systems have the potential to serve as fundamental groundwork for the development of genuinely intelligent assistants tailored to the needs and preferences of each individual. We validate our model's performance on both in-realm public datasets and personal user data. Our results show a consistent performance increase across both tasks, with an absolute improvement of ≈36% on the KVQA dataset for 1-hop inferences and ≈6% on user personal data. We also present user-study results to validate our hypothesis of the need for and relevance of the proposed system.
ISSN: 2642-9381
DOI: 10.1109/WACV57701.2024.00828
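
The summary describes grounding VQA answers in a private, on-device user knowledge graph, with results reported for 1-hop inferences. Below is a minimal sketch of that retrieval step only, under assumed details: the triple store, the `Triple`, `OnDeviceKG`, and `personalize_answer` names, and the sample facts are all illustrative, not the paper's implementation or API.

```python
# Minimal sketch of KG-grounded answer personalization, not the authors' method.
# All class and function names are hypothetical; a real system would feed the
# retrieved facts plus the image into a multi-modal model.

from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


class OnDeviceKG:
    """Tiny in-memory stand-in for a private, on-device user knowledge graph."""

    def __init__(self, triples):
        self.triples = list(triples)

    def one_hop(self, entity):
        # 1-hop inference: return every fact directly linked to the entity.
        return [t for t in self.triples if entity in (t.subject, t.obj)]


def personalize_answer(entity: str, question: str, kg: OnDeviceKG) -> str:
    # Ground the question in the user's private facts; nothing leaves the device.
    facts = kg.one_hop(entity)
    context = "; ".join(f"{t.subject} {t.relation} {t.obj}" for t in facts)
    # Here we only show the retrieval that makes the answer personal.
    return f"Q: {question}\nKnown facts: {context or 'none'}"


if __name__ == "__main__":
    kg = OnDeviceKG([
        Triple("Alice", "is", "user's sister"),
        Triple("Alice", "lives_in", "Berlin"),
    ])
    print(personalize_answer("Alice", "Who is the person in this photo?", kg))
```

Keeping the graph and the lookup entirely in device memory is what makes the personalization privacy-preserving: the model conditions on local facts without ever sending them to a server.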