POP-VQA - Privacy preserving, On-device, Personalized Visual Question Answering
Published in: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pp. 8455-8464
Format: Conference Proceeding
Language: English
Published: IEEE, 03.01.2024
Summary: The next generation of device smartness needs to go beyond understanding basic user commands. As our systems become more efficient, they need to be taught to understand user interactions and intents from all possible input modalities. This is where the recent advent of large-scale multi-modal models can form the foundation for next-generation technologies. However, the true power of such interactive systems can only be realized with privacy-preserving personalization. In this paper, we propose an on-device visual question answering system that generates personalized answers using an on-device user knowledge graph. Such systems have the potential to serve as fundamental groundwork for the development of genuinely intelligent assistants tailored to the needs and preferences of each individual. We validate our model's performance on both in-realm public datasets and personal user data. Our results show a consistent performance increase across both tasks, with an absolute improvement of ≈36% on the KVQA dataset for 1-hop inferences and ≈6% on user personal data. We also present user-study results to validate our hypothesis of the need for and relevance of the proposed system.
ISSN: 2642-9381
DOI: 10.1109/WACV57701.2024.00828
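
The summary describes grounding VQA answers in a private, on-device user knowledge graph, with results reported for 1-hop inferences. Below is a minimal sketch of that retrieval step only, under assumed details: the triple store, the `Triple`, `OnDeviceKG`, and `personalize_answer` names, and the sample facts are all illustrative, not the paper's implementation or API.

```python
# Minimal sketch of KG-grounded answer personalization, not the authors' method.
# All class and function names are hypothetical; a real system would feed the
# retrieved facts plus the image into a multi-modal model.

from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str


class OnDeviceKG:
    """Tiny in-memory stand-in for a private, on-device user knowledge graph."""

    def __init__(self, triples):
        self.triples = list(triples)

    def one_hop(self, entity):
        # 1-hop inference: return every fact directly linked to the entity.
        return [t for t in self.triples if entity in (t.subject, t.obj)]


def personalize_answer(entity: str, question: str, kg: OnDeviceKG) -> str:
    # Ground the question in the user's private facts; nothing leaves the device.
    facts = kg.one_hop(entity)
    context = "; ".join(f"{t.subject} {t.relation} {t.obj}" for t in facts)
    # Here we only show the retrieval that makes the answer personal.
    return f"Q: {question}\nKnown facts: {context or 'none'}"


if __name__ == "__main__":
    kg = OnDeviceKG([
        Triple("Alice", "is", "user's sister"),
        Triple("Alice", "lives_in", "Berlin"),
    ])
    print(personalize_answer("Alice", "Who is the person in this photo?", kg))
```

Keeping the graph and the lookup entirely in device memory is what makes the personalization privacy-preserving: the model conditions on local facts without ever sending them to a server.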