Predicting Goal-Directed Human Attention Using Inverse Reinforcement Learning

Bibliographic Details
Published in	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) Vol. 2020; pp. 190 - 199
Main Authors	Yang, Zhibo; Huang, Lihan; Chen, Yupei; Wei, Zijun; Ahn, Seoyoung; Zelinsky, Gregory; Samaras, Dimitris; Hoai, Minh
Format	Conference Proceeding; Journal Article
Language	English
Published	United States: IEEE, 01.06.2020
Subjects	Computational modeling; Context modeling; Learning (artificial intelligence); Predictive models; Search problems; Task analysis; Visualization
Abstract	Human gaze behavior prediction is important for behavioral vision and for computer vision applications. Most models mainly focus on predicting free-viewing behavior using saliency maps, but do not generalize to goal-directed behavior, such as when a person searches for a visual target object. We propose the first inverse reinforcement learning (IRL) model to learn the internal reward function and policy used by humans during visual search. We modeled the viewer's internal belief states as dynamic contextual belief maps of object locations. These maps were learned and then used to predict behavioral scanpaths for multiple target categories. To train and evaluate our IRL model we created COCO-Search18, which is now the largest dataset of high-quality search fixations in existence. COCO-Search18 has 10 participants searching for each of 18 target-object categories in 6202 images, making about 300,000 goal-directed fixations. When trained and evaluated on COCO-Search18, the IRL model outperformed baseline models in predicting search fixation scanpaths, both in terms of similarity to human search behavior and search efficiency. Finally, reward maps recovered by the IRL model reveal distinctive target-dependent patterns of object prioritization, which we interpret as a learned object context.
AuthorAffiliation	1 Stony Brook University 2 Adobe Inc
Author_xml	1. Yang, Zhibo (Stony Brook University); 2. Huang, Lihan (Stony Brook University); 3. Chen, Yupei (Stony Brook University); 4. Wei, Zijun (Adobe Inc); 5. Ahn, Seoyoung (Stony Brook University); 6. Zelinsky, Gregory (Stony Brook University); 7. Samaras, Dimitris (Stony Brook University); 8. Hoai, Minh (Stony Brook University)
BackLink	https://www.ncbi.nlm.nih.gov/pubmed/34163124 (MEDLINE/PubMed record)
CODEN	IEEPAD
ContentType	Conference Proceeding Journal Article
DBID	6IE 6IH CBEJK RIE RIO NPM 7X8 5PM
DOI	10.1109/CVPR42600.2020.00027
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present PubMed MEDLINE - Academic PubMed Central (Full Participant titles)
DatabaseTitle	PubMed MEDLINE - Academic
DatabaseTitleList	MEDLINE - Academic PubMed
Discipline	Applied Sciences Computer Science
EISBN	9781728171685 1728171687
EISSN	1063-6919
EndPage	199
ExternalDocumentID	PMC8218821 34163124 9157666
Genre	orig-research Journal Article
GrantInformation_xml	fundername: NEI NIH HHS; grantid: R01 EY030669
ISSN	1063-6919
IngestDate	Thu Aug 21 18:33:07 EDT 2025 Mon Jul 21 09:55:37 EDT 2025 Wed Feb 19 02:06:27 EST 2025 Wed Aug 27 02:30:35 EDT 2025
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	true
Language	English
OpenAccessLink	https://www.ncbi.nlm.nih.gov/pmc/articles/8218821
PMID	34163124
PQID	2544879049
PQPubID	23479
PageCount	10
ParticipantIDs	proquest_miscellaneous_2544879049 ieee_primary_9157666 pubmedcentral_primary_oai_pubmedcentral_nih_gov_8218821 pubmed_primary_34163124
PublicationCentury	2000
PublicationDate	20200601
PublicationDateYYYYMMDD	2020-06-01
PublicationDecade	2020
PublicationPlace	United States
PublicationTitle	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev	CVPR
PublicationTitleAlternate	Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit
PublicationYear	2020
Publisher	IEEE
SourceID	pubmedcentral proquest pubmed ieee
SourceType	Open Access Repository Aggregation Database Index Database Publisher
StartPage	190
Title	Predicting Goal-Directed Human Attention Using Inverse Reinforcement Learning
URI	https://ieeexplore.ieee.org/document/9157666 https://www.ncbi.nlm.nih.gov/pubmed/34163124 https://www.proquest.com/docview/2544879049 https://pubmed.ncbi.nlm.nih.gov/PMC8218821
Volume	2020