Dynamic clustering based contextual combinatorial multi-armed bandit for online recommendation


Bibliographic Details
Published in: Knowledge-Based Systems, Vol. 257, p. 109927
Main Authors: Yan, Cairong; Han, Haixia; Zhang, Yanting; Zhu, Dandan; Wan, Yongquan
Format: Journal Article
Language: English
Published: Elsevier B.V., 05.12.2022

Summary: Recommender systems still face a trade-off between exploring new items to maximize user satisfaction and exploiting items users have already interacted with to match their interests. This problem is widely recognized as the exploration/exploitation (EE) dilemma, and the multi-armed bandit (MAB) algorithm has proven to be an effective solution. As the numbers of users and items in real-world application scenarios grow, purchase interactions become sparser, and three issues need to be addressed when building MAB-based recommender systems. First, large-scale users and sparse interactions make user preference mining harder. Second, traditional bandits model individual items as arms and cannot handle an ever-growing item set effectively. Third, widely used Bernoulli-based reward mechanisms feed back only 0 or 1, ignoring rich implicit feedback such as clicks and add-to-cart actions. To address these problems, we propose Dynamic Clustering based Contextual Combinatorial Multi-Armed Bandits (DC3MAB), an algorithm built from three configurable key components. Specifically, a dynamic user clustering strategy enables different users in the same cluster to cooperate in estimating the expected rewards of arms. A dynamic item partitioning approach based on collaborative filtering significantly reduces the number of arms and produces a recommendation list instead of a single item, providing diversity. In addition, a multi-class reward mechanism based on fine-grained implicit feedback helps capture user preferences more accurately. Extensive empirical experiments on three real-world datasets demonstrate the superiority of DC3MAB over state-of-the-art bandits (on average, +75.8% in F1 and +54.3% in cumulative reward). The source code is available at https://github.com/HaixHan/DC3MAB.
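Two of the abstract's ideas are easy to picture in code: replacing Bernoulli 0/1 feedback with graded rewards for implicit actions, and letting users in the same cluster share one reward estimator so sparse interactions pool together. The sketch below is an illustrative assumption, not the paper's implementation: the reward values, the ClusterBandit class, and the LinUCB-style shared update are all placeholders chosen for clarity (the authors' actual algorithm is in the linked repository).

```python
# Minimal sketch of two ideas from the DC3MAB abstract (hypothetical names
# and values throughout; see https://github.com/HaixHan/DC3MAB for the
# authors' real implementation).
import numpy as np

# (1) Multi-class rewards for implicit feedback instead of Bernoulli 0/1.
#     These grades are placeholders; the paper defines its own scheme.
REWARD = {"none": 0.0, "click": 0.3, "add_to_cart": 0.6, "purchase": 1.0}

class ClusterBandit:
    """One shared linear estimator per user cluster, so all users in the
    cluster cooperate in estimating expected arm rewards (LinUCB-style)."""

    def __init__(self, dim: int, alpha: float = 1.0):
        self.A = np.eye(dim)    # shared ridge-regression Gram matrix
        self.b = np.zeros(dim)  # shared reward-weighted feature sum
        self.alpha = alpha      # exploration strength

    def ucb(self, x: np.ndarray) -> float:
        """Upper confidence bound on the expected reward of arm features x."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b
        return float(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))

    def update(self, x: np.ndarray, feedback: str) -> None:
        """Fold one interaction back into the cluster's shared statistics."""
        r = REWARD[feedback]
        self.A += np.outer(x, x)
        self.b += r * x

# Usage: score candidate arms (here, item partitions rather than single
# items) for a user's cluster, play the best one, update with graded feedback.
rng = np.random.default_rng(0)
bandit = ClusterBandit(dim=5)
arms = rng.normal(size=(4, 5))  # 4 candidate item partitions, 5-dim features
chosen = max(range(len(arms)), key=lambda a: bandit.ucb(arms[a]))
bandit.update(arms[chosen], feedback="add_to_cart")
```

Sharing A and b across a whole cluster is what lets sparse per-user interactions accumulate into a usable reward estimate, which is the motivation the abstract gives for dynamic user clustering; modeling item partitions rather than individual items as arms is likewise what keeps the arm set manageable as the catalog grows.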
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2022.109927