Gorthaur-EXP3: Bandit-based selection from a portfolio of recommendation algorithms balancing the accuracy-diversity dilemma

Bibliographic Details
Published in	Information sciences Vol. 546; pp. 378-396
Main Authors	Gutowski, Nicolas; Amghar, Tassadit; Camp, Olivier; Chhel, Fabien
Format	Journal Article
Language	English
Published	Elsevier Inc, 06.02.2021
Subjects	Application of reinforcement learning; Artificial Intelligence; Computer Science; Contextual Multi-Armed Bandit; Multi-Armed Bandit; Portfolio approach; Recommendation systems; 2020 MSC: 68T01, 68T20, 68T05, 90C29
Summary:
• Recommendation systems must be accurate but also diversify their recommendations.
• No single bandit-based recommendation algorithm fits all datasets.
• A portfolio approach dynamically selects bandit algorithms used for recommending.
• The optimal recommendation algorithm is the one which maximises accuracy and diversity.
• Dynamic algorithm selection ensures robustness and efficiency in non-stationary conditions.

Nowadays, real-world pervasive computing applications increasingly face multi-objective problems. This is the case for recommendation systems where, from a user’s viewpoint, recommended items must be both accurate and diverse. In recent years, model-based recommendation systems, such as those relying on Multi-Armed Bandit algorithms, have been extensively studied. They are known to offer theoretical guarantees of global accuracy. Nevertheless, despite these guarantees, existing algorithms obtain different results depending on the application or the dataset they operate on. Hence, before integrating such solutions, one should evaluate them thoroughly to ensure that the chosen method is efficient given the dynamic and potentially non-stationary nature of the target environments. However, human-based evaluations are costly in time and money. Here, we propose a novel algorithm portfolio approach, Gorthaur-EXP3, which aims to automatically select the algorithms that best maximise the global accuracy and diversity of recommendations according to a predefined trade-off. Our method uses the EXP3 bandit algorithm, which ensures continuous exploration and systematic exploitation of the best algorithm to apply in each situation it encounters. Gorthaur-EXP3 extends the original Gorthaur method, which uses roulette wheel selection, and it obtains better results in most experimental cases.
ISSN:	0020-0255 1872-6291
DOI:	10.1016/j.ins.2020.08.106
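
Illustrative note: the summary describes using EXP3 to choose, at each step, which algorithm from a portfolio should produce the recommendation. The sketch below shows the textbook EXP3 update applied to that idea. It is not the authors' implementation. The class name `Exp3Selector`, the helper `combined_reward`, the weighted-sum form of the accuracy-diversity trade-off, the exploration rate `gamma`, and the three portfolio entries with their made-up scores are all assumptions introduced here for illustration.

```python
import math
import random


class Exp3Selector:
    """Textbook EXP3 over n_arms; each arm stands for one portfolio algorithm."""

    def __init__(self, n_arms, gamma=0.1, seed=None):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.n_arms = n_arms
        self.gamma = gamma              # exploration rate (assumed value)
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        # Mix the weight-proportional distribution with a uniform one, so
        # every algorithm keeps a selection probability of at least gamma / n_arms.
        total = sum(self.weights)
        return [
            (1.0 - self.gamma) * w / total + self.gamma / self.n_arms
            for w in self.weights
        ]

    def select(self):
        probs = self.probabilities()
        arm = self.rng.choices(range(self.n_arms), weights=probs, k=1)[0]
        return arm, probs[arm]

    def update(self, arm, prob, reward):
        # reward must lie in [0, 1]; dividing by prob gives the usual
        # importance-weighted estimate for the arm that was played.
        estimate = reward / prob
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale to avoid overflow; probabilities are unchanged by scaling.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def combined_reward(accuracy, diversity, tradeoff=0.5):
    """Assumed trade-off: a weighted sum of two scores, each in [0, 1]."""
    return tradeoff * accuracy + (1.0 - tradeoff) * diversity


def simulate(rounds=3000, seed=0):
    # Hypothetical portfolio: fixed (accuracy, diversity) scores per algorithm.
    portfolio = [(0.3, 0.1), (0.9, 0.9), (0.5, 0.3)]
    selector = Exp3Selector(len(portfolio), gamma=0.1, seed=seed)
    for _ in range(rounds):
        arm, prob = selector.select()
        accuracy, diversity = portfolio[arm]
        selector.update(arm, prob, combined_reward(accuracy, diversity))
    return selector
```

Calling `simulate().probabilities()` returns the selection probabilities after the simulated rounds. In this toy setting the second entry has the highest combined score, so its probability should grow over time, while the uniform mixing term keeps the other entries explored, which is what lets a selector of this kind react if the best algorithm changes.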