Monte Carlo Rollout Policy for Recommendation Systems with Dynamic User Behavior


Bibliographic Details
Published in: 2021 International Conference on COMmunication Systems & NETworkS (COMSNETS), pp. 86-89
Main Authors: Meshram, Rahul; Kaza, Kesav
Format: Conference Proceeding
Language: English
Published: IEEE, 05.01.2021
Summary: We model online recommendation systems as a hidden Markov multi-state restless multi-armed bandit problem. To solve it, we present a Monte Carlo rollout policy. We illustrate numerically that the Monte Carlo rollout policy performs better than the myopic policy for arbitrary transition dynamics with no specific structure. However, when some structure is imposed on the transition dynamics, the myopic policy performs better than the Monte Carlo rollout policy.
ISSN: 2155-2509
DOI: 10.1109/COMSNETS51098.2021.9352741