Greedy algorithms for stochastic monotone k-submodular maximization under full-bandit feedback


Bibliographic Details
Published in	Journal of Combinatorial Optimization, Vol. 49, no. 1
Main Authors	Sun, Xin; Guo, Tiande; Han, Congying; Zhang, Hongyang
Format	Journal Article
Language	English
Published	New York: Springer US, 01.01.2025; Springer Nature B.V.
Subjects	Algorithms; Budgets; Combinatorial analysis; Combinatorics; Convex and Discrete Geometry; Decision theory; Feedback; Greedy algorithms; Mathematical Modeling and Industrial Mathematics; Mathematics; Mathematics and Statistics; Multi-armed bandit problems; Operations Research/Decision Theory; Optimization; Theory of Computation; Upper bounds; Stochastic reward; submodular; Budget constraints; Full-bandit feedback; Combinatorial multi-armed bandit
Summary:	In this paper, we theoretically study the Combinatorial Multi-Armed Bandit problem with stochastic monotone $k$-submodular reward function under full-bandit feedback. In this setting, the decision-maker is allowed to select a super arm composed of multiple base arms in each round and then receives its $k$-submodular reward. The $k$-submodularity enriches the application scenarios of the problem we consider in contexts characterized by diverse options. We present two simple greedy algorithms for two budget constraints (total size and individual size) and provide the theoretical analysis for upper bound of the regret value. For the total size budget, the proposed algorithm achieves a $\frac{1}{2}$-regret upper bound by $\tilde{O}\left(T^{2/3}(kn)^{1/3}B\right)$ where $T$ is the time horizon, $n$ is the number of base arms and $B$ denotes the budget. For the individual size budget, the proposed algorithm achieves a $\frac{1}{3}$-regret with the same upper bound. Moreover, we conduct numerical experiments on these two algorithms to empirically demonstrate the effectiveness.
ISSN:	1382-6905; 1573-2886
DOI:	10.1007/s10878-024-01240-9
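
To make the total-size setting in the summary concrete, the following is a minimal sketch of an offline greedy rule for a monotone k-submodular function, assuming exact access to the function value. It is not the article's bandit algorithm, which only observes noisy rewards of the chosen super arm. The function name `greedy_total_size`, the toy coverage reward `value`, and the data in `covers` are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Hashable, List, Set

# An assignment maps each selected item (base arm) to one of k types.
Assignment = Dict[Hashable, int]


def greedy_total_size(
    items: List[Hashable],
    k: int,
    budget: int,
    value: Callable[[Assignment], float],
) -> Assignment:
    """Pick at most `budget` (item, type) pairs, each time taking the pair
    with the largest marginal gain. Assumes `value` is an exact oracle
    (an illustrative assumption; the bandit setting has only noisy rewards)."""
    chosen: Assignment = {}
    for _ in range(min(budget, len(items))):
        base = value(chosen)
        best_gain, best_pair = None, None
        for e in items:
            if e in chosen:
                continue  # each item receives at most one type
            for i in range(k):
                trial = dict(chosen)
                trial[e] = i
                gain = value(trial) - base
                if best_gain is None or gain > best_gain:
                    best_gain, best_pair = gain, (e, i)
        e, i = best_pair
        chosen[e] = i
    return chosen


# Hypothetical toy data: covers[e][i] is the set of targets that item e
# covers when it is assigned type i. Targets are counted per type.
covers: Dict[str, List[Set[int]]] = {
    "a": [{1, 2, 3}, {4}],
    "b": [{3}, {4, 5}],
    "c": [{1}, {6}],
}


def value(assignment: Assignment) -> float:
    """Sum over types of the number of targets covered by that type's items.
    A sum of per-type coverage functions is monotone k-submodular."""
    total = 0
    for i in range(2):
        covered: Set[int] = set()
        for e, t in assignment.items():
            if t == i:
                covered |= covers[e][i]
        total += len(covered)
    return float(total)


result = greedy_total_size(["a", "b", "c"], k=2, budget=2, value=value)
```

With a budget of 2, the sketch first assigns type 0 to item "a" (gain 3) and then type 1 to item "b" (gain 2). In a full-bandit variant, the exact `value` calls would have to be replaced by estimates built from repeated noisy plays, which is where the dependence on the time horizon in the regret bound comes from.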