Greedy algorithms for stochastic monotone k-submodular maximization under full-bandit feedback

Bibliographic Details
Published in Journal of combinatorial optimization Vol. 49; no. 1
Main Authors Sun, Xin; Guo, Tiande; Han, Congying; Zhang, Hongyang
Format Journal Article
Language English
Published New York Springer US 01.01.2025
Springer Nature B.V

Summary: In this paper, we theoretically study the Combinatorial Multi-Armed Bandit problem with a stochastic monotone k-submodular reward function under full-bandit feedback. In this setting, the decision-maker selects a super arm composed of multiple base arms in each round and then receives its k-submodular reward. The k-submodularity extends the problem to application scenarios characterized by diverse options. We present two simple greedy algorithms for two budget constraints (total size and individual size) and provide a theoretical analysis of the upper bound on the regret. For the total size budget, the proposed algorithm achieves a $\frac{1}{2}$-regret upper bound of $\tilde{O}(T^{2/3}(kn)^{1/3}B)$, where T is the time horizon, n is the number of base arms, and B denotes the budget. For the individual size budget, the proposed algorithm achieves a $\frac{1}{3}$-regret with the same upper bound. Moreover, we conduct numerical experiments on these two algorithms to empirically demonstrate their effectiveness.
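The greedy idea for the total size budget can be illustrated in a simplified offline setting, where the reward function can be evaluated exactly. The sketch below is not the bandit algorithm from the paper (which only observes noisy rewards of the super arms it plays); it is the classical offline greedy rule of repeatedly adding the (base arm, type) pair with the largest marginal gain until B arms are chosen. The names `greedy_total_size`, `coverage_value`, and the toy data in `COVER` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

# An assignment maps each selected base arm to one of k types (0 .. k-1).
Assignment = Dict[str, int]

# Toy data (an assumption for illustration): the items covered when
# base arm e is assigned type i.
COVER = {
    ("a", 0): {1, 2, 3}, ("a", 1): {4},
    ("b", 0): {1, 2},    ("b", 1): {5, 6},
    ("c", 0): {3},       ("c", 1): {6, 7},
}


def coverage_value(x: Assignment) -> int:
    """Number of distinct items covered: a monotone k-submodular toy objective."""
    covered = set()
    for arm, kind in x.items():
        covered |= COVER[(arm, kind)]
    return len(covered)


def greedy_total_size(
    arms: List[str],
    k: int,
    budget: int,
    f: Callable[[Assignment], float],
) -> Assignment:
    """Offline greedy under a total size budget: pick at most `budget` arms."""
    x: Assignment = {}
    for _ in range(budget):
        base = f(x)
        best_pair = None
        best_gain = float("-inf")
        for arm in arms:
            if arm in x:
                continue  # each base arm receives at most one type
            for kind in range(k):
                x[arm] = kind            # tentatively add the pair
                gain = f(x) - base       # marginal gain of (arm, kind)
                del x[arm]               # undo the tentative addition
                if gain > best_gain:     # ties keep the earliest pair
                    best_gain = gain
                    best_pair = (arm, kind)
        if best_pair is None:
            break  # no unselected arms remain
        x[best_pair[0]] = best_pair[1]
    return x


if __name__ == "__main__":
    chosen = greedy_total_size(["a", "b", "c"], k=2, budget=2, f=coverage_value)
    print(chosen, coverage_value(chosen))
```

In the full-bandit setting described in the summary, the exact marginal gains used here are not available, so a learning algorithm has to estimate them from repeated noisy plays; how the paper does this is not stated in this record.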
ISSN: 1382-6905, 1573-2886
DOI: 10.1007/s10878-024-01240-9