Greedy algorithms for stochastic monotone k-submodular maximization under full-bandit feedback
In this paper, we theoretically study the Combinatorial Multi-Armed Bandit problem with stochastic monotone k-submodular reward function under full-bandit feedback. In this setting, the decision-maker is allowed to select a super arm composed of multiple base arms in each round and then receives it...
Published in | Journal of combinatorial optimization Vol. 49; no. 1 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | New York, Springer US, 01.01.2025; Springer Nature B.V. |
Summary: | In this paper, we theoretically study the Combinatorial Multi-Armed Bandit problem with a stochastic monotone $k$-submodular reward function under full-bandit feedback. In this setting, the decision-maker selects a super arm composed of multiple base arms in each round and then receives its $k$-submodular reward. The $k$-submodularity broadens the application scenarios of the problem to contexts characterized by diverse options. We present two simple greedy algorithms for two budget constraints (total size and individual size) and provide theoretical upper bounds on the regret. For the total size budget, the proposed algorithm achieves a $\frac{1}{2}$-regret upper bound of $\tilde{O}\left(T^{\frac{2}{3}}(kn)^{\frac{1}{3}}B\right)$, where $T$ is the time horizon, $n$ is the number of base arms, and $B$ denotes the budget. For the individual size budget, the proposed algorithm achieves a $\frac{1}{3}$-regret with the same upper bound. Moreover, we conduct numerical experiments on these two algorithms to empirically demonstrate their effectiveness. |
---|---|
ISSN: | 1382-6905 1573-2886 |
DOI: | 10.1007/s10878-024-01240-9 |
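
The summary describes greedy selection under a total size budget when only the noisy reward of a whole super arm is observed. The sketch below illustrates that general idea and is not the paper's algorithm: it assumes an explore-then-commit style in which each candidate extension is played `m` times and the sample means are compared. The function name `noisy_greedy_total_size`, the parameter `m`, the toy `WEIGHTS` table, and the Gaussian noise level are all hypothetical choices made for illustration.

```python
import random

# Hypothetical toy reward: WEIGHTS[item][type] is the value of giving `item` that type.
# An additive reward with nonnegative weights is monotone and k-submodular.
WEIGHTS = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.4]]  # n = 3 base items, k = 2 types
_rng = random.Random(0)  # fixed seed so the example is reproducible


def play(assignment):
    """Return one noisy reward for a whole assignment {item: type} (full-bandit feedback)."""
    clean = sum(WEIGHTS[item][typ] for item, typ in assignment.items())
    return clean + _rng.gauss(0.0, 0.05)  # assumed noise model


def noisy_greedy_total_size(n, k, budget, play, m):
    """Greedily build an assignment of at most `budget` items using only noisy plays.

    At each step, every feasible (item, type) extension is played `m` times and
    the extension with the highest empirical mean reward is kept.
    """
    assignment = {}
    for _ in range(budget):
        best_pair, best_mean = None, float("-inf")
        for item in range(n):
            if item in assignment:
                continue  # each item receives at most one type
            for typ in range(k):
                candidate = dict(assignment)
                candidate[item] = typ
                mean = sum(play(candidate) for _ in range(m)) / m
                if mean > best_mean:
                    best_pair, best_mean = (item, typ), mean
        if best_pair is None:
            break  # no unassigned items remain
        assignment[best_pair[0]] = best_pair[1]
    return assignment


result = noisy_greedy_total_size(n=3, k=2, budget=2, play=play, m=50)
print(result)
```

In this sketch the exploration phase uses at most `budget * n * k * m` plays. A bandit algorithm of this style would then play the returned assignment for the remaining rounds, so `m` trades estimation accuracy against exploration cost.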