Scalarized Lower Upper Confidence Bound Algorithm
Published in | Learning and Intelligent Optimization, Vol. 8994, pp. 229-235 |
---|---|
Main Author | |
Format | Book Chapter |
Language | English |
Published | Switzerland: Springer International Publishing AG, 2015 |
Series | Lecture Notes in Computer Science |
Subjects | |
ISBN | 9783319190839 3319190830 |
ISSN | 0302-9743 1611-3349 |
DOI | 10.1007/978-3-319-19084-6_21 |
Summary: | Multi-objective evolutionary optimisation algorithms and stochastic multi-armed bandit techniques are combined to design stochastic multi-objective multi-armed bandits (MOMAB) with an efficient exploration-exploitation trade-off. The lower upper confidence bound (LUCB) algorithm focuses on sampling the arms that are most likely to be misclassified as optimal or suboptimal, in order to identify the set of best arms, also known as the Pareto front. Our scalarized multi-objective LUCB (sMO-LUCB) is an adaptation of LUCB to reward vectors. Preliminary empirical results show good performance of the proposed algorithm in a bi-objective environment. |
---|---|
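
The summary names three ingredients: the Pareto front over reward vectors, scalarization of those vectors, and LUCB-style sampling of the arms most at risk of misclassification. The Python sketch below illustrates each in its simplest form. It is not the chapter's sMO-LUCB: the linear scalarization, the Hoeffding-style confidence radius, and all function names (`dominates`, `pareto_front`, `scalarize`, `lucb_pair`) are assumptions made for illustration.

```python
import math


def dominates(u, v):
    """True if reward vector u Pareto-dominates v (maximization)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(means):
    """Indices of arms whose mean vector is not dominated by any other arm."""
    return [
        i for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    ]


def scalarize(vector, weights):
    """Linear scalarization (assumed here): weighted sum of the objectives."""
    return sum(w * x for w, x in zip(weights, vector))


def lucb_pair(means, counts, weights, t, delta=0.1):
    """Pick the two arms an LUCB-style rule would sample next.

    Returns (best, challenger, gap): the empirically best arm under the
    scalarization, the other arm with the highest upper confidence bound,
    and the challenger's upper bound minus the best arm's lower bound.
    A gap at or below a chosen tolerance would be the stopping signal.
    The radius is a generic Hoeffding-style bound, not the chapter's.
    """
    k = len(means)
    radius = [math.sqrt(math.log(4 * k * t * t / delta) / (2 * n)) for n in counts]
    values = [scalarize(m, weights) for m in means]
    best = max(range(k), key=lambda i: values[i])
    challenger = max(
        (i for i in range(k) if i != best),
        key=lambda i: values[i] + radius[i],
    )
    gap = (values[challenger] + radius[challenger]) - (values[best] - radius[best])
    return best, challenger, gap


# Hypothetical bi-objective empirical means for four arms.
means = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8), (0.3, 0.3)]
front = pareto_front(means)
best, challenger, gap = lucb_pair(means, counts=[10, 10, 10, 10],
                                  weights=(0.9, 0.1), t=40)
```

With these hypothetical means, the fourth arm is dominated by the second, so the front contains the first three arms. A single weight vector selects only one point of that front, so a scalarized method would typically repeat the procedure over several weight vectors.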