Scalarized Lower Upper Confidence Bound Algorithm

Bibliographic Details
Published in	Learning and Intelligent Optimization, Vol. 8994, pp. 229-235
Main Author	Drugan, Mădălina M.
Format	Book Chapter
Language	English
Published	Switzerland: Springer International Publishing AG, 2015
Series	Lecture Notes in Computer Science
Subjects	Algorithms & data structures; Artificial intelligence; Computer programming / software development; Evolutionary Multi-objective Optimization Algorithms; Multi-armed Bandit; Pareto Front; Reward Vector; Suboptimal Arm
ISBN	9783319190839 3319190830
ISSN	0302-9743 1611-3349
DOI	10.1007/978-3-319-19084-6_21

Summary:	Multi-objective evolutionary optimisation algorithms and stochastic multi-armed bandit techniques are combined to design stochastic multi-objective multi-armed bandits (MOMAB) with an efficient trade-off between exploration and exploitation. The lower upper confidence bound (LUCB) algorithm focuses on sampling the arms that are most likely to be misclassified (i.e., as optimal or suboptimal) in order to identify the set of best arms, also known as the Pareto front. Our scalarized multi-objective LUCB (sMO-LUCB) is an adaptation of LUCB to reward vectors. Preliminary empirical results show good performance of the proposed algorithm on a bi-objective environment.
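The summary does not give the details of sMO-LUCB, so the following is only a minimal sketch of the general idea under stated assumptions: reward vectors are reduced to scalars with a linear (weighted-sum) scalarization, and a single-best-arm LUCB-style loop is run once per weight vector. The function names (`scalarize`, `lucb_best_arm`, `scalarized_front`, `make_bernoulli_env`), the Hoeffding-style confidence radius and its constants, and the parameters `delta`, `epsilon`, and `max_rounds` are all illustrative choices, not taken from the chapter.

```python
import math
import random


def scalarize(reward, weights):
    """Linear scalarization (an assumption): weighted sum of a reward vector."""
    return sum(w * r for w, r in zip(weights, reward))


def lucb_best_arm(pull, n_arms, weights, delta=0.1, epsilon=0.05, max_rounds=5000):
    """LUCB-style search for the best arm under one scalarization.

    pull(i) is a hypothetical callback returning a reward vector for arm i.
    Requires n_arms >= 2. The radius constants are illustrative.
    """
    counts = [0] * n_arms
    sums = [0.0] * n_arms

    def sample(i):
        sums[i] += scalarize(pull(i), weights)
        counts[i] += 1

    for i in range(n_arms):  # pull every arm once to initialise the means
        sample(i)

    for t in range(1, max_rounds + 1):
        means = [s / c for s, c in zip(sums, counts)]
        radius = [math.sqrt(math.log(4 * n_arms * t ** 4 / delta) / (2 * c))
                  for c in counts]
        # Empirically best arm, and the rival with the highest upper bound.
        best = max(range(n_arms), key=lambda i: means[i])
        rival = max((i for i in range(n_arms) if i != best),
                    key=lambda i: means[i] + radius[i])
        # Stop once the rival's upper bound barely exceeds best's lower bound.
        gap = (means[rival] + radius[rival]) - (means[best] - radius[best])
        if gap < epsilon:
            break
        # Otherwise sample the two arms most at risk of being misclassified.
        sample(best)
        sample(rival)

    return max(range(n_arms), key=lambda i: sums[i] / counts[i])


def scalarized_front(pull, n_arms, weight_set, **kwargs):
    """Union of the best arms found over a set of weight vectors."""
    return sorted({lucb_best_arm(pull, n_arms, w, **kwargs) for w in weight_set})


def make_bernoulli_env(arm_means, seed=0):
    """Hypothetical test environment: independent Bernoulli reward per objective."""
    rng = random.Random(seed)

    def pull(i):
        return tuple(1.0 if rng.random() < p else 0.0 for p in arm_means[i])

    return pull
```

A bi-objective run could look like `scalarized_front(make_bernoulli_env([(0.9, 0.1), (0.5, 0.5), (0.1, 0.9)]), 3, [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)])`. Note that a linear scalarization can only recover Pareto-optimal arms that lie on the convex hull of the front, which is one reason other scalarization functions are often considered.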