Scalarized Lower Upper Confidence Bound Algorithm

Bibliographic Details
Published in: Learning and Intelligent Optimization, Vol. 8994, pp. 229-235
Main Author: Drugan, Mădălina M.
Format: Book Chapter
Language: English
Published: Switzerland, Springer International Publishing AG, 2015
Springer International Publishing
Series: Lecture Notes in Computer Science
ISBN: 9783319190839
3319190830
ISSN: 0302-9743
1611-3349
DOI: 10.1007/978-3-319-19084-6_21

Summary: Multi-objective evolutionary optimisation algorithms and stochastic multi-armed bandit techniques are combined to design stochastic multi-objective multi-armed bandits (MOMAB) with an efficient trade-off between exploration and exploitation. The lower upper confidence bound (LUCB) algorithm focuses on sampling the arms that are most likely to be misclassified (i.e., as optimal or suboptimal arms) in order to identify the set of best arms, also known as the Pareto front. Our scalarized multi-objective LUCB (sMO-LUCB) is an adaptation of LUCB to reward vectors. Preliminary empirical results show good performance of the proposed algorithm in a bi-objective environment.
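
The two terms the summary relies on, the Pareto front and scalarization of reward vectors, can be illustrated with a minimal Python sketch. This is not the chapter's sMO-LUCB algorithm: it contains no sampling and no confidence bounds. The mean reward vectors, the weight vector, and the choice of a linear scalarization function are all hypothetical assumptions made for illustration; the summary does not say which scalarization functions the chapter uses.

```python
from typing import List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u Pareto-dominates v under maximization:
    u is at least as good in every objective and strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(means: List[Sequence[float]]) -> List[int]:
    """Indices of arms whose mean reward vector no other arm dominates."""
    return [
        i
        for i, u in enumerate(means)
        if not any(dominates(v, u) for j, v in enumerate(means) if j != i)
    ]


def linear_scalarize(vec: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the objectives (an assumed scalarization function)."""
    return sum(w * x for w, x in zip(weights, vec))


# Hypothetical bi-objective mean reward vectors, one per arm.
means = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4), (0.3, 0.2)]

# Arms 3 and 4 are dominated by arm 1, so the front is arms 0, 1 and 2.
front = pareto_front(means)

# A hypothetical weight vector turns each reward vector into a single number,
# so the best arm under that weighting can be picked as in a scalar bandit.
weights = (0.2, 0.8)
best = max(range(len(means)), key=lambda i: linear_scalarize(means[i], weights))
```

With positive weights, the arm that maximizes a linear scalarization always lies on the Pareto front, which is why a set of scalarization functions can be used to search for Pareto-optimal arms.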