Brick Tic-Tac-Toe: Exploring the Generalizability of AlphaZero to Novel Test Environments
| Field | Value |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | 13.07.2022 |
| Subjects | |
Summary: Traditional reinforcement learning (RL) environments are typically the same for both the training and testing phases. As a result, current RL methods largely fail to generalize to a test environment that is conceptually similar to, but different from, the one the method was trained on, which we term the novel test environment. To push RL research towards algorithms that can generalize to novel test environments, we introduce the Brick Tic-Tac-Toe (BTTT) test bed, in which the brick position in the test environment differs from that in the training environment. Using a round-robin tournament on the BTTT environment, we show that traditional RL state-search approaches such as Monte Carlo Tree Search (MCTS) and Minimax generalize better to novel test environments than AlphaZero does. This is surprising because AlphaZero has achieved superhuman performance in environments such as Go, Chess, and Shogi, which may lead one to expect it to perform well in novel test environments. Our results show that BTTT, though simple, is rich enough to explore the generalizability of AlphaZero. We find that merely increasing the number of MCTS lookahead iterations is insufficient for AlphaZero to generalize to some novel test environments. Rather, increasing the variety of training environments progressively improves generalizability across all possible starting brick configurations.
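To make the setup concrete, the following is a minimal sketch of a BTTT-style environment as the abstract describes it: a tic-tac-toe grid in which one cell is occupied by an immovable "brick", whose position can differ between training and test environments. The abstract does not specify the board size or the exact rules, so a standard 3x3 grid is assumed here, and the class and method names (`BrickTicTacToe`, `legal_moves`, `play`, `winner`) are hypothetical, not taken from the paper's code.

```python
class BrickTicTacToe:
    """Hypothetical sketch of a Brick Tic-Tac-Toe board (assumed 3x3).

    One cell holds an immovable brick ('#') that neither player may use;
    varying its position yields distinct train/test environments.
    """

    def __init__(self, brick=(1, 1)):
        self.board = [[" "] * 3 for _ in range(3)]
        r, c = brick
        self.board[r][c] = "#"  # the brick permanently blocks this cell

    def legal_moves(self):
        # Only empty cells are playable; the brick cell never appears here.
        return [(r, c) for r in range(3) for c in range(3)
                if self.board[r][c] == " "]

    def play(self, r, c, mark):
        assert (r, c) in self.legal_moves(), "cell occupied or blocked"
        self.board[r][c] = mark

    def winner(self):
        # Check all rows, columns, and both diagonals for three equal marks.
        lines = [[(r, c) for c in range(3)] for r in range(3)]
        lines += [[(r, c) for r in range(3)] for c in range(3)]
        lines += [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]]
        for line in lines:
            marks = {self.board[r][c] for r, c in line}
            if len(marks) == 1 and marks not in ({" "}, {"#"}):
                return marks.pop()
        return None  # no winner yet
```

Under this sketch, changing the `brick` argument at construction time is what distinguishes a training environment from a novel test environment.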
DOI: 10.48550/arxiv.2207.05991