Scalar reward is not enough: a response to Silver, Singh, Precup and Sutton (2021)

Bibliographic Details
Published in	Autonomous Agents and Multi-Agent Systems, Vol. 36, no. 2
Main Authors	Vamplew, Peter; Smith, Benjamin J.; Källström, Johan; Ramos, Gabriel; Rădulescu, Roxana; Roijers, Diederik M.; Hayes, Conor F.; Heintz, Fredrik; Mannion, Patrick; Libin, Pieter J. K.; Dazeley, Richard; Foale, Cameron
Format	Journal Article
Language	English
Published	New York: Springer US, 01.10.2022 (Springer Nature B.V.)
Subjects	Artificial general intelligence; Artificial Intelligence; Biological computing; Computer Science; Computer Systems Organization and Communication Networks; Maximization; Multi-objective decision making; Multi-Objective Decision Making (MODeM); Multi-objective reinforcement learning; Optimization; Reinforcement learning; Safe and Ethical AI; Scalar rewards; Software Engineering/Programming and Operating Systems; User Interfaces and Human Computer Interaction; Vector rewards
Summary:	The recent paper “Reward is Enough” by Silver, Singh, Precup and Sutton posits that the concept of reward maximisation is sufficient to underpin all intelligence, both natural and artificial, and provides a suitable basis for the creation of artificial general intelligence. We contest the underlying assumption of Silver et al. that such reward can be scalar-valued. In this paper we explain why scalar rewards are insufficient to account for some aspects of both biological and computational intelligence, and argue in favour of explicitly multi-objective models of reward maximisation. Furthermore, we contend that even if scalar reward functions can trigger intelligent behaviour in specific cases, this type of reward is insufficient for the development of human-aligned artificial general intelligence due to unacceptable risks of unsafe or unethical behaviour.
ISSN:	1387-2532, 1573-7454
DOI:	10.1007/s10458-022-09575-5