Preventing undesirable behavior of intelligent machines

Bibliographic Details
Published in	Science (American Association for the Advancement of Science) Vol. 366; no. 6468; pp. 999-1004
Main Authors	Thomas, Philip S.; Castro da Silva, Bruno; Barto, Andrew G.; Giguere, Stephen; Brun, Yuriy; Brunskill, Emma
Format	Journal Article
Language	English
Published	United States: The American Association for the Advancement of Science, 22.11.2019
Subjects	Algorithms; Artificial intelligence; Complex systems; Data analysis; Diabetes mellitus; Learning algorithms; Machine learning; Mathematics; Pattern analysis; Pattern recognition; Quality of life; Task complexity; Viability
Summary:	Intelligent machines using machine learning algorithms are ubiquitous, ranging from simple data analysis and pattern recognition tools to complex systems that achieve superhuman performance on various tasks. Ensuring that they do not exhibit undesirable behavior (that they do not, for example, cause harm to humans) is therefore a pressing problem. We propose a general and flexible framework for designing machine learning algorithms. This framework simplifies the problem of specifying and regulating undesirable behavior. To show the viability of this framework, we used it to create machine learning algorithms that precluded the dangerous behavior caused by standard machine learning algorithms in our experiments. Our framework for designing machine learning algorithms simplifies the safe and responsible application of machine learning.
ISSN:	0036-8075 (print); 1095-9203 (online)
DOI:	10.1126/science.aag3311