Safety without alignment

Currently, the dominant paradigm in AI safety is alignment with human values. Here we describe progress on developing an alternative approach to safety, based on ethical rationalism (Gewirth 1978), and propose an inherently safe implementation path via hybrid theorem provers in a sandbox. As AGIs ev...


Bibliographic Details
Published in: arXiv.org
Main Authors: Kornai, András; Bukatin, Michael; Zombori, Zsolt
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 18.03.2023
