Distribution Learning with Valid Outputs Beyond the Worst-Case

Bibliographic Details
Published in	arXiv.org
Main Authors	Rittler, Nick; Chaudhuri, Kamalika
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 21.10.2024
Subjects	Algorithms; Machine learning; Polynomials; Queries; Validity
Summary:	Generative models at times produce "invalid" outputs, such as images with generation artifacts and unnatural sounds. Validity-constrained distribution learning attempts to address this problem by requiring that the learned distribution have a provably small fraction of its mass in invalid parts of space -- something which standard loss minimization does not always ensure. To this end, a learner in this model can guide the learning via "validity queries", which allow it to ascertain the validity of individual examples. Prior work on this problem takes a worst-case stance, showing that proper learning requires an exponential number of validity queries, and demonstrating an improper algorithm which -- while generating guarantees in a wide range of settings -- makes an atypical polynomial number of validity queries. In this work, we take a first step towards characterizing regimes where guaranteeing validity is easier than in the worst case. We show that when the data distribution lies in the model class and the log-loss is minimized, the number of samples required to ensure validity has a weak dependence on the validity requirement. Additionally, we show that when the validity region belongs to a VC-class, a limited number of validity queries is often sufficient.
ISSN:	2331-8422