Adaptive Sampling for Discovery

In this paper, we study a sequential decision-making problem, called Adaptive Sampling for Discovery (ASD). Starting with a large unlabeled dataset, algorithms for ASD adaptively label the points with the goal to maximize the sum of responses. This problem has wide applications to real-world discove...

Full description

Saved in:
Bibliographic Details
Published inarXiv.org
Main Authors Xu, Ziping, Shim, Eunjae, Tewari, Ambuj, Zimmerman, Paul
Format Paper
LanguageEnglish
Published Ithaca Cornell University Library, arXiv.org 02.01.2023
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:In this paper, we study a sequential decision-making problem, called Adaptive Sampling for Discovery (ASD). Starting with a large unlabeled dataset, algorithms for ASD adaptively label the points with the goal to maximize the sum of responses. This problem has wide applications to real-world discovery problems, for example drug discovery with the help of machine learning models. ASD algorithms face the well-known exploration-exploitation dilemma. The algorithm needs to choose points that yield information to improve model estimates but it also needs to exploit the model. We rigorously formulate the problem and propose a general information-directed sampling (IDS) algorithm. We provide theoretical guarantees for the performance of IDS in linear, graph and low-rank models. The benefits of IDS are shown in both simulation experiments and real-data experiments for discovering chemical reaction conditions.
ISSN:2331-8422