Generative modelling for supervised, unsupervised and private learning


Bibliographic Details
Main Author Jordon, James Adam
Format Dissertation
Language English
Published University of Oxford 2021
Summary: In this thesis we develop several state-of-the-art generative modelling-based approaches for a variety of supervised, unsupervised and private learning problems. In the (almost) supervised domain, we tackle the problems of treatment effect estimation, imputation and feature selection. For treatment effect estimation, we begin by developing a GAN-based approach that generates the "missing" counterfactuals, which enables learning a fully supervised model. In SCIGAN, we then adapt this method to the continuous-intervention setting, introducing novel generator and discriminator architectures to handle the continuous nature of the treatments. For imputation, we introduce GAIN, a GAN-based imputation approach that leverages both the adversarial and generative nature of GANs to learn to impute missing data according to the ground truth distribution, while also allowing for multiple imputation to improve the reliability and robustness of the imputed data. Finally, for feature selection, we introduce KnockoffGAN, a GAN-based knockoff generation procedure which again leverages the adversarial nature of GANs to learn the appropriate "knockoff" distribution. By using a GAN, we are able to learn to generate knockoffs non-parametrically and thus achieve reliable performance on non-Gaussian data. In the domain of private learning, we address the problem of private synthetic data generation. This challenge has three key components: (1) generating faithful synthetic data; (2) ensuring privacy; and (3) measuring the quality of the generated data. In PATEGAN, we tackle all three of these problems by proposing a GAN-based approach to generating differentially private synthetic data. In addition, we introduce a new synthetic data metric, the Synthetic Ranking Agreement, which measures how well machine learning research on the synthetic data represents the same research on the real data.
In the final paper, we present a new differential privacy building block: a differentially private classification/prediction algorithm that builds upon the subsample-and-aggregate paradigm. The new method affords better privacy and utility than subsample-and-aggregate and can be used in place of PATE in the PATEGAN framework.
Bibliography: 0000000507380162
Engineering and Physical Sciences Research Council ; Office of Naval Research