PUMA: Performance Unchanged Model Augmentation for Training Data Removal
Main Authors | , , |
---|---|
Format | Journal Article |
Language | English |
Published | 01.03.2022 |
Subjects | |
Summary: | Preserving the performance of a trained model while removing the unique
characteristics of marked training data points is challenging. Recent research
usually suggests either retraining the model from scratch on the remaining
training data or refining it by reverting the model optimization on the marked
data points. Unfortunately, aside from their computational inefficiency, those
approaches inevitably hurt the resulting model's generalization ability, since
they discard not only unique characteristics but also shared (and possibly
contributive) information. To address this performance degradation problem,
this paper presents a novel approach called Performance Unchanged Model
Augmentation (PUMA). The proposed PUMA framework explicitly models the
influence of each training data point on the model's generalization ability
with respect to various performance criteria. It then compensates for the
negative impact of removing marked data by optimally reweighting the remaining
data. To demonstrate the effectiveness of the PUMA framework, we compared it
with multiple state-of-the-art data removal techniques in experiments, which
show that PUMA can effectively and efficiently remove the unique
characteristics of marked training data without retraining the model, so that
the resulting model can 1) fool a membership attack, and 2) resist performance
degradation. In addition, because PUMA estimates data importance during its
operation, we show that it can be used to debug mislabelled data points more
efficiently than existing approaches. |
---|---|
DOI: | 10.48550/arxiv.2203.00846 |
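
The summary describes two steps: estimating each training point's influence on a performance criterion, then reweighting the remaining points to offset the effect of removing the marked ones. The sketch below illustrates that general idea on a toy weighted ridge regression using a standard first-order influence estimate. It is not the paper's algorithm: the model, the validation-loss criterion, the minimum-norm reweighting rule, and every function name (`fit_weighted_ridge`, `influence_on_validation`, `compensating_weights`) are assumptions made for illustration only.

```python
import numpy as np

def fit_weighted_ridge(X, y, w, lam=1e-2):
    """Minimise 0.5 * sum_i w_i * (x_i @ theta - y_i)**2 + 0.5 * lam * ||theta||**2."""
    d = X.shape[1]
    H = X.T @ (w[:, None] * X) + lam * np.eye(d)  # Hessian of the objective
    theta = np.linalg.solve(H, X.T @ (w * y))
    return theta, H

def influence_on_validation(X, y, theta, H, X_val, y_val):
    """First-order change in mean validation loss per unit increase of each training weight."""
    g_val = X_val.T @ (X_val @ theta - y_val) / len(y_val)  # validation-loss gradient
    g_train = (X @ theta - y)[:, None] * X                  # per-point gradients, shape (n, d)
    return -g_train @ np.linalg.solve(H, g_val)

def compensating_weights(influence, marked):
    """Minimum-norm weight change on unmarked points that cancels, to first
    order, the validation-loss change caused by zeroing the marked weights."""
    keep = ~marked
    target = influence[marked].sum()  # removal shifts the validation loss by -target
    i_keep = influence[keep]
    dw = np.zeros_like(influence)
    dw[keep] = target * i_keep / (i_keep @ i_keep)
    return dw

# Toy data (hypothetical): 200 training points, the first 10 marked for removal.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=n)
X_val = rng.normal(size=(100, d))
y_val = X_val @ theta_true + 0.1 * rng.normal(size=100)

marked = np.zeros(n, dtype=bool)
marked[:10] = True

w = np.ones(n)
theta, H = fit_weighted_ridge(X, y, w)
infl = influence_on_validation(X, y, theta, H, X_val, y_val)
dw = compensating_weights(infl, marked)

# Marked points get weight 0; remaining points are nudged away from weight 1.
w_new = np.where(marked, 0.0, 1.0 + dw)
theta_new, _ = fit_weighted_ridge(X, y, w_new)
```

In this sketch the marked points no longer contribute to the objective, and the combined weight change leaves the linearised validation loss unchanged by construction. Whether the true validation loss is preserved depends on how well the first-order approximation holds, and nothing here constrains the new weights to stay non-negative.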