Query-efficient model extraction for text classification model in a hard label setting
| Field | Value |
|---|---|
| Published in | Journal of King Saud University – Computer and Information Sciences, Vol. 35, No. 4, pp. 10–20 |
| Format | Journal Article |
| Language | English |
| Publisher | Elsevier B.V.; Springer |
| Published | 01.04.2023 |
Summary: Designing a query-efficient model extraction strategy to steal models from cloud-based platforms under black-box constraints remains a challenge, especially for language models. In a more realistic setting, the lack of information about the target model's internal parameters, gradients, training data, or even confidence scores prevents attackers from easily copying the target model. Selecting informative and useful examples to train a substitute model is critical to query-efficient model stealing. We propose a novel model extraction framework that fine-tunes a pretrained model based on bidirectional encoder representations from transformers (BERT) while improving query efficiency through an active learning selection strategy. The strategy, which combines semantic-based diversity sampling with class-balanced uncertainty sampling, builds an informative subset of a public unannotated dataset as the input for fine-tuning. We apply our method to extract deep classifiers whose architectures are either identical to or mismatched with the substitute model, under both tight and moderate query budgets. Furthermore, we evaluate the transferability of adversarial examples constructed with the help of the models extracted by our method. The results show that our method achieves higher accuracy with fewer queries than existing baselines, and the resulting models exhibit a high transferability success rate for adversarial examples.
ISSN: 1319-1578, 2213-1248
DOI: 10.1016/j.jksuci.2023.02.019
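The selection strategy described in the summary — semantic-based diversity sampling followed by class-balanced uncertainty sampling — could be sketched roughly as below. This is an illustrative reconstruction, not the authors' code: the greedy farthest-point diversity filter, prediction entropy as the uncertainty measure, and the oversample-then-trim budget split are all assumptions. In the hard-label setting the target model returns only labels, so the probabilities used for uncertainty come from the attacker's own substitute model, and the embeddings would in practice come from a BERT encoder rather than the random vectors used here.

```python
import numpy as np

def farthest_point_candidates(emb, m, seed=0):
    """Semantic-diversity filter: greedy farthest-point sampling over the
    pool's embedding vectors (stand-ins for BERT sentence embeddings)."""
    rng = np.random.default_rng(seed)
    n = len(emb)
    m = min(m, n)
    chosen = [int(rng.integers(n))]
    dist = np.linalg.norm(emb - emb[chosen[0]], axis=1)
    while len(chosen) < m:
        nxt = int(dist.argmax())  # pool point farthest from everything chosen
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(emb - emb[nxt], axis=1))
    return np.array(chosen)

def class_balanced_uncertain(cand, probs, budget):
    """Class-balanced uncertainty sampling: among the diverse candidates,
    keep the highest-entropy examples, spread evenly across the substitute
    model's predicted classes."""
    ent = -(probs * np.log(probs + 1e-12)).sum(1)  # prediction entropy
    pred = probs.argmax(1)
    classes = np.unique(pred[cand])
    quota = int(np.ceil(budget / len(classes)))    # even per-class share
    picked = []
    for c in classes:
        idx = cand[pred[cand] == c]
        picked.extend(idx[np.argsort(-ent[idx])][:quota])
    picked = np.array(picked, dtype=int)
    if len(picked) < budget:  # top up from remaining candidates if a class ran short
        rest = np.setdiff1d(cand, picked)
        picked = np.concatenate(
            [picked, rest[np.argsort(-ent[rest])][: budget - len(picked)]]
        )
    # trim back to the budget, keeping the globally most uncertain picks
    return picked[np.argsort(-ent[picked])][:budget]

# Hypothetical usage with random data standing in for BERT embeddings and
# substitute-model probabilities over a 500-example unannotated pool:
rng = np.random.default_rng(1)
emb = rng.normal(size=(500, 16))
probs = rng.dirichlet(np.ones(4), size=500)
cand = farthest_point_candidates(emb, 3 * 20)  # oversample for diversity
queries = class_balanced_uncertain(cand, probs, budget=20)
# `queries` holds 20 pool indices to send to the target model for labels
```

The two-stage split keeps the expensive uncertainty bookkeeping on a small, already-diverse candidate pool, which is one plausible way to stay within a tight query budget.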