Small-Text: Active Learning for Text Classification in Python
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 21.07.2021 |
Summary: | We introduce small-text, an easy-to-use active learning library that offers pool-based active learning for single- and multi-label text classification in Python. It features numerous pre-implemented state-of-the-art query strategies, including some that leverage the GPU. Standardized interfaces allow a variety of classifiers, query strategies, and stopping criteria to be combined, facilitating quick mix-and-match and enabling rapid and convenient development of both active learning experiments and applications. With the objective of making various classifiers and query strategies accessible for active learning, small-text integrates several well-known machine learning libraries, namely scikit-learn, PyTorch, and Hugging Face transformers. The latter integrations are optionally installable extensions, so GPUs can be used but are not required. Using this new library, we investigate the performance of the recently published SetFit training paradigm, which we compare to vanilla transformer fine-tuning, finding that it matches the latter in classification accuracy while outperforming it in area under the curve. The library is available under the MIT License at https://github.com/webis-de/small-text, in version 1.3.0 at the time of writing. |
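The summary describes pool-based active learning, where a classifier and a query strategy take turns: the model is trained on the labeled pool, a query strategy picks the next instance to label, and the loop repeats. A minimal generic sketch of that loop with least-confidence uncertainty sampling is shown below; it uses plain scikit-learn rather than small-text's own API, and all variable names are illustrative, not taken from the library.

```python
# Generic pool-based active learning loop with least-confidence
# uncertainty sampling (illustrative sketch, not the small-text API).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a vectorized text dataset.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

rng = np.random.default_rng(0)
labeled = list(rng.choice(len(X), size=10, replace=False))  # seed set
unlabeled = [i for i in range(len(X)) if i not in labeled]  # the pool

clf = LogisticRegression(max_iter=1000)
for _ in range(5):  # five query rounds
    clf.fit(X[labeled], y[labeled])
    # Least confidence: query the pool instance whose top-class
    # probability is lowest.
    proba = clf.predict_proba(X[unlabeled])
    query = unlabeled[int(np.argmin(proba.max(axis=1)))]
    labeled.append(query)   # in practice, an oracle supplies y[query]
    unlabeled.remove(query)

print(len(labeled))  # 15 labeled instances after five rounds
```

Swapping in a different query strategy or classifier only changes how `query` is chosen or how `clf` is trained, which is the mix-and-match design the summary attributes to small-text's standardized interfaces.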
DOI: | 10.48550/arxiv.2107.10314 |