ConfigILM: A general purpose configurable library for combining image and language models for visual question answering


Bibliographic Details
Published in SoftwareX Vol. 26; p. 101731
Main Authors Hackel, Leonard; Clasen, Kai Norman; Demir, Begüm
Format Journal Article
Language English
Published Elsevier B.V. 01.05.2024
Summary: ConfigILM is an open-source Python library for rapid, iterative development of image-language models for visual question answering in PyTorch. It provides a convenient way to combine image and language models from two popular PyTorch libraries, timm and huggingface. These libraries allow a wide variety of model configurations without additional implementation effort. The monolithic interface provided by ConfigILM simplifies exchanging the components of a given model and makes it possible to develop new image-language models by recombining the selected encoders. Additionally, the library provides pre-built, throughput-optimized PyTorch dataloaders. The authors also provide a guideline document that contains installation instructions, tutorial examples, and a complete discussion of the library's monolithic interface. ConfigILM is released under the MIT License, encouraging its use in academic and commercial environments. The source code and documentation of ConfigILM are available at https://github.com/lhackel-tub/ConfigILM.
ISSN: 2352-7110
DOI: 10.1016/j.softx.2024.101731
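
Illustrative sketch: The summary describes combining an interchangeable image encoder and language encoder into one visual question answering model. The plain PyTorch sketch below illustrates that general idea only. It is not ConfigILM's API: every class and parameter name here (TinyImageEncoder, TinyTextEncoder, SimpleVQAModel, fusion_dim, and so on) is hypothetical, and the two small encoders are stand-ins for backbones that would in practice come from timm and huggingface. The element-wise fusion is likewise an assumption chosen for brevity, not a method taken from the article.

```python
import torch
from torch import nn


class TinyImageEncoder(nn.Module):
    """Hypothetical stand-in for an image backbone (e.g. one taken from timm)."""

    def __init__(self, out_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global average pool to (B, 8, 1, 1)
            nn.Flatten(),             # (B, 8)
            nn.Linear(8, out_dim),    # (B, out_dim)
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        return self.net(images)


class TinyTextEncoder(nn.Module):
    """Hypothetical stand-in for a language model (e.g. one from huggingface)."""

    def __init__(self, vocab_size: int = 100, out_dim: int = 24):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, out_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # Mean-pool token embeddings into one vector per question.
        return self.embed(token_ids).mean(dim=1)


class SimpleVQAModel(nn.Module):
    """Projects both modalities to a shared size, fuses them, and classifies.

    The encoders are constructor arguments, so either one can be swapped
    without touching the fusion or classification code.
    """

    def __init__(self, image_encoder, text_encoder, image_dim, text_dim,
                 fusion_dim, num_answers):
        super().__init__()
        self.image_encoder = image_encoder
        self.text_encoder = text_encoder
        self.image_proj = nn.Linear(image_dim, fusion_dim)
        self.text_proj = nn.Linear(text_dim, fusion_dim)
        self.classifier = nn.Sequential(
            nn.ReLU(),
            nn.Linear(fusion_dim, num_answers),
        )

    def forward(self, images, token_ids):
        img = self.image_proj(self.image_encoder(images))
        txt = self.text_proj(self.text_encoder(token_ids))
        fused = img * txt  # element-wise fusion (an assumed, simple choice)
        return self.classifier(fused)


model = SimpleVQAModel(
    image_encoder=TinyImageEncoder(out_dim=32),
    text_encoder=TinyTextEncoder(vocab_size=100, out_dim=24),
    image_dim=32, text_dim=24, fusion_dim=64, num_answers=10,
)
images = torch.randn(2, 3, 16, 16)          # batch of 2 RGB images
questions = torch.randint(0, 100, (2, 5))   # batch of 2 questions, 5 tokens each
logits = model(images, questions)           # one score per candidate answer
```

Because the encoders are passed in rather than hard-coded, replacing one backbone with another only requires adjusting the matching input dimension of its projection layer. The library's actual configuration interface is documented at the repository linked in the summary.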