Towards Memory-Efficient Training for Extremely Large Output Spaces – Learning with 670k Labels on a Single Commodity GPU

| Published in | Machine Learning and Knowledge Discovery in Databases: Research Track, pp. 689–704 |
| --- | --- |
| Main Authors | |
| Format | Book Chapter |
| Language | English |
| Published | Cham: Springer Nature Switzerland, 2023 |
| Series | Lecture Notes in Computer Science |
| Online Access | Get full text |
| Summary | In classification problems with large output spaces (up to millions of labels), the last layer can require an enormous amount of memory. Using sparse connectivity would drastically reduce the memory requirements, but as we show below, applied naïvely it can result in much diminished predictive performance. Fortunately, we found that this can be mitigated by introducing an intermediate layer of intermediate size. We further demonstrate that one can constrain the connectivity of the sparse layer to be of constant fan-in, in the sense that each output neuron will have the exact same number of incoming connections, which allows for more efficient implementations, especially on GPU hardware. The CUDA implementation of our approach is provided at https://github.com/xmc-aalto/ecml23-sparse. |
| Bibliography | The original version of this chapter was previously published without open access. A correction to this chapter is available at https://doi.org/10.1007/978-3-031-43418-1_42 |
| ISBN | 9783031434174, 303143417X |
| ISSN | 0302-9743, 1611-3349 |
| DOI | 10.1007/978-3-031-43418-1_41 |
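
To illustrate the idea described in the summary, below is a minimal PyTorch sketch of a constant fan-in sparse output layer: every output neuron (label) is connected to exactly `fan_in` units of the preceding intermediate layer, so both the connection indices and the weights can be stored as dense (num_labels × fan_in) tensors. This is only a rough sketch based on the summary; the class name `ConstantFanInSparseLinear`, the random choice of connections, and all layer sizes are assumptions made here for illustration, not the authors' CUDA implementation (which is available at the repository linked above).

```python
import torch
import torch.nn as nn


class ConstantFanInSparseLinear(nn.Module):
    """Sketch of a sparse output layer with constant fan-in:
    each output neuron has exactly `fan_in` incoming connections."""

    def __init__(self, in_features: int, out_features: int, fan_in: int):
        super().__init__()
        # Which inputs each output connects to; chosen randomly here for illustration.
        self.register_buffer(
            "indices", torch.randint(in_features, (out_features, fan_in))
        )
        # One weight per connection, plus a bias per output.
        self.weight = nn.Parameter(torch.randn(out_features, fan_in) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features)
        # Gather only the connected inputs -> (batch, out_features, fan_in),
        # then reduce over the fan-in dimension to obtain the logits.
        gathered = x[:, self.indices]
        return (gathered * self.weight).sum(dim=-1) + self.bias


# Toy usage with an intermediate layer, as suggested by the summary
# (all dimensions are made up; the 670k labels match the title).
model = nn.Sequential(
    nn.Linear(768, 4096),                                # intermediate layer
    nn.ReLU(),
    ConstantFanInSparseLinear(4096, 670_000, fan_in=32),
)
logits = model(torch.randn(2, 768))                      # shape: (2, 670000)
```

For a rough sense of the memory savings under these assumed sizes: a dense 4096 × 670,000 output layer would hold about 2.7 billion weights (roughly 11 GB in fp32), whereas the constant fan-in layer stores only 670,000 × 32 weights plus the same number of indices (on the order of 170 MB), and the fixed fan-in keeps the gather and reduction regular, which is what makes an efficient GPU kernel practical.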