TRAINING LARGE-SCALE VISION TRANSFORMER NEURAL NETWORKS WITH VARIABLE PATCH SIZES

Bibliographic Details
Main Authors	BEYER, Lucas Klaus; ALABDULMOHSIN, Ibrahim; IZMAILOV, Pavel; KOLESNIKOV, Alexander; KORNBLITH, Simon; MINDERER, Matthias Johannes Lorenz; CARON, Mathilde; TSCHANNEN, Michael Tobias; ZHAI, Xiaohua; PAVETIC, Filip
Format	Patent
Language	English, French, German
Published	29.05.2024
Subjects	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; PHYSICS
Summary:	This specification relates to training neural networks. This specification describes a system implemented as computer programs on one or more computers in one or more locations that trains a Vision Transformer neural network (ViT). A ViT is a neural network that processes an input that includes an image, i.e., that processes the intensity values of the pixels of the image, to generate an output for the image, e.g., a classification or a regression output, and that includes one or more self-attention layers and one or more output layers.
Bibliography:	Application Number: EP20230211676
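
The summary describes a ViT as a network that reads pixel intensity values and contains self-attention layers, and the title refers to variable patch sizes. The following is a minimal illustrative sketch of those two ideas only, not the method claimed in the patent, which this record does not describe. The function names `patchify` and `self_attention` are hypothetical, the attention uses identity projections instead of learned weights, and a real model would first project each patch to a fixed embedding width.

```python
import math
import random


def patchify(image, patch_size):
    """Split an H x W image (a list of rows of pixel intensities) into
    flattened, non-overlapping square patches. Hypothetical helper."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[top + dy][left + dx]
                for dy in range(patch_size)
                for dx in range(patch_size)
            ])
    return patches


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections), purely to keep the sketch short.
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * value[i] for w, value in zip(weights, tokens)) / total
            for i in range(dim)
        ])
    return outputs


if __name__ == "__main__":
    random.seed(0)
    image = [[random.random() for _ in range(8)] for _ in range(8)]
    # The same image yields a different sequence length per patch size.
    for patch_size in (2, 4, 8):
        tokens = patchify(image, patch_size)
        mixed = self_attention(tokens)
        print(patch_size, len(tokens), len(mixed[0]))
```

For an 8 x 8 image, patch sizes 2, 4, and 8 give 16, 4, and 1 tokens respectively, so a smaller patch size means a longer sequence for the self-attention layers to process.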