TRAINING LARGE-SCALE VISION TRANSFORMER NEURAL NETWORKS WITH VARIABLE PATCH SIZES

Bibliographic Details
Main Authors: BEYER, Lucas Klaus; ALABDULMOHSIN, Ibrahim; IZMAILOV, Pavel; KOLESNIKOV, Alexander; KORNBLITH, Simon; MINDERER, Matthias Johannes Lorenz; CARON, Mathilde; TSCHANNEN, Michael Tobias; ZHAI, Xiaohua; PAVETIC, Filip
Format: Patent
Language: English, French, German
Published: 29.05.2024
Summary: This specification relates to training neural networks. This specification describes a system implemented as computer programs on one or more computers in one or more locations that trains a Vision Transformer neural network (ViT). A ViT is a neural network that processes an input that includes an image, i.e., that processes the intensity values of the pixels of the image, to generate an output for the image, e.g., a classification or a regression output, and that includes one or more self-attention layers and one or more output layers.
Bibliography: Application Number: EP20230211676
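
The summary's definition of a ViT (pixel intensities in, one or more self-attention layers, one or more output layers, a classification or regression output) can be illustrated with a rough sketch. This is not the method claimed in the patent, which the record does not describe. The function names, the identity query/key/value projections, the mean pooling, and the single attention layer are all assumptions chosen to keep the example short. The `patch` argument only shows how the number of tokens depends on patch size.

```python
import math

def patchify(image, patch):
    """Split an H x W grid of pixel intensities into flattened patch x patch tokens."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    return tokens

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention. Assumption: identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product score of this query against every token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Weighted average of the value vectors (here, the tokens themselves).
        out.append([sum(wt * v[i] for wt, v in zip(weights, tokens)) for i in range(d)])
    return out

def output_layer(tokens, weights):
    """Assumption: mean-pool the tokens, then apply a linear map to get logits."""
    d = len(tokens[0])
    pooled = [sum(t[i] for t in tokens) / len(tokens) for i in range(d)]
    return [sum(w_i * x for w_i, x in zip(row, pooled)) for row in weights]

def tiny_vit(image, patch, weights):
    """Image -> patch tokens -> self-attention -> output layer."""
    return output_layer(self_attention(patchify(image, patch)), weights)

# Hypothetical 4 x 4 grayscale image with intensities in [0, 1].
image = [[(r * 4 + c) / 15 for c in range(4)] for r in range(4)]
# Hypothetical 2-class output weights; columns must match patch * patch.
class_weights = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
logits = tiny_vit(image, 2, class_weights)
```

In this sketch a smaller patch size yields more, shorter tokens for the same image, so the output weights must be sized to match `patch * patch`. A real model would typically add learned embeddings that map patches to a fixed width.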