Visual self-attention accelerator optimization method based on FPGA
Main Authors | , , |
---|---|
Format | Patent |
Language | Chinese; English |
Published | 27.02.2024 |
Summary: | The invention discloses a visual self-attention accelerator optimization method based on an FPGA. The method comprises the following steps: performing dynamic token pruning on the visual self-attention model through a dynamic token pruning scheme, which removes redundant information and reduces the model's computational load; and designing a single visual self-attention computation layer on the FPGA, in which the computation is partitioned by matrix tiling and an optimal compute-resource allocation strategy is solved with a genetic algorithm to maximize load balancing. The invention reduces the amount of computation, shortens the model's run time, and improves the operating efficiency of the accelerator. |
---|---|
Bibliography: | Application Number: CN202311355863 |
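The abstract names dynamic token pruning as the first step but does not publish the scoring rule it uses. Below is a minimal Python sketch of one common realization, assuming patch tokens are ranked by the attention they receive from the class token and a fixed keep ratio; the function name `prune_tokens` and the 70 % keep ratio are illustrative choices, not taken from the patent.

```python
# Hedged sketch of dynamic token pruning for a vision transformer layer.
# The ranking criterion (attention received from the class token) and the
# keep ratio are illustrative assumptions, not the patent's own scheme.
import numpy as np

def prune_tokens(tokens: np.ndarray, cls_attention: np.ndarray, keep_ratio: float = 0.7):
    """Keep the class token plus the patch tokens that receive the most
    attention from the class token; drop the rest as redundant.

    tokens:        (N, D) token sequence, tokens[0] is the class token
    cls_attention: (N,) attention weights from the class token to every token
    """
    n_patches = tokens.shape[0] - 1
    n_keep = max(1, int(n_patches * keep_ratio))
    # Rank patch tokens (token indices 1..N-1) by the attention they receive.
    patch_scores = cls_attention[1:]
    keep_idx = np.argsort(patch_scores)[::-1][:n_keep] + 1
    keep_idx = np.concatenate(([0], np.sort(keep_idx)))  # always keep the class token
    return tokens[keep_idx], keep_idx

# Example: 197 tokens (1 class + 196 patches) of width 384, keep 70 % of the patches.
rng = np.random.default_rng(0)
x = rng.standard_normal((197, 384))
attn = rng.random(197)
pruned, kept = prune_tokens(x, attn, keep_ratio=0.7)
print(pruned.shape)  # (138, 384): 1 class token + 137 retained patch tokens
```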
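The second step partitions a single self-attention computation layer by matrix slicing (tiling). The sketch below shows block-wise tiling of one projection matmul in NumPy; the 32-wide tiles and the `tiled_matmul` helper are assumptions for illustration, since the record does not disclose the patent's tile dimensions or on-chip dataflow.

```python
# Hedged sketch of matrix tiling (slicing) applied to one attention projection,
# mimicking how a single computation layer could be split into blocks that map
# onto parallel FPGA compute units. Tile sizes are illustrative assumptions.
import numpy as np

def tiled_matmul(a: np.ndarray, b: np.ndarray, tile_m=32, tile_n=32, tile_k=32):
    """Block-wise A @ B; each (i, j, k) block is an independent work item that
    a scheduler could assign to a compute unit."""
    M, K = a.shape
    K2, N = b.shape
    assert K == K2
    out = np.zeros((M, N), dtype=a.dtype)
    for i0 in range(0, M, tile_m):
        for j0 in range(0, N, tile_n):
            for k0 in range(0, K, tile_k):
                out[i0:i0 + tile_m, j0:j0 + tile_n] += (
                    a[i0:i0 + tile_m, k0:k0 + tile_k] @ b[k0:k0 + tile_k, j0:j0 + tile_n]
                )
    return out

# Example: Q = X @ W_q for 138 retained tokens of width 384.
x = np.random.default_rng(0).standard_normal((138, 384))
w_q = np.random.default_rng(1).standard_normal((384, 384))
q = tiled_matmul(x, w_q)
print(np.allclose(q, x @ w_q))  # True: tiling changes the schedule, not the result
```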
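Finally, the abstract states that the compute-resource allocation is solved with a genetic algorithm to maximize load balancing. The sketch below evolves a tile-to-compute-unit assignment that minimizes the load of the most heavily loaded unit; the chromosome encoding, tournament selection, one-point crossover, and all hyper-parameters are assumptions, as the record does not specify the patent's GA design.

```python
# Hedged sketch of the genetic-algorithm step: search for a tile-to-compute-unit
# assignment whose worst-case unit load (makespan) is as small as possible,
# i.e. the load is as balanced as possible. All parameters are illustrative.
import random

def evolve_allocation(tile_costs, n_units, pop_size=60, generations=200,
                      crossover_p=0.8, mutation_p=0.05, seed=0):
    rng = random.Random(seed)
    n_tiles = len(tile_costs)

    def makespan(assign):
        # Sum the cost of the tiles mapped to each unit; return the heaviest load.
        loads = [0.0] * n_units
        for tile, unit in enumerate(assign):
            loads[unit] += tile_costs[tile]
        return max(loads)

    # Chromosome: for each tile, the index of the compute unit it runs on.
    pop = [[rng.randrange(n_units) for _ in range(n_tiles)] for _ in range(pop_size)]
    best = min(pop, key=makespan)

    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            # Tournament selection of two parents.
            p1 = min(rng.sample(pop, 3), key=makespan)
            p2 = min(rng.sample(pop, 3), key=makespan)
            child = list(p1)
            if rng.random() < crossover_p:             # one-point crossover
                cut = rng.randrange(1, n_tiles)
                child = p1[:cut] + p2[cut:]
            for i in range(n_tiles):                   # per-gene mutation
                if rng.random() < mutation_p:
                    child[i] = rng.randrange(n_units)
            nxt.append(child)
        pop = nxt
        cand = min(pop, key=makespan)
        if makespan(cand) < makespan(best):
            best = cand
    return best, makespan(best)

# Example: 32 matrix tiles with uneven costs spread over 4 compute units.
cost_rng = random.Random(1)
costs = [cost_rng.uniform(1, 10) for _ in range(32)]
assignment, worst_load = evolve_allocation(costs, n_units=4)
print(worst_load, sum(costs) / 4)  # achieved makespan vs. the ideal balanced load
```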