Unfolding Once is Enough: A Deployment-Friendly Transformer Unit for Super-Resolution
Main Authors |
---|---
Format | Journal Article
Language | English
Published | 05.08.2023
Subjects |
Online Access | Get full text
Summary: Recent years have witnessed a few attempts to apply vision transformers to single image super-resolution (SISR). Since the high resolution of intermediate features in SISR models increases memory and computational requirements, efficient SISR transformers are particularly favored. Building on popular transformer backbones, many methods have explored reasonable schemes to reduce the computational complexity of the self-attention module while achieving impressive performance. However, these methods focus only on performance on the training platform (e.g., PyTorch/TensorFlow) without further optimization for the deployment platform (e.g., TensorRT). Therefore, they inevitably contain redundant operators, posing challenges for subsequent deployment in real-world applications. In this paper, we propose a deployment-friendly transformer unit, namely UFONE (i.e., UnFolding ONce is Enough), to alleviate these problems. In each UFONE, we introduce an Inner-patch Transformer Layer (ITL) to efficiently reconstruct local structural information from patches and a Spatial-Aware Layer (SAL) to exploit long-range dependencies between patches. Based on UFONE, we propose a Deployment-friendly Inner-patch Transformer Network (DITN) for the SISR task, which achieves favorable performance with low latency and memory usage on both training and deployment platforms. Furthermore, to boost the deployment efficiency of the proposed DITN on TensorRT, we provide an efficient substitution for layer normalization and propose a fusion optimization strategy for specific operators. Extensive experiments show that our models achieve competitive qualitative and quantitative results with high deployment efficiency. Code is available at \url{https://github.com/yongliuy/DITN}.
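The abstract describes UFONE as unfolding the feature map into non-overlapping patches exactly once, attending within each patch (ITL), and then mixing information across patches (SAL). Below is a minimal PyTorch sketch of that structure for illustration only; the patch size, head count, and in particular the depthwise dilated convolution standing in for the SAL are assumptions, not details taken from the paper.

```python
# Illustrative sketch of a UFONE-style unit: unfold once, attend inside each
# patch (ITL), fold back, then mix across patches (SAL). The SAL here is a
# depthwise dilated convolution chosen purely for illustration.
import torch
import torch.nn as nn


class ITL(nn.Module):
    """Self-attention restricted to each non-overlapping patch."""
    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, patches):            # patches: (B*num_patches, P*P, C)
        x = self.norm(patches)
        out, _ = self.attn(x, x, x)
        return patches + out               # residual connection


class SAL(nn.Module):
    """Cross-patch mixing via a depthwise dilated convolution (assumption)."""
    def __init__(self, dim, dilation=2):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, kernel_size=3, padding=dilation,
                              dilation=dilation, groups=dim)

    def forward(self, x):                  # x: (B, C, H, W)
        return x + self.conv(x)


class UFONE(nn.Module):
    """Unfold once, run ITL on patch tokens, fold back, then run SAL."""
    def __init__(self, dim, patch_size=8, num_heads=4):
        super().__init__()
        self.patch_size = patch_size
        self.itl = ITL(dim, num_heads)
        self.sal = SAL(dim)

    def forward(self, x):                  # x: (B, C, H, W), H and W divisible by patch_size
        b, c, h, w = x.shape
        p = self.patch_size
        # Single unfold into (B * nH * nW, P*P, C) token groups.
        tokens = (x.reshape(b, c, h // p, p, w // p, p)
                   .permute(0, 2, 4, 3, 5, 1)
                   .reshape(-1, p * p, c))
        tokens = self.itl(tokens)
        # Fold back to the spatial layout before cross-patch mixing.
        x = (tokens.reshape(b, h // p, w // p, p, p, c)
                   .permute(0, 5, 1, 3, 2, 4)
                   .reshape(b, c, h, w))
        return self.sal(x)


if __name__ == "__main__":
    feat = torch.randn(1, 64, 64, 64)      # e.g. features of a 64x64 low-resolution input
    print(UFONE(dim=64)(feat).shape)       # torch.Size([1, 64, 64, 64])
```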
DOI: 10.48550/arxiv.2308.02794
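The abstract also mentions substituting layer normalization with a more deployment-friendly operator before exporting to TensorRT, since per-token mean/variance reductions tend to hinder operator fusion. The sketch below shows one generic way such a substitution could look, folding a trained `nn.LayerNorm` into a single per-channel affine using calibration statistics; the function name `freeze_layernorm` and the calibration procedure are hypothetical, and this is not claimed to be the substitution actually used in DITN.

```python
# Generic sketch: replace LayerNorm with a frozen per-channel affine so the
# exported graph contains only an elementwise multiply-add instead of runtime
# mean/variance reductions. Illustrative assumption, not the DITN method.
import torch
import torch.nn as nn


@torch.no_grad()
def freeze_layernorm(ln: nn.LayerNorm, calib_tokens: torch.Tensor) -> nn.Module:
    """Approximate `ln` with an elementwise affine fitted on calibration data.

    calib_tokens: (N, C) activations drawn from the layer's real input distribution.
    """
    mean = calib_tokens.mean(dim=0)                   # per-channel mean
    var = calib_tokens.var(dim=0, unbiased=False)     # per-channel variance
    scale = ln.weight / torch.sqrt(var + ln.eps)      # folded multiplicative term
    shift = ln.bias - mean * scale                    # folded additive term

    class FrozenAffine(nn.Module):
        def __init__(self):
            super().__init__()
            self.register_buffer("scale", scale)
            self.register_buffer("shift", shift)

        def forward(self, x):                         # x: (..., C)
            return x * self.scale + self.shift

    return FrozenAffine()


if __name__ == "__main__":
    ln = nn.LayerNorm(64)
    calib = torch.randn(1024, 64)
    frozen = freeze_layernorm(ln, calib)
    x = torch.randn(4, 16, 64)
    # Deviation reflects how far each token's own statistics drift from the
    # frozen calibration statistics.
    print((ln(x) - frozen(x)).abs().max())
```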