HRVMamba: High-Resolution Visual State Space Model for Dense Prediction
Format | Journal Article |
---|---|
Language | English |
Published | 04.10.2024 |
Summary: | Recently, State Space Models (SSMs) with efficient hardware-aware designs,
notably Mamba, have demonstrated significant potential in computer vision tasks
due to their linear computational complexity with respect to token length and
their global receptive field. However, Mamba's performance on dense prediction
tasks, including human pose estimation and semantic segmentation, has been
constrained by three key challenges: insufficient inductive bias, long-range
forgetting, and low-resolution output representations. To address these
challenges, we introduce the Dynamic Visual State Space (DVSS) block, which
uses multi-scale convolutional kernels to extract local features at different
scales and strengthen inductive bias, and employs deformable convolution to
mitigate long-range forgetting while enabling adaptive spatial aggregation
conditioned on the input and the task. Building on the DVSS block and the
multi-resolution parallel design proposed in HRNet, we introduce the
High-Resolution Visual State Space Model (HRVMamba), which preserves
high-resolution representations throughout the network while promoting
effective multi-scale feature learning. Extensive experiments on dense
prediction tasks show that HRVMamba achieves results competitive with existing
benchmark models without additional tricks. Code is available at
https://github.com/zhanghao5201/HRVMamba. |
---|---|
DOI: | 10.48550/arxiv.2410.03174 |
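The linear-complexity claim in the summary can be illustrated with the discrete recurrence that SSM layers evaluate as a scan. The sketch below is a scalar, time-invariant simplification, assuming fixed coefficients `a`, `b`, `c`; Mamba itself makes its parameters input-dependent and uses a hardware-aware parallel scan, and the function name `ssm_scan` is hypothetical.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Scalar, time-invariant discrete SSM evaluated as a sequential scan:
        h_t = a * h_{t-1} + b * x_t,   y_t = c * h_t,   with h_{-1} = 0.
    Each token is visited once, so the cost is linear in the sequence length.
    (Simplified illustration; not Mamba's selective, input-dependent scan.)"""
    h = 0.0
    ys: List[float] = []
    for xt in x:
        h = a * h + b * xt  # fold the new token into the running state
        ys.append(c * h)    # read the output from the current state
    return ys


# An impulse at t = 0 decays geometrically when |a| < 1.
print(ssm_scan([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0))  # [1.0, 0.5, 0.25, 0.125]
```

With |a| < 1, an early token's contribution to later outputs shrinks geometrically, which offers one intuition (in this simplified model only) for the long-range forgetting the summary mentions.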
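The multi-scale local branch described for the DVSS block can be sketched as parallel same-padding convolutions with different kernel sizes whose outputs are combined. This is a hypothetical single-channel sketch: the kernel sizes (3, 5, 7), the fixed averaging kernels standing in for learned weights, and element-wise summation as the combination rule are assumptions, not details taken from the paper.

```python
from typing import List, Sequence

Grid = List[List[float]]


def conv2d_same(img: Grid, kernel: Grid) -> Grid:
    """2D cross-correlation with zero padding, so the output keeps the input size.
    Assumes a square kernel with odd side length."""
    h, w = len(img), len(img[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - r, j + dj - r
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside the grid
                        acc += kernel[di][dj] * img[y][x]
            out[i][j] = acc
    return out


def box_kernel(k: int) -> Grid:
    # Hypothetical stand-in for learned weights: a k x k averaging kernel.
    return [[1.0 / (k * k)] * k for _ in range(k)]


def multi_scale_local(img: Grid, sizes: Sequence[int] = (3, 5, 7)) -> Grid:
    """Run one branch per kernel size and sum the branch outputs element-wise."""
    branches = [conv2d_same(img, box_kernel(k)) for k in sizes]
    h, w = len(img), len(img[0])
    return [[sum(b[i][j] for b in branches) for j in range(w)] for i in range(h)]
```

Feeding in a single impulse shows why mixing scales helps: a pixel three steps from the impulse only sees it through the 7×7 branch, while the impulse location combines all three branches.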
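Deformable convolution, which the summary credits with mitigating long-range forgetting, reads each kernel tap at a fractional offset from its regular grid position using bilinear interpolation. The single-channel 3×3 sketch below takes the offsets as an explicit argument; in a real network they are predicted from the input features by another layer, and variants such as DCNv2 also add per-tap modulation weights. The function names and the offset layout `offsets[i][j][t] = (dy, dx)` are hypothetical, and this is not necessarily the exact deformable operator HRVMamba uses.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]
Offsets = List[List[List[Tuple[float, float]]]]  # offsets[i][j][t] = (dy, dx)


def bilinear(img: Grid, y: float, x: float) -> float:
    """Sample img at a fractional position; locations outside the grid read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Weight of each neighbour shrinks linearly with its distance.
                val += (1.0 - abs(y - yy)) * (1.0 - abs(x - xx)) * img[yy][xx]
    return val


def deform_conv3x3(img: Grid, weight: Grid, offsets: Offsets) -> Grid:
    """3x3 deformable convolution: tap t at output (i, j) is read from its regular
    grid position shifted by offsets[i][j][t] instead of the fixed position."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for t in range(9):
                ky, kx = divmod(t, 3)  # row and column of the tap in the 3x3 kernel
                dy, dx = offsets[i][j][t]
                acc += weight[ky][kx] * bilinear(img, i + ky - 1 + dy, j + kx - 1 + dx)
            out[i][j] = acc
    return out


def constant_offsets(h: int, w: int, dy: float, dx: float) -> Offsets:
    # Illustration only: the same shift for every tap at every output location.
    return [[[(dy, dx)] * 9 for _ in range(w)] for _ in range(h)]
```

With all offsets set to zero the operator reduces to an ordinary zero-padded 3×3 convolution; a shift of (0, 2) lets the center tap at column 2 read column 4, and a fractional shift such as (0, 1.5) blends two neighbouring pixels.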
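The HRNet-style multi-resolution parallel design keeps a full-resolution branch throughout the network and repeatedly exchanges information with lower-resolution branches. The two-branch sketch below shows one such exchange step. The parameter-free 2×2 average pooling and nearest-neighbour upsampling are simplifying assumptions (HRNet uses strided 3×3 convolutions for downsampling and a 1×1 convolution followed by nearest-neighbour upsampling), and the function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def avg_pool2(x: Grid) -> Grid:
    # 2x2 average pooling with stride 2 (assumes even height and width).
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]


def upsample2(x: Grid) -> Grid:
    # Nearest-neighbour upsampling by a factor of 2 in each direction.
    return [[x[i // 2][j // 2] for j in range(2 * len(x[0]))] for i in range(2 * len(x))]


def add(a: Grid, b: Grid) -> Grid:
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def exchange(high: Grid, low: Grid) -> Tuple[Grid, Grid]:
    """One fusion step between a full-resolution and a half-resolution branch.
    Each branch keeps its own size and receives the other branch resampled to it."""
    return add(high, upsample2(low)), add(low, avg_pool2(high))
```

The full-resolution output has the same shape as its input, which is the property the summary describes as preserving high-resolution representations throughout the network.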