ATDMNet: Multi-Head Agent Attention and Top-k Dynamic Mask for Camouflaged Object Detection


Bibliographic Details
Published in: Sensors (Basel, Switzerland), Vol. 25, No. 10, p. 3001
Main Authors: Fu, Rui; Li, Yuehui; Chen, Chih-Cheng; Duan, Yile; Yao, Pengjian; Zhou, Kaixin
Format: Journal Article
Language: English
Published: Switzerland: MDPI AG, 09.05.2025

Summary: Camouflaged object detection (COD) is difficult because targets visually resemble their surroundings and because features must be represented consistently across scales. Current methods struggle with feature distraction, long-range dependency modeling, multiscale feature fusion, and boundary detail extraction. We therefore propose ATDMNet, a hybrid CNN-transformer architecture built on a multi-stage feature extraction framework. ATDMNet employs Res2Net as the backbone encoder and incorporates two key components: multi-head agent attention (MHA) and a top-k dynamic mask (TDM). MHA improves local feature sensitivity and long-range dependency modeling by introducing agent nodes and positional biases, while TDM sharpens attention using top-k operations and multiscale dynamic masking. The decoder applies bilinear upsampling and high-level semantic guidance to refine low-level features, yielding precise segmentation. Deep supervision and a hybrid loss function further boost performance. Experiments on the COD datasets NC4K, COD10K, and CAMO show that ATDMNet sets a new benchmark in both accuracy and efficiency.
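The two mechanisms named in the abstract lend themselves to a brief illustration. Below is a minimal, hypothetical PyTorch sketch of multi-head agent attention combined with a top-k attention mask: a small set of learned agent tokens mediates between queries and keys (reducing the quadratic attention cost), and a top-k operation keeps only the strongest scores per query row before the softmax. This is not the authors' implementation; all names (AgentAttention, num_agents, top_k) are illustrative assumptions, and the paper's positional biases and multiscale dynamic masking are omitted for brevity.

```python
import torch
import torch.nn as nn


class AgentAttention(nn.Module):
    """Illustrative multi-head agent attention with a top-k score mask.

    A small set of m learned "agent" tokens stands between queries and
    keys, replacing the n x n attention map with two n x m maps, and a
    top-k mask zeroes all but the k strongest scores per row before the
    softmax. Shapes and names are assumptions, not ATDMNet's code.
    """

    def __init__(self, dim: int, num_heads: int = 8,
                 num_agents: int = 49, top_k: int = 16):
        super().__init__()
        assert dim % num_heads == 0
        self.h, self.d = num_heads, dim // num_heads
        self.scale = self.d ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # m agent tokens shared across the batch, learned end to end
        self.agents = nn.Parameter(torch.randn(1, num_agents, dim) * 0.02)
        self.top_k = top_k

    @staticmethod
    def _topk_softmax(scores: torch.Tensor, k: int) -> torch.Tensor:
        # Keep the k largest logits per row; push the rest to -inf so
        # they receive zero weight after the softmax.
        k = min(k, scores.size(-1))
        kth = scores.topk(k, dim=-1).values[..., -1:]
        return scores.masked_fill(scores < kth, float("-inf")).softmax(-1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, n, c = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        a = self.agents.expand(b, -1, -1)

        def heads(t):  # (b, tokens, c) -> (b, h, tokens, d)
            return t.reshape(b, -1, self.h, self.d).transpose(1, 2)

        q, k, v, a = map(heads, (q, k, v, a))

        # Step 1: agents aggregate the values, cost O(m * n) per head.
        agent_attn = self._topk_softmax((a @ k.transpose(-2, -1)) * self.scale,
                                        self.top_k)
        agent_v = agent_attn @ v                                # (b, h, m, d)
        # Step 2: queries read from the agents, cost O(n * m) per head.
        q_attn = self._topk_softmax((q @ a.transpose(-2, -1)) * self.scale,
                                    self.top_k)
        out = (q_attn @ agent_v).transpose(1, 2).reshape(b, n, c)
        return self.proj(out)


if __name__ == "__main__":
    x = torch.randn(2, 196, 256)             # e.g. a flattened 14x14 feature map
    print(AgentAttention(dim=256)(x).shape)  # torch.Size([2, 196, 256])
```

In the paper, TDM reportedly varies the mask dynamically and across scales; the fixed per-layer k used here merely keeps the sketch short.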
ISSN: 1424-8220
DOI: 10.3390/s25103001