CTNet: A Convolutional Transformer Network for Color Image Steganalysis


Bibliographic Details
Published in: Journal of Computer Science and Technology, Vol. 40, No. 2, pp. 413-427
Main Authors: Wei, Kang-Kang; Luo, Wei-Qi; Tan, Shun-Quan; Huang, Ji-Wu
Format: Journal Article
Language: English
Published: Singapore: Springer Nature Singapore (Springer Nature B.V.), 01.03.2025

Summary: Compared with the convolutional neural network (CNN), the Transformer can obtain global receptive field features more effectively and has recently achieved great success in natural language processing and computer vision. Due to the particularity of steganography, however, almost all existing steganalytic networks employ only CNNs with local receptive fields to detect embedding artifacts. In this paper, we propose a novel convolutional Transformer network for color image steganalysis. Specifically, we first obtain various image residuals for each color channel of an input image in the pre-processing module. To capture more comprehensive steganalytic features, the truncated residuals, after channel concatenation, pass through a feature extraction module composed of a CNN group and a Transformer group. The CNN group aims to extract local receptive field features, while the Transformer group, with multi-head self-attention at its core, extracts global steganalytic features. Finally, we employ global covariance pooling (GCP) and two fully-connected (FC) layers with dropout for classification. Extensive comparative experiments demonstrate that the proposed method significantly improves detection performance in color image steganalysis and achieves state-of-the-art results. Although the proposed method is originally designed for color images, it also obtains competitive results for grayscale images compared with the current best detector. In addition, we provide numerous ablation studies to verify the rationality of the proposed network architecture.
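The abstract names two components that may be unfamiliar: multi-head self-attention (the core of the Transformer group) and global covariance pooling. The following is a minimal NumPy sketch of both, written for illustration only: the function names, random stand-in weights, and tensor shapes are assumptions, not the authors' code, and the full CTNet uses learned parameters and (as is common for GCP) a matrix-square-root normalization of the covariance that is omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads, rng):
    """Sketch of multi-head self-attention.
    x: (seq_len, d_model) tokens, e.g. flattened feature-map patches.
    Random projections stand in for learned weights Wq, Wk, Wv, Wo."""
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # split into heads: (num_heads, seq_len, d_head)
    split = lambda t: t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    # scaled dot-product attention per head
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    attn = softmax(scores, axis=-1)
    out = attn @ v                                       # (heads, seq, d_head)
    # merge heads back and apply output projection
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

def global_covariance_pooling(feat):
    """Plain covariance pooling of a (C, H, W) feature map:
    returns the upper triangle of the C x C channel covariance
    as a vector of length C*(C+1)//2."""
    C, H, W = feat.shape
    X = feat.reshape(C, H * W)
    X = X - X.mean(axis=1, keepdims=True)
    cov = X @ X.T / (H * W - 1)
    return cov[np.triu_indices(C)]
```

Attention lets every token weigh every other token, which is why the Transformer group captures global dependencies, while covariance pooling summarizes second-order channel statistics, which are known to be informative for steganalysis.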
ISSN: 1000-9000, 1860-4749
DOI: 10.1007/s11390-023-3006-3