CTNet: A Convolutional Transformer Network for Color Image Steganalysis


Bibliographic Details
Published in: Journal of Computer Science and Technology, Vol. 40, No. 2, pp. 413-427
Main Authors: Wei, Kang-Kang; Luo, Wei-Qi; Tan, Shun-Quan; Huang, Ji-Wu
Format: Journal Article
Language: English
Published: Singapore: Springer Nature Singapore (Springer Nature B.V.), 01.03.2025

Summary: Compared with the convolutional neural network (CNN), the Transformer can obtain global receptive field features more effectively and has recently achieved great success in natural language processing and computer vision. Due to the particularity of steganography, however, almost all existing steganalytic networks employ only CNNs with local receptive fields to detect embedding artifacts. In this paper, we propose a novel convolutional Transformer network for color image steganalysis. Specifically, we first obtain various image residuals for each color channel of an input image in the pre-processing module. To capture more comprehensive steganalytic features, the truncated residuals, after channel concatenation, pass through a feature extraction module composed of a CNN group and a Transformer group. The CNN group aims to extract local receptive field features, while the Transformer group, with multi-head self-attention at its core, extracts global steganalytic features. Finally, we employ global covariance pooling (GCP) and two fully-connected (FC) layers with dropout for classification. Extensive comparative experiments demonstrate that the proposed method significantly improves detection performance in color image steganalysis and achieves state-of-the-art results. Although the proposed method is originally designed for color images, it also obtains competitive results for grayscale images compared with the current best detector. In addition, we provide numerous ablation studies to verify the rationality of the proposed network architecture.
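The abstract names two components that may be unfamiliar: multi-head self-attention (the core of the Transformer group) and global covariance pooling. The following is a minimal NumPy sketch of both, written for illustration only: the function names, random stand-in weights, and tensor shapes are assumptions, not the authors' code, and the full CTNet uses learned parameters and (as is common for GCP) a matrix-square-root normalization of the covariance that is omitted here.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads, rng):
    """Sketch of multi-head self-attention.
    x: (seq_len, d_model) tokens, e.g. flattened feature-map patches.
    Random projections stand in for learned weights Wq, Wk, Wv, Wo."""
    seq_len, d_model = x.shape
    assert d_model % num_heads == 0
    d_head = d_model // num_heads
    Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
                      for _ in range(4))
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # split into heads: (num_heads, seq_len, d_head)
    split = lambda t: t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    # scaled dot-product attention per head
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    attn = softmax(scores, axis=-1)
    out = attn @ v                                       # (heads, seq, d_head)
    # merge heads back and apply output projection
    out = out.transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

def global_covariance_pooling(feat):
    """Plain covariance pooling of a (C, H, W) feature map:
    returns the upper triangle of the C x C channel covariance
    as a vector of length C*(C+1)//2."""
    C, H, W = feat.shape
    X = feat.reshape(C, H * W)
    X = X - X.mean(axis=1, keepdims=True)
    cov = X @ X.T / (H * W - 1)
    return cov[np.triu_indices(C)]
```

Attention lets every token weigh every other token, which is why the Transformer group captures global dependencies, while covariance pooling summarizes second-order channel statistics, which are known to be informative for steganalysis.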
ISSN: 1000-9000, 1860-4749
DOI: 10.1007/s11390-023-3006-3