Application of Tswin-F network based on multi-scale feature fusion in tomato leaf lesion recognition

Bibliographic Details
Published in Pattern Recognition, Vol. 156, p. 110775
Main Authors Ye, Yuanbo; Zhou, Houkui; Yu, Huimin; Hu, Haoji; Zhang, Guangqun; Hu, Junguo; He, Tao
Format Journal Article
Language English
Published Elsevier Ltd 01.12.2024
Summary:
• A new recognition model based on the Transformer architecture is proposed. Self-supervised learning is introduced to strengthen the modeling of long-distance relationships in images.
• Bilateral attention mechanisms are used to further strengthen the links between continuous information in images.
• An FFLCA (Feature Fusion Local Attention) structure is proposed, which fuses the feature map outputs of lower layers into upper layers. This bottom-up fusion enhances the modeling ability of the entire network module and makes effective use of global information.
• A new parameter adjustment strategy is proposed, which combines the flooding operation with dynamic attenuation of weights to improve the generalization ability of the model.

Tomato leaf lesion identification can greatly help the detection and analysis of plant lesions. This study proposes the Tswin-F network, a new Transformer-based network structure, to detect tomato leaf diseases. The Tswin-F network obtains position information from images through a bilateral local attention module and a self-supervised learning module. Specifically, the bilateral local attention mechanism focuses on connections among certain continuous tokens, while the self-supervised learning module attends to connections with random token positions. The information learned by the two modules is then combined to establish the spatial connections between the final tokens. This combination enhances information exchange between the windows of the input images and improves the accuracy of the model. In addition, a Feature Fuse Local Attention (FFLCA) structure is designed to address the problem that attention distances increase with the number of layers in the Transformer network. Furthermore, all the feature information is fused through an adaptive fusion strategy and fed into the classification network as the final global information of the model. Finally, an accuracy of 99.64% is obtained on 10 types of datasets, reaching the state-of-the-art level of CNN-based methods in terms of accuracy. The average accuracy in identifying 13 types of tomato leaf lesions reaches 90.81%. Code is available at: https://github.com/fightpotato.
ISSN:0031-3203
DOI:10.1016/j.patcog.2024.110775
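
Illustrative note: the summary mentions a parameter adjustment strategy that combines a flooding operation with dynamic attenuation of weights, but this record does not give the formulas. The sketch below is only a rough illustration under stated assumptions. `flooded_loss` uses the flooding rule as it is commonly defined in the regularization literature (the loss is reflected around a floor `flood_level`). `decayed_weight_decay` is a hypothetical linear schedule that interprets "dynamic attenuation of weights" as a weight-decay coefficient that shrinks during training. All names, the schedule, and the default values are assumptions, not the authors' implementation.

```python
def flooded_loss(loss: float, flood_level: float) -> float:
    """Flooding as commonly defined: |loss - b| + b.

    Above the flood level the loss is unchanged; below it the value is
    reflected, so a gradient step would push the loss back up toward b.
    """
    return abs(loss - flood_level) + flood_level


def decayed_weight_decay(initial: float, step: int, total_steps: int,
                         final_ratio: float = 0.1) -> float:
    """Hypothetical linear schedule (an assumption, not from the paper).

    Moves the weight-decay coefficient from `initial` at step 0 to
    `initial * final_ratio` at `total_steps`.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)  # clamp to [0, 1]
    return initial * (1.0 - (1.0 - final_ratio) * progress)


if __name__ == "__main__":
    # A loss above the flood level passes through; one below it is reflected.
    print(flooded_loss(0.5, 0.1))
    print(flooded_loss(0.02, 0.1))
    # Weight-decay coefficient at the start, middle, and end of training.
    for step in (0, 50, 100):
        print(step, decayed_weight_decay(0.05, step, 100))
```

In a training loop, the flooded value would replace the raw mini-batch loss before backpropagation, and the scheduled coefficient would be written to the optimizer's weight-decay setting at each step.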