A comprehensive overview of core modules in visual SLAM framework

Bibliographic Details
Published in: Neurocomputing (Amsterdam), Vol. 590, p. 127760
Main Authors: Cai, Dupeng; Li, Ruoqing; Hu, Zhuhua; Lu, Junlin; Li, Shijiang; Zhao, Yaochi
Format: Journal Article
Language: English
Published: Elsevier B.V., 14.07.2024
Summary: Visual Simultaneous Localization and Mapping (VSLAM) has become a key technology in autonomous driving and robot navigation. Relying on camera sensors, VSLAM provides richer and more precise perception of the environment, and its advancement has accelerated in recent years. However, current review studies are often limited to in-depth analysis of a single module and lack a comprehensive review of the entire VSLAM framework. A VSLAM system consists of five core components: (1) the camera sensor module captures visual information about the surrounding environment; (2) the front-end module uses image data to roughly estimate the camera's position and orientation; (3) the back-end module optimizes the pose information estimated by the front-end; (4) the loop detection module corrects accumulated errors in the system; (5) the mapping module generates environmental maps. This review provides a systematic and comprehensive analysis of the SLAM framework, taking the core components of VSLAM as its entry point. Deep learning brings new development opportunities for VSLAM, but practical applications must still resolve problems of data dependence, cost, and real-time performance. We explore in depth the challenges of combining VSLAM with deep learning and feasible solutions to them. This review provides a valuable reference for the development of VSLAM and will help push the technology to become smarter and more efficient, so that it can better meet the needs of future intelligent autonomous systems across many fields.
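
As a reading aid, the skeleton below shows how the five modules listed in the summary fit together. It is a minimal Python sketch under our own naming, not code from the paper: every class and method name (CameraSensor, FrontEnd.track, and so on) is a hypothetical placeholder.

import numpy as np

# Hypothetical skeleton of the five-module VSLAM pipeline; every name here
# is illustrative, not an API from the paper under review.

class CameraSensor:
    """(1) Captures visual information about the environment."""
    def frames(self, n):
        for _ in range(n):
            yield np.zeros((480, 640), dtype=np.uint8)  # placeholder image

class FrontEnd:
    """(2) Roughly estimates the camera pose from image data."""
    def track(self, frame):
        return np.eye(4)  # placeholder 4x4 pose (rotation + translation)

class BackEnd:
    """(3) Optimizes the poses estimated by the front-end."""
    def optimize(self, poses):
        return poses  # placeholder for bundle adjustment / pose-graph optimization

class LoopDetector:
    """(4) Detects revisited places so accumulated drift can be corrected."""
    def detect(self, frame):
        return False  # placeholder for e.g. bag-of-words place recognition

class Mapper:
    """(5) Generates the environmental map from optimized poses."""
    def update(self, pose, frame):
        pass  # placeholder: insert landmarks or occupancy cells

def run_slam(n_frames=10):
    cam, front, back, loop, mapper = (CameraSensor(), FrontEnd(), BackEnd(),
                                      LoopDetector(), Mapper())
    poses = []
    for frame in cam.frames(n_frames):
        poses.append(front.track(frame))  # front-end: rough pose estimate
        if loop.detect(frame):            # loop closure found:
            poses = back.optimize(poses)  # correct accumulated error globally
        mapper.update(poses[-1], frame)   # extend the map with the new view

run_slam()

In a real system each placeholder hides substantial machinery: feature extraction and tracking in the front-end, bundle adjustment or pose-graph optimization in the back-end, and place recognition in the loop detector.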
Highlights:
• We comprehensively cover the core components of VSLAM technology, including the data acquisition, front-end, back-end, loop closure, and mapping modules, to provide a systematic perspective. In contrast to previous studies that focused primarily on individual modules, this comprehensive approach helps readers gain a more holistic understanding of the overall VSLAM framework and the interplay among its key components.
• Deep learning significantly enhances feature extraction and matching accuracy in VSLAM, robustly supporting localization and mapping tasks. However, its practical application also faces challenges such as reliance on high-quality annotated data, demand for significant computational resources, and the need for real-time processing. This analysis of deep learning's applications and challenges in practice gives readers a clear understanding of its value and limitations within VSLAM technology.
• We propose specific solutions to the challenges of applying deep learning in VSLAM. For example, synthetic data can compensate for the lack of real annotated data; lightweight model architectures reduce computational expense, while hardware optimization improves efficiency; and techniques such as model pruning and inference-engine optimization speed up deep learning models at inference time (a hedged pruning sketch follows the reference list below). Together, these measures effectively address the challenges and provide a solid theoretical foundation and practical guidance for advancing VSLAM technology.
• More specifically, research on VSLAM can be divided into two major groups. On one hand, works such as [1], [2], and [3] have delved into specific modules of VSLAM, or into assumptions tied to particular application scenarios, primarily from a single perspective. On the other hand, studies such as [4] and [5] have emphasized the importance of combining deep learning with VSLAM, especially in the context of traditional VSLAM and semantic VSLAM integrated with deep learning. Nevertheless, these investigations have limitations: first, they fail to provide a comprehensive and systematic exposition of the overall development of VSLAM, and thus lack a panoramic view; second, the existing literature has not analyzed in depth the challenges faced when integrating deep learning, and solutions are seldom provided.

References:
[1] Eyvazpour R., Shoaran M., Karimian G. Hardware implementation of SLAM algorithms: a survey on implementation approaches and platforms. Artificial Intelligence Review, 2023, 56(7): 6187-6239.
[2] Cheng J., Zhang L., Chen Q., et al. A review of visual SLAM methods for autonomous driving vehicles. Engineering Applications of Artificial Intelligence, 2022, 114: 104992.
[3] Kazerouni I. A., Fitzgerald L., Dooly G., et al. A survey of state-of-the-art on visual SLAM. Expert Systems with Applications, 2022, 205: 117734.
[4] Chen W., Shang G., Ji A., et al. An overview on visual SLAM: From tradition to semantic. Remote Sensing, 2022, 14(13): 3010.
[5] Tang Y., Zhao C., Wang J., et al. Perception and navigation in autonomous systems in the era of learning: A survey. IEEE Transactions on Neural Networks and Learning Systems, 2022.
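
To make the third highlight concrete, the sketch below shows one of the inference-acceleration techniques it names, magnitude-based model pruning. It assumes PyTorch and its torch.nn.utils.prune utilities; the small CNN is a hypothetical stand-in for a VSLAM feature extractor, not a model discussed in the paper.

import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy feature-extraction backbone (hypothetical, for illustration only).
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(),
)

for module in model.modules():
    if isinstance(module, nn.Conv2d):
        # Zero out the 50% of weights with the smallest L1 magnitude.
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # bake the pruning mask into the weights

total = sum(p.numel() for p in model.parameters())
nonzero = sum(int((p != 0).sum()) for p in model.parameters())
print(f"nonzero parameters after pruning: {nonzero}/{total}")

Note that pruning by itself only zeroes weights; the wall-clock speed-up usually arrives when structured pruning or a sparsity-aware inference engine exploits those zeros.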
ISSN: 0925-2312, 1872-8286
DOI: 10.1016/j.neucom.2024.127760