Indoor Synthetic Data Generation: A Systematic Review

Bibliographic Details
Published in: Computer Vision and Image Understanding, Vol. 240; p. 103907
Main Authors: Schieber, Hannah; Demir, Kubilay Can; Kleinbeck, Constantin; Yang, Seung Hee; Roth, Daniel
Format: Journal Article
Language: English
Published: Elsevier Inc, 01.03.2024
Summary: Deep learning-based object recognition, 6D pose estimation, and semantic scene understanding require a large amount of training data to achieve generalization. Time-consuming annotation processes, privacy, and security aspects lead to a scarcity of real-world datasets. To overcome this lack of data, synthetic data generation has been proposed, including multiple facets in the area of domain randomization to extend the data distribution. The objective of this review is to identify methods applied for synthetic data generation aiming to improve 6D pose estimation, object recognition, and semantic scene understanding in indoor scenarios. We further review methods used to extend the data distribution and discuss best practices to bridge the gap between synthetic and real-world data. We adhered to the guidelines of the systematic PRISMA technique. We searched three databases (IEEE Xplore, Springer Link, and ACM) and conducted an additional manual search. In total, we identified 241 studies and included 34 in our systematic review. In summary, synthetic data generation has been performed using crop-out methods, graphic APIs, 3D modeling or authoring tools, or game engine-based methods. To extend the data distribution, varying scene parameters, i.e., lighting conditions or textures, and the use of distracting objects in the scene are promising.

• A review of synthetic data for object recognition, 6D pose estimation, and semantics.
• We investigate methods of domain randomization to bridge the sim-to-real gap.
• We identified reusable approaches for synthetic dataset generation.
• We screened synthetic datasets for indoor scenes.
• We determined how the sim-to-real gap is addressed.
ISSN: 1077-3142, 1090-235X
DOI: 10.1016/j.cviu.2023.103907
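
The summary names varying lighting conditions, textures, and distracting objects as promising ways to extend the data distribution. As a rough illustration only, the sketch below samples such scene parameters at random for each synthetic image. It is not taken from the reviewed studies: the `SceneConfig` fields, the `sample_scene_config` function, and all value ranges are hypothetical, and a real pipeline would pass these values to a renderer or game engine.

```python
import random
from dataclasses import dataclass


@dataclass
class SceneConfig:
    """Hypothetical per-image scene parameters for domain randomization."""
    light_intensity: float    # relative brightness, assumed range 0.2 to 2.0
    light_azimuth_deg: float  # light direction around the scene, 0 to 360
    texture_id: int           # index into an assumed texture library
    num_distractors: int      # number of unlabeled clutter objects


def sample_scene_config(rng, num_textures=50, max_distractors=5):
    """Draw one random scene configuration from the assumed ranges."""
    return SceneConfig(
        light_intensity=rng.uniform(0.2, 2.0),
        light_azimuth_deg=rng.uniform(0.0, 360.0),
        texture_id=rng.randrange(num_textures),
        num_distractors=rng.randint(0, max_distractors),
    )


# Seeded generator so the sampled dataset description is reproducible.
rng = random.Random(0)
for _ in range(3):
    print(sample_scene_config(rng))
```

Each sampled configuration would describe one rendered image; widening the ranges broadens the synthetic data distribution, which is the effect the summary refers to.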