Discrete Latent Perspective Learning for Segmentation and Detection
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 14.06.2024 |
Subjects | |
Summary: | In this paper, we address the challenge of Perspective-Invariant Learning in machine learning and computer vision, which involves enabling a network to understand images from varying perspectives to achieve consistent semantic interpretation. While standard approaches rely on the labor-intensive collection of multi-view images or limited data augmentation techniques, we propose a novel framework, Discrete Latent Perspective Learning (DLPL), for latent multi-perspective fusion learning using conventional single-view images. DLPL comprises three main modules: Perspective Discrete Decomposition (PDD), Perspective Homography Transformation (PHT), and Perspective Invariant Attention (PIA), which work together to discretize visual features, transform perspectives, and fuse multi-perspective semantic information, respectively. DLPL is a universal perspective learning framework applicable to a variety of scenarios and vision tasks. Extensive experiments demonstrate that DLPL significantly enhances the network's capacity to depict images across diverse scenarios (daily photos, UAV, auto-driving) and tasks (detection, segmentation). |
---|---|
DOI: | 10.48550/arxiv.2406.10475 |
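
The summary names a homography-based module (PHT) for transforming perspectives. As a rough illustration of the underlying geometric operation only, the sketch below maps 2D points through a 3x3 planar homography using homogeneous coordinates. It is not DLPL's implementation: the record does not describe how PHT operates on features, and the function name `apply_homography`, the matrix `H_EXAMPLE`, and the sample points are all hypothetical, chosen for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix3 = Sequence[Sequence[float]]


def apply_homography(h: Matrix3, points: Sequence[Point]) -> List[Point]:
    """Map 2D points through a 3x3 homography (generic sketch, not the paper's PHT)."""
    warped: List[Point] = []
    for x, y in points:
        # Lift (x, y) to homogeneous coordinates (x, y, 1) and multiply by H.
        xh = h[0][0] * x + h[0][1] * y + h[0][2]
        yh = h[1][0] * x + h[1][1] * y + h[1][2]
        w = h[2][0] * x + h[2][1] * y + h[2][2]
        if abs(w) < 1e-12:
            raise ValueError("point maps to infinity under this homography")
        # Divide by w to return to ordinary 2D coordinates.
        warped.append((xh / w, yh / w))
    return warped


# Hypothetical matrix: identity plus a small projective term in the last row,
# which compresses points progressively as x grows (a simple change of viewpoint).
H_EXAMPLE = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 1.0],
]

UNIT_SQUARE = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
WARPED_SQUARE = apply_homography(H_EXAMPLE, UNIT_SQUARE)
```

With this example matrix, the two corners at x = 0 stay in place while the two corners at x = 1 are pulled toward the origin, so the square becomes a trapezoid. This is the kind of viewpoint change a homography can express for a planar scene.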