BKinD-3D: Self-Supervised 3D Keypoint Discovery from Multi-View Videos
Format | Journal Article
---|---
Language | English
Published | 14.12.2022
DOI | 10.48550/arxiv.2212.07401
Summary: Quantifying motion in 3D is important for studying the behavior of humans and other animals, but manual pose annotations are expensive and time-consuming to obtain. Self-supervised keypoint discovery is a promising strategy for estimating 3D poses without annotations. However, current keypoint discovery approaches commonly process single 2D views and do not operate in the 3D space. We propose a new method to perform self-supervised keypoint discovery in 3D from multi-view videos of behaving agents, without any keypoint or bounding box supervision in 2D or 3D. Our method, BKinD-3D, uses an encoder-decoder architecture with a 3D volumetric heatmap, trained to reconstruct spatiotemporal differences across multiple views, in addition to joint length constraints on a learned 3D skeleton of the subject. In this way, we discover keypoints without requiring manual supervision in videos of humans and rats, demonstrating the potential of 3D keypoint discovery for studying behavior.
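Two ingredients named in the abstract, reading 3D keypoints out of a volumetric heatmap and constraining joint (bone) lengths on the discovered skeleton, are concrete enough to sketch. The PyTorch snippet below is a minimal illustration of how such pieces are commonly built: the soft-argmax readout, the variance-based length penalty, and all function names and tensor shapes are assumptions for illustration, not the authors' released implementation.

```python
import torch

def soft_argmax_3d(volume: torch.Tensor) -> torch.Tensor:
    """Differentiable keypoint extraction from a 3D volumetric heatmap.

    volume: (B, K, D, H, W) unnormalized scores, one channel per keypoint.
    Returns (B, K, 3) expected (z, y, x) coordinates in voxel units.
    """
    B, K, D, H, W = volume.shape
    probs = torch.softmax(volume.reshape(B, K, -1), dim=-1).reshape(B, K, D, H, W)
    zs = torch.arange(D, dtype=volume.dtype, device=volume.device)
    ys = torch.arange(H, dtype=volume.dtype, device=volume.device)
    xs = torch.arange(W, dtype=volume.dtype, device=volume.device)
    # Expected coordinate along each axis: marginalize the other two axes,
    # then take the probability-weighted sum of voxel indices.
    z = (probs.sum(dim=(3, 4)) * zs).sum(-1)
    y = (probs.sum(dim=(2, 4)) * ys).sum(-1)
    x = (probs.sum(dim=(2, 3)) * xs).sum(-1)
    return torch.stack([z, y, x], dim=-1)

def bone_length_loss(keypoints: torch.Tensor,
                     edges: list[tuple[int, int]]) -> torch.Tensor:
    """Encourage each bone of the skeleton to keep a constant length.

    keypoints: (B, K, 3) discovered 3D keypoints for a batch of frames.
    edges: skeleton connectivity as (i, j) keypoint index pairs.
    Penalizes the variance of every bone's length across the batch
    (needs B > 1 for the variance to be defined).
    """
    lengths = torch.stack(
        [(keypoints[:, i] - keypoints[:, j]).norm(dim=-1) for i, j in edges],
        dim=-1,
    )  # (B, num_edges)
    return lengths.var(dim=0).mean()

# Example: 8 frames, 12 discovered keypoints on a 32^3 volume.
vol = torch.randn(8, 12, 32, 32, 32)
kps = soft_argmax_3d(vol)                              # (8, 12, 3)
loss = bone_length_loss(kps, edges=[(0, 1), (1, 2)])   # scalar
```

Note that the abstract describes the 3D skeleton itself as learned; the fixed `edges` list above is a simplification for the sketch.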