Depth-based human action recognition using histogram of templates
Published in: Multimedia Tools and Applications, Vol. 83, No. 14, pp. 40415–40449
Main Authors:
Format: Journal Article
Language: English
Published: New York: Springer US, 01.04.2024 (Springer Nature B.V.)
Summary: In this paper, we propose an efficient, fast, and easy-to-implement method for recognizing human actions in depth image sequences. In this method, the human body silhouettes are first extracted from the depth image sequences using the Gaussian mixture background subtraction model. After removing noise from the foreground image with a cascade of morphological operations and area filtering, the contour of the human silhouette is extracted by applying Moore’s neighbor contour tracing algorithm. From this contour, features describing the human posture are computed using the Histogram of Templates (HoT) descriptor. These features are then used to train a Dendrogram-based support vector machine that generates the frame-by-frame posture variation signal of the action sequence. The histogram of this signal is then computed and fed as an input vector into a Fuzzy k-Nearest Neighbor (FkNN) classifier to recognize human actions. The proposed method is evaluated on two publicly available datasets containing various daily actions (Bending, Sitting, Lying, etc.) performed by different human subjects. Extensive experiments are conducted with several values of the number of nearest neighbors (k) in the FkNN and different similarity measures, namely the Euclidean distance, the Bhattacharyya distance, the Kullback–Leibler distance, and a histogram intersection-based distance. The results show that the proposed method performs better than, or comparably to, other state-of-the-art approaches. Moreover, the method processes 18 frames per second of the image sequence, which makes it well suited to applications requiring real-time human action recognition.
ISSN: 1380-7501; 1573-7721
DOI: 10.1007/s11042-023-16989-0
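To make the pipeline summarized above more concrete, the following is a minimal sketch of its silhouette-extraction front end: Gaussian mixture background subtraction, a cascade of morphological operations, area filtering, and contour extraction. It is not the authors' implementation; OpenCV's MOG2 subtractor and its findContours border-following routine stand in for the Gaussian mixture model and Moore's neighbor contour tracing named in the abstract, and all parameter values (history length, kernel size, area threshold) are assumptions.

```python
import cv2

def extract_silhouette_contours(depth_frames, min_area=500):
    """Yield the largest cleaned silhouette contour found in each depth frame.

    depth_frames is assumed to be an iterable of 8-bit single-channel images
    (16-bit depth sensor output should be rescaled beforehand).
    """
    # Gaussian mixture background subtraction (MOG2 variant).
    bg_model = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))

    for frame in depth_frames:
        fg = bg_model.apply(frame)

        # Cascade of morphological operations to suppress foreground noise.
        fg = cv2.morphologyEx(fg, cv2.MORPH_OPEN, kernel)
        fg = cv2.morphologyEx(fg, cv2.MORPH_CLOSE, kernel)

        # Contour extraction plus area filtering: keep only blobs large
        # enough to be a human silhouette, then return the largest one.
        contours, _ = cv2.findContours(fg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
        contours = [c for c in contours if cv2.contourArea(c) >= min_area]
        if contours:
            yield max(contours, key=cv2.contourArea)
```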
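The abstract also lists four histogram similarity measures compared in the FkNN stage (Euclidean, Bhattacharyya, Kullback–Leibler, histogram intersection). These have standard closed forms; the sketch below shows one plausible implementation, assuming histograms normalized to sum to one and a small epsilon for numerical stability (both assumptions, not details taken from the paper).

```python
import numpy as np

def euclidean_distance(p, q):
    return float(np.sqrt(np.sum((p - q) ** 2)))

def bhattacharyya_distance(p, q, eps=1e-12):
    # Smaller is more similar; assumes p and q are normalized histograms.
    return float(-np.log(np.sum(np.sqrt(p * q)) + eps))

def kullback_leibler_distance(p, q, eps=1e-12):
    # Asymmetric divergence of p from q, smoothed to avoid division by zero.
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def intersection_distance(p, q):
    # Histogram intersection converted to a distance (0 means identical).
    return 1.0 - float(np.sum(np.minimum(p, q)))
```

In a fuzzy k-nearest-neighbor classifier, the k closest training histograms under one of these distances would then contribute membership-weighted votes rather than hard labels.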