Classifying Human Activities using CNN and ConvLSTM in Video Sequences

Bibliographic Details
Published in: 2023 2nd International Conference on Paradigm Shifts in Communications Embedded Systems, Machine Learning and Signal Processing (PCEMS), pp. 1-6
Main Authors: Gera, Reema, Ambati, Kalyan Ram, Chakole, Pallavi, Cheggoju, Naveen, Kamble, Vipin, Satpute, V. R.
Format: Conference Proceeding
Language: English
Published: IEEE, 05.04.2023

More Information
Summary: Video surveillance plays an important role in analyzing anomalous activity in a given premises. However, cameras can only capture video; they cannot determine the type of activity on their own. Such systems therefore require regular human intervention and monitoring, which demands considerable time and manual effort. This motivates the need for an automatic human activity recognition (HAR) system, made possible by technologies such as computer vision and deep learning. Recognizing human activities in videos is a challenging task in computer vision, and the main function of intelligent video systems is to automatically and accurately identify and tag the actions performed by people in video sequences. The objective of this research is to develop a model that can accurately recognize and classify human activities from video footage. The videos captured by the cameras can be used to determine the type of activity using deep learning based networks, which must be capable of classifying the videos using the available spatial and temporal information. In this paper, a framework is proposed in which the data is first pre-processed to reject redundant information and then fed into a deep network to predict the event. Two different network models for HAR are presented, distinguished by the length of the frame sequence they use: one network takes in only the most significant frame, while the other uses a longer sequence of frames to predict the behavior as a time-domain parameter.
DOI: 10.1109/PCEMS58491.2023.10136043
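
Illustrative sketch (not taken from the paper): to make the two-model idea in the summary concrete, the following minimal TensorFlow/Keras example defines a single-frame CNN classifier and a ConvLSTM classifier over a frame sequence. The frame size, sequence length, number of classes and all layer hyperparameters are assumed values for illustration only; the authors' actual architecture may differ.

# Two HAR classifiers: a CNN on one representative frame (spatial cues only)
# and a ConvLSTM on a frame sequence (spatial + temporal cues).
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 4              # assumed number of activity classes
FRAME_SHAPE = (64, 64, 3)    # assumed frame height, width, channels
SEQ_LEN = 20                 # assumed number of frames per clip

def build_single_frame_cnn():
    """CNN that classifies an activity from a single significant frame."""
    return models.Sequential([
        layers.Input(shape=FRAME_SHAPE),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

def build_convlstm_classifier():
    """ConvLSTM that classifies an activity from a sequence of frames."""
    return models.Sequential([
        layers.Input(shape=(SEQ_LEN, *FRAME_SHAPE)),
        layers.ConvLSTM2D(32, 3, return_sequences=True, activation="tanh"),
        layers.MaxPooling3D(pool_size=(1, 2, 2)),
        layers.ConvLSTM2D(64, 3, return_sequences=False, activation="tanh"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

if __name__ == "__main__":
    for model in (build_single_frame_cnn(), build_convlstm_classifier()):
        model.compile(optimizer="adam",
                      loss="categorical_crossentropy",
                      metrics=["accuracy"])
        model.summary()

In practice, the single-frame model would be fed one pre-selected key frame per clip, while the ConvLSTM model would be fed a fixed-length stack of pre-processed frames; both output a probability distribution over the assumed activity classes.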