Model-Guided Multi-Path Knowledge Aggregation for Aerial Saliency Prediction

Bibliographic Details
Published in	IEEE Transactions on Image Processing, Vol. 29, pp. 7117-7127
Main Authors	Fu, Kui; Li, Jia; Zhang, Yu; Shen, Hongze; Tian, Yonghong
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2020
Subjects	Adaptation models; aerial video; Algorithms; Annotations; Computational modeling; Datasets; Drone aircraft; Drones; Eye movements; eye-tracking; knowledge transfer; Multi-path CNNs; Optimization; Prediction algorithms; Predictions; Predictive models; Salience; Solid modeling; Vision; visual saliency; Visualization
Summary:	As an emerging vision platform, a drone can observe scenes from many unusual viewpoints, which brings new challenges to the classic vision task of video saliency prediction. To investigate these challenges, this paper proposes a large-scale video dataset for aerial saliency prediction, which consists of ground-truth salient object regions for 1,000 aerial videos, annotated by 24 subjects. To the best of our knowledge, it is the first large-scale video dataset that focuses on visual saliency prediction on drones. Based on this dataset, we propose a Model-guided Multi-path Network (MM-Net) that serves as a baseline model for aerial video saliency prediction. Inspired by the annotation process in eye-tracking experiments, MM-Net adopts multiple information paths, each of which is initialized under the guidance of a classic saliency model. The visual saliency knowledge encoded in the most representative paths is then selected and aggregated to improve the capability of MM-Net in predicting spatial saliency in aerial scenarios. Finally, these spatial predictions are adaptively combined with temporal saliency predictions via a spatiotemporal optimization algorithm. Experimental results show that MM-Net outperforms ten state-of-the-art models in predicting aerial video saliency.
ISSN:	1057-7149; 1941-0042
DOI:	10.1109/TIP.2020.2998977