Vehicle joint make and model recognition with multiscale attention windows
| Published in | Signal Processing: Image Communication, Vol. 72, pp. 69–79 |
| --- | --- |
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | Amsterdam: Elsevier B.V., 01.03.2019 |
| Subjects | |
Summary: Vehicle Make and Model Recognition (VMMR) deals with the problem of classifying vehicles whose appearance may vary significantly when captured from different perspectives. A number of successful approaches to this problem rely on part-based models, which, however, require labor-intensive part annotations. In this work, we address the VMMR problem by proposing a deep convolutional architecture built upon multi-scale attention windows. The proposed architecture classifies a vehicle over attention windows that are predicted so as to minimize the classification error. Through these windows, visual representations of the most discriminative parts of the vehicle are aggregated over different scales, providing more representative features for the classifier. In addition, we define a loss function accounting for the joint classification error across make and model. Moreover, a training methodology is devised to stabilize the training process and to impose multi-scale constraints on the predicted attention windows. The proposed architecture outperforms state-of-the-art schemes, reducing the model classification error on the Stanford dataset by 1.7% and improving the classification accuracy by 0.2% and 0.3% on model and make, respectively, on the CompCar dataset.
• VMMR can be addressed by formulating a joint loss function.
• Predicting the attention window scale improves VMMR performance.
• Multi-scale visual representations can increase VMMR accuracy.
• Multi-scale patch training enables generating multi-scale attention windows.
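The summary describes three technical ingredients: attention windows predicted to minimize the classification error, aggregation of the window's features over multiple scales, and a loss that jointly accounts for make and model classification error. The following is a minimal sketch of how these pieces could fit together, assuming a CNN backbone feature map in PyTorch; the module names, the scale set, the weighting term `alpha`, and the hard (non-differentiable) cropping are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch: multi-scale attention-window aggregation + joint make/model loss.
# All names and hyperparameters here are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleAttentionClassifier(nn.Module):
    def __init__(self, feat_channels=256, scales=(0.25, 0.5, 1.0),
                 n_makes=10, n_models=50):
        super().__init__()
        self.scales = scales
        # Predicts a normalised attention-window centre (cx, cy) in [0, 1].
        self.centre_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_channels, 2), nn.Sigmoid())
        fused_dim = feat_channels * len(scales)
        self.make_head = nn.Linear(fused_dim, n_makes)
        self.model_head = nn.Linear(fused_dim, n_models)

    def forward(self, feats):                       # feats: (B, C, H, W)
        B, C, H, W = feats.shape
        centre = self.centre_head(feats)            # (B, 2) in [0, 1]
        pooled = []
        for s in self.scales:
            h, w = max(1, int(H * s)), max(1, int(W * s))
            crops = []
            for b in range(B):
                # Hard integer crop around the predicted centre; this is a
                # non-differentiable simplification used only to make the
                # multi-scale aggregation step concrete.
                cx = int(centre[b, 0].item() * (W - 1))
                cy = int(centre[b, 1].item() * (H - 1))
                x0 = min(max(cx - w // 2, 0), W - w)
                y0 = min(max(cy - h // 2, 0), H - h)
                crops.append(feats[b:b + 1, :, y0:y0 + h, x0:x0 + w])
            crop = torch.cat(crops, dim=0)
            pooled.append(F.adaptive_avg_pool2d(crop, 1).flatten(1))  # (B, C)
        fused = torch.cat(pooled, dim=1)            # multi-scale aggregation
        return self.make_head(fused), self.model_head(fused)

def joint_make_model_loss(make_logits, model_logits, make_y, model_y, alpha=0.5):
    """Joint classification error across make and model (alpha is assumed)."""
    return (alpha * F.cross_entropy(make_logits, make_y)
            + (1.0 - alpha) * F.cross_entropy(model_logits, model_y))

# Usage with dummy data (arbitrary class counts):
net = MultiScaleAttentionClassifier()
feats = torch.randn(4, 256, 14, 14)
make_logits, model_logits = net(feats)
loss = joint_make_model_loss(make_logits, model_logits,
                             torch.randint(0, 10, (4,)),
                             torch.randint(0, 50, (4,)))
```

In the paper, the attention windows are trained with a dedicated methodology that imposes multi-scale constraints on the predictions; the hard crop above is only a stand-in to make the window aggregation and the joint loss readable in code.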
ISSN: 0923-5965, 1879-2677
DOI: 10.1016/j.image.2018.12.009