Learning Everything about Anything: Webly-Supervised Visual Concept Learning

Bibliographic Details
Published in	2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 3270-3277
Main Authors	Divvala, Santosh K.; Farhadi, Ali; Guestrin, Carlos
Format	Conference Proceeding; Journal Article
Language	English
Published	IEEE 01.06.2014
Subjects	Computer vision; Data models; Detectors; Human; Learning; Medical services; Noise measurement; Pattern recognition; Training; Variance; Vision; Visual; Visualization; Vocabulary
Summary:	Recognition is graduating from labs to real-world applications. While it is encouraging to see its potential being tapped, it brings forth a fundamental challenge to the vision researcher: scalability. How can we learn a model for any concept that exhaustively covers all its appearance variations, while requiring minimal or no human supervision for compiling the vocabulary of visual variance, gathering the training images and annotations, and learning the models? In this paper, we introduce a fully-automated approach for learning extensive models for a wide range of variations (e.g. actions, interactions, attributes and beyond) within any concept. Our approach leverages vast resources of online books to discover the vocabulary of variance, and intertwines the data collection and modeling steps to alleviate the need for explicit human supervision in training the models. Our approach organizes the visual knowledge about a concept in a convenient and useful way, enabling a variety of applications across vision and NLP. Our online system has been queried by users to learn models for several interesting concepts including breakfast, Gandhi, beautiful, etc. To date, our system has models available for over 50,000 variations within 150 concepts, and has annotated more than 10 million images with bounding boxes.
ISSN:	1063-6919, 2575-7075
DOI:	10.1109/CVPR.2014.412