Learning the Visual Interpretation of Sentences

Bibliographic Details
Published in	2013 IEEE International Conference on Computer Vision, pp. 1681-1688
Main Authors	Zitnick, C. Lawrence; Parikh, Devi; Vanderwende, Lucy
Format	Conference Proceeding; Journal Article
Language	English
Published	IEEE 01.12.2013
Subjects	Abstracts; Computational modeling; Computer vision; Conferences; Feature extraction; Fields (mathematics); Queries; Radio access networks; Sampling; Semantics; Sentences; Tasks; Visual; Visualization

Summary:	Sentences that describe visual scenes contain a wide variety of information pertaining to the presence of objects, their attributes and their spatial relations. In this paper we learn the visual features that correspond to semantic phrases derived from sentences. Specifically, we extract predicate tuples that contain two nouns and a relation. The relation may take several forms, such as a verb, preposition, adjective or their combination. We model a scene using a Conditional Random Field (CRF) formulation where each node corresponds to an object, and the edges to their relations. We determine the potentials of the CRF using the tuples extracted from the sentences. We generate novel scenes depicting the sentences' visual meaning by sampling from the CRF. The CRF is also used to score a set of scenes for a text-based image retrieval task. Our results show we can generate (retrieve) scenes that convey the desired semantic meaning, even when scenes (queries) are described by multiple sentences. Significant improvement is found over several baseline approaches.
ISSN:	1550-5499
DOI:	10.1109/ICCV.2013.211