Leveraging Features from Background and Salient Regions for Automatic Image Annotation

Bibliographic Details
Published in Journal of Information Processing, Vol. 20, No. 1, pp. 250-266
Main Authors Fahrmair, Michael; Sarin, Supheakmungkol; Wagner, Matthias; Kameyama, Wataru
Format Journal Article
Language English
Published Information Processing Society of Japan 2012
Online Access Get full text
ISSN 1882-6652
EISSN 1882-6652
DOI 10.2197/ipsjjip.20.250

Abstract In this era of information explosion, automating the annotation process of digital images is a crucial step towards efficient and effective management of this increasingly high volume of content. However, this is still a highly challenging task for the research community. One of the main bottlenecks is the lack of integrity and diversity of features. We propose to solve this problem by utilizing 43 image features that cover the holistic content of the image, from global to subject, background, and scene. In our approach, salient regions and the background are separated without prior knowledge. Each of them, together with the whole image, is treated independently for feature extraction. Extensive experiments were designed to show the efficiency and effectiveness of our approach. For our experiments, we chose two publicly available, manually annotated datasets containing images of a diverse nature, namely, the Corel5K and ESP Game datasets. We confirm the superior performance of our approach over the use of a single whole image using a sign test with p-value < 0.05. Furthermore, our combined feature set gives satisfactory performance compared to recently proposed approaches, especially in terms of generalization, even with just a simple combination. We also obtain better performance with the same feature set versus the grid-based approach. More importantly, when using our features with the state-of-the-art technique, our results show higher performance on a variety of standard metrics.
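Note: the abstract describes the pipeline only at a high level, and this record does not detail the paper's saliency detector or its 43 features. The Python fragment below is therefore a minimal sketch under stated assumptions: OpenCV's spectral residual saliency (the method of Hou and Zhang, reference [37] below, which the paper cites) stands in for the separation step, and plain color histograms stand in for the per-region features.

    # Minimal sketch, not the authors' implementation: split an image into a
    # salient region and background with no prior knowledge, then describe
    # each view plus the whole image with a (hypothetical) color histogram.
    # Assumes opencv-contrib-python for the spectral residual saliency of
    # Hou and Zhang [37]; the paper's actual detector and 43-feature set
    # are not specified in this record.
    import cv2

    def split_and_describe(path, bins=8):
        img = cv2.imread(path)  # BGR image
        if img is None:
            raise IOError("cannot read " + path)

        # Saliency map in [0, 1] via the spectral residual method.
        detector = cv2.saliency.StaticSaliencySpectralResidual_create()
        ok, sal = detector.computeSaliency(img)
        if not ok:
            raise RuntimeError("saliency computation failed")

        # Otsu's threshold turns the map into a salient-region mask;
        # its complement serves as the background mask.
        sal8 = (sal * 255).astype("uint8")
        _, fg_mask = cv2.threshold(sal8, 0, 255,
                                   cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        bg_mask = cv2.bitwise_not(fg_mask)

        # One feature vector per view: salient region, background, whole image.
        def hist(mask):
            h = cv2.calcHist([img], [0, 1, 2], mask, [bins] * 3,
                             [0, 256, 0, 256, 0, 256])
            return cv2.normalize(h, h).flatten()

        return {"salient": hist(fg_mask),
                "background": hist(bg_mask),
                "whole": hist(None)}

The sign test reported in the abstract (p-value < 0.05 against the single-whole-image baseline) can likewise be sketched as a binomial test over per-image wins and losses; the score arrays below are hypothetical placeholders, not results from the paper.

    # Sign test sketch: does the combined feature set beat the whole-image
    # baseline on significantly more than half of the test images?
    # The per-image F1 scores here are hypothetical placeholders.
    import numpy as np
    from scipy.stats import binomtest

    combined = np.array([0.41, 0.35, 0.52, 0.48, 0.30])  # placeholder scores
    baseline = np.array([0.33, 0.36, 0.44, 0.40, 0.25])  # placeholder scores

    diff = combined - baseline
    wins = int((diff > 0).sum())
    losses = int((diff < 0).sum())  # ties are dropped in a sign test
    result = binomtest(wins, wins + losses, p=0.5, alternative="greater")
    print("wins=%d losses=%d p=%.4f" % (wins, losses, result.pvalue))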
Author Fahrmair, Michael (DOCOMO Communications Laboratories Europe GmbH)
Sarin, Supheakmungkol (GITS, Waseda University)
Wagner, Matthias (DOCOMO Communications Laboratories Europe GmbH)
Kameyama, Wataru (GITS, Waseda University)
Copyright 2012 by the Information Processing Society of Japan
Discipline Computer Science
OpenAccessLink https://www.jstage.jst.go.jp/article/ipsjjip/20/1/20_1_250/_article/-char/en
References [1] Gantz, J.F., Reinsel, D., Chute, C., Schlichting, W., Mcarthur, J., Minton, S., Xheneti, I., Toncheva, A. and Manfrediz, A.: The Expanding Digital Universe: A Forecast of Worldwide Information Growth Through 2010, IDC White Paper (online) (2007), available from <http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf>.
[2] Flickr Photo Statistics (2010), available from <http://blog.flickr.net/en/2010/09/19/5000000000/>.
[3] Facebook Photo Statistics (2010), available from <http://blog.facebook.com/blog.php?post=206178097130>.
[4] Datta, R., Joshi, D., Li, J. and Wang, J.Z.: Image retrieval: Ideas, influences, and trends of the new age, ACM Comput. Surv., Vol.40, No.2, pp.1-60 (online), DOI:http://doi.acm.org/10.1145/1348246.1348248 (2008).
[5] Duygulu, P., Barnard, K., de Freitas, J.F.G. and Forsyth, D.A.: Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, ECCV '02: Proc. 7th European Conference on Computer Vision-Part IV, pp.97-112, Springer-Verlag, London, UK (2002).
[6] von Ahn, L. and Dabbish, L.: Labeling images with a computer game, CHI '04: Proc. SIGCHI Conference on Human Factors in Computing Systems, pp.319-326, ACM, New York, NY, USA (online), DOI:http://doi.acm.org/10.1145/985692.985733 (2004).
[7] Guillaumin, M., Mensink, T., Verbeek, J. and Schmid, C.: TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation, International Conference on Computer Vision (online) (2009), available from <http://lear.inrialpes.fr/pubs/2009/GMVS09>.
[8] Makadia, A., Pavlovic, V. and Kumar, S.: Baselines for Image Annotation, International Journal of Computer Vision, pp.1-18 (2010).
[9] Deng, Y., Manjunath, B. and Shin, H.: Color image segmentation, CVPR '99, p.2446, IEEE Computer Society (1999).
[10] Shi, J. and Malik, J.: Normalized cuts and image segmentation, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.22, No.8, pp.888-905 (2000).
[11] Grady, L.: Random walks for image segmentation, IEEE Trans. on Pattern Analysis and Machine Intelligence, pp.1768-1783 (2006).
[12] Zahn, C.: Graph-theoretical methods for detecting and describing gestalt clusters, IEEE Trans. Computers, Vol.100, No.1, pp.68-86 (2006).
[13] Grady, L. and Schwartz, E.: Isoperimetric graph partitioning for image segmentation, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.28, No.3, pp.469-475 (2006).
[14] Laaksonen, J., Koskela, M. and Oja, E.: Content-Based Image Retrieval Using Self-Organizing Maps, VISUAL, pp.541-548 (1999).
[15] Meghini, C., Sebastiani, F. and Straccia, U.: A model of multimedia information retrieval, J. ACM, Vol.48, pp.909-970 (online), DOI:http://doi.acm.org/10.1145/502102.502103 (2001).
[16] Schettini, R., Ciocca, G. and Zuffi, S.: A Survey of Methods for Colour Image Indexing and Retrieval in Image Databases, Color Imaging Science: Exploiting Digital Media, John Wiley, pp.1-9 (2001).
[17] Ko, B., Lee, H.-S. and Byun, H.: Image retrieval using flexible image subblocks, SAC '00: Proc. 2000 ACM Symposium on Applied Computing-Volume 2, pp.574-578, ACM, New York, NY, USA (online), DOI:http://doi.acm.org/10.1145/338407.338502 (2000).
[18] Shyu, M.-L., Chen, S.-C., Chen, M., Zhang, C. and Sarinnapakorn, K.: Image database retrieval utilizing affinity relationships, MMDB '03: Proc. 1st ACM International Workshop on Multimedia Databases, pp.78-85, ACM, New York, NY, USA (online), DOI:http://doi.acm.org/10.1145/951676.951691 (2003).
[19] Zhao, R. and Grosky, W.: From Features to Semantics: Some Preliminary Results, p.TAS3 (2000).
[20] Tsai, C.-F., McGarry, K. and Tait, J.: Image classification using hybrid neural networks, SIGIR '03: Proc. 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.431-432, ACM, New York, NY, USA (online), DOI:http://doi.acm.org/10.1145/860435.860536 (2003).
[21] Monay, F. and Gatica-Perez, D.: On image auto-annotation with latent space models, MULTIMEDIA '03: Proc. 11th ACM International Conference on Multimedia, pp.275-278, ACM, New York, NY, USA (online), DOI:http://doi.acm.org/10.1145/957013.957070 (2003).
[22] Grauman, K. and Darrell, T.: The pyramid match kernel: Discriminative classification with sets of image features (2005).
[23] Wallraven, C., Caputo, B. and Graf, A.: Recognition with local features: The kernel recipe (2003).
[24] Willamowski, J., Arregui, D., Csurka, G., Dance, C. and Fan, L.: Categorizing nine visual classes using local appearance descriptors, Illumination, Vol.17, p.21 (2004).
[25] Lazebnik, S., Schmid, C. and Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol.2, pp.2169-2178, IEEE (2006).
[26] Grangier, D. and Bengio, S.: A Discriminative Kernel-Based Approach to Rank Images from Text Queries, IEEE Trans. Pattern Anal. Mach. Intell., Vol.30, pp.1371-1384 (online), DOI:10.1109/TPAMI.2007.70791 (2008).
[27] Hertz, T., Bar-Hillel, A. and Weinshall, D.: Learning distance functions for image retrieval, CVPR '04: Proc. 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.570-577, IEEE Computer Society, Washington, DC, USA (online) (2004), available from <http://portal.acm.org/citation.cfm?id=1896300.1896383>.
[28] Barnard, K., Duygulu, P., Forsyth, D., de Freitas, N., Blei, D.M. and Jordan, M.I.: Matching words and pictures, J. Mach. Learn. Res., Vol.3, pp.1107-1135 (2003).
[29] Monay, F. and Gatica-Perez, D.: PLSA-based image auto-annotation: constraining the latent space, MULTIMEDIA '04: Proc. 12th Annual ACM International Conference on Multimedia, pp.348-351, ACM, New York, NY, USA (online), DOI:http://doi.acm.org/10.1145/1027527.1027608 (2004).
[30] Carneiro, G., Chan, A.B., Moreno, P.J. and Vasconcelos, N.: Supervised Learning of Semantic Classes for Image Annotation and Retrieval, IEEE Trans. Pattern Anal. Mach. Intell., Vol.29, No.3, pp.394-410 (online), DOI:10.1109/TPAMI.2007.61 (2007).
[31] Lavrenko, V., Manmatha, R. and Jeon, J.: A model for learning the semantics of pictures, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), MIT Press (2003).
[32] Feng, S., Manmatha, R. and Lavrenko, V.: Multiple Bernoulli Relevance Models for Image and Video Annotation, CVPR, Vol.2, pp.1002-1009 (2004).
[33] Torralba, A., Fergus, R. and Freeman, W.T.: 80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition, IEEE Trans. Pattern Anal. Mach. Intell., Vol.30, pp.1958-1970 (online), DOI:10.1109/TPAMI.2008.128 (2008).
[34] Itti, L., Koch, C. and Niebur, E.: A Model of Saliency-Based Visual Attention for Rapid Scene Analysis, IEEE Trans. Pattern Anal. Mach. Intell., Vol.20, No.11, pp.1254-1259 (online), DOI:http://dx.doi.org/10.1109/34.730558 (1998).
[35] Judd, T., Ehinger, K., Durand, F. and Torralba, A.: Learning to predict where humans look, 2009 IEEE 12th International Conference on Computer Vision, pp.2106-2113, IEEE (2010).
[36] Achanta, R., Hemami, S., Estrada, F. and Süsstrunk, S.: Frequency-tuned Salient Region Detection, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) (online) (2009), available from <http://www.cvpr2009.org/>.
[37] Hou, X. and Zhang, L.: Saliency Detection: A Spectral Residual Approach, Proc. IEEE Conference on Computer Vision and Pattern Recognition CVPR '07, pp.1-8 (online), DOI:10.1109/CVPR.2007.383267 (2007).
[38] Makadia, A., Pavlovic, V. and Kumar, S.: A New Baseline for Image Annotation, ECCV, Vol.3, pp.316-329 (2008).
[39] Van De Sande, K., Gevers, T. and Snoek, C.: Evaluating color descriptors for object and scene recognition, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.32, No.9, pp.1582-1596 (2010).
[40] Oliva, A. and Torralba, A.: Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope, International Journal of Computer Vision, Vol.42, No.3, pp.145-175 (online), DOI:http://dx.doi.org/10.1023/A:1011139631724 (2001).
[41] Friedman, A.: Framing pictures: The role of knowledge in automatized encoding and memory for gist, Journal of Experimental Psychology: General, Vol.108, pp.316-355 (1979).
[42] Potter, M.C.: Short-term conceptual memory for pictures, Journal of Experimental Psychology: Human Learning and Memory, Vol.2, No.5, pp.509-522 (1976).
[43] Lowe, D.: Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, Vol.60, No.2, pp.91-110 (2004).
[44] Van de Weijer, J., Gevers, T. and Bagdanov, A.: Boosting color saliency in image feature detection, IEEE Trans. on Pattern Analysis and Machine Intelligence, pp.150-156 (2006).
[45] Bosch, A., Zisserman, A. and Muñoz, X.: Scene classification using a hybrid generative/discriminative approach, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.30, No.4, pp.712-727 (2008).
[46] Abdel-Hakim, A. and Farag, A.: CSIFT: A SIFT descriptor with color invariant characteristics, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol.2, pp.1978-1983, IEEE (2006).
[47] Sarin, S. and Kameyama, W.: Joint Equal Contribution of Global and Local Features for Image Annotation, CLEF Workshop 2009 (2009).
[48] Sarin, S. and Kameyama, W.: Holistic Image Features Extraction for Better Image Annotation, IEICE General Conference, Sendai City, Miyagi, Japan (2010).
[49] Ong, K.-M., Sarin, S. and Kameyama, W.: Affective and Holistic Approach at TRECVID 2010 Task-Semantic Indexing (SIN), Working Notes of TRECVID (2010).
[50] Sarin, S., Fahrmair, M., Wagner, M. and Kameyama, W.: Holistic Feature Extraction for Automatic Image Annotation, Proc. 5th FTRA Int Multimedia and Ubiquitous Engineering (MUE) Conf, pp.59-66 (online), DOI:10.1109/MUE.2011.22 (2011).
[51] Guillaumin, M.: Exploiting Multimodal Data for Image Understanding, PhD Thesis, Université de Grenoble (2010).
[52] Shechtman, E. and Irani, M.: Matching local self-similarities across images and videos, 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, IEEE (2007).