Leveraging Features from Background and Salient Regions for Automatic Image Annotation

Bibliographic Details
Published in Journal of Information Processing, Vol. 20, No. 1, pp. 250-266
Main Authors Fahrmair, Michael; Sarin, Supheakmungkol; Wagner, Matthias; Kameyama, Wataru
Format Journal Article
Language English
Published Information Processing Society of Japan 2012
Online Access Get full text
ISSN 1882-6652
EISSN 1882-6652
DOI 10.2197/ipsjjip.20.250

Abstract In this era of information explosion, automating the annotation process of digital images is a crucial step towards efficient and effective management of this increasingly high volume of content. However, this is still a highly challenging task for the research community. One of the main bottlenecks is the lack of integrity and diversity of features. We propose to solve this problem by utilizing 43 image features that cover the holistic content of the image, from global to subject, background, and scene. In our approach, salient regions and the background are separated without prior knowledge. Each of them, together with the whole image, is treated independently for feature extraction. Extensive experiments were designed to show the efficiency and effectiveness of our approach. For our experiments, we chose two publicly available, manually annotated datasets containing images of a diverse nature, namely, the Corel5K and ESP Game datasets. We confirm the superior performance of our approach over the use of a single whole image using a sign test with p-value < 0.05. Furthermore, our combined feature set gives satisfactory performance compared to recently proposed approaches, especially in terms of generalization, even with just a simple combination. We also obtain better performance with the same feature set versus the grid-based approach. More importantly, when using our features with the state-of-the-art technique, our results show higher performance on a variety of standard metrics.
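Note: the abstract describes the pipeline only at a high level, and this record does not detail the paper's saliency detector or its 43 features. The Python fragment below is therefore a minimal sketch under stated assumptions: OpenCV's spectral residual saliency (the method of Hou and Zhang, reference [37] below, which the paper cites) stands in for the separation step, and plain color histograms stand in for the per-region features.

    # Minimal sketch, not the authors' implementation: split an image into a
    # salient region and background with no prior knowledge, then describe
    # each view plus the whole image with a (hypothetical) color histogram.
    # Assumes opencv-contrib-python for the spectral residual saliency of
    # Hou and Zhang [37]; the paper's actual detector and 43-feature set
    # are not specified in this record.
    import cv2

    def split_and_describe(path, bins=8):
        img = cv2.imread(path)  # BGR image
        if img is None:
            raise IOError("cannot read " + path)

        # Saliency map in [0, 1] via the spectral residual method.
        detector = cv2.saliency.StaticSaliencySpectralResidual_create()
        ok, sal = detector.computeSaliency(img)
        if not ok:
            raise RuntimeError("saliency computation failed")

        # Otsu's threshold turns the map into a salient-region mask;
        # its complement serves as the background mask.
        sal8 = (sal * 255).astype("uint8")
        _, fg_mask = cv2.threshold(sal8, 0, 255,
                                   cv2.THRESH_BINARY | cv2.THRESH_OTSU)
        bg_mask = cv2.bitwise_not(fg_mask)

        # One feature vector per view: salient region, background, whole image.
        def hist(mask):
            h = cv2.calcHist([img], [0, 1, 2], mask, [bins] * 3,
                             [0, 256, 0, 256, 0, 256])
            return cv2.normalize(h, h).flatten()

        return {"salient": hist(fg_mask),
                "background": hist(bg_mask),
                "whole": hist(None)}

The sign test reported in the abstract (p-value < 0.05 against the single-whole-image baseline) can likewise be sketched as a binomial test over per-image wins and losses; the score arrays below are hypothetical placeholders, not results from the paper.

    # Sign test sketch: does the combined feature set beat the whole-image
    # baseline on significantly more than half of the test images?
    # The per-image F1 scores here are hypothetical placeholders.
    import numpy as np
    from scipy.stats import binomtest

    combined = np.array([0.41, 0.35, 0.52, 0.48, 0.30])  # placeholder scores
    baseline = np.array([0.33, 0.36, 0.44, 0.40, 0.25])  # placeholder scores

    diff = combined - baseline
    wins = int((diff > 0).sum())
    losses = int((diff < 0).sum())  # ties are dropped in a sign test
    result = binomtest(wins, wins + losses, p=0.5, alternative="greater")
    print("wins=%d losses=%d p=%.4f" % (wins, losses, result.pvalue))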
Author Fahrmair, Michael (DOCOMO Communications Laboratories Europe GmbH)
Sarin, Supheakmungkol (GITS, Waseda University)
Wagner, Matthias (DOCOMO Communications Laboratories Europe GmbH)
Kameyama, Wataru (GITS, Waseda University)
Copyright 2012 by the Information Processing Society of Japan
Discipline Computer Science
OpenAccessLink https://www.jstage.jst.go.jp/article/ipsjjip/20/1/20_1_250/_article/-char/en
References [1] Gantz, J.F., Reinsel, D., Chute, C., Schlichting, W., Mcarthur, J., Minton, S., Xheneti, I., Toncheva, A. and Manfrediz, A.: The Expanding Digital Universe: A Forecast of Worldwide Information Growth Through 2010, IDC White Paper (online) (2007), available from <http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf>.
[2] Flickr Photo Statistics (2010), available from <http://blog.flickr.net/en/2010/09/19/5000000000/>.
[3] Facebook Photo Statistics (2010), available from <http://blog.facebook.com/blog.php?post=206178097130>.
[4] Datta, R., Joshi, D., Li, J. and Wang, J.Z.: Image retrieval: Ideas, influences, and trends of the new age, ACM Comput. Surv., Vol.40, No.2, pp.1-60 (online), DOI:http://doi.acm.org/10.1145/1348246.1348248 (2008).
[5] Duygulu, P., Barnard, K., de Freitas, J.F.G. and Forsyth, D.A.: Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary, ECCV '02: Proc. 7th European Conference on Computer Vision-Part IV, pp.97-112, Springer-Verlag, London, UK (2002).
[6] von Ahn, L. and Dabbish, L.: Labeling images with a computer game, CHI '04: Proc. SIGCHI Conference on Human Factors in Computing Systems, pp.319-326, ACM, New York, NY, USA (online), DOI:http://doi.acm.org/10.1145/985692.985733 (2004).
[7] Guillaumin, M., Mensink, T., Verbeek, J. and Schmid, C.: TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation, International Conference on Computer Vision (online) (2009), available from <http://lear.inrialpes.fr/pubs/2009/GMVS09>.
[8] Makadia, A., Pavlovic, V. and Kumar, S.: Baselines for Image Annotation, International Journal of Computer Vision, pp.1-18 (2010).
[9] Deng, Y., Manjunath, B. and Shin, H.: Color image segmentation, CVPR '99, p.2446, IEEE Computer Society (1999).
[10] Shi, J. and Malik, J.: Normalized cuts and image segmentation, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.22, No.8, pp.888-905 (2000).
[11] Grady, L.: Random walks for image segmentation, IEEE Trans. on Pattern Analysis and Machine Intelligence, pp.1768-1783 (2006).
[12] Zahn, C.: Graph-theoretical methods for detecting and describing gestalt clusters, IEEE Trans. Computers, Vol.100, No.1, pp.68-86 (2006).
[13] Grady, L. and Schwartz, E.: Isoperimetric graph partitioning for image segmentation, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.28, No.3, pp.469-475 (2006).
[14] Laaksonen, J., Koskela, M. and Oja, E.: Content-Based Image Retrieval Using Self-Organizing Maps, VISUAL, pp.541-548 (1999).
[15] Meghini, C., Sebastiani, F. and Straccia, U.: A model of multimedia information retrieval, J. ACM, Vol.48, pp.909-970 (online), DOI:http://doi.acm.org/10.1145/502102.502103 (2001).
[16] Schettini, R., Ciocca, G. and Zuffi, S.: A Survey of Methods for Colour Image Indexing and Retrieval in Image Databases, Color Imaging Science: Exploiting Digital Media, John Wiley, pp.1-9 (2001).
[17] Ko, B., Lee, H.-S. and Byun, H.: Image retrieval using flexible image subblocks, SAC '00: Proc. 2000 ACM Symposium on Applied Computing-Volume 2, pp.574-578, ACM, New York, NY, USA (online), DOI:http://doi.acm.org/10.1145/338407.338502 (2000).
[18] Shyu, M.-L., Chen, S.-C., Chen, M., Zhang, C. and Sarinnapakorn, K.: Image database retrieval utilizing affinity relationships, MMDB '03: Proc. 1st ACM International Workshop on Multimedia Databases, pp.78-85, ACM, New York, NY, USA (online), DOI:http://doi.acm.org/10.1145/951676.951691 (2003).
[19] Zhao, R. and Grosky, W.: From Features to Semantics: Some Preliminary Results, p.TAS3 (2000).
[20] Tsai, C.-F., McGarry, K. and Tait, J.: Image classification using hybrid neural networks, SIGIR '03: Proc. 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp.431-432, ACM, New York, NY, USA (online), DOI:http://doi.acm.org/10.1145/860435.860536 (2003).
[21] Monay, F. and Gatica-Perez, D.: On image auto-annotation with latent space models, MULTIMEDIA '03: Proc. 11th ACM International Conference on Multimedia, pp.275-278, ACM, New York, NY, USA (online), DOI:http://doi.acm.org/10.1145/957013.957070 (2003).
[22] Grauman, K. and Darrell, T.: The pyramid match kernel: Discriminative classification with sets of image features (2005).
[23] Wallraven, C., Caputo, B. and Graf, A.: Recognition with local features: The kernel recipe (2003).
[24] Willamowski, J., Arregui, D., Csurka, G., Dance, C. and Fan, L.: Categorizing nine visual classes using local appearance descriptors, Illumination, Vol.17, p.21 (2004).
[25] Lazebnik, S., Schmid, C. and Ponce, J.: Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol.2, pp.2169-2178, IEEE (2006).
[26] Grangier, D. and Bengio, S.: A Discriminative Kernel-Based Approach to Rank Images from Text Queries, IEEE Trans. Pattern Anal. Mach. Intell., Vol.30, pp.1371-1384 (online), DOI:10.1109/TPAMI.2007.70791 (2008).
[27] Hertz, T., Bar-Hillel, A. and Weinshall, D.: Learning distance functions for image retrieval, CVPR '04: Proc. 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp.570-577, IEEE Computer Society, Washington, DC, USA (online) (2004), available from <http://portal.acm.org/citation.cfm?id=1896300.1896383>.
[28] Barnard, K., Duygulu, P., Forsyth, D., de Freitas, N., Blei, D.M. and Jordan, M.I.: Matching words and pictures, J. Mach. Learn. Res., Vol.3, pp.1107-1135 (2003).
[29] Monay, F. and Gatica-Perez, D.: PLSA-based image auto-annotation: constraining the latent space, MULTIMEDIA '04: Proc. 12th Annual ACM International Conference on Multimedia, pp.348-351, ACM, New York, NY, USA (online), DOI:http://doi.acm.org/10.1145/1027527.1027608 (2004).
[30] Carneiro, G., Chan, A.B., Moreno, P.J. and Vasconcelos, N.: Supervised Learning of Semantic Classes for Image Annotation and Retrieval, IEEE Trans. Pattern Anal. Mach. Intell., Vol.29, No.3, pp.394-410 (online), DOI:10.1109/TPAMI.2007.61 (2007).
[31] Lavrenko, V., Manmatha, R. and Jeon, J.: A model for learning the semantics of pictures, Seventeenth Annual Conference on Neural Information Processing Systems (NIPS), MIT Press (2003).
[32] Feng, S., Manmatha, R. and Lavrenko, V.: Multiple Bernoulli Relevance Models for Image and Video Annotation, CVPR, Vol.2, pp.1002-1009 (2004).
[33] Torralba, A., Fergus, R. and Freeman, W.T.: 80 Million Tiny Images: A Large Data Set for Nonparametric Object and Scene Recognition, IEEE Trans. Pattern Anal. Mach. Intell., Vol.30, pp.1958-1970 (online), DOI:10.1109/TPAMI.2008.128 (2008).
[34] Itti, L., Koch, C. and Niebur, E.: A Model of Saliency-Based Visual Attention for Rapid Scene Analysis, IEEE Trans. Pattern Anal. Mach. Intell., Vol.20, No.11, pp.1254-1259 (online), DOI:http://dx.doi.org/10.1109/34.730558 (1998).
[35] Judd, T., Ehinger, K., Durand, F. and Torralba, A.: Learning to predict where humans look, 2009 IEEE 12th International Conference on Computer Vision, pp.2106-2113, IEEE (2010).
[36] Achanta, R., Hemami, S., Estrada, F. and Süsstrunk, S.: Frequency-tuned Salient Region Detection, IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) (online) (2009), available from <http://www.cvpr2009.org/>.
[37] Hou, X. and Zhang, L.: Saliency Detection: A Spectral Residual Approach, Proc. IEEE Conference on Computer Vision and Pattern Recognition CVPR '07, pp.1-8 (online), DOI:10.1109/CVPR.2007.383267 (2007).
[38] Makadia, A., Pavlovic, V. and Kumar, S.: A New Baseline for Image Annotation, ECCV, Vol.3, pp.316-329 (2008).
[39] Van De Sande, K., Gevers, T. and Snoek, C.: Evaluating color descriptors for object and scene recognition, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.32, No.9, pp.1582-1596 (2010).
[40] Oliva, A. and Torralba, A.: Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope, International Journal of Computer Vision, Vol.42, No.3, pp.145-175 (online), DOI:http://dx.doi.org/10.1023/A:1011139631724 (2001).
[41] Friedman, A.: Framing pictures: The role of knowledge in automatized encoding and memory for gist, Journal of Experimental Psychology: General, Vol.108, pp.316-355 (1979).
[42] Potter, M.C.: Short-term conceptual memory for pictures, Journal of Experimental Psychology: Human Learning and Memory, Vol.2, No.5, pp.509-522 (1976).
[43] Lowe, D.: Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, Vol.60, No.2, pp.91-110 (2004).
[44] Van de Weijer, J., Gevers, T. and Bagdanov, A.: Boosting color saliency in image feature detection, IEEE Trans. on Pattern Analysis and Machine Intelligence, pp.150-156 (2006).
[45] Bosch, A., Zisserman, A. and Muñoz, X.: Scene classification using a hybrid generative/discriminative approach, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol.30, No.4, pp.712-727 (2008).
[46] Abdel-Hakim, A. and Farag, A.: CSIFT: A SIFT descriptor with color invariant characteristics, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol.2, pp.1978-1983, IEEE (2006).
[47] Sarin, S. and Kameyama, W.: Joint Equal Contribution of Global and Local Features for Image Annotation, CLEF Workshop 2009 (2009).
[48] Sarin, S. and Kameyama, W.: Holistic Image Features Extraction for Better Image Annotation, IEICE General Conference, Sendai City, Miyagi, Japan (2010).
[49] Ong, K.-M., Sarin, S. and Kameyama, W.: Affective and Holistic Approach at TRECVID 2010 Task-Semantic Indexing (SIN), Working Notes of TRECVID (2010).
[50] Sarin, S., Fahrmair, M., Wagner, M. and Kameyama, W.: Holistic Feature Extraction for Automatic Image Annotation, Proc. 5th FTRA Int Multimedia and Ubiquitous Engineering (MUE) Conf, pp.59-66 (online), DOI:10.1109/MUE.2011.22 (2011).
[51] Guillaumin, M.: Exploiting Multimodal Data for Image Understanding, PhD Thesis, Université de Grenoble (2010).
[52] Shechtman, E. and Irani, M.: Matching local self-similarities across images and videos, 2007 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, IEEE (2007).