User-centric multimodal feature extraction for personalized retrieval of Tumblr posts

Bibliographic Details
Published in	Multimedia tools and applications Vol. 81; no. 2; pp. 2979 - 3003
Main Authors	Ohtomo, Kazuma; Harakawa, Ryosuke; Ogawa, Takahiro; Haseyama, Miki; Iwahashi, Masahiro
Format	Journal Article
Language	English
Published	New York: Springer US, 2022; Springer Nature B.V.
Subjects	Algorithms; Computer Communication Networks; Computer Science; Correlation analysis; Customization; Data Structures and Information Theory; Experiments; Feature extraction; Image retrieval; Information retrieval; Metadata; Methods; Multimedia; Multimedia Information Systems; Retrieval performance measures; Semantics; Social networks; Social research; Special Purpose and Application-Based Systems; Social networking services; Multimodal analysis; Deep metric learning; Multimedia information retrieval
Summary:	Tumblr is one of the most popular microblogging services worldwide, on which users can share posts consisting of text and images. This paper proposes a user-centric method of multimodal feature extraction for the personalized retrieval of Tumblr posts. To implement personalized retrieval, we formulate each user’s preferences as a triplet loss by using Likes as metadata together with the text- and image-related features of posts. Furthermore, we develop a personalized multivariational autoencoder (PMVAE) by introducing a triplet loss into a multivariational autoencoder (MVAE), which is among the most effective methods of multimodal feature extraction. Previously proposed variants of MVAE can project multiple kinds of features into a single latent feature space. However, because those latent features do not reflect each user’s preferences, the retrieval performance of the previous methods is limited. In contrast, our PMVAE can extract relationships between the text- and image-related features of posts by considering class-related information that represents whether a user prefers a given post. As a result, user-centric multimodal features can be obtained, which separate posts that a user prefers from posts that the user does not prefer in the latent feature space. Because user-centric multimodal features have high discriminating power, the personalized retrieval of posts desired by each user becomes feasible by using them in retrieval algorithms such as k-nearest neighbors (k-NN) and Annoy, a technique for approximate nearest neighbor search. We conduct experiments using 10 users and 150,947 contents to verify the performance of k-NN and Annoy. The results show that our PMVAE increases the normalized discounted cumulative gain (nDCG) compared with existing methods. The nDCG reaches 0.253 when using term frequency-inverse document frequency (TF-IDF) based text features and our end-to-end image features.
ISSN:	1380-7501 1573-7721
DOI:	10.1007/s11042-021-11634-0
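
Illustrative sketch (not part of the catalog record): the summary mentions three building blocks, namely a triplet loss over latent features, nearest-neighbor retrieval, and nDCG as the evaluation measure. The Python below shows generic textbook forms of each, assuming a hinge-style triplet loss on squared Euclidean distance, an exact brute-force neighbor search in place of Annoy, and binary relevance labels derived from Likes. The function names, the margin value, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic hinge-style triplet loss (assumed form, not the paper's exact one).

    The loss is zero once the liked post (positive) is closer to the anchor
    than the non-liked post (negative) by at least `margin`.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def knn_rank(query, items, k):
    """Exact k-nearest-neighbor search: indices of the k closest items."""
    order = sorted(range(len(items)), key=lambda i: squared_distance(query, items[i]))
    return order[:k]


def dcg(relevances):
    """Discounted cumulative gain with the usual log2(rank + 1) discount."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(ranked_relevances, k=None):
    """nDCG of a ranked relevance list, optionally truncated at rank k."""
    top = ranked_relevances if k is None else ranked_relevances[:k]
    ideal = sorted(ranked_relevances, reverse=True)[:len(top)]
    ideal_dcg = dcg(ideal)
    return dcg(top) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy usage: rank three hypothetical latent vectors around a query,
# then score the ranking against hypothetical Like labels (1 = liked).
latent_posts = [[5.0, 5.0], [1.0, 0.0], [0.0, 2.0]]
liked = [0, 1, 1]
ranking = knn_rank([0.0, 0.0], latent_posts, k=3)
score = ndcg([liked[i] for i in ranking])
```

In this toy case the two liked posts are also the two nearest neighbors of the query, so the ranking is ideal. An approximate index such as Annoy would replace `knn_rank` when the collection is too large for exhaustive search.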