User-centric multimodal feature extraction for personalized retrieval of tumblr posts

Tumblr is one of the most popular micro-blogging services worldwide on which users can share posts consisting of texts and images. This paper proposes a user-centric method of multimodal feature extraction for the personalized retrieval of Tumblr posts. To implement personalized retrieval, we formul...

Full description

Saved in:
Bibliographic Details
Published inMultimedia tools and applications Vol. 81; no. 2; pp. 2979 - 3003
Main Authors Ohtomo, Kazuma, Harakawa, Ryosuke, Ogawa, Takahiro, Haseyama, Miki, Iwahashi, Masahiro
Format Journal Article
LanguageEnglish
Published New York Springer US 2022
Springer Nature B.V
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Tumblr is one of the most popular micro-blogging services worldwide on which users can share posts consisting of texts and images. This paper proposes a user-centric method of multimodal feature extraction for the personalized retrieval of Tumblr posts. To implement personalized retrieval, we formulate each user’s preferences as a triplet loss by using Likes as metadata as well as the text- and image-related features of posts. Furthermore, we develop a personalized multivariational autoencoder (PMVAE) by introducing a triplet loss into multivariational autoencoder (MVAE), which is among the most effective methods of multimodal feature extraction. Previously proposed variants of MVAE can project multiple kinds of features into the single latent features. However, because the latent features do not reflect each user’s preferences, retrieval performance when using the previous methods is limited. On the contrary, our PMVAE can extract relationships between text- and image-related features of posts by considering class-related information that represents whether a user prefers a given post. As a result, user-centric multimodal features, which separate a post that a user prefer and a post that a user does not prefer in the latent feature space, can be obtained. Because user-centric multimodal features have high discriminating power, the personalized retrieval of posts desired by each user becomes feasible by using them in such retrieval algorithms as the k -nearest neighbors and Annoy, which is a technique for approximate nearest neighbor search. We conduct experiments using 10 users and 150,947 contents, to verify the performance of k-NN and Annoy. The results show that our PMVAE increased normalized discounted cumulative gain (nDCG) compared with existing methods. The nDCG becomes 0.253 when using term frequency-inverse document frequency based text features and our end-to-end image features.
ISSN:1380-7501
1573-7721
DOI:10.1007/s11042-021-11634-0