Tags and titles of videos you watched tell your gender

Bibliographic Details
Published in: 2014 IEEE International Conference on Communications (ICC), pp. 1837-1842
Main Authors: Tingting Feng, Yuchun Guo, Yishuai Chen, Xiaoying Tan, Ting Xu, Baijun Shen, Wei Zhu
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2014
Summary: In online video systems, viewer demographic information (gender, age, etc.) has great commercial value for targeted advertising and video recommendation, but is generally not directly available. This paper infers viewers' gender from implicit watching history in large-scale online video systems. To tackle the sparsity problem without filtering out any cold users or videos, we not only introduce video tags as features but also use an efficient Chinese word segmentation method to extract hot keywords from video titles as features. Moreover, because users' viewing behavior is lognormally distributed, we apply a logarithmic transformation to the inference matrices and then identify key features via principal component analysis (PCA). We then treat gender inference as a classification problem and define modified evaluation metrics adapted to the imbalanced classification problem. We compare a set of classifiers, including class prior, EM, SVM, logistic regression, partially supervised soft-label, and belief-based mixture, and find that logistic regression performs best. The inference results show that our algorithms obtain high F̃1 values for all classes; the highest value on the PPTV dataset reaches nearly 0.75. Inference based on keywords yields a 14.63% increase in F̃1 compared with the ratings of MovieLens.
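The core of the pipeline described in the summary (logarithmic transformation, PCA, logistic regression, per-class evaluation) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy user-by-feature count matrix `X`, the labels `y`, the choice of two principal components, the plain gradient-descent optimizer, and the use of the standard per-class F1 score as a stand-in for the paper's modified F̃1 metric, whose definition the summary does not give.

```python
import numpy as np

def log_transform(counts):
    """Compress heavy-tailed view counts; log1p maps a zero count to zero."""
    return np.log1p(counts)

def pca_fit(X, n_components):
    """Return the column means and the top principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def pca_apply(X, mean, components):
    """Project centered rows of X onto the principal directions."""
    return (X - mean) @ components.T

def fit_logistic(Z, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the logistic loss (no regularization)."""
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        grad = p - y
        w -= lr * (Z.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def predict(Z, w, b):
    """Predict class 1 when the linear score is positive, else class 0."""
    return (Z @ w + b > 0).astype(int)

def f1_per_class(y_true, y_pred, label):
    """Standard F1 for one class (not the paper's modified metric)."""
    tp = np.sum((y_pred == label) & (y_true == label))
    fp = np.sum((y_pred == label) & (y_true != label))
    fn = np.sum((y_pred != label) & (y_true == label))
    return 0.0 if tp == 0 else float(2 * tp / (2 * tp + fp + fn))

# Hypothetical data: rows are users, columns are tag/keyword view counts.
# The classes are deliberately imbalanced (four users vs. two).
X = np.array([
    [50, 30, 0, 1],
    [120, 8, 2, 0],
    [9, 40, 0, 0],
    [400, 90, 3, 1],
    [1, 0, 60, 25],
    [0, 2, 15, 200],
], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1])

X_log = log_transform(X)
mean, components = pca_fit(X_log, n_components=2)
Z = pca_apply(X_log, mean, components)
w, b = fit_logistic(Z, y)
pred = predict(Z, w, b)

for label in (0, 1):
    print(f"class {label}: F1 = {f1_per_class(y, pred, label):.2f}")
```

Reporting a score per class, rather than a single accuracy figure, is one common way to keep a majority class from masking poor performance on the minority class. The sketch evaluates on its own training data only to stay short; a real experiment would hold out test users.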
ISSN: 1550-3607, 1938-1883
DOI: 10.1109/ICC.2014.6883590