Tags and titles of videos you watched tell your gender
Published in | 2014 IEEE International Conference on Communications (ICC) pp. 1837 - 1842 |
---|---|
Main Authors | , , , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.06.2014 |
Subjects | |
Summary: | In online video systems, viewer demographic information (gender, age, etc.) is of huge commercial value for delivering targeted advertising and video recommendations, but it is generally not directly available. This paper targets inferring viewers' gender from implicit watching history in large-scale online video systems. To tackle the sparsity problem without filtering out any cold users or videos, we not only introduce video tags as features, but also use an efficient Chinese word segmentation method to extract hot keywords from video titles as features. Moreover, users' viewing behavior is lognormally distributed, hence we apply a logarithmic transformation to the inference matrices and further find key features via principal component analysis (PCA). We then treat gender inference as a classification problem and define some modified evaluation metrics adapted to the imbalanced classification problem. We compare a set of classifiers including Class prior, EM, SVM, Logistic regression, Partially supervised soft-label and belief-based mixture, and find that Logistic regression performs best. The inference results show that our algorithms can obtain high F̃1 values for all classes. The highest value on the PPTV dataset reaches nearly 0.75, and inference based on keywords yields a 14.63% increase in F̃1 compared with the ratings of MovieLens. |
---|---|
ISSN: | 1550-3607 1938-1883 |
DOI: | 10.1109/ICC.2014.6883590 |
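
The summary mentions two steps that are easy to illustrate in isolation: a logarithmic transformation of lognormally distributed viewing counts, and per-class evaluation for an imbalanced classification problem. The following is a minimal sketch under stated assumptions, not the paper's implementation: the data is synthetic, the helper names `log_transform` and `per_class_f1` are hypothetical, the `log(1 + x)` form is our choice, and the metric is the standard per-class F1 rather than the modified F̃1, which the summary does not define.

```python
import math
import random

def log_transform(counts):
    """Apply log(1 + x) to each count; the +1 (our assumption) keeps zeros finite."""
    return [math.log1p(c) for c in counts]

def per_class_f1(y_true, y_pred):
    """Standard F1 computed separately for each class label.

    This is NOT the paper's modified F̃1; it only shows why per-class
    scores are more informative than accuracy on imbalanced data.
    """
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores

# Synthetic, lognormally distributed "viewing counts" (illustrative only).
rng = random.Random(0)
raw_counts = [rng.lognormvariate(2.0, 1.0) for _ in range(1000)]
logged = log_transform(raw_counts)

# Hypothetical imbalanced labels: 8 viewers of class "m", 2 of class "f".
y_true = ["m"] * 8 + ["f"] * 2
# A baseline that always predicts the majority class.
y_pred = ["m"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
f1 = per_class_f1(y_true, y_pred)
print(accuracy, f1)
```

In this toy case the majority-class baseline scores 0.8 accuracy while its F1 on the minority class is 0, which is the kind of gap that motivates reporting a score for every class.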