Content-Based Methods for Predicting Web-Site Demographic Attributes
Demographic information plays an important role in gaining valuable insights about a web-site's user-base and is used extensively to target online advertisements and promotions. This paper investigates machine-learning approaches for predicting the demographic attributes of web-sites using info...
Saved in:
Published in | 2010 IEEE International Conference on Data Mining pp. 863 - 868 |
---|---|
Main Authors | , , |
Format | Conference Proceeding |
Language | English |
Published |
IEEE
01.12.2010
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Demographic information plays an important role in gaining valuable insights about a web-site's user-base and is used extensively to target online advertisements and promotions. This paper investigates machine-learning approaches for predicting the demographic attributes of web-sites using information derived from their content and their hyper linked structure and not relying on any information directly or indirectly obtained from the web-site's users. Such methods are important because users are becoming increasingly more concerned about sharing their personal and behavioral information on the Internet. Regression-based approaches are developed and studied for predicting demographic attributes that utilize different content-derived features, different ways of building the prediction models, and different ways of aggregating web-page level predictions that take into account the web's hyper linked structure. In addition, a matrix-approximation based approach is developed for coupling the predictions of individual regression models into a model designed to predict the probability mass function of the attribute. Extensive experiments show that these methods are able to achieve an RMSE of 8-10% and provide insights on how to best train and apply such models. |
---|---|
ISBN: | 9781424491315 1424491312 |
ISSN: | 1550-4786 2374-8486 |
DOI: | 10.1109/ICDM.2010.97 |