A review of features for the discrimination of Twitter users: application to the prediction of offline influence

Bibliographic Details
Published in	Social Network Analysis and Mining, Vol. 6, no. 1, p. 25
Main Authors	Cossu, Jean-Valère, Labatut, Vincent, Dugué, Nicolas
Format	Journal Article
Language	English
Published	Vienna: Springer Vienna, 01.12.2016 (Springer Nature B.V.)
Subjects	Applications of Graph Theory and Complex Networks; Automation; Classification; Computation and Language; Computer Science; Data Mining and Knowledge Discovery; Diffusion of Information and Influence in Social Networks; Discrimination; Economics; Game Theory; Humanities; Law; Marketing; Methodology of the Social Sciences; Original Article; Quality of service; Social and Behav. Sciences; Social networks; Statistics for Social Sciences; Variants; Twitter; Influence; Natural language processing; Social network analysis; Social media; Complex networks
Summary:	Many works related to Twitter aim at characterizing its users in some way: role on the service (spammers, bots, organizations, etc.), nature of the user (socio-professional category, age, etc.), topics of interest, and others. However, for a given user classification problem, it is very difficult to select a set of appropriate features, because the many features described in the literature are very heterogeneous, with name overlaps and collisions, and numerous very close variants. In this article, we review a wide range of such features. In order to present a clear state-of-the-art description, we unify their names, definitions and relationships, and we propose a new, neutral, typology. We then illustrate the interest of our review by applying a selection of these features to the offline influence detection problem. This task consists in identifying users who are influential in real life, based on their Twitter account and related data. We show that most features deemed efficient to predict online influence, such as the numbers of retweets and followers, are not relevant to this problem. However, we propose several content-based approaches to label Twitter users as influencers or not. We also rank them according to a predicted influence level. Our proposals are evaluated over the CLEF RepLab 2014 dataset, and outmatch state-of-the-art methods.
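The summary describes two outputs for the offline influence task: a binary influencer/non-influencer label and a ranking of users by predicted influence level. The sketch below shows, under stated assumptions, how per-user scores could be turned into both outputs and how a ranking could be scored with average precision. It assumes the scores were already produced by some content-based model. The user handles, scores, the 0.5 threshold and the helper names (`label_users`, `rank_users`, `average_precision`) are all hypothetical. This is not the authors' method, and whether average precision matches the RepLab 2014 evaluation protocol is likewise an assumption.

```python
from typing import Dict, List, Set


def label_users(scores: Dict[str, float], threshold: float) -> Set[str]:
    """Label as influencers the users whose score reaches the (hypothetical) threshold."""
    return {user for user, score in scores.items() if score >= threshold}


def rank_users(scores: Dict[str, float]) -> List[str]:
    """Order users by predicted influence score, highest first; ties are broken by user ID."""
    return sorted(scores, key=lambda user: (-scores[user], user))


def average_precision(ranking: List[str], influencers: Set[str]) -> float:
    """Average of precision@k over the ranks k at which a true influencer appears."""
    if not influencers:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, user in enumerate(ranking, start=1):
        if user in influencers:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(influencers)


# Hypothetical scores from an unspecified content-based model; handles are invented.
predicted = {"@user_a": 0.91, "@user_b": 0.35, "@user_c": 0.78, "@user_d": 0.12}
gold = {"@user_a", "@user_d"}  # hypothetical ground-truth influencers

ranking = rank_users(predicted)
print(sorted(label_users(predicted, 0.5)))  # ['@user_a', '@user_c']
print(ranking)  # ['@user_a', '@user_c', '@user_b', '@user_d']
print(average_precision(ranking, gold))  # 0.75
```

Here the true influencers sit at ranks 1 and 4, so average precision is (1/1 + 2/4) / 2 = 0.75. Sorting by descending score with a deterministic tie-break keeps the ranking reproducible when two users receive the same score.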
ISSN:	1869-5450, 1869-5469
DOI:	10.1007/s13278-016-0329-x