Comparison and benchmark of name-to-gender inference services

Bibliographic Details
Published in	PeerJ. Computer science Vol. 4; p. e156
Main Authors	Santamaría, Lucía; Mihaljević, Helena
Format	Journal Article
Language	English
Published	United States PeerJ, Inc 16.07.2018
Subjects	Accuracy; Algorithms; Authorship; Benchmarks; Bibliometrics; Classification; Classification algorithms; Computer supported cooperative work; Data Mining and Machine Learning; Data Science; Datasets; Digital Libraries; Digital media; Gender; Gender analysis; Inference; Inventors; Name-based gender inference; Names; Parameters; Performance evaluation; Performance measurement; Productivity; Scholarly publishing; Science; Scientometrics; Social networks; World Wide Web; New York; Germany
Summary:	The increased interest in analyzing and explaining gender inequalities in tech, media, and academia highlights the need for accurate inference methods to predict a person’s gender from their name. Several such services exist that provide access to large databases of names, often enriched with information from social media profiles, culture-specific rules, and insights from sociolinguistics. We compare and benchmark five name-to-gender inference services by applying them to the classification of a test data set consisting of 7,076 manually labeled names. The compiled names are analyzed and characterized according to their geographical and cultural origin. We define a series of performance metrics to quantify various types of classification errors, and define a parameter tuning procedure to search for optimal values of the services’ free parameters. Finally, we perform benchmarks of all services under study regarding several scenarios where a particular metric is to be optimized.
ISSN:	2376-5992
DOI:	10.7717/peerj-cs.156