A Pitfall of Learning from User-generated Data: In-Depth Analysis of Subjective Class Problem

Bibliographic Details
Published in: Procedia Computer Science, Vol. 185, pp. 160-169
Main Authors: Nemoto, Kei; Jain, Shweta
Format: Journal Article
Language: English
Published: Elsevier B.V., 2021
ISSN: 1877-0509
DOI: 10.1016/j.procs.2021.05.017

Summary: Research on supervised learning algorithms implicitly assumes that training data is labeled by domain experts, or at least by semi-professional labelers accessible through crowdsourcing services such as Amazon Mechanical Turk. With the advent of the Internet, data has become abundant, and a large number of machine-learning-based systems are being trained on user-generated data, where categorical data is used as labels. However, little work has been done on supervised learning with user-defined labels, where users are not necessarily experts: they may be unable to provide correct labels for some data, or their labels may contain significant human bias. In this article, we propose two types of classes in user-defined labels, subjective classes and objective classes, and show that the objective classes are as reliable as if they had been provided by domain experts, whereas the subjective classes are subject to error and bias. We name this the subjective class problem and propose a Normalized Feature Indicative Score that can be effective in detecting subjective classes in a dataset without querying an oracle. This score provides early detection of subjective classes in the data, saving time for data mining practitioners working with data that might contain errors and biases.