Detecting Low-Quality Workers in QoE Crowdtesting: A Worker Behavior-Based Approach

Bibliographic Details
Published in: IEEE Transactions on Multimedia, Vol. 19, no. 3, pp. 530-543
Main Authors: Mok, Ricky K. P.; Chang, Rocky K. C.; Li, Weichao
Format: Journal Article
Language: English
Published: Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.03.2017
ISSN: 1520-9210, 1941-0077
DOI: 10.1109/TMM.2016.2619901

Summary: QoE crowdtesting is increasingly popular among researchers for conducting subjective assessments of network services. Experimenters can easily access a huge pool of human subjects through crowdsourcing platforms. Without supervision, however, low-quality workers can threaten the reliability of the assessments. One approach to classifying the quality of workers is to analyze their behavior during the experiments, such as the mouse cursor trajectory. However, existing works analyze the trajectory coarsely and therefore cannot fully extract the embedded information. In this paper, we propose a novel method to detect low-quality workers in QoE crowdtesting by analyzing worker behavior. Our approach is to construct a predictive model using supervised learning algorithms. To label the workers, a quality score is computed by applying existing anti-cheating techniques and human inspection. We define a set of ten worker behavior metrics, which quantify different types of worker behavior, including finer-grained cursor trajectory analysis. A multiclass Naïve Bayes classifier is applied to train a model that predicts the quality of workers from the metrics. We have conducted video QoE assessments on Amazon Mechanical Turk and CrowdFlower to collect the worker behavior. Our results show that the error rates of the model trained from four metrics are at most 30%. We further find that combining the predictions from the four different 5-point Likert scale rating methods can improve the success rate in detecting low-quality workers to around 80%. Finally, our method outperforms CrowdMOS by 16.5% in precision and 42.9% in recall.
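
Illustration: the summary describes training a multiclass Naïve Bayes classifier on worker behavior metrics to predict worker quality. The following is a minimal sketch of that general idea only, not the authors' implementation. It assumes a Gaussian likelihood for each metric, and the two metrics (a normalized cursor path length and a time on task in seconds), the three quality labels, and all data values are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, var_floor=1e-6):
    """Estimate a log-prior and per-metric mean/variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # var_floor keeps the variance strictly positive for constant metrics.
        variances = [
            sum((v - m) ** 2 for v in col) / n + var_floor
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(n / len(X)), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log-posterior for one worker."""
    def log_posterior(params):
        log_prior, means, variances = params
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
    return max(model, key=lambda label: log_posterior(model[label]))


# Hypothetical training data: [cursor_path_length, time_on_task_seconds].
X = [
    [0.20, 5.0], [0.30, 6.0], [0.25, 5.5],      # labeled "low"
    [0.50, 12.0], [0.60, 13.0], [0.55, 12.5],   # labeled "medium"
    [0.90, 20.0], [1.00, 22.0], [0.95, 21.0],   # labeled "high"
]
y = ["low"] * 3 + ["medium"] * 3 + ["high"] * 3

model = fit_gaussian_nb(X, y)
print(predict(model, [0.28, 5.8]))
```

In practice the labels would come from a quality score such as the one the summary mentions, and the feature vector would hold the chosen behavior metrics; the toy data here only shows the mechanics of fitting and prediction.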