Evaluating LLM-Generated Topics from Survey Responses: Identifying Challenges in Recruiting Participants through Crowdsourcing

Bibliographic Details
Published in: Proceedings (IEEE Symposium on Visual Languages and Human-Centric Computing), pp. 412-416
Main Authors: Tamime, Reham Al; Salminen, Joni; Jung, Soon-Gyo; Jansen, Bernard
Format: Conference Proceeding
Language: English
Published: IEEE, 02.09.2024

Summary: The evolution of generative artificial intelligence (AI) technologies, particularly large language models (LLMs), has led to consequences for the field of Human-Computer Interaction (HCI) in areas such as personalization, predictive analytics, automation, and data analysis. This research evaluates LLM-generated topics derived from survey responses against topics suggested by humans, specifically participants recruited through a crowdsourcing experiment. We present evaluation results comparing LLM-generated topics with human-generated topics in terms of Quality, Usefulness, Accuracy, Interestingness, and Completeness. The study involves three stages: (1) Design and Generate Topics with an LLM (OpenAI's GPT-4); (2) Crowdsourcing Human-Generated Topics; and (3) Evaluation of Human-Generated Topics and LLM-Generated Topics. However, a feasibility study with 33 crowdworkers revealed challenges in using crowdsourced participants for LLM evaluation, particularly in inviting human participants to suggest topics based on open-ended survey answers. We highlight several challenges in recruiting crowdsourcing participants to generate topics from survey responses, and we recommend using well-trained human experts rather than crowdworkers to produce human baselines for LLM evaluation.
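The record does not include any implementation details for stage (1). As a rough illustration only, a minimal sketch of prompting an LLM to propose topics from open-ended survey answers might look like the following; the model name, prompt wording, and the `suggest_topics` helper are assumptions for illustration, not taken from the paper.

```python
# Minimal sketch (assumption, not from the paper): ask an LLM to propose
# topic labels that summarize a batch of open-ended survey answers.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def suggest_topics(survey_answers, n_topics=5, model="gpt-4"):
    """Hypothetical helper: return n_topics short topic labels for the answers."""
    joined = "\n".join(f"- {answer}" for answer in survey_answers)
    prompt = (
        f"Below are open-ended survey responses. Suggest {n_topics} short topic "
        f"labels that together summarize them, one label per line.\n\n{joined}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Treat each non-empty line of the reply as one topic label.
    return [
        line.lstrip("-*• ").strip()
        for line in response.choices[0].message.content.splitlines()
        if line.strip()
    ]


# Example use (hypothetical data):
# topics = suggest_topics(["I mostly use the app to track my sleep.", "..."])
```

The resulting topic labels would then be compared with crowdsourced human-suggested topics along the dimensions named in the abstract (Quality, Usefulness, Accuracy, Interestingness, Completeness).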
ISSN: 1943-6106
DOI: 10.1109/VL/HCC60511.2024.00064