Evaluating LLM-Generated Topics from Survey Responses: Identifying Challenges in Recruiting Participants through Crowdsourcing

Bibliographic Details
Published in: Proceedings (IEEE Symposium on Visual Languages and Human-Centric Computing), pp. 412-416
Main Authors: Tamime, Reham Al; Salminen, Joni; Jung, Soon-Gyo; Jansen, Bernard
Format: Conference Proceeding
Language: English
Published: IEEE, 02.09.2024

Summary: The evolution of generative artificial intelligence (AI) technologies, particularly large language models (LLMs), has led to consequences for the field of Human-Computer Interaction (HCI) in areas such as personalization, predictive analytics, automation, and data analysis. This research evaluates LLM-generated topics derived from survey responses against topics suggested by humans, specifically participants recruited through a crowdsourcing experiment. We present evaluation results comparing LLM-generated topics with human-generated topics in terms of Quality, Usefulness, Accuracy, Interestingness, and Completeness. The study involves three stages: (1) Design and Generate Topics with an LLM (OpenAI's GPT-4); (2) Crowdsourcing Human-Generated Topics; and (3) Evaluation of Human-Generated Topics and LLM-Generated Topics. However, a feasibility study with 33 crowdworkers revealed challenges in using crowdsourced participants for LLM evaluation, particularly in inviting human participants to suggest topics based on open-ended survey answers. We highlight several challenges in recruiting crowdsourcing participants to generate topics from survey responses, and we recommend using well-trained human experts rather than crowdworkers to produce human baselines for LLM evaluation.
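The record does not include any implementation details for stage (1). As a rough illustration only, a minimal sketch of prompting an LLM to propose topics from open-ended survey answers might look like the following; the model name, prompt wording, and the `suggest_topics` helper are assumptions for illustration, not taken from the paper.

```python
# Minimal sketch (assumption, not from the paper): ask an LLM to propose
# topic labels that summarize a batch of open-ended survey answers.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def suggest_topics(survey_answers, n_topics=5, model="gpt-4"):
    """Hypothetical helper: return n_topics short topic labels for the answers."""
    joined = "\n".join(f"- {answer}" for answer in survey_answers)
    prompt = (
        f"Below are open-ended survey responses. Suggest {n_topics} short topic "
        f"labels that together summarize them, one label per line.\n\n{joined}"
    )
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    # Treat each non-empty line of the reply as one topic label.
    return [
        line.lstrip("-*• ").strip()
        for line in response.choices[0].message.content.splitlines()
        if line.strip()
    ]


# Example use (hypothetical data):
# topics = suggest_topics(["I mostly use the app to track my sleep.", "..."])
```

The resulting topic labels would then be compared with crowdsourced human-suggested topics along the dimensions named in the abstract (Quality, Usefulness, Accuracy, Interestingness, Completeness).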
ISSN: 1943-6106
DOI: 10.1109/VL/HCC60511.2024.00064