The Problems of LLM-generated Data in Social Science Research

Bibliographic Details
Published in	Sociologica (Bologna) Vol. 18; no. 2; pp. 145-168
Main Authors	Rossi, Luca; Harrison, Katherine; Shklovski, Irina
Format	Journal Article
Language	English
Published	Bologna: Società Editrice il Mulino; University of Bologna, 01.01.2024
Subjects	Data; Data collection; Epistemology; Humanities; LLM; Methodological problems; Research design; Research methods; Research subjects; Social research; Social science; Social sciences; Synthetic data
ISSN	1971-8853
DOI	10.6092/issn.1971-8853/19576

Summary:	Beyond being used as fast and cheap annotators for otherwise complex classification tasks, LLMs have seen a growing adoption for generating synthetic data for social science and design research. Researchers have used LLM-generated data for data augmentation and prototyping, as well as for direct analysis where LLMs acted as proxies for real human subjects. LLM-based synthetic data build on fundamentally different epistemological assumptions than previous synthetically generated data and are justified by a different set of considerations. In this essay, we explore the various ways in which LLMs have been used to generate research data and consider the underlying epistemological (and accompanying methodological) assumptions. We challenge some of the assumptions made about LLM-generated data, and we highlight the main challenges that social sciences and humanities need to address if they want to adopt LLMs as synthetic data generators.