Machine driven classification of open-ended responses (MDCOR): An analytic framework and no-code, free software application to classify longitudinal and cross-sectional text responses in survey and social media research

Bibliographic Details
Published in	Expert systems with applications Vol. 215; p. 119265
Main Author	González Canché, Manuel S.
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.04.2023
Subjects	Cost-free Software Application; Data Analysis and Integration; Data Science Democratization; Machine Learning Text Mining and Classification; Mixed Equal-status Design; Mixed Methods Research; Open-Ended Survey Questions

More Information
Summary:	•Open-ended responses offer unrestricted insights into processes and reasons.
•These knowledge gains come at two costs: closing (coding) these responses and an increased chance of non-response.
•We offer a framework, based on NLP and machine learning, to address and assess these costs.
•We provide access to the no-code software and back-end code to apply this framework.
•MDCOR expands access to data science tools by removing costs and programming hurdles.
Open-ended questions in survey research allow participants to respond freely in their own words. Because such questions make it possible to learn how or why respondents may have achieved a goal or behaved in certain ways, these responses can address some of the inherent limitations of quantitative research, which typically does not allow researchers to understand processes or reasons. But these knowledge-based benefits come at a cost: text data must be labeled into categories or codes to ease comparison and reach meaningful understandings. Manually classifying open-ended responses is not only time-consuming (potentially taking weeks or even months, depending on sample size) but also risks introducing human errors or inconsistencies that can weaken the contribution of these responses to our understanding. In this study, we discuss the unresolved issue of how to close open-ended responses as rigorously and efficiently as possible using machine learning and text classification techniques, without losing context or the original voices of our research participants, while leveraging the nuances that human reasoning brings to the qualitative and mixed methods analytic tables. To this end, we offer a rigorous, user-friendly, no-code, and cost-free software application that implements our mixed equal-status design analytic framework: machine driven classification of open-ended responses (MDCOR).
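The summary does not specify which classifier MDCOR uses, so what follows is only a minimal sketch of the general idea of closing open-ended responses with supervised text classification: a small subset of responses is coded by hand, a model learns from that subset, and the model assigns codes to the remaining responses. As an assumption, it swaps in a plain multinomial naive Bayes classifier with add-one smoothing, written in standard-library Python; the class name `NaiveBayesCoder`, the toy responses, and the codes `cost` and `work` are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a response and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesCoder:
    """Multinomial naive Bayes with add-one (Laplace) smoothing.

    Hypothetical illustration (not MDCOR's implementation): learns codes
    from hand-coded responses and assigns the most probable code to uncoded ones.
    """

    def fit(self, responses, codes):
        self.code_counts = Counter(codes)
        self.word_counts = defaultdict(Counter)
        for text, code in zip(responses, codes):
            self.word_counts[code].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(codes)
        return self

    def log_posterior(self, text, code):
        # log P(code) + sum of log P(word | code); words never seen in
        # training are skipped rather than scored
        denom = sum(self.word_counts[code].values()) + len(self.vocab)
        score = math.log(self.code_counts[code] / self.n_docs)
        for word in tokenize(text):
            if word in self.vocab:
                score += math.log((self.word_counts[code][word] + 1) / denom)
        return score

    def predict(self, text):
        return max(self.code_counts, key=lambda c: self.log_posterior(text, c))


# Hypothetical hand-coded subset used for training
coded = [
    ("I could not afford tuition", "cost"),
    ("tuition and fees were too expensive", "cost"),
    ("my job schedule conflicted with classes", "work"),
    ("work hours made it hard to attend class", "work"),
]
coder = NaiveBayesCoder().fit([t for t, _ in coded], [c for _, c in coded])

# Assign codes to responses that were not coded by hand
for response in ["the tuition was expensive", "long work hours"]:
    print(response, "->", coder.predict(response))  # cost, then work
```

Naive Bayes appears here only because it fits in a few lines without third-party dependencies; before trusting automated codes, one would typically compare candidate classifiers against held-out, human-coded responses.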
To test the performance of MDCOR, we analyzed tens of thousands of open-ended responses from two surveys, one publicly available and one federally protected. In all instances, MDCOR produced time-efficient and reliable results; it also tested whether non-response was associated with respondents’ attributes. Among its outputs, MDCOR gives researchers the fully classified responses, which can then be used in traditional quantitative modeling. Because MDCOR runs locally and handles both cross-sectional and longitudinal responses, it enables the analysis of a variety of data, from federally protected or restricted sources to social media posts. By removing the burden of manual classification and the need for programming expertise, MDCOR makes it possible to reap the knowledge-based benefits of open-ended responses in survey research efficiently and rigorously, without losing or altering our participants’ voices. We provide access to the public data analyzed (https://cutt.ly/YNmBOAL or González Canché, 2022c) and to the software (Mac version: https://cutt.ly/xv6nnuN; Windows version: https://cutt.ly/Lv6nTLG; Code Ocean version: González Canché, 2022f) so that researchers can interact with MDCOR first-hand and start using this tool and analytic framework in their own studies (see also González Canché, 2022a, 2022b, 2022d, 2022e, for related no-code data science applications for analyzing qualitative data dynamically).
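The summary reports that MDCOR tested whether non-response was associated with respondents’ attributes but does not name the test. As an assumption, the sketch below uses a Pearson chi-square test of independence on a 2 × 2 table (attribute group by answered or blank), computed with the Python standard library; the function name `nonresponse_chi_square`, the attribute name `first_gen`, and the records are hypothetical.

```python
import math


def nonresponse_chi_square(records, attribute):
    """Pearson chi-square test of independence (no continuity correction)
    between a binary respondent attribute and leaving the open-ended item blank.

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    # 2x2 table: rows = attribute False/True, columns = answered/blank
    table = [[0, 0], [0, 0]]
    for rec in records:
        row = 1 if rec[attribute] else 0
        col = 1 if not rec["response"].strip() else 0
        table[row][col] += 1

    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("each group and each response status must be observed")

    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected

    # Chi-square survival function for 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical records: "first_gen" is an illustrative attribute name
records = (
    [{"first_gen": False, "response": "tuition was too high"}] * 4
    + [{"first_gen": False, "response": ""}]
    + [{"first_gen": True, "response": "work schedule"}]
    + [{"first_gen": True, "response": "   "}] * 4
)
stat, p = nonresponse_chi_square(records, "first_gen")
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
```

With expected counts as small as in this toy table, the chi-square approximation is rough; the data only illustrate the mechanics, and Fisher's exact test is the usual alternative for sparse tables.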
ISSN:	0957-4174 1873-6793
DOI:	10.1016/j.eswa.2022.119265