Using natural language processing and machine learning to replace human content coders
Content analysis is a common and flexible technique to quantify and make sense of qualitative data in psychological research. However, the practical implementation of content analysis is extremely labor-intensive and subject to human coder errors. Applying natural language processing (NLP) technique...
Published in | Psychological methods Vol. 29; no. 6; p. 1148 |
---|---|
Main Authors | Wang, Yilei; Tian, Jingyuan; Yazar, Yagizhan; Ones, Deniz S; Landers, Richard N |
Format | Journal Article |
Language | English |
Published | United States, 01.12.2024 |
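Subjects | Humans; Machine Learning; Natural Language Processing; Psychology - methods; Qualitative Research; Reproducibility of Results |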
Abstract | Content analysis is a common and flexible technique to quantify and make sense of qualitative data in psychological research. However, the practical implementation of content analysis is extremely labor-intensive and subject to human coder errors. Applying natural language processing (NLP) techniques can help address these limitations. We explain and illustrate these techniques to psychological researchers. For this purpose, we first present a study exploring the creation of psychometrically meaningful predictions of human content codes. Using an existing database of human content codes, we build an NLP algorithm to validly predict those codes, at generally acceptable standards. We then conduct a Monte-Carlo simulation to model how four dataset characteristics (i.e., sample size, unlabeled proportion of cases, classification base rate, and human coder reliability) influence content classification performance. The simulation indicated that the influence of sample size and unlabeled proportion on model classification performance tended to be curvilinear. In addition, base rate and human coder reliability had a strong effect on classification performance. Finally, using these results, we offer practical recommendations to psychologists on the necessary dataset characteristics to achieve valid prediction of content codes to guide researchers on the use of NLP models to replace human coders in content analysis research. (PsycInfo Database Record (c) 2024 APA, all rights reserved). |
---|---|
Author | Landers, Richard N; Tian, Jingyuan; Yazar, Yagizhan; Wang, Yilei; Ones, Deniz S |
Author_xml | 1. Wang, Yilei (ORCID 0000-0002-3082-3038); 2. Tian, Jingyuan (ORCID 0000-0001-5012-0797); 3. Yazar, Yagizhan (ORCID 0000-0001-6040-3969); 4. Ones, Deniz S (ORCID 0000-0003-1739-8951); 5. Landers, Richard N (ORCID 0000-0001-5611-2923). All authors: Department of Psychology, University of Minnesota at Twin Cities |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/36006759$$D View this record in MEDLINE/PubMed |
CitedBy_id | crossref_primary_10_1177_1932202X231211633 crossref_primary_10_1007_s10648_024_09862_5 crossref_primary_10_1016_j_cresp_2023_100164 crossref_primary_10_1177_10944281241264027 crossref_primary_10_1108_BPMJ_11_2023_0876 crossref_primary_10_1177_25152459241296401 crossref_primary_10_1177_20413866241245314 crossref_primary_10_3758_s13428_024_02381_9 |
ContentType | Journal Article |
DBID | CGR CUY CVF ECM EIF NPM |
DOI | 10.1037/met0000518 |
DatabaseName | Medline MEDLINE MEDLINE (Ovid) MEDLINE MEDLINE PubMed |
DatabaseTitle | MEDLINE Medline Complete MEDLINE with Full Text PubMed MEDLINE (Ovid) |
DatabaseTitleList | MEDLINE |
Database_xml | – sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: EIF name: MEDLINE url: https://proxy.k.utb.cz/login?url=https://www.webofscience.com/wos/medline/basic-search sourceTypes: Index Database |
DeliveryMethod | no_fulltext_linktorsrc |
Discipline | Psychology |
EISSN | 1939-1463 |
ExternalDocumentID | 36006759 |
Genre | Journal Article |
IngestDate | Thu Jan 02 22:23:06 EST 2025 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 6 |
Language | English |
LinkModel | OpenURL |
ORCID | 0000-0003-1739-8951 0000-0001-5611-2923 0000-0001-5012-0797 0000-0001-6040-3969 0000-0002-3082-3038 |
PMID | 36006759 |
ParticipantIDs | pubmed_primary_36006759 |
PublicationCentury | 2000 |
PublicationDate | 2024-Dec |
PublicationDateYYYYMMDD | 2024-12-01 |
PublicationDate_xml | – month: 12 year: 2024 text: 2024-Dec |
PublicationDecade | 2020 |
PublicationPlace | United States |
PublicationPlace_xml | – name: United States |
PublicationTitle | Psychological methods |
PublicationTitleAlternate | Psychol Methods |
PublicationYear | 2024 |
SSID | ssj0014384 |
Score | 2.526594 |
SourceID | pubmed |
SourceType | Index Database |
StartPage | 1148 |
SubjectTerms | Humans Machine Learning Natural Language Processing Psychology - methods Qualitative Research Reproducibility of Results |
Title | Using natural language processing and machine learning to replace human content coders |
URI | https://www.ncbi.nlm.nih.gov/pubmed/36006759 |
Volume | 29 |
inHoldings | 1 |
linkProvider | National Library of Medicine |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Using+natural+language+processing+and+machine+learning+to+replace+human+content+coders&rft.jtitle=Psychological+methods&rft.au=Wang%2C+Yilei&rft.au=Tian%2C+Jingyuan&rft.au=Yazar%2C+Yagizhan&rft.au=Ones%2C+Deniz+S&rft.date=2024-12-01&rft.eissn=1939-1463&rft.volume=29&rft.issue=6&rft.spage=1148&rft_id=info:doi/10.1037%2Fmet0000518&rft_id=info%3Apmid%2F36006759&rft_id=info%3Apmid%2F36006759&rft.externalDocID=36006759 |
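
The abstract names base rate and human coder reliability as strong influences on classification performance. As a rough illustration only (a toy sketch, not the authors' simulation design), the following Python uses the standard library to simulate two independent human coders who each miscode a case with probability `coder_error`, then averages Cohen's kappa over repeated draws. The function names, the error model, and all parameter values are assumptions introduced here for illustration.

```python
import random


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length lists of binary (0/1) codes."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    pa, pb = sum(a) / n, sum(b) / n
    # Agreement expected by chance, given each coder's marginal rate.
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        # Kappa is undefined when both coders use a single category;
        # returning 0.0 here is a convention chosen for this sketch.
        return 0.0
    return (observed - expected) / (1 - expected)


def simulate_kappa(n_cases, base_rate, coder_error, n_reps=200, seed=0):
    """Mean kappa between two simulated coders over n_reps datasets.

    Assumed (hypothetical) error model: each coder independently flips
    the true code with probability coder_error.
    """
    rng = random.Random(seed)
    kappas = []
    for _ in range(n_reps):
        truth = [1 if rng.random() < base_rate else 0 for _ in range(n_cases)]
        coder1 = [t ^ (rng.random() < coder_error) for t in truth]
        coder2 = [t ^ (rng.random() < coder_error) for t in truth]
        kappas.append(cohen_kappa(coder1, coder2))
    return sum(kappas) / len(kappas)


if __name__ == "__main__":
    # Same coder error rate, two different base rates.
    for base_rate in (0.5, 0.1):
        mean_kappa = simulate_kappa(500, base_rate, coder_error=0.1)
        print(f"base rate {base_rate}: mean kappa {mean_kappa:.2f}")
```

Under these assumptions, the same per-case error rate should yield lower chance-corrected agreement at the more skewed base rate, which is one plausible reason rare content codes may be harder to predict from human labels.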