How to prepare data for the automatic classification of politically related beliefs expressed on Twitter? The consequences of researchers’ decisions on the number of coders, the algorithm learning procedure, and the pre-processing steps on the performance of supervised models
Due to the recent advances in natural language processing, social scientists use automatic text classification methods more and more frequently. The article raises the question about how researchers’ subjective decisions affect the performance of supervised deep learning models. The aim is to delive...
Published in | Quality & quantity Vol. 57; no. 1; pp. 301 - 321 |
---|---|
Main Author | Matuszewski, Paweł |
Format | Journal Article |
Language | English |
Published | Dordrecht; Springer Netherlands; 01.02.2023; Springer; Springer Nature B.V |
Abstract | Due to the recent advances in natural language processing, social scientists use automatic text classification methods more and more frequently. The article raises the question about how researchers’ subjective decisions affect the performance of supervised deep learning models. The aim is to deliver practical advice for researchers concerning: (1) whether it is more efficient to monitor coders’ work to ensure a high quality training dataset or have every document coded once and obtain a larger dataset instead; (2) whether lemmatisation improves model performance; (3) if it is better to apply passive learning or active learning approaches; and (4) if the answers are dependent on the models’ classification tasks. The models were trained to detect if a tweet is about current affairs or political issues, the tweet’s subject matter and the tweet author’s stance on this. The study uses a sample of 200,000 manually coded tweets published by Polish political opinion leaders in 2019. The consequences of decisions under different conditions were checked by simulating 52,800 results using the fastText algorithm (DV: F1-score). Linear regression analysis suggests that the researchers’ choices not only strongly affect model performance but may also lead, in the worst-case scenario, to a waste of funds. |
---|---|
Audience | Academic |
Author | Matuszewski, Paweł |
Author_xml | sequence: 1; givenname: Paweł; surname: Matuszewski; fullname: Matuszewski, Paweł; orcidid: 0000-0003-0069-157X; email: pawel.matuszewski@civitas.edu.pl; organization: Collegium Civitas |
CitedBy_id | crossref_primary_10_1080_1828051X_2024_2333813 crossref_primary_10_1016_j_atech_2025_100827 |
ContentType | Journal Article |
Copyright | The Author(s), under exclusive licence to Springer Nature B.V. 2022; COPYRIGHT 2023 Springer |
DOI | 10.1007/s11135-022-01372-2 |
Discipline | Statistics; Philosophy; Social Sciences (General) |
EISSN | 1573-7845 |
EndPage | 321 |
GrantInformation_xml | fundername: Narodowe Centrum Nauki; grantid: 2019/03/X/HS6/00882; funderid: http://dx.doi.org/10.13039/501100004281 |
ISSN | 0033-5177 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Keywords | Deep learning; Natural language processing; Big data; Text classification; Content analysis |
ORCID | 0000-0003-0069-157X |
PageCount | 21 |
PublicationDateYYYYMMDD | 2023-02-01 |
PublicationPlace | Dordrecht |
PublicationSubtitle | International Journal of Methodology |
PublicationTitle | Quality & quantity |
PublicationTitleAbbrev | Qual Quant |
PublicationYear | 2023 |
Publisher | Springer Netherlands; Springer; Springer Nature B.V |
StartPage | 301 |
SubjectTerms | Accuracy; Algorithms; Automatic classification; Big Data; Classification; Cognitive style; Computational linguistics; Current events; Datasets; Decisions; Deep learning; Digital archives; Language; Language processing; Machine learning; Mathematical functions; Methodology of the Social Sciences; Methods; Natural language interfaces; Natural language processing; Opinion leaders; Political activity; Political aspects; Political attitudes; Political factors; Regression analysis; Research methodology; Researcher subject relations; Researchers; Science; Scientists; Social networks; Social research; Social science research; Social Sciences; Social scientists; Text analysis; Text categorization |
URI | https://link.springer.com/article/10.1007/s11135-022-01372-2 https://www.proquest.com/docview/2771188662 |
Volume | 57 |
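The abstract names the F1-score as the dependent variable (DV) of the simulated results. As a reading aid, the following is a minimal sketch of how an F1-score for a single class can be computed from true and predicted labels. The function name `f1_score`, the `positive` parameter, and the toy labels are illustrative assumptions and do not come from the article; the record does not say which averaging variant (for example macro or weighted) the study used, so only the single-class case is shown.

```python
def f1_score(y_true, y_pred, positive=1):
    """Single-class F1: the harmonic mean of precision and recall.

    Illustrative helper only; not the article's implementation.
    """
    pairs = list(zip(y_true, y_pred))
    # True positives: the class is present and was predicted.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: the class was predicted but is absent.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: the class is present but was missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No correct positive prediction: F1 is conventionally 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = "political tweet", 0 = "other".
labels = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
score = f1_score(labels, predicted)
```

With these toy labels there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3 and the score is 2/3.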