How to prepare data for the automatic classification of politically related beliefs expressed on Twitter? The consequences of researchers’ decisions on the number of coders, the algorithm learning procedure, and the pre-processing steps on the performance of supervised models

Bibliographic Details
Published in Quality & quantity Vol. 57; no. 1; pp. 301-321
Main Author Matuszewski, Paweł
Format Journal Article
Language English
Published Dordrecht Springer Netherlands 01.02.2023
Springer
Springer Nature B.V
Abstract Due to the recent advances in natural language processing, social scientists use automatic text classification methods more and more frequently. The article raises the question about how researchers’ subjective decisions affect the performance of supervised deep learning models. The aim is to deliver practical advice for researchers concerning: (1) whether it is more efficient to monitor coders’ work to ensure a high quality training dataset or have every document coded once and obtain a larger dataset instead; (2) whether lemmatisation improves model performance; (3) if it is better to apply passive learning or active learning approaches; and (4) if the answers are dependent on the models’ classification tasks. The models were trained to detect if a tweet is about current affairs or political issues, the tweet’s subject matter and the tweet author’s stance on this. The study uses a sample of 200,000 manually coded tweets published by Polish political opinion leaders in 2019. The consequences of decisions under different conditions were checked by simulating 52,800 results using the fastText algorithm (DV: F1-score). Linear regression analysis suggests that the researchers’ choices not only strongly affect model performance but may also lead, in the worst-case scenario, to a waste of funds.
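Note on the dependent variable: the abstract names the F1-score as the outcome measure. As a minimal sketch only, the snippet below computes a binary F1-score from precision and recall. The function name `f1_score_binary`, the label strings, and the toy data are hypothetical illustrations; the record does not state whether the study used binary, macro, or weighted F1, nor how fastText was configured.

```python
def f1_score_binary(true_labels, predicted_labels, positive_label):
    """Return the F1-score for one class, treated as the positive class."""
    pairs = list(zip(true_labels, predicted_labels))
    # True positives: coded positive and predicted positive.
    tp = sum(1 for t, p in pairs if t == positive_label and p == positive_label)
    # False positives: predicted positive but coded otherwise.
    fp = sum(1 for t, p in pairs if t != positive_label and p == positive_label)
    # False negatives: coded positive but predicted otherwise.
    fn = sum(1 for t, p in pairs if t == positive_label and p != positive_label)
    if tp == 0:
        return 0.0  # Avoids division by zero when nothing is correctly detected.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: manual codes versus model predictions for five tweets.
manual_codes = ["political", "political", "other", "other", "political"]
model_output = ["political", "other", "other", "political", "political"]
score = f1_score_binary(manual_codes, model_output, "political")
print(round(score, 3))
```

In this toy example precision and recall are both 2/3, so the F1-score is also 2/3.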
Audience Academic
Author Matuszewski, Paweł
Author_xml – sequence: 1
  givenname: Paweł
  orcidid: 0000-0003-0069-157X
  surname: Matuszewski
  fullname: Matuszewski, Paweł
  email: pawel.matuszewski@civitas.edu.pl
  organization: Collegium Civitas
CitedBy_id crossref_primary_10_1080_1828051X_2024_2333813
crossref_primary_10_1016_j_atech_2025_100827
ContentType Journal Article
Copyright The Author(s), under exclusive licence to Springer Nature B.V. 2022
COPYRIGHT 2023 Springer
DOI 10.1007/s11135-022-01372-2
DatabaseName CrossRef
ProQuest Social Sciences Premium Collection
ProQuest Central (Corporate)
Sociological Abstracts (pre-2017)
Worldwide Political Science Abstracts
ABI/INFORM Collection
ABI/INFORM Global (PDF only)
ProQuest Central (purchase pre-March 2016)
ABI/INFORM Global (Alumni Edition)
Psychology Database (Alumni)
Social Science Database (Alumni Edition)
International Bibliography of the Social Sciences (IBSS)
Hospital Premium Collection
Hospital Premium Collection (Alumni Edition)
ProQuest Central (Alumni) (purchase pre-March 2016)
ABI/INFORM Collection (Alumni Edition)
Research Library (Alumni Edition)
ProQuest Central (Alumni Edition)
ProQuest Central UK/Ireland
Social Science Premium Collection
ProQuest Central Essentials
ProQuest Central
Business Premium Collection
Sociological Abstracts
ProQuest One Community College
Sociological Abstracts
ProQuest Central Korea
International Bibliography of the Social Sciences
Business Premium Collection (Alumni)
Health Research Premium Collection
ABI/INFORM Global (Corporate)
Health Research Premium Collection (Alumni)
ProQuest Central Student
Research Library Prep
Sociology Collection
International Bibliography of the Social Sciences
ProQuest Business Collection (Alumni Edition)
ProQuest Business Collection
ABI/INFORM Professional Advanced
ABI/INFORM Global
Psychology Database
Research Library
Social Science Database
Sociology Database
Research Library (Corporate)
ProQuest Central Premium
ProQuest One Academic (New)
ProQuest One Academic Middle East (New)
ProQuest Sociology & Social Sciences Collection
ProQuest One Business
ProQuest One Business (Alumni)
ProQuest One Academic Eastern Edition (DO NOT USE)
ProQuest One Academic
ProQuest One Academic UKI Edition
ProQuest Central China
ProQuest One Social Sciences
ProQuest One Psychology
ProQuest Central Basic
Sociological Abstracts (Ovid)
Discipline Statistics
Philosophy
Social Sciences (General)
EISSN 1573-7845
EndPage 321
GrantInformation_xml – fundername: Narodowe Centrum Nauki
  grantid: 2019/03/X/HS6/00882
  funderid: http://dx.doi.org/10.13039/501100004281
ISSN 0033-5177
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords Deep learning
Natural language processing
Big data
Text classification
Content analysis
Language English
ORCID 0000-0003-0069-157X
PageCount 21
PublicationCentury 2000
PublicationDate 20230200
PublicationDateYYYYMMDD 2023-02-01
PublicationDate_xml – month: 2
  year: 2023
  text: 20230200
PublicationDecade 2020
PublicationPlace Dordrecht
PublicationPlace_xml – name: Dordrecht
PublicationSubtitle International Journal of Methodology
PublicationTitle Quality & quantity
PublicationTitleAbbrev Qual Quant
PublicationYear 2023
Publisher Springer Netherlands
Springer
Springer Nature B.V
StartPage 301
SubjectTerms Accuracy
Algorithms
Automatic classification
Big Data
Classification
Cognitive style
Computational linguistics
Current events
Datasets
Decisions
Deep learning
Digital archives
Language
Language processing
Machine learning
Mathematical functions
Methodology of the Social Sciences
Methods
Natural language interfaces
Natural language processing
Opinion leaders
Political activity
Political aspects
Political attitudes
Political factors
Regression analysis
Research methodology
Researcher subject relations
Researchers
Science
Scientists
Social networks
Social research
Social science research
Social Sciences
Social scientists
Text analysis
Text categorization
URI https://link.springer.com/article/10.1007/s11135-022-01372-2
https://www.proquest.com/docview/2771188662
Volume 57