Categorising the world into local climate zones: towards quantifying labelling uncertainty for machine learning models

Bibliographic Details
Published in Journal of the Royal Statistical Society Series C: Applied Statistics, Vol. 73, no. 1, pp. 143-161
Main Authors Hechinger, Katharina; Zhu, Xiao Xiang; Kauermann, Göran
Format Journal Article
Language English
Published US, Oxford University Press, 11.01.2024
Abstract Image classification is often prone to labelling uncertainty. To generate suitable training data, images are labelled according to evaluations of human experts. This can result in ambiguities, which will affect subsequent models. In this work, we aim to model the labelling uncertainty in the context of remote sensing and the classification of satellite images. We construct a multinomial mixture model given the evaluations of multiple experts. This is based on the assumption that there is no ambiguity of the image class, but apparently in the experts’ opinion about it. The model parameters can be estimated by a stochastic expectation maximisation algorithm. Analysing the estimates gives insights into sources of label uncertainty. Here, we focus on the general class ambiguity, the heterogeneity of experts, and the origin city of the images. The results are relevant for all machine learning applications where image classification is pursued and labelling is subject to humans.
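The abstract names two ingredients, a multinomial mixture over expert votes and a stochastic expectation maximisation (SEM) fit. The following Python sketch illustrates that general idea only; it is not the authors' implementation. The function names `simulate_votes` and `stochastic_em`, the three-class toy data, the diagonal-dominant initialisation, the add-one smoothing and the averaging of post-burn-in draws are all assumptions made for illustration.

```python
import numpy as np


def simulate_votes(n_images, n_experts, seed=0):
    """Toy data (assumption): 3 latent classes, experts vote the true class w.p. 0.8."""
    rng = np.random.default_rng(seed)
    theta_true = np.array([[0.8, 0.1, 0.1],
                           [0.1, 0.8, 0.1],
                           [0.1, 0.1, 0.8]])
    z = rng.integers(0, 3, size=n_images)  # latent true class per image
    # One row of vote counts per image, drawn from the row of theta_true for its class.
    votes = np.array([rng.multinomial(n_experts, theta_true[k]) for k in z])
    return votes, z


def stochastic_em(votes, n_iter=200, burn_in=100, seed=0):
    """Fit a K-component multinomial mixture to vote counts with a stochastic EM loop.

    votes: (n_images, K) array, votes[i, j] = number of experts voting label j.
    Returns averaged mixing weights pi (K,) and vote distributions theta (K, K).
    """
    rng = np.random.default_rng(seed)
    votes = np.asarray(votes, dtype=float)
    n_images, K = votes.shape

    pi = np.full(K, 1.0 / K)
    # Diagonal-dominant start (assumption) so component k stays tied to label k.
    theta = np.full((K, K), 0.4 / (K - 1))
    np.fill_diagonal(theta, 0.6)

    pi_sum = np.zeros(K)
    theta_sum = np.zeros((K, K))
    for it in range(n_iter):
        # E-step: posterior class probabilities per image (multinomial
        # coefficient omitted because it is constant across components).
        log_post = np.log(pi) + votes @ np.log(theta).T
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

        # S-step: draw one latent class per image from its posterior.
        u = rng.random((n_images, 1))
        z = np.minimum((post.cumsum(axis=1) < u).sum(axis=1), K - 1)

        # M-step on the completed data, with add-one smoothing (assumption)
        # so that no probability becomes exactly zero.
        onehot = np.eye(K)[z]
        pi = (onehot.sum(axis=0) + 1.0) / (n_images + K)
        counts = onehot.T @ votes + 1.0
        theta = counts / counts.sum(axis=1, keepdims=True)

        if it >= burn_in:
            pi_sum += pi
            theta_sum += theta

    kept = n_iter - burn_in
    return pi_sum / kept, theta_sum / kept
```

In this sketch, row k of the returned `theta` describes how votes spread over the labels when the latent class is k, so off-diagonal mass gives a rough picture of which classes the simulated experts confuse. Expert heterogeneity and the origin city of the images, both mentioned in the abstract, are not modelled here.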
Author email Hechinger, Katharina: Katharina.Hechinger@stat.uni-muenchen.de
Copyright The Royal Statistical Society 2023. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com
DOI 10.1093/jrsssc/qlad089
Discipline Statistics
Computer Science
EISSN 1467-9876
ISSN 0035-9254
IsPeerReviewed true
IsScholarly true
Issue 1
Keywords stochastic expectation maximisation
mixture models
multiple labellers
expert evaluations
labelling uncertainty
Language English
License This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)
PageCount 19