Categorising the world into local climate zones: towards quantifying labelling uncertainty for machine learning models
Published in | Journal of the Royal Statistical Society Series C: Applied Statistics Vol. 73; no. 1; pp. 143 - 161 |
---|---|
Main Authors | Hechinger, Katharina; Zhu, Xiao Xiang; Kauermann, Göran |
Format | Journal Article |
Language | English |
Published | US: Oxford University Press, 11 January 2024 |
Abstract | Image classification is often prone to labelling uncertainty. To generate suitable training data, images are labelled according to evaluations of human experts. This can result in ambiguities, which will affect subsequent models. In this work, we aim to model the labelling uncertainty in the context of remote sensing and the classification of satellite images. We construct a multinomial mixture model given the evaluations of multiple experts. This is based on the assumption that there is no ambiguity of the image class, but apparently in the experts’ opinion about it. The model parameters can be estimated by a stochastic expectation maximisation algorithm. Analysing the estimates gives insights into sources of label uncertainty. Here, we focus on the general class ambiguity, the heterogeneity of experts, and the origin city of the images. The results are relevant for all machine learning applications where image classification is pursued and labelling is subject to humans. |
---|---|
Author | Hechinger, Katharina; Kauermann, Göran; Zhu, Xiao Xiang |
Author_xml | – sequence: 1 givenname: Katharina surname: Hechinger fullname: Hechinger, Katharina email: Katharina.Hechinger@stat.uni-muenchen.de – sequence: 2 givenname: Xiao Xiang surname: Zhu fullname: Zhu, Xiao Xiang – sequence: 3 givenname: Göran surname: Kauermann fullname: Kauermann, Göran |
ContentType | Journal Article |
Copyright | The Royal Statistical Society 2023. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com 2023 |
DOI | 10.1093/jrsssc/qlad089 |
Discipline | Statistics; Computer Science |
EISSN | 1467-9876 |
EndPage | 161 |
ExternalDocumentID | 10_1093_jrsssc_qlad089 10.1093/jrsssc/qlad089 |
ISSN | 0035-9254 |
IngestDate | Tue Jul 01 01:20:41 EDT 2025 Mon Jun 30 08:34:52 EDT 2025 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Keywords | stochastic expectation maximisation; mixture models; multiple labellers; expert evaluations; labelling uncertainty |
Language | English |
License | This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights) |
PageCount | 19 |
PublicationCentury | 2000 |
PublicationDate | 2024-01-11 |
PublicationDecade | 2020 |
PublicationPlace | US |
PublicationTitle | Journal of the Royal Statistical Society Series C: Applied Statistics |
PublicationYear | 2024 |
Publisher | Oxford University Press |
SourceID | crossref oup |
SourceType | Index Database Publisher |
StartPage | 143 |
Title | Categorising the world into local climate zones: towards quantifying labelling uncertainty for machine learning models |
Volume | 73 |
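
The abstract describes a multinomial mixture model for the votes of multiple experts, estimated by a stochastic expectation maximisation (SEM) algorithm. The following is a minimal sketch of that general technique, not the authors' implementation. It assumes each image is summarised as a vector of vote counts per class, that the latent image class selects one multinomial vote distribution, and that estimates are averaged after a burn-in period. The function name `stochastic_em`, the smoothing constant, and the default iteration counts are all hypothetical choices made for illustration.

```python
import numpy as np


def stochastic_em(votes, n_components, n_iter=200, burn_in=100, seed=0):
    """Fit a multinomial mixture to expert vote counts (illustrative sketch).

    votes[i, k] is the number of experts who assigned image i to class k.
    Returns averaged mixture weights pi (n_components,) and vote
    distributions theta (n_components, n_classes).
    """
    rng = np.random.default_rng(seed)
    votes = np.asarray(votes, dtype=float)
    n_images, n_classes = votes.shape
    smooth = 1e-3  # assumed smoothing so no probability becomes exactly zero

    pi = np.full(n_components, 1.0 / n_components)
    theta = rng.dirichlet(np.full(n_classes, 5.0), size=n_components)
    pi_sum = np.zeros_like(pi)
    theta_sum = np.zeros_like(theta)

    for it in range(n_iter):
        # E-step: posterior over latent classes. The multinomial coefficient
        # is the same for every component, so it cancels and is omitted.
        log_post = np.log(pi) + votes @ np.log(theta).T
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)

        # S-step: draw one latent class per image from its posterior.
        u = rng.random((n_images, 1))
        z = (u > post.cumsum(axis=1)).sum(axis=1)
        z = np.minimum(z, n_components - 1)

        # M-step: closed-form updates based on the sampled assignments.
        onehot = np.eye(n_components)[z]
        pi = onehot.sum(axis=0) + smooth
        pi /= pi.sum()
        counts = onehot.T @ votes + smooth
        theta = counts / counts.sum(axis=1, keepdims=True)

        # Average the post-burn-in iterates to obtain point estimates.
        if it >= burn_in:
            pi_sum += pi
            theta_sum += theta

    n_kept = n_iter - burn_in
    return pi_sum / n_kept, theta_sum / n_kept
```

In such a sketch, each row of `theta` can be read as the distribution of expert votes given one latent class, so a row with mass spread over several classes would indicate an ambiguous class. Averaging iterates ignores possible label switching between components, which a careful analysis would need to handle.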