A Labeling Intercomparison of Retrogressive Thaw Slumps by a Diverse Group of Domain Experts

Bibliographic Details
Published in	Permafrost and Periglacial Processes, Vol. 36, no. 1, pp. 83-92
Main Authors	Nitze, Ingmar; Van der Sluijs, Jurjen; Barth, Sophia; Bernhard, Philipp; Huang, Lingcao; Kizyakov, Alexander; Lara, Mark J.; Nesterova, Nina; Runge, Alexandra; Veremeeva, Alexandra; Ward Jones, Melissa; Witharana, Chandi; Xia, Zhuoxuan; Liljedahl, Anna K.
Format	Journal Article
Language	English
Published	Chichester: Wiley Subscription Services, Inc., 01.01.2025
Subjects	Accuracy; deep learning; Guidelines; hillslope thermokarst; Intercomparison; Labeling; Labels; Permafrost; remote sensing; retrogressive thaw slumps; Slump structures; Subject specialists; uncertainty estimation

More Information
Summary:	Deep‐learning (DL) models have become increasingly beneficial for the detection of retrogressive thaw slumps (RTS) in the permafrost domain. However, comparing accuracy metrics is challenging due to unstandardized labeling guidelines. To address this, we conducted an experiment with 12 international domain experts from a broad range of scientific backgrounds. Using 3 m PlanetScope multispectral imagery, they digitized RTS footprints in two sites. We evaluated label uncertainty by comparing manually outlined RTS labels using Intersection‐over‐Union (IoU) and F1 metrics. At the Canadian Peel Plateau site, we see good agreement, particularly in the active parts of RTS. Differences were observed in the interpretation of the debris tongue and the stable vegetated sections of RTS. At the Russian Bykovsky site, we observed a larger mismatch. Here, the same differences were documented, but several participants mistakenly identified non‐RTS features. This emphasizes the importance of site‐specific knowledge for reliable label creation. The experiment highlights the need for standardized labeling procedures and definition of their scientific purpose. The most similar expert labels outperformed the accuracy metrics reported in the literature, highlighting human labeling capabilities with proper training, site knowledge, and clear guidelines. These findings lay the groundwork for DL‐based RTS monitoring in the pan‐Arctic.
Bibliography:	This study was funded by the International Permafrost Association (RTSInTrainActionGroup). Individual contributors were supported by Helmholtz Association (AI‐CORE), German Federal Ministry for Economic Affairs and Climate Action (ML4Earth50EE2201C), National Science Foundation (1927723, 1927772, 1927872 and 2052107), European Space Agency (CCI+Permafrost, ESA CCI postdoctoral fellowship 4000134121/21/I‐NB), Lomonosov Moscow State University (121051100164‐0), German Academic Exchange Service (57588368), and National Aeronautics and Space Administration (80NSSC22K1254).
ISSN:	1045-6740 1099-1530
DOI:	10.1002/ppp.2249
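
Note on the metrics named in the Summary: the record does not describe how the study computed IoU and F1, so the following is only a minimal pixel-wise sketch, assuming each expert's RTS label has already been rasterized to a set of (row, column) pixels. The `rectangle` helper and the two example footprints are hypothetical and are not data from the study. Computed pixel-wise, F1 is the same quantity as the Dice coefficient.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]  # (row, column) of one labeled raster cell


def rectangle(row0: int, col0: int, row1: int, col1: int) -> Set[Pixel]:
    """Hypothetical helper: pixels of the half-open box [row0, row1) x [col0, col1)."""
    return {(r, c) for r in range(row0, row1) for c in range(col0, col1)}


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection-over-Union of two pixel sets."""
    union = a | b
    if not union:
        return 1.0  # assumed convention: two empty labels agree perfectly
    return len(a & b) / len(union)


def f1(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Pixel-wise F1 (Dice): 2 * |A and B| / (|A| + |B|)."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # same assumed convention as in iou()
    return 2 * len(a & b) / total


# Two made-up footprints of 16 pixels each that share 8 pixels.
expert_a = rectangle(0, 0, 4, 4)
expert_b = rectangle(0, 2, 4, 6)

print(f"IoU = {iou(expert_a, expert_b):.3f}")  # 8 / 24
print(f"F1  = {f1(expert_a, expert_b):.3f}")   # 16 / 32
```

For any pair of labels the two scores are linked by F1 = 2·IoU / (1 + IoU), so F1 is never lower than IoU; this matters when comparing agreement values reported with different metrics.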