Addressing data gaps in sustainability reporting: A benchmark dataset for greenhouse gas emission extraction

Bibliographic Details
Published in: Scientific Data, Vol. 12, no. 1, p. 1497
Main Authors: Beck, Jacob; Steinberg, Anna; Dimmelmeier, Andreas; Domenech Burin, Laia; Kormanyos, Emily; Fehr, Maurice; Schierholz, Malte
Format: Journal Article
Language: English
Published: London: Nature Publishing Group UK, 27 August 2025
Summary: Reliable company-level greenhouse gas (GHG) emissions data are essential for stakeholders addressing the climate crisis. However, existing datasets are often fragmented, inconsistent, and lack transparent methodologies, making it difficult to obtain reliable emissions data. To address this challenge, we present a gold standard dataset containing emissions metrics extracted from 139 sustainability reports collected from company websites. This dataset acts as an intermediate step to validate and fine-tune models for large-scale extraction of emissions data from thousands of reports. We employ a Large Language Model (LLM)-powered extraction pipeline to automatically extract emissions metrics. These values are then independently assessed by two non-expert annotators. Reports with full agreement are directly considered gold standard, while discrepancies undergo expert review in two stages, with remaining disagreements resolved through in-person discussions. This structured process ensures high data quality while reducing reliance on experts. Our dataset serves as a benchmark for human and automated annotation, with significant reuse potential for information extraction tasks in sustainable finance as well as other downstream tasks such as greenwashing analysis.
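The annotation workflow in the summary is an escalation chain: annotator agreement, then two expert stages, then in-person discussion. The sketch below illustrates that routing logic in Python under stated assumptions; it is not the authors' implementation. The `Annotation` dictionary layout, metric names such as `scope1`, the `ExpertStage` callables and the exact-equality comparison are all hypothetical, since the record does not say how extracted values or expert decisions are represented.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical representation: metric name -> extracted value (e.g. t CO2e),
# with None meaning the annotator judged the metric "not reported".
Annotation = Dict[str, Optional[float]]

# Hypothetical expert stage: given both annotations, return a resolved
# annotation, or None if the disagreement is still open after this stage.
ExpertStage = Callable[[Annotation, Annotation], Optional[Annotation]]


def disagreements(a: Annotation, b: Annotation) -> List[str]:
    """Metrics on which the two annotators differ (a missing key counts as None)."""
    keys = sorted(set(a) | set(b))
    return [k for k in keys if a.get(k) != b.get(k)]


def resolve_report(
    a: Annotation, b: Annotation, expert_stages: List[ExpertStage]
) -> Tuple[str, Optional[Annotation]]:
    """Route one report through the assumed escalation chain."""
    if not disagreements(a, b):
        # Full agreement between the two non-expert annotators:
        # accepted directly as gold standard.
        return "gold_standard", a
    # Otherwise escalate through the expert stages in order.
    for i, stage in enumerate(expert_stages, start=1):
        resolved = stage(a, b)
        if resolved is not None:
            return f"expert_stage_{i}", resolved
    # Still unresolved after every expert stage: left for in-person discussion.
    return "in_person_discussion", None


if __name__ == "__main__":
    a = {"scope1": 1200.0, "scope2": 340.0}
    b = {"scope1": 1200.0, "scope2": 350.0}
    unresolved: ExpertStage = lambda x, y: None  # stage that never decides
    print(disagreements(a, b))                              # -> ['scope2']
    print(resolve_report(a, b, [unresolved, unresolved]))   # -> ('in_person_discussion', None)
```

Exact equality keeps the sketch simple; an actual comparison would probably need to normalise units and rounding before deciding that two annotators agree on a value.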
ISSN: 2052-4463
DOI: 10.1038/s41597-025-05664-8