A resource for automated search and collation of geochemical datasets from journal supplements

Bibliographic Details
Published in: Scientific Data, Vol. 9; no. 1; pp. 724 - 14
Main Authors: Martin, Erin L; Barrote, Vitor R; Cawood, Peter A
Format: Journal Article
Language: English
Published: England: Nature Publishing Group, 25.11.2022; Nature Publishing Group UK; Nature Portfolio

Summary: This article presents a resource for automated search, extraction and collation of geochemical and geochronological data from the Figshare repository using web scraping code. To answer fundamental questions about the Earth's evolution, such as its spatial and temporal evolution and the interrelationships between the planet's solid and surficial reservoirs, researchers must use global geochemical datasets. Because of the volume of data being published, these datasets quickly become outdated. We present a resource that allows researchers to rapidly curate and update their own databases from existing published data. We use open-source Python code to web scrape the Figshare repository for journal supplementary files through its application programming interface (API), allowing hundreds of supplementary files and their metadata to be collected and downloaded in minutes. We demonstrate the web scraping tool by collating a zircon geochronology and chemistry database of >150,000 analyses. The database reproduces trends seen in other published zircon compilations. Providing a resource for automated collection of Figshare data files will encourage data sharing and reuse.
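The search-and-download step described in the summary can be sketched with Python's standard library. This is a minimal, hypothetical sketch, not the authors' published code: it assumes the public Figshare API v2 endpoints `POST https://api.figshare.com/v2/articles/search` (JSON body with `search_for`, `page` and `page_size`) and `GET https://api.figshare.com/v2/articles/{id}/files`, and the function names and spreadsheet-extension filter are illustrative choices.

```python
import json
import urllib.request

# Assumed public Figshare API v2 base URL; check the current API documentation.
API_BASE = "https://api.figshare.com/v2"


def build_search_request(query, page=1, page_size=10):
    """Build a POST request for the (assumed) article search endpoint."""
    body = json.dumps(
        {"search_for": query, "page": page, "page_size": page_size}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/articles/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_files_request(article_id):
    """Build a GET request that lists the files attached to one article."""
    return urllib.request.Request(f"{API_BASE}/articles/{article_id}/files")


def fetch_json(request, timeout=30):
    """Send a request and decode its JSON response (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def select_data_files(files, extensions=(".csv", ".xlsx", ".xls")):
    """Keep (name, download_url) pairs for files with tabular-data extensions."""
    suffixes = tuple(ext.lower() for ext in extensions)
    return [
        (f["name"], f["download_url"])
        for f in files
        if f.get("name", "").lower().endswith(suffixes)
    ]


def search_supplementary_files(query, max_pages=1, page_size=10):
    """Yield (article title, file name, download URL) for matching articles."""
    for page in range(1, max_pages + 1):
        articles = fetch_json(build_search_request(query, page, page_size))
        if not articles:
            break  # no further result pages
        for article in articles:
            files = fetch_json(build_files_request(article["id"]))
            for name, url in select_data_files(files):
                yield article.get("title", ""), name, url
```

For example, `for title, name, url in search_supplementary_files("zircon", max_pages=2): print(title, name, url)` would list candidate spreadsheet supplements, given network access. Request construction is kept separate from the network call so the query logic can be checked offline; a real harvester would also need to respect API rate limits and handle failed requests.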
ISSN: 2052-4463
DOI: 10.1038/s41597-022-01730-7