A resource for automated search and collation of geochemical datasets from journal supplements
Published in | Scientific Data Vol. 9; no. 1; pp. 724 - 14 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | England; Nature Publishing Group; 25.11.2022; Nature Publishing Group UK; Nature Portfolio |
Summary: | This article presents a resource for automated search, extraction and collation of geochemical and geochronological data from the Figshare repository using web scraping code. To answer fundamental questions about the Earth's evolution, such as spatial and temporal evolution and interrelationships between the planet's solid and surficial reservoirs, researchers must utilize global geochemical datasets. Due to the volume of data being published, these datasets become quickly outdated. We present a resource that allows researchers to rapidly curate and update their own databases from existing published data. We use open-source Python code to web scrape the Figshare repository for journal supplementary files using the application programming interface, allowing for the collection and download of hundreds of supplementary files and metadata in minutes. Use of this web scraping tool is demonstrated here by collation of a zircon geochronology and chemistry database of >150,000 analyses. The database is consistent in reproducing trends in other published zircon compilations. Providing a resource for automated collection of Figshare data files will encourage data sharing and reuse. |
---|---|
ISSN: | 2052-4463 |
DOI: | 10.1038/s41597-022-01730-7 |
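
The summary describes querying the Figshare application programming interface from Python to collect supplementary files and their metadata. The article's own code is not reproduced in this record, so the following is only a minimal sketch of that kind of workflow. It assumes the public Figshare v2 endpoints `POST /articles/search` and `GET /articles/{id}/files`; the function names, the extension filter, and the default page size are hypothetical choices made for illustration, not the authors' implementation.

```python
import json
import urllib.request

API_ROOT = "https://api.figshare.com/v2"  # public Figshare API base URL

# Assumption: spreadsheet-like supplements are the files of interest.
TABLE_EXTENSIONS = (".csv", ".xls", ".xlsx")


def build_search_payload(query, page=1, page_size=100):
    """Return the JSON body for a POST to /articles/search."""
    return {"search_for": query, "page": page, "page_size": page_size}


def select_table_files(files, extensions=TABLE_EXTENSIONS):
    """Keep file records whose name ends with one of the given extensions."""
    suffixes = tuple(extensions)
    return [f for f in files if f.get("name", "").lower().endswith(suffixes)]


def post_json(url, payload):
    """POST a JSON payload and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def get_json(url):
    """GET a URL and decode the JSON response."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def find_supplement_files(query, page_size=10):
    """Search Figshare and list (title, file name, download URL) tuples.

    Hypothetical helper: requires network access and makes one extra
    request per matching article.
    """
    articles = post_json(
        f"{API_ROOT}/articles/search",
        build_search_payload(query, page_size=page_size),
    )
    found = []
    for article in articles:
        files = get_json(f"{API_ROOT}/articles/{article['id']}/files")
        for record in select_table_files(files):
            found.append(
                (article["title"], record["name"], record["download_url"])
            )
    return found
```

Calling `find_supplement_files("zircon U-Pb")` would return a list of candidate supplementary tables to download; the search string is an example only. Keeping the payload builder and the file filter separate from the network calls lets both be checked without contacting the repository.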