Archiving Scientific Data Outside of the Traditional HEP Domain, Using the Archive Facilities at Fermilab

Bibliographic Details
Published in Journal of physics. Conference series Vol. 664; no. 4; pp. 42039 - 42046
Main Authors Norman, A., Diesburg, M., Gheith, M., Illingworth, R., Mengel, M.
Format Journal Article
Language English
Published Bristol IOP Publishing 01.01.2015
Summary: Many experiments in the HEP and Astrophysics communities generate large, extremely valuable datasets that need to be efficiently cataloged and recorded to archival storage. These datasets, both new and legacy, are often structured in a manner that is not conducive to storage and cataloging with modern data handling systems and large file archive facilities. In this paper we discuss in detail how we have created a robust toolset and a simple portal into the Fermilab archive facilities, which allow scientific data to be quickly imported, organized and retrieved from the multi-petabyte facility. In particular, we discuss how the data from the Sudbury Neutrino Observatory (SNO) for the COUPP dark matter detector was aggregated, cataloged, archived and re-organized to permit it to be retrieved and analyzed using modern distributed computing resources both at Fermilab and on the Open Science Grid. We pay particular attention to the methods that were employed to uniquify the namespaces for the data and to derive metadata for the more than 460,000 image series taken by the COUPP experiment, and to what was required to map that information into coherent datasets that could be stored and retrieved using the large-scale archive systems. We describe the data transfer and cataloging engines that are used for data importation and how these engines have been set up to import data from the data acquisition systems of ongoing experiments at remote, non-Fermilab sites, including the Laboratori Nazionali del Gran Sasso and the Ash River Laboratory in Orr, Minnesota. We also describe how large university computing sites around the world are using the system to store and retrieve large volumes of simulation and experiment data for physics analysis.
ISSN: 1742-6588, 1742-6596
DOI: 10.1088/1742-6596/664/4/042039