Towards higher reliability of CMS computing facilities

The CMS experiment has adopted a computing system where resources are distributed worldwide in more than 50 sites. The operation of the system requires a stable and reliable behaviour of the underlying infrastructure. CMS has established procedures to extensively test all relevant aspects of a site...

Bibliographic Details
Published in Journal of physics. Conference series Vol. 396; no. 3; pp. 032041 - 12
Main Authors Bagliesi, G, Bloom, K, Brew, C, Flix, J, Kreuzer, P, Sciabà, A
Format Journal Article
Language English
Published Bristol IOP Publishing 01.01.2012
Summary: The CMS experiment has adopted a computing system in which resources are distributed worldwide across more than 50 sites. Operating the system requires stable and reliable behaviour of the underlying infrastructure. CMS has established procedures to extensively test all relevant aspects of a site and its capability to sustain the various CMS computing workflows at the required scale. The Site Readiness monitoring infrastructure has been instrumental in understanding how the system as a whole was improving towards LHC operations, in measuring the reliability of sites when running CMS activities, and in providing sites with the information they need to troubleshoot any problem. This contribution reviews the complete automation of the Site Readiness program, describing the monitoring tools and their inclusion in the Site Status Board (SSB), the performance checks, the use of tools like HammerCloud, and the impact on the overall reliability of the Grid from the point of view of the CMS computing system. CMS uses these results to select good sites on which to run workflows, in order to maximize workflow efficiency. The performance of the sites against these tests during the first years of LHC running is also reviewed.
ISSN: 1742-6588, 1742-6596
DOI: 10.1088/1742-6596/396/3/032041