Advanced Analytics service to enhance workflow control at the ATLAS Production System

Modern workload management systems that are responsible for central data production and processing in High Energy and Nuclear Physics experiments have highly complicated architectures and require a specialized control service for resource and processing components balancing. Such a service represent...

Full description

Saved in:
Bibliographic Details
Published inEPJ Web of Conferences Vol. 214; p. 3007
Main Authors Titov, Mikhail, Borodin, Mikhail, Golubkov, Dmitry, Klimentov, Alexei
Format Journal Article Conference Proceeding
LanguageEnglish
Published Les Ulis EDP Sciences 01.01.2019
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Modern workload management systems that are responsible for central data production and processing in High Energy and Nuclear Physics experiments have highly complicated architectures and require a specialized control service for resource and processing components balancing. Such a service represents a comprehensive set of analytical tools, management utilities and monitoring views aimed at providing a deep understanding of internal processes, and is considered as an extension for situational awareness analytic service. Its key points are analysis of task processing, e.g., selection and regulation of key task features that affect its processing the most; modeling of processed data life-cycles for further analysis, e.g., generate guidelines for particular stage of data processing; and forecasting processes with focus on data and tasks states as well as on the management system itself, e.g., to detect the source of any potential malfunction. The prototype of the advanced analytics service will be an essential part of the analytical service of the ATLAS Production System (ProdSys2). Advanced analytics service uses such tools as Time-To-Complete (TTC) estimation towards units of the processing (i.e., tasks and chains of tasks) to control the processing state and to be able to highlight abnormal operations and executions. Obtained metrics are used in decision making processes to regulate the system behaviour and resources consumption.
ISSN:2100-014X
2101-6275
2100-014X
DOI:10.1051/epjconf/201921403007