PinSQL: Pinpoint Root Cause SQLs to Resolve Performance Issues in Cloud Databases

Bibliographic Details
Published in: 2022 IEEE 38th International Conference on Data Engineering (ICDE), pp. 2549-2561
Main Authors: Liu, Xiaoze; Yin, Zheng; Zhao, Chao; Ge, Congcong; Chen, Lu; Gao, Yunjun; Li, Dimeng; Wang, Ziting; Liang, Gaozhong; Tan, Jian; Li, Feifei
Format: Conference Proceeding
Language: English
Published: IEEE, 01.05.2022

More Information
Summary: Deploying database services on cloud systems has become increasingly popular and is now common practice in the industry. However, complicated cloud environments make performance issues inevitable, and these issues can violate the service level guarantee if they are not addressed in a timely manner. Among the various problems, anomalous SQL queries are the most commonly reported source of performance issues in database applications. These anomalous queries can be divided into High-impact SQLs (H-SQLs), which are correlated with the anomalies, and Root Cause SQLs (R-SQLs), which are the actual root causes of the performance issue. When a large number of queries are present, pinpointing the R-SQLs is far more difficult than identifying the H-SQLs. To address this challenge, we aim to automatically pinpoint the R-SQLs and thereby resolve performance issues in cloud databases. This paper introduces PinSQL, an autonomous diagnosis system for Alibaba Cloud consisting of four sequentially executed modules: data collection and pre-processing, anomaly detection, root cause analysis, and repair actions. First, the relevant performance metrics and query logs are collected from the monitored cloud database instances and aggregated as data sources. Then, efficient real-time anomaly detection is performed on these inputs. When an anomaly is detected, the R-SQLs are pinpointed by tracking the propagation chain of the involved SQLs. Finally, repair actions are suggested and then executed on the R-SQLs to address the anomalies. Extensive experiments on an Alibaba production system show that PinSQL achieves 80% accuracy in pinpointing the top-1 R-SQL and, as a result, successfully resolves the database performance issues.
ISSN: 2375-026X
DOI: 10.1109/ICDE53745.2022.00236
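The Summary describes a detect-then-diagnose pipeline. The Python sketch below illustrates only its earlier steps, under stated assumptions: query logs are aggregated into per-template time series, a generic rolling z-score detector (a stand-in, not PinSQL's actual detector) flags the anomaly onset on a database metric such as active sessions, and SQL templates are ranked by Pearson correlation with that metric. Correlation of this kind surfaces H-SQL candidates only. It does not reproduce PinSQL's propagation-chain analysis for R-SQLs. All function names, the sample workload, the window size, and the threshold are hypothetical.

```python
from collections import defaultdict
import statistics


def aggregate_by_template(query_log, num_buckets, bucket_seconds):
    """Sum execution time (ms) per SQL template into fixed-width time buckets."""
    series = defaultdict(lambda: [0.0] * num_buckets)
    for ts, template, latency_ms in query_log:
        bucket = int(ts // bucket_seconds)
        if 0 <= bucket < num_buckets:  # ignore records outside the observed range
            series[template][bucket] += latency_ms
    return dict(series)


def detect_anomalies(metric, window=5, threshold=3.0):
    """Return indices whose |z-score| against the preceding `window` points exceeds `threshold`.

    Generic rolling z-score detector, used here only as a hypothetical stand-in.
    """
    anomalies = []
    for i in range(window, len(metric)):
        history = metric[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            continue  # flat history: z-score is undefined, so skip this point
        if abs(metric[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def rank_h_sql_candidates(template_series, metric, top_k=3):
    """Rank templates by how strongly their load correlates with the metric (H-SQL candidates)."""
    scores = {t: pearson(s, metric) for t, s in template_series.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical workload: 10 one-minute buckets, three SQL templates.
query_log = []
for b in range(10):
    t = b * 60
    query_log.append((t + 1, "q_login", 5.0))                          # steady load
    query_log.append((t + 2, "q_cart", 12.0 if b % 2 == 0 else 8.0))  # oscillating load
    query_log.append((t + 3, "q_report", 300.0 if b >= 7 else 20.0))  # late spike

# Hypothetical database metric, e.g. active sessions per bucket.
metric = [10, 11, 10, 12, 11, 10, 11, 40, 42, 41]

series = aggregate_by_template(query_log, num_buckets=10, bucket_seconds=60)
print(detect_anomalies(metric))               # [7]
print(rank_h_sql_candidates(series, metric))  # ['q_report', 'q_login', 'q_cart']
```

Two design points are worth noting. First, the rolling detector flags only the onset (index 7), because once the spike enters the history window it inflates the baseline. Second, a high correlation shows only that a template moves with the anomaly. According to the Summary, separating the true R-SQLs from other H-SQLs is the job of PinSQL's propagation-chain analysis, which this sketch does not attempt.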