Towards intelligent database systems using clusters of SQL transactions

Transactions are the bread-and-butter of database management system (DBMS) industry. When you check your bank balance, pay bill, or move money from saving to chequing account, transactions are involved. That transactions are self-similar—whether you pay a utility company or credit card, it is still...

Full description

Saved in:
Bibliographic Details
Published inKnowledge and information systems Vol. 65; no. 7; pp. 2863 - 2894
Main Author Marathe, Arunprasad P.
Format Journal Article
LanguageEnglish
Published London Springer London 01.07.2023
Springer Nature B.V
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Transactions are the bread-and-butter of database management system (DBMS) industry. When you check your bank balance, pay bill, or move money from saving to chequing account, transactions are involved. That transactions are self-similar—whether you pay a utility company or credit card, it is still a ‘pay bill’ transaction—has been noted before. Somewhat surprisingly, that property remains largely unexploited, barring some notable exceptions. The research reported in this paper begins to build ‘intelligence’ into database systems by offering built-in transaction classification and clustering. The utility of such an approach is demonstrated by showing how it simplifies DBMS monitoring and troubleshooting. The well-known DBSCAN algorithm clusters online transaction processing (OLTP) transactions: this paper’s contribution is in demonstrating a robust server-side feature extraction approach, rather than the previously suggested and error-prone log-mining approach. It is shown how ‘DBSCAN + angular cosine distance function’ finds better clusters than the previously tried combinations, and simplifies DBSCAN parameter tuning—a known nontrivial task. DBMS troubleshooting efficacy is demonstrated by identifying the root causes of several real-life performance problems: problematic transaction rollbacks; performance drifts; system-wide issues; CPU and memory bottlenecks; and so on. It is also shown that the cluster count remains unchanged irrespective of system load—a desirable but often overlooked property. The transaction clustering solution has been implemented inside the popular MySQL DBMS, although most modern relational database systems can benefit from the ideas described herein.
ISSN:0219-1377
0219-3116
DOI:10.1007/s10115-023-01850-5