Improving Model Performance Using Metric-Guided Data Selection Framework

Bibliographic Details
Published in: 2022 IEEE International Conference on Big Data (Big Data), pp. 4750-4757
Main Authors: Isaza, Paulina Toro; Deng, Yu; Nidd, Michael; Azad, Amar Prakash; Shwartz, Laura
Format: Conference Proceeding
Language: English
Published: IEEE, 17.12.2022
Summary: The noisiness and low quality of IT operations management data are a major challenge in using machine learning to assist IT operations management. Our system mitigates this challenge by automatically measuring data quality and then using the results to select data subsets that yield improved model performance. Based on a set of metrics that quantify the quality of a corpus containing both structured and unstructured data, we propose a framework to automatically identify "well behaved" subsets in the corpus. By streaming input data to separate models for these subsets, we achieve better performance than a model trained on the full dataset. We present a motivating example that inspired our approach, as well as a deployment case study of our system based on engagements with two clients, which demonstrates that the proposed methodology is effective for detecting such subsets to improve model performance.
DOI: 10.1109/BigData55660.2022.10020746