Performance evaluation of outlier detection techniques in production timeseries: A systematic review and meta-analysis

Bibliographic Details
Published in Expert systems with applications Vol. 191; p. 116371
Main Authors Alimohammadi, Hamzeh, Nancy Chen, Shengnan
Format Journal Article
Language English
Published New York Elsevier Ltd 01.04.2022
Elsevier BV
Summary:
• Evaluated 17 algorithms for outlier detection in oil/gas production timeseries.
• Compared the performance of the 17 algorithms based on 6 different metrics.
• Selected the top 8 algorithms based on their performance on synthetic data.
• Compared the performance of the top 8 techniques on two real datasets.
• KNN is a simple yet top-performing technique that can handle different production trends.

Time-series data have been extensively collected and analyzed in many disciplines, such as stock markets, medical diagnosis, meteorology, and the oil and gas industry. Much of the data in these disciplines consists of sequences of observations measured as functions of time, which can be used for different applications via analytical or data analytics techniques (e.g., to forecast future prices or climate change). However, the presence of outliers can introduce significant uncertainty into the interpretation of results; hence, it is essential to remove outliers accurately and efficiently before conducting any further analysis. A total of 17 techniques from the statistical, regression-based, and machine learning (ML) based categories of timeseries outlier detection are applied to oil and gas production data. Fifteen of these methods are used for production data analysis for the first time. Two state-of-the-art, high-performance techniques that require minimal control and have low time complexity are then selected for data cleaning. Moreover, the techniques are ranked using several metrics, including accuracy, precision, recall, F1 score, and Cohen’s Kappa. Results show that eight unsupervised algorithms outperform the rest of the methods on the synthetic case study with known outliers. For example, the eight shortlisted methods achieve accuracies of 0.83–0.99 and precisions of 0.83–0.98, compared to 0.65–0.82 and 0.07–0.77, respectively, for the others.
In addition, ML-based techniques perform better than statistical techniques. Our experimental results on real field data further indicate that the k-nearest neighbor (KNN) and Fulford-Blasingame methods outperform the other outlier detection frameworks on production data, followed by four others, including density-based spatial clustering of applications with noise (DBSCAN) and angle-based outlier detection (ABOD). Although the techniques are examined with oil and gas production data, the same data cleaning workflow can be used to detect timeseries outliers in other disciplines.
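
As a rough illustration of the kind of workflow the summary describes, the following is a minimal Python sketch, not the authors' implementation. It scores each point of a synthetic exponential decline curve by its mean distance to its k nearest neighbours, flags the highest-scoring points, and then computes accuracy, precision, recall, F1 score, and Cohen's Kappa against the known outlier labels. The decline parameters, the spike sizes, the value of `k`, the min-max scaling, the `contamination` fraction, and all function names are assumptions made for this example only.

```python
import math


def synthetic_decline(n=60, qi=1000.0, d=0.03, spikes=None):
    """Exponential decline curve with known additive outliers (all values assumed)."""
    spikes = spikes or {10: 300.0, 25: -300.0, 40: 300.0, 50: 300.0}
    rates = [qi * math.exp(-d * i) + spikes.get(i, 0.0) for i in range(n)]
    labels = [1 if i in spikes else 0 for i in range(n)]
    return rates, labels


def min_max(values):
    """Rescale values to [0, 1] so time and rate contribute comparably to distances."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def knn_scores(points, k=3):
    """Outlier score: mean Euclidean distance to the k nearest neighbours."""
    scores = []
    for i, (ti, qi) in enumerate(points):
        dists = sorted(
            math.hypot(ti - tj, qi - qj)
            for j, (tj, qj) in enumerate(points)
            if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_outliers(rates, k=3, contamination=0.05):
    """Flag the fraction `contamination` of points with the largest KNN scores."""
    n = len(rates)
    points = list(zip(min_max(list(range(n))), min_max(rates)))
    scores = knn_scores(points, k)
    n_flag = max(1, round(contamination * n))
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    flagged = set(ranked[:n_flag])
    return [1 if i in flagged else 0 for i in range(n)]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1 and Cohen's Kappa for binary labels (1 = outlier)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Agreement expected by chance, from the marginal label frequencies.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


rates, labels = synthetic_decline()
predicted = flag_outliers(rates, k=3, contamination=4 / 60)
print(classification_metrics(labels, predicted))
```

Here the contamination fraction is set from the known number of injected outliers, which is only possible on synthetic data; on field data it would have to be estimated or replaced by a score threshold.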
ISSN:0957-4174
1873-6793
DOI:10.1016/j.eswa.2021.116371