Efficient extraction of experimental data from line charts using advanced machine learning techniques

Bibliographic Details
Published in Graphical models Vol. 139; p. 101259
Main Authors Yang, Wenjin; He, Jie; Zhang, Xiaotong
Format Journal Article
Language English
Published Elsevier Inc 01.06.2025
Elsevier
Summary: Line charts, as a common data visualization tool in scientific research and business analysis, encapsulate rich experimental data. However, existing data extraction tools face challenges such as low automation levels and difficulties in handling complex charts. This paper proposes a novel method for extracting data from line charts, reformulating the extraction problem as an instance segmentation task, and introducing the Mamba-enhanced Transformer mask query method along with a curve mask-guided training approach to address challenges such as long dependencies and intersections in curve detection. Additionally, YOLOv9 is utilized for the detection and classification of chart elements, and a text recognition dataset comprising approximately 100K charts is constructed. An LSTM-based attention mechanism is employed for precise scale value recognition. Lastly, we present a method for automatically converting image data into structured JSON data, significantly enhancing the efficiency and accuracy of data extraction. Experimental results demonstrate that this method exhibits high efficiency and accuracy in handling complex charts, achieving an average extraction accuracy of 93% on public datasets, significantly surpassing the current state-of-the-art methods. This research provides an efficient foundation for large-scale scientific data analysis and machine learning model development, advancing the field of automated data extraction technology.
ISSN:1524-0703
DOI:10.1016/j.gmod.2025.101259
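
Illustrative note: the summary's last pipeline step, turning a segmented curve plus recognized scale values into structured JSON, can be sketched roughly as below. This is not the authors' implementation. The function names (`linear_map`, `mask_to_series`), the assumption of linear axes calibrated from two recognized ticks each, the per-column averaging of mask pixels, and the JSON layout are all hypothetical choices made only for illustration.

```python
import json
from collections import defaultdict


def linear_map(p0, v0, p1, v1):
    """Map a pixel coordinate to a data value on a linear axis.

    (p0, v0) and (p1, v1) are two recognized ticks: pixel position and
    the scale value read at that position (assumed inputs).
    """
    scale = (v1 - v0) / (p1 - p0)
    return lambda p: v0 + (p - p0) * scale


def mask_to_series(mask_pixels, x_ticks, y_ticks):
    """Convert one curve's mask pixels into a list of {"x", "y"} points.

    mask_pixels: iterable of (column, row) pixel coordinates of the curve.
    x_ticks, y_ticks: two (pixel, value) pairs per axis.
    """
    to_x = linear_map(*x_ticks[0], *x_ticks[1])
    to_y = linear_map(*y_ticks[0], *y_ticks[1])

    # A curve mask is usually several pixels thick, so collapse each
    # image column to the mean row of its mask pixels.
    rows_by_col = defaultdict(list)
    for col, row in mask_pixels:
        rows_by_col[col].append(row)

    return [
        {"x": to_x(col), "y": to_y(sum(rows) / len(rows))}
        for col, rows in sorted(rows_by_col.items())
    ]


if __name__ == "__main__":
    # Hypothetical inputs: image rows grow downward, so the y axis is
    # calibrated with a larger pixel row for the smaller value.
    pixels = [(100, 400), (100, 402), (200, 300)]
    series = mask_to_series(
        pixels,
        x_ticks=[(100, 0.0), (300, 10.0)],
        y_ticks=[(500, 0.0), (100, 4.0)],
    )
    print(json.dumps({"curves": [{"label": "curve_0", "points": series}]}, indent=2))
```

Logarithmic axes, legend-to-curve matching, and handling of curves that double back over the same column are outside this sketch and would need separate treatment.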