Efficient extraction of experimental data from line charts using advanced machine learning techniques
Published in | Graphical models Vol. 139; p. 101259 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Elsevier Inc, 01.06.2025 |
Subjects | |
Summary: | Line charts, a common data visualization tool in scientific research and business analysis, encapsulate rich experimental data. However, existing data extraction tools suffer from low levels of automation and have difficulty handling complex charts. This paper proposes a novel method for extracting data from line charts. It reformulates the extraction problem as an instance segmentation task and introduces a Mamba-enhanced Transformer mask query method, along with a curve mask-guided training approach, to address challenges such as long-range dependencies and intersections in curve detection. Additionally, YOLOv9 is used to detect and classify chart elements, and a text recognition dataset comprising approximately 100K charts is constructed. An LSTM-based attention mechanism is employed for precise scale value recognition. Lastly, the authors present a method for automatically converting image data into structured JSON data, significantly improving the efficiency and accuracy of data extraction. Experimental results show that the method handles complex charts efficiently and accurately, achieving an average extraction accuracy of 93% on public datasets and surpassing current state-of-the-art methods. This research provides an efficient foundation for large-scale scientific data analysis and machine learning model development, advancing the field of automated data extraction technology. |
---|---|
ISSN: | 1524-0703 |
DOI: | 10.1016/j.gmod.2025.101259 |
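
The final step described in the summary, converting detected curves and recognized scale values into structured JSON, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes linear axes, a binary mask per curve (as an instance segmentation model might produce), and tick marks already recognized as (pixel, value) pairs. All function names and the JSON layout (`fit_axis`, `mask_to_points`, `chart_to_json`, the `series`/`points` keys) are hypothetical.

```python
import json


def fit_axis(pixels, values):
    """Least-squares fit of value = a * pixel + b from recognized tick marks.

    Assumes a linear axis; a log axis would need a transform first.
    """
    n = len(pixels)
    mean_p = sum(pixels) / n
    mean_v = sum(values) / n
    var_p = sum((p - mean_p) ** 2 for p in pixels)
    if var_p == 0:
        raise ValueError("need at least two distinct tick positions")
    cov = sum((p - mean_p) * (v - mean_v) for p, v in zip(pixels, values))
    a = cov / var_p
    return a, mean_v - a * mean_p


def mask_to_points(mask):
    """Collapse a binary curve mask (list of rows of 0/1) to one
    (column, mean row) pixel coordinate per column the curve covers."""
    points = []
    for col in range(len(mask[0])):
        rows = [r for r in range(len(mask)) if mask[r][col]]
        if rows:
            points.append((col, sum(rows) / len(rows)))
    return points


def chart_to_json(curves, x_ticks, y_ticks):
    """Map each curve mask to data coordinates and serialize as JSON.

    curves:  dict of curve name -> binary mask (hypothetical input format)
    x_ticks: list of (pixel column, axis value) pairs
    y_ticks: list of (pixel row, axis value) pairs
    """
    ax, bx = fit_axis(*zip(*x_ticks))
    ay, by = fit_axis(*zip(*y_ticks))
    series = []
    for name, mask in curves.items():
        pts = [
            {"x": round(ax * c + bx, 6), "y": round(ay * r + by, 6)}
            for c, r in mask_to_points(mask)
        ]
        series.append({"name": name, "points": pts})
    return json.dumps({"series": series}, indent=2)


# Toy 5x5 mask: a straight line rising from bottom-left to top-right.
demo_mask = [[1 if r == 4 - c else 0 for c in range(5)] for r in range(5)]
result = chart_to_json(
    {"curve_0": demo_mask},
    x_ticks=[(0, 0.0), (4, 10.0)],    # pixel column -> x value
    y_ticks=[(4, 0.0), (0, 100.0)],   # pixel row -> y value (rows grow downward)
)
print(result)
```

Taking the mean row per column is a simplification that breaks down where curves are steep or intersect, which is the difficulty the summary says the mask-guided training approach targets.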