Scatteract: Automated Extraction of Data from Scatter Plots

Bibliographic Details
Published in Machine Learning and Knowledge Discovery in Databases, Vol. 10534, pp. 135-150
Main Authors Cliche, Mathieu; Rosenberg, David; Madeka, Dhruv; Yee, Connie
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 2017
Springer International Publishing
Series Lecture Notes in Computer Science
Summary: Charts are an excellent way to convey patterns and trends in data, but they do not facilitate further modeling of the data or close inspection of individual data points. We present a fully automated system for extracting the numerical values of data points from images of scatter plots. We use deep learning techniques to identify the key components of the chart, and optical character recognition together with robust regression to map from pixels to the coordinate system of the chart. We focus on scatter plots with linear scales, which already have several interesting challenges. Previous work has done fully automatic extraction for other types of charts, but to our knowledge this is the first approach that is fully automatic for scatter plots. Our method performs well, achieving successful data extraction on 89% of the plots in our test set.
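
The pixel-to-coordinate step mentioned in the summary can be illustrated with a small sketch. This is not the chapter's implementation: the summary says only "robust regression", so the Theil-Sen estimator used here is a stand-in chosen for brevity, and the function names (`fit_axis_mapping`, `pixel_to_value`) and the tick data are hypothetical. The sketch assumes a linear axis scale and that tick labels have already been located and read by OCR.

```python
from itertools import combinations
from statistics import median


def fit_axis_mapping(ticks):
    """Fit value = slope * pixel + intercept for one axis.

    ticks: list of (pixel_position, ocr_value) pairs.
    Uses the Theil-Sen estimator (median of pairwise slopes), an assumed
    stand-in for whatever robust regression the chapter actually uses.
    """
    slopes = [
        (v2 - v1) / (p2 - p1)
        for (p1, v1), (p2, v2) in combinations(ticks, 2)
        if p2 != p1
    ]
    if not slopes:
        raise ValueError("need at least two ticks at distinct pixel positions")
    slope = median(slopes)
    # The median of the residual offsets gives an outlier-tolerant intercept.
    intercept = median(v - slope * p for p, v in ticks)
    return slope, intercept


def pixel_to_value(pixel, mapping):
    """Convert a pixel position to a chart coordinate using a fitted mapping."""
    slope, intercept = mapping
    return slope * pixel + intercept


# Hypothetical x-axis ticks: (pixel column, value read by OCR).
# The last tick simulates an OCR misread (a label of 40 read as 10).
x_ticks = [(100, 0.0), (200, 10.0), (300, 20.0), (400, 30.0), (500, 10.0)]
x_mapping = fit_axis_mapping(x_ticks)
x_value = pixel_to_value(250, x_mapping)
```

With these made-up ticks, the single misread label does not shift the fitted line, whereas an ordinary least-squares fit would be pulled toward it. The same function applies to the y axis, where the fitted slope is typically negative because image rows are counted downward.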
Bibliography: D. Madeka (work done while the author was at Bloomberg L.P.)
ISBN: 3319712489
9783319712482
ISSN: 0302-9743
1611-3349
DOI: 10.1007/978-3-319-71249-9_9