A survey on handwritten mathematical expression recognition: The rise of encoder-decoder and GNN models

Bibliographic Details
Published in Pattern Recognition Vol. 153; p. 110531
Main Authors Truong, Thanh-Nghia; Nguyen, Cuong Tuan; Zanibbi, Richard; Mouchère, Harold; Nakagawa, Masaki
Format Journal Article
Language English
Published Elsevier Ltd 01.09.2024
Summary:
• In-depth survey on recent handwritten mathematical expression recognition methods.
• Paradigm shift from grammar/graph-based parsing to DNN models with attention.
• Comparisons of the methods on open datasets and discussions of their pros and cons.
• Reviewing the progress and presenting remaining challenges.

Recognition of handwritten mathematical expressions (HMEs) has attracted growing interest due to steady progress in handwriting recognition techniques and the rapid emergence of pen- and touch-based devices. Math formula recognition may be understood as a generalization of text recognition: formulas represent mathematical statements using a two-dimensional arrangement of symbols on writing lines that are organized hierarchically. This survey provides an overview of techniques published in the last decade, including those taking input from handwritten strokes (i.e., ‘online’, as captured by a pen/touch device), raster images (i.e., ‘offline’, from pixels), or both. Traditionally, HMEs were recognized by performing four structural pattern recognition tasks in separate steps: (1) symbol segmentation, (2) symbol classification, (3) spatial relationship classification, and (4) structural analysis, which identifies the arrangement of symbols on writing lines (e.g., in a Symbol Layout Tree (SLT) or LaTeX string). Recently, encoder–decoder neural network models and Graph Neural Network (GNN) approaches have greatly increased HME recognition accuracy. These newer approaches perform all recognition tasks simultaneously, and utilize contextual features across tasks (e.g., using neural self-attention models). We also discuss evaluation techniques and benchmarks, and explore some implicit dependencies among the four key recognition tasks. Finally, we identify limitations of current systems, and present suggestions for future work, such as using two-dimensional language models rather than the one-dimensional models commonly used in encoder–decoder models.
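
Illustration (not from the article): the summary names a Symbol Layout Tree (SLT) and a LaTeX string as two possible outputs of structural analysis. The sketch below shows, under simplifying assumptions, how one might relate the two. The class name `SLTNode`, the function `to_latex`, and the three relation labels ("sup", "sub", "right") are hypothetical choices made for this example; actual systems covered by the survey may use different relation sets and data structures.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SLTNode:
    """One symbol in a toy Symbol Layout Tree (hypothetical structure)."""
    symbol: str
    # Maps an assumed spatial-relation label ("sup", "sub", "right")
    # to the child node reached through that relation.
    children: Dict[str, "SLTNode"] = field(default_factory=dict)


def to_latex(node: SLTNode) -> str:
    """Serialize a toy SLT to a LaTeX string by depth-first traversal."""
    out = node.symbol
    if "sub" in node.children:
        out += "_{" + to_latex(node.children["sub"]) + "}"
    if "sup" in node.children:
        out += "^{" + to_latex(node.children["sup"]) + "}"
    if "right" in node.children:
        # "right" continues along the same writing line.
        out += " " + to_latex(node.children["right"])
    return out


# Build the tree for: x^2 + y_i
y = SLTNode("y", {"sub": SLTNode("i")})
plus = SLTNode("+", {"right": y})
expr = SLTNode("x", {"sup": SLTNode("2"), "right": plus})

print(to_latex(expr))
```

In this toy form, each edge of the tree corresponds to one spatial relationship decision (task 3 in the summary), and the traversal that produces the string stands in for a very small part of structural analysis (task 4).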
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2024.110531