Topology-based representative datasets to reduce neural network training resources

Bibliographic Details
Published in: Neural Computing & Applications, Vol. 34, No. 17, pp. 14397–14413
Main Authors: Gonzalez-Diaz, Rocio; Gutiérrez-Naranjo, Miguel A.; Paluzo-Hidalgo, Eduardo
Format: Journal Article
Language: English
Published: London: Springer London (Springer Nature B.V.), 01.09.2022

Summary: One of the main drawbacks of the practical use of neural networks is the long time required for training. Training consists of iteratively changing the network's parameters to minimize a loss function, and these changes are driven by a dataset, which can be seen as a set of labeled points in an n-dimensional space. In this paper, we explore the concept of a representative dataset: a dataset smaller than the original one that satisfies a nearness condition independent of isometric transformations. Representativeness is measured using persistence diagrams, a computational topology tool, owing to their computational efficiency. We prove theoretically that when the neural network is a perceptron, the loss function is the mean squared error, and certain representativeness conditions on the dataset hold, the accuracy of the perceptron evaluated on the original dataset coincides with its accuracy evaluated on the representative dataset. These theoretical results, accompanied by experiments, open the door to reducing the size of the dataset to save time in the training of any neural network.
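
The representativeness check described in the summary can be illustrated concretely. Below is a minimal sketch, not the authors' code: it assumes the open-source Python libraries ripser and persim are installed, and it uses uniform random subsampling as an illustrative placeholder for the paper's construction of representative datasets. The bottleneck distance between persistence diagrams serves here as the topological nearness measure the summary alludes to.

```python
# Sketch: compare the persistence diagram of a dataset with that of a
# candidate representative subsample. Small bottleneck distance suggests
# the subsample preserves the dataset's topology.
import numpy as np
from ripser import ripser        # computes persistence diagrams
from persim import bottleneck    # bottleneck distance between diagrams

rng = np.random.default_rng(0)

# Original dataset: 500 labeled-point positions on a noisy circle in R^2
# (labels omitted; only the point cloud matters for the topology check).
theta = rng.uniform(0, 2 * np.pi, 500)
X = np.column_stack([np.cos(theta), np.sin(theta)])
X += rng.normal(0, 0.05, X.shape)

# Candidate representative dataset: a random 20% subsample (a placeholder
# for the paper's nearness-based construction).
idx = rng.choice(len(X), size=100, replace=False)
X_sub = X[idx]

# Dimension-1 persistence diagrams, which capture the circle's loop.
dgm_full = ripser(X)["dgms"][1]
dgm_sub = ripser(X_sub)["dgms"][1]

print("bottleneck distance:", bottleneck(dgm_full, dgm_sub))
```

In the paper's setting, a small distance indicates the subsample is a plausible stand-in for the full dataset during training; because persistence diagrams are stable under isometric transformations of the point cloud, the comparison matches the isometry-independent nearness condition mentioned above.
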
ISSN: 0941-0643, 1433-3058
DOI: 10.1007/s00521-022-07252-y