Fast data reduction by space partitioning via convex hull and MBR computation


Bibliographic Details
Published in	Pattern Recognition, Vol. 126, p. 108553
Main Authors	Giorginis, Thomas; Ougiaroglou, Stefanos; Evangelidis, Georgios; Dervos, Dimitris A.
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.06.2022
Subjects	Big training data; Classification; Convex hull; Minimum bounding rectangle (MBR); Prototype generation; Reduction by space partitioning; RSP3

Summary:
• Two variations of the RSP3 prototype generation algorithm for data reduction are proposed. They are very fast and can be used on large datasets.
• The proposed variations exploit the notions of the convex hull and the minimum bounding rectangle (MBR) to approximate the diameter of a dataset.
• The first variation (RSP3-QH3d) uses the Quick Hull algorithm for 3d convex hull computation. The farthest instances of the 3d convex hull serve as an approximation of the diameter of a dataset.
• The second variation (RSP3-MBR) uses the farthest instances of a dataset's MBR to approximate its diameter.
• The experimental results show that both proposed algorithms achieve accuracy and reduction rates similar to those of RSP3 at a fraction of its CPU cost.

Large volumes of training data introduce high computational cost in instance-based classification. Data reduction algorithms select or generate a small set of representative training prototypes (a condensing set) from the available training data. The Reduction by Space Partitioning (RSP3) algorithm is one of the best-known prototype generation algorithms. It repeatedly divides the original training data into subsets, and each division needs to identify the diameter of the subset being split, i.e., its two farthest instances. This is costly because it requires computing all pairwise distances between the instances in each subset. The paper introduces two new, very fast variations that, instead of computing the actual diameter of a subset, choose a pair of distant-enough instances. The first variation uses instances belonging to an exact 3d convex hull of the subset, while the second uses instances belonging to the minimum bounding rectangle of the subset. The experimental study shows that the new variations run at a fraction of the CPU cost of the original algorithm, with no penalty in classification accuracy or reduction rate.
ISSN:	0031-3203 1873-5142
DOI:	10.1016/j.patcog.2022.108553
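
Illustrative sketch (not taken from the article): the summary contrasts computing the actual diameter of a subset with picking a pair of distant-enough instances from its MBR. The Python fragment below shows one plausible reading of that idea. The function names `exact_diameter` and `mbr_diameter` are hypothetical, and the candidate rule (instances attaining the minimum or maximum value in some dimension) is an assumption about how "instances belonging to the MBR" might be chosen, not the authors' exact procedure.

```python
from itertools import combinations
from math import dist


def exact_diameter(points):
    """Farthest pair by brute force: compares all O(n^2) pairs."""
    return max(combinations(points, 2), key=lambda pair: dist(*pair))


def mbr_diameter(points):
    """Approximate farthest pair using only instances that touch the MBR.

    Assumption: an instance "belongs to the MBR" if it attains the minimum
    or maximum value in at least one dimension. This gives at most 2*d
    candidates for d dimensions, so the pairwise search is independent of n.
    Requires at least two distinct points, given as tuples.
    """
    candidates = set()
    for d in range(len(points[0])):
        candidates.add(min(points, key=lambda p: p[d]))  # touches lower face
        candidates.add(max(points, key=lambda p: p[d]))  # touches upper face
    return max(combinations(candidates, 2), key=lambda pair: dist(*pair))


if __name__ == "__main__":
    subset = [(0, 5), (5, 0), (10, 5), (5, 10), (1.2, 1.2), (8.8, 8.8)]
    print("exact:", dist(*exact_diameter(subset)))
    print("MBR  :", dist(*mbr_diameter(subset)))
```

In this sketch the approximate pair can be shorter than the true diameter (the two interior points near the corners are farther apart than any pair touching the MBR), which matches the summary's description of a "distant-enough" pair rather than the actual farthest one.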