GTT: Leveraging data characteristics for guiding the tensor train decomposition

Bibliographic Details
Published in	Information systems (Oxford) Vol. 108; p. 102047
Main Authors	Li, Mao-Lin; Candan, K. Selçuk; Sapino, Maria Luisa
Format	Journal Article
Language	English
Published	Oxford: Elsevier Ltd; Elsevier Science Ltd, 1 September 2022
Subjects	Algorithms; Audio data; Data mining; Decomposition; Information systems; Low-rank embedding; Mathematical analysis; Multimedia; Order selection; Tensor train decomposition; Tensors

More Information
Summary:	The demand for searching and querying multimedia data, such as images, video, and audio, is omnipresent, and effectively accessing these data for various applications is a critical task. However, such data are usually encoded as multi-dimensional arrays, or tensors, and traditional data mining techniques may be limited by the curse of dimensionality. Tensor decomposition has been proposed to alleviate this issue. Commonly used tensor decomposition algorithms include CP decomposition (which seeks a diagonal core) and Tucker decomposition (which seeks a dense core). Naturally, Tucker retains more information, but because its core is dense, it is also subject to memory growth that is exponential in the number of tensor modes. Tensor train (TT) decomposition addresses this problem by seeking a sequence of three-mode cores; unfortunately, there are currently no guidelines for selecting the decomposition sequence. In this paper, we propose GTT, a method for guiding the tensor train decomposition in selecting the decomposition sequence. GTT leverages data characteristics (including the number of modes, the lengths of the individual modes, density, the distribution of mutual information, and the distribution of entropy), as well as the target decomposition rank, to pick a decomposition order that preserves information. Experiments with various data sets demonstrate that GTT effectively guides the TT decomposition process toward decomposition sequences that better preserve accuracy.
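To make the memory argument concrete, the sketch below counts stored parameters for each format. It assumes a dense representation, a single target rank shared by every mode (CP, Tucker) and every interior TT bond, and TT bonds capped by the sizes of the corresponding unfoldings; `cp_params`, `tucker_params`, and `tt_params` are hypothetical helpers, not code from the paper.

```python
from math import prod

# Hypothetical helpers: rough dense storage counts, assuming one shared target rank.


def cp_params(dims, rank):
    """CP: `rank` weights (the diagonal core) plus an n_k x rank factor per mode."""
    return rank + sum(n * rank for n in dims)


def tucker_params(dims, rank):
    """Tucker: a dense rank x ... x rank core plus an n_k x rank factor per mode."""
    return rank ** len(dims) + sum(n * rank for n in dims)


def tt_params(dims, rank):
    """TT: core k has shape (r_{k-1}, n_k, r_k) with r_0 = r_N = 1.

    Each interior rank is capped at min(rank, product of the mode lengths to
    its left, product of the mode lengths to its right), so the count
    depends on the order in which `dims` is listed.
    """
    n_modes = len(dims)
    ranks = [1]
    for k in range(1, n_modes):
        ranks.append(min(rank, prod(dims[:k]), prod(dims[k:])))
    ranks.append(1)
    return sum(ranks[k] * dims[k] * ranks[k + 1] for k in range(n_modes))


if __name__ == "__main__":
    # Ten values per mode, target rank 5: the Tucker core grows as 5**N,
    # while the total size of the TT cores grows linearly in N.
    for n_modes in (3, 6, 10):
        dims = [10] * n_modes
        print(n_modes, cp_params(dims, 5), tucker_params(dims, 5), tt_params(dims, 5))
```

Run as a script, it prints the CP, Tucker, and TT counts for 3, 6, and 10 modes; at 10 modes the Tucker count is 9,766,125 while the TT count is 2,100. Because `tt_params` reads `dims` left to right, reordering modes of unequal length changes the TT count, e.g. 613 parameters for mode lengths (2, 100, 3) versus 645 for (100, 2, 3) at rank 10. This is only a storage view of why the decomposition order matters, not GTT's accuracy-based selection criterion.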
• We identify significant relationships between various data characteristics and the accuracies of different tensor train decomposition orders.
• We propose four order selection strategies for tensor train decomposition: (a) aggregate mutual information (AMI), (b) path mutual information (PMI), (c) inverse entropy (IE), and (d) number of parameters (NP).
• We show that good tensor train orders can be selected through a hybrid (HYB) strategy that takes into account multiple characteristics of the given data sets (15 categorical-valued and 3 continuous-valued).
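The record names a path mutual information (PMI) strategy but does not define it. As a hypothetical illustration only, the sketch below scores each mode order by the summed empirical mutual information between adjacent modes and returns the highest-scoring order, on the assumed intuition that strongly dependent modes should sit next to each other in the train. Treating nonzero cells as samples, the adjacency-sum score, the exhaustive search, and the names `mutual_information` and `path_mi_order` are all assumptions, not GTT's published definitions.

```python
from collections import Counter
from itertools import combinations, permutations
from math import log2


def mutual_information(entries, a, b):
    """Empirical mutual information (in bits) between the indices of modes a and b.

    Each listed entry (e.g. a nonzero cell of a categorical tensor) is treated
    as one sample of the joint distribution of mode indices.
    """
    n = len(entries)
    joint = Counter((e[a], e[b]) for e in entries)
    marg_a = Counter(e[a] for e in entries)
    marg_b = Counter(e[b] for e in entries)
    # sum over (x, y) of p(x, y) * log2(p(x, y) / (p(x) * p(y))), written with counts
    return sum(
        (c / n) * log2(c * n / (marg_a[x] * marg_b[y]))
        for (x, y), c in joint.items()
    )


def path_mi_order(entries, n_modes):
    """Hypothetical PMI-style order: maximize the summed MI of adjacent modes.

    Exhaustive over all n_modes! permutations, so only suitable for few modes.
    """
    mi = {}
    for a, b in combinations(range(n_modes), 2):
        mi[a, b] = mi[b, a] = mutual_information(entries, a, b)

    def path_score(order):
        return sum(mi[order[k], order[k + 1]] for k in range(n_modes - 1))

    # max() keeps the first best permutation, so ties resolve lexicographically
    return max(permutations(range(n_modes)), key=path_score)


if __name__ == "__main__":
    # Modes 0 and 2 always carry the same index; mode 1 varies independently.
    entries = [(i, j, i) for i in range(4) for j in range(4)]
    print(path_mi_order(entries, 3))
```

On the toy data in the `__main__` block, modes 0 and 2 share 2 bits of mutual information and mode 1 shares none with either, so the sketch returns (0, 2, 1), keeping modes 0 and 2 adjacent. The exhaustive search visits N! orders, which quickly becomes infeasible as the number of modes grows.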
ISSN:	0306-4379; 1873-6076
DOI:	10.1016/j.is.2022.102047