lrSVD: An efficient imputation algorithm for incomplete high‐throughput compositional data

Bibliographic Details
Published in	Journal of chemometrics Vol. 36; no. 12
Main Authors	Palarea‐Albaladejo, Javier; Martín‐Fernández, Josep Antoni; Ruiz‐Gazen, Anne; Thomas‐Agnan, Christine
Format	Journal Article
Language	English
Published	Chichester: Wiley Subscription Services, Inc, 01.12.2022
Subjects	Algorithms; Chemometrics; compositional data; log‐ratio analysis; Missing data; singular value decomposition; zeros

More Information
Summary:	Compositional methods have been successfully integrated into the chemometric toolkit to analyse and model different types of data generated by modern high‐throughput technologies. Within this compositional framework, the focus is put on the relative information conveyed in the data by using log‐ratio coordinate representations. However, log‐ratios cannot be computed when the data matrix is not complete. A new computationally efficient data imputation algorithm based on compositional principles and aimed at high‐throughput continuous‐valued compositions is introduced that relies on a constrained low‐rank matrix approximation of the data. Simulation and real metabolomics data are used to demonstrate its performance and ability to deal with different forms of incomplete data: zeros, nondetects, missing values or a combination of them. The computer routines lrSVD and lrSVDplus are implemented in the R package zCompositions to facilitate its use by practitioners. Compositional methods are used to analyse modern high‐throughput data. They focus on the relative information by using log‐ratio coordinate representations. However, log‐ratios cannot be computed from data sets containing zeros or other forms of incomplete data. A computationally efficient imputation algorithm is introduced that is able to deal with zeros, nondetects, missing values or a combination of them. Simulation and real metabolomics data are used to demonstrate its performance and features. Computer routines are implemented in the R package zCompositions.
Bibliography:	Funding information: French National Research Agency (ANR), Grant/Award Number: ANR‐17‐EURE‐0010; Spanish Ministry of Science and Innovation, Grant/Award Numbers: MCIN/AEI/10.13039/501100011033, PID2021‐123833OB‐I00
ISSN:	0886-9383 1099-128X
DOI:	10.1002/cem.3459
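
Illustrative sketch:	The summary describes imputation through a low‐rank approximation of the data in log‐ratio coordinates. The Python sketch below shows that general idea only. It is not the lrSVD or lrSVDplus routine from zCompositions, and it is not taken from the article. The function name `lowrank_clr_impute`, its parameters, and the starting values are assumptions made for illustration. It handles missing values coded as NaN, assumes every row and every column has at least one observed positive part, and does not apply the detection‐limit constraints that zeros and nondetects would require.

```python
import numpy as np


def lowrank_clr_impute(x, rank=2, max_iter=500, tol=1e-8):
    """Fill NaN cells of a positive data matrix by iterating a truncated SVD
    in centred log-ratio (clr) coordinates. Hypothetical helper, not lrSVD."""
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    if not missing.any():
        return x.copy()
    observed = ~missing
    n_obs = observed.sum(axis=1, keepdims=True)

    # Work on the log scale; missing cells temporarily hold log(1) = 0.
    log_x = np.log(np.where(missing, 1.0, x))
    # Assumed starting values: column means of the observed logs.
    col_mean = (log_x * observed).sum(axis=0) / observed.sum(axis=0)
    log_x = np.where(missing, col_mean, log_x)

    for _ in range(max_iter):
        # clr coordinates: subtract each row's mean log.
        z = log_x - log_x.mean(axis=1, keepdims=True)
        mu = z.mean(axis=0)
        # Rank-limited approximation of the column-centred clr matrix.
        u, s, vt = np.linalg.svd(z - mu, full_matrices=False)
        z_hat = (u[:, :rank] * s[:rank]) @ vt[:rank] + mu
        # Row offset that aligns the fit with the observed logs, so imputed
        # parts stay on the same scale as the observed parts of that row.
        offset = ((log_x - z_hat) * observed).sum(axis=1, keepdims=True) / n_obs
        new_log = np.where(missing, z_hat + offset, log_x)
        delta = np.max(np.abs(new_log - log_x))
        log_x = new_log
        if delta < tol:
            break

    # Observed cells are returned untouched, so their ratios are preserved.
    return np.where(missing, np.exp(log_x), x)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    demo = np.exp(rng.normal(size=(20, 1)) @ rng.normal(size=(1, 5)))
    demo[2, 3] = np.nan
    demo[7, 0] = np.nan
    print(lowrank_clr_impute(demo, rank=1))
```

Only the missing cells are overwritten, which keeps the ratios between observed parts intact. This is one simple way to respect the relative information that the compositional approach focuses on.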