lrSVD: An efficient imputation algorithm for incomplete high‐throughput compositional data

Bibliographic Details
Published in: Journal of Chemometrics, Vol. 36, no. 12
Main Authors: Palarea‐Albaladejo, Javier; Martín‐Fernández, Josep Antoni; Ruiz‐Gazen, Anne; Thomas‐Agnan, Christine
Format: Journal Article
Language: English
Published: Chichester, Wiley Subscription Services, Inc, 01.12.2022
Summary: Compositional methods have been successfully integrated into the chemometric toolkit to analyse and model different types of data generated by modern high‐throughput technologies. Within this compositional framework, the focus is put on the relative information conveyed in the data by using log‐ratio coordinate representations. However, log‐ratios cannot be computed when the data matrix is not complete. A new computationally efficient data imputation algorithm based on compositional principles and aimed at high‐throughput continuous‐valued compositions is introduced that relies on a constrained low‐rank matrix approximation of the data. Simulation and real metabolomics data are used to demonstrate its performance and ability to deal with different forms of incomplete data: zeros, nondetects, missing values or a combination of them. The computer routines lrSVD and lrSVDplus are implemented in the R package zCompositions to facilitate its use by practitioners.

Compositional methods are used to analyse modern high‐throughput data. They focus on the relative information by using log‐ratio coordinate representations. However, log‐ratios cannot be computed from data sets containing zeros or other forms of incomplete data. A computationally efficient imputation algorithm is introduced that is able to deal with zeros, nondetects, missing values or a combination of them. Simulation and real metabolomics data are used to demonstrate its performance and features. Computer routines are implemented in the R package zCompositions.
Bibliography: Funding information: French National Research Agency (ANR), Grant/Award Number: ANR‐17‐EURE‐0010; Spanish Ministry of Science and Innovation, Grant/Award Numbers: MCIN/AEI/10.13039/501100011033, PID2021‐123833OB‐I00
ISSN: 0886-9383, 1099-128X
DOI: 10.1002/cem.3459
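
Illustrative note: the summary describes imputation through a low‐rank matrix approximation in log‐ratio coordinates. The Python sketch below is only a generic illustration of that idea and is not the authors' lrSVD or lrSVDplus routine from zCompositions. The function name `lowrank_clr_impute` and its parameters `rank`, `n_iter` and `tol` are hypothetical. It assumes that every unobserved cell (zero, nondetect or missing value) is coded as NaN, that all observed values are strictly positive, and it ignores any constraints such as detection limits.

```python
import numpy as np


def lowrank_clr_impute(x, rank=1, n_iter=50, tol=1e-6):
    """Fill NaN cells of a positive data matrix by iterating a truncated
    SVD fit on centred log-ratio (clr) coordinates.

    Hypothetical sketch only; not the zCompositions implementation.
    """
    x = np.asarray(x, dtype=float)
    miss = np.isnan(x)
    if np.any(x[~miss] <= 0):
        raise ValueError("observed values must be strictly positive")
    n_obs_col = (~miss).sum(axis=0)
    n_obs_row = (~miss).sum(axis=1, keepdims=True)
    if np.any(n_obs_col == 0) or np.any(n_obs_row == 0):
        raise ValueError("each row and column needs an observed value")

    # Logs of the observed cells; unobserved cells get a placeholder of 0.
    logx = np.log(np.where(miss, 1.0, x))
    # Initial fill: mean of the observed logs in each column.
    col_mean = np.where(miss, 0.0, logx).sum(axis=0) / n_obs_col
    logx = np.where(miss, col_mean, logx)

    for _ in range(n_iter):
        # clr coordinates: subtract each row's mean log.
        z = logx - logx.mean(axis=1, keepdims=True)
        mu = z.mean(axis=0)
        # Rank-`rank` approximation of the column-centred clr matrix.
        u, s, vt = np.linalg.svd(z - mu, full_matrices=False)
        z_hat = (u[:, :rank] * s[:rank]) @ vt[:rank] + mu
        # Shift each fitted row so that it matches the observed logs
        # on average, which puts imputed cells on the row's own scale.
        shift = np.where(miss, 0.0, logx - z_hat).sum(axis=1, keepdims=True)
        shift = shift / n_obs_row
        new_logx = np.where(miss, z_hat + shift, logx)
        change = np.max(np.abs(new_logx - logx))
        logx = new_logx
        if change < tol:
            break

    # Observed cells are returned unchanged (up to rounding).
    return np.exp(logx)
```

A call such as `lowrank_clr_impute(data, rank=2)` returns a complete positive matrix in which only the NaN cells have been replaced, so log‐ratios can then be computed. The choice of `rank` is left to the user in this sketch.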