lrSVD: An efficient imputation algorithm for incomplete high‐throughput compositional data
Published in | Journal of Chemometrics, Vol. 36, No. 12 |
---|---|
Format | Journal Article |
Language | English |
Published | Chichester: Wiley Subscription Services, Inc., 01.12.2022 |
Summary: | Compositional methods have been successfully integrated into the chemometric toolkit to analyse and model different types of data generated by modern high‐throughput technologies. Within this compositional framework, the focus is on the relative information conveyed by the data, which is accessed through log‐ratio coordinate representations. However, log‐ratios cannot be computed when the data matrix is incomplete. A new, computationally efficient imputation algorithm is introduced that is based on compositional principles, targets high‐throughput continuous‐valued compositions, and relies on a constrained low‐rank matrix approximation of the data. Simulated and real metabolomics data are used to demonstrate its performance and its ability to deal with different forms of incomplete data: zeros, nondetects, missing values, or a combination of them. The computer routines lrSVD and lrSVDplus are implemented in the R package zCompositions to facilitate their use by practitioners.
Compositional methods are used to analyse modern high‐throughput data. They focus on relative information, using log‐ratio coordinate representations. However, log‐ratios cannot be computed from data sets containing zeros or other forms of incomplete data. A computationally efficient imputation algorithm is introduced that is able to deal with zeros, nondetects, missing values, or a combination of them. Simulated and real metabolomics data are used to demonstrate its performance and features. Computer routines are implemented in the R package zCompositions. |
---|---|
Funding information: | French National Research Agency (ANR), Grant/Award Number: ANR‐17‐EURE‐0010; Spanish Ministry of Science and Innovation, Grant/Award Numbers: MCIN/AEI/10.13039/501100011033, PID2021‐123833OB‐I00 |
ISSN: | 0886-9383 (print), 1099-128X (online) |
DOI: | 10.1002/cem.3459 |
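The summary notes that log‐ratios cannot be computed from incomplete data. The minimal Python/NumPy sketch below illustrates why. It uses the centred log‐ratio (clr) transform, one standard log‐ratio representation chosen here only for illustration (the summary does not say which coordinates the method uses), and shows that a single zero or missing value spoils every coordinate of its row. The helper `clr` is written for this example and is not a zCompositions function.

```python
import numpy as np


def clr(x):
    """Centred log-ratio (clr) transform applied to each row of x.

    Illustrative helper, not part of zCompositions.
    For a composition x = (x_1, ..., x_D), clr(x)_j = log(x_j / g(x)),
    where g(x) is the geometric mean of the D parts.
    """
    logx = np.log(np.asarray(x, dtype=float))
    # Subtracting the row mean of the logs divides each part by g(x).
    return logx - logx.mean(axis=1, keepdims=True)


complete = np.array([[0.2, 0.3, 0.5],
                     [0.1, 0.6, 0.3]])
with_zero = np.array([[0.2, 0.0, 0.8]])        # a zero (e.g. a nondetect recorded as 0)
with_missing = np.array([[0.2, np.nan, 0.8]])  # a missing value

print(np.round(clr(complete), 3))  # finite; each row sums to 0
with np.errstate(divide="ignore", invalid="ignore"):
    print(clr(with_zero))     # log(0) = -inf propagates: [[inf nan inf]]
    print(clr(with_missing))  # nan propagates to every part: [[nan nan nan]]
```

Because the geometric mean involves every part, one bad cell contaminates the whole row. This is why incomplete values must be imputed before any log‐ratio analysis.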
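The summary says the algorithm relies on a constrained low‐rank matrix approximation but gives no further details. The following is therefore only a generic illustration of that family of methods, not lrSVD itself and not the zCompositions interface. It is a minimal Python/NumPy sketch that assumes an iterative truncated‐SVD ("hard impute") scheme applied to row‐centred (clr) log data, with nondetects handled by capping imputed values at a detection limit. The function `lowrank_impute_log` and its parameters `rank`, `dl`, `n_iter` and `tol` are hypothetical.

```python
import numpy as np


def lowrank_impute_log(x, rank=2, dl=None, n_iter=500, tol=1e-9):
    """Hypothetical sketch of iterative low-rank (truncated SVD) imputation.

    Generic illustration only; NOT the lrSVD routine from zCompositions.
    x    : (n, D) array of positive values; np.nan marks the cells to impute
           (zeros and nondetects must be recoded as np.nan beforehand).
    rank : number of singular components kept in the approximation.
    dl   : optional length-D array of detection limits; for columns with a
           finite limit, imputed cells are capped at that limit (a simple
           stand-in for a nondetect constraint). Use np.nan for no limit.
    Assumes every column has at least one observed value.
    """
    x = np.asarray(x, dtype=float)
    miss = np.isnan(x)
    logx = np.log(x)  # nan stays nan; observed cells are finite
    # Start by filling each missing log value with its column mean.
    y = np.where(miss, np.nanmean(logx, axis=0), logx)
    log_cap = None
    if dl is not None:
        log_cap = np.log(np.asarray(dl, dtype=float))
        log_cap = np.where(np.isfinite(log_cap), log_cap, np.inf)  # no limit -> +inf

    for _ in range(n_iter):
        # Row-centring the logs gives clr coordinates (scale invariance).
        row_c = y.mean(axis=1, keepdims=True)
        z = y - row_c
        col_c = z.mean(axis=0, keepdims=True)
        u, s, vt = np.linalg.svd(z - col_c, full_matrices=False)
        # Rank-`rank` reconstruction, shifted back to the original log scale.
        approx = (u[:, :rank] * s[:rank]) @ vt[:rank] + col_c + row_c
        # Keep observed cells; update only the missing ones.
        new = np.where(miss, approx, logx)
        if log_cap is not None:
            new = np.where(miss, np.minimum(new, log_cap), new)
        change = np.linalg.norm(new - y) / max(np.linalg.norm(y), 1e-12)
        y = new
        if change < tol:
            break
    return np.exp(y)


# Usage on synthetic data whose log-ratio structure is exactly rank 1.
rng = np.random.default_rng(0)
a, c = rng.normal(size=(2, 30, 1))
b, d = rng.normal(size=(2, 1, 6))
x_true = np.exp(a + b + c * d)
x = x_true.copy()
x[0, 1] = x[5, 3] = x[9, 5] = np.nan
dl = np.full(6, np.nan)
dl[5] = 0.5  # pretend column 5 has a detection limit of 0.5
x_hat = lowrank_impute_log(x, rank=1, dl=dl)
print(np.round(x_hat[[0, 5, 9], [1, 3, 5]], 3), np.round(x_true[[0, 5, 9], [1, 3, 5]], 3))
```

Capping at the detection limit is the simplest way to keep imputed nondetects below their threshold. A real method may impose this constraint differently, and the rank would normally be chosen from the data rather than fixed in advance.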