BAMITA: Bayesian multiple imputation for tensor arrays
Data increasingly take the form of a multi-way array, or tensor, in several biomedical domains. Such tensors are often incompletely observed. For example, we are motivated by longitudinal microbiome studies in which several timepoints are missing for several subjects. There is a growing literature o...
Published in | Biostatistics (Oxford, England) Vol. 26; no. 1 |
---|---|
Main Authors | Jiang, Ziren; Li, Gen; Lock, Eric F |
Format | Journal Article |
Language | English |
Published | England: Oxford Publishing Limited (England), 14.12.2024 |
Abstract | Data increasingly take the form of a multi-way array, or tensor, in several biomedical domains. Such tensors are often incompletely observed. For example, we are motivated by longitudinal microbiome studies in which several timepoints are missing for several subjects. There is a growing literature on missing data imputation for tensors. However, existing methods give a point estimate for missing values without capturing uncertainty. We propose a multiple imputation approach for tensors in a flexible Bayesian framework that yields realistic simulated values for missing entries and can propagate uncertainty through subsequent analyses. Our model uses efficient and widely applicable conjugate priors for a CANDECOMP/PARAFAC (CP) factorization, with a separable residual covariance structure. This approach is shown to perform well with respect to both imputation accuracy and uncertainty calibration, for scenarios in which either single entries or entire fibers of the tensor are missing. For two microbiome applications, it is shown to accurately capture uncertainty in the full microbiome profile at missing timepoints and is used to infer trends in species diversity for the population. Documented R code to perform our multiple imputation approach is available at https://github.com/lockEF/MultiwayImputation. |
---|---|
Author | Lock, Eric F; Li, Gen; Jiang, Ziren |
Author_xml | – sequence: 1 givenname: Ziren surname: Jiang fullname: Jiang, Ziren – sequence: 2 givenname: Gen orcidid: 0000-0002-7298-2141 surname: Li fullname: Li, Gen – sequence: 3 givenname: Eric F orcidid: 0000-0003-4663-2356 surname: Lock fullname: Lock, Eric F |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/39673775$$D View this record in MEDLINE/PubMed |
ContentType | Journal Article |
Copyright | The Author 2024. Published by Oxford University Press. All rights reserved. For Permissions, email: journals.permissions@oup.com. |
DOI | 10.1093/biostatistics/kxae047 |
Discipline | Biology |
EISSN | 1468-4357 |
ExternalDocumentID | 39673775 10_1093_biostatistics_kxae047 |
Genre | Journal Article |
GrantInformation_xml | – fundername: NIH HHS grantid: R01-HG010731 – fundername: NIGMS NIH HHS grantid: R01 GM130622 |
ISSN | 1465-4644 1468-4357 |
IngestDate | Fri Jul 11 15:43:41 EDT 2025 Thu Aug 28 04:02:24 EDT 2025 Thu Jul 10 06:32:29 EDT 2025 Tue Jul 01 03:45:58 EDT 2025 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 1 |
Keywords | multiple imputation; Bayesian inference; microbiome data; missing data; multiway data |
Language | English |
License | https://academic.oup.com/pages/standard-publication-reuse-rights The Author 2024. Published by Oxford University Press. All rights reserved. For Permissions, email: journals.permissions@oup.com. |
LinkModel | OpenURL |
ORCID | 0000-0003-4663-2356 0000-0002-7298-2141 |
PMID | 39673775 |
PublicationCentury | 2000 |
PublicationDate | 2024-12-14 |
PublicationDateYYYYMMDD | 2024-12-14 |
PublicationDate_xml | – month: 12 year: 2024 text: 2024-12-14 day: 14 |
PublicationDecade | 2020 |
PublicationPlace | England |
PublicationPlace_xml | – name: England – name: Oxford |
PublicationTitle | Biostatistics (Oxford, England) |
PublicationTitleAlternate | Biostatistics |
PublicationYear | 2024 |
Publisher | Oxford Publishing Limited (England) |
Publisher_xml | – name: Oxford Publishing Limited (England) |
SubjectTerms | Arrays; Bayes Theorem; Bayesian analysis; Biostatistics - methods; Data Interpretation, Statistical; Humans; Microbiomes; Microbiota; Missing data; Models, Statistical; Species diversity; Tensors; Uncertainty |
Title | BAMITA: Bayesian multiple imputation for tensor arrays |
URI | https://www.ncbi.nlm.nih.gov/pubmed/39673775 https://www.proquest.com/docview/3233974690 https://www.proquest.com/docview/3146775042 |
Volume | 26 |
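
The abstract describes imputing missing tensor entries from a CANDECOMP/PARAFAC (CP) factorization and propagating the resulting uncertainty through later analyses. The sketch below is only an illustration of that general idea and is not the authors' R implementation: the function names `cp_reconstruct`, `impute_once`, and `pool_rubin` are hypothetical, the factor matrices are treated as given rather than drawn from a posterior, the residual noise is assumed independent with a single standard deviation `sigma` (a simplification of the separable covariance structure mentioned in the abstract), and the pooling step uses Rubin's rules as one common way to combine results across completed data sets.

```python
import random
from statistics import mean, variance


def cp_reconstruct(A, B, C):
    """Rank-R CP model for a 3-way array:
    M[i][j][k] = sum over r of A[i][r] * B[j][r] * C[k][r]."""
    R = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(R))
              for k in range(len(C))]
             for j in range(len(B))]
            for i in range(len(A))]


def impute_once(X, A, B, C, sigma, rng):
    """Return one completed copy of X (hypothetical helper).

    Entries equal to None are treated as missing and replaced by the CP
    mean plus N(0, sigma^2) noise; observed entries are left unchanged.
    Assumption: independent noise, not a separable covariance.
    """
    M = cp_reconstruct(A, B, C)
    return [[[M[i][j][k] + rng.gauss(0.0, sigma) if x is None else x
              for k, x in enumerate(row)]
             for j, row in enumerate(slab)]
            for i, slab in enumerate(X)]


def pool_rubin(estimates, variances):
    """Rubin's rules for m completed-data analyses.

    Returns the pooled estimate and its total variance:
    within-imputation variance + (1 + 1/m) * between-imputation variance.
    """
    m = len(estimates)
    q_bar = mean(estimates)
    within = mean(variances)
    between = variance(estimates)  # sample variance, divisor m - 1
    return q_bar, within + (1 + 1 / m) * between


if __name__ == "__main__":
    rng = random.Random(1)
    A, B, C = [[1.0], [2.0]], [[1.0], [3.0]], [[2.0]]
    X = [[[None], [5.0]], [[4.5], [None]]]
    # Five completed data sets; each differs only in the missing entries.
    completed = [impute_once(X, A, B, C, sigma=0.5, rng=rng) for _ in range(5)]
    print(completed[0])
```

In a full Bayesian treatment each completed data set would use a different posterior draw of the factors and noise parameters, so the spread across imputations would reflect parameter uncertainty as well as residual noise.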