Fitting Penalized Estimator for Sparse Covariance Matrix with Left-Censored Data by the EM Algorithm

Bibliographic Details
Published in: Mathematics (Basel), Vol. 13, no. 3, p. 423
Main Authors: Lin, Shanyi; Zheng, Qian-Zhen; Shang, Laixu; Xu, Ping-Feng; Tang, Man-Lai
Format: Journal Article
Language: English
Published: Basel, MDPI AG, 01.02.2025

Summary: Estimating a sparse covariance matrix can effectively identify important features and patterns, but traditional estimation methods require complete data vectors for all subjects. When data are left-censored because of detection limits, common strategies such as excluding censored individuals or replacing censored values with suitable constants may produce large biases. In this paper, we propose two penalized log-likelihood estimators, one with the L1 penalty and one with the SCAD penalty, for estimating the sparse covariance matrix of a multivariate normal distribution in the presence of left-censored data. Fitting these penalized estimators is challenging, however, because the observed log-likelihood involves high-dimensional integration over the censored variables. To address this issue, we treat censored data as a special case of incomplete data and combine the Expectation-Maximization (EM) algorithm with the coordinate descent algorithm to fit the two penalized estimators efficiently. Simulation studies show that both penalized estimators achieve greater estimation accuracy than methods that replace censored values with constants, and that the SCAD-penalized estimator generally outperforms the L1-penalized estimator. We apply our method to the analysis of proteomic datasets.
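As an illustration only, here is a minimal sketch of two generic building blocks behind the approach summarized above. It is not the authors' implementation. The first block is the pair of univariate updates that a coordinate descent loop applies entry by entry: L1 soft-thresholding and the SCAD thresholding rule of Fan and Li (2001). The SCAD rule is written for a unit-curvature least-squares subproblem, and the sketch assumes the commonly used constant a = 3.7. The second block is the truncated-normal moments an E-step needs for a value known only to lie below its detection limit. It assumes a single censored variable whose normal conditional mean m and variance v are already known, so it does not perform the multivariate integration the paper addresses. All function names are hypothetical, and only the Python standard library is used.

```python
import math


def normal_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x: float) -> float:
    """Standard normal distribution function Phi(x), computed with erfc."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def soft_threshold(z: float, lam: float) -> float:
    """L1 update (hypothetical helper): minimizer of 0.5 * (t - z)**2 + lam * |t|."""
    return math.copysign(max(abs(z) - lam, 0.0), z)


def scad_threshold(z: float, lam: float, a: float = 3.7) -> float:
    """SCAD update (Fan and Li, 2001), hypothetical helper: minimizer of
    0.5 * (t - z)**2 + p_lam(|t|) with the SCAD penalty p_lam; requires a > 2."""
    if abs(z) <= 2.0 * lam:
        return soft_threshold(z, lam)  # same as the L1 update near zero
    if abs(z) <= a * lam:
        # Shrinkage that tapers off linearly between 2*lam and a*lam.
        return ((a - 1.0) * z - math.copysign(a * lam, z)) / (a - 2.0)
    return z  # large entries are left unshrunk


def left_censored_moments(m: float, v: float, c: float) -> tuple[float, float]:
    """E[X] and E[X**2] for X ~ N(m, v) conditioned on X <= c,
    i.e. a value known only to lie below the detection limit c.
    Assumes m and v are the (already computed) conditional mean and variance."""
    s = math.sqrt(v)
    beta = (c - m) / s
    ratio = normal_pdf(beta) / normal_cdf(beta)  # phi(beta) / Phi(beta); needs Phi(beta) > 0
    mean = m - s * ratio
    var = v * (1.0 - beta * ratio - ratio ** 2)
    return mean, var + mean ** 2


if __name__ == "__main__":
    # A censored N(0, 1) value below 0 has conditional mean -sqrt(2/pi), not 0.
    print(left_censored_moments(0.0, 1.0, 0.0))
    print(soft_threshold(3.0, 1.0), scad_threshold(3.0, 1.0), scad_threshold(5.0, 1.0))
```

Take a standard normal variable censored at 0. Its conditional mean is -sqrt(2/pi), about -0.798, and its conditional second moment is 1. Substituting the detection limit 0 would set both moments to 0. This small case shows why the summary warns that constant substitution can bias the estimates. The SCAD update matches the L1 update for small inputs but leaves large ones unshrunk, which is one common explanation for the accuracy advantage of SCAD over L1.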
ISSN: 2227-7390
DOI: 10.3390/math13030423