A Bayesian Finite Mixture Model with Variable Selection for Data with Mixed-type Variables
Finite mixture models are an important branch of clustering methods and can be applied to data sets with mixed types of variables. However, challenges exist in their application. First, they typically rely on the EM algorithm, which can be sensitive to the choice of initial values. Second, biomarkers...
Main Authors | Wang, Shu; Yabes, Jonathan G; Chang, Chung-Chou H |
---|---|
Format | Journal Article |
Language | English |
Published | 09.05.2019 |
Subjects | Computer Science - Learning; Statistics - Applications; Statistics - Machine Learning |
DOI | 10.48550/arxiv.1905.03680 |
Abstract | Finite mixture models are an important branch of clustering methods and can be
applied to data sets with mixed types of variables. However, challenges exist
in their application. First, they typically rely on the EM algorithm, which can
be sensitive to the choice of initial values. Second, biomarkers subject to
limits of detection (LOD) are common in clinical data, which
introduces censored variables into the finite mixture model. Additionally, researchers
have recently become more interested in variable importance due to the increasing
number of variables available for clustering.
To address these challenges, we propose a Bayesian finite mixture model that
simultaneously conducts variable selection, accounts for biomarker LOD, and produces
clustering results. We took a Bayesian approach to obtain parameter estimates
and cluster membership, bypassing the limitation of the EM algorithm. To
account for LOD, we added one more step to the Gibbs sampler that iteratively fills
in biomarker values below or above the LODs. In addition, we put a spike-and-slab
type of prior on each variable to obtain variable importance. Simulations
across various scenarios were conducted to examine the performance of this
method. An application to real electronic health record data was also conducted. |
---|---|
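The abstract describes an extra Gibbs sampling step that fills in biomarker values falling below or above a limit of detection. The record does not give the authors' exact sampler, so the following is only a minimal sketch under stated assumptions: a single normal component whose current mean and standard deviation are already known from the preceding Gibbs draw, left-censoring only, and `None` as the marker for a censored measurement. The function name `impute_below_lod` and all of its parameters are hypothetical.

```python
import random
from statistics import NormalDist


def impute_below_lod(values, lod, mean, sd, rng):
    """Fill censored entries with draws from N(mean, sd^2) truncated to (-inf, lod].

    Hypothetical helper: `values` uses None for measurements below the LOD,
    and `mean`/`sd` stand in for the current Gibbs draw of the component parameters.
    """
    dist = NormalDist(mean, sd)
    p_lod = dist.cdf(lod)  # probability mass below the detection limit
    filled = []
    for v in values:
        if v is None:
            # Inverse-CDF sampling: a uniform draw on (0, p_lod) maps to a value below lod.
            u = min(max(rng.random() * p_lod, 1e-12), 1 - 1e-12)
            # Cap at lod in case the numerical floor on u pushed the draw above it.
            filled.append(min(dist.inv_cdf(u), lod))
        else:
            filled.append(v)  # observed values are kept unchanged
    return filled


rng = random.Random(0)
completed = impute_below_lod([2.5, None, 3.1, None], lod=1.0, mean=2.0, sd=1.0, rng=rng)
```

In a full sampler this step would presumably be repeated at every iteration, using the parameters of the cluster each observation is currently assigned to, so that the imputed values and the cluster parameters are updated in turn.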
Author | Chang, Chung-Chou H; Yabes, Jonathan G; Wang, Shu |
Author_xml | – sequence: 1 givenname: Shu surname: Wang fullname: Wang, Shu – sequence: 2 givenname: Jonathan G surname: Yabes fullname: Yabes, Jonathan G – sequence: 3 givenname: Chung-Chou H surname: Chang fullname: Chang, Chung-Chou H |
BackLink | https://doi.org/10.48550/arXiv.1905.03680$$DView paper in arXiv |
ContentType | Journal Article |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
Copyright_xml | – notice: http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DBID | AKY EPD GOX |
DOI | 10.48550/arxiv.1905.03680 |
DatabaseName | arXiv Computer Science arXiv Statistics arXiv.org |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: GOX name: arXiv.org url: http://arxiv.org/find sourceTypes: Open Access Repository |
DeliveryMethod | fulltext_linktorsrc |
ExternalDocumentID | 1905_03680 |
GroupedDBID | AKY EPD GOX |
ID | FETCH-arxiv_primary_1905_036803 |
IEDL.DBID | GOX |
IngestDate | Tue Jul 22 23:14:04 EDT 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-arxiv_primary_1905_036803 |
OpenAccessLink | https://arxiv.org/abs/1905.03680 |
ParticipantIDs | arxiv_primary_1905_03680 |
PublicationCentury | 2000 |
PublicationDate | 2019-05-09 |
PublicationDateYYYYMMDD | 2019-05-09 |
PublicationDate_xml | – month: 05 year: 2019 text: 2019-05-09 day: 09 |
PublicationDecade | 2010 |
PublicationYear | 2019 |
Score | 3.3787098 |
SecondaryResourceType | preprint |
SourceID | arxiv |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Learning; Statistics - Applications; Statistics - Machine Learning |
Title | A Bayesian Finite Mixture Model with Variable Selection for Data with Mixed-type Variables |
URI | https://arxiv.org/abs/1905.03680 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | Cornell University |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=A+Bayesian+Finite+Mixture+Model+with+Variable+Selection+for+Data+with+Mixed-type+Variables&rft.au=Wang%2C+Shu&rft.au=Yabes%2C+Jonathan+G&rft.au=Chang%2C+Chung-Chou+H&rft.date=2019-05-09&rft_id=info:doi/10.48550%2Farxiv.1905.03680&rft.externalDocID=1905_03680 |