A Bayesian Finite Mixture Model with Variable Selection for Data with Mixed-type Variables

Main Authors Wang, Shu; Yabes, Jonathan G; Chang, Chung-Chou H
Format Journal Article
Language English
Published 09.05.2019
DOI 10.48550/arxiv.1905.03680

Abstract Finite mixture models are an important branch of clustering methods and can be applied to data sets with mixed types of variables. However, challenges arise in their application. First, they typically rely on the EM algorithm, which can be sensitive to the choice of initial values. Second, biomarkers subject to limits of detection (LOD) are common in clinical data, which introduces censored variables into the finite mixture model. Additionally, researchers have recently become more interested in variable importance because of the increasing number of variables available for clustering. To address these challenges, we propose a Bayesian finite mixture model that simultaneously conducts variable selection, accounts for biomarker LOD, and produces clustering results. We took a Bayesian approach to obtain parameter estimates and cluster membership, bypassing the limitation of the EM algorithm. To account for LOD, we added one more step to the Gibbs sampler that iteratively fills in biomarker values below or above the LODs. In addition, we placed a spike-and-slab type of prior on each variable to obtain variable importance. Simulations across various scenarios were conducted to examine the performance of this method. An application to real electronic health record data was also conducted.
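
The abstract mentions an extra Gibbs step that fills in biomarker values beyond the LOD. This record does not give the authors' sampler, so the following is only a minimal sketch of one common way such a step can be written. It assumes a single continuous biomarker, a lower LOD only, and a normal distribution within each cluster; all function and parameter names are hypothetical.

```python
import random
from statistics import NormalDist


def impute_below_lod(mean, sd, lod, rng):
    """Draw one value from Normal(mean, sd) truncated to (-inf, lod].

    Assumption: within-cluster normality. Uses inverse-CDF sampling.
    """
    dist = NormalDist(mean, sd)
    p_lod = dist.cdf(lod)                 # probability mass below the LOD
    u = rng.uniform(0.0, p_lod)           # uniform draw on that mass
    u = min(max(u, 1e-12), 1.0 - 1e-12)   # inv_cdf requires 0 < u < 1
    return min(dist.inv_cdf(u), lod)      # cap in case clamping pushed u up


def gibbs_impute_step(values, censored, labels, means, sds, lod, rng):
    """One imputation pass: replace each censored entry with a draw from
    its current cluster's truncated normal; keep observed entries as-is."""
    out = []
    for x, is_censored, k in zip(values, censored, labels):
        if is_censored:
            out.append(impute_below_lod(means[k], sds[k], lod, rng))
        else:
            out.append(x)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    values = [0.5, 2.0, 0.5, 3.1]           # censored entries stored at the LOD
    censored = [True, False, True, False]
    labels = [0, 1, 1, 0]                   # current cluster assignments
    print(gibbs_impute_step(values, censored, labels,
                            means=[1.0, 2.5], sds=[1.0, 0.5],
                            lod=0.5, rng=rng))
```

In a full sampler, a pass like this would alternate with updates of the cluster labels, the cluster parameters, and the variable-selection indicators, so that the imputed values are redrawn at every iteration rather than fixed once.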
Online Access https://arxiv.org/abs/1905.03680
Subjects Computer Science - Learning; Statistics - Applications; Statistics - Machine Learning