Biclustering with Alternating K-Means
Biclustering is the task of simultaneously clustering the rows and columns of the data matrix into different subgroups such that the rows and columns within a subgroup exhibit similar patterns. In this paper, we consider the case of producing block-diagonal biclusters. We provide a new formulation o...
Main Authors | Fraiman, Nicolas; Li, Zichao |
---|---|
Format | Journal Article |
Language | English |
Published | 09.09.2020 |
Subjects | Computer Science - Learning; Statistics - Machine Learning |
Online Access | Get full text |
Abstract | Biclustering is the task of simultaneously clustering the rows and columns of
the data matrix into different subgroups such that the rows and columns within
a subgroup exhibit similar patterns. In this paper, we consider the case of
producing block-diagonal biclusters. We provide a new formulation of the
biclustering problem based on the idea of minimizing the empirical clustering
risk. We develop and prove a consistency result with respect to the empirical
clustering risk. Since the optimization problem is combinatorial in nature,
finding the global minimum is computationally intractable. In light of this
fact, we propose a simple and novel algorithm that finds a local minimum by
alternating the use of an adapted version of the k-means clustering algorithm
between columns and rows. We evaluate and compare the performance of our
algorithm to other related biclustering methods on both simulated data and
real-world gene expression data sets. The results demonstrate that our
algorithm is able to detect meaningful structures in the data and outperform
other competing biclustering methods in various settings and situations. |
---|---|
Author | Fraiman, Nicolas; Li, Zichao |
Author_xml | – sequence: 1 givenname: Nicolas surname: Fraiman fullname: Fraiman, Nicolas – sequence: 2 givenname: Zichao surname: Li fullname: Li, Zichao |
BackLink | https://doi.org/10.48550/arXiv.2009.04550 (View paper in arXiv) |
ContentType | Journal Article |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
Copyright_xml | – notice: http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DBID | AKY EPD GOX |
DOI | 10.48550/arxiv.2009.04550 |
DatabaseName | arXiv Computer Science arXiv Statistics arXiv.org |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: GOX name: arXiv.org url: http://arxiv.org/find sourceTypes: Open Access Repository |
DeliveryMethod | fulltext_linktorsrc |
ExternalDocumentID | 2009_04550 |
GroupedDBID | AKY EPD GOX |
ID | FETCH-LOGICAL-a670-1fe6afe51c086d45846775788d2ab1b3288f6df9cec25ef6a4eb56caf403be6b3 |
IEDL.DBID | GOX |
IngestDate | Mon Jan 08 05:46:16 EST 2024 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-a670-1fe6afe51c086d45846775788d2ab1b3288f6df9cec25ef6a4eb56caf403be6b3 |
OpenAccessLink | https://arxiv.org/abs/2009.04550 |
ParticipantIDs | arxiv_primary_2009_04550 |
PublicationCentury | 2000 |
PublicationDate | 2020-09-09 |
PublicationDateYYYYMMDD | 2020-09-09 |
PublicationDate_xml | – month: 09 year: 2020 text: 2020-09-09 day: 09 |
PublicationDecade | 2020 |
PublicationYear | 2020 |
Score | 1.7764759 |
SecondaryResourceType | preprint |
SourceID | arxiv |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Learning; Statistics - Machine Learning |
Title | Biclustering with Alternating K-Means |
URI | https://arxiv.org/abs/2009.04550 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | Cornell University |
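
The abstract describes an algorithm that alternates a k-means step between columns and rows. The record does not give the details of the authors' adapted k-means, so the following is only a minimal sketch of the general alternating idea, using ordinary Lloyd's k-means rather than the paper's variant. The function names (`kmeans_labels`, `cluster_means`, `alternating_kmeans`), the farthest-first initialisation, and the choice to summarise each row by its mean over each column cluster are all assumptions made for illustration. The sketch also does not pair row clusters with column clusters, which a block-diagonal result would require.

```python
import numpy as np


def kmeans_labels(points, k, n_iter=20):
    """Ordinary Lloyd's k-means with a deterministic farthest-first start."""
    points = np.asarray(points, dtype=float)
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen centre.
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def cluster_means(X, labels, k):
    """Summarise each row of X by its mean over each cluster of columns."""
    out = np.zeros((X.shape[0], k))
    for j in range(k):
        mask = labels == j
        if mask.any():
            out[:, j] = X[:, mask].mean(axis=1)
    return out


def alternating_kmeans(X, k, n_rounds=10):
    """Alternate k-means between columns and rows (illustrative only)."""
    X = np.asarray(X, dtype=float)
    # Assumed starting point: cluster the rows on the raw matrix.
    row_labels = kmeans_labels(X, k)
    col_labels = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_rounds):
        # Cluster columns using their means over the current row clusters.
        col_labels = kmeans_labels(cluster_means(X.T, row_labels, k), k)
        # Cluster rows using their means over the current column clusters.
        row_labels = kmeans_labels(cluster_means(X, col_labels, k), k)
    return row_labels, col_labels
```

Calling `alternating_kmeans(X, k=2)` on a small matrix with two clearly separated blocks returns one label array for the rows and one for the columns. Each round only moves to a nearby assignment, which matches the abstract's point that the method finds a local rather than a global minimum.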