Biclustering with Alternating K-Means


Bibliographic Details
Main Authors	Fraiman, Nicolas; Li, Zichao
Format	Journal Article
Language	English
Published	09.09.2020
Subjects	Computer Science - Learning; Statistics - Machine Learning

Abstract	Biclustering is the task of simultaneously clustering the rows and columns of the data matrix into different subgroups such that the rows and columns within a subgroup exhibit similar patterns. In this paper, we consider the case of producing block-diagonal biclusters. We provide a new formulation of the biclustering problem based on the idea of minimizing the empirical clustering risk. We develop and prove a consistency result with respect to the empirical clustering risk. Since the optimization problem is combinatorial in nature, finding the global minimum is computationally intractable. In light of this fact, we propose a simple and novel algorithm that finds a local minimum by alternating the use of an adapted version of the k-means clustering algorithm between columns and rows. We evaluate and compare the performance of our algorithm to other related biclustering methods on both simulated data and real-world gene expression data sets. The results demonstrate that our algorithm is able to detect meaningful structures in the data and outperform other competing biclustering methods in various settings and situations.
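
The alternating idea in the abstract can be illustrated with a minimal sketch. This is not the authors' adapted k-means and not their empirical clustering risk, neither of which is specified in this record. It assumes a generic block-mean squared-error objective (each entry is approximated by the mean of its row-cluster by column-cluster block) and alternates k-means-style reassignment of rows and of columns. The function names `block_means`, `risk`, and `alternating_kmeans_sketch` are hypothetical, and like the method described above the procedure only reaches a local minimum.

```python
import numpy as np


def block_means(X, rows, cols, k):
    """Mean of X inside each (row cluster, column cluster) block; empty blocks stay 0."""
    M = np.zeros((k, k))
    for a in range(k):
        for b in range(k):
            block = X[np.ix_(rows == a, cols == b)]
            if block.size:
                M[a, b] = block.mean()
    return M


def risk(X, rows, cols, k):
    """Assumed objective: squared error between X and its block-mean approximation."""
    M = block_means(X, rows, cols, k)
    return float(((X - M[np.ix_(rows, cols)]) ** 2).sum())


def alternating_kmeans_sketch(X, k, n_iter=20, seed=0):
    """Alternate k-means-style reassignments of rows and columns (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    rows = rng.integers(k, size=n)  # random initial row labels
    cols = rng.integers(k, size=m)  # random initial column labels
    history = []
    for _ in range(n_iter):
        # Row step: with column labels fixed, move each row to the row cluster
        # whose block means fit that row best.
        M = block_means(X, rows, cols, k)
        row_cost = ((X[:, None, :] - M[:, cols][None, :, :]) ** 2).sum(axis=2)
        rows = row_cost.argmin(axis=1)
        # Column step: the same update applied to the transposed problem.
        M = block_means(X, rows, cols, k)
        col_cost = ((X.T[:, None, :] - M.T[:, rows][None, :, :]) ** 2).sum(axis=2)
        cols = col_cost.argmin(axis=1)
        history.append(risk(X, rows, cols, k))
    return rows, cols, history


if __name__ == "__main__":
    # Synthetic matrix with three planted diagonal blocks plus small noise.
    rng = np.random.default_rng(1)
    true_rows = np.repeat(np.arange(3), 4)
    true_cols = np.repeat(np.arange(3), 3)
    X = 5.0 * (true_rows[:, None] == true_cols[None, :]) + 0.1 * rng.standard_normal((12, 9))
    rows, cols, history = alternating_kmeans_sketch(X, k=3)
    print("row labels:", rows)
    print("column labels:", cols)
    print("risk after first and last sweep:", history[0], history[-1])
```

Each step either reassigns labels with the block means held fixed or recomputes the block means with the labels held fixed, so the assumed objective cannot increase from one sweep to the next. The result still depends on the random initialization, which is why several restarts with different seeds would normally be compared.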
Copyright	http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI	10.48550/arxiv.2009.04550
IsOpenAccess	true
IsPeerReviewed	false
IsScholarly	false
OpenAccessLink	https://arxiv.org/abs/2009.04550
SecondaryResourceType	preprint