Convex Non-negative Matrix Factorization in the Wild

Bibliographic Details
Published in 2009 Ninth IEEE International Conference on Data Mining, pp. 523-532
Main Authors Thurau, C., Kersting, K., Bauckhage, C.
Format Conference Proceeding
Language English
Published IEEE 01.12.2009
ISBN 9781424452422
1424452422
ISSN 1550-4786
DOI 10.1109/ICDM.2009.55

Abstract Non-negative matrix factorization (NMF) has recently received a lot of attention in data mining, information retrieval, and computer vision. It factorizes a non-negative input matrix V into two non-negative matrix factors, V = WH, such that W describes "clusters" of the data set. When analyzing genotypes, social networks, or images, it can be beneficial to ensure that W contains meaningful "cluster centroids", i.e., to restrict W to be convex combinations of data points. But how can we run this convex NMF in the wild, i.e., given millions of data points? Triggered by the simple observation that each data point is a convex combination of vertices of the data convex hull, we propose to restrict W further to be vertices of the convex hull. The benefits of this convex-hull NMF approach are twofold. First, the expected size of the convex hull of, for example, n random Gaussian points in the plane is Ω(√(log n)), i.e., the candidate set typically grows much more slowly than the data set. Second, distance-preserving low-dimensional embeddings allow one to compute candidate vertices efficiently. Our extensive experimental evaluation shows that convex-hull NMF compares favorably to convex NMF for large data sets, both in terms of speed and reconstruction quality. Moreover, we show that our method can easily be applied to large-scale, real-world data sets, in our case consisting of 1.6 million images and 150 million votes on World of Warcraft® guilds.
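The abstract's first benefit, that the convex hull of Gaussian data has far fewer vertices than the data set has points, can be checked with a small experiment. The following is only an illustrative sketch and not the authors' implementation: it assumes 2-D standard Gaussian samples and uses Andrew's monotone chain algorithm to find the hull, whereas the paper computes candidate vertices via low-dimensional embeddings. The function names (convex_hull, gaussian_points, hull_sizes) are hypothetical.

```python
import random


def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop points that would make a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def gaussian_points(n, seed=0):
    """Draw n points from a 2-D standard Gaussian (assumed toy data)."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]


def hull_sizes(ns, seed=0):
    """Map each sample size n to the number of hull vertices found."""
    return {n: len(convex_hull(gaussian_points(n, seed))) for n in ns}


if __name__ == "__main__":
    # The candidate set (hull vertices) should stay small as n grows.
    for n, k in hull_sizes([100, 1000, 10000, 100000]).items():
        print(f"n = {n:>6}: {k} hull vertices")
```

In this toy setting, the hull vertices play the role of the candidate set from which convex-hull NMF would pick its basis vectors; the sketch stops before any factorization step.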
Author Thurau, C.
Kersting, K.
Bauckhage, C.
Author_xml – sequence: 1
  givenname: C.
  surname: Thurau
  fullname: Thurau, C.
  organization: Fraunhofer IAIS, St. Augustin, Germany
– sequence: 2
  givenname: K.
  surname: Kersting
  fullname: Kersting, K.
  organization: Fraunhofer IAIS, St. Augustin, Germany
– sequence: 3
  givenname: C.
  surname: Bauckhage
  fullname: Bauckhage, C.
  organization: Fraunhofer IAIS, St. Augustin, Germany
ContentType Conference Proceeding
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Xplore POP ALL
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP All) 1998-Present
DatabaseTitleList
Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 0769538959
9780769538952
EndPage 532
ExternalDocumentID 5360278
Genre orig-research
ISBN 9781424452422
1424452422
ISSN 1550-4786
IngestDate Wed Aug 27 02:47:04 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
PageCount 10
ParticipantIDs ieee_primary_5360278
PublicationCentury 2000
PublicationDate 2009-Dec.
PublicationDateYYYYMMDD 2009-12-01
PublicationDate_xml – month: 12
  year: 2009
  text: 2009-Dec.
PublicationDecade 2000
PublicationTitle 2009 Ninth IEEE International Conference on Data Mining
PublicationTitleAbbrev ICDM
PublicationYear 2009
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 523
SubjectTerms archetypal analysis
Computer vision
Data analysis
data handling
Data mining
Embedded computing
Image analysis
Image reconstruction
Information retrieval
Large-scale systems
matrix decomposition
non-negative matrix factorization
social network analysis
Social network services
Voting
Title Convex Non-negative Matrix Factorization in the Wild
URI https://ieeexplore.ieee.org/document/5360278