Keyqueries for Clustering and Labeling
Published in | Information Retrieval Technology Vol. 9994; pp. 42 - 55 |
---|---|
Main Authors | Gollub, Tim; Busse, Matthias; Stein, Benno; Hagen, Matthias |
Format | Book Chapter |
Language | English |
Published | Switzerland: Springer International Publishing AG, 2016 |
Series | Lecture Notes in Computer Science |
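Subjects | Artificial intelligence; Head Noun; Information retrieval; Noun Phrase; Retrieval Model; Search Query; Vector Space Model |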
ISBN | 9783319480503 3319480502 |
ISSN | 0302-9743 1611-3349 |
DOI | 10.1007/978-3-319-48051-0_4 |
Abstract | In this paper we revisit the document clustering problem from an information retrieval perspective. The idea is to use queries as features in the clustering process that finally also serve as descriptive cluster labels “for free.” Our novel perspective includes query constraints for clustering and cluster labeling that ensure consistency with a keyword-based reference search engine.
Our approach combines different methods in a three-step pipeline. Overall, a query-constrained variant of k-means using noun phrase queries against an ESA-based search engine performs best. In the evaluation, we introduce a soft clustering measure as well as a freely available extended version of the Ambient dataset. We compare our approach to two often-used baselines, descriptive k-means and k-means plus $\chi^2$. While the derived clusters are of comparable high quality, the evaluation of the corresponding cluster labels reveals a great diversity in the explanatory power. In a user study with 49 participants, the labels generated by our approach are of significantly higher discriminative power, leading to an increased human separability of the computed clusters. |
---|---|
Author | Busse, Matthias; Hagen, Matthias; Stein, Benno; Gollub, Tim |
Author_xml | 1. Gollub, Tim; 2. Busse, Matthias; 3. Stein, Benno; 4. Hagen, Matthias (email: matthias.hagen@uni-weimar.de) |
ContentType | Book Chapter |
Copyright | Springer International Publishing AG 2016 |
DEWEY | 025.524 |
DOI | 10.1007/978-3-319-48051-0_4 |
Discipline | Computer Science; Library & Information Science |
EISBN | 3319480510 9783319480510 |
EISSN | 1611-3349 |
Editor | Liu, Yiqun; Dou, Zhicheng; Zhao, Xin; Ma, Shaoping; Wen, Ji-Rong; Chang, Yi; Zhang, Min |
EndPage | 55 |
ISBN | 9783319480503 3319480502 |
ISSN | 0302-9743 |
IsPeerReviewed | true |
IsScholarly | true |
LCCallNum | QA75.5-76.95; QA76.9.D |
Language | English |
OCLC | 964690617 |
PageCount | 14 |
PublicationCentury | 2000 |
PublicationDate | 2016 |
PublicationDateYYYYMMDD | 2016-01-01 |
PublicationDecade | 2010 |
PublicationPlace | Switzerland |
PublicationPlace_xml | – name: Switzerland – name: Cham |
PublicationSeriesSubtitle | Information Systems and Applications, incl. Internet/Web, and HCI |
PublicationSeriesTitle | Lecture Notes in Computer Science |
PublicationSeriesTitleAlternate | Lect.Notes Computer |
PublicationSubtitle | 12th Asia Information Retrieval Societies Conference, AIRS 2016, Beijing, China, November 30 - December 2, 2016, Proceedings |
PublicationTitle | Information Retrieval Technology |
PublicationYear | 2016 |
Publisher | Springer International Publishing AG; Springer International Publishing |
RelatedPersons | Kleinberg, Jon M.; Mattern, Friedemann; Naor, Moni; Mitchell, John C.; Terzopoulos, Demetri; Steffen, Bernhard; Pandu Rangan, C.; Kanade, Takeo; Kittler, Josef; Weikum, Gerhard; Hutchison, David; Tygar, Doug |
StartPage | 42 |
SubjectTerms | Artificial intelligence; Head Noun; Information retrieval; Noun Phrase; Retrieval Model; Search Query; Vector Space Model |
Title | Keyqueries for Clustering and Labeling |
Volume | 9994 |