A practical approach to novel class discovery in tabular data
The problem of novel class discovery (NCD) consists in extracting knowledge from a labeled set of known classes to accurately partition an unlabeled set of novel classes. While NCD has recently received a lot of attention from the community, it is often solved on computer vision problems and under u...
Published in | Data mining and knowledge discovery Vol. 38; no. 4; pp. 2087 - 2116 |
---|---|
Main Authors | Colin Troisemaine, Alexandre Reiffers-Masson, Stéphane Gosselin, Vincent Lemaire, Sandrine Vaton |
Format | Journal Article |
Language | English |
Published | New York, Springer US, 01.07.2024; Springer Nature B.V; Springer |
Series | ECML PKDD 2024 |
Abstract | The problem of novel class discovery (NCD) consists in extracting knowledge from a labeled set of known classes to accurately partition an unlabeled set of novel classes. While NCD has recently received a lot of attention from the community, it is often solved on computer vision problems and under unrealistic conditions. In particular, the number of novel classes is usually assumed to be known in advance, and their labels are sometimes used to tune hyperparameters. Methods that rely on these assumptions are not applicable in real-world scenarios. In this work, we focus on solving NCD in tabular data when no prior knowledge of the novel classes is available. To this end, we propose to tune the hyperparameters of NCD methods by adapting the k-fold cross-validation process and hiding some of the known classes in each fold. Since we have found that methods with too many hyperparameters are likely to overfit these hidden classes, we define a simple deep NCD model. This method is composed of only the essential elements necessary for the NCD problem and shows robust performance under realistic conditions. Furthermore, we find that the latent space of this method can be used to reliably estimate the number of novel classes. Additionally, we adapt two unsupervised clustering algorithms (k-means and Spectral Clustering) to leverage the knowledge of the known classes. Extensive experiments are conducted on 7 tabular datasets and demonstrate the effectiveness of the proposed method and hyperparameter tuning process, and show that the NCD problem can be solved without relying on knowledge from the novel classes. |
---|---|
Author | Vaton, Sandrine; Gosselin, Stéphane; Lemaire, Vincent; Reiffers-Masson, Alexandre; Troisemaine, Colin |
Author_xml | – sequence: 1 givenname: Colin surname: Troisemaine fullname: Troisemaine, Colin email: colin.troisemaine@gmail.com organization: Department of Computer Science, IMT Atlantique, Orange Labs – sequence: 2 givenname: Alexandre surname: Reiffers-Masson fullname: Reiffers-Masson, Alexandre organization: Department of Computer Science, IMT Atlantique – sequence: 3 givenname: Stéphane surname: Gosselin fullname: Gosselin, Stéphane organization: Orange Labs – sequence: 4 givenname: Vincent surname: Lemaire fullname: Lemaire, Vincent organization: Orange Labs – sequence: 5 givenname: Sandrine surname: Vaton fullname: Vaton, Sandrine organization: Department of Computer Science, IMT Atlantique |
BackLink | https://hal.science/hal-04283853$$DView record in HAL |
CitedBy_id | crossref_primary_10_1093_nargab_lqae166 |
Cites_doi | 10.1109/CVPR46437.2021.00934 10.1109/CVPR.2009.5206848 10.1007/s00357-003-0004-6 10.1109/CVPR52688.2022.01387 10.1109/CVPRW56347.2022.00441 10.1109/ICCV.2019.00849 10.5281/zenodo.7873825 10.1109/ICCV48922.2021.00951 10.1007/BF00114162 10.1016/j.ins.2022.07.101 10.1109/TPAMI.2021.3091944 10.1109/CVPR52688.2022.00734 10.1109/ICKG55886.2022.00041 10.1007/s11222-007-9033-z 10.1109/ICDCSW.2011.20 10.1109/CVPR52729.2023.00337 10.1109/CVPR46437.2021.01072 10.1609/aaai.v34i04.5763 10.1002/nav.3800020109 10.1016/j.patcog.2012.07.021 10.1109/TPAMI.2012.256 |
ContentType | Journal Article |
Copyright | The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. Distributed under a Creative Commons Attribution 4.0 International License |
DOI | 10.1007/s10618-024-01025-y |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Physics Computer Science |
EISSN | 1573-756X |
EndPage | 2116 |
ExternalDocumentID | oai_HAL_hal_04283853v2 10_1007_s10618_024_01025_y |
GrantInformation_xml | – fundername: Orange SA |
IEDL.DBID | BENPR |
ISSN | 1384-5810 |
IngestDate | Thu May 08 06:32:04 EDT 2025 Sat Aug 23 13:51:28 EDT 2025 Tue Jul 01 00:40:33 EDT 2025 Thu Apr 24 23:08:23 EDT 2025 Fri Feb 21 02:39:40 EST 2025 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 4 |
Keywords | Novel class discovery; Clustering; Tabular data; Open world learning; Transfer learning |
Language | English |
License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
LinkModel | DirectLink |
Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
ORCID | 0000-0001-8940-6004 0000-0002-4084-1977 0000-0003-2211-1767 |
OpenAccessLink | https://hal.science/hal-04283853 |
PQID | 3086149168 |
PQPubID | 43030 |
PageCount | 30 |
ParticipantIDs | hal_primary_oai_HAL_hal_04283853v2 proquest_journals_3086149168 crossref_citationtrail_10_1007_s10618_024_01025_y crossref_primary_10_1007_s10618_024_01025_y springer_journals_10_1007_s10618_024_01025_y |
ProviderPackageCode | CITATION AAYXX |
PublicationCentury | 2000 |
PublicationDate | 2024-07-01 |
PublicationDateYYYYMMDD | 2024-07-01 |
PublicationDate_xml | – month: 07 year: 2024 text: 2024-07-01 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | New York |
PublicationPlace_xml | – name: New York |
PublicationSeriesTitle | ECML PKDD 2024 |
PublicationTitle | Data mining and knowledge discovery |
PublicationTitleAbbrev | Data Min Knowl Disc |
PublicationYear | 2024 |
Publisher | Springer US Springer Nature B.V Springer |
Publisher_xml | – name: Springer US – name: Springer Nature B.V – name: Springer |
StartPage | 2087 |
SubjectTerms | Algorithms; Artificial Intelligence; Chemistry and Earth Sciences; Clustering; Computer Science; Computer vision; Data Mining and Knowledge Discovery; Information Storage and Retrieval; Knowledge; Methods; Physics; Statistics for Engineering; Tables (data) |
Title | A practical approach to novel class discovery in tabular data |
URI | https://link.springer.com/article/10.1007/s10618-024-01025-y https://www.proquest.com/docview/3086149168 https://hal.science/hal-04283853 |
Volume | 38 |