Applying a dynamic threshold to improve cluster detection of LSI
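Published in | Science of Computer Programming, Vol. 76, no. 12, pp. 1261-1274 |
---|---|
Main Authors | van der Spek, Pieter; Klusener, Steven |
Format | Journal Article |
Language | English |
Published | Elsevier B.V., 01.12.2011 |
Subjects | Clustering; Feature extraction; Latent Semantic Indexing; Reverse engineering; Software architecture |
Online Access | https://www.sciencedirect.com/science/article/pii/S0167642310002297 |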
Abstract | Latent Semantic Indexing (LSI) is a standard approach for extracting and representing the meaning of words in a large set of documents. Recently it has been shown that it is also useful for identifying concerns in source code. The tree cutting strategy plays an important role in obtaining the clusters, which identify the concerns. In this contribution the authors compare two tree cutting strategies: the Dynamic Hybrid cut and the commonly used fixed height threshold. Two case studies have been performed on the source code of Philips Healthcare to compare the results using both approaches. While some of the settings are particular to the Philips-case, the results show that applying a dynamic threshold, implemented by the Dynamic Hybrid cut, is an improvement over the fixed height threshold in the detection of clusters representing relevant concerns. This makes the approach as a whole more usable in practice.
► We examine two dendrogram cutting algorithms for Latent Semantic Indexing. ► We discuss the limitations of the most commonly used cutting algorithm, the fixed height cut. ► We present an alternative, the Dynamic Hybrid cut, which cuts at flexible heights. ► We present the results from two case studies performed at Philips Healthcare. ► From these case studies we conclude that the Dynamic Hybrid cut performs better. |
---|---|
Author | Klusener, Steven; van der Spek, Pieter |
Author contact | van der Spek, Pieter (pvdspek@cs.vu.nl); Klusener, Steven (steven@cs.vu.nl) |
ContentType | Journal Article |
Copyright | 2010 Elsevier B.V. |
DOI | 10.1016/j.scico.2010.12.004 |
Discipline | Computer Science |
EISSN | 1872-7964 |
EndPage | 1274 |
ISSN | 0167-6423 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 12 |
Keywords | Feature extraction; Latent Semantic Indexing; Software architecture; Clustering; Reverse engineering |
Language | English |
License | http://www.elsevier.com/open-access/userlicense/1.0 |
OpenAccessLink | https://www.sciencedirect.com/science/article/pii/S0167642310002297 |
PageCount | 14 |
PublicationDate | 2011-12-01 |
PublicationTitle | Science of Computer Programming |
PublicationYear | 2011 |
Publisher | Elsevier B.V |
StartPage | 1261 |
SubjectTerms | Clustering; Feature extraction; Latent Semantic Indexing; Reverse engineering; Software architecture |
Title | Applying a dynamic threshold to improve cluster detection of LSI |
URI | https://dx.doi.org/10.1016/j.scico.2010.12.004 |
Volume | 76 |
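The abstract contrasts a fixed height threshold with a cut at flexible heights. The following minimal Python sketch, built under stated assumptions, makes that contrast concrete. It builds a naive average-linkage dendrogram over toy 1-D points and applies a fixed height cut. The 1-D points stand in for LSI document vectors; with real LSI data one would typically use cosine distance. It then applies a simple variable-height cut. The function `adaptive_cut` is a hypothetical heuristic introduced here for illustration. It splits a branch only when the branch's merge height is at least `ratio` times that of both children and each child has at least `min_size` members. It is *not* the Dynamic Hybrid algorithm used in the article; that algorithm is available as `cutreeDynamic(..., method = "hybrid")` in the dynamicTreeCut R package. All function names, the data, and the parameters `ratio` and `min_size` are assumptions.

```python
from itertools import combinations


def average_linkage(points):
    """Naive average-linkage clustering of 1-D points (illustrative only).

    Returns merges as (left_id, right_id, height, size). Ids 0..n-1 are
    leaves; merge k creates cluster id n + k (SciPy-like numbering).
    """
    n = len(points)
    members = {i: [i] for i in range(n)}  # active cluster id -> leaf indices
    merges, next_id = [], n
    while len(members) > 1:
        best = None
        for a, b in combinations(sorted(members), 2):
            pairs = [(i, j) for i in members[a] for j in members[b]]
            # Average linkage: mean pairwise distance between the two clusters.
            d = sum(abs(points[i] - points[j]) for i, j in pairs) / len(pairs)
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        members[next_id] = members.pop(a) + members.pop(b)
        merges.append((a, b, d, len(members[next_id])))
        next_id += 1
    return merges


def fixed_height_cut(merges, n, h):
    """Fixed height threshold: apply every merge whose height is <= h."""
    clusters = {i: [i] for i in range(n)}
    # Average linkage is monotone, so merges arrive in non-decreasing height
    # order and both children of an applied merge already exist.
    for k, (a, b, d, _) in enumerate(merges):
        if d <= h:
            clusters[n + k] = clusters.pop(a) + clusters.pop(b)
    return sorted(sorted(c) for c in clusters.values())


def adaptive_cut(merges, n, ratio=3.0, min_size=2):
    """Hypothetical variable-height cut; NOT the Dynamic Hybrid algorithm."""
    def height(node):
        return 0.0 if node < n else merges[node - n][2]

    def size(node):
        return 1 if node < n else merges[node - n][3]

    def leaves_of(node):
        if node < n:
            return [node]
        a, b, _, _ = merges[node - n]
        return leaves_of(a) + leaves_of(b)

    clusters, stack = [], [n + len(merges) - 1]  # start at the root
    while stack:
        node = stack.pop()
        if node >= n:
            a, b, node_height, _ = merges[node - n]
            big_enough = size(a) >= min_size and size(b) >= min_size
            # Split only where this branch shows a large jump in merge height.
            if big_enough and node_height >= ratio * max(height(a), height(b)):
                stack.extend([a, b])
                continue
        clusters.append(sorted(leaves_of(node)))
    return sorted(clusters)


points = [0.0, 1.0, 6.0, 7.5, 50.0, 58.0, 67.0, 74.0]
merges = average_linkage(points)
print([m[2] for m in merges])  # [1.0, 1.5, 6.25, 7.0, 8.0, 16.5, 58.625]
print(fixed_height_cut(merges, len(points), 5.0))   # [[0, 1], [2, 3], [4], [5], [6], [7]]
print(fixed_height_cut(merges, len(points), 20.0))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(adaptive_cut(merges, len(points)))            # [[0, 1], [2, 3], [4, 5, 6, 7]]
```

In this toy data no single height yields all three groups. Any height that keeps the loose group {4, 5, 6, 7} together (at least 16.5) also merges {0, 1} with {2, 3}, which join at 6.25. The variable-height cut recovers all three because it decides branch by branch. This mirrors, in simplified form, the limitation of the fixed height cut described in the abstract.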