Applying a dynamic threshold to improve cluster detection of LSI

Bibliographic Details
Published in: Science of Computer Programming, Vol. 76, no. 12, pp. 1261-1274
Main Authors: van der Spek, Pieter; Klusener, Steven
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.12.2011
Abstract

Latent Semantic Indexing (LSI) is a standard approach for extracting and representing the meaning of words in a large set of documents. Recently it has been shown that it is also useful for identifying concerns in source code. The tree cutting strategy plays an important role in obtaining the clusters, which identify the concerns. In this contribution the authors compare two tree cutting strategies: the Dynamic Hybrid cut and the commonly used fixed height threshold. Two case studies have been performed on the source code of Philips Healthcare to compare the results using both approaches. While some of the settings are particular to the Philips case, the results show that applying a dynamic threshold, implemented by the Dynamic Hybrid cut, is an improvement over the fixed height threshold in the detection of clusters representing relevant concerns. This makes the approach as a whole more usable in practice.

► We examine two dendrogram cutting algorithms for Latent Semantic Indexing.
► We discuss the limitations of the most used cutting algorithm, the fixed height cut.
► We present an alternative, the Dynamic Hybrid cut, which cuts at flexible heights.
► We present the results from two case studies performed at Philips Healthcare.
► From these case studies we conclude that the Dynamic Hybrid cut performs better.
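
The fixed height cut that the abstract describes as the commonly used baseline can be sketched as follows. This is only a minimal illustration, not the authors' implementation: it assumes Python with NumPy and SciPy, a toy term-document matrix, cosine distance, average linkage, and a cut height of 0.5, none of which are taken from the article. The helper names `lsi_embed` and `fixed_height_clusters` are hypothetical. The Dynamic Hybrid cut is not shown here; it would replace the single `height` threshold with cut points chosen per branch of the dendrogram.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def lsi_embed(term_doc, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    _, s, vt = np.linalg.svd(term_doc, full_matrices=False)
    # Scale the top-k right singular vectors by their singular values;
    # the result has one row per document.
    return (np.diag(s[:k]) @ vt[:k]).T


def fixed_height_clusters(doc_vectors, height):
    """Build a dendrogram and cut every branch at the same fixed height."""
    tree = linkage(doc_vectors, method="average", metric="cosine")
    return fcluster(tree, t=height, criterion="distance")


# Toy term-document matrix (rows: terms, columns: documents).
# Documents 0-1 share one vocabulary, documents 2-3 another.
term_doc = np.array([
    [2, 3, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 2, 2],
], dtype=float)

docs = lsi_embed(term_doc, k=2)
labels = fixed_height_clusters(docs, height=0.5)  # 0.5 is an arbitrary choice
print(labels)
```

Because one height is applied to the whole tree, a value that separates one pair of concerns well may merge or fragment others, which is the limitation the abstract attributes to the fixed height threshold.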
Author contact: Pieter van der Spek (pvdspek@cs.vu.nl); Steven Klusener (steven@cs.vu.nl)
Copyright: 2010 Elsevier B.V.
DOI: 10.1016/j.scico.2010.12.004
Discipline: Computer Science
EISSN: 1872-7964
ISSN: 0167-6423
Open access: yes
Peer reviewed: yes
Keywords: Feature extraction; Latent Semantic Indexing; Software architecture; Clustering; Reverse engineering
License: http://www.elsevier.com/open-access/userlicense/1.0
Open access link: https://www.sciencedirect.com/science/article/pii/S0167642310002297
Page count: 14