Using the uniqueness of global identifiers to determine the provenance of Python software source code

We consider the problem of identifying the provenance of free/open source software (FOSS) and specifically the need of identifying where reused source code has been copied from. We propose a lightweight approach to solve the problem based on software identifiers—such as the names of variables, class...

Bibliographic Details
Published in Empirical software engineering : an international journal Vol. 28; no. 5; p. 107
Main Authors Sun, Yiming; German, Daniel; Zacchiroli, Stefano
Format Journal Article
Language English
Published New York Springer US 01.10.2023
Springer Nature B.V
Springer Verlag
Abstract We consider the problem of identifying the provenance of free/open source software (FOSS), and specifically the need to identify where reused source code has been copied from. We propose a lightweight approach to the problem based on software identifiers, such as the names of variables, classes, and functions chosen by programmers. The proposed approach efficiently narrows the search down to a small set of candidate origin products, which can then be analyzed with more expensive techniques to make a final provenance determination. By analyzing the PyPI (Python Package Index) open source ecosystem, we find that globally defined identifiers are very distinct. Across PyPI’s 244 K packages we found 11.2 M different global identifiers (class and method/function names, with only 0.6% of identifiers shared between the two types of entities); 76% of identifiers were used in only one package, and 93% in at most 3. Randomly selecting 3 non-frequent global identifiers from an input product is enough to narrow down its origins to a maximum of 3 products in 89% of cases. We validate the proposed approach by mapping Debian source packages implemented in Python to the corresponding PyPI packages. This validation uses at most five trials, where each trial uses three randomly chosen global identifiers from a randomly chosen Python file of the subject software package, then ranks the results using a popularity index and requires inspecting only the top result. In our experiments, this method finds the true origin of a project with a recall of 0.9 and a precision of 0.77.
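The trial procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (global_identifiers, one_trial, guess_origin), the in-memory index mapping each identifier to the set of packages that define it, the popularity dictionary, the max_freq threshold used to decide what counts as "non-frequent", the intersection-based lookup, and the choice to stop at the first trial that yields a candidate are all hypothetical. The abstract does not say how "global" is delimited; here it is read as module-level classes and functions plus methods defined directly in those classes.

```python
import ast
import random


def global_identifiers(source):
    """Collect names of module-level classes and functions, plus methods
    defined directly in those classes (an assumed reading of "global")."""
    names = set()
    defs = (ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.parse(source).body:
        if isinstance(node, defs):
            names.add(node.name)
        elif isinstance(node, ast.ClassDef):
            names.add(node.name)
            names.update(m.name for m in node.body if isinstance(m, defs))
    return names


def one_trial(idents, index, popularity, rng, k=3, max_freq=3):
    """Pick k non-frequent identifiers and return the most popular package
    that defines all of them, or None if the trial is inconclusive."""
    # Assumption: "non-frequent" means known to the index and defined in
    # at most max_freq packages.
    rare = sorted(i for i in idents if 0 < len(index.get(i, ())) <= max_freq)
    if len(rare) < k:
        return None
    picked = rng.sample(rare, k)
    # Assumption: candidates are packages containing every picked identifier.
    candidates = set.intersection(*(index[i] for i in picked))
    if not candidates:
        return None
    # Sort first so that popularity ties are broken deterministically.
    return max(sorted(candidates), key=lambda pkg: popularity.get(pkg, 0))


def guess_origin(files, index, popularity, trials=5, seed=0):
    """Run up to `trials` trials, each on a randomly chosen file.

    `files` maps a file path to its Python source text.
    """
    rng = random.Random(seed)
    paths = sorted(files)
    for _ in range(trials):
        source = files[rng.choice(paths)]
        result = one_trial(global_identifiers(source), index, popularity, rng)
        if result is not None:
            return result
    return None
```

In use, files would hold the Python sources of the package under investigation, index would be built once from the reference ecosystem, and the returned package name would be a candidate origin for closer inspection with more expensive techniques, not a final determination.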
ArticleNumber 107
Author Sun, Yiming (University of Victoria)
German, Daniel (University of Victoria; ORCID 0000-0001-5661-4392; email dmg@uvic.ca)
Zacchiroli, Stefano (LTCI, Télécom Paris, Institut Polytechnique de Paris)
BackLink https://hal.science/hal-04101937 (View record in HAL)
Copyright The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
Distributed under a Creative Commons Attribution 4.0 International License
DOI 10.1007/s10664-023-10317-8
Discipline Computer Science
EISSN 1573-7616
GrantInformation Funder: Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada (funder ID: http://dx.doi.org/10.13039/501100002790)
ISSN 1382-3256
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 5
Keywords Source code tracking
Identifiers
Open source software
Software provenance
Python
License Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0
ORCID 0000-0001-5661-4392
0000-0002-4576-136X
OpenAccessLink https://hal.science/hal-04101937
PublicationDate 2023-10-01
PublicationPlace New York
Dordrecht
PublicationTitle Empirical software engineering : an international journal
PublicationTitleAbbrev Empir Software Eng
Publisher Springer US
Springer Nature B.V
Springer Verlag
SubjectTerms Compilers
Computer Science
Interpreters
Mathematical analysis
Names
Open source software
Programming Languages
Software Engineering
Software Engineering/Programming and Operating Systems
Software packages
Source code