Using the uniqueness of global identifiers to determine the provenance of Python software source code
Published in | Empirical software engineering : an international journal Vol. 28; no. 5; p. 107 |
---|---|
Main Authors | Sun, Yiming; German, Daniel; Zacchiroli, Stefano |
Format | Journal Article |
Language | English |
Published | New York: Springer US, 01.10.2023; Springer Nature B.V; Springer Verlag |
Abstract | We consider the problem of identifying the provenance of free/open source software (FOSS), specifically the need to identify where reused source code has been copied from. We propose a lightweight approach to the problem based on software identifiers, such as the names of variables, classes, and functions chosen by programmers. The proposed approach efficiently narrows the search down to a small set of candidate origin products, which can then be analyzed with more expensive techniques to make a final provenance determination. By analyzing the PyPI (Python Package Index) open source ecosystem, we find that globally defined identifiers are very distinct. Across PyPI’s 244K packages we found 11.2M distinct global identifiers (class and method/function names, with only 0.6% of identifiers shared between the two types of entities); 76% of identifiers were used in only one package, and 93% in at most 3. Randomly selecting 3 infrequent global identifiers from an input product is enough to narrow down its origins to at most 3 products in 89% of cases. We validate the proposed approach by mapping Debian source packages implemented in Python to the corresponding PyPI packages. The validation uses at most five trials, where each trial uses three randomly chosen global identifiers from a randomly chosen Python file of the subject software package; results are then ranked using a popularity index, and only the top result needs to be inspected. In our experiments, this method finds the true origin of a project with a recall of 0.9 and a precision of 0.77. |
---|---|
ArticleNumber | 107 |
Author | Zacchiroli, Stefano; Sun, Yiming; German, Daniel |
Author_xml | 1. Sun, Yiming (University of Victoria); 2. German, Daniel (University of Victoria; ORCID 0000-0001-5661-4392; dmg@uvic.ca); 3. Zacchiroli, Stefano (LTCI, Télécom Paris, Institut Polytechnique de Paris) |
BackLink | https://hal.science/hal-04101937 (View record in HAL) |
CitedBy_id | crossref_primary_10_1145_3660822 |
ContentType | Journal Article |
Copyright | The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. Distributed under a Creative Commons Attribution 4.0 International License |
DOI | 10.1007/s10664-023-10317-8 |
Discipline | Computer Science |
EISSN | 1573-7616 |
GrantInformation_xml | Funder: Canadian Network for Research and Innovation in Machining Technology, Natural Sciences and Engineering Research Council of Canada (funder ID: http://dx.doi.org/10.13039/501100002790) |
ISSN | 1382-3256 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 5 |
Keywords | Source code tracking; Identifiers; Open source software; Software provenance; Python |
Language | English |
License | Distributed under a Creative Commons Attribution 4.0 International License: http://creativecommons.org/licenses/by/4.0 |
ORCID | 0000-0001-5661-4392; 0000-0002-4576-136X |
OpenAccessLink | https://hal.science/hal-04101937 |
PublicationDate | 2023-10-01 |
PublicationPlace | New York |
PublicationPlace_xml | New York; Dordrecht |
PublicationSubtitle | An International Journal |
PublicationTitle | Empirical software engineering : an international journal |
PublicationTitleAbbrev | Empir Software Eng |
PublicationYear | 2023 |
Publisher | Springer US; Springer Nature B.V; Springer Verlag |
StartPage | 107 |
SubjectTerms | Compilers; Computer Science; Interpreters; Mathematical analysis; Names; Open source software; Programming Languages; Software Engineering; Software Engineering/Programming and Operating Systems; Software packages; Source code |
Title | Using the uniqueness of global identifiers to determine the provenance of Python software source code |
URI | https://link.springer.com/article/10.1007/s10664-023-10317-8; https://www.proquest.com/docview/2840079714; https://www.proquest.com/docview/2872506138; https://hal.science/hal-04101937 |
Volume | 28 |
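
The abstract above counts global identifiers (class names and method/function names) per PyPI package. As a hedged illustration only, the sketch below shows one way such identifiers could be extracted from Python source with the standard-library `ast` module, and how the share of identifiers used by exactly one package, or by at most three, could be computed. The function names (`global_identifiers`, `uniqueness_stats`), the toy two-package corpus, and the rule that functions nested inside other functions are local are assumptions of this sketch, not the authors' implementation.

```python
from __future__ import annotations

import ast
from collections import defaultdict


def global_identifiers(source: str) -> set[str]:
    """Return class, module-level function, and method names defined in `source`.

    Assumption: functions nested inside other functions are local and skipped;
    classes nested inside classes are walked so their methods are included.
    """
    names: set[str] = set()

    def visit(body: list[ast.stmt]) -> None:
        for node in body:
            if isinstance(node, ast.ClassDef):
                names.add(node.name)
                visit(node.body)  # collect methods and nested classes
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                names.add(node.name)  # do not descend: inner defs are local

    visit(ast.parse(source).body)
    return names


def uniqueness_stats(packages: dict[str, list[str]]) -> tuple[int, float, float]:
    """For {package: [source texts]}, return (distinct identifiers,
    share used in exactly one package, share used in at most three packages)."""
    users: dict[str, set[str]] = defaultdict(set)  # identifier -> packages using it
    for pkg, sources in packages.items():
        for src in sources:
            for ident in global_identifiers(src):
                users[ident].add(pkg)
    if not users:
        return 0, 0.0, 0.0
    counts = [len(pkgs) for pkgs in users.values()]
    total = len(counts)
    one = sum(c == 1 for c in counts) / total
    at_most_3 = sum(c <= 3 for c in counts) / total
    return total, one, at_most_3


# Hypothetical toy corpus: `helper` is shared, `inner` is local and ignored.
corpus = {
    "pkg_a": ["class Foo:\n    def bar(self):\n        pass\n\ndef helper():\n    pass\n"],
    "pkg_b": ["def helper():\n    def inner():\n        pass\n"],
}
print(uniqueness_stats(corpus))  # (3, 0.6666666666666666, 1.0)
```

On real data the same per-identifier package sets would double as an inverted index for provenance lookups; whether the paper treats nested definitions the same way is not stated in this record.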
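
The validation procedure summarized in the abstract (at most five trials, each using three randomly chosen global identifiers from a randomly chosen Python file, with results ranked by a popularity index and only the top result inspected) can be sketched as follows. This is an assumption-laden illustration, not the authors' code: it treats the candidate set as the packages that define all of the sampled identifiers, and the inverted `index`, the `popularity` scores, and the `max_df` cut-off for infrequent identifiers are hypothetical inputs.

```python
from __future__ import annotations

import random


def narrow_origin(
    files: list[set[str]],
    index: dict[str, set[str]],
    popularity: dict[str, float],
    max_trials: int = 5,
    sample_size: int = 3,
    max_df: int = 3,
    rng: random.Random | None = None,
) -> str | None:
    """Guess the origin package of a subject package (hypothetical sketch).

    files:      global identifiers of each Python file in the subject package
    index:      identifier -> set of corpus packages defining it (inverted index)
    popularity: package -> popularity score (hypothetical, e.g. download counts)
    max_df:     identifiers defined in more packages than this are "frequent"
                (a hypothetical threshold, not taken from the paper)
    """
    if rng is None:
        rng = random.Random()
    for _ in range(max_trials):
        idents = rng.choice(files)  # one randomly chosen file per trial
        rare = sorted(i for i in idents if 0 < len(index.get(i, ())) <= max_df)
        if len(rare) < sample_size:
            continue  # not enough infrequent identifiers in this file; retry
        picked = rng.sample(rare, sample_size)
        # Candidates are the packages that define every sampled identifier.
        candidates = set.intersection(*(index[i] for i in picked))
        if candidates:
            # Rank by popularity (name breaks ties) and return only the top result.
            return max(candidates, key=lambda p: (popularity.get(p, 0.0), p))
    return None


index = {
    "Foo": {"pkg_a"},
    "bar": {"pkg_a"},
    "helper": {"pkg_a", "pkg_b"},
    "common": {"pkg_a", "pkg_b", "pkg_c", "pkg_d"},  # too frequent to be used
}
popularity = {"pkg_a": 10.0, "pkg_b": 5.0}
subject = [{"Foo", "bar", "helper", "common"}]
print(narrow_origin(subject, index, popularity, rng=random.Random(0)))  # pkg_a
```

Under these assumptions, intersecting per-identifier package sets is what lets a handful of rare names shrink the candidate list quickly, and retrying with a fresh file covers files whose identifiers are too common to be informative.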