Dataset: Copy-based Reuse in Open Source Software

Bibliographic Details
Published in: Proceedings (IEEE/ACM International Conference on Mining Software Repositories. Online), pp. 42-47
Main Authors: Jahanshahi, Mahmoud; Mockus, Audris
Format: Conference Proceeding
Language: English
Published: ACM, 15 April 2024
ISSN: 2574-3864
DOI: 10.1145/3643991.3644868

Abstract: In Open Source Software (OSS), the source code and any other resources available in a project can be viewed or reused by anyone, subject to often permissive licensing restrictions. Although dependency-based reuse supported via package managers has been studied, no studies of OSS-wide copy-based reuse exist. This dataset seeks to encourage studies of OSS-wide copy-based reuse by providing copying activity data that captures whole-file reuse in nearly all OSS. To accomplish that, we detect copy-based reuse with an efficient algorithm that exploits the World of Code infrastructure: a curated and cross-referenced collection of nearly all open source repositories. We expect this data will enable future research and tool development that support such reuse and minimize associated risks.

CCS Concepts: • Software and its engineering → Software creation and management; • General and reference → Empirical studies.
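The record does not describe the detection algorithm itself. As an illustration only, the Python sketch below shows one simple way whole-file copies could be identified, assuming that identical file contents are matched through their Git blob hashes and that the project which committed a blob earliest is treated as its origin. `FileEvent`, `find_copies`, and the sample data are hypothetical; this is not the authors' World of Code implementation.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass


def git_blob_sha1(content: bytes) -> str:
    """Git blob ID: SHA-1 over a 'blob <size>\\0' header followed by the content."""
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


@dataclass(frozen=True)
class FileEvent:
    """Hypothetical record: one file version committed to a project."""
    project: str
    timestamp: int  # commit time, e.g. Unix seconds
    content: bytes


def find_copies(events):
    """Map blob_id -> (origin_project, [(copying_project, first_time), ...]).

    Assumption: the earliest project to commit a blob is its origin; every
    other project that later commits the identical blob counts as a copier.
    """
    first_seen = defaultdict(dict)  # blob_id -> {project: earliest timestamp}
    for ev in events:
        seen = first_seen[git_blob_sha1(ev.content)]
        if ev.project not in seen or ev.timestamp < seen[ev.project]:
            seen[ev.project] = ev.timestamp

    result = {}
    for blob, seen in first_seen.items():
        if len(seen) < 2:
            continue  # blob occurs in a single project: no cross-project reuse
        # Order projects by first appearance (project name breaks ties).
        ordered = sorted(seen.items(), key=lambda kv: (kv[1], kv[0]))
        result[blob] = (ordered[0][0], ordered[1:])
    return result


if __name__ == "__main__":
    events = [
        FileEvent("projA", 100, b"print('hi')\n"),
        FileEvent("projB", 250, b"print('hi')\n"),
        FileEvent("projC", 300, b"other\n"),
    ]
    for blob, (origin, copiers) in find_copies(events).items():
        print(blob[:12], origin, copiers)
```

Hashing whole contents means only exact, whole-file copies are found; partial or modified copies would need a different technique.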
Author details:
– Jahanshahi, Mahmoud (mjahansh@vols.utk.edu), University of Tennessee, Knoxville, USA
– Mockus, Audris (audris@utk.edu), University of Tennessee, Knoxville, USA; Vilnius University
EISBN: 9798400705878
Funding: National Science Foundation (funder ID 10.13039/100000001)
Conference abbreviation: MSR
Subjects: Codes; Copy-based Reuse; Data mining; Open source software; Reuse; Software algorithms; Software Development; Software Supply Chain; Source coding; World of Code
Online access: https://ieeexplore.ieee.org/document/10555700