Dataset: Copy-based Reuse in Open Source Software
In Open Source Software, the source code and any other resources available in a project can be viewed or reused by anyone subject to often permissive licensing restrictions. In contrast to some studies of dependency-based reuse supported via package managers, no studies of OSS-wide copy-based reuse...
Saved in:
Published in | arXiv.org |
---|---|
Main Authors | , |
Format | Paper Journal Article |
Language | English |
Published |
Ithaca
Cornell University Library, arXiv.org
14.12.2023
|
Subjects | |
Online Access | Get full text |
ISSN | 2331-8422 |
DOI | 10.48550/arxiv.2312.09370 |
Cover
Summary: | In Open Source Software, the source code and any other resources available in a project can be viewed or reused by anyone subject to often permissive licensing restrictions. In contrast to some studies of dependency-based reuse supported via package managers, no studies of OSS-wide copy-based reuse exist. This dataset seeks to encourage the studies of OSS-wide copy-based reuse by providing copying activity data that captures whole-file reuse in nearly all OSS. To accomplish that, we develop approaches to detect copy-based reuse by developing an efficient algorithm that exploits World of Code infrastructure: a curated and cross referenced collection of nearly all open source repositories. We expect this data to enable future research and tool development that support such reuse and minimize associated risks. |
---|---|
Bibliography: | SourceType-Working Papers-1 ObjectType-Working Paper/Pre-Print-1 content type line 50 |
ISSN: | 2331-8422 |
DOI: | 10.48550/arxiv.2312.09370 |