Source File Set Search for Clone-and-Own Reuse Analysis
Main Authors | Ishio, Takashi; Sakaguchi, Yusuke; Ito, Kaoru; Inoue, Katsuro |
---|---|
Format | Journal Article |
Language | English |
Published | 26.04.2017 |
Subjects | Computer Science - Software Engineering |
Abstract | The clone-and-own approach is a natural way for software developers
to reuse source code. To assess how known bugs and security vulnerabilities of a
cloned component affect an application, developers and security analysts need to
identify the original version of the component and understand how the cloned
component differs from the original one. Although developers may record
the original version information in a version control system and/or directory
names, such information is often either unavailable or incomplete. In this
research, we propose a code search method that takes a set of source files as
input and extracts all the components that include similar files from a software
ecosystem (i.e., a collection of existing versions of software packages). Our
method employs an efficient file similarity computation based on the b-bit
minwise hashing technique and ranks components by an aggregated file similarity.
To evaluate the effectiveness of the resulting tool, we analyzed 75 cloned
components in the Firefox and Android source code. The tool took about two hours
to report the original components from 10 million files in Debian GNU/Linux
packages. Recall of the top-five components in the extracted lists is 0.907,
while recall of a baseline using SHA-1 file hashes is 0.773, according to the
ground truth recorded in the source code repositories. |
---|---|
Author | Inoue, Katsuro Ito, Kaoru Sakaguchi, Yusuke Ishio, Takashi |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DOI | 10.48550/arxiv.1704.08395 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
OpenAccessLink | https://arxiv.org/abs/1704.08395 |
SecondaryResourceType | preprint |
SubjectTerms | Computer Science - Software Engineering |
Title | Source File Set Search for Clone-and-Own Reuse Analysis |
URI | https://arxiv.org/abs/1704.08395 |
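
The abstract names two technical ingredients: file similarity estimated with b-bit minwise hashing, and component ranking by an aggregated file similarity. The Python sketch below illustrates how these could fit together; it is not the authors' implementation. The tokenizer, the use of token trigrams, the signature length `NUM_HASHES`, the value of `B_BITS`, the `threshold`, and the aggregation rule (sum of the best match per query file) are all assumptions made for illustration, and the similarity estimator uses the simplified correction that applies when the sets are small relative to the hash range.

```python
import hashlib
import re

NUM_HASHES = 128  # assumed signature length (number of hash functions)
B_BITS = 1        # assumed number of low-order bits kept per minimum


def trigrams(text):
    """Return the set of token trigrams of a file (simplified tokenizer)."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return {" ".join(tokens[i:i + 3]) for i in range(max(len(tokens) - 2, 1))}


def signature(text, num_hashes=NUM_HASHES, b=B_BITS):
    """b-bit minwise signature: keep only the lowest b bits of each minimum."""
    grams = trigrams(text)
    mask = (1 << b) - 1
    sig = []
    for k in range(num_hashes):
        salt = k.to_bytes(8, "little")  # one salted hash function per k
        minimum = min(
            int.from_bytes(
                hashlib.blake2b(g.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "little",
            )
            for g in grams
        )
        sig.append(minimum & mask)
    return sig


def estimate_similarity(sig_a, sig_b, b=B_BITS):
    """Estimate Jaccard similarity from two b-bit signatures.

    Unrelated sets still agree on b bits with probability about 2**-b,
    so that chance level is subtracted and the result rescaled.
    """
    match_rate = sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)
    chance = 1.0 / (1 << b)
    return max(0.0, (match_rate - chance) / (1.0 - chance))


def rank_components(query_files, components, threshold=0.5):
    """Rank components by an aggregated file similarity (assumed rule).

    query_files: {path: text}; components: {component_name: {path: text}}.
    Each query file contributes its best match in the component, if that
    match reaches the threshold.
    """
    query_sigs = [signature(text) for text in query_files.values()]
    scores = {}
    for name, files in components.items():
        comp_sigs = [signature(text) for text in files.values()]
        total = 0.0
        for q in query_sigs:
            best = max((estimate_similarity(q, c) for c in comp_sigs), default=0.0)
            if best >= threshold:
                total += best
        scores[name] = total
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_components` with a dictionary of cloned files and a dictionary of candidate package versions returns `(component, score)` pairs in descending score order. A production-scale search over millions of files would precompute and index the signatures rather than recompute them per query, which this sketch does not attempt.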