Distributed vectorized representations of source code commits

Bibliographic Details
Main Authors: Sabetta, Antonino; Baumann, Arnaud; Cabrera Lozoya, Rocio; Bezzi, Michele
Format: Patent
Language: English
Published: 19.07.2022
Summary: Distributed vector representations of source code commits are generated to become part of a data corpus for machine learning (ML) analysis of source code. A code commit is received, and time information is referenced to split the source code into pre-change source code and post-change source code. The pre-change source code is converted into a first code representation (e.g., based on a graph model), and the post-change source code is converted into a second code representation. A first particle is generated from the first code representation, and a second particle is generated from the second code representation. The first particle and the second particle are compared to create a delta. The delta is transformed into a first commit vector by referencing an embedding matrix to numerically encode the first particle and the second particle. Following classification, the commit vector is stored in a data corpus for performing ML analysis on source code.
Bibliography: Application Number: US202017080520
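
Illustration: the pipeline in the summary (code representation, particles, delta, commit vector) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patented method. It assumes the pre-change and post-change sources have already been separated, uses Python's standard `ast` module as a stand-in for the graph-based code representation, treats a "particle" as a (parent node type, child node type) pair, and uses an untrained, randomly initialized lookup table in place of a learned embedding matrix. The names `particles`, `delta`, `EmbeddingMatrix`, and `commit_vector` are hypothetical. Classification and storage in the corpus are not shown.

```python
import ast
import random
from collections import Counter

EMBED_DIM = 8  # hypothetical embedding width, chosen only for illustration


def particles(source: str) -> Counter:
    """Count (parent type, child type) edges in the AST of `source`.

    Assumption: an AST edge stands in for a 'particle'; the summary does
    not define the term this concretely.
    """
    bag = Counter()
    for parent in ast.walk(ast.parse(source)):
        for child in ast.iter_child_nodes(parent):
            bag[(type(parent).__name__, type(child).__name__)] += 1
    return bag


def delta(pre: Counter, post: Counter) -> Counter:
    """Signed count changes: positive means added, negative means removed."""
    diff = Counter()
    for key in sorted(pre.keys() | post.keys()):
        change = post[key] - pre[key]
        if change:
            diff[key] = change
    return diff


class EmbeddingMatrix:
    """Lookup table with one random row per particle (untrained stand-in)."""

    def __init__(self, dim: int = EMBED_DIM, seed: int = 0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.rows = {}

    def row(self, particle):
        # Create a row the first time a particle is seen.
        if particle not in self.rows:
            self.rows[particle] = [self.rng.uniform(-1, 1) for _ in range(self.dim)]
        return self.rows[particle]


def commit_vector(pre_src: str, post_src: str, matrix: EmbeddingMatrix) -> list:
    """Sum the embedding rows of the delta particles, weighted by their change."""
    vec = [0.0] * matrix.dim
    for particle, change in delta(particles(pre_src), particles(post_src)).items():
        for i, value in enumerate(matrix.row(particle)):
            vec[i] += change * value
    return vec


# Toy commit: a None check is added to a function.
PRE = "def f(x):\n    return x\n"
POST = "def f(x):\n    if x is None:\n        return 0\n    return x\n"

vector = commit_vector(PRE, POST, EmbeddingMatrix())
```

Summing signed embedding rows is only one possible way to turn a delta into a fixed-length vector; it has the convenient property that a commit with no structural change maps to the zero vector.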