Distributed vectorized representations of source code commits
Distributed vector representations of source code commits, are generated to become part of a data corpus for machine learning (ML) for analyzing source code. The code commit is received, and time information is referenced to split the source code into pre-change source code and post-change source co...
Saved in:
Main Authors | , , , |
---|---|
Format | Patent |
Language | English |
Published |
19.07.2022
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Distributed vector representations of source code commits, are generated to become part of a data corpus for machine learning (ML) for analyzing source code. The code commit is received, and time information is referenced to split the source code into pre-change source code and post-change source code. The pre-change source code is converted into a first code representation (e.g., based on a graph model), and the post-change source code into a second code representation. A first particle is generated from the first code representation, and a second particle is generated from the second code representation. The first particle and the second particle are compared to create a delta. The delta is transformed into a first commit vector by referencing an embedding matrix to numerically encode the first particle and the second particle. Following classification, the commit vector is stored in a data corpus for performing ML analysis upon source code. |
---|---|
Bibliography: | Application Number: US202017080520 |