Will They Like This? Evaluating Code Contributions with Language Models

Bibliographic Details
Published in	2015 IEEE/ACM 12th Working Conference on Mining Software Repositories, pp. 157-167
Main Authors	Hellendoorn, Vincent J.; Devanbu, Premkumar T.; Bacchelli, Alberto
Format	Conference Proceeding
Language	English
Published	IEEE 01.05.2015
Subjects	code review; Context; Context modeling; Data mining; Entropy; Java; language model; Mathematical model; pull request; Software
ISSN	2160-1852
DOI	10.1109/MSR.2015.22

Summary:	Popular open-source software projects receive and review contributions from a diverse array of developers, many of whom have little to no prior involvement with the project. A recent survey reported that reviewers consider conformance to the project's code style to be one of the top priorities when evaluating code contributions on GitHub. We propose to quantitatively evaluate the existence and effects of this phenomenon. To this end, we use language models, which have been shown to accurately capture stylistic aspects of code. We find that rejected change sets do contain code significantly less similar to the project than accepted ones; furthermore, the less similar change sets are more likely to be subject to thorough review. Armed with these results, we further investigate whether new contributors learn to conform to the project style and find that experience is positively correlated with conformance to the project's code style.
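
The summary describes scoring how similar a change set is to a project using a language model. This record does not give the authors' implementation, so the following is only a minimal sketch of the general idea: train a model on project code and measure the cross-entropy (average bits per token) of a contribution, where lower values suggest code that looks more like the project. The whitespace tokenizer, the bigram order, the add-one smoothing, and all function names are assumptions chosen for brevity, not details taken from the paper.

```python
import math
from collections import Counter


def tokenize(code):
    # Assumption: naive whitespace tokenizer; a real setup would use a lexer.
    return code.split()


def train_bigram(tokens):
    """Collect context and bigram counts from a project's token stream."""
    padded = ["<s>"] + tokens
    contexts = Counter(padded[:-1])
    bigrams = Counter(zip(padded[:-1], padded[1:]))
    vocab = set(padded)
    return contexts, bigrams, vocab


def cross_entropy(tokens, model):
    """Average bits per token under an add-one-smoothed bigram model."""
    if not tokens:
        raise ValueError("cannot score an empty token sequence")
    contexts, bigrams, vocab = model
    v = len(vocab) + 1  # +1 reserves probability mass for unseen tokens
    padded = ["<s>"] + tokens
    total = 0.0
    for prev, cur in zip(padded[:-1], padded[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + v)
        total -= math.log2(p)
    return total / len(tokens)


# Hypothetical toy data: "project" code and two candidate contributions.
project = tokenize("int x = 0 ; int y = 1 ; int z = 2 ;")
model = train_bigram(project)

similar = tokenize("int y = 0 ;")
dissimilar = tokenize("var w := 3")

print("similar:   ", cross_entropy(similar, model))
print("dissimilar:", cross_entropy(dissimilar, model))
```

In this toy setting the contribution that reuses the project's tokens and ordering gets the lower score. A realistic study would need a proper tokenizer, higher-order models with better smoothing, and far more training data.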