Natural Language Insights from Code Reviews that Missed a Vulnerability: A Large Scale Study of Chromium

Bibliographic Details
Published in: Engineering Secure Software and Systems, pp. 70–86
Main Authors: Munaiah, Nuthan; Meyers, Benjamin S.; Alm, Cecilia O.; Meneely, Andrew; Murukannaiah, Pradeep K.; Prud’hommeaux, Emily; Wolff, Josephine; Yu, Yang
Format: Book Chapter
Language: English
Published: Cham: Springer International Publishing, 24.06.2017
Series: Lecture Notes in Computer Science
ISBN: 3319621041; 9783319621043
ISSN: 0302-9743; 1611-3349
DOI: 10.1007/978-3-319-62105-0_5

More Information
Summary: Engineering secure software is challenging. Software development organizations leverage a host of processes and tools to enable developers to prevent vulnerabilities in software. Code reviewing is one such approach, and it has been instrumental in improving the overall quality of software systems. In a typical code review, developers critique a proposed change to uncover potential vulnerabilities. Despite developers' best efforts, some vulnerabilities inevitably slip through the reviews. In this study, we characterized linguistic features (inquisitiveness, sentiment, and syntactic complexity) of conversations between developers in a code review to identify factors that could explain developers missing a vulnerability. We used natural language processing to collect these linguistic features from 3,994,976 messages in 788,437 code reviews from the Chromium project, and we collected 1,462 Chromium vulnerabilities to analyze the features empirically. We found that code reviews with lower inquisitiveness, higher sentiment, and lower complexity were more likely to miss a vulnerability. We then used a Naïve Bayes classifier to assess whether the words (or lemmas) in code reviews could differentiate reviews that are likely to miss vulnerabilities. The classifier used a subset of all lemmas (over 2 million) as features and their corresponding TF-IDF scores as values. The average precision, recall, and F-measure of the classifier were 14%, 73%, and 23%, respectively. We believe that our linguistic characterization will help developers identify problematic code reviews before they result in a vulnerability being missed.
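
A minimal sketch of the kind of classification pipeline the summary describes (lemmas as features, TF-IDF scores as values, a Naïve Bayes classifier), using scikit-learn. The toy messages, labels, and pipeline shape are illustrative assumptions, not the authors' implementation:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline
    from sklearn.metrics import precision_recall_fscore_support

    # Hypothetical stand-ins for pre-lemmatized code-review messages and
    # labels (1 = review missed a vulnerability, 0 = it did not).
    reviews = [
        "lgtm thanks",                       # low inquisitiveness, positive tone
        "why does this cast succeed here",   # inquisitive, questions the change
    ]
    labels = [1, 0]

    # TF-IDF over the lemma vocabulary feeds a multinomial Naive Bayes model.
    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(reviews, labels)

    pred = model.predict(reviews)
    p, r, f, _ = precision_recall_fscore_support(labels, pred, average="binary")

    # Sanity check on the reported averages: with precision P = 0.14 and
    # recall R = 0.73, F = 2PR / (P + R) = 0.2044 / 0.87 ≈ 0.23, consistent
    # with the 23% F-measure quoted in the summary.

The low precision paired with high recall suggests the classifier flags many reviews, catching most that miss a vulnerability at the cost of many false positives, a common trade-off for a screening tool of this kind.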