ANGHABENCH: A Suite with One Million Compilable C Benchmarks for Code-Size Reduction


Bibliographic Details
Published in: 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), pp. 378-390
Main Authors: da Silva, Anderson Faustino; Kind, Bruno Conde; de Souza Magalhaes, Jose Wesley; Rocha, Jeronimo Nunes; Ferreira Guimaraes, Breno Campos; Quinao Pereira, Fernando Magno
Format: Conference Proceeding
Language: English
Published: IEEE, 27.02.2021
Summary: A predictive compiler uses properties of a program to decide how to optimize it. The compiler is trained on a collection of programs to derive a model that determines its actions in the face of unseen code. One of the challenges of predictive compilation is finding good training sets. Regardless of the programming language, the availability of human-made benchmarks is limited. Moreover, current synthesizers produce code that is very different from actual programs, and mining compilable code from open repositories is difficult because of program dependencies. In this paper, we use a combination of web crawling and type inference to overcome these problems for the C programming language. We use a type reconstructor based on the Hindley-Milner algorithm to produce ANGHABENCH, a virtually unlimited collection of real-world compilable C programs. Although ANGHABENCH programs are not executable, they can be transformed into object files by any C-compliant compiler. Therefore, they can be used to train compilers for code-size reduction. We have used thousands of ANGHABENCH programs to train YACOS, a predictive compiler based on LLVM. The version of YACOS autotuned with ANGHABENCH generates binaries for the LLVM test suite that are over 10% smaller than those produced by clang -Oz. It also compresses code that is impervious even to the state-of-the-art Function Sequence Alignment technique published in 2019, because YACOS does not require large binaries to work well.
DOI: 10.1109/CGO51591.2021.9370322
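
As a rough illustration of what the summary means by "compilable but not executable", the sketch below shows a single function of the kind that might be mined from a repository, preceded by the declarations a type reconstructor would have to synthesize for it to compile on its own. Every name here (`weight_t`, `list_node`, `total_weight`) is hypothetical and is not taken from ANGHABENCH; the layout is an assumption about what such a file could look like, not the authors' actual output.

```c
#include <stddef.h>

/* Assumed reconstructed declarations: in the original project these would
   live in headers that the mined function no longer has access to. */
typedef int weight_t;

struct list_node {
    struct list_node *next;  /* inferred from the use `n->next` */
    weight_t weight;         /* inferred from the use `n->weight` */
};

/* The mined function itself. There is no main(), so the file can be
   compiled to an object file but not linked into an executable. */
weight_t total_weight(const struct list_node *head)
{
    weight_t sum = 0;
    for (const struct list_node *n = head; n != NULL; n = n->next)
        sum += n->weight;
    return sum;
}
```

A file like this can be turned into an object file with a command such as `clang -Oz -c example.c -o example.o` (the file name is a placeholder), and the size of the resulting object is the quantity a code-size model would be trained on.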