Analyzing Linux on a Supercomputer

Bibliographic Details
Published in	IEEE Software, Vol. 42, no. 2, pp. 18-23
Main Author	Spinellis, Diomidis
Format	Journal Article
Language	English
Published	Los Alamitos: IEEE; IEEE Computer Society, 01.03.2025
Subjects	Codes; Data integration; Language preprocessors; Linux; Nodes; Parallel processing; Semantics; Supercomputers
Summary:	The C preprocessor, a key element of the language, has become a liability due to its lack of integration with modern language semantics. This column describes the analysis of the C preprocessor usage in the Linux kernel, comprising 20 million lines of code, using the CScout refactoring browser. Processing limitations led to a solution leveraging a supercomputer’s parallel processing capabilities. The analysis divided the kernel’s source files across 32 supercomputer nodes and implemented a binary tournament database merging strategy. Initial efforts revealed multiple difficulties. Resolving them involved several false starts involving recursive SQL statements, an SQLite extension, and the GraphViz connected components tool. After a number of redesigns guided by stress-testing, the analysis finished in just 32 hours rather than a week, using 374 CPU hours and 640 GiB RAM on the supercomputer’s nodes.
ISSN:	0740-7459, 1937-4194
DOI:	10.1109/MS.2024.3512732
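
The summary mentions a binary tournament merging strategy across 32 nodes. The following is only a minimal sketch of that general idea, not the article's implementation: it assumes each node's result can be stood in for by a Python set of identifier names and that merging is a plain set union. The function name `tournament_merge` and the sample data are hypothetical. The point it illustrates is that pairwise merging halves the number of partial results each round, so 32 inputs need 5 rounds, and the merges within a round are independent of one another.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def tournament_merge(parts: List[T], merge: Callable[[T, T], T]) -> Tuple[T, int]:
    """Merge partial results pairwise, round by round.

    Returns the final merged result and the number of rounds used.
    """
    if not parts:
        raise ValueError("need at least one partial result")
    rounds = 0
    while len(parts) > 1:
        next_round = []
        # Pair neighbours; each pair could be merged on a separate node.
        for i in range(0, len(parts) - 1, 2):
            next_round.append(merge(parts[i], parts[i + 1]))
        # With an odd count, the last result advances unmerged (a "bye").
        if len(parts) % 2 == 1:
            next_round.append(parts[-1])
        parts = next_round
        rounds += 1
    return parts[0], rounds


# Hypothetical stand-in data: one set of identifiers per node (32 nodes),
# each holding one unique name plus one name shared by all nodes.
node_results = [{f"id_{n}", "shared_macro"} for n in range(32)]
merged, rounds = tournament_merge(node_results, lambda a, b: a | b)
print(rounds, len(merged))  # prints "5 33"
```

A sequential merge of the same inputs would take 31 dependent steps; the tournament layout performs the same 31 merges but only 5 of them lie on the critical path.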