Compilation Quotient (CQ): A Metric for the Compilation Hardness of Programming Languages (2406.04778v1)

Published 7 Jun 2024 in cs.PL and cs.SE

Abstract: Today's programmers can choose from an exceptional range of programming languages, each with its own traits, purpose, and complexity. A key aspect of a language's complexity is how hard it is to compile programs in the language. While most programmers have an intuition about compilation hardness for different programming languages, no metric exists to quantify it. We introduce the compilation quotient (CQ), a metric to quantify the compilation hardness of compiled programming languages. The key idea is to measure the compilation success rates of programs sampled from context-free grammars. To this end, we fairly sample over 12 million programs in total. CQ ranges between 0 and 100, where 0 indicates that no programs compile, and 100 means that all programs compile. Our findings on 12 popular compiled programming languages show high variation in CQ. C has a CQ of 48.11, C++ has 0.60, Java has 0.27 and Haskell has 0.13. Strikingly, Rust's CQ is nearly 0, and for C, even a large fraction of very sizable programs compile. We believe CQ can help understand the differences of compiled programming languages better and help language designers.

Summary

The paper introduces the Compilation Quotient (CQ) metric that quantifies the ratio of successfully compiled programs to total samples.
It uses systematic sampling of over 12 million programs across 12 popular languages to measure compilation success rates empirically.
Findings highlight significant differences, such as C’s high CQ versus Rust’s low CQ, with implications for language design and compiler testing.

Overview of Compilation Quotient: A Metric for Compilation Hardness of Programming Languages

The paper "Compilation Quotient (CQ): A Metric for the Compilation Hardness of Programming Languages" introduces a novel metric called the Compilation Quotient (CQ), which quantifies the complexity of compiling programs written in various programming languages. The authors propose this metric to capture a key aspect of language complexity that has been largely intuitive until now.

Definition and Methodology

The CQ metric ranges from 0 to 100: a CQ of 0 means that none of the sampled programs compile, and a CQ of 100 means that all of them do. To compute CQ, the authors follow four steps:

Translation of context-free grammars of the selected programming languages into algebraic datatypes.
Sampling a large number of programs using the existing property-based tester FEAT.
Forwarding these programs to the respective language compilers and capturing the compilation outcomes.
Calculating the CQ as the ratio of successfully compiled programs to the total number of samples, scaled to the 0 to 100 range.
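The four steps above can be sketched end to end on a toy scale. This is a minimal illustration, not the authors' pipeline: the four-statement grammar is hypothetical, exhaustive enumeration stands in for FEAT's sampling, and Python's built-in `compile` stands in for a real compiler (Python is not one of the languages studied). The names `STATEMENTS`, `programs_of_size`, `compiles`, and `compilation_quotient` are invented for this sketch.

```python
import itertools

# Hypothetical toy grammar (not from the paper):
#   Program ::= Stmt | Stmt "\n" Program
#   Stmt    ::= "x = 1" | "pass" | "break" | "return x"
# Every derivation is grammatical, but "break" and "return x" are rejected
# at module level, so some generated programs fail to compile.
STATEMENTS = ["x = 1", "pass", "break", "return x"]


def programs_of_size(n):
    """Enumerate every program with exactly n statements."""
    for stmts in itertools.product(STATEMENTS, repeat=n):
        yield "\n".join(stmts)


def compiles(source):
    """Stand-in compiler check: True if Python's compile() accepts the source."""
    try:
        compile(source, "<sample>", "exec")
        return True
    except SyntaxError:
        return False


def compilation_quotient(programs, check):
    """Return 100 * (programs accepted by check) / (total programs)."""
    outcomes = [check(p) for p in programs]
    if not outcomes:
        raise ValueError("no programs sampled")
    return 100.0 * sum(outcomes) / len(outcomes)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, compilation_quotient(programs_of_size(n), compiles))
```

In this toy setting the quotient halves with each added statement, because a single rejected statement anywhere in a program makes the whole program fail.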

The study covers twelve popular compiled programming languages: C, C++, Java, C#, Kotlin, Haskell, Rust, Fortran, COBOL, Go, Swift, and Erlang. Over 12 million programs were sampled in total, with the sampling designed to be fair across languages.

Empirical Results

The empirical investigation revealed substantial variability in CQ across the evaluated languages:

C exhibited the highest CQ at 48.11.
Rust displayed a remarkably low CQ at 0.0004.
The object-oriented languages C#, Java, C++, and Kotlin had CQ values of 1.691, 0.265, 0.598, and 0.308, respectively, all far below C's.
The two functional languages diverged sharply: Haskell scored 0.128, while Erlang scored 6.511.
The legacy languages Fortran and COBOL had low CQ values of 0.033 and 0.032, respectively.
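To put these numbers side by side, the short sketch below ranks the values listed above and converts a CQ into an expected count of compiling programs per million samples. The figures are copied from this summary (Go and Swift are omitted because no values are listed for them here); the helper names `rank_by_cq` and `compiling_per_million` are made up for illustration.

```python
# CQ values as listed in this summary, on the 0-100 scale.
REPORTED_CQ = {
    "C": 48.11,
    "Erlang": 6.511,
    "C#": 1.691,
    "C++": 0.598,
    "Kotlin": 0.308,
    "Java": 0.265,
    "Haskell": 0.128,
    "Fortran": 0.033,
    "COBOL": 0.032,
    "Rust": 0.0004,
}


def rank_by_cq(values):
    """Return language names ordered from highest to lowest CQ."""
    return sorted(values, key=values.get, reverse=True)


def compiling_per_million(cq):
    """Convert a CQ (0-100) into expected compiling programs per million samples."""
    return cq / 100.0 * 1_000_000


if __name__ == "__main__":
    for name in rank_by_cq(REPORTED_CQ):
        per_million = compiling_per_million(REPORTED_CQ[name])
        print(f"{name:8s} CQ={REPORTED_CQ[name]:<8} ~{per_million:,.0f} per million")
```

Read this way, Rust's CQ of 0.0004 corresponds to roughly 4 compiling programs per million samples, against roughly 481,100 per million for C.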

Furthermore, the LCQ (Local Compilation Quotient), which provides a more granular analysis of the CQ by explicitly considering program size, demonstrated unique patterns. For instance, C's LCQ did not converge to zero even as program size increased, thereby underscoring its distinctive compilation ease compared to other languages where LCQ trends towards zero for larger programs.
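A size-indexed view of this kind can be sketched as follows. The sketch assumes that the LCQ at a given size is simply the success rate among sampled programs of that size, which matches the description here but may differ in detail from the paper's formal definition. The function name `local_cq` and the sample outcomes are made up for illustration and are not data from the paper.

```python
from collections import defaultdict


def local_cq(outcomes):
    """Map each program size to the percentage of programs of that size that compiled.

    outcomes: iterable of (size, compiled) pairs, where compiled is a bool.
    """
    totals = defaultdict(int)
    successes = defaultdict(int)
    for size, compiled in outcomes:
        totals[size] += 1
        successes[size] += int(compiled)
    return {size: 100.0 * successes[size] / totals[size] for size in sorted(totals)}


if __name__ == "__main__":
    # Made-up outcomes for illustration only.
    sample = [(1, True), (1, True), (2, True), (2, False), (3, False), (3, False)]
    print(local_cq(sample))
```

Plotting such a mapping against size is what makes the contrast visible: a curve that stays above zero for large sizes, as reported for C, versus one that decays towards zero.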

Language-Specific Insights

The detailed analysis of code examples for each language revealed intrinsic language features impacting CQ:

C: Dominated by pointer declarations that are flexible and often compile successfully, contributing to its high CQ.
Rust: Strict type system and expressions that have a high likelihood of generating type errors result in a low CQ.
C++: The presence of namespaces, templates, and operator functions reduces CQ due to the likelihood of errors in complex expressions.
Java and C#: Java’s lower CQ was attributed to its strict type-checking in expressions, whereas C# demonstrated higher CQ thanks to simpler declaration constructs.
Erlang and Haskell: Erlang's dynamically typed system showed a higher CQ, whereas Haskell’s statically typed system caused more frequent compilation failures.

Implications and Future Work

CQ has implications for several audiences. For language designers, it offers a quantifiable measure of language complexity, which could help improve productivity by reducing compilation errors. On long-term adoption, popular languages tend to have higher CQ, which may contribute to their persistence. For compiler testing, CQ could serve as a proxy for the effectiveness of fuzz-testing strategies, since languages with higher CQ are more amenable to extensive testing.

The paper points out potential future directions, such as extending CQ to interpreted languages, generating more complex multi-function programs, and exploring methods to quantify error correction difficulty across languages.

Conclusion

The introduction of CQ by Szabó, Winterer, and Su marks a significant step in quantifying the often subjective notion of compilation hardness. By providing a framework for empirical analysis, the paper lays the groundwork for a deeper understanding of programming languages' compilation complexities, thus offering tangible benefits to language designers, programmers, and the compiler testing community.
