
There are several questions that discuss why we should have separate compilation units to improve compile times (for example, not putting any code in the hpp file, but only in the cpp files).

But then I found this question:

#include all .cpp files into a single compilation unit?

If we ignore the question of maintainability and look only at compile/link times, as well as at how well the code can be optimised, what would be the benefits and pitfalls of having just one hpp and one cpp file?

Note that the post I linked to talks about a single cpp file (while there are many header files). I'm asking what happens if we have just one hpp file and one cpp file.

EDIT: if we can ignore the fact that changing a single line will cause the entire code to be recompiled, will it still be faster than if thousands of separate files are recompiled from scratch?

EDIT: I am not interested in a discussion about maintainability. I'm trying to understand what makes a compiler compile faster. This question has nothing to do with what is practical; it is about understanding one simple matter:

Will one large hpp & cpp file compile faster than the same code split across many hpp and cpp files, using a single core?

EDIT: I think people are getting sidetracked and talking about what is practical and what one SHOULD do. This question is not about what one SHOULD do - it is simply to help me understand what the compiler is doing under the hood. Until now no one has answered that question; instead people are talking about whether it is practical or not.

EDIT: Besides the one person who actually tried to answer this question, I feel this question hasn't got the justice it deserves and is being unnecessarily downvoted. SO is about sharing information, not punishing questions because the people asking don't already know the answer.

Rahul Iyer
  • C++ compilers work at file-level granularity: If I have a million-line program split up into 1000 files, and I modify one line in one of those files, then only a small part (1/1000th?) of my program needs to be recompiled. If all million lines of my program are located in a single file, and I modify one line in that file, all 1 million lines will need to be recompiled. Multiply that by the number of times you're likely to want to modify-then-recompile-then-test your program, and you can see why keeping everything in one file is going to get impractical for any non-trivial program. – Jeremy Friesner Sep 12 '16 at 04:18
  • This depends on a lot of things. What optimization level? Is link-time optimization enabled? What about ThinLTO? Does the code make heavy use of templates? Are you compiling in parallel? If so, how many cores are you utilizing? Do you have a fast SSD or a slow HDD? And other factors. – Cornstalks Sep 12 '16 at 04:19
  • @JeremyFriesner I see what you're saying - but assuming that all my code would have to be recompiled from scratch anyway, would the "one-file" approach be quicker than the "1000-file" approach? – Rahul Iyer Sep 12 '16 at 04:21
  • John, what makes **all your code** be recompiled **every** time you run make in the case of several files? – Serge Sep 12 '16 at 04:23
  • There are situations where you may not be able to do this. E.g. C-style static variables are owned by each compilation unit; with a single compilation unit you either collapse them under a single declaration and have all the former compilation units share the same instance (which defeats the purpose), or you won't be able to compile. – Adrian Colomitchi Sep 12 '16 at 04:28
  • *At least* `gnumake` is able to work in parallel - other build tools may be too. While this alleviates the compilation time, the linking stage stays the same. – Adrian Colomitchi Sep 12 '16 at 04:31
  • @Serge there is nothing preventing me from doing anything. I'm just trying to understand how things work. – Rahul Iyer Sep 12 '16 at 04:33
  • @AdrianColomitchi I feel we're getting sidetracked. I would prefer to keep the discussion simply about compilation times and assume that all the code can be kept in just one hpp file & cpp file. – Rahul Iyer Sep 12 '16 at 04:34
  • "assume that all the code can be kept in just one hpp file & cpp file" just don't. Even a proponent of this method doesn't do "all in one hpp and one cpp" file, but [#includes them in a "unity file"](https://cheind.wordpress.com/2009/12/10/reducing-compilation-time-unity-builds/) *only for continuous integration builds*. Quoting from linked "we do not use UB for our daily developer work, as minor changes quite frequently cause a rebuild of the entire unity." Even when doing it, the risk of running out of memory, disk trashing and overall slowdown is present: he may use multiple UB-es – Adrian Colomitchi Sep 12 '16 at 04:41
  • @John Assuming a full recompile (which is not the usual case during development, but for the sake of argument), it might be slightly faster, but only if your computer has sufficient memory; if compiling a million-line file causes your computer to exhaust physical RAM, then your computer will start swapping to disk and the compile will be very, very slow compared to compiling smaller separate files that do not exhaust RAM. – Jeremy Friesner Sep 12 '16 at 04:41
  • @John as you mentioned, you are free to do whatever you want. If you like to put everything into a single file, do it. In terms of compilation speed it is not the same: at the point where the overhead time the compiler spends looking up symbols in its internal tables exceeds the time cost of starting several copies of the compiler, the compile time of a single file becomes greater than the compile time of several files – Serge Sep 12 '16 at 04:42
  • I think people here are getting sidetracked by the practical problems they have faced, which in some cases may be specific to their situations (not incorrect, just not much to do with my question). My question is theoretical, as I just want to understand what the compiler is actually doing, and no one has provided a comprehensive answer. My question has nothing to do with RAM, CPU or limitations of the machine, or a specific project with a thousand lines of code or a million. I want to know what happens in both cases (one file vs many files). It would be great if someone could answer my question. – Rahul Iyer Sep 12 '16 at 04:54
  • One file: the compiler is started once. Many files: the compiler is started several times. Many header files: each copy of the compiler opens many header files. Isn't this trivial? – Serge Sep 12 '16 at 04:56
  • @Serge This might be trivial to you because you know it. I don't have much C++ background, which is why I asked the question. Most C++ programmers who respond here are more experienced than people in other SO tags, and assume everyone here knows as much, which is really not the case. This is actually a simple question which no one has answered properly. – Rahul Iyer Sep 12 '16 at 04:58
  • John, you posted a question that you had to edit 4 times. You are asking what affects compiler performance. I answered that (in the comment above): why the compiler may be faster or slower due to the amount of code it has to compile, and why exactly. You say you are not asking about the number of lines and how that may affect performance. Do you yourself understand what you are trying to find out? – Serge Sep 12 '16 at 05:02
  • "I don't have much C++ background which is why I asked the question." [there you have one](http://read.pudn.com/downloads120/ebook/511418/inside.the.c%2B%2B.object.model.pdf) - about 180 pages of "Inside the C++ Object Model", probably others exists. Do you think one can summarize for you on SO? – Adrian Colomitchi Sep 12 '16 at 05:03
  • @Serge " Do you understand yourself what you are trying to knew?" no he doesn't. It seems that he's after an answer that is clear and simple, good or wrong be damned. It's typical with the beginners, not knowing what they don;t know. The best we can do is to feed him references and wish him good luck on his path as the next super-compiler author. – Adrian Colomitchi Sep 12 '16 at 05:08
  • @Serge I edited it because you don't seem to understand. On any other day, someone else who saw this question might have thought differently. This argument is not a productive use of anyone's time, and all I can say is that SO is not a friendly place for people who ask simple questions. – Rahul Iyer Sep 12 '16 at 05:08
  • I will quote it for you: "In terms of compilation speed it is not the same: at the point where the overhead time the compiler spends looking up symbols in its internal tables exceeds the time cost of starting several copies of the compiler, the compile time of a single file becomes greater than the compile time of several files". – Serge Sep 12 '16 at 05:10
  • @AdrianColomitchi Just because I don't find your answer helpful to me, there is no reason to go off topic or be sarcastic. SO is not just for people who have all the answers. – Rahul Iyer Sep 12 '16 at 05:11
  • "SO is not a friendly place for people who as simple questions." John, understand this: not every simple question has a simple answer which is still a good one. "How brain works" is a simple question, do you get this? – Adrian Colomitchi Sep 12 '16 at 05:11
  • @AdrianColomitchi And while I may not have as much experience with C++ as many here, I do with other programming languages, and would rather help anyone I can instead of having an unproductive argument. – Rahul Iyer Sep 12 '16 at 05:12
  • This question is slightly off-topic on StackOverflow. http://programmers.stackexchange.com/ could have been a better place for it. – Basile Starynkevitch Sep 12 '16 at 05:12
  • @AdrianColomitchi While I appreciate the time you spent trying to answer my question, I think you're going off topic for no reason. Basile gave me a useful answer. – Rahul Iyer Sep 12 '16 at 05:14
  • John, I don't give a damned p..s if you consider my answer useful or not; I offered a comment (not even an answer) to you, and it is up to you what you do with it. The sooner you understand that not everything has a simple answer, the less time you will waste before digging into the complex ones and growing up. – Adrian Colomitchi Sep 12 '16 at 05:15
  • @BasileStarynkevitch I think that's a thin line. I wasn't asking about a specific compiler (which you talked about). If it were a compiler-specific question then Programmers would have been a better place. – Rahul Iyer Sep 12 '16 at 05:16
  • @AdrianColomitchi Thanks Adrian, you've made it very clear. – Rahul Iyer Sep 12 '16 at 05:16
  • @AdrianColomitchi: it looks like you did not give any answers, just comments. – Basile Starynkevitch Sep 12 '16 at 05:16
  • @BasileStarynkevitch Yes, because I admitted from the very first that I didn't get what he was after. Once I got it, I offered what I thought might be a point for him to start digging from - that was not an answer. – Adrian Colomitchi Sep 12 '16 at 05:20
  • @BasileStarynkevitch when referring to other sites, it is often helpful to point out that [cross-posting is frowned upon](http://meta.stackexchange.com/tags/cross-posting/info) – gnat Sep 12 '16 at 06:41

2 Answers


It is compiler-specific, and depends upon the optimizations you are asking from your compiler.

Most recent free-software C++11 (or C++14) compilers are able to do link-time optimization: both recent GCC and Clang/LLVM accept the -flto flag (for link-time optimization). To use it you should compile and link your code with it, plus the same additional optimization flags. A typical use through the make builder could be:

make 'CXX=g++ -flto -O2' 

or, in separate commands:

g++ -flto -O2 -Wall -I/usr/local/include -c src1.cc
g++ -flto -O2 -Wall -I/usr/local/include -c src2.cc
g++ -flto -O2 -Wall src1.o src2.o -L/usr/local/lib -lsome -o binprog

Don't forget -flto -O2 at link time!

Then the code is compiled nearly the same as if you had put all of src1.cc & src2.cc into the same compilation unit. In particular, the compiler is able to (and sometimes will) inline a call from a function in src1.cc to a function in src2.cc.
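
For instance, a minimal sketch (the file names src1.cc and src2.cc come from the commands above; the header src2.hh and the functions add and compute are invented for illustration):

// src2.hh (hypothetical): declares a small function defined in src2.cc
int add(int a, int b);

// src2.cc
#include "src2.hh"
int add(int a, int b) { return a + b; }

// src1.cc
#include "src2.hh"
int compute(int x) { return add(x, 1); }  // with -flto -O2 on both the compile and the
                                          // link commands, this call to add() can be
                                          // inlined even though add() lives in another
                                          // translation unit

Built with the three commands above, the body of add may end up inlined into compute although the two functions sit in different object files.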

What happens under the hood with -flto (with GCC, but in principle it is similar in Clang) is that the compiler puts an intermediate representation (in some GIMPLE/SSA form) of your source code into each object file. At "link time" (actually also done by the compiler, not only the linker) this intermediate representation is reloaded, processed and recompiled for the entire program. So the compilation time nearly doubles.

So -flto slows down the compilation (approximately by a factor of 2) and might sometimes give a few percent of performance improvement (in the execution time of the produced binary). Hence I almost never use it.

I'm trying to understand what makes a compiler compile faster.

This is compiler-specific and depends a lot on the optimizations you are asking from it. Using a recent GCC 5 or GCC 6 with g++ -O2 (and IIRC also with clang++ -O2), by practical and empirical measure the compilation time is proportional not only to the total size of the compilation unit (e.g. the number of tokens, the size of the AST produced after preprocessing, include and macro expansion, and even template expansion) but also to the square of the size of the biggest function. A possible explanation is related to the time complexity of register allocation and instruction scheduling. Notice that the standard headers of the C++11 or C++14 containers expand to something quite big (e.g. #include <vector> gives about ten thousand lines).

BTW, compiling with g++ -O0 is faster than with g++ -O1, which is faster than g++ -O2. And asking for debug information (e.g. g++ -g2) slows down the compiler. So g++ -O1 -g2 gives a slower compilation than g++ -O0 (which would produce a slower executable).
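
A rough way to observe these effects yourself (reusing the hypothetical src1.cc from above; the actual numbers depend heavily on your machine and compiler version):

g++ -std=c++14 -E src1.cc | wc -l        # size of the translation unit after preprocessing & includes
time g++ -std=c++14 -O0 -c src1.cc       # baseline compile time
time g++ -std=c++14 -O2 -c src1.cc       # usually noticeably slower than -O0
time g++ -std=c++14 -O2 -g2 -c src1.cc   # adding debug info slows it down further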

Precompiled headers might help reduce the compilation time (but not always!). You would have a single common header, and you had better not have too-small compilation units: total compilation time is slightly shorter with 20 *.cc files of about two thousand lines each than with 200 *.cc files of two hundred lines each (notably because header files expand to many tokens). I generally recommend having at least a thousand lines per *.cc file if possible, so having just one small file of a hundred lines per class implementation is often a bad idea (in terms of overall compilation time). For a tiny project of e.g. 4KLOC, having a single source file is quite sensible.
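
As a concrete sketch of the precompiled-header mechanism with GCC (the file names common.hh, src1.cc and src2.cc are illustrative, and the flags used to build the precompiled header must match the flags used to compile the sources):

g++ -std=c++14 -O2 common.hh        # produces common.hh.gch next to the header
g++ -std=c++14 -O2 -c src1.cc       # if src1.cc starts with #include "common.hh",
g++ -std=c++14 -O2 -c src2.cc       # GCC uses common.hh.gch instead of re-parsing the header

GCC only picks up the .gch file when the header is the first include of the translation unit and the compile options are compatible, which is why a single-common-header layout fits this technique well.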

Notice also that C++ template expansion happens very "syntactically" (there are no modules yet in C++; OCaml modules & functors are much better in that respect). In other words, vector<map<string,long>> is "expanded" (and is almost as compile-time-consuming) as if the <vector>, <map> and <string> standard headers had been inserted at the first occurrence of vector<map<string,long>>. Template expansion is essentially an internal rewriting of ASTs. So a vector<map<string,set<long>>> requires - on its first occurrence - a lot of compiler work, and nearly the same amount of work has to be done for the "similar" vector<map<string,set<double>>>.
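
A tiny illustration of that cost (the file and variable names are arbitrary; actual timings vary a lot between compiler versions):

// nested.cc - each distinct specialization is instantiated and compiled separately,
// even when the types differ only in one leaf parameter
#include <map>
#include <set>
#include <string>
#include <vector>

std::vector<std::map<std::string, std::set<long>>>   a;  // first occurrence: full instantiation work
std::vector<std::map<std::string, std::set<double>>> b;  // "similar" type: nearly the same work again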

Of course, several compilation units can be compiled in parallel, e.g. with make -j.

To understand where a given GCC compilation is spending its time, pass -ftime-report to g++; see this. To be scared by the complexity of internal GCC representations, try -fdump-tree-all once.

To speed up your overall compilation time (with a focus on Linux systems with GCC, but you could adapt my answer to your system):

  • have a parallel build (e.g. make -j would run several g++ processes in parallel, perhaps one per translation unit, i.e. per *.cc file). Learn to write good enough Makefiles.

  • consider having one common header file and pre-compiling it (but that might slow down the compilation time; you need to benchmark); if you keep several header files (and there are many good reasons to do so), avoid having too many tiny header files and prefer fewer, but bigger, ones. A single common precompiled header file of nearly ten thousand lines is not unusual (and you might #include several other files in it).

  • consider having larger source files, e.g. having 20 source files of 2000 lines each might compile faster than 200 source files of 200 lines each (because with many small source files, preprocessing & template expansion are repeated more often), and I do sometimes have source files of nearly ten thousand lines. However, you'll often do an incremental build (and then that advice could be wrong; YMMV, and you need to benchmark).

  • disable optimizations, or lower the optimization level, so compile with g++ -O0 or g++ -O1 instead of g++ -O2, and avoid -flto. In many (but not all) cases, g++ -O3 (with or without -flto) is not worth the effort: it compiles slower, and the resulting machine code is not significantly faster than with g++ -O2. But YMMV; some numerical computations profit a lot from -O3. You could consider using function-specific pragmas or attributes to optimize some functions more than others in the same *.cc source file.

  • disable debugging info, or lower it, so compile with g++ -O1 instead of g++ -O1 -g2; but higher debugging info (e.g. g++ -g3) is very useful to the gdb debugger, so YMMV.

  • you could disable warnings, but that is not worth the trouble. On the contrary, always enable all of them: pass at least -Wall to g++, probably also -Wextra, and be sure that your code compiles without warnings.

  • avoid using too many nested templates, like e.g. std::set<std::vector<std::map<std::string,long>>>; in some cases having opaque pointers and using the PIMPL idiom can help (see the sketch after this list). You might then only include some extra headers (e.g. for containers) in some *.cc files and not all of them (but this is incompatible with precompiled headers, so YMMV).
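
A minimal sketch of that PIMPL idea, with the class Widget and its members invented purely for illustration:

// widget.hh - the header only needs <memory> and a forward declaration, so the
// heavy container headers stay out of every file that includes it
#pragma once
#include <memory>

class Widget {
public:
    Widget();
    ~Widget();                 // defined in widget.cc, where Impl is a complete type
    void add(long value);
private:
    struct Impl;               // opaque pointer: implementation details stay hidden
    std::unique_ptr<Impl> pimpl;
};

// widget.cc - only this translation unit pays for expanding the container headers
#include "widget.hh"
#include <map>
#include <set>
#include <string>

struct Widget::Impl {
    std::map<std::string, std::set<long>> data;
};

Widget::Widget() : pimpl(new Impl) {}
Widget::~Widget() = default;
void Widget::add(long value) { pimpl->data["default"].insert(value); }

Files that only include widget.hh never see <map>, <set> or <string>, so changing the implementation details only forces widget.cc to be recompiled.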

Some compilers or versions are slightly faster than others. So you could prefer clang++ to g++. I do recommend using several compilers (with warnings enabled). Be scared of undefined behavior in your code.

Notice that C++ is unlike Java: you can and often should have several classes or functions per file. Again YMMV.

PS. See (the slides and documentation, and follow the many links on) starynkevitch.net/Basile/gcc-melt for more about GCC internals. I abandoned GCC MELT in 2018, but the slides are still useful.

Basile Starynkevitch
  • This is the kind of answer I am looking for. – Rahul Iyer Sep 12 '16 at 05:05
  • An anecdote just to illustrate even more compiler-specific stuff: I have found that the Borland compiler, at least circa 2010, doesn't have much measurable difference between many vs one compilation unit, but really chokes on source files with a lot of function *calls* (dunno why). Striking example for me was a generated source file with about 20k lines of the form `object->function(x, y)`, which took approximately 60 full seconds to compile. I redesigned the code generator to produce an array of (x,y) pairs instead + a loop to call said function, compile time decreased to under 1 second. – Jason C Sep 12 '16 at 05:10
  • Point being: Like this answer implies, there's a ton of variables (beyond just the # of compilation units), very compiler specific, and very unpredictable to us folks who don't actually know how the compiler's internals work. – Jason C Sep 12 '16 at 05:11
  • @JasonC That's enough for me. For people who just code in C++ this may be obvious, but coming from different programming languages where things are quite different, a simple explanation like your comments is quite useful. – Rahul Iyer Sep 12 '16 at 05:18

If the question is about comparing one cpp that includes all the cpps plus one header file that includes all the header files versus one cpp and multiple header files, then I don't think there will be a significant difference (or any at all).

Including all the cpps in one file is what (and probably the only thing that) makes the difference. But this is a highly theoretical discussion with no real value. No one would ever do this in a project, for the reasons everyone here has mentioned.

If you want to know what is going on under the hood, read this:

https://randomascii.wordpress.com/2014/03/22/make-vc-compiles-fast-through-parallel-compilation/

GeorgeT
  • This question is theoretical - this site isn't restricted just to what people do in practice; it is also meant to help people understand how things work. While everyone else seems to be distracted by whether this is practical or not, my question is really quite simple - will it be faster / slower, what factors will affect it, and so on. It is not about whether one SHOULD do it. It's about understanding what is going on. – Rahul Iyer Sep 12 '16 at 04:42
  • @John what's going on is the fact that, used judiciously as a technique, it can provide shorter build times. – Adrian Colomitchi Sep 12 '16 at 04:46
  • @John I agree. That is why I gave a straight answer and a link for further exploration. – GeorgeT Sep 12 '16 at 04:49
  • The question was about having one header file and one cpp file vs many header files and many cpp files. – Rahul Iyer Sep 12 '16 at 05:20