
The pimpl (also: compiler firewall) idiom is used to shorten compile times, at the cost of readability and a little runtime performance. When a project takes too long to compile, how do you measure which classes are the best pimpl candidates?
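To make the idiom concrete, here is a minimal sketch (the class is hypothetical): suppose widget.h used to include <unordered_map> and store the container as a member directly, forcing every client to recompile whenever the implementation changed. After applying pimpl, the header only exposes an opaque pointer:

    // widget.h -- after pimpl: only a forward declaration and a pointer remain
    #include <memory>
    class Widget {
    public:
      Widget();
      ~Widget();                       // must be defined where Impl is complete
      void add(int key, double value);
    private:
      struct Impl;                     // defined in widget.cpp
      std::unique_ptr<Impl> m_impl;
    };

    // widget.cpp -- the heavy include moves here, out of sight of clients
    #include "widget.h"
    #include <unordered_map>
    struct Widget::Impl {
      std::unordered_map<int, double> data;
    };
    Widget::Widget() : m_impl(std::make_unique<Impl>()) {}
    Widget::~Widget() = default;
    void Widget::add(int key, double value) { m_impl->data[key] = value; }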

I have experience using pimpl, shortening a project's compile time from two hours to ten minutes, but I did this just by following my instincts: I reasoned that class header files that include (1) a lot of source code and (2) complex/template classes are the best candidates to apply the pimpl idiom to.

Is there a tool that objectively points out which classes are good pimpl candidates?

richelbilderbeek
    OT: I was always under the impression that the main reason to use the PIMPL was to un-bloat the public interface from the implementation details. Anyway, great question – sjaustirni Feb 27 '16 at 12:39
  • Did you try using precompiled headers, to see how much compile time they saved? – Christophe Feb 27 '16 at 16:32
  • @Christophe: yes, I have used precompiled headers. A side question to my original one would be: 'How do you measure precompiled header candidates?' – richelbilderbeek Feb 28 '16 at 13:11
  • I believe this is very compiler dependent. The slowdown is caused by two main drivers: header size (disk I/O, because even with guards the full header has to be read) and the complexity of the objects defined (number of functions, classes, members; lexical and syntactic analysis, etc.). I think you could use such basic metrics to make your choice. I guess the future C++ modules could help to improve this, with or without pimpl. – Christophe Feb 28 '16 at 13:37
  • Is it for gcc or vs? – VladimirS Mar 04 '16 at 14:27
  • For gcc I would suggest the following: use strace (check the time spent on open/close), the -frepo option (to generate template .rpo files), and -ftime-report. That will give you some idea of where the time is spent. You could also use pragma once on top of include guards. – VladimirS Mar 04 '16 at 14:37
  • @VladimirS: I use gcc. I will try out strace and report back later – richelbilderbeek Apr 06 '16 at 11:05

2 Answers


It is true that pimpl is useful for incremental compilation.

But the main reason to use pimpl is to preserve ABI compatibility. This was the rule at my previous company for almost all public classes in the API.

Another advantage is that you can distribute your library as a package whose headers do not expose implementation details.

For these reasons I would say: use pimpl wherever possible.

A very good article on the Qt pimpl implementation details and its benefits: https://wiki.qt.io/D-Pointer
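To illustrate the ABI point with a minimal, hypothetical sketch: because the exported class only ever stores one opaque pointer, its size and layout are fixed, so the private part can grow in a later release without breaking binaries built against an older version.

    // thing.h -- shipped to users; its layout is just one pointer and never changes
    class ThingPrivate;                 // opaque, defined only inside the library
    class Thing {
    public:
      Thing();
      ~Thing();
      int value() const;
    private:
      ThingPrivate* d;                  // the "d-pointer"
    };

    // thing.cpp -- inside the library; can grow freely between releases
    #include "thing.h"
    class ThingPrivate {
    public:
      int value = 42;
      // int cachedValue;  // added in version 2: sizeof(Thing) is unaffected,
                           // so binaries built against version 1 keep working
    };
    Thing::Thing() : d(new ThingPrivate) {}
    Thing::~Thing() { delete d; }
    int Thing::value() const { return d->value; }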

The compile-time problem should be addressed by:

  • using precompiled headers
  • dividing your big projects into smaller ones by how often the code is touched. Parts that do not change often can be compiled into libraries and published in a local repository that other projects reference by version.
  • ...
saad
  • Thanks saad for enlightening me on these points. Still, my question is how to measure where using a pimpl will give the biggest reduction in compile time. Closest to that answer is your suggestion of 'dividing your big projects into small ones by code touch frequency'. My question is: 'How do you measure that?' – richelbilderbeek Aug 22 '16 at 11:21

I'm not aware of an existing tool to do this, but I would suggest the following:

First, measure the stand-alone cost of including every header by itself. Make a list of all headers, and for each header, preprocess it. The simplest measure of the cost of that header is the number of lines that result from preprocessing. A possibly more accurate measure would be to count the occurrences of 'template', as processing template definitions seems to dominate compilation time in my experience. You could also count occurrences of 'inline', as I've seen large numbers of inline functions defined in headers be an issue too (but be aware that inline definitions of class methods don't necessarily use the keyword).
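As a rough sketch of that first step (just an illustration, not a polished tool: it assumes a POSIX system with popen, g++ on the PATH, and that "-I." is the only include path your headers need):

    // header_cost.cpp -- sketch: rank headers by the size of their preprocessed
    // output and by how many 'template' tokens appear in it.
    #include <algorithm>
    #include <cstdio>
    #include <filesystem>
    #include <iostream>
    #include <string>
    #include <vector>

    namespace fs = std::filesystem;

    // Preprocess one header and return the expanded text.
    std::string preprocess(const fs::path& header) {
      const std::string cmd =
          "g++ -E -x c++ -I. \"" + header.string() + "\" 2>/dev/null";
      std::string out;
      if (FILE* pipe = popen(cmd.c_str(), "r")) {
        char buf[4096];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, pipe)) > 0) out.append(buf, n);
        pclose(pipe);
      }
      return out;
    }

    // Rough substring count; a real tool would tokenize to skip comments/strings.
    int count_word(const std::string& text, const std::string& word) {
      int n = 0;
      for (auto pos = text.find(word); pos != std::string::npos;
           pos = text.find(word, pos + word.size()))
        ++n;
      return n;
    }

    int main() {
      struct Cost { fs::path header; long long lines; int templates; };
      std::vector<Cost> costs;
      for (const auto& entry : fs::recursive_directory_iterator(".")) {
        const auto ext = entry.path().extension();
        if (ext != ".h" && ext != ".hpp") continue;
        const std::string text = preprocess(entry.path());
        costs.push_back({entry.path(),
                         std::count(text.begin(), text.end(), '\n'),
                         count_word(text, "template")});
      }
      std::sort(costs.begin(), costs.end(),
                [](const Cost& a, const Cost& b) { return a.lines > b.lines; });
      for (const auto& c : costs)
        std::cout << c.lines << " lines, " << c.templates
                  << " 'template' occurrences: " << c.header.string() << '\n';
    }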

Next, measure the number of translation units (TUs) that include that header. For each main file of a TU (e.g., .cpp file), preprocess that file and gather the set of distinct headers that appear in the output (in the # lines). Afterward, invert that to get a map from header to number of TUs that use it.
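The same kind of sketch works for this step: preprocess each .cpp and pull the file names out of the '# linenumber "file"' markers that the preprocessor emits (same assumptions as above):

    // tu_usage.cpp -- sketch: count, for each header, how many translation
    // units end up including it, by scanning the '# <line> "<file>"' markers
    // in the preprocessed output of every .cpp file.
    #include <cstdio>
    #include <filesystem>
    #include <iostream>
    #include <map>
    #include <set>
    #include <sstream>
    #include <string>

    namespace fs = std::filesystem;

    // Return the set of distinct files mentioned in the preprocessed output of one TU.
    std::set<std::string> included_files(const fs::path& cpp) {
      const std::string cmd = "g++ -E -I. \"" + cpp.string() + "\" 2>/dev/null";
      std::set<std::string> files;
      if (FILE* pipe = popen(cmd.c_str(), "r")) {
        std::string output;
        char buf[8192];
        size_t n;
        while ((n = fread(buf, 1, sizeof buf, pipe)) > 0) output.append(buf, n);
        pclose(pipe);
        std::istringstream lines(output);
        std::string line;
        while (std::getline(lines, line)) {
          // Line markers look like:  # 12 "some/header.h" 1
          if (line.size() > 2 && line[0] == '#' && line[1] == ' ') {
            const auto open = line.find('"');
            const auto close = line.find('"', open + 1);
            if (open != std::string::npos && close != std::string::npos)
              files.insert(line.substr(open + 1, close - open - 1));
          }
        }
      }
      return files;
    }

    int main() {
      std::map<std::string, int> tu_count;   // header -> number of TUs using it
      for (const auto& entry : fs::recursive_directory_iterator("."))
        if (entry.path().extension() == ".cpp")
          for (const auto& header : included_files(entry.path()))
            ++tu_count[header];
      for (const auto& [header, n] : tu_count)
        std::cout << n << " TUs include " << header << '\n';
    }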

Finally, for each header, multiply its stand-alone cost by the number of TUs that include it. This is a measure of the cumulative effect of this header on total compilation time. Sort that list and go through it in descending order, moving private implementation details into the associated implementation file and trimming the public header accordingly.
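Combining the two measurements is then just a multiplication and a sort. Something along these lines, where the header names and numbers are invented purely for illustration:

    // rank_headers.cpp -- sketch of the final step: weight each header's
    // stand-alone cost by the number of TUs that include it and sort.
    #include <algorithm>
    #include <iostream>
    #include <map>
    #include <string>
    #include <vector>

    int main() {
      // Hypothetical results from the two previous measurements.
      std::map<std::string, long long> cost = {
          {"widget.h", 80000}, {"util.h", 1200}, {"simulation.h", 250000}};
      std::map<std::string, int> tu_count = {
          {"widget.h", 40}, {"util.h", 300}, {"simulation.h", 3}};

      std::vector<std::pair<std::string, long long>> ranked;
      for (const auto& [header, lines] : cost)
        ranked.push_back({header, lines * tu_count[header]});

      std::sort(ranked.begin(), ranked.end(),
                [](const auto& a, const auto& b) { return a.second > b.second; });

      // Headers at the top are the ones whose trimming (e.g. via pimpl)
      // should pay off most across the whole build.
      for (const auto& [header, weight] : ranked)
        std::cout << weight << "  " << header << '\n';
    }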

Now, the main issue with this or any such approach to measuring the benefit of private implementations is that you probably won't see much change at first: in the absence of engineering discipline to do otherwise, there will usually be many headers that include many others, with lots of overlap. Consequently, optimizing one heavily-used header will simply mean that some other heavily-used header that includes almost as much will keep compilation times high. But once you break through the critical mass of commonly used headers with many dependencies, optimizing most or all of them, compilation times should start to drop dramatically.

One way to focus the effort, so it's not so "pie in the sky", is to begin by selecting the single TU that takes the most time to compile, and work on optimizing only the headers that it depends on. Once you've significantly reduced the time for that TU, look again at the big picture. And if you can't significantly improve that one TU's compilation time through the private implementation technique, then that suggests you need to consider other approaches for that code base.
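To find that slowest TU in the first place, a crude timing harness is enough; for example (again only a sketch: the g++ flags are placeholders and should be replaced with your project's real flags so the measurement reflects the actual build):

    // tu_times.cpp -- sketch: compile each .cpp on its own and record how long
    // it takes, to find the most expensive translation units.
    #include <algorithm>
    #include <chrono>
    #include <cstdlib>
    #include <filesystem>
    #include <iostream>
    #include <string>
    #include <vector>

    namespace fs = std::filesystem;

    int main() {
      std::vector<std::pair<double, fs::path>> times;
      for (const auto& entry : fs::recursive_directory_iterator(".")) {
        if (entry.path().extension() != ".cpp") continue;
        const std::string cmd =
            "g++ -c -I. -O2 -o /dev/null \"" + entry.path().string() + "\"";
        const auto start = std::chrono::steady_clock::now();
        std::system(cmd.c_str());   // errors ignored for brevity in this sketch
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start;
        times.push_back({elapsed.count(), entry.path()});
      }
      std::sort(times.rbegin(), times.rend());   // slowest first
      for (const auto& [seconds, file] : times)
        std::cout << seconds << " s  " << file.string() << '\n';
    }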

Scott McPeak