How to Make Your C++ Qt Project Build 10x Faster with 4 Optimizations

By | March 13, 2017

— Hey! Get back to work!
— Compiling!
— OK. Carry on.
XKCD #303

As a project grows, the time spent building it matters more and more. Generally, the longer a build takes, the more time you waste every day.
Multiply that across an entire team, or a company of teams, and you’ve got a lot of wasted time on your hands. Waiting hours for a rebuild after even the smallest change can make you more attentive to details and force you to consider each step systematically. However, it can also block a more agile process and collaboration.

We’ll describe our experience optimizing our build process, summarize it, and turn it into a simple guide to build optimization for C++ projects using Qt.

1. Obviously, the most effective way to optimize is to use a parallel build

A parallel build is handled mostly by ‘make’ utilities, but for MSVC it is implemented internally in the ‘cl’ compiler.

If you use gcc or clang, your builds run through ‘make’, which needs the ‘-j8’ parameter.
In Qt Creator, this setting is under “Projects” -> “Build Settings”.
Under “Build Steps” you will find an editable line starting with “Make: …” and a “Details” button on the right. Clicking “Details” reveals two more edit fields, one of which is “Make arguments:”. Type ‘-j8’ (without quotes) there, and that’s it.
For automated builds, your scripts will call make/nmake/jom directly somewhere. Pass ‘-j8’ there too when the tool is ‘make’.

If you build with MSVC, you will internally use ‘jom’ or ‘nmake’, and neither of them needs extra parameters.
Instead, to turn on parallel compilation you pass the ‘-MP’ parameter to the compiler. This is easy to do in your Qt .pro file (e.g. MyProject.pro).
Add the following to the .pro file:

  *msvc* { # visual studio spec filter
      QMAKE_CXXFLAGS += -MP
  }

If you work in the MSVC IDE, you’ll use ‘nmake’ internally. ‘nmake’ is the weaker option, and there is nothing more to improve in this case.

If you work in Qt Creator, you’ll most likely use ‘jom’ internally. You can check this in “Projects” -> “Build Settings” -> “Build Steps”: “Make: jom.exe …”. If it shows ‘nmake.exe’ instead, click “Details” and override it with ‘jom’, which comes free with Qt.

Why is ‘jom’ better than ‘nmake’?
– The Microsoft compiler handles parallelism by itself (with -MP), but ‘nmake’ doesn’t. As a result, custom build steps won’t run in parallel unless you use ‘make’ or ‘jom’.

What are the custom build steps?
– The major custom build steps are:

  1. processing .ui files via UIC utility
  2. processing header files via MOC utility

They are the major ones because there are many of them: one per .ui file and one per header with Q_OBJECT. That is why they benefit from running in parallel.

What are the time savings?

It depends on your hardware. The build should run on the maximum number of effective parallel threads. If you have 4 cores with hyperthreading, you’ll have up to 8 effective threads (logical cores).
In theory, the speedup is about equal to the number of effective threads. In practice it is less, for example a factor of 3-5 with 8 threads.

Why -j8?

According to the ‘make’ manual, a plain ‘-j’ option doesn’t limit the number of jobs run at once, which can lead to unpredictable problems. ‘make’ may try to start a process for every target, which is inefficient and can overload the machine.
‘-j8’ lets up to 8 jobs run in parallel. 8 is a good choice for most cases and a good starting point if you have 4-8 effective threads. If you have more, try your own number of threads.

For those of us on Windows, GNU make works correctly with a plain ‘-j’, while on Mac OS it doesn’t.

If you have only 4 effective threads, you should probably use ‘-j4’. In our experience, though, it doesn’t make any difference in speed.

Why -MP?

As with -j, -MP lets you specify the number of parallel processes (e.g. -j8, -MP8). However, it works best without an explicit number, in which case the compiler uses the number of effective processors.
Specify a number with -MP only if you don’t want to overburden your CPU, and then use fewer than you have.

2. Precompiled header (pch)

A precompiled header greatly reduces the amount of work for the compiler. When the compiler processes a file, it has to parse all of the code in it. Most of that code is not yours; it is whatever you include from the standard library and other third-party sources (e.g. Qt). For example, your source file may contain a hundred lines of your own code plus “#include <QVector>”. Once QVector is included, ~133,000 lines of code (a 1.65 MB file) have to be parsed. Most of this code is parsed over and over again for every source file, and this is easy to avoid. A pch lets you specify which headers are commonly used. The compiler then precompiles them once before the build starts and reuses the result when compiling each .cpp.

To use a precompiled header, open your MyProject.pro file and add:

    PRECOMPILED_HEADER = <path_to_your_pch_file>
    CONFIG += precompile_header

where <path_to_your_pch_file> is, for example, ‘src/pch/my_precompiled_header.h’, and the file contents look like this:

    // This file is not to be included explicitly anywhere in the project.
    #include <QFileDialog>
    #include <QPushButton>
    #include <QLabel>
    #include <QCheckBox>
    #include <QComboBox>
    #include <QVBoxLayout>
    #include <QHBoxLayout>
    #include <QDrag>
    #include <QMimeData>
    #include <QToolButton>
    #include <QFrame>
    #include <QValidator>
    #include <QAction>
    #include <QButtonGroup>
    #include <QHeaderView>
    #include <QLineEdit>
    #include <QSpacerItem>
    #include <QStackedWidget>
    #include <QRadioButton>
    #include <QTabWidget>
    #include <QToolTip>
    #include <QMouseEvent>
    #include <QStyle>
    #include <QTimer>

    #include <QApplication>
    #include <QVariant>
    #include <QMap>
    #include <QVector>
    #include <QStringList>
    #include <QDir>
    #include <QPointer>
    #include <QColor>

    #include <string>
    #include <set>
    #include <map>
    #include <memory>
    #include <vector>
    #include <unordered_map>
    #include <array>
    #include <bitset>
    #include <initializer_list>
    #include <functional>
    #include <algorithm>
    #include <numeric>
    #include <iterator>
    #include <type_traits>
    #include <cmath>
    #include <cassert>
    #include <cfloat>
    #include <complex>
    #include <cstddef>
    #include <cstdint>
    #include <cstdlib>
    #include <mutex>
    #include <thread>
    #include <condition_variable>


And that’s it! A very simple adjustment, yet an extremely effective one: your compiler will now use the precompiled header, which speeds up the build.

Measurements on our project (on a machine without an SSD) gave us the following speedups:

  • Windows MSVC compiler: x3.5
  • Mac OS clang: x3.0
  • Windows g++: x1.6

What do you do with the pch header file?

– Nothing. As the comment says, don’t include it in your code; qmake makes the compiler include it automatically.

Why this set of files?

– These are the headers typically used in our project; your optimal set may be different.
The first group of includes covers the GUI module, the second contains the major Qt includes, and the third contains the major components of namespace std.

Should I just include everything?

– No. At some point a bigger pch starts slowing you down: it takes longer to build, and every .cpp file has to load it.

Any pitfalls?

– Yes. Including some particular headers may make the pch fail to compile. For example, this happens for me when including <atomic> with the clang compiler. The pch compiler tries to process all template code, including code you don’t plan to use, and that code may contain mistakes.
– You may also run into linking problems if you share object files between different builds (e.g. between tests and the app, by pointing both at the same ‘OBJECTS_DIR’ in their .pro files).

Note that once you start working with pch, the compiler forcibly includes those headers. If you later disable pch, your code may no longer compile because some needed includes are missing (which is easy to fix).

Should I include local files in pch?

– No. A pch is effective only while it is stable: if any file included in the pch changes, the whole project has to be rebuilt.

3. Remove redundant targets from the makefile (indirectly)

Qt uses the UIC and MOC utilities (mentioned above) for its code generation. If you use the Q_OBJECT macro, MOC auto-generates an extra file for your class.

There are two improvements here:

3.1 Remove Q_OBJECT where it’s not needed

Your class shouldn’t contain Q_OBJECT unless it’s necessary.
You need the Q_OBJECT macro if:

  • you have signals
  • you use the ‘tr’ method inside the class (Q_OBJECT makes the class name the ‘context’ in translation files)
  • you have slots (but why would you ever use slots except for QtTest? With the Qt 5 connect syntax, any member function can be connected to a signal)
  • you use some other Qt macro for autogen (e.g. Q_PROPERTY, Q_ENUM, Q_FLAGS…)

If a header file doesn’t contain Q_OBJECT, MOC doesn’t need to generate a ‘moc_*.cpp’ file for it.
So if your class uses none of the features above, you don’t need Q_OBJECT either.
Note that 3.1 is a smaller optimization than 3.2, since the MOC utility runs pretty quickly.

3.2 Include generated moc files in the corresponding .cpp

Usually you have a class, like so:

  MyClass.h
  MyClass.cpp

If MyClass.h contains the Q_OBJECT macro, MOC produces a “moc_MyClass.cpp” file. That file is a separate build target and costs about as much to compile as another small .cpp file.
You should include this file at the end of your “MyClass.cpp”:

  ...
  #include "moc_MyClass.cpp"

qmake notices this include and excludes the file from the list of build targets, which significantly reduces the amount of work for the compiler.

Why is this better?

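Compiling two files, “A.cpp” and “B.cpp”, is much slower than compiling a single “(A+B).cpp” file, because the headers they share are parsed only once. moc_MyClass.cpp includes MyClass.h and everything it pulls in, all of which MyClass.cpp has already parsed.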

Why not use a single cpp file then?

You could group files, within some limits, but taken too far this obviously does huge harm to the code and to development.

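Here is our Python script that helps find classes with a redundant Q_OBJECT or a missing moc include. It was written for Python 2.7 and also runs on Python 3:

    # check_code v1.0
    # Run from the root of the repository; needs 'git' on the PATH.
    from __future__ import print_function
    import subprocess

    # Qt keywords and macros that require MOC (and therefore Q_OBJECT).
    # Prefix matches are intended: 'Q_ENUM' also matches 'Q_ENUMS',
    # 'Q_FLAG' matches 'Q_FLAGS', 'Q_SIGNAL' matches 'Q_SIGNALS'.
    _qt_autogen_macro_list = (
        'signals:',
        'slots:',
        'Q_CLASSINFO',
        'Q_PLUGIN_METADATA',
        'Q_INTERFACES',
        'Q_PROPERTY',
        'Q_PRIVATE_PROPERTY',
        'Q_REVISION',
        'Q_OVERRIDE',
        'Q_ENUM',
        'Q_FLAG',
        'Q_SCRIPTABLE',
        'Q_INVOKABLE',
        'Q_SIGNAL',
        'Q_SLOT',
    )
    _qt_autogen_macro_regexp = '({0})'.format('|'.join(_qt_autogen_macro_list))


    def _FnamesToBaseFileNames(fnames):
        """Map paths like 'src/ui/MyClass.h' to base names like 'MyClass'."""
        result = set()
        for fn in fnames:
            start = fn.rfind('/')
            end = fn.rfind('.')
            if end > start:
                result.add(fn[start + 1:end])
        return result


    def CheckQObject():
        def get_files(pattern):
            # 'git grep -l' prints each matching file name once. Passing the
            # arguments as a list avoids shell quoting issues on every OS.
            # Only C++ headers and sources are searched, so this script
            # does not report itself.
            cmd = ['git', 'grep', '-l', '-E', pattern, '--', '*.h', '*.hpp', '*.cpp']
            try:
                output = subprocess.check_output(cmd)
            except subprocess.CalledProcessError:
                return set()  # git grep exits with status 1 when nothing matches
            return set(s for s in output.decode('utf-8').splitlines() if s)

        qobject_files = _FnamesToBaseFileNames(get_files('Q_OBJECT'))
        macro_files = _FnamesToBaseFileNames(get_files(_qt_autogen_macro_regexp))
        include_moc_files = _FnamesToBaseFileNames(get_files('#include "moc_.*\\.cpp"'))
        for fn in sorted(qobject_files):
            if fn not in include_moc_files:
                print('looks like {0} misses include for moc_* file'.format(fn))
            if fn not in macro_files:
                print('probably {0} has redundant Q_OBJECT macro'.format(fn))


    if __name__ == '__main__':
        CheckQObject()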

The script is to be run from the root of the repository.

4. Forward stuff

Forward declarations cut partial-build time during regular work because they reduce the dependencies of each .cpp file. When a header changes, only the files that actually include it get recompiled.
We have a ‘src/forwards.h’ file with forward declarations for most classes, like this:

  class QString;
  template <typename T> class QVector;
  enum class Foo : int;
  enum Bar : int;
  void MyHelperFunction(QString&);

  namespace MyNamespace {
  class MyClass;
  struct MyStruct;
  }  // namespace MyNamespace

With such a file you can greatly reduce the number of includes in header files, since most declarations only need ‘src/forwards.h’.

If you’re not familiar with forward declarations, here’s a crash course.
Generally, a forward declaration (e.g. “class QString;”) is enough when the forwarded type is allowed to have an unknown size. That is the case when the type is only used through a pointer or reference and no operations are performed on it.

So the following class A requires the full definitions of classes B and QObject, but not of class C or enum D:

    #include "src/forwards.h"
    #include "src/B.h"

    #include <QObject>

    class A : public QObject { // base class can not be forwarded
        C* m_c; // pointer can be forwarded
        C& m_ref_c; // reference can be forwarded
        B m_b; // object by value can not be forwarded
        D m_d; // D is declared as "enum class D : int;" thus underlying type is known and it can be forwarded (particular enum feature)
    public:
        //...
        void foo(C* c); // ok - pointer
        void foo(const C& c); // ok - reference
        void foo(D d); // ok - enum with known underlying type
    };

We benchmarked our project on a machine with an SSD and 4 threads, toggling optimizations 1 and 2.
Here are the resulting build times (minutes:seconds):

  • no-parallel no-pch : 4:53
  • parallel no-pch : 1:44
  • no-parallel pch : 1:55
  • parallel pch : 0:48

So even with an SSD, the first two optimizations together gave a x6.1 speedup. With the parallel build already enabled, pch gave an additional x2.17.

With all four optimizations, you can expect your build to run up to 10x faster.

Evgeniy Evstratov, C++ Team Lead.

As always, feel free to contact us for a consultation!

