The idea of this post is to discuss what is necessary to compile C++ code. There are a lot of issues that can crop up at compilation time, and it is very hard to understand what is going on if you don’t understand how C++ compilation works.
This is also useful for understanding how different build toolchains work. It will be focused on the Microsoft C++ compiler, but the same ideas apply to Clang or other compilers, and to build tools such as CMake.
Overview of compilation/linking process
The output of the compilation process is a set of .obj files. The output of a linking process will be a static library, a dynamic library (.dll or .so) or an executable.
- The compiler compiles each source file (.cpp) into a .obj file. It needs the headers to do that.
- A static library (.lib) is just a set of .obj files bundled together. You can’t execute a static library. It is just there for reuse purposes.
- A dll is a piece of executable code that can be dynamically loaded into an existing process at runtime.
- An executable can be launched directly. There are different kinds of executable, like a console application, a Windows application or a UWP application.
- Executables and dlls may need static libraries to link.
This is the overview of the process. Obviously there are lots of quirks which I’ll detail next.
Visual Studio Toolchain and the compilation process
When you build your solution (.sln) in Visual Studio, a lot happens. There are also other tools involved, like “MSBuild” and the linker, that you may or may not be aware of. All of this exists just to fire a lot of command lines for you, and those command lines are what actually produce the results. On other platforms, it is quite common to have makefiles, which are like a step-by-step recipe for producing the outputs.
In Visual Studio usually you organise your code in solutions (.sln). The solution is just a collection of projects. A project (for C++ a .vcxproj file) can be a static library, a dynamic link library (dll) or an executable (console app, windows app, UWP app, etc).
A solution allows you to create references between projects. In C++ this is used to have executable projects reference static library projects, which is one way of isolating your code in cohesive blocks. It keeps the code from becoming a monolith where everything depends on everything and you end up with one big chunk of code that you can’t reuse anywhere (a very common thing to find in C++ projects, unfortunately).
So, when you build a solution, what happens under the hood is that MSBuild (a utility that is now part of Visual Studio and used to be part of the .NET Framework) is triggered for your solution and builds all the projects. Project files (.vcxproj, .csproj, etc) are MSBuild projects. They are XML in the format expected by MSBuild and contain imports (MSBuild imports) that define how to compile for C++, how to compile for C#, etc.
MSBuild files contain Targets (basically commands), which for a project are defined in those imports, and Properties, which can be set anywhere. The configuration type, the required libraries and so on are expressed this way, while the files to compile are listed as Items.
So, basically what MSBuild does is resolve the dependencies and build all the projects in the right order. Then, for every project, MSBuild is invoked again, and that is where the compiler-specific magic happens. Makefiles and the CMake build structure are not very different from this. Conceptually they play the same role.
Any other build tool for other compilers (Xcode, for example) has a similar structure: a file containing a bunch of configuration that will be forwarded to the compiler.
CMake is slightly different. It is a script, but instead of sending commands to the compiler, it generates makefiles or Visual Studio projects. It is designed for building across multiple compilers and platforms. In short, it turns CMakeLists.txt files into project files for a specific platform (makefiles on Linux/Unix, Visual Studio projects on Windows).
Step one: The compilation process
Inputs:
- Include directories
- Precompiled headers (if they are used)
- Source file
Output:
- Object file
For each source file, the compiler (CL.exe for Microsoft Visual C++) is invoked with the inputs described above plus other things like optimisation switches, whether to emit warnings, whether to treat warnings as errors and so on. The source file is the .cpp file. The include directories are the directories where the compiler should look for .h files, and the defines are a list of macros that will be created.
The list of macros is commonly used by #ifdef directives to create different flavours of compilation. They are very important, because you can have an #ifdef that gives you one behaviour if you are compiling as a library and another if you are compiling as an executable, or any other kind of conditional compilation in your code. This idea of conditional compilation is very powerful, yet a very good way of shooting yourself in the foot.
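To make the library/executable flavour idea concrete, here is a minimal sketch. The macro name MYLIB_BUILD_AS_LIBRARY and the function build_flavour are made up for illustration. In a real project the macro would come from the build configuration (for CL.exe, something like /DMYLIB_BUILD_AS_LIBRARY), not from the source file.

```cpp
#include <string>

// MYLIB_BUILD_AS_LIBRARY is a hypothetical macro used only for this sketch.
// It would normally be passed by the build tool on the compiler command line.

std::string build_flavour() {
#ifdef MYLIB_BUILD_AS_LIBRARY
    // Kept only when the macro is defined.
    return "library";
#else
    // Kept in every other case. The branch above is discarded by the
    // preprocessor before the compiler proper ever sees it.
    return "executable";
#endif
}
```

Built without the define, build_flavour() returns "executable". The same source built with the define returns "library", and that is exactly why a forgotten or misspelled define can quietly change what your code does.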
Defines are also commonly used to handle cross-platform compilation. Yes, in C++ you have to compile your code separately for each platform it targets, unlike Java and C#, where you compile to an intermediate form (bytecode or IL) and the platform-specific binary is generated at runtime.
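For the cross-platform case, the defines usually don’t come from you but from the compiler itself. A small sketch, assuming the widely used predefined macros _WIN32, __APPLE__ and __linux__ (the function name target_platform is made up for illustration):

```cpp
#include <string>

// Reports which platform this translation unit was compiled for.
// The answer is fixed at compile time, not detected at runtime.
std::string target_platform() {
#if defined(_WIN32)
    return "windows";   // defined for both 32-bit and 64-bit Windows targets
#elif defined(__APPLE__)
    return "apple";
#elif defined(__linux__)
    return "linux";
#else
    return "unknown";
#endif
}
```

Only one of those return statements ends up in the binary, so a binary built for one platform carries no trace of the code written for the others.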
An include is the same as typing the content of a file inside your .cpp file. That’s why you need #pragma once or an include guard (an #ifndef/#define pair) to avoid a header being included twice through different files and duplicating definitions.
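Here is roughly what the compiler sees once two includes of the same header have been pasted in. This is a sketch in a single file: point.h, Point and sum are hypothetical names, and the two guarded blocks stand in for the text that two #include "point.h" lines would expand to.

```cpp
// First copy of point.h (a hypothetical header).
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif

// Second copy, pulled in indirectly through some other header.
// POINT_H is already defined, so everything up to #endif is skipped.
#ifndef POINT_H
#define POINT_H
struct Point {
    int x;
    int y;
};
#endif

// Without the guard, Point would be defined twice and compilation would fail.
int sum(const Point& p) {
    return p.x + p.y;
}
```

#pragma once at the top of the header achieves the same effect with less typing, on the compilers that support it.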
Also, there is the process of precompiled headers, which I won’t detail here. But the main concept is having a .cpp file that includes some of your most used headers and then compiling that .cpp file, which will produce a .pch file. This .pch file can be reused to compile another .cpp file without having to recompile all the headers again.
It is important to mention that header files are one of the most common ways one piece of compiled code (a .obj, .lib or .exe) can reuse code that lives in another. There are other methods, like RTTI or discovering functions in DLLs, but they have their own drawbacks and may still require some knowledge of the code on the other side.
Step Two: Linking process
Inputs:
- Some .obj files
- Some .lib files produced by your own code
- Some 3rd party .obj files or .lib files.
Output:
- One .dll or .exe file
The linking process (link.exe) is the idea of taking a lot of compiled code (.obj or .lib) and merging it into a single .exe or .dll. Although it seems quite straightforward, there are obviously some special cases.
When you are linking your code, some extra validation is done, for example that your application has an “entry point” (your main or WinMain function), plus some other checks if you are building a .dll. Linking time is also when the toolchain validates that all the functions declared in the headers and actually used are implemented somewhere. So, if you declared a function in a header, used it and never implemented it in a .cpp file, you will discover that at linking time. Some optimisations can also be applied at linking time, like removing unused executable code.
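The split between “declared” and “implemented” is easier to see in code. A minimal sketch, with three hypothetical files (maths.h, app.cpp, maths.cpp) squeezed into one listing so it builds on its own:

```cpp
// ---- maths.h (hypothetical header): a declaration only ----
int add(int a, int b);

// ---- app.cpp (hypothetical): compiles with just the declaration in sight ----
int twice_sum(int a, int b) {
    return 2 * add(a, b);
}

// ---- maths.cpp (hypothetical): the definition the linker goes looking for ----
// Remove this function and the compiler stays happy, but a program that uses
// twice_sum fails at link time with an unresolved external symbol for add.
int add(int a, int b) {
    return a + b;
}
```

The compiler only needs the declaration to accept the call in twice_sum. Matching that call to an actual implementation is the linker’s job, which is why this kind of mistake never shows up as a CL.exe error.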
And last, because your .exe is targeting a specific OS or platform, there will be a lot of system libraries (usually .lib) that need to be linked. A good example is the C standard library (on Windows, the C runtime library or CRT), or the .lib files of any other toolkit. Xbox has its own specific libraries to link.
<sarcasm>Because there’s not enough complexity in this process</sarcasm>, different flavours of the standard C library exist: one for release and a different one for debug, one that bundles everything in your executable and another that uses a .dll version of the standard library (which is loaded at runtime and must be installed on the target machine).
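With the Microsoft compiler these four flavours are selected with the /MT, /MTd, /MD and /MDd switches, and the choice is visible to your code through the predefined macros _DLL and _DEBUG. The sketch below uses them to report which flavour a file was compiled against. The function name crt_flavour is made up, and note that _DEBUG is also set by /LDd, so treat the labels as a hint rather than proof.

```cpp
#include <string>

// Reports which C runtime flavour this file was compiled against,
// based on macros the Microsoft compiler predefines for each switch.
std::string crt_flavour() {
#if !defined(_MSC_VER)
    return "not MSVC";
#elif defined(_DLL) && defined(_DEBUG)
    return "dynamic debug (/MDd)";
#elif defined(_DLL)
    return "dynamic release (/MD)";
#elif defined(_DEBUG)
    return "static debug (/MTd)";
#else
    return "static release (/MT)";
#endif
}
```

Mixing .obj and .lib files that disagree on this choice is a classic source of linker errors, so printing this from each library can be a cheap first check.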
What can go wrong?
Well, in any process with a lot of options and variations: a lot. Here are some ideas for troubleshooting.
- Always look for the first compiler error and any warnings displayed before it. They can be helpful. Visual Studio sometimes changes the order shown in the error list, so checking the output window may help.
- If your code doesn’t build, first work out whether it doesn’t compile (error coming from CL.exe) or doesn’t link (error coming from LINK.exe). Based on this guide and where the error is coming from, you should have a very good idea of what is going on.
- If your problem is compilation
  - Make sure your header files are being included, and that they are the ones you expect to be included.
  - Make sure there are no missing defines. Following your compilation errors through the chain will help you to understand that.
  - If you have no idea which property of your build tool changes which compiler behaviour, you can increase the log verbosity and check the exact CL.exe command that is being invoked.
- If your problem is linking
  - Try to understand the problems. Linker errors are a pain, but possible to understand.
  - Most linking problems are functions that are not implemented: functions declared in a header (which you used somewhere in your code) whose implementation wasn’t found anywhere by the linker (maybe a .lib or .obj missing from the additional dependencies).
  - For missing functions you will get errors along the lines of “unresolved external symbol Class::Function referenced in function Class::OtherFunction”. Based on that you can work out whether a wrong define at compilation time is preventing that implementation from being generated or whether a whole .lib was not included.
  - If the missing function doesn’t look like your code, it is very likely 3rd party code or a system library. In this case the same idea applies: make sure a wrong define is not forcing your code to expect a function you don’t have (like defining “WIN32” and trying to link against UWP libraries) and that you are not simply forgetting a .lib that should be included.
  - Try to understand how the command line of the linker works. Getting the full command line can help you to dig deeper if necessary.
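The “wrong define” case from the linking checklist can be reproduced in a few lines. This is a sketch under assumptions: ENABLE_LOGGING, logger.h, logger.cpp and app.cpp are all hypothetical, and the define is written at the top of the listing only so that it builds on its own. In a real build it would come from the command line (for CL.exe, something like /DENABLE_LOGGING).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Normally supplied by the build configuration, not by the source file.
#define ENABLE_LOGGING 1

// ---- logger.h (hypothetical): the declarations are always visible ----
void log_message(const std::string& text);
const std::vector<std::string>& logged_messages();

// ---- logger.cpp (hypothetical): the definitions exist only under the define ----
#ifdef ENABLE_LOGGING
static std::vector<std::string> g_messages;

void log_message(const std::string& text) {
    g_messages.push_back(text);
}

const std::vector<std::string>& logged_messages() {
    return g_messages;
}
#endif

// ---- app.cpp (hypothetical): compiles either way, links only with the define ----
std::size_t run_app() {
    log_message("starting");
    log_message("done");
    return logged_messages().size();
}
```

If logger.cpp is compiled without ENABLE_LOGGING, every file still compiles cleanly and the failure only appears at link time as unresolved external symbols for log_message and logged_messages. Nothing in that error message mentions the define, which is why checking the exact compiler command line is worth the effort.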
I hope this helps you to understand more about what happens under the hood of your build tool. Behind the scenes they are just issuing command lines to your compiler, but because a lot of magic happens during the translation of properties, references, JSON files and XML files, these fundamentals get forgotten.
Because most C++ projects like to use “in-house”, “custom” or “variations” of standard build flavours, this can get very complicated, messy and hard to troubleshoot.