Build native artifacts using CMake

Warning

CatBoost has used a CMake-based build process since this commit. Previously, Ya Make (Yandex's build system) was used.

For building CatBoost using Ya Make see here

We define native artifacts as build system artifacts that contain native code - executable binaries, shared libraries, static libraries, Python and R binary extension modules.

Host platform refers to the operating system and CPU architecture that you run the build on.

Target platform refers to the operating system and CPU architecture that you build for, that is, where you intend to run the built artifacts (an executable CLI application, a dynamic library, a Python extension library, etc.).

Possible host and target platform combinations:

| Host platform | Target platform |
|---|---|
| Linux x86_64 | Linux or Android |
| Linux non x86_64 | Linux |
| macOS x86_64 or arm64 | macOS x86_64 or arm64 or universal binaries |
| Windows x86_64 | Windows x86_64 |
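
To see which row applies to your machine, you can query the host platform with uname. The following is a minimal sketch, not part of the CatBoost tooling; the variable names are illustrative, and it assumes a POSIX shell (on Windows, uname is typically available only in environments such as Git Bash or MSYS2, so that case falls through to the default branch).

```shell
# Detect the host OS and CPU architecture.
HOST_OS=$(uname -s)      # e.g. Linux, Darwin
HOST_ARCH=$(uname -m)    # e.g. x86_64, aarch64, arm64

# Map the host platform to the target platforms listed above.
case "$HOST_OS/$HOST_ARCH" in
    Linux/x86_64)
        SUPPORTED_TARGETS="Linux or Android" ;;
    Linux/*)
        SUPPORTED_TARGETS="Linux" ;;
    Darwin/x86_64|Darwin/arm64)
        SUPPORTED_TARGETS="macOS x86_64, arm64 or universal binaries" ;;
    *)
        SUPPORTED_TARGETS="not detected (Windows x86_64 hosts can target Windows x86_64)" ;;
esac

echo "Host: $HOST_OS $HOST_ARCH"
echo "Possible targets: $SUPPORTED_TARGETS"
```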

Note

Training or inference on CUDA-enabled GPUs requires NVIDIA Driver of version 450.80.02 or higher.
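
One way to check the installed driver against this minimum is to query nvidia-smi and compare dotted version strings. This is a sketch under assumptions: version_ge is a hypothetical helper, and it relies on sort -V (version sort), which is available in GNU coreutils and most current systems but is not guaranteed by POSIX.

```shell
REQUIRED_DRIVER="450.80.02"

# version_ge A B: succeeds when dotted version A is greater than or equal to B.
# Assumption: `sort -V` is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v nvidia-smi >/dev/null 2>&1; then
    # Take the driver version reported for the first GPU.
    DRIVER=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
    if version_ge "$DRIVER" "$REQUIRED_DRIVER"; then
        echo "NVIDIA driver $DRIVER satisfies the minimum $REQUIRED_DRIVER"
    else
        echo "NVIDIA driver $DRIVER is older than $REQUIRED_DRIVER" >&2
    fi
else
    echo "nvidia-smi not found; cannot check the driver version" >&2
fi
```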

Source code

CatBoost source code is stored as a Git repository on GitHub at https://github.com/catboost/catboost/. You can obtain a local copy of this Git repository by running the following command from a command line interpreter (you need to have Git command line tools installed):

git clone https://github.com/catboost/catboost.git

Later in this document $CATBOOST_SRC_ROOT refers to the root dir of the local working copy of the source code cloned from the GitHub CatBoost repository.
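
If you want to paste the commands from this document as they are, you can define the variable in your shell. The sketch below assumes the clone command above was run in the current directory, so the working copy is in ./catboost; adjust the path otherwise.

```shell
# Assumption: the repository was cloned into ./catboost under the current directory.
CATBOOST_SRC_ROOT="$(pwd)/catboost"
export CATBOOST_SRC_ROOT

echo "CatBoost sources expected at: $CATBOOST_SRC_ROOT"
```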

Dependencies and requirements

Targets

The CMake files of the CatBoost CMake project define targets that correspond to native artifacts.

Final targets that are important:

| Target | Component | Output location | Description |
|---|---|---|---|
| catboost | app | $CMAKE_BINARY_DIR/catboost/app | CLI app |
| _catboost | python-package | $CMAKE_BINARY_DIR/catboost/python-package/catboost | python package shared library |
| catboostmodel | libs | $CMAKE_BINARY_DIR/catboost/libs/model_interface | C/C++ applier shared library |
| catboostmodel_static | libs | $CMAKE_BINARY_DIR/catboost/libs/model_interface/static | C/C++ applier static library with CatBoost dependencies linked in |
| catboost_train_interface | libs | $CMAKE_BINARY_DIR/catboost/libs/train_interface | C/C++ train API shared library |
| catboost4j-prediction | jvm-packages | $CMAKE_BINARY_DIR/catboost/jvm-packages/catboost4j-prediction/src/native_impl | JVM applier JNI shared library |
| catboost4j-spark-impl | spark | $CMAKE_BINARY_DIR/catboost/spark/catboost4j-spark/core/src/native_impl | Spark native JNI shared library part |
| catboostr | R-package | $CMAKE_BINARY_DIR/catboost/R-package/src | R package shared library |

Supplementary utilities targets (used for testing):

| Target | Output location | Description |
|---|---|---|
| limited_precision_dsv_diff | $CMAKE_BINARY_DIR/catboost/tools/limited_precision_dsv_diff | Utility to compare dsv files that may contain floating point numbers with limited precision |
| limited_precision_json_diff | $CMAKE_BINARY_DIR/catboost/tools/limited_precision_json_diff | Utility to compare JSON files that may contain floating point numbers with limited precision |
| model_comparator | $CMAKE_BINARY_DIR/catboost/tools/model_comparator | Utility to compare models saved as files |

Build using build_native.py

build_native.py is a convenient wrapper for building native artifacts with a simpler interface compared to invoking cmake, conan and ninja directly.

You can obtain all the options for build_native.py with the description by calling it with the --help flag:

python $CATBOOST_SRC_ROOT/build/build_native.py --help

The required options are:

  • --targets - Comma-separated list of CMake targets to build. See the list of supported targets
  • --build-root-dir - CMake build dir (forwarded to cmake's -B option)

Importantly, build_native.py has --dry-run and --verbose options so you can examine the commands it is going to run without actually running them.
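
A possible workflow is to preview the commands first and run the real build only after reviewing them. The sketch below uses only the options named in this document; the build directory name and the BUILD_SCRIPT and BUILD_ARGS variable names are illustrative, and it assumes CATBOOST_SRC_ROOT points at your checkout (defaulting to ./catboost).

```shell
CATBOOST_SRC_ROOT="${CATBOOST_SRC_ROOT:-$(pwd)/catboost}"
BUILD_SCRIPT="$CATBOOST_SRC_ROOT/build/build_native.py"

# Keep the arguments in one place so the preview and the real build stay in sync.
BUILD_ARGS="--build-root-dir=./build_no_cuda --targets catboost"

if [ -f "$BUILD_SCRIPT" ]; then
    # Preview only: print the commands without running them.
    # $BUILD_ARGS is intentionally unquoted so the shell splits it into separate arguments.
    python "$BUILD_SCRIPT" $BUILD_ARGS --dry-run --verbose
    # After reviewing the output, run the real build:
    # python "$BUILD_SCRIPT" $BUILD_ARGS
else
    echo "build_native.py not found under $CATBOOST_SRC_ROOT; set CATBOOST_SRC_ROOT first" >&2
fi
```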

Examples

  • Build catboost CLI app for the current platform without CUDA:

    python $CATBOOST_SRC_ROOT/build/build_native.py --build-root-dir=./build_no_cuda --targets catboost
    
  • Build catboost CLI app for the current platform with CUDA support (path to CUDA is taken from CUDA_ROOT or CUDA_PATH environment variable, if they are not defined the build will fail):

    python $CATBOOST_SRC_ROOT/build/build_native.py --build-root-dir=./build_with_cuda --targets catboost --have-cuda
    
  • Build catboost CLI app for the current platform with CUDA support and CUDA path specified explicitly:

    python $CATBOOST_SRC_ROOT/build/build_native.py --targets catboost --build-root-dir=./build_with_cuda_11 --have-cuda --cuda-root-dir=/usr/local/cuda-11/
    
  • Build C/C++ applier shared library as a macOS universal binary with macOS minimal version set to 11.0:

    python $CATBOOST_SRC_ROOT/build/build_native.py --targets catboostmodel --build-root-dir=./build_applier --macos-universal-binaries --macosx-version-min=11.0
    

Build by calling cmake, conan and ninja directly

Note

For most common scenarios it is easier to run build_native.py described above.

Host platform is the same as the target platform (no cross-compilation)

  1. Choose some directory as a build root. Prefer short paths on Windows to avoid hitting the path length limit of 260 characters for files in this directory. This directory is referred to as $CMAKE_BINARY_DIR later.

  2. If you build on Linux for the aarch64 architecture, set special compilation flags (they will be used when building Conan packages):

    export CFLAGS="-mno-outline-atomics"
    export CXXFLAGS="-mno-outline-atomics"
    

    See GitHub issue #2527 for details.

  3. Call cmake with $CATBOOST_SRC_ROOT as a source tree root and a build root specification: -B $CMAKE_BINARY_DIR. See CMake CLI documentation for details. Other important options and definitions for this call are described below.

  4. Call the build tool (depending on what generator has been specified in the cmake call above) with the build specification files generated in $CMAKE_BINARY_DIR.
    For ninja this will be:

    ninja -C $CMAKE_BINARY_DIR <target> [<target> ... ]
    

    See the list of possible targets (also depends on specified CATBOOST_COMPONENTS during the cmake call above).

  5. Native artifacts are generated in $CMAKE_BINARY_DIR subdirectories, see the list of possible targets for an exact output location for each target.
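
The steps above can be combined into a short script. This is a sketch under assumptions rather than a canonical recipe: it builds the CLI app and the applier shared library in Release mode with the Ninja generator and the default clang toolchain, the configure_and_build function name and the default directory names are illustrative, and the set of options you need may differ (see the options described later in this document).

```shell
# Assumptions: sources in ./catboost and build root in ./build unless set by the caller.
CATBOOST_SRC_ROOT="${CATBOOST_SRC_ROOT:-$(pwd)/catboost}"
CMAKE_BINARY_DIR="${CMAKE_BINARY_DIR:-$(pwd)/build}"

configure_and_build() {
    # Step 3: configure. The components list is ;-delimited, so it is quoted
    # to keep the shell from treating ';' as a command separator.
    cmake -S "$CATBOOST_SRC_ROOT" -B "$CMAKE_BINARY_DIR" \
        -G Ninja \
        -DCMAKE_BUILD_TYPE=Release \
        -DCMAKE_TOOLCHAIN_FILE="$CATBOOST_SRC_ROOT/build/toolchains/clang.toolchain" \
        -DCMAKE_POSITION_INDEPENDENT_CODE=On \
        "-DCATBOOST_COMPONENTS=app;libs" || return 1

    # Step 4: build the selected targets.
    ninja -C "$CMAKE_BINARY_DIR" catboost catboostmodel
}

# Only run when the source tree is actually present.
if [ -f "$CATBOOST_SRC_ROOT/CMakeLists.txt" ]; then
    configure_and_build
else
    echo "No CMakeLists.txt under $CATBOOST_SRC_ROOT; set CATBOOST_SRC_ROOT first" >&2
fi
```

On Linux for aarch64, the CFLAGS and CXXFLAGS exports from step 2 would go before the call to configure_and_build.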

Host platform is different from the target platform (cross-compilation)

  1. Choose some directory as a build root for host platform tools (built as part of CatBoost's CMake project). This directory is referred to as $CMAKE_NATIVE_TOOLS_BINARY_DIR later.

  2. Call cmake with $CATBOOST_SRC_ROOT as a source tree root and a build root specification: -B $CMAKE_NATIVE_TOOLS_BINARY_DIR. Also pass -DCATBOOST_COMPONENTS=none to disable building components of CatBoost itself - we need only tools here.

    See CMake CLI documentation for details. Other important options and definitions for this call are described below.

  3. Build host platform tools:
    For ninja this will be:

    ninja -C $CMAKE_NATIVE_TOOLS_BINARY_DIR archiver cpp_styleguide enum_parser flatc protoc rescompiler triecompiler
    
  4. Choose some directory as a target platform build root. This directory is referred to as $CMAKE_TARGET_PLATFORM_BINARY_DIR later.

  5. Call conan to install host platform tools to $CMAKE_TARGET_PLATFORM_BINARY_DIR.

    conan install -s build_type=<build-type> -if $CMAKE_TARGET_PLATFORM_BINARY_DIR --build=missing $CATBOOST_SRC_ROOT/conanfile.txt
    

    where build-type is either Debug or Release.

  6. Call conan to install target platform libraries to $CMAKE_TARGET_PLATFORM_BINARY_DIR.

    conan install -s build_type=<build-type> -if $CMAKE_TARGET_PLATFORM_BINARY_DIR --build=missing --no-imports -pr:h=<conan_host_profile> -pr:b=default $CATBOOST_SRC_ROOT/conanfile.txt
    

    where

  7. Call cmake with $CATBOOST_SRC_ROOT as a source tree root and a build root specification: -B $CMAKE_TARGET_PLATFORM_BINARY_DIR.

    Compared to a usual cmake call for a non cross-platform build you have to:

    • Specify a special toolchain for cross-platform building (not the usual $CATBOOST_SRC_ROOT/build/toolchains/clang.toolchain). Examples of such toolchains are available in $CATBOOST_SRC_ROOT/build/toolchains/ (with cross-build prefix).
    • Pass -DTOOLS_ROOT=$CMAKE_NATIVE_TOOLS_BINARY_DIR to specify the path to the native platform tools to be used during the build (the ones built during the first cmake and ninja calls above). Note that the path specified here has to be absolute!

    See CMake CLI documentation for details. Other important options and definitions for this call are described below.

    See also CMake cross-compilation documentation.

  8. Native artifacts are generated in $CMAKE_TARGET_PLATFORM_BINARY_DIR subdirectories, see the list of possible targets for an exact output location for each target.
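
The cross-compilation sequence above can be sketched as a pair of shell functions. The commands mirror the ones shown in the steps; everything else is an assumption made for illustration: the function names abs_dir and cross_build, the default directory names, the use of the Ninja generator and the default clang toolchain for the host tools, and the CONAN_HOST_PROFILE and CROSS_TOOLCHAIN_FILE variables, which you must set to a Conan host profile and a cross-build toolchain file of your choice.

```shell
# abs_dir DIR: create DIR if needed and print its absolute path
# (TOOLS_ROOT has to be an absolute path).
abs_dir() {
    mkdir -p "$1" && (cd "$1" && pwd)
}

# cross_build TARGET [TARGET ...]: run steps 1-7 and then build the given targets.
cross_build() {
    src="${CATBOOST_SRC_ROOT:?set CATBOOST_SRC_ROOT to the source tree root}"
    profile="${CONAN_HOST_PROFILE:?set CONAN_HOST_PROFILE (hypothetical variable)}"
    toolchain="${CROSS_TOOLCHAIN_FILE:?set CROSS_TOOLCHAIN_FILE (hypothetical variable)}"
    build_type="${BUILD_TYPE:-Release}"

    # Steps 1 and 4: choose the two build roots (default names are illustrative).
    tools_dir=$(abs_dir "${CMAKE_NATIVE_TOOLS_BINARY_DIR:-./build_native_tools}") || return 1
    target_dir=$(abs_dir "${CMAKE_TARGET_PLATFORM_BINARY_DIR:-./build_target}") || return 1

    # Steps 2-3: configure and build host platform tools only.
    cmake -S "$src" -B "$tools_dir" -G Ninja \
        -DCMAKE_BUILD_TYPE="$build_type" \
        -DCMAKE_TOOLCHAIN_FILE="$src/build/toolchains/clang.toolchain" \
        -DCATBOOST_COMPONENTS=none || return 1
    ninja -C "$tools_dir" archiver cpp_styleguide enum_parser flatc protoc \
        rescompiler triecompiler || return 1

    # Step 5: install host platform tools into the target build root.
    conan install -s build_type="$build_type" -if "$target_dir" \
        --build=missing "$src/conanfile.txt" || return 1

    # Step 6: install target platform libraries.
    conan install -s build_type="$build_type" -if "$target_dir" \
        --build=missing --no-imports -pr:h="$profile" -pr:b=default \
        "$src/conanfile.txt" || return 1

    # Step 7: configure for the target platform, pointing at the host tools.
    cmake -S "$src" -B "$target_dir" -G Ninja \
        -DCMAKE_BUILD_TYPE="$build_type" \
        -DCMAKE_TOOLCHAIN_FILE="$toolchain" \
        -DTOOLS_ROOT="$tools_dir" || return 1

    # Build the requested targets for the target platform.
    ninja -C "$target_dir" "$@"
}
```

With the three required variables exported, a call such as cross_build catboostmodel would then run the whole sequence for the applier shared library.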

CMake - important options and definitions

  • -B <path-to-build> - path to directory which CMake will use as the root of build directory.

  • -G <generator-name> - generator name. The recommended generator is "Ninja".

    Alternatively, on Windows you could also use Visual Studio generators for CMake.

    • For builds with CUDA use Visual Studio 16 2019 or Visual Studio 17 2022 generator and also specify the required toolset version when calling CMake by adding -T version=<toolset_version> to the command line. Currently supported toolset versions are 14.28 and 14.29.
    • For builds without CUDA use Visual Studio 17 2022 generator and also specify the required ClangCL toolset when calling CMake by adding -T ClangCL to the command line.

    Unix Makefiles CMake generator usage on macOS and Linux is possible but not recommended because of some issues with properly taking dependencies into account.

  • -DCMAKE_BUILD_TYPE=<build-type> - build type. Use one of Debug, Release, RelWithDebInfo and MinSizeRel.
    Note that multi-config generators are not properly supported right now. Even if you select Visual Studio as a CMake generator, choosing between configurations in the generated solution won't switch all the necessary options; the value of this option takes precedence.

  • -DCMAKE_TOOLCHAIN_FILE=<path> - pass toolchain to CMake. On Linux CMake's default configuration will most likely select gcc as a C and C++ compiler, but CatBoost needs to be built with either clang (on Linux or macOS) or Microsoft's cl compiler on Windows.
    So it is recommended to pass toolchain that will set clang and clang++ as C and C++ compilers on Linux and macOS and also set clang as a compiler for host code for CUDA (applicable only on Linux). The default toolchain that does that is $CATBOOST_SRC_ROOT/build/toolchains/clang.toolchain.

    CatBoost requires Clang 12+ to build. If the default Clang version available from the command line is lower than that, use a modified toolchain in which all occurrences of clang and clang++ are replaced with clang-$CLANG_VERSION and clang++-$CLANG_VERSION respectively, where $CLANG_VERSION is the version of Clang you want to use, for example 12 or 14 (it must already be installed).

  • -DCMAKE_POSITION_INDEPENDENT_CODE=<On|Off> - Turn on or off Position-independent code generation. Required for building shared libraries. Off by default.

  • -DCATBOOST_COMPONENTS=<components-list> - As CatBoost's CMake project contains many different CMake targets for different components that have their specific configuration dependencies it is often useful to restrict build configuration if you need only a subset of them (e.g. if you only need to build the CLI app you don't want to set up JDK that is required for building components with JVM API). The list is ;-delimited so it might require shell escaping or quoting.

    See Targets for components needed to build a certain target. none value is also possible and used during cross-compilation when building host platform tools only.

  • -DCMAKE_OSX_DEPLOYMENT_TARGET=<min_macos_version> - Specify the minimum version of macOS on which the target binaries are to be deployed. Relevant only for macOS.

  • -DHAVE_CUDA=<yes|no> - Turn CUDA support on or off. No (off) by default.

  • -DCUDAToolkit_ROOT=<path> - Specify path to CUDA installation directory. Useful if a specific CUDA version has to be selected from several versions installed or if CUDA has been installed to a non-standard path so CMake is unable to find it automatically.

  • -DCMAKE_CUDA_RUNTIME_LIBRARY=<None|Shared|Static> - Select the CUDA runtime library for use when compiling and linking CUDA.

  • -DJAVA_HOME=<path> - path of JDK installation to be used during build. Relevant only to build components with JVM API (JVM applier and CatBoost for Apache Spark). JAVA_HOME environment variable will be used by default.

  • -DJAVA_AWT_LIBRARY=<path> - The path to the Java AWT Native Interface (JAWT) library. Use if specifying JAVA_HOME has not been sufficient (if CMake default search logic has been unable to find this library inside JDK).

  • -DJAVA_JVM_LIBRARY=<path> - The path to the Java Virtual Machine (JVM) library. Use if specifying JAVA_HOME has not been sufficient (if CMake default search logic has been unable to find this library inside JDK).

  • -DPython3_ROOT_DIR=<path> - path to the Python installation to build extension for (not necessarily the same as the current Python interpreter used by default in command line). Must contain Python development artifacts (Python headers in an include directory and Python library for building modules)

  • -DPython3_LIBRARY=<path> - The path to the Python library for building modules. Use if specifying Python3_ROOT_DIR has not been sufficient (if CMake default search logic has been unable to find this library inside Python3_ROOT_DIR)

  • -DPython3_INCLUDE_DIR=<path> - The path to the directory of the Python headers. Use if specifying Python3_ROOT_DIR has not been sufficient (if CMake default search logic has been unable to find this directory inside Python3_ROOT_DIR)

  • -DCMAKE_FIND_ROOT_PATH=<path>[;<path>] - Semicolon-separated list of root paths to search on the filesystem. This variable is most useful when cross-compiling.
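
The modified toolchain mentioned under -DCMAKE_TOOLCHAIN_FILE can be produced with a single sed substitution. The following is a minimal sketch: make_versioned_toolchain is a hypothetical helper, and the demo runs on a stand-in file whose two lines are illustrative, not the real contents of clang.toolchain. The substitution rewrites every occurrence of clang or clang++, so review the result if the file mentions clang in other contexts such as paths.

```shell
# make_versioned_toolchain SRC DST VERSION
# Copy SRC to DST, appending -VERSION to every "clang" and "clang++".
make_versioned_toolchain() {
    # The pattern matches "clang" optionally followed by "++"; '&' is the matched text.
    sed "s/clang\(++\)\{0,1\}/&-$3/g" "$1" > "$2"
}

# Demo on a stand-in toolchain file (illustrative contents only).
demo_dir=$(mktemp -d)
printf '%s\n' \
    'set(CMAKE_C_COMPILER clang)' \
    'set(CMAKE_CXX_COMPILER clang++)' > "$demo_dir/sample.toolchain"

make_versioned_toolchain "$demo_dir/sample.toolchain" "$demo_dir/clang-14.toolchain" 14
cat "$demo_dir/clang-14.toolchain"
```

For a real build you would pass $CATBOOST_SRC_ROOT/build/toolchains/clang.toolchain as the source file and then point -DCMAKE_TOOLCHAIN_FILE at the generated copy.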