Build native artifacts using CMake
Warning
CatBoost has used a CMake-based build process since this commit. Previously, Ya Make (Yandex's build system) was used.
For building CatBoost using Ya Make, see here.
We define native artifacts as build system artifacts that contain native code: executable binaries, shared libraries, static libraries, and Python and R binary extension modules.
Host platform refers to the operating system and CPU architecture you run the build on.
Target platform refers to the operating system and CPU architecture you build for (where you intend to run the built artifacts, such as an executable CLI application, a dynamic library or a Python extension library).
Possible host and target platform combinations:
Host platform | Target platform |
---|---|
Linux x86_64 | Linux or Android |
Linux non x86_64 | Linux |
macOS x86_64 or arm64 | macOS x86_64 or arm64 or universal binaries |
Windows x86_64 | Windows x86_64 |
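To see which row of the table applies, you can print the host operating system and CPU architecture. The sketch below assumes a POSIX shell with uname available (on Windows that would mean an environment such as Git Bash, which is an assumption and not something the build requires); the variable names are illustrative.

```shell
# Report the host platform as seen by the shell that will run the build.
host_os="$(uname -s)"    # operating system name, e.g. Linux or Darwin (macOS)
host_arch="$(uname -m)"  # CPU architecture, e.g. x86_64, aarch64 or arm64
echo "Host platform: ${host_os} ${host_arch}"
```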
Note
Training or inference on CUDA-enabled GPUs requires NVIDIA Driver of version 450.80.02 or higher.
Source code
CatBoost source code is stored as a Git repository on GitHub at https://github.com/catboost/catboost/. You can obtain a local copy of this Git repository by running the following command from a command line interpreter (you need to have Git command line tools installed):
git clone https://github.com/catboost/catboost.git
Later in this document $CATBOOST_SRC_ROOT refers to the root directory of the local working copy of the source code cloned from the GitHub CatBoost repository.
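If you want to paste the commands from this document as they are, one option is to define $CATBOOST_SRC_ROOT as a shell variable. This is a minimal sketch that assumes the repository was cloned into a catboost directory under the current working directory; adjust the path otherwise.

```shell
# Assumption: `git clone` was run in the current directory, creating ./catboost.
CATBOOST_SRC_ROOT="$PWD/catboost"
export CATBOOST_SRC_ROOT

# Warn early if the path does not look like a cloned working copy.
if [ ! -d "$CATBOOST_SRC_ROOT/.git" ]; then
    echo "Warning: $CATBOOST_SRC_ROOT is not a Git working copy" >&2
fi
```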
Dependencies and requirements
Targets
CMakeFiles for CatBoost CMake projects contain different targets that correspond to native artifacts.
Final targets that are important:
Target | Component | Output location | Description |
---|---|---|---|
catboost | app | $CMAKE_BINARY_DIR/catboost/app | CLI app |
_catboost | python-package | $CMAKE_BINARY_DIR/catboost/python-package/catboost | python package shared library |
catboostmodel | libs | $CMAKE_BINARY_DIR/catboost/libs/model_interface | C/C++ applier shared library |
catboostmodel_static | libs | $CMAKE_BINARY_DIR/catboost/libs/model_interface/static | C/C++ applier static library with CatBoost dependencies linked in |
catboost_train_interface | libs | $CMAKE_BINARY_DIR/catboost/libs/train_interface | C/C++ train API shared library |
catboost4j-prediction | jvm-packages | $CMAKE_BINARY_DIR/catboost/jvm-packages/catboost4j-prediction/src/native_impl | JVM applier JNI shared library |
catboost4j-spark-impl | spark | $CMAKE_BINARY_DIR/catboost/spark/catboost4j-spark/core/src/native_impl | Spark native JNI shared library part |
catboostr | R-package | $CMAKE_BINARY_DIR/catboost/R-package/src | R package shared library |
Supplementary utilities targets (used for testing):
Target | Output location | Description |
---|---|---|
limited_precision_dsv_diff | $CMAKE_BINARY_DIR/catboost/tools/limited_precision_dsv_diff | Utility to compare dsv files that may contain floating point numbers with limited precision |
limited_precision_json_diff | $CMAKE_BINARY_DIR/catboost/tools/limited_precision_json_diff | Utility to compare JSON files that may contain floating point numbers with limited precision |
model_comparator | $CMAKE_BINARY_DIR/catboost/tools/model_comparator | Utility to compare models saved as files |
Build using build_native.py
build_native.py is a convenient wrapper for building native artifacts with a simpler interface compared to invoking cmake, conan and ninja directly.
You can obtain all the options for build_native.py with their descriptions by calling it with the --help flag:
python $CATBOOST_SRC_ROOT/build/build_native.py --help
The required options are:
- --targets - List of CMake targets to build (,-separated). See the list of supported targets
- --build-root-dir - CMake build dir (forwarded to cmake's -B option)
Importantly, build_native.py has --dry-run and --verbose options so you can examine the commands it is going to run without actually running them.
Examples
- Build the catboost CLI app for the current platform without CUDA:
  python $CATBOOST_SRC_ROOT/build/build_native.py --build-root-dir=./build_no_cuda --targets catboost
- Build the catboost CLI app for the current platform with CUDA support (the path to CUDA is taken from the CUDA_ROOT or CUDA_PATH environment variable; if neither is defined the build will fail):
  python $CATBOOST_SRC_ROOT/build/build_native.py --build-root-dir=./build_with_cuda --targets catboost --have-cuda
- Build the catboost CLI app for the current platform with CUDA support and the CUDA path specified explicitly:
  python $CATBOOST_SRC_ROOT/build/build_native.py --targets catboost --build-root-dir=./build_with_cuda_11 --have-cuda --cuda-root-dir=/usr/local/cuda-11/
- Build the C/C++ applier shared library as a macOS universal binary with the minimal macOS version set to 11.0:
  python $CATBOOST_SRC_ROOT/build/build_native.py --targets catboostmodel --build-root-dir=./build_applier --macos-universal-binaries --macosx-version-min=11.0
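Before starting a long build it can help to preview what the wrapper would do. The sketch below combines only options mentioned above (--targets with a ,-separated list, --build-root-dir, --dry-run and --verbose). The build directory name ./build_preview and the fallback value for CATBOOST_SRC_ROOT are placeholders, and the script is invoked only if it is found at the expected path.

```shell
# Placeholder fallback (assumption); point CATBOOST_SRC_ROOT at your clone.
CATBOOST_SRC_ROOT="${CATBOOST_SRC_ROOT:-$PWD/catboost}"
build_script="$CATBOOST_SRC_ROOT/build/build_native.py"

# Two targets from the targets table, joined with ',' as --targets expects.
targets="catboost,catboostmodel"

if [ -f "$build_script" ]; then
    # --dry-run with --verbose: examine the commands without running them.
    python "$build_script" --targets "$targets" \
        --build-root-dir=./build_preview --dry-run --verbose
else
    echo "build_native.py not found at $build_script" >&2
fi
```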
Build by calling cmake, conan and ninja directly
Note
For most common scenarios it is easier to run build_native.py described above.
Host platform is the same as the target platform (no cross-compilation)
- Choose some directory as a build root. Prefer short paths on Windows to avoid hitting the path length limit of 260 characters for files in this directory. This directory is referred to as $CMAKE_BINARY_DIR later.
- If you build on Linux for the aarch64 architecture, set special compilation flags (they will be used in conan package builds):
  export CFLAGS="-mno-outline-atomics"
  export CXXFLAGS="-mno-outline-atomics"
  See GitHub issue #2527 for details.
- Call cmake with $CATBOOST_SRC_ROOT as a source tree root and a build root specification: -B $CMAKE_BINARY_DIR. See CMake CLI documentation for details. Other important options and definitions for this call are described below.
- Call the build tool (depending on what generator has been specified in the cmake call above) with the build specification files generated in $CMAKE_BINARY_DIR.
  For ninja this will be:
  ninja -C $CMAKE_BINARY_DIR <target> [<target> ... ]
  See the list of possible targets (it also depends on the CATBOOST_COMPONENTS specified during the cmake call above).
- Native artifacts are generated in $CMAKE_BINARY_DIR subdirectories, see the list of possible targets for an exact output location for each target.
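Put together, the steps above might look like the following script for a Release build of the CLI app and the C/C++ applier library on the host platform. This is a sketch under assumptions: the fallback paths, the Release build type, the chosen components and the run helper with its DRY_RUN variable are all illustrative and not part of CatBoost. By default the commands are only printed; set DRY_RUN=0 to execute them.

```shell
# Fallback locations (assumptions); override them in the environment.
CATBOOST_SRC_ROOT="${CATBOOST_SRC_ROOT:-$PWD/catboost}"
CMAKE_BINARY_DIR="${CMAKE_BINARY_DIR:-$PWD/build}"

# Hypothetical helper: print a command; run it only when DRY_RUN=0,
# and stop the script at the first command that fails.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = "1" ] || "$@" || exit 1
}

# Configure. The component list is ;-delimited, so the argument is quoted.
run cmake "$CATBOOST_SRC_ROOT" \
    -B "$CMAKE_BINARY_DIR" \
    -G Ninja \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_TOOLCHAIN_FILE="$CATBOOST_SRC_ROOT/build/toolchains/clang.toolchain" \
    -DCMAKE_POSITION_INDEPENDENT_CODE=On \
    "-DCATBOOST_COMPONENTS=app;libs"

# Build one target from each of the selected components.
run ninja -C "$CMAKE_BINARY_DIR" catboost catboostmodel
```

The printed trace does not re-quote arguments, so copy the commands from the script rather than from its output if you need the ; in the component list to survive the shell.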
Host platform is different from the target platform (cross-compilation)
- Choose some directory as a build root for host platform tools (built as parts of CatBoost's CMake project). This directory is referred to as $CMAKE_NATIVE_TOOLS_BINARY_DIR later.
- Call cmake with $CATBOOST_SRC_ROOT as a source tree root and a build root specification: -B $CMAKE_NATIVE_TOOLS_BINARY_DIR. Also pass -DCATBOOST_COMPONENTS=none to disable building components of CatBoost itself - we need only tools here.
  See CMake CLI documentation for details. Other important options and definitions for this call are described below.
- Build host platform tools.
  For ninja this will be:
  ninja -C $CMAKE_NATIVE_TOOLS_BINARY_DIR archiver cpp_styleguide enum_parser flatc protoc rescompiler triecompiler
- Choose some directory as a target platform build root. This directory is referred to as $CMAKE_TARGET_PLATFORM_BINARY_DIR later.
- Call conan to install host platform tools to $CMAKE_TARGET_PLATFORM_BINARY_DIR:
  conan install -s build_type=<build-type> -if $CMAKE_TARGET_PLATFORM_BINARY_DIR --build=missing $CATBOOST_SRC_ROOT/conanfile.txt
  where build-type is either Debug or Release.
- Call conan to install target platform libraries to $CMAKE_TARGET_PLATFORM_BINARY_DIR:
  conan install -s build_type=<build-type> -if $CMAKE_TARGET_PLATFORM_BINARY_DIR --build=missing --no-imports -pr:h=<conan_host_profile> -pr:b=default $CATBOOST_SRC_ROOT/conanfile.txt
  where
  - build-type is either Debug or Release
  - conan_host_profile is a path to a Conan profile for the target platform. CatBoost provides such profiles for supported target platforms in $CATBOOST_SRC_ROOT/cmake/conan-profiles
- Call cmake with $CATBOOST_SRC_ROOT as a source tree root and a build root specification: -B $CMAKE_TARGET_PLATFORM_BINARY_DIR.
  Compared to a usual cmake call for a non cross-platform build you have to:
  - Specify a special toolchain for cross-platform building (not the usual $CATBOOST_SRC_ROOT/build/toolchains/clang.toolchain). Examples of such toolchains are available in $CATBOOST_SRC_ROOT/build/toolchains/ (with the cross-build prefix).
  - Pass -DTOOLS_ROOT=$CMAKE_NATIVE_TOOLS_BINARY_DIR to specify the path to the native platform tools to be used during the build (the ones that have been built during the first cmake and ninja calls above). Note that the path specified here has to be absolute!
  See CMake CLI documentation for details. Other important options and definitions for this call are described below.
  See also CMake cross-compilation documentation.
- Native artifacts are generated in $CMAKE_TARGET_PLATFORM_BINARY_DIR subdirectories, see the list of possible targets for an exact output location for each target.
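The cross-compilation sequence above can be sketched as a script. Everything not named in the steps is an assumption made for illustration: the directory defaults, the Release build type, the CROSS_TOOLCHAIN and CONAN_HOST_PROFILE placeholders (which you must point at a real toolchain file and Conan profile from the directories mentioned above), the choice of the catboostmodel target with the libs component, the use of the usual clang.toolchain for the host tools, and the run helper with its DRY_RUN variable. The conan install syntax is copied from the steps above. By default the script only prints the commands; set DRY_RUN=0 to execute them.

```shell
CATBOOST_SRC_ROOT="${CATBOOST_SRC_ROOT:-$PWD/catboost}"
# -DTOOLS_ROOT needs an absolute path, so the build roots are based on $PWD.
CMAKE_NATIVE_TOOLS_BINARY_DIR="${CMAKE_NATIVE_TOOLS_BINARY_DIR:-$PWD/build_native_tools}"
CMAKE_TARGET_PLATFORM_BINARY_DIR="${CMAKE_TARGET_PLATFORM_BINARY_DIR:-$PWD/build_target}"
# Placeholders (assumptions): set these to real files before running.
CROSS_TOOLCHAIN="${CROSS_TOOLCHAIN:-/path/to/cross-build-toolchain}"
CONAN_HOST_PROFILE="${CONAN_HOST_PROFILE:-/path/to/conan-profile}"
BUILD_TYPE=Release
NATIVE_TOOLS="archiver cpp_styleguide enum_parser flatc protoc rescompiler triecompiler"

# Hypothetical helper: print a command; run it only when DRY_RUN=0,
# and stop the script at the first command that fails.
run() {
    echo "+ $*"
    [ "${DRY_RUN:-1}" = "1" ] || "$@" || exit 1
}

# Configure and build the host platform tools only (no CatBoost components).
run cmake "$CATBOOST_SRC_ROOT" -B "$CMAKE_NATIVE_TOOLS_BINARY_DIR" -G Ninja \
    -DCMAKE_BUILD_TYPE="$BUILD_TYPE" \
    -DCMAKE_TOOLCHAIN_FILE="$CATBOOST_SRC_ROOT/build/toolchains/clang.toolchain" \
    -DCATBOOST_COMPONENTS=none
# $NATIVE_TOOLS is left unquoted on purpose so each tool is a separate argument.
run ninja -C "$CMAKE_NATIVE_TOOLS_BINARY_DIR" $NATIVE_TOOLS

# Install host platform tools, then target platform libraries, with conan.
run conan install -s build_type="$BUILD_TYPE" -if "$CMAKE_TARGET_PLATFORM_BINARY_DIR" \
    --build=missing "$CATBOOST_SRC_ROOT/conanfile.txt"
run conan install -s build_type="$BUILD_TYPE" -if "$CMAKE_TARGET_PLATFORM_BINARY_DIR" \
    --build=missing --no-imports -pr:h="$CONAN_HOST_PROFILE" -pr:b=default \
    "$CATBOOST_SRC_ROOT/conanfile.txt"

# Configure for the target platform with the cross toolchain and TOOLS_ROOT.
run cmake "$CATBOOST_SRC_ROOT" -B "$CMAKE_TARGET_PLATFORM_BINARY_DIR" -G Ninja \
    -DCMAKE_BUILD_TYPE="$BUILD_TYPE" \
    -DCMAKE_TOOLCHAIN_FILE="$CROSS_TOOLCHAIN" \
    -DTOOLS_ROOT="$CMAKE_NATIVE_TOOLS_BINARY_DIR" \
    -DCMAKE_POSITION_INDEPENDENT_CODE=On \
    -DCATBOOST_COMPONENTS=libs

# Build the chosen target for the target platform.
run ninja -C "$CMAKE_TARGET_PLATFORM_BINARY_DIR" catboostmodel
```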
CMake - important options and definitions
- -B <path-to-build> - path to the directory which CMake will use as the root of the build directory.
- -G <generator-name> - generator name. The recommended generator is "Ninja".
  Alternatively, on Windows you could also use Visual Studio generators for CMake.
  - For builds with CUDA use the Visual Studio 16 2019 or Visual Studio 17 2022 generator and also specify the required toolset version when calling CMake by adding -T version=<toolset_version> to the command line. Currently supported toolset versions are 14.28 and 14.29.
  - For builds without CUDA use the Visual Studio 17 2022 generator and also specify the required ClangCL toolset when calling CMake by adding -T ClangCL to the command line.
  Using the Unix Makefiles CMake generator on macOS and Linux is possible but not recommended because of some issues with properly taking dependencies into account.
- -DCMAKE_BUILD_TYPE=<build-type> - build type. Use one of Debug, Release, RelWithDebInfo and MinSizeRel.
  Note that multiconfig generators are not properly supported right now. Even if you select Visual Studio as a CMake generator, choosing between configurations in the generated solution won't switch all the necessary options; this option will take precedence.
- -DCMAKE_TOOLCHAIN_FILE=<path> - pass a toolchain to CMake. On Linux, CMake's default configuration will most likely select gcc as a C and C++ compiler, but CatBoost needs to be built with either clang (on Linux or macOS) or Microsoft's cl compiler on Windows.
  So it is recommended to pass a toolchain that sets clang and clang++ as C and C++ compilers on Linux and macOS and also sets clang as a compiler for host code for CUDA (applicable only on Linux). The default toolchain that does that is $CATBOOST_SRC_ROOT/build/toolchains/clang.toolchain.
  CatBoost requires Clang 12+ to build. If the default Clang version available from the command line is lower than that, use a modified toolchain where all occurrences of clang and clang++ are replaced with clang-$CLANG_VERSION and clang++-$CLANG_VERSION respectively, where $CLANG_VERSION is the version of clang you want to use, for example 12 or 14 (it must already be installed).
- -DCMAKE_POSITION_INDEPENDENT_CODE=<On|Off> - Turn on or off Position-independent code generation. Required for building shared libraries. Off by default.
- -DCATBOOST_COMPONENTS=<components-list> - As CatBoost's CMake project contains many different CMake targets for different components that have their specific configuration dependencies, it is often useful to restrict the build configuration if you need only a subset of them (e.g. if you only need to build the CLI app you don't want to set up the JDK that is required for building components with JVM API). The list is ;-delimited so it might require shell escaping or quoting.
  See Targets for components needed to build a certain target.
  The none value is also possible and is used during cross-compilation when building host platform tools only.
- -DCMAKE_OSX_DEPLOYMENT_TARGET=<min_macos_version> - Specify the minimum version of macOS on which the target binaries are to be deployed. Relevant only for macOS.
- -DHAVE_CUDA=<yes|no> - Turn CUDA support on or off. No (off) by default.
- -DCUDAToolkit_ROOT=<path> - Specify the path to the CUDA installation directory. Useful if a specific CUDA version has to be selected from several versions installed or if CUDA has been installed to a non-standard path so CMake is unable to find it automatically.
- -DCMAKE_CUDA_RUNTIME_LIBRARY=<None|Shared|Static> - Select the CUDA runtime library for use when compiling and linking CUDA.
- -DJAVA_HOME=<path> - path of the JDK installation to be used during the build. Relevant only to build components with JVM API (JVM applier and CatBoost for Apache Spark). The JAVA_HOME environment variable will be used by default.
- -DJAVA_AWT_LIBRARY=<path> - The path to the Java AWT Native Interface (JAWT) library. Use if specifying JAVA_HOME has not been sufficient (if CMake default search logic has been unable to find this library inside the JDK).
- -DJAVA_JVM_LIBRARY=<path> - The path to the Java Virtual Machine (JVM) library. Use if specifying JAVA_HOME has not been sufficient (if CMake default search logic has been unable to find this library inside the JDK).
- -DPython3_ROOT_DIR=<path> - path to the Python installation to build the extension for (not necessarily the same as the current Python interpreter used by default in the command line). Must contain Python development artifacts (Python headers in an include directory and the Python library for building modules).
- -DPython3_LIBRARY=<path> - The path to the Python library for building modules. Use if specifying Python3_ROOT_DIR has not been sufficient (if CMake default search logic has been unable to find this library inside Python3_ROOT_DIR).
- -DPython3_INCLUDE_DIR=<path> - The path to the directory of the Python headers. Use if specifying Python3_ROOT_DIR has not been sufficient (if CMake default search logic has been unable to find this directory inside Python3_ROOT_DIR).
- -DCMAKE_FIND_ROOT_PATH=<path>[;<path>] - Semicolon-separated list of root paths to search on the filesystem. This variable is most useful when cross-compiling.
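The versioned-toolchain advice under -DCMAKE_TOOLCHAIN_FILE above can be carried out with a single sed substitution. The sketch below is an illustration under assumptions: the two sample lines merely stand in for the contents of clang.toolchain (the real file may differ), the function name versioned_toolchain is hypothetical, and version 14 is just an example default. One pattern with an optional "++" group handles both compiler names in a single pass, so clang++ is not rewritten twice.

```shell
CLANG_VERSION="${CLANG_VERSION:-14}"

# Hypothetical helper: append -$CLANG_VERSION to every clang and clang++ on stdin.
# A single pass avoids turning clang++ into clang-14++-14.
versioned_toolchain() {
    sed -E "s/clang(\+\+)?/clang\1-${CLANG_VERSION}/g"
}

# Sample input standing in for the toolchain file contents (assumed, not copied).
printf '%s\n' \
    'set(CMAKE_C_COMPILER clang)' \
    'set(CMAKE_CXX_COMPILER clang++)' | versioned_toolchain
```

For real use you would feed the default toolchain file through the function, write the result to a new file of your choosing, and pass that file via -DCMAKE_TOOLCHAIN_FILE.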