Build the binary with MPI support from a local source repository (GPU only)
CatBoost provides a beta version of multi-node GPU training. Only the feature parallel
learning scheme is currently supported, so only datasets with a large number of features benefit from multi-host multi-GPU support.
Build
To build the command-line package with MPI support from a local copy of the CatBoost repository:
Warning
CatBoost has used a CMake-based build process since this commit. Previously, Ya Make
(Yandex's build system) was used.
Select the appropriate build method below accordingly.
Requirements
- Common build environment setup for Ya Make or CMake.
- An MPI implementation library (for example, OpenMPI) must be installed; a sample installation command is shown below.
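For example, on a Debian or Ubuntu host the OpenMPI development packages can be installed as shown below. The package names are distribution-specific assumptions; any MPI implementation that ships development headers and libraries will do.
sudo apt-get update
sudo apt-get install -y libopenmpi-dev openmpi-bin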
Source code
CatBoost source code is stored as a Git repository on GitHub at https://github.com/catboost/catboost/. You can obtain a local copy of this Git repository by running the following command from a command line interpreter (you need to have Git command line tools installed):
git clone https://github.com/catboost/catboost.git
Build using CMake
Build the catboost target with CUDA support enabled and with the -DUSE_MPI=1 flag passed to cmake (possibly through build_native.py). See Build native artifacts.
One more flag can be used:
-DWITHOUT_CUDA_AWARE_MPI=1 is required if the installed MPI library does not support CUDA-aware MPI with multiple GPUs for each MPI process on a host. A sketch of a direct CMake invocation is shown below.
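The following is a minimal sketch of a direct CMake invocation, assuming the common CMake build environment from Build native artifacts is already set up. The toolchain, generator, and CUDA-enabling arguments described there are omitted here and take precedence over this example; only -DUSE_MPI=1 and -DWITHOUT_CUDA_AWARE_MPI=1 come from this page.
# Run from the directory that contains the cloned catboost repository.
mkdir build-mpi && cd build-mpi
# Insert the toolchain and CUDA-related arguments from "Build native artifacts" here.
cmake ../catboost -DCMAKE_BUILD_TYPE=Release -DUSE_MPI=1 -DWITHOUT_CUDA_AWARE_MPI=1
cmake --build . --target catboost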
Build using Ya Make
- Add the following environment variables:
  export CPLUS_INCLUDE_PATH=$PATH_TO_MPI/include:$CPLUS_INCLUDE_PATH
  export LIBRARY_PATH=$PATH_TO_MPI/lib:$LIBRARY_PATH
  export LD_LIBRARY_PATH=$PATH_TO_MPI/lib:$LD_LIBRARY_PATH
- Open the catboost/catboost/app directory from the local copy of the CatBoost repository.
- Build the command-line version of CatBoost with MPI support:
  ya make -r -DUSE_MPI -DCUDA_ROOT=<path to CUDA> [-DWITHOUT_CUDA_AWARE_MPI]
  -DWITHOUT_CUDA_AWARE_MPI is required if the installed MPI library does not support CUDA-aware MPI with multiple GPUs for each MPI process on a host.
Usage
Training or inference on CUDA-enabled GPUs requires NVIDIA driver version 450.80.02 or higher.
Refer to the MPI library documentation for guides on running MPI jobs on multiple hosts.
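As an illustration only, a two-host launch with OpenMPI might look like the following sketch. The host names, hostfile contents, and training options are placeholders rather than requirements from this page; consult your MPI implementation's documentation and the CatBoost command-line reference for the exact invocation.
# hosts.txt lists the participating machines, one MPI process per host:
#   node1 slots=1
#   node2 slots=1
mpirun -np 2 --hostfile hosts.txt \
    ./catboost fit \
    --task-type GPU \
    --learn-set train.tsv \
    --column-description train.cd \
    --iterations 1000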
Note
Performance notes:
- CatBoost requires one working thread per GPU, one thread for routing network messages, and several threads for data preprocessing. Use an appropriate CPU binding policy; for optimal performance, it is recommended to bind the application to the full host or to the PCI root complex used by the CUDA devices (a sample binding setup is sketched after this list). See the MPI documentation for more details.
- A fast network connection (such as InfiniBand) is required to benefit from multi-node support. Slow networks with a capacity of 1 Gb/s or less can negate the advantages of multi-node training.
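For example, with OpenMPI the binding policy can be set on the mpirun command line. The sketch below assumes one MPI process per host, left unbound so that it can use all cores of that host; the option names are OpenMPI-specific, and other MPI implementations use different flags.
# One MPI process per host, unbound so it may use the full host.
mpirun -np 2 --hostfile hosts.txt --map-by ppr:1:node --bind-to none \
    ./catboost fit --task-type GPU --learn-set train.tsv --column-description train.cd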