Build the binary with MPI support from a local source repository (GPU only)
CatBoost provides a beta version of multi-node GPU training. Only the feature parallel
learning scheme is currently supported, so only datasets with a large number of features benefit from multi-host multi-GPU support.
Build
To build the command-line package with MPI support from a local copy of the CatBoost repository:
Warning
CatBoost has used a CMake-based build process since this commit. Previously, Ya Make
(Yandex's build system) was used.
Select the appropriate build method below accordingly.
Requirements
- Common build environment setup for Ya Make or CMake.
- An MPI implementation library (for example, OpenMPI) must be installed; a sample installation command is shown below.
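For example, on a Debian or Ubuntu host the OpenMPI development packages can be installed as shown below. The package names are distribution-specific assumptions; any MPI implementation that ships development headers and libraries will do.
sudo apt-get update
sudo apt-get install -y libopenmpi-dev openmpi-bin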
Source code
CatBoost source code is stored as a Git repository on GitHub at https://github.com/catboost/catboost/. You can obtain a local copy of this Git repository by running the following command from a command line interpreter (you need to have Git command line tools installed):
git clone https://github.com/catboost/catboost.git
Build using CMake
Build the catboost target with CUDA support enabled and with the -DUSE_MPI=1 flag passed to cmake (possibly through build_native.py). See Build native artifacts.
One more flag can be used:
-DWITHOUT_CUDA_AWARE_MPI=1 is required if the installed MPI library does not support CUDA-aware MPI with multiple GPUs for each MPI process on a host. A sketch of a direct CMake invocation is shown below.
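The following is a minimal sketch of a direct CMake invocation, assuming the common CMake build environment from Build native artifacts is already set up. The toolchain, generator, and CUDA-enabling arguments described there are omitted here and take precedence over this example; only -DUSE_MPI=1 and -DWITHOUT_CUDA_AWARE_MPI=1 come from this page.
# Run from the directory that contains the cloned catboost repository.
mkdir build-mpi && cd build-mpi
# Insert the toolchain and CUDA-related arguments from "Build native artifacts" here.
cmake ../catboost -DCMAKE_BUILD_TYPE=Release -DUSE_MPI=1 -DWITHOUT_CUDA_AWARE_MPI=1
cmake --build . --target catboost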
Build using Ya Make
- Add the following environment variables:
  export CPLUS_INCLUDE_PATH=$PATH_TO_MPI/include:$CPLUS_INCLUDE_PATH
  export LIBRARY_PATH=$PATH_TO_MPI/lib:$LIBRARY_PATH
  export LD_LIBRARY_PATH=$PATH_TO_MPI/lib:$LD_LIBRARY_PATH
- Open the catboost/catboost/app directory from the local copy of the CatBoost repository.
- Build the command-line version of CatBoost with MPI support:
  ya make -r -DUSE_MPI -DCUDA_ROOT=<path to CUDA> [-DWITHOUT_CUDA_AWARE_MPI]
  -DWITHOUT_CUDA_AWARE_MPI is required if the installed MPI library does not support CUDA-aware MPI with multiple GPUs for each MPI process on a host.
Usage
Training or inference on CUDA-enabled GPUs requires NVIDIA driver version 450.80.02 or higher.
Refer to the MPI library documentation for guides on running MPI jobs on multiple hosts.
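As an illustration only, a two-host launch with OpenMPI might look like the following sketch. The host names, hostfile contents, and training options are placeholders rather than requirements from this page; consult your MPI implementation's documentation and the CatBoost command-line reference for the exact invocation.
# hosts.txt lists the participating machines, one MPI process per host:
#   node1 slots=1
#   node2 slots=1
mpirun -np 2 --hostfile hosts.txt \
    ./catboost fit \
    --task-type GPU \
    --learn-set train.tsv \
    --column-description train.cd \
    --iterations 1000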
Note
Performance notes:
- CatBoost requires one working thread per GPU, one thread for routing network messages, and several threads for data preprocessing. Use an appropriate CPU binding policy; for optimal performance, it is recommended to bind the application to the full host or to the PCI root complex used by the CUDA devices (a sample binding setup is sketched after this list). See the MPI documentation for more details.
- A fast network connection (such as InfiniBand) is required to benefit from multi-node support. Slow networks with a capacity of 1 Gb/s or less can negate the advantages of multi-node training.
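For example, with OpenMPI the binding policy can be set on the mpirun command line. The sketch below assumes one MPI process per host, left unbound so that it can use all cores of that host; the option names are OpenMPI-specific, and other MPI implementations use different flags.
# One MPI process per host, unbound so it may use the full host.
mpirun -np 2 --hostfile hosts.txt --map-by ppr:1:node --bind-to none \
    ./catboost fit --task-type GPU --learn-set train.tsv --column-description train.cd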