Build the binary with MPI support from a local copy (GPU only)
This approach will work only for versions prior to this commit.
CatBoost provides a beta version of multi-node GPU training. Only the feature-parallel learning scheme is currently supported, so only datasets with many features benefit from multi-host multi-GPU support.
Training on GPU requires an NVIDIA driver of version 418.xx or higher.
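As a quick sanity check (assuming the `nvidia-smi` utility shipped with the driver is available on each host), the installed driver version can be queried directly:

```shell
# Print the installed NVIDIA driver version on this host.
# Run on every machine that will participate in training.
nvidia-smi --query-gpu=driver_version --format=csv,noheader
```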
To build the command-line package with MPI support from a local copy of the CatBoost repository:
Install an MPI library (for example, Open MPI) on all machines that will participate in the training.
Refer to the MPI library documentation for guides on running MPI jobs on multiple hosts.
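For illustration, with Open MPI a multi-host job is typically launched with a hostfile; the hostnames and slot counts below are placeholders, and other MPI implementations use different flags:

```shell
# hosts.txt — one line per machine; "slots" is the number of MPI
# processes to start on that host (hypothetical hostnames):
#   node1 slots=1
#   node2 slots=1

# Launch one process per host listed in hosts.txt; `hostname` is
# used here only to verify that the multi-host setup works.
mpirun --hostfile hosts.txt -np 2 hostname
```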
Add the following environment variables:
```shell
export CPLUS_INCLUDE_PATH=$PATH_TO_MPI/include:$CPLUS_INCLUDE_PATH
export LIBRARY_PATH=$PATH_TO_MPI/lib:$LIBRARY_PATH
export LD_LIBRARY_PATH=$PATH_TO_MPI/lib:$LD_LIBRARY_PATH
```
Clone the repository:
```shell
git clone https://github.com/catboost/catboost.git
```
Open the catboost/catboost/app directory from the local copy of the CatBoost repository.
Build the command-line version of CatBoost with MPI support:
```shell
ya make -r -DUSE_MPI -DCUDA_ROOT=<path to CUDA> [-DWITHOUT_CUDA_AWARE_MPI]
```
-DWITHOUT_CUDA_AWARE_MPI is required if the installed MPI library does not support CUDA-aware MPI with multiple GPUs for each MPI process on a host.
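Once built, the binary is launched under mpirun like any other MPI program. The following is a sketch only: the hostfile name, dataset paths, and process count are placeholders, and the exact `catboost fit` options depend on your task:

```shell
# Start one CatBoost process per participating host (hypothetical
# hostfile and dataset paths; adjust options for your task).
mpirun --hostfile hosts.txt -np 2 \
    ./catboost fit \
        --learn-set train.tsv \
        --column-description train.cd \
        --task-type GPU
```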
- CatBoost requires one working thread per GPU, one thread for routing network messages, and several threads for preprocessing data. Use an appropriate CPU binding policy. For optimal performance, it is recommended to bind the application to the full host or to the PCI root complex used by the CUDA devices. See the MPI documentation for more details.
- A fast network connection (such as InfiniBand) is required to benefit from multi-node support. Slow networks with a capacity of 1 Gb/s or less can negate the advantages of multi-node training.
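With Open MPI, the binding policy mentioned above is controlled via the `--bind-to` (and, if needed, `--map-by`) options. The example below is a sketch, assuming an Open MPI installation and a hypothetical hostfile; consult your MPI implementation's documentation for the equivalent flags:

```shell
# Bind each MPI process to a NUMA domain so its preprocessing and
# routing threads stay close to the memory and PCI root complex of
# the GPUs it drives (placeholder hostfile and options).
mpirun --hostfile hosts.txt -np 2 --bind-to numa ./catboost fit <options>
```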