Development and contributions

Build from source

Run tests

Warning

CatBoost has used a CMake-based build process since this commit. Previously, Ya Make (Yandex's build system) was used.

CMake-based build tests

  • C/C++ libraries.

    C/C++ libraries keep their tests in ut subdirectories of the source tree. For a library in x/y/z, the corresponding test code is in x/y/z/ut and the target name is x-y-z-ut.
    To run the tests, run CMake and then build the corresponding x-y-z-ut target. Building this target produces the executable ${CMAKE_BUILD_DIR}/x/y/z/x-y-z-ut. Run this executable to execute all the tests.
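
As a minimal sketch of the naming rule above, the following Python helper derives the test target name and the executable path from a library's source directory. The function name ut_target and the default build directory "build" are illustrative assumptions, not part of CatBoost; the executable location simply follows the path pattern stated above.

```python
from pathlib import PurePosixPath


def ut_target(library_dir: str, build_dir: str = "build") -> tuple[str, str]:
    """Return (target name, executable path) for a library's unit tests.

    Hypothetical helper: it only applies the x/y/z -> x-y-z-ut rule
    described in the text and does not inspect the source tree.
    """
    path = PurePosixPath(library_dir)
    if not path.parts or path.is_absolute():
        raise ValueError("expected a non-empty path relative to the repository root")
    # Join the path components with dashes and append the "-ut" suffix.
    target = "-".join(path.parts) + "-ut"
    # The executable is placed under the library's directory in the build tree.
    executable = PurePosixPath(build_dir, *path.parts, target)
    return target, str(executable)
```

The returned target name can be passed to the build tool, and the returned path is the executable to run afterwards.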

  • R package

    1. Install additional R packages that are required to run tests:

      • caret
      • dplyr
      • jsonlite
      • testthat
    2. Open the R-package directory from the local copy of the CatBoost repository.

    3. Run the following command:

      R CMD check .
      

    To run tests using the devtools package:

    1. Install devtools.

    2. Run the following command from the R session:

      devtools::test()
      
  • CLI

    1. Install the testpath, pytest, pandas and catboost packages for the Python interpreter you intend to use (catboost is needed for reading column description files with catboost.utils.read_cd).
      Optionally install pytest-xdist and pytest-randomly to run tests in parallel, which is faster.
    2. Build the CLI binary (target catboost for Ninja or another build tool) and the supplementary tool that compares the test output with the canonical results (target limited_precision_dsv_diff for Ninja or another build tool).
    3. Set the following environment variables:
      • CMAKE_SOURCE_DIR to the root of the local copy of the CatBoost repository.
      • CMAKE_BINARY_DIR to the root of the build directory that has been generated by CMake and where the aforementioned targets have been built.
      • TEST_OUTPUT_DIR to the root of the directory where temporary test data will be generated.
      • PORT_SYNC_PATH to the path of the directory that will be used for synchronizing network port allocation. The directory will be created if it does not exist.
      • HAVE_CUDA to 1 if you want to run tests on GPU with CUDA, or to 0 otherwise.
    4. Open the catboost/pytest directory from the local copy of the CatBoost repository.
    5. Run python -m pytest or (if you use pytest-xdist) python -m pytest -n <parallel_worker_count> or python -m pytest -n auto (in the auto case the number of parallel workers will be equal to the total count of detected CPU cores).
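
The CLI steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names cli_test_env and pytest_command are hypothetical, and all directory values are placeholders to be replaced with your own paths. The variable names and the pytest options are the ones listed above.

```python
import os
import sys


def cli_test_env(source_dir, binary_dir, output_dir, port_sync_dir,
                 have_cuda=False, base_env=None):
    """Return a copy of the environment with the CLI test variables set.

    Hypothetical helper; base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "CMAKE_SOURCE_DIR": str(source_dir),
        "CMAKE_BINARY_DIR": str(binary_dir),
        "TEST_OUTPUT_DIR": str(output_dir),
        "PORT_SYNC_PATH": str(port_sync_dir),
        "HAVE_CUDA": "1" if have_cuda else "0",
    })
    return env


def pytest_command(workers=None):
    """Build the pytest command line.

    workers may be None (serial run), an integer, or "auto";
    any value other than None requires pytest-xdist.
    """
    cmd = [sys.executable, "-m", "pytest"]
    if workers is not None:
        cmd += ["-n", str(workers)]
    return cmd
```

To start the run, both results can be passed to subprocess.run with cwd set to the catboost/pytest directory of the local copy of the repository.
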
  • Python package

    The tests check the catboost module installed for the Python interpreter you run them with, so to test a catboost Python package built from source, build and install it first.

    1. Install the testpath, pytest, pandas, ipywidgets and scikit-learn packages for the Python interpreter you intend to use.
      Optionally install pytest-xdist and pytest-randomly to run tests in parallel, which is faster.
    2. Build the supplementary tools that compare the test output with the canonical results (targets limited_precision_dsv_diff, limited_precision_json_diff, model_comparator for Ninja or another build tool).
    3. Set the following environment variables:
      • CMAKE_SOURCE_DIR to the root of the local copy of the CatBoost repository.
      • CMAKE_BINARY_DIR to the root of the build directory that has been generated by CMake and where the aforementioned targets have been built.
      • TEST_OUTPUT_DIR to the root of the directory where temporary test data will be generated.
      • PORT_SYNC_PATH to the path of the directory that will be used for synchronizing network port allocation. The directory will be created if it does not exist.
    4. Open the catboost/python-package/ut/medium directory from the local copy of the CatBoost repository.
    5. Run python -m pytest or (if you use pytest-xdist) python -m pytest -n <parallel_worker_count> or python -m pytest -n auto (in the auto case the number of parallel workers will be equal to the total count of detected CPU cores).
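
Before launching the Python package tests, it can help to confirm that the variables from step 3 are actually set. The following is a small sketch of such a check; the names REQUIRED_VARS and missing_test_vars are hypothetical and exist only for this illustration.

```python
import os

# The four variables listed in step 3 above.
REQUIRED_VARS = (
    "CMAKE_SOURCE_DIR",
    "CMAKE_BINARY_DIR",
    "TEST_OUTPUT_DIR",
    "PORT_SYNC_PATH",
)


def missing_test_vars(env=None):
    """Return the required variable names that are unset or empty.

    env defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

An empty result means all four variables are present; otherwise the returned names show what still needs to be exported before running pytest.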

    Warning

    Tests on GPU with CUDA are run if and only if a GPU with CUDA drivers installed is present.

  • JVM applier

    Open the catboost/jvm-packages/catboost4j-prediction directory from the local copy of the CatBoost repository. Run the standard mvn test command.
    To run tests on GPU as well, add the -DtestOnGPU=1 command-line flag.
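
For scripted runs, the Maven invocation described above can be assembled as in the sketch below. The helper name mvn_test_command is hypothetical; the command and the flag are the ones named in the text.

```python
def mvn_test_command(test_on_gpu=False):
    """Build the Maven test command, optionally enabling GPU tests."""
    cmd = ["mvn", "test"]
    if test_on_gpu:
        # Flag named in the text above for running GPU tests as well.
        cmd.append("-DtestOnGPU=1")
    return cmd
```

The resulting list can be passed to subprocess.run with cwd set to the catboost/jvm-packages/catboost4j-prediction directory.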

  • CatBoost for Apache Spark

    See building CatBoost for Apache Spark from source. Use the standard mvn test command.

YaMake-based build tests

Warning

The following documentation describes running tests using Ya Make, which applies only to versions prior to this commit.

CatBoost provides tests that check the compliance of the canonical data with the resulting data.

The required steps for running these tests depend on the implementation.

For the CLI:

  1. Execute common tests:

    1. Open the catboost/pytest directory from the local copy of the CatBoost repository.

    2. Run the following command:

    ../../ya make -t -A [-Z]
    

    -Z — Optional key to replace the canonical files if the code breaks tests intentionally.

  2. Execute tests for the GPU implementation:

    1. Open the catboost/pytest/cuda_tests directory from the local copy of the CatBoost repository.

    2. Run the following command:

    ../../../ya make -DCUDA_ROOT=<path_to_CUDA_SDK> -t -A [-Z]
    
    • path_to_CUDA_SDK is the path to the directory where the CUDA SDK is installed. For example, the typical installation directory for Linux is /usr/local/cuda-X.Y, where X.Y is the installed CUDA SDK version.

    • -Z — Optional key to replace the canonical files if the code breaks tests intentionally.

Use the VCS diff tool to analyze the differences.

For the Python package:

  1. Execute common tests:

    1. Open the catboost/python-package/ut/medium directory from the local copy of the CatBoost repository.

    2. Run the following command:

    ../../../../ya make -t -A [-Z]
    

    -Z — Optional key to replace the canonical files if the code breaks tests intentionally.

  2. Execute tests for the GPU implementation:

    1. Open the catboost/python-package/ut/medium/gpu directory from the local copy of the CatBoost repository.

    2. Run the following command:

    ../../../../../ya make -DCUDA_ROOT=<path_to_CUDA_SDK> -t -A [-Z]
    
    • path_to_CUDA_SDK is the path to the directory where the CUDA SDK is installed. For example, the typical installation directory for Linux is /usr/local/cuda-X.Y, where X.Y is the installed CUDA SDK version.

    • -Z — Optional key to replace the canonical files if the code breaks tests intentionally.

Use the VCS diff tool to analyze the differences.
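
The ya make commands above differ only in the relative path to the ya script, the optional CUDA_ROOT definition, and the optional -Z key. The sketch below assembles them from a test directory given relative to the repository root. It assumes the ya script sits in the repository root, as the relative paths above imply; the function name ya_make_test_command and its parameters are hypothetical.

```python
import posixpath


def ya_make_test_command(test_dir, cuda_root=None, canonize=False):
    """Build the ya make test command as it would be run from test_dir.

    test_dir is a path relative to the repository root
    (assumed location of the ya script).
    """
    # One "../" per directory level between test_dir and the repository root.
    ya = posixpath.relpath("ya", start=test_dir)
    cmd = [ya, "make"]
    if cuda_root is not None:
        cmd.append(f"-DCUDA_ROOT={cuda_root}")
    cmd += ["-t", "-A"]
    if canonize:
        # -Z replaces the canonical files; use only for intentional changes.
        cmd.append("-Z")
    return cmd
```

For example, a test directory of catboost/pytest yields a command that starts with ../../ya, matching the invocation shown for the common tests.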

For the R package:

  1. Install additional R packages that are required to run tests:

    • caret
    • dplyr
    • jsonlite
    • testthat
  2. Open the R-package directory from the local copy of the CatBoost repository.

  3. Run the following command:

    R CMD check .
    

To run tests using the devtools package:

  1. Install devtools.

  2. Run the following command from the R session:

    devtools::test()
    

Microsoft Visual Studio solution

Warning

A ready-made Microsoft Visual Studio solution was provided until this commit.

For versions after this commit, it is recommended to generate a Microsoft Visual Studio 2019 solution using the corresponding CMake generator.

A solution for Visual Studio is available in the CatBoost repository:

catboost/msvs/arcadia.sln

Coding conventions

The following coding conventions must be followed in order to successfully contribute to the CatBoost project:

Versioning conventions

Do not change the package version when submitting pull requests. Yandex uses an internal repository for this purpose.

Yandex Contributor License Agreement

To contribute to CatBoost you need to read the Yandex CLA and indicate that you agree to its terms. Details of how to do that and the text of the CLA can be found in CONTRIBUTING.md.
