CatBoost for Apache Spark Documentation

Main features

  • Support for both numerical and categorical features (categorical features can be encoded as one-hot or as CTRs).

  • Reproducible training results.

  • Model interoperability with local CatBoost implementations.

  • Distributed feature evaluation (including SHAP values).

  • Spark MLlib-compatible APIs for JVM languages (Java, Scala, Kotlin, etc.) and PySpark.

  • Support for a wide range of Apache Spark versions: 3.0 to 3.5.

    Previous versions

    CatBoost versions before 1.2.8 also supported Apache Spark versions 2.3 to 2.4.
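
The supported version range listed above can be checked before a job is submitted. The following is a minimal sketch, not part of the CatBoost API: the helper name `is_supported_spark_version` and the constants are hypothetical, and the range is taken directly from the feature list (3.0 to 3.5, inclusive).

```python
import re

# Range stated in the feature list: Apache Spark 3.0 to 3.5 (inclusive).
# These constants are illustrative and are not defined by CatBoost itself.
MIN_SPARK = (3, 0)
MAX_SPARK = (3, 5)


def is_supported_spark_version(version: str) -> bool:
    """Return True if the major.minor part of `version` lies in the range above.

    Hypothetical helper: it only compares version numbers and does not
    inspect any installed package.
    """
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized Spark version string: {version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    # Tuples compare element by element, so (3, 4) falls between (3, 0) and (3, 5).
    return MIN_SPARK <= major_minor <= MAX_SPARK
```

In a PySpark session the version string would typically come from `spark.version`, so a guard such as `is_supported_spark_version(spark.version)` could run at the start of a training script.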

CatBoost for Apache Spark installation

Quick start for Scala and Python

Spark cluster configuration

API documentation

Known limitations