CatBoost for Apache Spark Documentation

Main features

  • Support for both numerical and categorical features (categorical features can be encoded as one-hot or as CTRs).
  • Reproducible training results.
  • Model interoperability with local CatBoost implementations.
  • Distributed feature evaluation (including SHAP values).
  • Spark MLlib-compatible APIs for JVM languages (Java, Scala, Kotlin, etc.) and PySpark.
  • Support for a wide range of Apache Spark versions: 2.3 to 3.2.
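
The supported version range above can be checked before submitting a job. The snippet below is a minimal sketch, not part of the CatBoost for Apache Spark API: the helper names (parse_major_minor, is_supported_spark_version) are hypothetical, and it assumes the range 2.3 to 3.2 is inclusive and covers every patch release within it.

```python
from typing import Tuple

# Bounds taken from the feature list above; treating them as inclusive
# (major, minor) pairs is an assumption of this sketch.
MIN_SUPPORTED = (2, 3)
MAX_SUPPORTED = (3, 2)


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '3.1.2'."""
    parts = version.strip().split(".")
    if len(parts) < 2:
        raise ValueError("expected at least 'major.minor', got %r" % version)
    return int(parts[0]), int(parts[1])


def is_supported_spark_version(version: str) -> bool:
    """Return True if the major.minor pair lies within the listed range."""
    return MIN_SUPPORTED <= parse_major_minor(version) <= MAX_SUPPORTED


if __name__ == "__main__":
    for candidate in ["2.2.3", "2.4.8", "3.2.1", "3.3.0"]:
        print(candidate, is_supported_spark_version(candidate))
```

In a PySpark session, the version string could be taken from the session's version property and passed to the hypothetical helper to fail fast on an unsupported cluster.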

CatBoost for Apache Spark installation

Quick start for Scala and Python

Spark cluster configuration

API documentation

Known limitations