CatBoost for Apache Spark Documentation
Main features
- Support for both numerical and categorical features (categorical features can be handled as one-hot encodings or as CTRs).
- Reproducible training results.
- Model interoperability with local CatBoost implementations.
- Distributed feature evaluation (including SHAP values).
- Spark MLlib compatible APIs for JVM languages (Java, Scala, Kotlin, etc.) and PySpark.
- Support for Apache Spark versions 3.0 to 3.5.
Previous versions
CatBoost versions before 1.2.8 also supported Apache Spark versions 2.3 to 2.4.
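The version ranges above can be turned into a quick pre-flight check. The following is a minimal sketch, not part of the CatBoost API: `is_spark_supported` and `_parse` are hypothetical helper names, and the code assumes plain dotted numeric version strings such as `3.5.1` or `1.2.7`. It also reads "as well" literally, assuming that releases before 1.2.8 cover the 3.0 to 3.5 range in addition to 2.3 to 2.4; an individual older release may not support every newer Spark version, so check the release notes for the exact release you use.

```python
def _parse(version, parts):
    """Return the first `parts` numeric components of a dotted version string."""
    return tuple(int(p) for p in version.split(".")[:parts])


def is_spark_supported(spark_version, catboost_version):
    """Check a Spark/CatBoost version pair against the ranges stated above.

    Hypothetical helper for illustration only; it is not provided by CatBoost.
    """
    spark = _parse(spark_version, 2)        # compare on major.minor only
    catboost = _parse(catboost_version, 3)  # compare on major.minor.patch

    # Spark 3.0 to 3.5 is the range listed under "Main features".
    if (3, 0) <= spark <= (3, 5):
        return True

    # Spark 2.3 to 2.4 is listed only for CatBoost versions before 1.2.8.
    if catboost < (1, 2, 8) and (2, 3) <= spark <= (2, 4):
        return True

    return False
```

For example, `is_spark_supported("2.4.8", "1.2.8")` returns `False`, while the same Spark version paired with CatBoost `1.2.7` returns `True`.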
CatBoost for Apache Spark installation
Quick start for Scala and Python
Spark cluster configuration
API documentation
Known limitations