CatBoost for Apache Spark Documentation
Main features
- Support for both numerical and categorical features (categorical features are handled via one-hot encoding and CTRs).
- Reproducible training results.
- Model interoperability with local CatBoost implementations.
- Distributed feature evaluation (including SHAP values).
- Spark MLlib-compatible APIs for JVM languages (Java, Scala, Kotlin, etc.) and PySpark.
- Support for a wide range of Apache Spark versions: 2.3 to 3.5.