For PySpark
Get the appropriate catboost_spark_version (see the available versions at Maven Central). Choose the appropriate spark_compat_version (2.3, 2.4, 3.0, 3.1, 3.2, 3.3, 3.4 or 3.5) and scala_compat_version (2.11, 2.12 or 2.13, corresponding to the Scala versions supported by that Spark version). Then add the catboost-spark Maven artifact with the chosen spark_compat_version, scala_compat_version and catboost_spark_version to the spark.jars.packages Spark config parameter and import the catboost_spark package:
from pyspark.sql import SparkSession

sparkSession = (SparkSession.builder
    .master(...)
    .config("spark.jars.packages", "ai.catboost:catboost-spark_<spark_compat_version>_<scala_compat_version>:<catboost_spark_version>")
    .getOrCreate()
)

import catboost_spark
...
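
For illustration, a fully filled-in session could look like the sketch below. The specific version triple (Spark 3.5, Scala 2.12, CatBoost 1.2.7), the local master, and the tiny training step with catboost_spark.Pool and catboost_spark.CatBoostRegressor are assumptions for this example only; substitute the versions actually published on Maven Central for your cluster.

from pyspark.sql import SparkSession
from pyspark.ml.linalg import Vectors

# Example coordinates only (assumed): pick the spark/scala/catboost
# versions that match your environment, as listed on Maven Central.
spark = (SparkSession.builder
    .master("local[*]")
    .config("spark.jars.packages",
            "ai.catboost:catboost-spark_3.5_2.12:1.2.7")
    .getOrCreate()
)

# The catboost_spark Python module becomes importable once the jar
# has been pulled in by the Spark session above.
import catboost_spark

# Minimal training sketch: a toy DataFrame with the default
# "features" and "label" column names used by catboost_spark.Pool.
train_df = spark.createDataFrame(
    [(Vectors.dense(0.1, 0.2), 0.0),
     (Vectors.dense(0.9, 0.4), 1.0)],
    ["features", "label"])

train_pool = catboost_spark.Pool(train_df)
model = catboost_spark.CatBoostRegressor().fit(train_pool)
model.transform(train_df).show()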