For PySpark
Get the appropriate catboost_spark_version (see available versions at Maven central).
Choose the appropriate spark_compat_version (2.3, 2.4 or 3.0) and scala_compat_version (2.11 or 2.12).
Add the catboost-spark Maven artifact with the appropriate spark_compat_version, scala_compat_version and catboost_spark_version to the spark.jars.packages Spark config parameter, then import the catboost_spark package:
from pyspark.sql import SparkSession

sparkSession = (SparkSession.builder
    .master(...)
    .config("spark.jars.packages", "ai.catboost:catboost-spark_<spark_compat_version>_<scala_compat_version>:<catboost_spark_version>")
    .getOrCreate()
)

import catboost_spark
...
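The value passed to spark.jars.packages is a Maven coordinate assembled from the three versions chosen above. As a minimal sketch, the helper below builds that string and rejects compatibility versions other than the ones listed in this section. The function name catboost_spark_coordinate and the version "1.2.3" are illustrative assumptions, not part of the CatBoost API; substitute a catboost_spark_version that is actually published on Maven central.

```python
# Compatibility versions listed in this section.
SUPPORTED_SPARK_COMPAT = ("2.3", "2.4", "3.0")
SUPPORTED_SCALA_COMPAT = ("2.11", "2.12")


def catboost_spark_coordinate(spark_compat_version, scala_compat_version,
                              catboost_spark_version):
    """Hypothetical helper: build the catboost-spark Maven coordinate."""
    if spark_compat_version not in SUPPORTED_SPARK_COMPAT:
        raise ValueError(
            f"unsupported spark_compat_version: {spark_compat_version!r}")
    if scala_compat_version not in SUPPORTED_SCALA_COMPAT:
        raise ValueError(
            f"unsupported scala_compat_version: {scala_compat_version!r}")
    return (
        f"ai.catboost:catboost-spark_{spark_compat_version}"
        f"_{scala_compat_version}:{catboost_spark_version}"
    )


# "1.2.3" is a placeholder, not necessarily a published release.
coordinate = catboost_spark_coordinate("3.0", "2.12", "1.2.3")
print(coordinate)  # ai.catboost:catboost-spark_3.0_2.12:1.2.3
```

The resulting string can then be passed as the second argument of .config("spark.jars.packages", ...) in the session builder. Validating the versions up front turns a typo into an immediate error rather than a dependency resolution failure when the Spark session starts.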