For PySpark

    Get the appropriate catboost_spark_version (see available versions at Maven central).

    Choose the appropriate spark_compat_version (2.3, 2.4 or 3.0) and scala_compat_version (2.11 or 2.12).

    Just add the catboost-spark Maven artifact with the appropriate spark_compat_version, scala_compat_version and catboost_spark_version to spark.jar.packages Spark config parameter and import the catboost_spark package:

    from pyspark.sql import SparkSession
    
    sparkSession = (SparkSession.builder
        .master(...)
        .config("spark.jars.packages", "ai.catboost:catboost-spark_<spark_compat_version>_<scala_compat_version>:<catboost_spark_version>")
        .getOrCreate()
    )
    
    import catboost_spark
    
    ...