
class CatBoostRegressionModel extends RegressionModel[Vector, CatBoostRegressionModel] with CatBoostModelTrait[CatBoostRegressionModel]

Regression model trained by CatBoost. Use CatBoostRegressor to train it.

Serialization

Supports standard Spark MLlib serialization. Models can be saved to a distributed filesystem such as HDFS or to local files. When a model is saved to <path>, two files are created:
  - <path>/metadata, which contains Spark-specific metadata in JSON format
  - <path>/model, which contains the model in the usual CatBoost format; it can be read using other local CatBoost APIs (if it is stored in a distributed filesystem, it has to be copied to the local filesystem first)

Saving to and loading from local files in standard CatBoost model formats is also supported.

Examples:
  1. Save model

    val trainPool : Pool = ... init Pool ...
    val regressor = new CatBoostRegressor
    val model = regressor.fit(trainPool)
    val path = "/home/user/catboost_spark_models/model0"
    model.write.save(path)
  2. Load model

    val dataFrameForPrediction : DataFrame = ... init DataFrame ...
    val path = "/home/user/catboost_spark_models/model0"
    val model = CatBoostRegressionModel.load(path)
    val predictions = model.transform(dataFrameForPrediction)
    predictions.show()
  3. Save as a native model

    val trainPool : Pool = ... init Pool ...
    val regressor = new CatBoostRegressor
    val model = regressor.fit(trainPool)
    val path = "/home/user/catboost_native_models/model0.cbm"
    model.saveNativeModel(path)
  4. Load native model

    val dataFrameForPrediction : DataFrame = ... init DataFrame ...
    val path = "/home/user/catboost_native_models/model0.cbm"
    val model = CatBoostRegressionModel.loadNativeModel(path)
    val predictions = model.transform(dataFrameForPrediction)
    predictions.show()
Linear Supertypes
CatBoostModelTrait[CatBoostRegressionModel], MLWritable, RegressionModel[Vector, CatBoostRegressionModel], PredictionModel[Vector, CatBoostRegressionModel], PredictorParams, HasPredictionCol, HasFeaturesCol, HasLabelCol, Model[CatBoostRegressionModel], Transformer, PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new CatBoostRegressionModel(nativeModel: TFullModel)
  2. new CatBoostRegressionModel(uid: String, nativeModel: TFullModel = null, nativeDimension: Int)

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. final def clear(param: Param[_]): CatBoostRegressionModel.this.type
    Definition Classes
    Params
  7. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  8. def copy(extra: ParamMap): CatBoostRegressionModel
    Definition Classes
    CatBoostRegressionModel → Model → Transformer → PipelineStage → Params
  9. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  10. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  11. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  13. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  14. def explainParams(): String
    Definition Classes
    Params
  15. def extractInstances(dataset: Dataset[_], validateInstance: (Instance) ⇒ Unit): RDD[Instance]
    Attributes
    protected
    Definition Classes
    PredictorParams
  16. def extractInstances(dataset: Dataset[_]): RDD[Instance]
    Attributes
    protected
    Definition Classes
    PredictorParams
  17. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  18. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  19. final val featuresCol: Param[String]
    Definition Classes
    HasFeaturesCol
  20. def featuresDataType: DataType
    Attributes
    protected
    Definition Classes
    PredictionModel
  21. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  22. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  23. def getAdditionalColumnsForApply: Seq[StructField]
    Attributes
    protected
    Definition Classes
    CatBoostRegressionModel → CatBoostModelTrait
  24. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  25. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  26. def getFeatureImportance(fstrType: EFstrType = EFstrType.FeatureImportance, data: Pool = null, calcType: ECalcTypeShapValues = ECalcTypeShapValues.Regular): Array[Double]

    fstrType

    Supported values are FeatureImportance, PredictionValuesChange, LossFunctionChange, PredictionDiff

    data

    Required if fstrType is PredictionDiff, in which case it must contain 2 samples. If fstrType is PredictionValuesChange, it is required only if the model was explicitly trained with the flag to store no leaf weights. Otherwise it can be null.

    calcType

    Used only for PredictionValuesChange. Possible values:

    • Regular: calculate regular SHAP values
    • Approximate: calculate approximate SHAP values
    • Exact: calculate exact SHAP values
    returns

    array of feature importances (index corresponds to the order of features in the model)

    Definition Classes
    CatBoostModelTrait
  27. def getFeatureImportanceInteraction(): Array[FeatureInteractionScore]

    returns

    array of feature interaction scores

    Definition Classes
    CatBoostModelTrait
  28. def getFeatureImportancePrettified(fstrType: EFstrType = EFstrType.FeatureImportance, data: Pool = null, calcType: ECalcTypeShapValues = ECalcTypeShapValues.Regular): Array[FeatureImportance]

    fstrType

    Supported values are FeatureImportance, PredictionValuesChange, LossFunctionChange, PredictionDiff

    data

    Required if fstrType is PredictionDiff, in which case it must contain 2 samples. If fstrType is PredictionValuesChange, it is required only if the model was explicitly trained with the flag to store no leaf weights. Otherwise it can be null.

    calcType

    Used only for PredictionValuesChange. Possible values:

    • Regular: calculate regular SHAP values
    • Approximate: calculate approximate SHAP values
    • Exact: calculate exact SHAP values
    returns

    array of feature importances sorted in descending order by importance

    Definition Classes
    CatBoostModelTrait
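The difference between the two return shapes can be sketched in plain Scala: getFeatureImportance returns values indexed by feature order, while the prettified variant returns entries sorted by importance in descending order. The NamedImportance case class, the prettify helper, the feature names, and the numbers below are hypothetical stand-ins so that the snippet runs without Spark or CatBoost; they are not part of the library.

```scala
// Hypothetical stand-in for one entry of the prettified result (not the library class).
final case class NamedImportance(featureName: String, importance: Double)

// Pair each importance with its feature name (same index = same feature),
// then sort by importance, largest first.
def prettify(featureNames: Seq[String], importances: Seq[Double]): Seq[NamedImportance] = {
  require(featureNames.length == importances.length, "one importance per feature expected")
  featureNames.zip(importances)
    .map { case (name, value) => NamedImportance(name, value) }
    .sortBy(entry => -entry.importance)
}

// Made-up values, listed in model feature order.
val names = Seq("age", "income", "tenure")
val raw = Seq(12.5, 60.0, 27.5)

val ranked = prettify(names, raw)
ranked.foreach(e => println(f"${e.featureName}%-8s ${e.importance}%.1f"))
```

In real code the importances would come from the model and the names from the training data; only the pairing and ordering logic is illustrated here.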
  29. def getFeatureImportanceShapInteractionValues(data: Pool, featureIndices: Pair[Int, Int] = null, featureNames: Pair[String, String] = null, preCalcMode: EPreCalcShapValues = EPreCalcShapValues.Auto, calcType: ECalcTypeShapValues = ECalcTypeShapValues.Regular, outputColumns: Array[String] = null): DataFrame

    SHAP interaction values are calculated for all feature pairs if neither featureIndices nor featureNames is specified.

    data

    dataset to calculate SHAP interaction values

    featureIndices

    (optional) pair of feature indices to calculate SHAP interaction values for.

    featureNames

    (optional) pair of feature names to calculate SHAP interaction values for.

    preCalcMode

    Possible values:

    • Auto: use direct SHAP values calculation only if the data size is smaller than the average number of leaves (the better of the two strategies below is chosen).
    • UsePreCalc: calculate SHAP values for every leaf in preprocessing. Final complexity is O(NT(D+F)) + O(TL^2 D^2), where N is the number of documents (objects), T is the number of trees, D is the average tree depth, F is the average number of features in a tree, and L is the average number of leaves in a tree. This is much faster (because of a smaller constant) than direct calculation when N >> L.
    • NoPreCalc: use direct SHAP values calculation with complexity O(NTLD^2). The direct algorithm is faster when N < L (algorithm from https://arxiv.org/abs/1802.03888).
    calcType

    Possible values:

    • Regular: calculate regular SHAP values
    • Approximate: calculate approximate SHAP values
    • Exact: calculate exact SHAP values
    outputColumns

    Columns from data to add to the output DataFrame; if null, all columns are added.

    returns

    • for binclass or regression: DataFrame which contains outputColumns and "featureIdx1", "featureIdx2", "shapInteractionValue" columns
    • for multiclass: DataFrame which contains outputColumns and "classIdx", "featureIdx1", "featureIdx2", "shapInteractionValue" columns
    Definition Classes
    CatBoostModelTrait
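To get a feel for the preCalcMode trade-off described above, the two complexity expressions can be evaluated for a given model shape. This is only a rough sketch: it drops the constants hidden by the big-O notation, the ModelShape case class and the helper names are hypothetical, and the autoChoice function merely restates the documented rule (direct calculation only when N < L) rather than reproducing the library's internal decision.

```scala
// Hypothetical description of a trained model, using the symbols from the text:
// T = trees, D = avgDepth, F = avgFeaturesPerTree, L = avgLeaves.
final case class ModelShape(trees: Long, avgDepth: Long, avgFeaturesPerTree: Long, avgLeaves: Long)

// UsePreCalc: O(N*T*(D+F)) + O(T*L^2*D^2), constants ignored.
def preCalcCost(n: Long, m: ModelShape): Double =
  n.toDouble * m.trees * (m.avgDepth + m.avgFeaturesPerTree) +
    m.trees.toDouble * m.avgLeaves * m.avgLeaves * m.avgDepth * m.avgDepth

// NoPreCalc (direct): O(N*T*L*D^2), constants ignored.
def directCost(n: Long, m: ModelShape): Double =
  n.toDouble * m.trees * m.avgLeaves * m.avgDepth * m.avgDepth

// Restates the documented Auto rule: direct calculation only when N < L.
def autoChoice(n: Long, m: ModelShape): String =
  if (n < m.avgLeaves) "NoPreCalc" else "UsePreCalc"

// Made-up model shape: 1000 trees of depth 6 with 64 leaves each.
val shape = ModelShape(trees = 1000, avgDepth = 6, avgFeaturesPerTree = 6, avgLeaves = 64)

for (n <- Seq(10L, 100000L)) {
  println(s"N=$n preCalc=${preCalcCost(n, shape)} direct=${directCost(n, shape)} auto=${autoChoice(n, shape)}")
}
```

With these made-up numbers the direct estimate is lower for the small dataset and the precalculated estimate is lower for the large one, which matches the N < L and N >> L guidance in the parameter description.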
  30. def getFeatureImportanceShapValues(data: Pool, preCalcMode: EPreCalcShapValues = EPreCalcShapValues.Auto, calcType: ECalcTypeShapValues = ECalcTypeShapValues.Regular, modelOutputType: EExplainableModelOutput = EExplainableModelOutput.Raw, referenceData: Pool = null, outputColumns: Array[String] = null): DataFrame

    data

    dataset to calculate SHAP values for

    preCalcMode

    Possible values:

    • Auto: use direct SHAP values calculation only if the data size is smaller than the average number of leaves (the better of the two strategies below is chosen).
    • UsePreCalc: calculate SHAP values for every leaf in preprocessing. Final complexity is O(NT(D+F)) + O(TL^2 D^2), where N is the number of documents (objects), T is the number of trees, D is the average tree depth, F is the average number of features in a tree, and L is the average number of leaves in a tree. This is much faster (because of a smaller constant) than direct calculation when N >> L.
    • NoPreCalc: use direct SHAP values calculation with complexity O(NTLD^2). The direct algorithm is faster when N < L (algorithm from https://arxiv.org/abs/1802.03888).
    calcType

    Possible values:

    • Regular: calculate regular SHAP values
    • Approximate: calculate approximate SHAP values
    • Exact: calculate exact SHAP values
    referenceData

    Reference data for Independent Tree SHAP values (see https://arxiv.org/abs/1905.04610v1). If referenceData is not null, Independent Tree SHAP values are calculated.

    outputColumns

    Columns from data to add to the output DataFrame; if null, all columns are added.

    returns

    • for regression and binclass models: DataFrame which contains outputColumns and "shapValues" column with Vector of length (n_features + 1) with SHAP values
    • for multiclass models: DataFrame which contains outputColumns and "shapValues" column with Matrix of shape (n_classes x (n_features + 1)) with SHAP values
    Definition Classes
    CatBoostModelTrait
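For regression models each "shapValues" entry has n_features + 1 elements. The sketch below shows one way such a row might be split once it has been converted to an Array[Double]. It relies on an assumption that is not stated on this page: that the extra trailing element is the model's expected value (bias term), as in other CatBoost APIs. The splitShapRow helper and the numbers are hypothetical; verify the layout against your CatBoost version before relying on it.

```scala
// Split one "shapValues" row into per-feature contributions and the trailing extra element.
// ASSUMPTION: the (n_features + 1)-th element is the expected value (bias term).
def splitShapRow(shapRow: Array[Double]): (Array[Double], Double) = {
  require(shapRow.nonEmpty, "expected n_features + 1 values")
  (shapRow.init, shapRow.last)
}

// Made-up row for a model with 3 features.
val row = Array(0.40, -0.15, 0.05, 2.00)
val (contributions, expectedValue) = splitShapRow(row)

// Under the assumption above, contributions plus the expected value
// add up to the raw model output for this row.
val reconstructed = contributions.sum + expectedValue
println(s"contributions=${contributions.mkString(", ")} expectedValue=$expectedValue reconstructed=$reconstructed")
```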
  31. final def getFeaturesCol: String
    Definition Classes
    HasFeaturesCol
  32. final def getLabelCol: String
    Definition Classes
    HasLabelCol
  33. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  34. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  35. final def getPredictionCol: String
    Definition Classes
    HasPredictionCol
  36. def getResultIteratorForApply(objectsDataProvider: SWIGTYPE_p_NCB__TObjectsDataProviderPtr, dstRows: ArrayBuffer[Array[Any]], localExecutor: TLocalExecutor): Iterator[Row]
    Attributes
    protected
    Definition Classes
    CatBoostRegressionModel → CatBoostModelTrait
  37. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  38. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  39. def hasParent: Boolean
    Definition Classes
    Model
  40. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  41. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  42. def initializeLogIfNecessary(isInterpreter: Boolean): Unit
    Attributes
    protected
    Definition Classes
    Logging
  43. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  44. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  45. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  46. def isTraceEnabled(): Boolean
    Attributes
    protected
    Definition Classes
    Logging
  47. final val labelCol: Param[String]
    Definition Classes
    HasLabelCol
  48. def log: Logger
    Attributes
    protected
    Definition Classes
    Logging
  49. def logDebug(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  50. def logDebug(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  51. def logError(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  52. def logError(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  53. def logInfo(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  54. def logInfo(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  55. def logName: String
    Attributes
    protected
    Definition Classes
    Logging
  56. def logTrace(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  57. def logTrace(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  58. def logWarning(msg: ⇒ String, throwable: Throwable): Unit
    Attributes
    protected
    Definition Classes
    Logging
  59. def logWarning(msg: ⇒ String): Unit
    Attributes
    protected
    Definition Classes
    Logging
  60. var nativeDimension: Int
    Attributes
    protected
    Definition Classes
    CatBoostRegressionModel → CatBoostModelTrait
  61. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  62. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  63. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  64. def numFeatures: Int
    Definition Classes
    PredictionModel
    Annotations
    @Since( "1.6.0" )
  65. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  66. var parent: Estimator[CatBoostRegressionModel]
    Definition Classes
    Model
  67. def predict(features: Vector): Double

    Prefer batch computations operating on datasets as a whole for efficiency

    Definition Classes
    CatBoostRegressionModel → PredictionModel
  68. final def predictRawImpl(features: Vector): Array[Double]

    Prefer batch computations operating on datasets as a whole for efficiency

    Definition Classes
    CatBoostModelTrait
  69. final val predictionCol: Param[String]
    Definition Classes
    HasPredictionCol
  70. def save(path: String): Unit
    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  71. def saveNativeModel(fileName: String, format: EModelType = EModelType.CatboostBinary, exportParameters: Map[String, Any] = null, pool: Pool = null): Unit

    Save the model to a local file.

    fileName

    The path to the output model.

    format

    The output format of the model. Possible values:

    • CatboostBinary: CatBoost binary format (default).
    • AppleCoreML: Apple CoreML format (only datasets without categorical features are currently supported).
    • Cpp: Standalone C++ code (multiclassification models are not currently supported). See the C++ section for details on applying the resulting model.
    • Python: Standalone Python code (multiclassification models are not currently supported). See the Python section for details on applying the resulting model.
    • Json: JSON format. Refer to the CatBoost JSON model tutorial for format details.
    • Onnx: ONNX-ML format (only datasets without categorical features are currently supported). Refer to https://onnx.ai for details.
    • Pmml: PMML version 4.3 format. If categorical features are present in the training dataset, they must be interpreted as one-hot encoded during training. This can be accomplished by setting the --one-hot-max-size/one_hot_max_size parameter to a value that is greater than the maximum number of unique categorical feature values among all categorical features in the dataset. Note: multiclassification models are not currently supported. See the PMML section for details on applying the resulting model.

    exportParameters

    Additional format-dependent parameters for the AppleCoreML, Onnx, or Pmml formats. See the Python API documentation for details.

    pool

    The dataset previously used for training. This parameter is required if the model contains categorical features and the output format is Cpp, Python, or Json.

    Definition Classes
    CatBoostModelTrait
    Example:
      val spark = SparkSession.builder()
        .master("local[*]")
        .appName("testSaveLocalModel")
        .getOrCreate()
      
      val pool = Pool.load(
        spark,
        "dsv:///home/user/datasets/my_dataset/train.dsv",
        columnDescription = "/home/user/datasets/my_dataset/cd"
      )
      
      val regressor = new CatBoostRegressor()
      val model = regressor.fit(pool)
      
      // save in CatBoostBinary format
      model.saveNativeModel("/home/user/model/model.cbm")
      
      // save in ONNX format with metadata
      model.saveNativeModel(
        "/home/user/model/model.onnx",
        EModelType.Onnx,
        Map(
          "onnx_domain" -> "ai.catboost",
          "onnx_model_version" -> 1,
          "onnx_doc_string" -> "test model for regression",
          "onnx_graph_name" -> "CatBoostModel_for_regression"
        )
      )
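The Pmml format note above asks for a one_hot_max_size value greater than the largest number of unique values among the categorical features. A minimal sketch of that arithmetic in plain Scala follows; the column names and values are made up, and how the resulting value is passed to training is not shown here.

```scala
// Made-up categorical columns: column name -> observed values.
val categoricalColumns: Map[String, Seq[String]] = Map(
  "city" -> Seq("Paris", "Oslo", "Paris", "Lima"),
  "plan" -> Seq("free", "pro", "free", "free")
)

// Largest number of distinct values among all categorical features.
val maxUniqueValues: Int = categoricalColumns.values.map(_.distinct.size).max

// Any value strictly greater than that maximum satisfies the condition quoted above;
// the smallest such integer is used here.
val oneHotMaxSize: Int = maxUniqueValues + 1

println(s"max unique values = $maxUniqueValues, smallest suitable one_hot_max_size = $oneHotMaxSize")
```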
  72. final def set(paramPair: ParamPair[_]): CatBoostRegressionModel.this.type
    Attributes
    protected
    Definition Classes
    Params
  73. final def set(param: String, value: Any): CatBoostRegressionModel.this.type
    Attributes
    protected
    Definition Classes
    Params
  74. final def set[T](param: Param[T], value: T): CatBoostRegressionModel.this.type
    Definition Classes
    Params
  75. final def setDefault(paramPairs: ParamPair[_]*): CatBoostRegressionModel.this.type
    Attributes
    protected
    Definition Classes
    Params
  76. final def setDefault[T](param: Param[T], value: T): CatBoostRegressionModel.this.type
    Attributes
    protected
    Definition Classes
    Params
  77. def setFeaturesCol(value: String): CatBoostRegressionModel
    Definition Classes
    PredictionModel
  78. def setParent(parent: Estimator[CatBoostRegressionModel]): CatBoostRegressionModel
    Definition Classes
    Model
  79. def setPredictionCol(value: String): CatBoostRegressionModel
    Definition Classes
    PredictionModel
  80. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  81. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  82. def transform(dataset: Dataset[_]): DataFrame
    Definition Classes
    PredictionModel → Transformer
  83. def transform(dataset: Dataset[_], paramMap: ParamMap): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" )
  84. def transform(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): DataFrame
    Definition Classes
    Transformer
    Annotations
    @Since( "2.0.0" ) @varargs()
  85. def transformCatBoostImpl(dataset: Dataset[_]): DataFrame
    Attributes
    protected
    Definition Classes
    CatBoostModelTrait
  86. def transformImpl(dataset: Dataset[_]): DataFrame
    Definition Classes
    CatBoostRegressionModel → PredictionModel
  87. def transformPool(dataset: Pool): DataFrame

    This function is useful when the dataset has already been quantized, but it works with any Pool.

    Definition Classes
    CatBoostModelTrait
  88. def transformSchema(schema: StructType): StructType
    Definition Classes
    PredictionModel → PipelineStage
  89. def transformSchema(schema: StructType, logging: Boolean): StructType
    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  90. val uid: String
    Definition Classes
    CatBoostRegressionModel → Identifiable
  91. def validateAndTransformSchema(schema: StructType, fitting: Boolean, featuresDataType: DataType): StructType
    Attributes
    protected
    Definition Classes
    PredictorParams
  92. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  93. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  94. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... ) @native()
  95. def write: MLWriter
    Definition Classes
    CatBoostModelTrait → MLWritable
