predict

Apply the model to the given dataset.

Note

The prediction results are correct only if the data parameter contains all the features used in the model. Typically, the order of these features must match the order of the corresponding columns provided during training. However, if feature names are provided both during training and when applying the model, features can be matched by name instead of by column order. Feature names can be specified if the data parameter has one of the following types:

catboost.FeaturesData
catboost.Pool
pandas.DataFrame (in this case, feature names are taken from column names)
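
The name-based matching described in the note can be pictured with a small, dependency-free sketch. It is an illustration only, not CatBoost's internal code: the helper `align_features` and the feature names are hypothetical. It reorders incoming columns to the order used during training and fails loudly when a feature is missing.

```python
def align_features(rows, row_feature_names, model_feature_names):
    """Reorder each row so its columns follow the model's training order.

    Hypothetical helper for illustration; CatBoost performs its own matching
    when feature names are available on both sides.
    """
    position = {name: i for i, name in enumerate(row_feature_names)}
    missing = [name for name in model_feature_names if name not in position]
    if missing:
        raise ValueError(f"features missing from data: {missing}")
    return [[row[position[name]] for name in model_feature_names] for row in rows]


# Order of the features at training time (assumed for this example).
model_feature_names = ["age", "income", "city_code"]

# The same features arrive in a different column order at prediction time.
rows = [[52000.0, 3, 41], [38000.0, 7, 29]]
aligned = align_features(rows, ["income", "city_code", "age"], model_feature_names)
```

Without names, the rows above would be interpreted positionally and `income` would silently be read as `age`, which is why the note requires either a matching column order or names on both sides.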

Method call format

predict(data,
        prediction_type=None,
        ntree_start=0,
        ntree_end=0,
        thread_count=-1,
        verbose=None,
        task_type="CPU")

Parameters

data

Description

Feature values data.

The format depends on the number of input objects:

Multiple — Matrix-like data of shape (object_count, feature_count)
Single — An array

Possible types

For multiple objects:

catboost.Pool
list of lists
numpy.ndarray of shape (object_count, feature_count)
pandas.DataFrame
pandas.SparseDataFrame
pandas.Series
catboost.FeaturesData
scipy.sparse.spmatrix (all subclasses except dia_matrix)

For a single object:

list of feature values
one-dimensional numpy.ndarray with feature values

Default value

Required parameter

prediction_type

Description

The required prediction type.

Supported prediction types:

Probability
Class
RawFormulaVal
Exponent
LogProbability

Possible types

string

Default value

None (Exponent for Poisson and Tweedie, RawFormulaVal for all other loss functions)
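
As a rough guide to how the prediction types relate to each other, the sketch below converts a single raw formula value by hand. It assumes a binary classifier with a logistic link for Probability, Class and LogProbability, and a log link for Exponent. The function `convert` is hypothetical and reports only the positive-class value, so the shapes returned by the library itself may differ.

```python
import math


def sigmoid(raw):
    """Logistic function mapping a raw score to a probability."""
    return 1.0 / (1.0 + math.exp(-raw))


def convert(raw, prediction_type="RawFormulaVal"):
    """Convert one raw formula value to the requested prediction type.

    Illustrative only: assumes binary classification with a logistic link.
    """
    if prediction_type == "RawFormulaVal":
        return raw
    if prediction_type == "Exponent":
        return math.exp(raw)
    probability = sigmoid(raw)
    if prediction_type == "Probability":
        return probability
    if prediction_type == "LogProbability":
        return math.log(probability)
    if prediction_type == "Class":
        # Assumed decision threshold of 0.5 on the positive-class probability.
        return int(probability > 0.5)
    raise ValueError(f"unsupported prediction type: {prediction_type}")


raw_value = 2.0
as_probability = convert(raw_value, "Probability")
as_class = convert(raw_value, "Class")
```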

ntree_start

Description

To reduce the number of trees used when the model is applied or the metrics are calculated, set the range of tree indices to [ntree_start; ntree_end) and the eval_period parameter to k to calculate metrics on every k-th iteration.

This parameter defines the index of the first tree to be used when applying the model or calculating the metrics (the inclusive left border of the range). Indices are zero-based.

Possible types

int

Default value

0

ntree_end

Description

This parameter defines the index of the first tree not to be used when applying the model or calculating the metrics (the exclusive right border of the range). Indices are zero-based.

Possible types

int

Default value

0 (the index of the last tree to use equals the number of trees in the model minus one)
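
The half-open range [ntree_start; ntree_end) can be illustrated with a toy additive ensemble in which each tree contributes one number to the final value. This is a sketch under simplifying assumptions: `apply_tree_range` and the per-tree values are hypothetical, and a real model also applies details such as a bias term that are omitted here.

```python
def apply_tree_range(tree_values, ntree_start=0, ntree_end=0):
    """Sum the contributions of trees with indices in [ntree_start, ntree_end).

    Mirrors the documented convention that ntree_end=0 means "use all trees
    up to the last one". Illustrative only.
    """
    if ntree_end == 0:
        ntree_end = len(tree_values)
    if not 0 <= ntree_start <= ntree_end <= len(tree_values):
        raise ValueError("tree range is outside the model")
    return sum(tree_values[ntree_start:ntree_end])


# Hypothetical contributions of a four-tree model for a single object.
tree_values = [0.5, 0.25, 0.125, 0.0625]

full_model = apply_tree_range(tree_values)            # trees 0, 1, 2, 3
middle_trees = apply_tree_range(tree_values, 1, 3)    # trees 1 and 2 only
first_two = apply_tree_range(tree_values, 0, 2)       # trees 0 and 1 only
```

Because the right border is exclusive, `ntree_end=3` stops after the tree with index 2.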

thread_count

Description

The number of threads to use.

Optimizes the speed of execution. This parameter doesn't affect results.

Possible types

int

Default value

-1 (the number of threads is equal to the number of processor cores)

verbose

Description

Output the measured evaluation metric to stderr.

Possible types

bool

Default value

None

task_type

Description

The evaluator type.

Possible values:
- 'CPU'
- 'GPU' (models with only numerical features are supported for now)

Possible types

string

Default value

CPU

Return value

Predictions for the given dataset.

The return value type depends on the number of input objects:

Single object — Single float formula return value
Multiple objects — One-dimensional numpy.ndarray of formula values for each object.
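
The dependence of the return type on the input shape can be sketched with a toy stand-in for the model. Everything here is hypothetical (`toy_formula`, `toy_predict`, the weights), and a plain list stands in for the numpy.ndarray that the library returns, to keep the example dependency-free.

```python
def toy_formula(features):
    """Stand-in for the model formula: a fixed weighted sum (illustrative)."""
    weights = [0.5, -1.0, 2.0]
    return sum(w * x for w, x in zip(weights, features))


def toy_predict(data):
    """Return one float for a single object, one value per row otherwise."""
    if data and isinstance(data[0], (list, tuple)):
        # Matrix-like input of shape (object_count, feature_count).
        return [toy_formula(row) for row in data]
    # One-dimensional input: a single object.
    return toy_formula(data)


single = toy_predict([2.0, 1.0, 0.5])
batch = toy_predict([[2.0, 1.0, 0.5], [0.0, 0.0, 1.0]])
```

Passing a one-row matrix still yields a one-element sequence rather than a bare float, which is worth keeping in mind when the calling code expects a scalar.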
