staged_predict

Apply the model to the given dataset and calculate the results for a sequentially growing subset of trees: at each stage i, only the trees in the range [0; i) are taken into consideration.

Note

The model prediction results are correct only if the data parameter with feature values contains all the features used in the model. Typically, the order of these features must match the order of the corresponding columns provided during training. However, if feature names are provided both during training and when applying the model, the features can be matched by name instead of by column order. Feature names can be specified if the data parameter has one of the following types:

Method call format

staged_predict(data,
    prediction_type=None,
    ntree_start=0,
    ntree_end=0,
    eval_period=1,
    thread_count=-1,
    verbose=None)

Parameters

data

Description

Feature values data.

The format depends on the number of input objects:

  • Multiple — Matrix-like data of shape (object_count, feature_count)
  • Single — An array

Possible types

For multiple objects:

  • catboost.Pool
  • list of lists
  • numpy.ndarray of shape (object_count, feature_count)
  • pandas.DataFrame
  • pandas.SparseDataFrame
  • pandas.Series
  • catboost.FeaturesData
  • scipy.sparse.spmatrix (all subclasses except dia_matrix)

For a single object:

  • list of feature values
  • one-dimensional numpy.ndarray with feature values

Default value

Required parameter

prediction_type

Description

The required prediction type.

Supported prediction types:

  • Probability
  • Class
  • RawFormulaVal
  • Exponent
  • LogProbability

Possible types

string

Default value

None (Exponent for Poisson and Tweedie, RawFormulaVal for all other loss functions)
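The default rule above can be written out as a small sketch. The helper name default_prediction_type is hypothetical and is not part of the CatBoost API; it only restates the documented rule, assuming the loss function is given as a string that may carry parameters after a colon.

```python
def default_prediction_type(loss_function):
    """Hypothetical helper (not a CatBoost function) restating the default rule."""
    # Drop any parameters after the colon, e.g. "Tweedie:variance_power=1.5".
    name = loss_function.split(":", 1)[0]
    # Poisson and Tweedie default to Exponent; everything else to RawFormulaVal.
    return "Exponent" if name in {"Poisson", "Tweedie"} else "RawFormulaVal"


print(default_prediction_type("Poisson"))
print(default_prediction_type("Logloss"))
```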

ntree_start

Description

To reduce the number of trees to use when the model is applied or the metrics are calculated, set the range of the tree indices to [ntree_start; ntree_end) and the step of the trees to use to eval_period.

This parameter defines the index of the first tree to be used when applying the model or calculating the metrics (the inclusive left border of the range). Indices are zero-based.

Possible types

int

Default value

0

ntree_end

Description

To reduce the number of trees to use when the model is applied or the metrics are calculated, set the range of the tree indices to [ntree_start; ntree_end) and the step of the trees to use to eval_period.

This parameter defines the index of the first tree not to be used when applying the model or calculating the metrics (the exclusive right border of the range). Indices are zero-based.

Possible types

int

Default value

0 (the index of the last tree to use equals the number of trees in the model minus one)

eval_period

Description

To reduce the number of trees to use when the model is applied or the metrics are calculated, set the range of the tree indices to [ntree_start; ntree_end) and the step of the trees to use to eval_period.

This parameter defines the step to iterate over the range [ntree_start; ntree_end). For example, let's assume that the following parameter values are set:

  • ntree_start is set to 0
  • ntree_end is set to N (the total tree count)
  • eval_period is set to 2

In this case, the results are returned for the following tree ranges: [0; 2), [0; 4), ..., [0; N).
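The way the three parameters combine can be illustrated with a minimal pure-Python sketch. The function staged_tree_ranges and its tree_count argument are hypothetical and are not part of the CatBoost API. The sketch also assumes that the last stage is clipped to ntree_end when the range length is not a multiple of eval_period, which the description above does not state explicitly.

```python
def staged_tree_ranges(ntree_start, ntree_end, eval_period, tree_count):
    """Return the half-open tree ranges (start, end) used at each stage.

    Hypothetical helper for illustration; not part of the CatBoost API.
    """
    if eval_period < 1:
        raise ValueError("eval_period must be a positive integer")
    # ntree_end=0 is the documented default and means "use all trees".
    if ntree_end == 0:
        ntree_end = tree_count
    ranges = []
    end = ntree_start
    while end < ntree_end:
        # Assumption: the final stage is clipped to ntree_end.
        end = min(end + eval_period, ntree_end)
        ranges.append((ntree_start, end))
    return ranges


# Reproduces the example above for a model with N = 6 trees.
print(staged_tree_ranges(0, 0, 2, tree_count=6))
# [(0, 2), (0, 4), (0, 6)]
```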

Possible types

int

Default value

1 (the trees are applied sequentially: the first tree, then the first two trees, etc.)

thread_count

Description

The number of threads to calculate predictions.

Optimizes the speed of execution. This parameter doesn't affect results.

Possible types

int

Default value

-1 (the number of threads is equal to the number of processor cores)

verbose

Description

Output the measured evaluation metric to stderr.

Possible types

bool

Default value

None

Return value

Generator that produces predictions with a sequentially growing subset of trees from the model. The type of generated values depends on the number of input objects:

  • Single object — The returned value depends on the specified value of the prediction_type parameter:

    • RawFormulaVal — Raw formula value.

    • Class — Class label.

    • Probability — One-dimensional numpy.ndarray with the probability for every class.

  • Multiple objects — The returned value depends on the specified value of the prediction_type parameter:

    • RawFormulaVal — One-dimensional numpy.ndarray of raw formula values (one for each object).

    • Class — One-dimensional numpy.ndarray of class labels (one for each object).

    • Probability — Two-dimensional numpy.ndarray of shape (number_of_objects, number_of_classes) with the probability for every class for each object.