catboost.get_object_importance


catboost.get_object_importance(model,
                               pool,
                               train_pool,
                               top_size = -1,
                               type = 'Average',
                               update_method = 'SinglePoint',
                               thread_count = -1)

Purpose

Calculate the effect of objects from the training dataset on the optimized metric values for the objects from the input dataset:

Positive values reflect that the optimized metric increases.
Negative values reflect that the optimized metric decreases.

The larger the absolute value, the greater the impact that a training object has on the optimized metric.
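The sign and magnitude rules above can be illustrated with a minimal sketch in base R. The score values below are hypothetical and chosen only for illustration; they are not output from CatBoost, and the variable names are assumptions.

```r
# Hypothetical importance scores for five training objects (illustrative values only).
scores <- c(0.02, -0.31, 0.00, 0.15, -0.07)

# Rank training objects by the magnitude of their effect, largest first.
ranking <- order(abs(scores), decreasing = TRUE)

# Label the direction of each effect according to the sign of its score.
direction <- ifelse(scores > 0, "increases metric",
                    ifelse(scores < 0, "decreases metric", "no effect"))

summary_table <- data.frame(train_index = ranking,
                            score = scores[ranking],
                            effect = direction[ranking])
print(summary_table)
```

In this sketch the second object has the largest absolute score, so it is listed first even though its effect is negative.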

The method is an implementation of the approach described in the paper Finding Influential Training Samples for Gradient Boosted Decision Trees.

Currently, object importance is supported only for the following loss functions:

Logloss
CrossEntropy
RMSE
MAE
Quantile
Expectile
LogLinQuantile
MAPE
Poisson

Arguments

model

Description

The model obtained as the result of training.

Default value

Required argument

pool

Description

The input dataset.

Default value

Required argument

train_pool

Description

The dataset used for training.

Default value

Required argument

top_size

Description

The number of most important objects from the training dataset to return. At most this many objects are returned.

Default value

-1 (top size is not limited)

type

Description

The method for calculating the object importances.

Possible values:

Average — The average of scores of objects from the training dataset for every object from the input dataset.
PerObject — The scores of each object from the training dataset for each object from the input dataset.

Default value

Average

update_method

Description

The method used to calculate the importances. It determines the trade-off between accuracy and speed.

Possible values:

SinglePoint — The fastest and least accurate method.
TopKLeaves — Specify the number of leaves. The higher the value, the more accurate and the slower the calculation.
AllPoints — The slowest and most accurate method.

Supported parameters:

top — Defines the number of leaves to use for the TopKLeaves update method. See the Finding Influential Training Samples for Gradient Boosted Decision Trees paper for more details.

For example, the following value sets the method to TopKLeaves and limits the number of leaves to 3:

TopKLeaves:top=3
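If the number of leaves is chosen at run time, the string can be assembled in base R before it is passed as update_method. The helper below is a hypothetical convenience function, not part of the catboost package; it only builds strings in the format shown above.

```r
# Hypothetical helper (not part of catboost): builds the update_method string.
make_update_method <- function(method = "SinglePoint", top = NULL) {
  # Accept only the three values listed in this section.
  method <- match.arg(method, c("SinglePoint", "TopKLeaves", "AllPoints"))
  if (method == "TopKLeaves" && !is.null(top)) {
    stopifnot(is.numeric(top), length(top) == 1, top >= 1)
    # Append the "top" parameter in the "Method:top=N" form.
    return(paste0(method, ":top=", as.integer(top)))
  }
  method
}

make_update_method("TopKLeaves", top = 3)
make_update_method("AllPoints")
```

The resulting string can then be supplied as the update_method argument of catboost.get_object_importance.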

Default value

SinglePoint

thread_count

Description

The number of threads to use for operation.

Optimizes the speed of execution. This parameter doesn't affect results.

Default value

-1 (the number of threads is equal to the number of processor cores)

Examples

Calculate the object importance:

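library(catboost)

# Training data: two objects with three numeric features each.
train_dataset <- matrix(c(1900, 7, 1,
                          1896, 1, 1),
                        nrow = 2,
                        ncol = 3,
                        byrow = TRUE)

label_values <- c(0, 1)

train_pool <- catboost.load_pool(train_dataset,
                                 label_values)

# Input data: the objects whose optimized metric values are examined.
input_dataset <- matrix(c(1900, 47, 1,
                          1904, 27, 1),
                        nrow = 2,
                        ncol = 3,
                        byrow = TRUE)

input_pool <- catboost.load_pool(input_dataset,
                                 label_values)

# Train a small model on the training pool.
trained_model <- catboost.train(train_pool,
                                params = list(iterations = 10))

# Estimate how each training object affects the metric on the input objects.
object_importance <- catboost.get_object_importance(trained_model,
                                                    input_pool,
                                                    train_pool)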
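
The non-default arguments described in the Arguments section can be combined in one call. The following is a minimal sketch that reuses the data from the example above and runs only if the catboost package is installed. The structure of the returned value is not described on this page, so the sketch inspects it with str() instead of assuming field names.

```r
# Run the CatBoost part only if the package is installed.
catboost_available <- requireNamespace("catboost", quietly = TRUE)

if (catboost_available) {
  library(catboost)

  labels <- c(0, 1)
  train_pool <- catboost.load_pool(
    matrix(c(1900, 7, 1, 1896, 1, 1), nrow = 2, ncol = 3, byrow = TRUE),
    labels)
  input_pool <- catboost.load_pool(
    matrix(c(1900, 47, 1, 1904, 27, 1), nrow = 2, ncol = 3, byrow = TRUE),
    labels)

  model <- catboost.train(train_pool, params = list(iterations = 10))

  # Per-object scores, limited to the single most important training object,
  # computed with the slowest and most accurate update method on one thread.
  per_object <- catboost.get_object_importance(model,
                                               input_pool,
                                               train_pool,
                                               top_size = 1,
                                               type = 'PerObject',
                                               update_method = 'AllPoints',
                                               thread_count = 1)

  # Inspect the returned structure rather than assuming its layout.
  str(per_object)
}
```

According to the thread_count description, changing the number of threads affects only the speed of execution, not the results.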
