sum_models

Purpose

Blend trees and counters of two or more trained CatBoost models into a new model. Leaf values can be individually weighted for each input model. For example, it may be useful to blend models trained on different validation datasets.

Method call format

sum_models(models,
           weights=None,
           ctr_merge_policy='IntersectingCountersAverage')

Parameters

models

Description

A list of models to blend.

Possible values

list of CatBoost models

Default value

Required parameter

weights

Description

A list of weights for the leaf values of each model. The length of this list must be equal to the number of blended models.

А list of weights equal to 1.0/N for N blended models gives the average prediction. For example, the following list of weights gives the average prediction for four blended models:

[0.25,0.25,0.25,0.25]

Possible values

list of numbers

Default value

None (leaf values weights are set to 1 for all models)

ctr_merge_policy

Description

The counters merging policy. Possible values:

FailIfCtrIntersects — Ensure that the models have zero intersecting counters.
LeaveMostDiversifiedTable — Use the most diversified counters by the count of unique hash values.
IntersectingCountersAverage — Use the average ctr counter values in the intersecting bins.
KeepAllTables — Keep Counter and FeatureFreq ctr's from all models.

Possible values

string

Default value

IntersectingCountersAverage

Note

The bias of the models sum is equal to the weighted sum of models biases.
The scale of the models sum is equal to 1, leaf values are scaled before the summation.

Type of return value

CatBoost model

Example

from catboost import CatBoostClassifier, Pool, sum_models
from catboost.datasets import amazon
import numpy as np
from sklearn.model_selection import train_test_split

train_df, _ = amazon()

y = train_df.ACTION
X = train_df.drop('ACTION', axis=1)

categorical_features_indices = np.where(X.dtypes != np.float)[0]

X_train, X_validation, y_train, y_validation = train_test_split(X,
                                                                y,
                                                                train_size=0.8,
                                                                random_state=42)

train_pool = Pool(X_train,
                  y_train,
                  cat_features=categorical_features_indices)
validate_pool = Pool(X_validation,
                     y_validation,
                     cat_features=categorical_features_indices)

models = []
for i in range(5):
    model = CatBoostClassifier(iterations=100,
                               random_seed=i)
    model.fit(train_pool,
              eval_set=validate_pool)
    models.append(model)

models_avrg = sum_models(models,
                         weights=[1.0/len(models)] * len(models))

Was the article helpful?

sample_gaussian_process

to_classifier