sum_models

Purpose

Blend trees and counters of two or more trained CatBoost models into a new model. Leaf values can be individually weighted for each input model. For example, it may be useful to blend models trained on different validation datasets.

Method call format

sum_models(models,
           weights=None,
           ctr_merge_policy='IntersectingCountersAverage')

Parameters

models

Description

A list of models to blend.

Possible values

list of CatBoost models

Default value

Required parameter

weights

Description

A list of weights for the leaf values of each model. The length of this list must be equal to the number of blended models.

A list of weights, each equal to 1.0/N for N blended models, gives the average prediction. For example, the following list of weights gives the average prediction for four blended models:

[0.25, 0.25, 0.25, 0.25]

Possible values

list of numbers

Default value

None (leaf value weights are set to 1 for all models)

ctr_merge_policy

Description

The counters merging policy. Possible values:

  • FailIfCtrIntersects — Ensure that the models have no intersecting counters.
  • LeaveMostDiversifiedTable — Use the most diversified counters by the count of unique hash values.
  • IntersectingCountersAverage — Use the average ctr counter values in the intersecting bins.
  • KeepAllTables — Keep the Counter and FeatureFreq ctrs from all models.

Possible values

string

Default value

IntersectingCountersAverage

Note

  • The bias of the resulting model is equal to the weighted sum of the input models' biases.
  • The scale of the resulting model is equal to 1; leaf values are rescaled accordingly before the summation.

Type of return value

CatBoost model

Example

from catboost import CatBoostClassifier, Pool, sum_models
from catboost.datasets import amazon
import numpy as np
from sklearn.model_selection import train_test_split

train_df, _ = amazon()

y = train_df.ACTION
X = train_df.drop('ACTION', axis=1)

categorical_features_indices = np.where(X.dtypes != np.float64)[0]

X_train, X_validation, y_train, y_validation = train_test_split(X,
                                                                y,
                                                                train_size=0.8,
                                                                random_state=42)

train_pool = Pool(X_train,
                  y_train,
                  cat_features=categorical_features_indices)
validate_pool = Pool(X_validation,
                     y_validation,
                     cat_features=categorical_features_indices)

models = []
for i in range(5):
    model = CatBoostClassifier(iterations=100,
                               random_seed=i)
    model.fit(train_pool,
              eval_set=validate_pool)
    models.append(model)

models_avrg = sum_models(models,
                         weights=[1.0/len(models)] * len(models))