sum_models
Purpose
Blend the trees and counters of two or more trained CatBoost models into a new model. The leaf values of each input model can be weighted individually. This is useful, for example, for blending models trained on different validation datasets.
Method call format
sum_models(models,
weights=None,
ctr_merge_policy='IntersectingCountersAverage')
Parameters
models
Description
A list of models to blend.
Possible values
list of CatBoost models
Default value
Required parameter
weights
Description
A list of weights for the leaf values of each model. The length of this list must be equal to the number of blended models.
A list of weights equal to 1.0/N for N blended models gives the average prediction. For example, the following list of weights gives the average prediction for four blended models:
[0.25, 0.25, 0.25, 0.25]
Possible values
list of numbers
Default value
None (leaf values weights are set to 1 for all models)
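The weighting arithmetic can be sketched in plain Python (this is an illustration only, not CatBoost internals): a blended model's raw prediction is the weighted sum of the input models' predictions, so weights of 1.0/N reproduce the plain average.

```python
# Illustration only: raw scores that three hypothetical models
# would produce for one object.
raw_scores = [0.2, 0.5, 0.8]

# Equal weights of 1/N reproduce the plain average of the predictions.
n = len(raw_scores)
weights = [1.0 / n] * n
blended = sum(w * s for w, s in zip(weights, raw_scores))

print(blended)               # the mean of the three scores
print(sum(raw_scores) / n)   # same value
```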
ctr_merge_policy
Description
The counters merging policy. Possible values:
- FailIfCtrIntersects — Ensure that the models have zero intersecting counters.
- LeaveMostDiversifiedTable — Use the most diversified counters by the count of unique hash values.
- IntersectingCountersAverage — Use the average ctr counter values in the intersecting bins.
- KeepAllTables — Keep Counter and FeatureFreq ctr's from all models.
Possible values
string
Default value
IntersectingCountersAverage
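The merging policies can be hard to picture from the one-line descriptions. The following pure-Python sketch (not CatBoost code; the tables and values are made up) mimics what IntersectingCountersAverage does with two counter tables keyed by hash bin: bins present in both tables are averaged, and the remaining bins are kept as-is.

```python
# Hypothetical counter tables: hash bin -> counter value.
ctr_a = {0: 10.0, 1: 4.0, 2: 6.0}
ctr_b = {1: 8.0, 2: 2.0, 3: 5.0}

# IntersectingCountersAverage: average the values in intersecting bins,
# keep non-intersecting bins unchanged.
merged = {}
for bin_id in sorted(ctr_a.keys() | ctr_b.keys()):
    values = [t[bin_id] for t in (ctr_a, ctr_b) if bin_id in t]
    merged[bin_id] = sum(values) / len(values)

print(merged)  # {0: 10.0, 1: 6.0, 2: 4.0, 3: 5.0}
```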
Note
- The bias of the summed model equals the weighted sum of the input models' biases.
- The scale of the summed model is set to 1; leaf values are scaled accordingly before the summation.
Type of return value
CatBoost model
Example
from catboost import CatBoostClassifier, Pool, sum_models
from catboost.datasets import amazon
import numpy as np
from sklearn.model_selection import train_test_split

train_df, _ = amazon()
y = train_df.ACTION
X = train_df.drop('ACTION', axis=1)

# Treat all non-float columns as categorical
# (np.float was removed from NumPy; use the builtin float instead)
categorical_features_indices = np.where(X.dtypes != float)[0]

X_train, X_validation, y_train, y_validation = train_test_split(
    X, y, train_size=0.8, random_state=42)

train_pool = Pool(X_train,
                  y_train,
                  cat_features=categorical_features_indices)
validate_pool = Pool(X_validation,
                     y_validation,
                     cat_features=categorical_features_indices)

# Train five models that differ only in the random seed
models = []
for i in range(5):
    model = CatBoostClassifier(iterations=100,
                               random_seed=i)
    model.fit(train_pool,
              eval_set=validate_pool)
    models.append(model)

# Blend the models with equal weights to average their predictions
models_avrg = sum_models(models,
                         weights=[1.0 / len(models)] * len(models))