Using models exported to ONNX-ML format

ONNX is an open format to represent AI models.

A quote from the Open Neural Network Exchange documentation:

“There are two official ONNX variants; the main distinction between the two is found in the supported types and the default operator sets. The neural-network-only ONNX variant recognizes only tensors as input and output types, while the Classical Machine Learning extension, ONNX-ML, also recognizes sequences and maps. ONNX-ML extends the ONNX operator set with ML algorithms that are not based on neural networks.”

CatBoost models are ensembles of decision trees, so they can be exported only to the ONNX-ML format.

Specifics

  • Only models trained on datasets without categorical features are currently supported.
  • Exported ONNX-ML models cannot currently be loaded and applied by the CatBoost libraries or executable. This export format is suitable only for external machine learning libraries.
  • The native model format (cbm) used by the CatBoost libraries and executable is usually faster to apply on common x86-64 platforms because it is optimized for the CatBoost-specific oblivious tree structure.
  • The model's metadata is stored in the metadata_props component of the ONNX Model.
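
As a rough illustration of the last point, the metadata can be read back with the onnx Python package. The sketch below assumes that package is installed; the helper names metadata_to_dict and load_onnx_metadata are hypothetical and not part of CatBoost or ONNX.

```python
def metadata_to_dict(props):
    """Convert ONNX metadata_props entries (objects with .key and .value) to a dict."""
    return {entry.key: entry.value for entry in props}


def load_onnx_metadata(path):
    """Load an ONNX model file and return its metadata_props as a dict."""
    import onnx  # imported lazily; assumes the onnx package is installed

    model = onnx.load(path)
    return metadata_to_dict(model.metadata_props)


# Usage, with a file exported as in the examples on this page:
# print(load_onnx_metadata("breast_cancer.onnx"))
```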

Applying a trained model with ONNX

Model input parameters
Parameter: features
Possible types: Tensor of shape [N_examples] and type int or string
Description: The input features.
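
To confirm the names and types that a particular exported file actually declares, the session can be inspected before calling run. A minimal sketch, assuming onnxruntime is installed; describe_io is a hypothetical helper name, while get_inputs and get_outputs are methods of the onnxruntime InferenceSession.

```python
def describe_io(session):
    """Return (name, type, shape) tuples for the inputs and outputs of a session."""
    inputs = [(arg.name, arg.type, arg.shape) for arg in session.get_inputs()]
    outputs = [(arg.name, arg.type, arg.shape) for arg in session.get_outputs()]
    return {"inputs": inputs, "outputs": outputs}


# Usage:
# import onnxruntime as rt
# sess = rt.InferenceSession("breast_cancer.onnx")
# print(describe_io(sess))
```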

Model output parameters for classification
Parameter: label
Possible types: tensor of shape [N_examples] and one of the following types:
  • int if class names are not specified in the training dataset.
  • string if class names are specified in the training dataset.
Description: The label value for the example.

Note. The label is inferred incorrectly for binary classification. This is a known bug in the onnxruntime implementation. Ignore the value of this parameter for binary classification.

Parameter: probabilities
Possible types: tensor of shape [N_examples] and one of the following types:
  • seq(map(string, float)) if class names are specified in the training dataset.
  • seq(map(int64, float)) if class names are not specified in the training dataset.
Description: Each map value is the probability that the example belongs to the class defined by the map key.
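
Because probabilities arrives as a sequence of maps rather than a dense tensor, it is often convenient to flatten it into rows with a fixed class order. A minimal sketch in plain Python; probabilities_to_rows is a hypothetical helper, and the input is assumed to be a list of dicts, which is how the onnxruntime Python API returns seq(map(...)) outputs.

```python
def probabilities_to_rows(prob_maps):
    """Flatten a list of {class: probability} dicts into (classes, rows).

    classes is the sorted list of class keys taken from the first map;
    each row lists the probabilities of one example in that order.
    """
    if not prob_maps:
        return [], []
    classes = sorted(prob_maps[0])
    rows = [[m[c] for c in classes] for m in prob_maps]
    return classes, rows
```

The first element of the result records the column order, so the rows can be passed on to code that expects a plain matrix.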

Model output parameters for regression
Parameter: predictions
Possible types: tensor of shape [N_examples] and type float
Description: The target value predicted by the model.

Examples

The following examples use the Python package for training and the ONNX Runtime scoring engine for applying the model.

Binary classification

Train the model with CatBoost:

import catboost
from sklearn import datasets


breast_cancer = datasets.load_breast_cancer()
model = catboost.CatBoostClassifier(loss_function='Logloss')

model.fit(breast_cancer.data, breast_cancer.target)

# Save model to ONNX-ML format
model.save_model(
    "breast_cancer.onnx",
    format="onnx",
    export_parameters={
        'onnx_domain': 'ai.catboost',
        'onnx_model_version': 1,
        'onnx_doc_string': 'test model for BinaryClassification',
        'onnx_graph_name': 'CatBoostModel_for_BinaryClassification'
    }
)
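
Since the exported file cannot be loaded back by CatBoost, it can be useful to keep a copy in the native format next to the ONNX export. A minimal sketch; save_both_formats and the file stem are hypothetical, and the export_parameters shown above are omitted for brevity.

```python
def save_both_formats(model, stem):
    """Save a trained CatBoost model as <stem>.cbm and <stem>.onnx; return both paths."""
    cbm_path = stem + ".cbm"
    onnx_path = stem + ".onnx"
    model.save_model(cbm_path, format="cbm")    # native format, reloadable by CatBoost
    model.save_model(onnx_path, format="onnx")  # for external ONNX-ML runtimes
    return cbm_path, onnx_path


# Usage, with the model trained above:
# save_both_formats(model, "breast_cancer")
```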

Apply the model with onnxruntime:

import numpy as np
from sklearn import datasets
import onnxruntime as rt


breast_cancer = datasets.load_breast_cancer()

sess = rt.InferenceSession('breast_cancer.onnx')

# onnxruntime bug: 'label' inference is broken for binary classification 
#label = sess.run(['label'], 
#                 {'features': breast_cancer.data.astype(np.float32)})

probabilities = sess.run(['probabilities'], 
                         {'features': breast_cancer.data.astype(np.float32)})
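
Since the label output should be ignored for binary classification, one possible workaround is to derive the predicted class from probabilities by taking the most probable key of each map. A minimal sketch; labels_from_probabilities is a hypothetical helper, and it assumes each element is a dict that maps a class to its probability.

```python
def labels_from_probabilities(prob_maps):
    """Pick the class with the highest probability for each example."""
    return [max(m, key=m.get) for m in prob_maps]


# Usage with the result of sess.run above, which is a list with one
# item per requested output:
# predicted = labels_from_probabilities(probabilities[0])
```

For two classes this amounts to a 0.5 decision threshold; a different threshold would need an explicit comparison instead.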

Multiclassification

Train the model with CatBoost:

import catboost
from sklearn import datasets


iris = datasets.load_iris()
model = catboost.CatBoostClassifier(loss_function='MultiClass')

model.fit(iris.data, iris.target)

# Save model to ONNX-ML format
model.save_model(
    "iris.onnx",
    format="onnx",
    export_parameters={
        'onnx_domain': 'ai.catboost',
        'onnx_model_version': 1,
        'onnx_doc_string': 'test model for MultiClassification',
        'onnx_graph_name': 'CatBoostModel_for_MultiClassification'
    }
)

Apply the model with onnxruntime:

import numpy as np
from sklearn import datasets
import onnxruntime as rt


iris = datasets.load_iris()

sess = rt.InferenceSession('iris.onnx')

# can get only label
label = sess.run(['label'], 
                 {'features': iris.data.astype(np.float32)})

# can get only probabilities
probabilities = sess.run(['probabilities'], 
                         {'features': iris.data.astype(np.float32)})

# or both
label, probabilities = sess.run(['label', 'probabilities'], 
                                {'features': iris.data.astype(np.float32)})

Regression

Train the model with CatBoost:

import catboost
from sklearn import datasets


# Diabetes regression dataset (load_boston is not available in scikit-learn 1.2+)
diabetes = datasets.load_diabetes()
model = catboost.CatBoostRegressor()

model.fit(diabetes.data, diabetes.target)

# Save model to ONNX-ML format
model.save_model(
    "diabetes.onnx",
    format="onnx",
    export_parameters={
        'onnx_domain': 'ai.catboost',
        'onnx_model_version': 1,
        'onnx_doc_string': 'test model for Regression',
        'onnx_graph_name': 'CatBoostModel_for_Regression'
    }
)

Apply the model with onnxruntime:

import numpy as np
from sklearn import datasets
import onnxruntime as rt


diabetes = datasets.load_diabetes()

sess = rt.InferenceSession('diabetes.onnx')

predictions = sess.run(['predictions'],
                       {'features': diabetes.data.astype(np.float32)})
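
A quick way to gain confidence in an export is to compare the ONNX Runtime output with the native CatBoost prediction on the same data. A minimal sketch in plain Python; max_abs_diff is a hypothetical helper, and small differences are to be expected because the features are cast to float32.

```python
def max_abs_diff(a, b):
    """Return the largest absolute element-wise difference between two equal-length sequences."""
    a, b = list(a), list(b)
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return max((abs(float(x) - float(y)) for x, y in zip(a, b)), default=0.0)


# Usage (model is the trained CatBoostRegressor, X the float32 feature matrix):
# onnx_pred = sess.run(['predictions'], {'features': X})[0]
# native_pred = model.predict(X)
# print(max_abs_diff(onnx_pred.ravel(), native_pred))
```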