ONNX
ONNX is an open format to represent AI models.
A quote from the Open Neural Network Exchange documentation:
“There are two official ONNX variants; the main distinction between the two is found in the supported types and the default operator sets. The neural-network-only ONNX variant recognizes only tensors as input and output types, while the Classical Machine Learning extension, ONNX-ML, also recognizes sequences and maps. ONNX-ML extends the ONNX operator set with ML algorithms that are not based on neural networks.”
CatBoost models are based on ensembles of decision trees, so models can be exported only to the ONNX-ML format.
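One way to confirm that an exported file relies on the ONNX-ML operator set is to look at the operator-set domains the model imports. The sketch below is a minimal illustration, not part of CatBoost: the helper names `uses_onnx_ml` and `load_and_check` are hypothetical, it assumes the `onnx` Python package is installed for the file-loading helper, and it assumes ONNX-ML operators are registered under the `ai.onnx.ml` domain. The stand-in object only mimics the shape of a loaded model so the check can be shown without a file.

```python
from types import SimpleNamespace

ONNX_ML_DOMAIN = "ai.onnx.ml"  # assumed operator-set domain of ONNX-ML


def uses_onnx_ml(model):
    """Return True if any imported operator set belongs to the ONNX-ML domain."""
    return any(opset.domain == ONNX_ML_DOMAIN for opset in model.opset_import)


def load_and_check(path):
    """Load an ONNX file from disk and run the check (hypothetical helper)."""
    import onnx  # imported lazily; only needed when reading a real file
    return uses_onnx_ml(onnx.load(path))


# Stand-in for a loaded model, for illustration only (not a real export).
fake_model = SimpleNamespace(
    opset_import=[
        SimpleNamespace(domain="", version=13),
        SimpleNamespace(domain="ai.onnx.ml", version=2),
    ]
)
print(uses_onnx_ml(fake_model))
```

With a real export, `load_and_check("breast_cancer.onnx")` would run the same check against the file saved in the examples further down.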
Specifics
- Only models trained on datasets without categorical features are currently supported.
- Exported ONNX-ML models cannot currently be loaded or applied by the CatBoost libraries or executable. This export format is suitable only for external machine learning libraries.
The native model format (cbm) used by the CatBoost libraries and executable is usually faster to apply on common x86-64 platforms because it is optimized for the CatBoost-specific oblivious tree structure.
- The model's metadata is stored in the metadata_props component of the ONNX Model.
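The metadata mentioned in the last item can be read back with the `onnx` Python package, where `metadata_props` is exposed as a list of key/value entries. The following is a minimal sketch under that assumption; the helper names `metadata_to_dict` and `load_metadata` are hypothetical, and the key and value in the stand-in object are placeholders, not keys that CatBoost is known to write.

```python
from types import SimpleNamespace


def metadata_to_dict(model):
    """Collect the key/value entries of model.metadata_props into a dict."""
    return {entry.key: entry.value for entry in model.metadata_props}


def load_metadata(path):
    """Load an ONNX file and return its metadata as a dict (hypothetical helper)."""
    import onnx  # imported lazily; only needed when reading a real file
    return metadata_to_dict(onnx.load(path))


# Stand-in for a loaded model; the key and value are placeholders.
fake_model = SimpleNamespace(
    metadata_props=[SimpleNamespace(key="example_key", value="example_value")]
)
print(metadata_to_dict(fake_model))
```

For a real export, calling `load_metadata` on the saved `.onnx` file returns whatever entries the exporter stored.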
Applying a trained model with ONNX
- Model input parameters
| Parameter | Possible types | Description |
| --- | --- | --- |
| features | Tensor of shape [N_examples] and type int or string | The input features. |
- Model output parameters for classification
| Parameter | Possible types | Description |
| --- | --- | --- |
| label | Tensor of shape [N_examples] and one of the following types: int if class names are not specified in the training dataset; string if class names are specified in the training dataset. | The label value for the example. Note: The label is inferred incorrectly for binary classification. This is a known bug in the onnxruntime implementation. Ignore the value of this parameter for binary classification. |
| probabilities | Tensor of shape [N_examples] and one of the following types: seq(map(string, float)) if class names are specified in the training dataset; seq(map(int64, float)) if class names are not specified in the training dataset. | The map value is the probability that the example belongs to the class defined by the map key. |
- Model output parameters for regression
| Parameter | Possible types | Description |
| --- | --- | --- |
| predictions | Tensor of shape [N_examples] and type float | The target value predicted by the model. |
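The input and output names, types and shapes of a particular exported file can also be checked at runtime, since ONNX Runtime sessions expose them through `get_inputs()` and `get_outputs()`. The sketch below assumes the `onnxruntime` package is installed for the file-based helper; `describe_io` and `describe_file` are hypothetical helper names, and the stand-in session uses placeholder types and shapes that are not taken from a real export.

```python
from types import SimpleNamespace


def describe_io(sess):
    """Return (name, type, shape) tuples for every session input and output."""
    return {
        "inputs": [(a.name, a.type, a.shape) for a in sess.get_inputs()],
        "outputs": [(a.name, a.type, a.shape) for a in sess.get_outputs()],
    }


def describe_file(path):
    """Open an ONNX file with ONNX Runtime and describe it (hypothetical helper)."""
    import onnxruntime as rt  # imported lazily; only needed for a real file
    return describe_io(rt.InferenceSession(path))


# Stand-in session with placeholder types and shapes, for illustration only.
fake_sess = SimpleNamespace(
    get_inputs=lambda: [
        SimpleNamespace(name="features", type="tensor(float)", shape=[None, 4])
    ],
    get_outputs=lambda: [
        SimpleNamespace(name="predictions", type="tensor(float)", shape=[None])
    ],
)
print(describe_io(fake_sess))
```

Running `describe_file` on an exported model shows the actual signature that ONNX Runtime expects, which is a useful check before calling `sess.run`.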
Examples
The following examples use the Python package for training and the ONNX Runtime scoring engine for applying the model.
- Binary classification
Train the model with CatBoost:

    import catboost
    from sklearn import datasets

    breast_cancer = datasets.load_breast_cancer()

    model = catboost.CatBoostClassifier(loss_function='Logloss')
    model.fit(breast_cancer.data, breast_cancer.target)

    # Save model to ONNX-ML format
    model.save_model(
        "breast_cancer.onnx",
        format="onnx",
        export_parameters={
            'onnx_domain': 'ai.catboost',
            'onnx_model_version': 1,
            'onnx_doc_string': 'test model for BinaryClassification',
            'onnx_graph_name': 'CatBoostModel_for_BinaryClassification'
        }
    )
Apply the model with onnxruntime:
    import numpy as np
    from sklearn import datasets
    import onnxruntime as rt

    breast_cancer = datasets.load_breast_cancer()

    sess = rt.InferenceSession('breast_cancer.onnx')

    # onnxruntime bug: 'label' inference is broken for binary classification
    #label = sess.run(['label'],
    #                 {'features': breast_cancer.data.astype(np.float32)})

    probabilities = sess.run(['probabilities'],
                             {'features': breast_cancer.data.astype(np.float32)})
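Because the label output should be ignored for binary classification, labels can be derived from the probabilities output instead. The sketch below is one possible approach, not part of CatBoost or ONNX Runtime: it assumes the seq(map(...)) output arrives in Python as a list of dicts (one dict per example, class key to probability), and the helper names `labels_from_probabilities` and `positive_class_scores` are hypothetical. The sample values are made up.

```python
def labels_from_probabilities(prob_maps):
    """For each example, pick the class key with the highest probability."""
    return [max(m, key=m.get) for m in prob_maps]


def positive_class_scores(prob_maps, positive_key=1):
    """Extract the probability of one class (assumed key: 1) for each example."""
    return [m[positive_key] for m in prob_maps]


# Made-up probabilities shaped like a seq(map(int64, float)) output.
example = [{0: 0.9, 1: 0.1}, {0: 0.2, 1: 0.8}]
print(labels_from_probabilities(example))  # [0, 1]
print(positive_class_scores(example))      # [0.1, 0.8]
```

With the session from the example above, `sess.run` returns a list with one entry per requested output, so the helpers would be called on `probabilities[0]`. If the model was trained with string class names, pass the corresponding string as `positive_key`.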
- Multiclassification
Train the model with CatBoost:

    import catboost
    from sklearn import datasets

    iris = datasets.load_iris()

    model = catboost.CatBoostClassifier(loss_function='MultiClass')
    model.fit(iris.data, iris.target)

    # Save model to ONNX-ML format
    model.save_model(
        "iris.onnx",
        format="onnx",
        export_parameters={
            'onnx_domain': 'ai.catboost',
            'onnx_model_version': 1,
            'onnx_doc_string': 'test model for MultiClassification',
            'onnx_graph_name': 'CatBoostModel_for_MultiClassification'
        }
    )
Apply the model with onnxruntime:
    import numpy as np
    from sklearn import datasets
    import onnxruntime as rt

    iris = datasets.load_iris()

    sess = rt.InferenceSession('iris.onnx')

    # can get only label
    label = sess.run(['label'],
                     {'features': iris.data.astype(np.float32)})

    # can get only probabilities
    probabilities = sess.run(['probabilities'],
                             {'features': iris.data.astype(np.float32)})

    # or both
    label, probabilities = sess.run(['label', 'probabilities'],
                                    {'features': iris.data.astype(np.float32)})
- Regression
Train the model with CatBoost:

    import catboost
    from sklearn import datasets

    # Note: load_boston was removed in scikit-learn 1.2,
    # so this example requires an older scikit-learn version.
    boston = datasets.load_boston()

    model = catboost.CatBoostRegressor()
    model.fit(boston.data, boston.target)

    # Save model to ONNX-ML format
    model.save_model(
        "boston.onnx",
        format="onnx",
        export_parameters={
            'onnx_domain': 'ai.catboost',
            'onnx_model_version': 1,
            'onnx_doc_string': 'test model for Regression',
            'onnx_graph_name': 'CatBoostModel_for_Regression'
        }
    )
Apply the model with onnxruntime:
    import numpy as np
    from sklearn import datasets
    import onnxruntime as rt

    # Note: load_boston was removed in scikit-learn 1.2,
    # so this example requires an older scikit-learn version.
    boston = datasets.load_boston()

    sess = rt.InferenceSession('boston.onnx')

    predictions = sess.run(['predictions'],
                           {'features': boston.data.astype(np.float32)})