# Quantization

Before learning, the possible values of each feature are divided into disjoint ranges (*buckets*) delimited by threshold values (*splits*). The quantization size (the number of splits) is set by the starting parameters, separately for numerical features and for the numbers obtained by converting categorical features into numerical features.
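
As a rough illustration of how splits delimit buckets, the sketch below maps values to bucket indices with the standard-library `bisect` module. The function name `bucket_index` and the tie-breaking rule (a value equal to a split goes to the lower bucket) are assumptions made for this example, not a description of CatBoost internals.

```python
from bisect import bisect_left


def bucket_index(value, splits):
    """Return the index of the bucket that `value` falls into.

    `splits` must be sorted in ascending order. Assumed convention: the index
    is the number of splits strictly less than `value`, so a value equal to a
    split lands in the lower bucket.
    """
    return bisect_left(splits, value)


splits = [1.5, 4.0, 7.25]            # 3 splits delimit 4 buckets
values = [0.2, 1.5, 3.9, 4.1, 9.0]
buckets = [bucket_index(v, splits) for v in values]
print(buckets)  # [0, 0, 1, 2, 3]
```

With `k` splits there are always `k + 1` buckets, which is why the quantization size is expressed as the number of splits.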

Quantization is also used to split the label values when working with categorical features. On large datasets, a random subset of the dataset is used for this purpose.

The table below shows the quantization modes provided in CatBoost.

Mode | How splits are chosen |
---|---|
Median | Include an approximately equal number of objects in every bucket. |
Uniform | Generate splits by dividing the `[min_feature_value, max_feature_value]` segment into subsegments of equal length. Absolute values of the feature are used in this case. |
UniformAndQuantiles | Combine the splits obtained in the Median and Uniform modes, after first halving the quantization size provided by the starting parameters for each of them. |
MaxLogSum | Maximize the value of the following expression inside each bucket: $\sum\limits_{i=1}^{n}\log(weight)$, where $n$ is the number of distinct objects in the bucket and $weight$ is the number of times an object in the bucket is repeated. |
MinEntropy | Minimize the value of the following expression inside each bucket: $\sum\limits_{i=1}^{n} weight \cdot \log(weight)$, where $n$ is the number of distinct objects in the bucket and $weight$ is the number of times an object in the bucket is repeated. |
GreedyLogSum | Maximize the greedy approximation of the following expression inside every bucket: $\sum\limits_{i=1}^{n}\log(weight)$, where $n$ is the number of distinct objects in the bucket and $weight$ is the number of times an object in the bucket is repeated. |
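
The Median, Uniform, and UniformAndQuantiles rows can be illustrated with a minimal pure-Python sketch. The function names are hypothetical, and details such as using the midpoint between neighboring values as the threshold and skipping cuts between equal values are assumptions made for the example. CatBoost's own implementation may differ.

```python
def uniform_splits(values, n_splits):
    """Cut [min, max] into n_splits + 1 subsegments of equal length."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (n_splits + 1)
    return [lo + step * k for k in range(1, n_splits + 1)]


def median_splits(values, n_splits):
    """Place splits so that every bucket holds roughly the same number of objects."""
    ordered = sorted(values)
    n = len(ordered)
    splits = set()
    for k in range(1, n_splits + 1):
        pos = k * n // (n_splits + 1)       # index of the first object after cut k
        left, right = ordered[pos - 1], ordered[pos]
        if left < right:                    # no cut is possible between equal values
            splits.add((left + right) / 2)  # assumption: midpoint used as threshold
    return sorted(splits)


def uniform_and_quantiles_splits(values, n_splits):
    """Halve the quantization size, run both modes, and merge their splits."""
    half = n_splits // 2
    merged = set(uniform_splits(values, half)) | set(median_splits(values, half))
    return sorted(merged)


values = [1, 2, 3, 4, 5, 6, 7, 20]  # one outlier stretches the value range
print(uniform_splits(values, 3))                # [5.75, 10.5, 15.25]
print(median_splits(values, 3))                 # [2.5, 4.5, 6.5]
print(uniform_and_quantiles_splits(values, 6))  # [2.5, 4.5, 5.75, 6.5, 10.5, 15.25]
```

In this toy dataset the single outlier pulls two of the three uniform splits into a range that contains no objects, while the median splits keep two objects in every bucket. The combined mode retains both sets of thresholds.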