min_unused_token_id

Get the smallest unused token identifier.

Identifiers are assigned consecutively to all input tokens. Some additional identifiers are reserved for internal use. This method returns the first unused identifier.

Every identifier from the returned value onward is assumed not to be assigned to any token.
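
The numbering described above can be illustrated with a small pure-Python sketch. It is not CatBoost's implementation: the helper `first_unused_id` and its `reserved_count` parameter are hypothetical, and the default of two reserved identifiers is an assumption chosen to match the example on this page, where four tokens yield 6.

```python
def first_unused_id(tokens, reserved_count=2):
    """Toy model of the numbering; not the library's actual code."""
    # Each distinct token gets the next free identifier, starting from 0.
    token_to_id = {}
    for token in tokens:
        if token not in token_to_id:
            token_to_id[token] = len(token_to_id)

    # Reserved identifiers are assumed to follow the token identifiers
    # directly (reserved_count is an assumption, not a documented value).
    return len(token_to_id) + reserved_count


print(first_unused_id(["his", "tender", "heir", "whatever"]))  # 6
```

Under these assumptions, any identifier greater than or equal to the printed value is free to use for other purposes, such as tokens added outside the dictionary.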

Method call format

min_unused_token_id()

Type of return value

int

Example

from catboost.text_processing import Dictionary

dictionary = Dictionary(occurence_lower_bound=0)\
    .fit(["his", "tender", "heir", "whatever"])

print(dictionary.min_unused_token_id)

Output:

6