Dictionaries options

The following is a list of options for the dictionaries parameter (these options are set in option_name):
Parameter Data types Description Default value
token_level_type string

The token level type. This parameter defines what should be considered a separate token.

Possible values:
  • Word
  • Letter
Default value: Word
gram_order int

The number of words or letters in each token.

For example, suppose a dictionary must be built from the following sequence of words: “['maybe', 'some', 'other', 'time']”.

If the token level type is set to Word and this parameter is set to 2, the following tokens are formed:
  • “maybe some”
  • “some other”
  • “other time”
Default value: 1
skip_step int

The number of words or letters to skip between the items that are joined into a token. This parameter takes effect only if the value of the gram_order parameter is strictly greater than 1.

For example, suppose a dictionary must be built from the following sequence of words: “['maybe', 'some', 'other', 'time']”.

If the token level type is set to Word, gram_order is set to 2, and this parameter is set to 1, the following tokens are formed:
  • “maybe other”
  • “some time”
Default value: 0
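
The two examples above (gram_order and skip_step at the Word token level) can be reproduced with a short plain-Python sketch. The function name make_ngrams is hypothetical and is not part of any library API. The sketch only mirrors the behavior described in this table and does not handle the Letter token level or the end-of-word and end-of-sentence policies.

```python
def make_ngrams(words, gram_order=1, skip_step=0):
    """Join words into tokens of `gram_order` words each.

    Hypothetical helper for illustration only: `skip_step` words are
    skipped between consecutive words picked for the same token.
    """
    if gram_order < 1 or skip_step < 0:
        raise ValueError("gram_order must be >= 1 and skip_step must be >= 0")

    stride = skip_step + 1                # distance between picked words
    span = (gram_order - 1) * stride      # offset of the last picked word
    return [
        " ".join(words[start + k * stride] for k in range(gram_order))
        for start in range(len(words) - span)
    ]


words = ["maybe", "some", "other", "time"]

# gram_order=2, skip_step=0: adjacent word pairs
print(make_ngrams(words, gram_order=2))

# gram_order=2, skip_step=1: pairs with one word skipped in between
print(make_ngrams(words, gram_order=2, skip_step=1))
```

With gram_order set to 1, the stride has no effect and every word becomes its own token, which matches the note that skip_step matters only for gram_order greater than 1.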
end_of_word_policy string

The policy for processing the implicit tokens that mark the end of a word.

Possible values:

  • Skip
  • Insert
Default value: Insert
end_of_sentence_policy string

The policy for processing the implicit tokens that mark the end of a sentence.

Possible values:

  • Skip
  • Insert
Default value: Skip
occurence_lower_bound int

The minimum number of times a token must occur in the text to be included in the dictionary.

Default value: 50
max_dictionary_size int

The maximum number of tokens in the dictionary.

Default value: -1 (the size of the dictionary is not limited)
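
To illustrate how occurence_lower_bound and max_dictionary_size might interact, here is a minimal plain-Python sketch. The function build_dictionary is hypothetical and not a library call. Two details are assumptions, not statements from this table: the lower bound is treated as inclusive, and when the size limit applies, the most frequent tokens are the ones kept.

```python
from collections import Counter


def build_dictionary(tokens, occurence_lower_bound=50, max_dictionary_size=-1):
    """Map tokens to integer ids, applying both dictionary limits.

    Hypothetical helper for illustration only.
    """
    counts = Counter(tokens)

    # Assumption: a token is kept if it occurs at least `occurence_lower_bound` times.
    frequent = [(tok, n) for tok, n in counts.items() if n >= occurence_lower_bound]

    # Assumption: the most frequent tokens come first; ties are broken alphabetically.
    frequent.sort(key=lambda item: (-item[1], item[0]))

    # -1 means the dictionary size is not limited.
    if max_dictionary_size >= 0:
        frequent = frequent[:max_dictionary_size]

    return {tok: idx for idx, (tok, _) in enumerate(frequent)}


tokens = ["a"] * 3 + ["b"] * 2 + ["c"]

# "c" occurs only once, so it is dropped by the lower bound of 2.
print(build_dictionary(tokens, occurence_lower_bound=2))

# The size limit then keeps only the single most frequent token.
print(build_dictionary(tokens, occurence_lower_bound=2, max_dictionary_size=1))
```

The filtering step runs before the size limit here, so rare tokens never take up space in a size-limited dictionary. The actual order of these steps in a given library may differ.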