apply

Apply a previously trained dictionary to the input text.

Method call format

apply(data,
      tokenizer=None,
      unknown_token_policy=None)

Parameters

data

Description

The input text to apply the dictionary to.

Zero-, one-, or two-dimensional array-like data.

Data types

string, numpy.array, pandas.DataFrame

Default value

Required parameter

tokenizer

Description

The tokenizer for text processing.

If this parameter is specified and one-dimensional data is passed, each element of the input is treated as a sentence and is tokenized.

Data types

Tokenizer

Default value

None (the input data is considered tokenized)
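The effect of the default can be sketched in plain Python (an illustration only, not the CatBoost implementation): without a tokenizer the input is assumed to be pre-tokenized, while with one each element of a one-dimensional input is split into tokens first. Here `str.split` stands in for a real Tokenizer.

```python
# Illustrative sketch, not the CatBoost implementation:
# tokenizer=None means the input is already a list of tokens;
# otherwise each element is treated as a sentence and tokenized.

def tokenize_if_needed(data, tokenizer=None):
    if tokenizer is None:
        return data  # input data is considered tokenized
    return [tokenizer(sentence) for sentence in data]

split_on_space = str.split  # stands in for a Tokenizer here

print(tokenize_if_needed(["his", "tender"]))
# ['his', 'tender']
print(tokenize_if_needed(["his tender heir"], split_on_space))
# [['his', 'tender', 'heir']]
```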

unknown_token_policy

Description

The policy for processing unknown tokens.

Possible values:

  • Skip — Unknown tokens are omitted from the resulting list of token IDs.
  • Insert — All unknown tokens are mapped to the same ID, which is equal to the number of tokens in the dictionary.

Data types

string

Default value

Skip
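The difference between the two policies can be sketched in plain Python (an illustration of the mapping described above, not the CatBoost implementation), assuming a dictionary that maps known tokens to IDs:

```python
# Illustrative sketch, not the CatBoost implementation:
# Skip omits unknown tokens; Insert maps every unknown token to a
# single shared ID equal to the number of tokens in the dictionary.

def apply_policy(tokens, token_to_id, policy="Skip"):
    unknown_id = len(token_to_id)  # Insert: ID equals the dictionary size
    ids = []
    for token in tokens:
        if token in token_to_id:
            ids.append(token_to_id[token])
        elif policy == "Insert":
            ids.append(unknown_id)
        # policy == "Skip": the unknown token is omitted
    return ids

vocab = {"heir": 0, "his": 1, "tender": 2, "whatever": 3}
print(apply_policy(["might", "bear", "his"], vocab, "Skip"))    # [1]
print(apply_policy(["might", "bear", "his"], vocab, "Insert"))  # [4, 4, 1]
```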

Type of return value

A one- or two-dimensional array with token IDs.

Example

from catboost.text_processing import Dictionary

dictionary = Dictionary(occurence_lower_bound=0)\
    .fit(["his", "tender", "heir", "whatever"])

applied_model = dictionary.apply(["might", "bear", "his", "memory"])

print(applied_model)

Output:

[[], [], [1], []]

An example with input string tokenization

from catboost.text_processing import Dictionary, Tokenizer

tokenized = Tokenizer()

dictionary = Dictionary(occurence_lower_bound=0)\
    .fit(["his tender heir whatever"], tokenizer=tokenized)

applied_model = dictionary.apply(["might bear his memory"], tokenizer=tokenized)

print(applied_model)

Output:

[[1]]