tokenize
Tokenize the input string.
Method call format
tokenize(s)
Parameters
s
Description
The input string to tokenize.
Data types
String
Default value
Required parameter
Type of return value
A list of tokens.
Example
from catboost.text_processing import Tokenizer
text = "Still, I would love to see you at 12, if you don't mind"
# Lowercase the text, split it by sense, and keep only word and number tokens.
tokenized = Tokenizer(lowercasing=True,
                      separator_type='BySense',
                      token_types=['Word', 'Number']).tokenize(text)
print(tokenized)
Output:
['still', 'i', 'would', 'love', 'to', 'see', 'you', 'at', '12', 'if', 'you', "don't", 'mind']
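To make the shape of the return value concrete, the sketch below roughly approximates the example above in plain Python, without CatBoost. It is not how Tokenizer is implemented: the helper name approx_tokenize and the regular expression are assumptions made for illustration, and they handle only ASCII letters, digits, and in-word apostrophes.

```python
import re

# Hypothetical pattern, not taken from CatBoost: a "word" is a run of ASCII
# letters that may contain in-word apostrophes; a "number" is a run of digits.
_TOKEN_RE = re.compile(r"[a-z]+(?:'[a-z]+)*|\d+")


def approx_tokenize(s):
    """Return a list of lowercased word and number tokens found in s.

    This mimics lowercasing=True with token_types=['Word', 'Number'] only
    loosely; punctuation is dropped because the pattern never matches it.
    """
    return _TOKEN_RE.findall(s.lower())


text = "Still, I would love to see you at 12, if you don't mind"
tokens = approx_tokenize(text)
print(tokens)
```

For this particular sentence the sketch yields the same list of strings as the output shown above. Results may differ from Tokenizer for other inputs, such as non-ASCII text or numbers with decimal points.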