Frequency Based Dictionary
Contains
The trained Frequency Based Dictionary.
Header format
The first row in the output file contains information regarding the training parameters.
Format:
{"key_1":"value_1","key_2":"value_2",...,"key_N":"value_N"}
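As a minimal sketch, the header row can be read with a standard JSON parser, assuming it is a single valid JSON object as the format above suggests. The sample header here reuses a few keys from the example on this page. The variable names are illustrative only. All values are stored as strings, so numeric parameters need an explicit conversion.

```python
import json

# Sample header row; the keys are taken from the example on this page.
header_line = '{"start_token_id":"0","token_level_type":"Word","gram_order":"1"}'

# Assumption: the header is one valid JSON object on a single line.
params = json.loads(header_line)

# Every value is a string, so convert numeric parameters explicitly.
gram_order = int(params["gram_order"])
start_token_id = int(params["start_token_id"])

print(params["token_level_type"], gram_order, start_token_id)
```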
Format
The second row contains the number of tokens in the dictionary.
Each row starting from the third contains information regarding a single token.
Format:
<token_ID><\t><number_of_occurrences><\t><token>
- token_ID: A zero-based token identifier. Tokens are sorted in case-sensitive order.
- number_of_occurrences: The number of times the token occurs in the input text.
- token: The value of the token.
Example
{"end_of_word_token_policy":"Insert","skip_step":"0","start_token_id":"0","token_level_type":"Word","dictionary_format":"id_count_token","end_of_sentence_token_policy":"Skip","gram_order":"1"}
11
0 1 How
1 1 It's
2 1 Today
3 1 and
4 1 forever
5 1 high
6 1 moon
7 1 snowing
8 1 the
9 1 today
10 1 tomorrow
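The layout described on this page can be read back with a few lines of Python. The following is a sketch under stated assumptions, not a reference implementation: `TokenEntry` and `parse_dictionary` are hypothetical names introduced here, the fields are assumed to be separated by a single tab character as in the format line, and the sample text reproduces the example above with explicit tabs.

```python
import json
from typing import NamedTuple


class TokenEntry(NamedTuple):
    """One token row (hypothetical helper type, not part of any library)."""
    token_id: int
    count: int
    token: str


def parse_dictionary(text: str):
    """Parse the header row, the token count row, and the token rows."""
    lines = text.split("\n")
    params = json.loads(lines[0])      # row 1: training parameters
    declared_size = int(lines[1])      # row 2: number of tokens
    entries = []
    for line in lines[2:]:             # rows 3+: one token per row
        if not line:
            continue                   # tolerate a trailing newline
        # Assumption: fields are tab-separated; split at most twice so that
        # anything after the second tab stays part of the token.
        token_id, count, token = line.split("\t", 2)
        entries.append(TokenEntry(int(token_id), int(count), token))
    if len(entries) != declared_size:
        raise ValueError(
            f"expected {declared_size} tokens, found {len(entries)}"
        )
    return params, entries


# The example from this page, written with explicit tab separators.
SAMPLE = "\n".join([
    '{"end_of_word_token_policy":"Insert","skip_step":"0",'
    '"start_token_id":"0","token_level_type":"Word",'
    '"dictionary_format":"id_count_token",'
    '"end_of_sentence_token_policy":"Skip","gram_order":"1"}',
    "11",
    "0\t1\tHow", "1\t1\tIt's", "2\t1\tToday", "3\t1\tand",
    "4\t1\tforever", "5\t1\thigh", "6\t1\tmoon", "7\t1\tsnowing",
    "8\t1\tthe", "9\t1\ttoday", "10\t1\ttomorrow",
])

params, entries = parse_dictionary(SAMPLE)
tokens = [entry.token for entry in entries]
```

Checking the parsed row count against the second row is a cheap way to detect a truncated file. In this sample, `tokens` comes back already in case-sensitive order, with the capitalized `Today` listed before the lowercase `and`.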