Frequency Based Dictionary
- Contains
- The trained Frequency Based Dictionary.
- Header format
-
The first row in the output file contains information regarding the training parameters.
Format:
{"key_1":"value_1","key_2":"value_2",...,"key_N":"value_N"}
- Format
-
The second row contains the number of tokens in the dictionary.
Each row starting from the third contains information regarding a single token.
Format: <token_ID><\t><number_of_occurrences><\t><token>
token_ID: A zero-based token identifier. Tokens are sorted in case-sensitive order.
number_of_occurrences: The number of times the token occurs in the input text.
token: The value of the token.
- Example
-
{"end_of_word_token_policy":"Insert","skip_step":"0","start_token_id":"0","token_level_type":"Word","dictionary_format":"id_count_token","end_of_sentence_token_policy":"Skip","gram_order":"1"}
11
0	1	How
1	1	It's
2	1	Today
3	1	and
4	1	forever
5	1	high
6	1	moon
7	1	snowing
8	1	the
9	1	today
10	1	tomorrow
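The layout described above can be read back with a few lines of Python. The following is a minimal sketch, not part of the product: the names `TokenEntry`, `parse_frequency_dictionary`, and `SAMPLE` are hypothetical, the sample is abbreviated from the example (shortened header, three tokens), and the code assumes the `id_count_token` layout with a header that parses as JSON.

```python
import json
from typing import NamedTuple


class TokenEntry(NamedTuple):
    """One token row: <token_ID>\t<number_of_occurrences>\t<token>."""
    token_id: int
    count: int
    token: str


def parse_frequency_dictionary(text: str):
    """Return (header, entries) parsed from the dictionary text.

    Assumes row 1 is the training-parameter header, row 2 is the token
    count, and every following non-empty row describes one token.
    """
    lines = text.splitlines()
    header = json.loads(lines[0])   # row 1: training parameters
    declared = int(lines[1])        # row 2: number of tokens
    entries = []
    for line in lines[2:]:
        if not line:
            continue                # tolerate a trailing blank line
        # Split on the first two tabs only, so the token text stays intact.
        token_id, count, token = line.split("\t", 2)
        entries.append(TokenEntry(int(token_id), int(count), token))
    if len(entries) != declared:
        raise ValueError(
            f"header declares {declared} tokens, found {len(entries)}"
        )
    return header, entries


# Hypothetical sample, abbreviated from the example in this section.
SAMPLE = "\n".join([
    '{"token_level_type":"Word","dictionary_format":"id_count_token","gram_order":"1"}',
    "3",
    "0\t1\tHow",
    "1\t1\tIt's",
    "2\t1\tToday",
])

header, entries = parse_frequency_dictionary(SAMPLE)
```

Comparing the declared count in the second row against the number of parsed rows is a cheap way to detect a truncated file.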