This commit adds the name of the training data file to the log message
printed when the training data is tokenized.

The motivation for this change is that it can be useful to show which
file is being tokenized when running the finetune example.

Signed-off-by: Daniel Bevenius <redacted>

std::vector<llama_token> train_tokens;
std::vector<size_t> train_samples_begin;
std::vector<size_t> train_samples_size;
- printf("%s: tokenize training data\n", __func__);
+ printf("%s: tokenize training data from %s\n", __func__, params.common.fn_train_data);
tokenize_file(lctx,
params.common.fn_train_data,
params.common.sample_start,