This commit explicitly sets the pooling type to 'none' in logits.cpp to
support models that have a pooling type specified.
The motivation for this is that some models may have a pooling type set
in the model file (.gguf file). For this specific case, where we only
want to extract logits, we need to ensure that no pooling is applied so
that we are comparing raw logits and not pooled embeddings.
ctx_params.no_perf = false;
if (embedding_mode) {
ctx_params.embeddings = true;
+ ctx_params.pooling_type = LLAMA_POOLING_TYPE_NONE;
ctx_params.n_ubatch = ctx_params.n_batch;
}
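
To illustrate why pooling has to be disabled for this comparison, here is
a minimal standalone sketch. It does not use llama.cpp; mean_pool is a
hypothetical helper, and mean pooling is only one of the pooling types a
model file might specify. The point is that pooling collapses the
per-token rows into a single vector, so the per-token values are no longer
available to compare.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not llama.cpp code): average the per-token rows
// into a single vector, the way a mean pooling step would.
std::vector<float> mean_pool(const std::vector<std::vector<float>> & rows) {
    if (rows.empty()) {
        return {};
    }
    std::vector<float> out(rows[0].size(), 0.0f);
    for (const auto & row : rows) {
        for (std::size_t i = 0; i < out.size(); ++i) {
            out[i] += row[i];
        }
    }
    for (float & v : out) {
        v /= static_cast<float>(rows.size());
    }
    return out;
}
```

With two token rows {1, 2} and {3, 4}, the unpooled output keeps both rows,
while mean_pool returns one row {2, 3}.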