git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
llama : fix FA when KV cache is not used (i.e. embeddings) (llama/12825)
author    Georgi Gerganov <redacted>
          Tue, 8 Apr 2025 16:54:51 +0000 (19:54 +0300)
committer Georgi Gerganov <redacted>
          Thu, 24 Apr 2025 17:39:16 +0000 (20:39 +0300)
commit  ee2cbeeb740fee02ac0919c709c398ffc2025775
tree    0ded6cee73faa8127960e63dbc3ccf1dfaef9e52
parent  868a5ce310066ea013c236e7bc98fc26c2d6b616
llama : fix FA when KV cache is not used (i.e. embeddings) (llama/12825)

* ggml : FA supports F32 V

* graph : cast KV to F16 when the KV cache is not used

ggml-ci

* server : add test that exercises embeddings with FA enabled

ggml-ci
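The two changes above (FA accepting an F32 V tensor, and casting K/V to F16 on the no-cache embeddings path) can be illustrated with a minimal NumPy sketch. This is not the ggml implementation; the function name and shapes are hypothetical, and it only shows that attention computed on F16-cast K/V stays numerically close to the F32 result, which is why the cast is a safe fix:

```python
import numpy as np

def sdpa_ref(q, k, v):
    """Reference scaled dot-product attention; accumulates in float32
    regardless of the input dtypes (mirrors FA supporting F32 V)."""
    scores = (q.astype(np.float32) @ k.astype(np.float32).T) / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # stabilize softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v.astype(np.float32)

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8), dtype=np.float32)  # 4 queries, head dim 8
k = rng.standard_normal((6, 8), dtype=np.float32)  # 6 keys/values
v = rng.standard_normal((6, 8), dtype=np.float32)

# Embeddings path: with no KV cache, K/V arrive in F32 straight from the
# graph. Casting them to F16 first matches what a cache would have stored.
out_f32 = sdpa_ref(q, k, v)
out_f16 = sdpa_ref(q, k.astype(np.float16), v.astype(np.float16))

print(np.max(np.abs(out_f32 - out_f16)))  # small: F16 cast barely moves the result
```

The F16 round-trip perturbs the output only at the level of half-precision rounding, so casting on the no-cache path gives FA inputs in the dtype it expects without meaningfully changing the embeddings.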
ggml/src/ggml-cpu/ops.cpp
ggml/src/ggml-metal/ggml-metal.m