llama : fix FA when KV cache is not used (i.e. embeddings) (llama/12825)
author    Georgi Gerganov <redacted>
          Tue, 8 Apr 2025 16:54:51 +0000 (19:54 +0300)
committer Georgi Gerganov <redacted>
          Thu, 10 Apr 2025 20:58:06 +0000 (23:58 +0300)
commit    637503b4eb061004850fe8e7c4e6962ae1d66d94
tree      2302f2008fd53873d88c13877bb7686abc7922b5
parent    2af91a99b979d1d7db45b4b7dd96a85702ddf5ff

* ggml : FA supports F32 V

* graph : cast KV to F16 when the KV cache is not used

ggml-ci

* server : add test that exercises embeddings with FA enabled

ggml-ci
src/ggml-cpu/ops.cpp
src/ggml-metal/ggml-metal.m