git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)
author    Georgi Gerganov <redacted>
Tue, 8 Apr 2025 16:54:51 +0000 (19:54 +0300)
committer GitHub <redacted>
Tue, 8 Apr 2025 16:54:51 +0000 (19:54 +0300)
commit a19b5cef16d885c44c635da4a5c97113c1577de8
tree   d03d8f85266c43059d9018ea53e3822998676a66
parent 78a1ba0a4f2bfed5b8b8e312592143d22e531698
llama : fix FA when KV cache is not used (i.e. embeddings) (#12825)

* ggml : FA supports F32 V

* graph : cast KV to F16 when the KV cache is not used

ggml-ci

* server : add test that exercises embeddings with FA enabled

ggml-ci
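
The graph-side change referenced above ("cast KV to F16 when the KV cache is not used") can be sketched as follows. This is a minimal illustration using the public ggml API (ggml_cast, ggml_flash_attn_ext); the helper name and its exact placement in src/llama-graph.cpp are hypothetical and not the verbatim diff.

#include "ggml.h"

// Sketch: when no KV cache is used (e.g. embedding-only models), K and V come
// straight from the graph in F32 instead of from a pre-quantized cache. The
// flash-attention path expects F16 K, so cast before ggml_flash_attn_ext.
// Hypothetical helper, not the exact code in src/llama-graph.cpp.
static struct ggml_tensor * build_fa_no_cache(
        struct ggml_context * ctx,
        struct ggml_tensor  * q,     // F32 queries
        struct ggml_tensor  * k,     // F32 keys (no KV cache)
        struct ggml_tensor  * v,     // F32 values (no KV cache)
        struct ggml_tensor  * mask,  // attention mask (may be NULL)
        float                 scale) {
    // Cast K (and optionally V, which FA now also accepts in F32) to F16
    // so the fast attention kernels can be used (assumption: casting both
    // keeps the fast path uniform).
    if (k->type == GGML_TYPE_F32) {
        k = ggml_cast(ctx, k, GGML_TYPE_F16);
    }
    if (v->type == GGML_TYPE_F32) {
        v = ggml_cast(ctx, v, GGML_TYPE_F16);
    }

    return ggml_flash_attn_ext(ctx, q, k, v, mask, scale, /*max_bias=*/0.0f, /*logit_softcap=*/0.0f);
}
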
examples/server/tests/unit/test_embedding.py
examples/server/tests/utils.py
examples/server_embd.py
ggml/src/ggml-cpu/ops.cpp
ggml/src/ggml-metal/ggml-metal.m
src/llama-graph.cpp