git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
llama: use FA + max. GPU layers by default (#15434)
author Johannes Gäßler <redacted>
Sat, 30 Aug 2025 14:32:10 +0000 (16:32 +0200)
committer GitHub <redacted>
Sat, 30 Aug 2025 14:32:10 +0000 (16:32 +0200)
commit e81b8e4b7f5ab870836fad26d154a7507b341b36
tree 4280194dd8e5532a99ffe382a732173ca19d7f37
parent 38ad381f9f5d4dd368a96d844fb19cf501ed9d22
llama: use FA + max. GPU layers by default (#15434)

* llama: use max. GPU layers by default, auto -fa

* ggml-backend: abort instead of segfault
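
The "FA" in the title is FlashAttention. A minimal sketch of what the two new defaults mean for API users, assuming the llama.h C API; the FlashAttention field and enum names (flash_attn_type, LLAMA_FLASH_ATTN_TYPE_DISABLED) are taken from upstream llama.cpp around this commit and are assumptions here, which is why the override below is left commented out:

    // Sketch only: illustrates the new defaults, not the implementation.
    #include "llama.h"
    #include <stdio.h>

    int main(void) {
        // Model params: n_gpu_layers now defaults to offloading as many
        // layers as possible instead of 0 (CPU only), so setting this
        // field (or passing -ngl on the CLI) is no longer needed for
        // full GPU offload.
        struct llama_model_params mp = llama_model_default_params();
        printf("default n_gpu_layers: %d\n", mp.n_gpu_layers);

        // Context params: FlashAttention now defaults to an auto mode
        // ("auto -fa" in the message): enabled when the backend supports
        // it for the model, disabled otherwise.
        struct llama_context_params cp = llama_context_default_params();
        // Assumed field/enum names; uncomment to force the old behavior:
        // cp.flash_attn_type = LLAMA_FLASH_ATTN_TYPE_DISABLED;
        (void) cp;
        return 0;
    }

On the command line the same defaults apply: per the commit message, -fa gains an auto setting that is now the default; the flag handling lives in common/arg.cpp from the file list below.
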
19 files changed:
common/arg.cpp
common/common.cpp
common/common.h
examples/diffusion/diffusion-cli.cpp
ggml/src/ggml-backend.cpp
include/llama.h
scripts/server-bench.py
scripts/tool_bench.py
src/llama-context.cpp
src/llama-graph.cpp
src/llama-graph.h
src/llama-impl.h
src/llama-model.cpp
src/llama.cpp
tools/batched-bench/batched-bench.cpp
tools/llama-bench/llama-bench.cpp
tools/server/tests/unit/test_ctx_shift.py
tools/server/tests/unit/test_speculative.py
tools/server/tests/utils.py
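
The second bullet, "ggml-backend: abort instead of segfault", replaces a crash on a bad pointer with an explicit fatal error. A minimal sketch of that pattern, assuming ggml's GGML_ABORT macro and the public ggml_backend_buft_alloc_buffer() API; the concrete check added in ggml/src/ggml-backend.cpp may guard a different call:

    #include "ggml.h"
    #include "ggml-backend.h"

    // Fail fast with a diagnostic instead of letting a NULL buffer
    // propagate and segfault on first use. Hypothetical helper for
    // illustration; not a function from the commit.
    static ggml_backend_buffer_t alloc_buffer_or_abort(
            ggml_backend_buffer_type_t buft, size_t size) {
        ggml_backend_buffer_t buf = ggml_backend_buft_alloc_buffer(buft, size);
        if (buf == NULL) {
            GGML_ABORT("failed to allocate backend buffer of %zu bytes", size);
        }
        return buf;
    }

Aborting at the failure site yields a file/line message at the point of failure instead of a segfault somewhere downstream, which is what the commit message is pointing at.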