git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Jeff Bolz <redacted>
	Fri, 6 Feb 2026 08:15:13 +0000 (02:15 -0600)
committer	GitHub <redacted>
	Fri, 6 Feb 2026 08:15:13 +0000 (09:15 +0100)
commit	1946e46f4c29da7b9294d702756969839e922bb8
tree	f0925d64fabab16e3b11124f070e4df3f7f783b6
parent	f9bd518a6bac615e1060dcc44f3f302f9e7ae0e8

vulkan: For coopmat2 FA, use fp16 accumulators for the final result (#19376)

The CPU and CUDA backends use fp16 for the VKQ accumulator type; this change
does the same for Vulkan. Halving the accumulator's size helps particularly
with large head sizes, which are heavily register-limited.
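A minimal Python sketch of the idea for one query row, assuming a standard flash-attention online softmax: the running max and the softmax denominator stay in full precision while the VKQ accumulator is rounded to fp16 after every update. The names `to_fp16`, `attend_row_fp16`, and `attend_row_ref`, the block size, and keeping the denominator in higher precision are illustrative assumptions; the actual shader in `flash_attn_cm2.comp` works on cooperative-matrix tiles, not scalar loops.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value.

    struct's 'e' format raises OverflowError when x is too large for fp16;
    that case is mapped to +/-inf, which is what an fp16 register would hold.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def attend_row_fp16(scores, values, block=4):
    """One query row of flash attention with an online softmax (sketch).

    scores: scaled KQ logits for this row, one per key.
    values: V rows, one per key, each of length D (the head size).
    Only the VKQ accumulator `o` is kept in (simulated) fp16.
    """
    d = len(values[0])
    m = -math.inf   # running max of the logits (full precision)
    l = 0.0         # running softmax denominator (full precision)
    o = [0.0] * d   # VKQ accumulator, rounded to fp16 after each update
    for start in range(0, len(scores), block):
        s_blk = scores[start:start + block]
        v_blk = values[start:start + block]
        m_new = max(m, max(s_blk))
        alpha = math.exp(m - m_new)  # rescale factor; 0.0 on the first block
        l *= alpha
        o = [to_fp16(x * alpha) for x in o]
        for s, v in zip(s_blk, v_blk):
            p = math.exp(s - m_new)  # every weight is <= 1
            l += p
            o = [to_fp16(oi + p * vi) for oi, vi in zip(o, v)]
        m = m_new
    return [oi / l for oi in o]


def attend_row_ref(scores, values):
    """Full-precision reference: softmax(scores) @ values."""
    m = max(scores)
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    return [sum(wi * v[j] for wi, v in zip(w, values)) / total
            for j in range(len(values[0]))]
```

With moderate inputs the fp16 result tracks the full-precision reference closely. With 16 equal logits and V entries of 5000, however, the unnormalized accumulator passes the fp16 maximum of 65504 and becomes inf, while the reference returns 5000. That is the kind of overflow a softmax bias is meant to prevent.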

I tried this for the coopmat1 path and it was slightly slower. I didn't try it
for the scalar path.

I applied the softmax bias that the CUDA backend uses to avoid overflow,
although I was not able to reproduce the original overflow bug without it.
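A short derivation of why such a bias leaves the result unchanged yet bounds the fp16 accumulator. It assumes, since the commit does not say so, that the bias is a constant b ≥ 0 subtracted from every softmax exponent (equivalently, added to the running max m). Here s_j are the N logits of one query row, v_{jk} is component k of value row j, õ_k is the unnormalized VKQ accumulator, and ℓ is the softmax denominator:

```latex
\begin{aligned}
\tilde{o}_k &= \sum_{j=1}^{N} e^{s_j - m - b}\, v_{jk}, \qquad
\ell = \sum_{j=1}^{N} e^{s_j - m - b}, \\
o_k &= \frac{\tilde{o}_k}{\ell}
     = \frac{e^{-b} \sum_{j} e^{s_j - m}\, v_{jk}}{e^{-b} \sum_{j} e^{s_j - m}}
     = \frac{\sum_{j} e^{s_j - m}\, v_{jk}}{\sum_{j} e^{s_j - m}}, \\
s_j \le m &\;\Rightarrow\; e^{s_j - m - b} \le e^{-b}
  \;\Rightarrow\; |\tilde{o}_k| \le e^{-b}\, N \max_j |v_{jk}|.
\end{aligned}
```

The factor e^{-b} cancels in the normalized output but shrinks the accumulator by the same factor. For a hypothetical b = 3 ln 2 that factor is 1/8; the offset the CUDA backend actually uses is not stated here.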

ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_base.glsl
ggml/src/ggml-vulkan/vulkan-shaders/flash_attn_cm2.comp

Packaging of ggml-org/llama.cpp