git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
vulkan: Use VK_EXT_shader_64bit_indexing to handle large mat_mul(_id) (#18678)
author Jeff Bolz <redacted>
Mon, 12 Jan 2026 11:32:13 +0000 (05:32 -0600)
committer GitHub <redacted>
Mon, 12 Jan 2026 11:32:13 +0000 (12:32 +0100)
commit 2bbe4c2cf8298114e3908e285125b9d0d1c5bb42
tree 917cd59306274ac2d52fae690460c15b957c4988
parent 1051ecd28907d2ca0a15c135f190fe415d0a3d1b

This fixes incoherent output in Llama-4-Maverick-17B-128E-PAB-Q8_0, which
has a mul_mat_id with an A matrix that's Q8_0 8192 x 5120 x 128.
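
For a sense of scale, the arithmetic behind that shape can be sketched as follows. This is a minimal illustration, not code from the commit: it assumes the standard ggml Q8_0 layout (32 int8 weights plus one fp16 scale, so 34 bytes per block), and the constant names are made up for the example.

```cpp
#include <cstdint>

// Q8_0 block layout as assumed here: 32 int8 weights + one 2-byte fp16 scale.
constexpr uint64_t QK8_0            = 32;
constexpr uint64_t Q8_0_BLOCK_BYTES = 2 + QK8_0; // 34 bytes per block

// Shape quoted in the commit message: 8192 x 5120 x 128.
constexpr uint64_t a_elements = 8192ull * 5120ull * 128ull;
constexpr uint64_t a_blocks   = a_elements / QK8_0;
constexpr uint64_t a_bytes    = a_blocks * Q8_0_BLOCK_BYTES;

// Anything at or above this value cannot be represented in a 32-bit index.
constexpr uint64_t two_pow_32 = 1ull << 32;
```

Under these assumptions the element count (5,368,709,120) and the byte size (5,704,253,440) both exceed 2^32, while the block count (167,772,160) stays well below it.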

This should work when the number of blocks in the A matrix is less than 2^32
(for mul_mat_vec or mul_mm_cm2). For mul_mm, I think the limit is roughly
2^32*LOAD_VEC_A elements.
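
A rough sketch of why a block-count limit is easier to meet than an element-count limit when indices are 32 bits wide. The function names and the per-expert stride of 8192 * 5120 elements are assumptions for illustration; the actual shader expressions are not reproduced here.

```cpp
#include <cstdint>

constexpr uint32_t QUANT_K = 32; // elements per Q8_0 block (assumed)

// Multiply in elements first, divide afterwards: the 32-bit product wraps
// modulo 2^32 before the division, so the block index comes out wrong.
constexpr uint32_t block_index_late_divide(uint32_t batch, uint32_t batch_stride) {
    return (batch * batch_stride) / QUANT_K;
}

// Convert the stride to blocks first: every intermediate value stays below
// 2^32 as long as the total block count does.
constexpr uint32_t block_index_early_divide(uint32_t batch, uint32_t batch_stride) {
    return batch * (batch_stride / QUANT_K);
}
```

For the last expert (batch 127) with a stride of 41,943,040 elements, the early-divide form gives block 166,461,440, whereas the late-divide form wraps and lands on a different block.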

- Divide batch_stride by QUANT_K earlier, so the block index calculation works in 32b.
- Each vk_pipeline_struct has a linked list of pipelines that will allow it to handle
  variants. So far this change just adds a single use case for this, compiling with the
  e64BitIndexingEXT flag.
- Use the 64b indexing variant when the A matrix is larger than maxStorageBufferRange.
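
The variant list and the selection rule above could look roughly like this. It is a simplified sketch under stated assumptions, not the code in ggml-vulkan.cpp: PipelineVariant, find_variant, needs_64b_indexing and select_pipeline are hypothetical names, and the only real identifier is the Vulkan device limit maxStorageBufferRange, which is a 32-bit value.

```cpp
#include <cstdint>

// Hypothetical stand-in for one entry in a pipeline's linked list of variants.
struct PipelineVariant {
    bool is_64b_indexing;        // true if compiled with 64-bit indexing
    const PipelineVariant* next; // next variant, or nullptr at the end
};

// Walk the list and return the first variant with the requested indexing
// mode, or nullptr if no such variant exists.
constexpr const PipelineVariant* find_variant(const PipelineVariant* head, bool want_64b) {
    return head == nullptr ? nullptr
         : head->is_64b_indexing == want_64b ? head
         : find_variant(head->next, want_64b);
}

// Rule from the commit message: use 64-bit indexing only when the A matrix
// is larger than maxStorageBufferRange.
constexpr bool needs_64b_indexing(uint64_t a_bytes, uint32_t max_storage_buffer_range) {
    return a_bytes > max_storage_buffer_range;
}

constexpr const PipelineVariant* select_pipeline(const PipelineVariant* head,
                                                 uint64_t a_bytes,
                                                 uint32_t max_storage_buffer_range) {
    return find_variant(head, needs_64b_indexing(a_bytes, max_storage_buffer_range));
}
```

Keeping the 32-bit pipeline as the default and reaching for the 64-bit variant only on oversized buffers is what lets ordinary workloads avoid the indexing overhead.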

64-bit indexing has some cost (around 3-5% in MoE models), so it's worth the effort
to avoid enabling it unconditionally.
20 files changed:
ggml/src/ggml-vulkan/ggml-vulkan.cpp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec.comp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_base.glsl
ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_iq1_m.comp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_iq1_s.comp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_iq2_s.comp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_iq2_xs.comp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_iq2_xxs.comp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_iq3_s.comp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_iq3_xxs.comp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_q2_k.comp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_q3_k.comp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_q4_k.comp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_q5_k.comp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vec_q6_k.comp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mat_vecq.comp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm.comp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mm_cm2.comp
ggml/src/ggml-vulkan/vulkan-shaders/mul_mmq.comp
tests/test-backend-ops.cpp