ggml-webgpu: add vectorized flash attention (#20709)
* naive vectorized version
* add vectorized flash attention
* update vec version
* remove unused path and shader
* remove unused helper functions
* add comments
* remove pad path
* ggml-webgpu: fix flash-attn vec nwg=1 path and tighten vec specialization
* change back to vec4
* enable multi split
* enable vec path when:
- Q->ne[1] < 20
- Q->ne[0] % 32 == 0
- V->ne[0] % 4 == 0
- K->type == f16
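The gating conditions above can be sketched as a host-side predicate. This is a hypothetical illustration, not the actual ggml-webgpu code: the struct and enum below are simplified stand-ins for ggml's tensor type, keeping only the fields the conditions use (`ne[0]` is the head dimension, `ne[1]` the number of query rows).

```cpp
#include <cstdint>

// Simplified stand-ins for ggml's tensor representation (illustration only).
enum tensor_type { TYPE_F16, TYPE_F32, TYPE_Q4, TYPE_Q8 };

struct tensor {
    int64_t     ne[4]; // dimensions; ne[0] is fastest-varying
    tensor_type type;  // element type
};

// Returns true when the vectorized flash-attention path may be taken,
// per the conditions listed in the commit message above.
static bool use_flash_attn_vec(const tensor & Q, const tensor & V, const tensor & K) {
    return Q.ne[1] < 20        // few query rows (decode-like workloads)
        && Q.ne[0] % 32 == 0   // head dim divisible by 32
        && V.ne[0] % 4 == 0    // V head dim divisible by 4 (vec4 loads)
        && K.type == TYPE_F16; // f16 K cache
}
```

A later commit in this PR ("enable vec path for q4 and q8") relaxes the last condition to also accept quantized K.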
* update flash_attn_vec_split.wgsl to reduce redundant workgroup barrier usage and use select
* enable vec path for q4 and q8
* flash-attn vec nwg=1 fast path (skip tmp/reduce staging)
* use packed f16 K loads in flash-attn vec split
* use packed f16 K loads in flash-attn vec split on host side
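The idea behind the packed-load commits can be illustrated on the CPU side. This is a hedged sketch, not the shader or host code from the PR: two half-precision K values are fetched as one 32-bit word and then unpacked, halving the number of memory transactions; `unpack2x16float` mirrors the WGSL built-in of the same name, and the f16 conversion handles normal numbers only, for brevity.

```cpp
#include <cstdint>
#include <cstring>

// Convert one IEEE 754 binary16 value to float.
// Assumes a normal number (exponent not 0 and not 31), for brevity.
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t man  = h & 0x3FF;
    uint32_t bits = sign | ((exp + 112) << 23) | (man << 13); // rebias 15 -> 127
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

// Unpack one 32-bit word holding two f16 values (low half first),
// mirroring WGSL's unpack2x16float built-in.
static void unpack2x16float(uint32_t packed, float out[2]) {
    out[0] = half_to_float((uint16_t)(packed & 0xFFFF));
    out[1] = half_to_float((uint16_t)(packed >> 16));
}
```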
* tune flash-attn vec f16 VEC_NE by head dim
* cleanup
* cleanup
* keep host side clean
* cleanup host side
* change back to original host wait/submit behavior
* formatting
* revert param-buffer pool refactor
* add helper functions
* ggml-webgpu: move flash-attn vec pipeline caching back into shader lib
* ggml-webgpu: remove duplicate functions
* ggml-webgpu: reserve flash-attn vec scratch in dst buffer allocation
* ggml-webgpu: revert unrelated change
* ggml-webgpu: revert deleted comment
* disable uniformity check
* remove unnecessary change
* Update ggml/src/ggml-webgpu/wgsl-shaders/flash_attn_vec_split.wgsl
* Update ggml/src/ggml-webgpu/ggml-webgpu.cpp
---------
Co-authored-by: Reese Levine <redacted>