git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Zheyuan Chen <redacted>
	Thu, 29 Jan 2026 22:05:30 +0000 (14:05 -0800)
committer	GitHub <redacted>
	Thu, 29 Jan 2026 22:05:30 +0000 (14:05 -0800)
commit	bd90fc74c3fecd18f36e26a91b3c3282578bf680
tree	2453444bbc49991b07988aa7634ac6f5e161c097
parent	ce38a4db478b90542874cd4af5cb48b3a0fcf311

ggml-webgpu: improve FlashAttention performance by software pipelining (#19151)
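
Software pipelining means issuing the memory load for the next block of data before computing on the current block, so that load latency overlaps with arithmetic. The sketch below shows that general pattern in Python rather than WGSL, and only for K tiles. The helper names (load_tile, dot, scores_naive, scores_pipelined) and the tile parameter are hypothetical and are not taken from flash_attn.wgsl. Python cannot show the speedup. It only shows how the loop is restructured and that the result is unchanged.

```python
from typing import List

Matrix = List[List[float]]


def load_tile(keys: Matrix, start: int, size: int) -> Matrix:
    # Hypothetical stand-in for a global-memory load of `size` K rows.
    return [row[:] for row in keys[start:start + size]]


def dot(q: List[float], k: List[float]) -> float:
    return sum(a * b for a, b in zip(q, k))


def scores_naive(q: List[float], keys: Matrix, tile: int) -> List[float]:
    out: List[float] = []
    for start in range(0, len(keys), tile):
        cur = load_tile(keys, start, tile)  # load, then consume immediately
        out.extend(dot(q, k) for k in cur)
    return out


def scores_pipelined(q: List[float], keys: Matrix, tile: int) -> List[float]:
    out: List[float] = []
    if not keys:
        return out
    nxt = load_tile(keys, 0, tile)  # prologue: issue the first load up front
    for start in range(0, len(keys), tile):
        cur = nxt
        if start + tile < len(keys):
            # Issue the next tile's load before computing on the current tile,
            # so on a GPU its latency can overlap with the dot products below.
            nxt = load_tile(keys, start + tile, tile)
        out.extend(dot(q, k) for k in cur)
    return out


print(scores_pipelined([1.0, 2.0], [[1.0, 0.0], [2.0, 3.0], [-1.0, 4.0]], tile=2))
# [1.0, 8.0, 7.0]
```

On a GPU the benefit comes from having the prefetch into registers or workgroup memory in flight while the dot products run. The cost is keeping two tiles live at once.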

* webgpu : pipeline flash_attn Q/K loads in WGSL

* ggml-webgpu: unroll Q*K accumulation inner loop

* ggml-webgpu: vectorization

* ggml-webgpu: unrolling

* ggml-webgpu: remove redundant unrolling

* ggml-webgpu: restore the config

* ggml-webgpu: remove redundant comments

* ggml-webgpu: formatting

* ggml-webgpu: formatting and remove vectorization

* ggml-webgpu: remove unnecessary constants

* ggml-webgpu: change QKV buffer to read_write to pass validation

* ggml-webgpu: add explanation for the additional bracket around the Q*K accumulation

* Fix indentation; change the tail handling from a for loop to an if

* Kick off CI on WGSL-only commits
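
An unrolled Q*K accumulation with an if-based tail can be sketched as follows. For illustration only, this assumes an unroll factor of 2, because the factor used in the shader is not stated here. With that factor at most one element remains after the main loop, so a single if can handle the tail where a for loop would otherwise be needed. The function name dot_unrolled2 is hypothetical, and the code is a general Python sketch, not the WGSL from this commit.

```python
def dot_unrolled2(q, k):
    """Dot product of q and k with the inner loop unrolled by 2.

    Hypothetical illustration; the unroll factor of 2 is an assumption.
    """
    n = len(q)
    acc = 0.0
    i = 0
    # Main body: two multiply-adds per iteration, grouped before accumulating.
    while i + 1 < n:
        acc += (q[i] * k[i] + q[i + 1] * k[i + 1])
        i += 2
    # Unrolling by 2 leaves at most one element, so an `if` suffices for the tail.
    if i < n:
        acc += q[i] * k[i]
    return acc


print(dot_unrolled2([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```

Adding the two products to each other before adding them to the accumulator changes the floating-point summation order relative to a plain sequential loop. Results can therefore differ in the last bits for inputs that are not exactly representable.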

---------

Co-authored-by: Reese Levine <redacted>

.github/workflows/build.yml
ggml/src/ggml-webgpu/wgsl-shaders/flash_attn.wgsl