git.djapps.eu Git - pkg/ggml/sources/ggml/commit

author	Zheyuan Chen <redacted>
	Thu, 29 Jan 2026 22:05:30 +0000 (14:05 -0800)
committer	Georgi Gerganov <redacted>
	Fri, 30 Jan 2026 11:49:29 +0000 (13:49 +0200)
commit	b79ba0805e08cbff7a132eae9297daa6ebb70654
tree	c020fcb97845a5508caec5ce78fae88da4cc94e9
parent	b6f5a28cb3fa1b55283807f4abfa0ac3ec6fd667

ggml-webgpu: improve FlashAttention performance by software pipelining (llama/19151)

* webgpu : pipeline flash_attn Q/K loads in WGSL

* ggml-webgpu: unroll Q*K accumulation inner loop

* ggml-webgpu: vectorization

* ggml-webgpu: unrolling

* ggml-webgpu: remove redundant unrolling

* ggml-webgpu: restore the config

* ggml-webgpu: remove redundant comments

* ggml-webgpu: formatting

* ggml-webgpu: formatting and remove vectorization

* ggml-webgpu: remove unnecessary constants

* ggml-webgpu: change QKV buffer to read_write to pass validation

* ggml-webgpu: add explanation for the additional bracket around Q K accumulate

* Indentation and for -> if for tail

* Kick off CI on wgsl only commits

---------

Co-authored-by: Reese Levine <redacted>
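
Note on the technique: software pipelining of the Q/K loads means issuing the load for the next K tile before accumulating Q*K on the current one, so that memory latency can overlap with arithmetic. The sketch below shows only the loop structure (prologue load, then load-next / compute-current), written in Python for readability. It is not the WGSL from this commit: the function name, the `tile` parameter, and the list-based data layout are all hypothetical, and Python itself does not overlap the load with the compute.

```python
def qk_scores_pipelined(q, k_rows, tile=4):
    """Return dot(q, k) for every row k in k_rows.

    Hypothetical illustration of a software-pipelined loop: the next
    K tile is fetched before the current tile is consumed.
    """
    scores = []
    n = len(k_rows)
    if n == 0:
        return scores

    # Prologue: load the first tile before entering the steady-state loop.
    next_tile = k_rows[0:tile]

    for start in range(0, n, tile):
        cur_tile = next_tile
        # Issue the load for the following tile first; on a GPU this load
        # can be in flight while the accumulation below runs.
        next_tile = k_rows[start + tile:start + 2 * tile]

        # Q*K accumulation on the tile that is already loaded.
        for k in cur_tile:
            acc = 0.0
            for a, b in zip(q, k):
                acc += a * b
            scores.append(acc)

    return scores
```

The results match a plain row-by-row loop; only the order in which tiles are fetched and consumed differs, and the last iteration fetches an empty tile that is never used.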