git.djapps.eu Git - pkg/ggml/sources/ggml/commit

author	Jeff Bolz <redacted>
	Wed, 13 Nov 2024 06:58:57 +0000 (00:58 -0600)
committer	Georgi Gerganov <redacted>
	Wed, 13 Nov 2024 17:03:32 +0000 (19:03 +0200)
commit	688752ec02743d60309a760f21550607c34e3baf
tree	52ed55fe36b1b7c4f6e3411d34e7d7bbb4d31247
parent	67a320b69efde90383531018fca7e8c7562b3e59

vulkan: Optimize contiguous copies (llama/10254)

* tests: Fix memory bandwidth calculation for perf tests

Add a flops calculation for flash attention.

Add one GGML_OP_CPY perf test.

* vulkan: Optimize contiguous copies

Add a variant of the copy shader for when the tensors are contiguous. Avoid
the complex addressing calculations, and do four elements per invocation
to hide some other overhead.

Apply similar changes to the scale shader, since scale is always contiguous.

Add a "progress bar" for shader compiles.
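The difference between the general copy path and the contiguous variant described above can be sketched on the CPU. This is not the shader code from copy.comp or contig_copy.comp; the struct, function names, and loop structure are hypothetical and only illustrate the idea: the general path decomposes each flat index into four coordinates and applies per-dimension strides, while the contiguous path uses the flat index directly and handles four elements per simulated invocation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 4D layout: ne = elements per dimension,
// nb = stride per dimension, measured in elements.
struct Shape4 {
    std::size_t ne[4];
    std::size_t nb[4];
};

// General path: split a flat index into 4D coordinates, then apply strides.
// This is the kind of addressing work a contiguous copy can skip.
std::size_t strided_offset(std::size_t idx, const Shape4& s) {
    std::size_t i0 = idx % s.ne[0]; idx /= s.ne[0];
    std::size_t i1 = idx % s.ne[1]; idx /= s.ne[1];
    std::size_t i2 = idx % s.ne[2]; idx /= s.ne[2];
    std::size_t i3 = idx;
    return i0 * s.nb[0] + i1 * s.nb[1] + i2 * s.nb[2] + i3 * s.nb[3];
}

void copy_strided(const std::vector<float>& src, const Shape4& src_shape,
                  std::vector<float>& dst, const Shape4& dst_shape,
                  std::size_t n) {
    for (std::size_t idx = 0; idx < n; ++idx) {
        dst[strided_offset(idx, dst_shape)] = src[strided_offset(idx, src_shape)];
    }
}

// Contiguous path: source and destination offsets both equal the flat index.
// Each simulated invocation copies up to four elements, with a bounds check
// for the tail when n is not a multiple of four.
void copy_contiguous(const std::vector<float>& src, std::vector<float>& dst,
                     std::size_t n) {
    constexpr std::size_t kElemsPerInvocation = 4;
    const std::size_t invocations =
        (n + kElemsPerInvocation - 1) / kElemsPerInvocation;
    for (std::size_t inv = 0; inv < invocations; ++inv) {
        const std::size_t base = inv * kElemsPerInvocation;
        for (std::size_t k = 0; k < kElemsPerInvocation; ++k) {
            const std::size_t i = base + k;
            if (i < n) {
                dst[i] = src[i];
            }
        }
    }
}
```

For a tensor whose strides describe a dense layout, both functions produce the same result; the contiguous version simply avoids the per-element divisions and modulo operations. A scale operation would fit the same pattern, multiplying by a constant instead of copying.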

src/ggml-vulkan.cpp
src/vulkan-shaders/clamp.comp
src/vulkan-shaders/contig_copy.comp	[new file with mode: 0644]
src/vulkan-shaders/copy.comp
src/vulkan-shaders/cos.comp
src/vulkan-shaders/generic_unary_head.comp
src/vulkan-shaders/pad.comp
src/vulkan-shaders/repeat.comp
src/vulkan-shaders/scale.comp
src/vulkan-shaders/sin.comp
src/vulkan-shaders/square.comp
src/vulkan-shaders/vulkan-shaders-gen.cpp
tests/test-backend-ops.cpp