git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
vulkan: optimizations for direct convolution (llama/14933)
author Jeff Bolz <redacted>
Sat, 2 Aug 2025 07:57:04 +0000 (02:57 -0500)
committer Georgi Gerganov <redacted>
Mon, 18 Aug 2025 17:30:45 +0000 (20:30 +0300)
commit 46e9e5b9a746a902f52e482b2284e9ae6256629a
tree 135ce82cc768bc6dad183d3a3884aa54dfb36a83
parent 7e7557ac50be60e58ffc238edad7ecc62398c147

* vulkan: optimizations for direct convolution

- Empirically choose a better tile size. Reducing BS_K/BS_NPQ helps fill
  the GPU. The new size should be amenable to using coopmat, too.
- Fix shmem bank conflicts. 16B padding should work with coopmat.
- Some explicit loop unrolling.
- Skip math/stores work for parts of the tile that are OOB.
- Apply fastdiv opt.
- Disable shuffles for NV.
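
As an illustration of the "fastdiv" item above, here is a minimal C++ sketch of one common way to replace division by a fixed divisor with a multiply-high, an add, and a shift. It is not taken from this commit: the names `FastDiv`, `init_fastdiv`, and `fastdiv` are hypothetical, and the shader code may use a different formulation. The constants are computed once on the host; the per-element work then needs no integer division.

```cpp
#include <cassert>
#include <cstdint>

// Precomputed constants for dividing by a fixed, nonzero 32-bit divisor d.
// (Hypothetical names, for illustration only.)
struct FastDiv {
    uint32_t mp; // "magic" multiplier
    uint32_t L;  // shift amount, ceil(log2(d))
};

// Host side: compute the constants once per divisor.
inline FastDiv init_fastdiv(uint32_t d) {
    assert(d != 0);
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) {
        ++L; // smallest L with 2^L >= d
    }
    // mp = floor(2^32 * (2^L - d) / d) + 1; this always fits in 32 bits.
    const uint64_t mp = ((uint64_t{1} << 32) * ((uint64_t{1} << L) - d)) / d + 1;
    return FastDiv{static_cast<uint32_t>(mp), L};
}

// Per-element side: n / d without a divide instruction.
// The sum is formed in 64 bits so it cannot overflow for large n.
inline uint32_t fastdiv(uint32_t n, FastDiv fd) {
    const uint64_t hi = (static_cast<uint64_t>(n) * fd.mp) >> 32; // mulhi(n, mp)
    return static_cast<uint32_t>((hi + n) >> fd.L);
}
```

A typical use is splitting a flat index into coordinates, e.g. `row = fastdiv(idx, fd_width)` followed by `col = idx - row * width`, which also avoids a separate modulo.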

* Three tile sizes for CONV_2D, and a heuristic to choose between them
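
A rough sketch of what such a heuristic could look like, assuming the goal stated earlier (smaller tiles help fill the GPU). Everything here is hypothetical: the tile shapes, the `choose_tile` name, and the "at least two workgroups per shader core" threshold are illustrative assumptions, not the values used in ggml-vulkan.cpp. `K` stands for the number of output channels and `NPQ` for the number of output positions (batch x output height x output width).

```cpp
#include <cstdint>

// Hypothetical tile shape: BS_K output channels by BS_NPQ output positions
// handled by one workgroup.
struct ConvTile {
    uint32_t bs_k;
    uint32_t bs_npq;
};

// Illustrative shapes only, ordered from largest to smallest.
constexpr ConvTile kTiles[3] = { {128, 128}, {64, 64}, {32, 32} };

inline uint64_t ceil_div(uint64_t a, uint64_t b) {
    return (a + b - 1) / b;
}

// Number of workgroups needed to cover a K x NPQ output with tile t.
inline uint64_t num_workgroups(const ConvTile& t, uint32_t K, uint32_t NPQ) {
    return ceil_div(K, t.bs_k) * ceil_div(NPQ, t.bs_npq);
}

// Pick the largest tile that still produces enough workgroups to keep
// every shader core busy; fall back to the smallest tile otherwise.
inline ConvTile choose_tile(uint32_t K, uint32_t NPQ, uint32_t shader_core_count) {
    // Assumed threshold: at least two workgroups per shader core.
    const uint64_t min_workgroups = uint64_t{2} * shader_core_count;
    for (const ConvTile& t : kTiles) {
        if (num_workgroups(t, K, NPQ) >= min_workgroups) {
            return t;
        }
    }
    return kTiles[2];
}
```

Larger tiles reuse more data per workgroup, so the sketch prefers them whenever the problem is big enough; small problems get small tiles so that more workgroups are in flight.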

* reallow collectives for pre-Turing

* make SHMEM_PAD a spec constant

* fixes for intel perf - no shmem padding, placeholder shader core count
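
To show why shared-memory padding matters and why it is worth making tunable per device, here is a small host-side C++ model of bank conflicts. It assumes 32 banks that are each 4 bytes wide and a toy access pattern in which thread `t` reads the word at `t * stride_words`. Both are assumptions for illustration; real hardware and the access pattern in conv2d_mm.comp may differ. `max_bank_conflict` is a hypothetical helper, not part of ggml.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Worst-case number of threads that land on the same bank when
// `num_threads` threads each read one 4-byte word and thread t reads
// word index t * stride_words (a column walk through a row-major tile).
inline uint32_t max_bank_conflict(uint32_t stride_words,
                                  uint32_t num_threads,
                                  uint32_t num_banks = 32) {
    std::vector<uint32_t> hits(num_banks, 0);
    for (uint32_t t = 0; t < num_threads; ++t) {
        ++hits[(t * stride_words) % num_banks];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

In this model, a 32-word row with no padding gives `max_bank_conflict(32, 32) == 32` (fully serialized), one word of padding gives `max_bank_conflict(33, 32) == 1`, and 16 bytes of padding (four words) gives `max_bank_conflict(36, 32) == 4`. The best value therefore depends on both the access pattern and the hardware, which is consistent with exposing the padding as a specialization constant and setting it to zero where it does not help.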

* shader variants with/without unrolling

* 0cc4m's fixes for AMD perf

Co-authored-by: 0cc4m <redacted>
---------
ggml/src/ggml-vulkan/ggml-vulkan.cpp
ggml/src/ggml-vulkan/vulkan-shaders/conv2d_mm.comp
ggml/src/ggml-vulkan/vulkan-shaders/vulkan-shaders-gen.cpp