git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: optimize and refactor MMQ (llama/8416)
author Johannes Gäßler <redacted>
Thu, 11 Jul 2024 14:47:47 +0000 (16:47 +0200)
committer Georgi Gerganov <redacted>
Sat, 27 Jul 2024 15:26:12 +0000 (18:26 +0300)
commit 41fac7e3344a3e98245f4d88593efcb3dadb38fa
tree d2baa1c06713a225d3cee9a1dc43e48632cf2606
parent 8696213951f26d8bb5287664715aa225c90bafe3

* CUDA: optimize and refactor MMQ

* explicit q8_1 memory layouts, add documentation

src/ggml-cuda/mma.cuh
src/ggml-cuda/mmq.cuh
src/ggml-cuda/quantize.cu
src/ggml-cuda/quantize.cuh
src/ggml-cuda/vecdotq.cuh