git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
CUDA: optimize and refactor MMQ (llama/8416)
author Johannes Gäßler <redacted>
Thu, 11 Jul 2024 14:47:47 +0000 (16:47 +0200)
committer Georgi Gerganov <redacted>
Thu, 8 Aug 2024 19:48:46 +0000 (22:48 +0300)
commit 15d71189e92ccf7ac17894d12137f3f364c9c48e
tree 015ca342245d5f39dbab15094792e40f1e5bb99e
parent 37e962580f076905cc8056bb7ee38f0db90bd369

* CUDA: optimize and refactor MMQ

* explicit q8_1 memory layouts, add documentation
ggml/src/ggml-cuda/mma.cuh
ggml/src/ggml-cuda/mmq.cuh
ggml/src/ggml-cuda/quantize.cu
ggml/src/ggml-cuda/quantize.cuh
ggml/src/ggml-cuda/vecdotq.cuh