git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: optimize and refactor MMQ (#8416)
author Johannes Gäßler <redacted>
Thu, 11 Jul 2024 14:47:47 +0000 (16:47 +0200)
committer GitHub <redacted>
Thu, 11 Jul 2024 14:47:47 +0000 (16:47 +0200)
commit 808aba39161e5d7ca2ff24110b5aa14d2e536988
tree 2bc897d4b0e9a2edddc1a2737df5eedc0032a8fe
parent a977c115448e40856fb9cbe3ceb6d8ce802553b0
CUDA: optimize and refactor MMQ (#8416)

* CUDA: optimize and refactor MMQ

* explicit q8_1 memory layouts, add documentation
ggml/src/ggml-cuda/mma.cuh
ggml/src/ggml-cuda/mmq.cuh
ggml/src/ggml-cuda/quantize.cu
ggml/src/ggml-cuda/quantize.cuh
ggml/src/ggml-cuda/vecdotq.cuh