git.djapps.eu Git - pkg/ggml/sources/ggml/commit
CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (llama/13014)
author Johannes Gäßler <redacted>
Tue, 22 Apr 2025 19:27:40 +0000 (21:27 +0200)
committer Georgi Gerganov <redacted>
Thu, 24 Apr 2025 15:36:25 +0000 (18:36 +0300)
commit 658de8ce915ce516833fa89226e876cad1df4c73
tree 986d43bd0d09e7e123be0b9ba9b4cb064a3051cd
parent fd9dddbd6b21d40600dd522571fc863b04243e99

* CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID

* fix logic for RoPE support, CUDA graphs
src/ggml-cuda/ggml-cuda.cu
src/ggml-cuda/mmv.cu
src/ggml-cuda/mmv.cuh
src/ggml-cuda/mmvq.cu
src/ggml-cuda/mmvq.cuh
src/ggml-cuda/quantize.cu
src/ggml-cuda/quantize.cuh
src/ggml-cuda/vecdotq.cuh
tests/test-backend-ops.cpp