git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID (#13014)
author:     Johannes Gäßler <redacted>
            Tue, 22 Apr 2025 19:27:40 +0000 (21:27 +0200)
committer:  GitHub <redacted>
            Tue, 22 Apr 2025 19:27:40 +0000 (21:27 +0200)
commit:     658987cfc9d752dca7758987390d5fb1a7a0a54a
tree:       ca2fc6d20f2281d0312a395508e891600d5620f8
parent:     dc39a5e7a84815a90fa0c515ed8927870cf858c9

* CUDA: noncont MMVQ + batched bs1 MUL_MAT_ID

* fix logic for RoPE support, CUDA graphs
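The subject line names two CUDA changes but does not describe them. The following is a hedged CPU reference sketch, included only to illustrate the computation a batch-size-1 MUL_MAT_ID performs. It assumes ggml's usual semantics, in which an `ids` tensor selects one expert weight matrix per used slot for the current token. In the sketch, all of the selected experts' matrix-vector products are computed in one call. Each input vector is read through an element stride, so the inputs need not be contiguous, and a stride of 0 broadcasts one vector to every expert. The function name `mul_mat_id_bs1_ref`, the row-major float layout, and the stride convention are hypothetical. This is not the quantized MMVQ kernel code changed by this commit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference (not the ggml/CUDA implementation):
// for a single token, y[e] = A[ids[e]] * x_e for every used expert slot e.
// A:        n_expert matrices, each nrows x ncols, row-major and contiguous.
// x:        n_used input vectors of length ncols; vector e starts at
//           x + e * stride_x (in elements), so stride_x > ncols models
//           non-contiguous inputs and stride_x == 0 broadcasts one vector.
// Returns:  n_used output vectors of length nrows, stored back to back.
std::vector<float> mul_mat_id_bs1_ref(const std::vector<float>& A, int n_expert,
                                      int nrows, int ncols,
                                      const float* x, std::size_t stride_x,
                                      const std::vector<int32_t>& ids) {
    const int n_used = static_cast<int>(ids.size());
    std::vector<float> y(static_cast<std::size_t>(n_used) * nrows, 0.0f);
    for (int e = 0; e < n_used; ++e) {
        assert(ids[e] >= 0 && ids[e] < n_expert);  // expert index must be valid
        // Select the expert matrix chosen for this slot.
        const float* Ae = A.data() + static_cast<std::size_t>(ids[e]) * nrows * ncols;
        // Locate this slot's input vector through the stride.
        const float* xe = x + static_cast<std::size_t>(e) * stride_x;
        for (int r = 0; r < nrows; ++r) {
            float acc = 0.0f;
            for (int c = 0; c < ncols; ++c) {
                acc += Ae[r * ncols + c] * xe[c];
            }
            y[static_cast<std::size_t>(e) * nrows + r] = acc;
        }
    }
    return y;
}
```

For example, with `ids = {1, 0}` the first output slot uses expert 1's matrix and the second uses expert 0's. The padding between input vectors is skipped because each vector is located by its stride rather than by assuming a packed layout.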
ggml/src/ggml-cuda/ggml-cuda.cu
ggml/src/ggml-cuda/mmv.cu
ggml/src/ggml-cuda/mmv.cuh
ggml/src/ggml-cuda/mmvq.cu
ggml/src/ggml-cuda/mmvq.cuh
ggml/src/ggml-cuda/quantize.cu
ggml/src/ggml-cuda/quantize.cuh
ggml/src/ggml-cuda/vecdotq.cuh
tests/test-backend-ops.cpp