git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: batched+noncont MMQ, refactor bs>1 MoE code (#13199)
author Johannes Gäßler <redacted>
Wed, 30 Apr 2025 21:12:59 +0000 (23:12 +0200)
committer GitHub <redacted>
Wed, 30 Apr 2025 21:12:59 +0000 (23:12 +0200)
commit e1e8e0991ffd9e99a445c6812bb519d5bac9f4b5
tree a7887b8c3c27cb70688a08b195d13a69e4b3a6aa
parent 6f67cf1f480926391ad75ff746e0a021214bf70c
ggml/src/ggml-cuda/getrows.cu
ggml/src/ggml-cuda/getrows.cuh
ggml/src/ggml-cuda/ggml-cuda.cu
ggml/src/ggml-cuda/mmq.cu
ggml/src/ggml-cuda/mmq.cuh
ggml/src/ggml-cuda/mmvq.cu
ggml/src/ggml-cuda/quantize.cu
ggml/src/ggml-cuda/quantize.cuh
tests/test-backend-ops.cpp