git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: Faster Mixtral prompt processing (#4538)
author Johannes Gäßler <redacted>
Wed, 20 Dec 2023 14:41:22 +0000 (15:41 +0100)
committer GitHub <redacted>
Wed, 20 Dec 2023 14:41:22 +0000 (15:41 +0100)
commit 799fc2268989482054944c902874cca76337580f
tree f535df08f2059a709f8f5b8014d532f1aa086a2d
parent 328b83de23b33240e28f4e74900d1d06726f5eb1

* CUDA: make MoE tensors contiguous for batch size>1

* Update ggml-cuda.cu

Co-authored-by: slaren <redacted>
---------

Co-authored-by: slaren <redacted>
ggml-cuda.cu