git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

CUDA: optimize and refactor MMQ (#8416)

* CUDA: optimize and refactor MMQ

* explicit q8_1 memory layouts, add documentation

Packaging of ggml-org/llama.cpp

ggml/src/ggml-cuda/mma.cuh
ggml/src/ggml-cuda/mmq.cuh
ggml/src/ggml-cuda/quantize.cu
ggml/src/ggml-cuda/quantize.cuh
ggml/src/ggml-cuda/vecdotq.cuh