CUDA: optimize MMQ int8 tensor core performance (#8062)
author    Johannes Gäßler <redacted>
          Mon, 24 Jun 2024 10:41:23 +0000 (12:41 +0200)
committer GitHub <redacted>
          Mon, 24 Jun 2024 10:41:23 +0000 (12:41 +0200)
commit    9a590c82262dd518137f85406e65e452fdf2aca3
tree      f722351d4e9c0435351723122df3f7f1d203ed1d
parent    52fc8705a0617452df08333e1161838726c322b4

* CUDA: optimize MMQ int8 tensor core performance

* only a single get_mma_tile_x_k function

* simplify code, make functions constexpr
ggml-cuda/common.cuh
ggml-cuda/mma.cuh
ggml-cuda/mmq.cuh