CUDA: use tensor cores for MMQ (#7676)
author     Johannes Gäßler <redacted>
           Mon, 10 Jun 2024 09:45:13 +0000 (11:45 +0200)
committer  GitHub <redacted>
           Mon, 10 Jun 2024 09:45:13 +0000 (11:45 +0200)
commit     1f0dabda8d5c131f9d4632aa41de74317cdd61fb
tree       70f8c54d4752f196616adf6081498d8f92992ec0
parent     af4ae502ddaeb03cd5861273ca2e9a5ae4551db7
CUDA: use tensor cores for MMQ (#7676)

* CUDA: int8 tensor cores for MMQ (legacy quants); see the MMA sketch below

* fix out-of-bounds writes; see the guarded writeback sketch below

* __builtin_assume -> GGML_CUDA_ASSUME; see the macro sketch below

* fix writeback returning too early; see the guarded writeback sketch below
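
The core of the change is a hand-written tensor core path. A minimal sketch of the int8 MMA primitive that the new ggml-cuda/mma.cuh wraps, not the actual file: one warp multiplies a 16x16 int8 A tile by a 16x8 int8 B tile and accumulates into a 16x8 int32 tile with a single instruction. The function name is hypothetical, and Ampere (sm_80) or newer is assumed; Turing has to build the same tile from two m8n8k16 instructions.

    #include <cstdint>

    __device__ __forceinline__ void mma_s8_16x8x16(
            int32_t       (&acc)[4],  // per-thread fragment of the 16x8 int32 C/D tile
            const int32_t (&a)[2],    // per-thread fragment of the 16x16 int8 A tile
            const int32_t   b) {      // per-thread fragment of the 16x8  int8 B tile
    #if __CUDA_ARCH__ >= 800
        // D = A*B + C, with C and D aliased so the accumulator is updated in place.
        asm("mma.sync.aligned.m16n8k16.row.col.s32.s8.s8.s32 "
            "{%0, %1, %2, %3}, {%4, %5}, {%6}, {%0, %1, %2, %3};"
            : "+r"(acc[0]), "+r"(acc[1]), "+r"(acc[2]), "+r"(acc[3])
            : "r"(a[0]), "r"(a[1]), "r"(b));
    #endif
    }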
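For the two bugfix items, a generic guarded-writeback pattern; all names and the fragment layout are hypothetical, not the actual kernel code. The point is that in MMA fragments a thread's elements are spread across the output tile, so returning from the writeback at the first out-of-range element would also skip elements that are still in bounds, while an unguarded store writes out of bounds.

    // Hypothetical layout: each thread holds vals[2][2], covering rows
    // {r0, r0+8} and columns {c0, c0+1} of the output tile.
    __device__ void write_back(float * dst, const float vals[2][2],
                               const int r0, const int c0,
                               const int nrows, const int ncols, const int stride) {
    #pragma unroll
        for (int i = 0; i < 2; ++i) {
            const int row = r0 + 8*i;
    #pragma unroll
            for (int j = 0; j < 2; ++j) {
                const int col = c0 + j;
                if (row >= nrows || col >= ncols) {
                    continue; // guard this store only; do NOT return from the loop
                }
                dst[row*stride + col] = vals[i][j];
            }
        }
    }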
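The __builtin_assume -> GGML_CUDA_ASSUME item replaces the raw builtin with a wrapper macro. A plausible definition as a sketch, since the exact version guard in ggml-cuda/common.cuh is an assumption here: keep the optimizer hint where the toolchain supports it and compile it away otherwise.

    // Assumption: __builtin_assume is reliable only on sufficiently new CUDA
    // toolchains; elsewhere the hint expands to nothing.
    #if defined(__CUDACC__) && CUDART_VERSION >= 11100
    #define GGML_CUDA_ASSUME(x) __builtin_assume(x)
    #else
    #define GGML_CUDA_ASSUME(x)
    #endif

    // Example: promise the compiler a bound so it can drop range checks.
    // GGML_CUDA_ASSUME(threadIdx.x < 32);

Changed files: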
ggml-cuda/common.cuh
ggml-cuda/fattn-common.cuh
ggml-cuda/fattn-tile-f16.cu
ggml-cuda/fattn-vec-f16.cuh
ggml-cuda/fattn-wmma-f16.cuh
ggml-cuda/mma.cuh [new file with mode: 0644]
ggml-cuda/mmq.cuh