CUDA: use tensor cores for MMQ (llama/7676)
author    Johannes Gäßler <redacted>
          Mon, 10 Jun 2024 09:45:13 +0000 (11:45 +0200)
committer Georgi Gerganov <redacted>
          Sat, 15 Jun 2024 19:05:47 +0000 (22:05 +0300)
commit    c570abcd7fc738220b4d033749fd1b02c5da167d
tree      cb7469013b8dfb5883292899877fedec11616161
parent    a32a2b8ffba9c90a8c8cc19eb454285dd68871e9
CUDA: use tensor cores for MMQ (llama/7676)

* CUDA: int8 tensor cores for MMQ (legacy quants)

* fix out-of-bounds writes

* __builtin_assume -> GGML_CUDA_ASSUME

* fix writeback returning too early
src/ggml-cuda/common.cuh
src/ggml-cuda/fattn-common.cuh
src/ggml-cuda/fattn-tile-f16.cu
src/ggml-cuda/fattn-vec-f16.cuh
src/ggml-cuda/fattn-wmma-f16.cuh
src/ggml-cuda/mma.cuh [new file with mode: 0644]
src/ggml-cuda/mmq.cuh