git.djapps.eu Git - pkg/ggml/sources/ggml/commit
author R0CKSTAR <redacted>
Sun, 22 Sep 2024 14:55:49 +0000 (22:55 +0800)
committer Georgi Gerganov <redacted>
Tue, 24 Sep 2024 10:04:37 +0000 (13:04 +0300)
commit e3d284ceb787f8320e1c8823262b360cb4d7759c
tree f9011c750323b4ad7ad6a37440488792537ec49d
parent 766c42b217d9a53c4bfc8b610d8231105192891a
musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (llama/9526)

* mtgpu: add mp_21 support

Signed-off-by: Xiaodong Ye <redacted>

* mtgpu: disable flash attention on qy1 (MTT S80); disable q3_k and mul_mat_batched_cublas

Signed-off-by: Xiaodong Ye <redacted>

* mtgpu: enable unified memory

Signed-off-by: Xiaodong Ye <redacted>

* mtgpu: map cublasOperation_t to mublasOperation_t (sync code to latest)

Signed-off-by: Xiaodong Ye <redacted>
---------

Signed-off-by: Xiaodong Ye <redacted>
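
Illustration (not part of the commit): the message says Flash Attention is disabled on QY1 (MTT S80) while mp_21 support is added. One plausible way to express such a gate is a per-architecture availability check, sketched below in host-only C++. The names `EXAMPLE_CC_QY1`, `EXAMPLE_CC_QY2`, and `flash_attn_available`, and the numeric codes 210 and 220, are assumptions made for this sketch. The actual change lives in src/ggml-cuda/common.cuh and src/ggml-cuda/fattn-tile-f32.cu and may be written quite differently, for example with preprocessor guards.

```cpp
// Hypothetical architecture codes, assumed here to mirror the "mp_21" naming
// in the commit message. They are not taken from the diff.
constexpr int EXAMPLE_CC_QY1 = 210;  // assumed code for QY1 (MTT S80)
constexpr int EXAMPLE_CC_QY2 = 220;  // assumed code for the next generation

// Returns true when Flash Attention kernels would be enabled for `arch`.
// In this sketch, everything up to and including QY1 is excluded.
constexpr bool flash_attn_available(int arch) {
    return arch > EXAMPLE_CC_QY1;
}

// Compile-time checks of the policy described in the commit message.
static_assert(!flash_attn_available(EXAMPLE_CC_QY1), "QY1 must be excluded");
static_assert(flash_attn_available(EXAMPLE_CC_QY2), "newer parts stay enabled");
```

A fat binary carries code for several architectures at once, so a check of this kind would be evaluated separately for each target rather than once for the whole build.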
src/CMakeLists.txt
src/ggml-cuda.cu
src/ggml-cuda/common.cuh
src/ggml-cuda/fattn-tile-f32.cu
src/ggml-cuda/vendors/musa.h
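
Illustration (not part of the commit): the last bullet of the message maps `cublasOperation_t` to `mublasOperation_t`. A vendor header can do this with preprocessor aliases, so that code written against CUDA-style names compiles unchanged for MUSA. The sketch below is a minimal, host-only example. The enum is a stand-in defined locally so the snippet compiles on its own; the real type and its enumerators come from the muBLAS headers, and the helper `pick_op` is hypothetical.

```cpp
#include <type_traits>

// Stand-in for the vendor type; defined here only so the sketch is
// self-contained. The values are illustrative, not taken from muBLAS.
typedef enum {
    MUBLAS_OP_N = 0,  // no transpose
    MUBLAS_OP_T = 1,  // transpose
} mublasOperation_t;

// Vendor-header style aliases: CUDA-style names resolve to MUSA-style names.
#define cublasOperation_t mublasOperation_t
#define CUBLAS_OP_N MUBLAS_OP_N
#define CUBLAS_OP_T MUBLAS_OP_T

// Hypothetical caller written purely against the CUDA-style names.
inline cublasOperation_t pick_op(bool transpose) {
    return transpose ? CUBLAS_OP_T : CUBLAS_OP_N;
}

// After preprocessing, both names refer to the same type.
static_assert(std::is_same<cublasOperation_t, mublasOperation_t>::value,
              "alias must resolve to the vendor type");
```

Because the aliasing happens in the preprocessor, call sites need no `#ifdef` of their own; only the vendor header changes per backend.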