git.djapps.eu Git - pkg/ggml/sources/ggml/commit

author	R0CKSTAR <redacted>
	Sun, 22 Sep 2024 14:55:49 +0000 (22:55 +0800)
committer	Georgi Gerganov <redacted>
	Tue, 24 Sep 2024 10:04:37 +0000 (13:04 +0300)
commit	e3d284ceb787f8320e1c8823262b360cb4d7759c
tree	f9011c750323b4ad7ad6a37440488792537ec49d
parent	766c42b217d9a53c4bfc8b610d8231105192891a

musa: enable building fat binaries, enable unified memory, and disable Flash Attention on QY1 (MTT S80) (llama/9526)

* mtgpu: add mp_21 support

Signed-off-by: Xiaodong Ye <redacted>

* mtgpu: disable flash attention on qy1 (MTT S80); disable q3_k and mul_mat_batched_cublas

Signed-off-by: Xiaodong Ye <redacted>

* mtgpu: enable unified memory

Signed-off-by: Xiaodong Ye <redacted>

* mtgpu: map cublasOperation_t to mublasOperation_t (sync code to latest)

Signed-off-by: Xiaodong Ye <redacted>

---------

Signed-off-by: Xiaodong Ye <redacted>

src/CMakeLists.txt
src/ggml-cuda.cu
src/ggml-cuda/common.cuh
src/ggml-cuda/fattn-tile-f32.cu
src/ggml-cuda/vendors/musa.h