git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA: use async data loading for FlashAttention (#11894)
author	Johannes Gäßler <redacted>
	Mon, 17 Feb 2025 13:03:24 +0000 (14:03 +0100)
committer	GitHub <redacted>
	Mon, 17 Feb 2025 13:03:24 +0000 (14:03 +0100)
commit	73e2ed3ce3492d3ed70193dd09ae8aa44779651d
tree	59ddbdf5022dbe7a87216b88b9e9862433fc8ddb
parent	f7b1116af102bcac450c1a522e9c59db241c6767
CUDA: use async data loading for FlashAttention (#11894)

* CUDA: use async data loading for FlashAttention

---------

Co-authored-by: Diego Devesa <redacted>
ggml/src/ggml-cuda/common.cuh
ggml/src/ggml-cuda/cp-async.cuh [new file with mode: 0644]
ggml/src/ggml-cuda/fattn-common.cuh
ggml/src/ggml-cuda/fattn-mma-f16.cuh
ggml/src/ggml-cuda/mma.cuh
ggml/src/ggml-cuda/mmq.cuh
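The new cp-async.cuh wraps the `cp.async` PTX instruction, which copies data from global to shared memory asynchronously on Ampere (sm_80) and newer GPUs, letting the FlashAttention kernels overlap data loading with computation. The helpers below are an illustrative sketch of how such wrappers are typically written, not the actual cp-async.cuh API; the function names are hypothetical.

```cuda
#include <cuda_runtime.h>

// Hypothetical sketch: asynchronously copy 16 bytes from global to shared
// memory using the cp.async.cg PTX instruction (available on sm_80+).
// The copy is issued immediately and completes in the background.
static __device__ __forceinline__ void cp_async_cg_16(void * dst_shared, const void * src_global) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
    // cp.async requires a shared-memory address in the shared state space:
    const unsigned int dst = (unsigned int) __cvta_generic_to_shared(dst_shared);
    asm volatile("cp.async.cg.shared.global [%0], [%1], 16;"
        : : "r"(dst), "l"(src_global));
#else
    // Fallback for older architectures: a plain synchronous 16-byte copy.
    *(int4 *) dst_shared = *(const int4 *) src_global;
#endif // defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
}

// Hypothetical sketch: block until all previously issued cp.async
// operations of this thread have completed.
static __device__ __forceinline__ void cp_async_wait_all() {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
    asm volatile("cp.async.wait_all;");
#endif // defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 800
}
```

A kernel would issue a batch of such copies for the next tile of the K/V data, do compute work on the current tile, then call the wait helper (followed by `__syncthreads()`) before reading the freshly loaded shared memory.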