Add support for CUMSUM and TRI for CUDA. (#17584)
author Piotr Wilkin (ilintar) <redacted>
Thu, 4 Dec 2025 21:19:51 +0000 (22:19 +0100)
committer GitHub <redacted>
Thu, 4 Dec 2025 21:19:51 +0000 (22:19 +0100)
commit 96fe9badfc5235ff0a049aca647bff8c448055aa
tree 21a2f9aca78c491c0ee1df33efbcc51d4d2e03f3
parent bde188d60f58012ada0725c6dd5ba7c69fe4dd87
Add support for CUMSUM and TRI for CUDA. (#17584)

* Add support for CUMSUM and TRI for CUDA.

* Minor optimizations.

* Correct warp_prefix_inclusive_sum in float2 variant to return float2
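
For context, a warp-level inclusive prefix sum is usually built from __shfl_up_sync. The sketch below is illustrative only, not the actual cumsum.cu code: the float2 variant scans the per-thread pair sums and then reconstructs both per-element results, which is why it has to return float2.

    template <int warp_size>
    static __device__ float warp_prefix_inclusive_sum(float v) {
        const int lane = threadIdx.x % warp_size;
    #pragma unroll
        for (int offset = 1; offset < warp_size; offset <<= 1) {
            // Pull the running sum from the lane `offset` positions below.
            const float other = __shfl_up_sync(0xFFFFFFFF, v, offset, warp_size);
            if (lane >= offset) {
                v += other;
            }
        }
        return v;
    }

    template <int warp_size>
    static __device__ float2 warp_prefix_inclusive_sum(float2 v) {
        // Each thread holds two consecutive elements: scan the pair sums,
        // then recover the cumulative sums ending at v.x and at v.y.
        const float scanned = warp_prefix_inclusive_sum<warp_size>(v.x + v.y);
        return make_float2(scanned - v.y, scanned);
    }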

* Optimize TRI
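
TRI masks a matrix to its triangular part. A minimal, hypothetical form of such a kernel is below; the real tri.cu additionally handles strides, batching, and the upper/lower/diagonal variants.

    static __global__ void tri_lower_f32(const float * src, float * dst,
                                         const int ncols, const int nrows) {
        const int row = blockIdx.y;
        if (row >= nrows) {
            return;
        }
        // One row of blocks per matrix row; zero everything above the diagonal.
        for (int col = blockIdx.x*blockDim.x + threadIdx.x; col < ncols;
             col += gridDim.x*blockDim.x) {
            dst[row*ncols + col] = col <= row ? src[row*ncols + col] : 0.0f;
        }
    }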

* Whitespace

* Fix strides.

* Implement double loop

* Whitespace

* Fix HIP compilation bugs

* Optimizations + big case performance tests

* Implement using CUB with fallback to custom kernel
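
With CUB available, the scan can be delegated to cub::DeviceScan::InclusiveSum, keeping the hand-written kernel as the fallback. A hedged sketch of that shape follows; the guard macro, function name, and temporary-buffer handling are illustrative (ggml has its own pool allocator).

    #ifdef GGML_CUDA_USE_CUB // illustrative guard, not ggml's actual macro
    #include <cub/cub.cuh>

    static void cumsum_cub(const float * src, float * dst, int64_t n, cudaStream_t stream) {
        size_t tmp_bytes = 0;
        // The first call only computes the required temporary-storage size.
        cub::DeviceScan::InclusiveSum(nullptr, tmp_bytes, src, dst, n, stream);
        void * tmp = nullptr;
        cudaMallocAsync(&tmp, tmp_bytes, stream);
        cub::DeviceScan::InclusiveSum(tmp, tmp_bytes, src, dst, n, stream);
        cudaFreeAsync(tmp, stream);
    }
    #else
    // Fall back to the custom warp-scan kernel.
    #endif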

* Remove error message.

* Fixes from code review

* Comment out CPU-unsupported F16/BF16 cases to fix CI

* Fine, you win :P

* Fix last cast, use NO_DEVICE_CODE and GGML_UNUSED_VARS

* Vary warp-size based on physical warp size

* Add GGML_UNUSED_VARS in tri as well

* Use constexpr and call prefix_inclusive with warp_size template param
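
Making the warp size a template parameter lets the scan loop unroll at compile time; the host then picks the instantiation from the device's physical warp size (32 on NVIDIA, 64 on wave64 AMD GPUs under HIP). A hypothetical dispatch, reusing the warp_prefix_inclusive_sum sketch above (the ggml_cuda_info() field name is assumed):

    template <int warp_size>
    static __global__ void cumsum_warp(const float * src, float * dst) {
        dst[threadIdx.x] = warp_prefix_inclusive_sum<warp_size>(src[threadIdx.x]);
    }

    static void launch_cumsum_warp(const float * src, float * dst,
                                   const int device, cudaStream_t stream) {
        const int ws = ggml_cuda_info().devices[device].warp_size; // field name assumed
        if (ws == 64) {
            // Only reachable under HIP; CUDA shuffles are at most 32 lanes wide.
            cumsum_warp<64><<<1, 64, 0, stream>>>(src, dst);
        } else {
            cumsum_warp<32><<<1, 32, 0, stream>>>(src, dst);
        }
    }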

* Update ggml/src/ggml-cuda/cumsum.cu

Co-authored-by: Johannes Gäßler <redacted>

* Apply suggestions from code review

Co-authored-by: Johannes Gäßler <redacted>

* Change to tid % warp_size

* Fix strides; hardcode mask; add ggml_lane_mask_t
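
ggml_lane_mask_t exists because shuffle masks are 32-bit on CUDA but 64-bit on HIP wave64 targets, so a hardcoded full-warp mask needs a width that follows the platform. Roughly, with an illustrative condition and constant name (the exact definition in common.cuh may differ):

    #if defined(GGML_USE_HIP) && (__AMDGCN_WAVEFRONT_SIZE__ == 64) // condition illustrative
    typedef uint64_t ggml_lane_mask_t;
    #define GGML_LANE_MASK_ALL 0xFFFFFFFFFFFFFFFFULL // all 64 lanes
    #else
    typedef uint32_t ggml_lane_mask_t;
    #define GGML_LANE_MASK_ALL 0xFFFFFFFFu // all 32 lanes
    #endif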

* Missing renames, remove unused get_warp_mask(), explicit calls to ggml_cuda_info()

* Too hasty...

---------

Co-authored-by: Johannes Gäßler <redacted>
ggml/src/ggml-cuda/common.cuh
ggml/src/ggml-cuda/cumsum.cu [new file with mode: 0644]
ggml/src/ggml-cuda/cumsum.cuh [new file with mode: 0644]
ggml/src/ggml-cuda/ggml-cuda.cu
ggml/src/ggml-cuda/tri.cu [new file with mode: 0644]
ggml/src/ggml-cuda/tri.cuh [new file with mode: 0644]
tests/test-backend-ops.cpp