git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
CUDA & CPU: support F32 kernel type for `CONV_TRANSPOSE_2D` (#17094)
author Yihao Wang <redacted>
Thu, 26 Mar 2026 02:19:14 +0000 (19:19 -0700)
committer GitHub <redacted>
Thu, 26 Mar 2026 02:19:14 +0000 (10:19 +0800)
commit 0a524f240456d7727570043f97757ea2c249003b
tree f9a174b1fec65809e43a537d771cbe9ec0209aff
parent c0159f9c1f874da15e94f371d136f5920b4b5335

* Refactor CUDA 2D transpose implementation to support multiple kernel types and improve parameter handling

- Introduced a `conv2d_transpose_params` struct for better parameter management.
- Updated `conv2d_transpose_kernel` to be templated for different kernel types (float and half).
- Modified `ggml_cuda_conv_2d_transpose_p0` to handle both F16 and F32 kernel types.
- Enhanced test cases to validate functionality for both kernel types.

* Refactor test cases for 2D convolution transpose to support dynamic kernel types

- Updated `test_conv_transpose_2d` structure to improve parameter handling by reordering constructor arguments.
- Enhanced test case generation to iterate over kernel types, allowing for flexible testing of different configurations.
- Removed hardcoded kernel type instances in favor of a loop for better maintainability and scalability.
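The loop-based test generation described above might look roughly like the sketch below. This is a sketch only: `ktype`, `conv_transpose_2d_case` and `make_cases` are hypothetical stand-ins rather than the definitions in `tests/test-backend-ops.cpp`, and the shapes are illustrative values chosen so that the kernel's last dimension matches the input channel count.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the tensor type enum used by the real tests.
enum class ktype { F32, F16 };

// Hypothetical test-case record; the kernel type is a field, not a hardcoded constant.
struct conv_transpose_2d_case {
    std::array<int64_t, 4> ne_input;   // [W, H, Cin, N]
    std::array<int64_t, 4> ne_kernel;  // [KW, KH, Cout, Cin]
    int                    stride;
    ktype                  kernel_type;
};

// Every shape is emitted once per kernel type, so adding a type is a one-line change.
std::vector<conv_transpose_2d_case> make_cases() {
    std::vector<conv_transpose_2d_case> cases;
    for (ktype kt : {ktype::F32, ktype::F16}) {
        cases.push_back(conv_transpose_2d_case{{3, 2, 3, 1},   {2, 2, 1, 3}, 1, kt});
        cases.push_back(conv_transpose_2d_case{{10, 10, 9, 1}, {3, 3, 1, 9}, 2, kt});
    }
    return cases;
}
```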

* Refactor ggml_compute_forward_conv_transpose_2d to support both F16 and F32 tensor types.

* Refactor conv2d transpose kernel to use a template for kernel type, enhancing flexibility for different data types.
Update test cases to include both F16 and F32 tensor types for comprehensive coverage.

* Update ggml/src/ggml-cuda/conv2d-transpose.cu

Co-authored-by: Aman Gupta <redacted>

* Update ggml/src/ggml-cpu/ggml-cpu.c

Co-authored-by: Aman Gupta <redacted>

* Refactor conv2d transpose implementation by removing the conv2d_transpose_params struct and dispatching with direct kernel launch.

* Template the CPU conv2d transpose implementation on the kernel type so that it handles both F16 and F32 kernels.
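One common way to pair a templated inner loop with a runtime type tag is a small `switch` that selects the instantiation. The sketch below shows that pattern on a simple weighted sum; it is an assumption about the general shape, not the code in `ggml/src/ggml-cpu/ops.cpp`, and `kernel_type`, `load_as_float`, `weighted_sum` and `weighted_sum_dispatch` are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical runtime tag for the kernel tensor's element type.
enum class kernel_type { F32, F16 };

// Element loaders: F32 is read directly, F16 is decoded from its raw 16 bits.
inline float load_as_float(const float * p, size_t i) { return p[i]; }
inline float load_as_float(const uint16_t * p, size_t i) {
    const uint32_t sign = (p[i] >> 15) & 0x1;
    const uint32_t exp  = (p[i] >> 10) & 0x1F;
    const uint32_t mant = p[i] & 0x3FF;
    float val;
    if (exp == 0)       val = std::ldexp((float) mant, -24);
    else if (exp == 31) val = mant ? NAN : INFINITY;
    else                val = std::ldexp((float) (mant | 0x400), (int) exp - 25);
    return sign ? -val : val;
}

// Type-generic inner loop: the only type-dependent step is the kernel load.
template <typename kernel_t>
float weighted_sum(const float * x, const kernel_t * k, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += x[i] * load_as_float(k, i);
    }
    return acc;
}

// Runtime dispatch: choose the template instantiation from the type tag.
float weighted_sum_dispatch(kernel_type t, const float * x, const void * k, size_t n) {
    switch (t) {
        case kernel_type::F32: return weighted_sum(x, static_cast<const float *>(k), n);
        case kernel_type::F16: return weighted_sum(x, static_cast<const uint16_t *>(k), n);
    }
    throw std::invalid_argument("unsupported kernel type");
}
```

Keeping the `switch` at the outermost level means the type check runs once per call rather than once per element.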

---------

Co-authored-by: Aman Gupta <redacted>
ggml/src/ggml-cpu/ggml-cpu.c
ggml/src/ggml-cpu/ops.cpp
ggml/src/ggml-cuda/conv2d-transpose.cu
ggml/src/ggml-cuda/conv2d-transpose.cuh
tests/test-backend-ops.cpp