CUDA & CPU: support F32 kernel type for `CONV_TRANSPOSE_2D` (llama/17094)
author    Yihao Wang <redacted>
Thu, 26 Mar 2026 02:19:14 +0000 (19:19 -0700)
committer Georgi Gerganov <redacted>
Sat, 28 Mar 2026 11:39:09 +0000 (13:39 +0200)
commit 83d1f963f395a31aecf2da8265b0a5d1d8cbcb85
tree   19c27032ff6fb558aa5f8dbd2ab512ed9f886e88
parent 4a424d3fcf4110f5f2ba0003d5d340c0d62c2217

* Refactor CUDA 2D transpose implementation to support multiple kernel types and improve parameter handling

- Introduced a `conv2d_transpose_params` struct for better parameter management.
- Updated `conv2d_transpose_kernel` to be templated for different kernel types (float and half).
- Modified `ggml_cuda_conv_2d_transpose_p0` to handle both F16 and F32 kernel types.
- Enhanced test cases to validate functionality for both kernel types.
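The templated-kernel-type idea above can be sketched on the CPU side as follows. This is a minimal, single-channel illustration of a transposed 2D convolution parameterized on the kernel element type; the layout, `to_float` helper, and function names are illustrative stand-ins, not the actual ggml kernels (the real F16 path converts `ggml_fp16_t` values with `GGML_FP16_TO_FP32`).

```cpp
#include <vector>

// Illustrative conversion helper: the real code converts F16 kernel
// values to float before accumulating; here both specializations use
// a plain cast to keep the sketch self-contained.
template <typename kernel_t>
static float to_float(kernel_t v) { return static_cast<float>(v); }

// Transposed convolution over a single channel with stride `st`:
// output[iy*st + ky][ix*st + kx] += input[iy][ix] * kernel[ky][kx]
template <typename kernel_t>
std::vector<float> conv_transpose_2d(const std::vector<float> & input, int iw, int ih,
                                     const std::vector<kernel_t> & kernel, int kw, int kh,
                                     int st) {
    const int ow = (iw - 1) * st + kw;
    const int oh = (ih - 1) * st + kh;
    std::vector<float> out(ow * oh, 0.0f);
    for (int iy = 0; iy < ih; ++iy)
        for (int ix = 0; ix < iw; ++ix)
            for (int ky = 0; ky < kh; ++ky)
                for (int kx = 0; kx < kw; ++kx)
                    out[(iy*st + ky) * ow + (ix*st + kx)] +=
                        input[iy * iw + ix] * to_float(kernel[ky * kw + kx]);
    return out;
}
```

Instantiating the template with `float` or a half type gives the two code paths from one implementation, which is the shape of the refactor described above.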

* Refactor test cases for 2D convolution transpose to support dynamic kernel types

- Updated `test_conv_transpose_2d` structure to improve parameter handling by reordering constructor arguments.
- Enhanced test case generation to iterate over kernel types, allowing for flexible testing of different configurations.
- Removed hardcoded kernel type instances in favor of a loop for better maintainability and scalability.
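The loop-based test generation might look roughly like this; the `ggml_type` enum, the field names, and the test-case shapes are simplified stand-ins for the real harness in `tests/test-backend-ops.cpp`, shown only to illustrate replacing hardcoded per-type instances with a loop over kernel types.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Simplified stand-in for the ggml tensor type enum.
enum class ggml_type { F32, F16 };

// Simplified test-case descriptor (the real struct builds a ggml graph).
struct test_conv_transpose_2d {
    std::array<int64_t, 4> ne_input;   // input tensor dimensions
    std::array<int64_t, 4> ne_kernel;  // kernel tensor dimensions
    ggml_type type_kernel;             // F16 or F32 kernel
    int stride;
};

// Instead of hardcoding one instance per kernel type, iterate over the types
// so every configuration is exercised for both F16 and F32 kernels.
std::vector<test_conv_transpose_2d> make_cases() {
    std::vector<test_conv_transpose_2d> cases;
    for (ggml_type kt : { ggml_type::F16, ggml_type::F32 }) {
        cases.push_back({ {3, 2, 3, 1},   {2, 2, 1, 3}, kt, 1 });
        cases.push_back({ {10, 10, 9, 1}, {3, 3, 1, 9}, kt, 2 });
    }
    return cases;
}
```

Adding a new kernel type then means extending the loop's initializer list rather than duplicating every case.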

* Refactor ggml_compute_forward_conv_transpose_2d to support both F16 and F32 tensor types.

* Refactor conv2d transpose kernel to use a template for kernel type, enhancing flexibility for different data types.
Update test cases to include both F16 and F32 tensor types for comprehensive coverage.

* Update ggml/src/ggml-cuda/conv2d-transpose.cu

Co-authored-by: Aman Gupta <redacted>
* Update ggml/src/ggml-cpu/ggml-cpu.c

Co-authored-by: Aman Gupta <redacted>
* Refactor conv2d transpose implementation by removing the `conv2d_transpose_params` struct and dispatching via direct kernel launches.

* Enhance CPU conv2d transpose implementation by introducing a templated kernel type for improved flexibility with F16 and F32 data types.
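The type-dispatch pattern this refactor converges on can be sketched as below: a single entry point inspects the kernel tensor's element type and forwards to the templated implementation. The `ggml_fp16_t` alias follows ggml's convention of storing F16 as a 16-bit pattern, but the surrounding scaffolding and the placeholder body are illustrative, not the actual `ggml_compute_forward_conv_transpose_2d`.

```cpp
#include <cstdint>
#include <stdexcept>

enum class ggml_type { F32, F16 };
using ggml_fp16_t = uint16_t;  // ggml stores F16 as a raw 16-bit pattern

// Templated worker: one implementation serves both kernel element types.
// Placeholder body; the real implementation runs the transposed-convolution
// loops and converts F16 kernel values to float when accumulating.
template <typename kernel_t>
static void conv_transpose_2d_impl(const kernel_t * kernel, const float * src, float * dst) {
    dst[0] = static_cast<float>(kernel[0]) * src[0];
}

// Entry point: dispatch on the kernel tensor's element type.
void compute_conv_transpose_2d(ggml_type type_kernel,
                               const void * kernel, const float * src, float * dst) {
    switch (type_kernel) {
        case ggml_type::F16:
            conv_transpose_2d_impl(static_cast<const ggml_fp16_t *>(kernel), src, dst);
            break;
        case ggml_type::F32:
            conv_transpose_2d_impl(static_cast<const float *>(kernel), src, dst);
            break;
        default:
            throw std::invalid_argument("unsupported kernel type");
    }
}
```

Keeping the switch at the boundary means the inner loops stay type-agnostic, so adding another kernel type only touches the dispatch site.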

---------

Co-authored-by: Aman Gupta <redacted>
src/ggml-cpu/ggml-cpu.c
src/ggml-cpu/ops.cpp
src/ggml-cuda/conv2d-transpose.cu
src/ggml-cuda/conv2d-transpose.cuh
tests/test-backend-ops.cpp