CUDA: don't convert BF16 weights to FP32 (ggml/1174)
author Sigbjørn Skjæret <redacted>
Fri, 4 Apr 2025 19:05:12 +0000 (21:05 +0200)
committer Georgi Gerganov <redacted>
Mon, 7 Apr 2025 15:44:17 +0000 (18:44 +0300)
commit 36ca8b362885e4b4984d72e348ba403568864a95
tree 7da7c011daa032f95a37e4214b0fe0624d3ce4a4
parent 995083e4ed24933e6a289472f9de0f0b53ca5eca
CUDA: don't convert BF16 weights to FP32 (ggml/1174)

* add bf16 support

* use convert_from_bf16_cuda instead of convert_unary_cuda for f32

* revert 7ec5085

* move functionality into convert_unary with constexpr
ggml/src/ggml-cuda/convert.cu
ggml/src/ggml-cuda/convert.cuh
ggml/src/ggml-cuda/ggml-cuda.cu