From: Ivy233
Date: Wed, 26 Mar 2025 14:06:04 +0000 (+0800)
Subject: clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566)
X-Git-Tag: upstream/0.0.5028~65
X-Git-Url: https://git.djapps.eu/?a=commitdiff_plain;h=02082f1519565fc7b49de211b28bc5404a69209b;p=pkg%2Fggml%2Fsources%2Fllama.cpp

clip: Fix llama-llava-clip-quantize-cli quantization error under CUDA backend (#12566)

* [Fix] When clip-quantize-cli is built and run in a CUDA environment, ggml_fp16_to_fp32 reports an error because it tries to access video memory; quantization has to run on the CPU backend. After this fix, quantization automatically runs on the CPU backend and is no longer bound to CUDA.

* [Fix] Roll back the signature and implementation of clip_model_load, and change the call in clip_model_quantize to clip_init.
---

diff --git a/examples/llava/clip.cpp b/examples/llava/clip.cpp
index a1f050e3..58ee5cf0 100644
--- a/examples/llava/clip.cpp
+++ b/examples/llava/clip.cpp
@@ -2989,7 +2989,10 @@ bool clip_model_quantize(const char * fname_inp, const char * fname_out, const i
     assert(itype < GGML_TYPE_COUNT);
     ggml_type type = static_cast<ggml_type>(itype);
 
-    auto * ctx_clip = clip_model_load(fname_inp, 2);
+    auto * ctx_clip = clip_init(fname_inp, clip_context_params{
+        /* use_gpu */   false,
+        /* verbosity */ 2,
+    });
 
     const auto & ctx_src = ctx_clip->ctx_gguf;
     const auto & ctx_data = ctx_clip->ctx_data;
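
For context, a minimal sketch of a quantize driver that exercises the fixed path. The argument handling below is hypothetical (the real CLI lives in examples/llava); clip_model_quantize is the function patched in this diff, and with this commit it loads the model through clip_init with use_gpu = false, so the pass stays on the CPU backend.

// Sketch only: a bare-bones driver in the spirit of clip-quantize-cli.
// clip_model_quantize() is declared in examples/llava/clip.h; everything
// else here (argument parsing, messages) is illustrative.
#include <cstdio>
#include <cstdlib>

#include "clip.h"

int main(int argc, char ** argv) {
    if (argc != 4) {
        fprintf(stderr, "usage: %s <model-f16.gguf> <model-out.gguf> <itype>\n", argv[0]);
        return 1;
    }

    const int itype = atoi(argv[3]);

    // After this commit, clip_model_quantize() opens the model via
    // clip_init(..., { /* use_gpu */ false, /* verbosity */ 2 }), so
    // ggml_fp16_to_fp32 only touches host memory even in a CUDA build.
    if (!clip_model_quantize(argv[1], argv[2], itype)) {
        fprintf(stderr, "%s: failed to quantize %s\n", argv[0], argv[1]);
        return 1;
    }

    return 0;
}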