From: Ruikai Peng
Date: Fri, 20 Mar 2026 09:31:34 +0000 (+0800)
Subject: context: zero output buffer on allocation (#20781)
X-Git-Tag: upstream/0.0.8611~159
X-Git-Url: https://git.djapps.eu/?a=commitdiff_plain;h=dc6592431b909208040c1a8e953e6c5440471eaa;p=pkg%2Fggml%2Fsources%2Fllama.cpp

context: zero output buffer on allocation (#20781)

* context: zero output buffer on allocation

Address GHSA-wqq9-25mr-rw76.

The logits output buffer allocated in output_reserve() uses
posix_memalign(), which does not zero memory. The buffer is only
written during decode when needs_raw_logits() returns true. When
backend samplers cover all output sequences, needs_raw_logits()
returns false and the buffer is never written, but llama_get_logits()
still returns a pointer to it, exposing stale heap content.

Zero the buffer after allocation to prevent information disclosure
through the public logits API.

Found-by: Pwno

* Update src/llama-context.cpp

Co-authored-by: Georgi Gerganov

---------

Co-authored-by: Georgi Gerganov
---

diff --git a/src/llama-context.cpp b/src/llama-context.cpp
index dc61afb0b..8f25d4778 100644
--- a/src/llama-context.cpp
+++ b/src/llama-context.cpp
@@ -1946,6 +1946,7 @@ uint32_t llama_context::output_reserve(int32_t n_outputs) {
             LLAMA_LOG_ERROR("%s: failed to allocate output buffer of size %.2f MiB\n", __func__, new_size / (1024.0 * 1024.0));
             return 0;
         }
+        ggml_backend_buffer_clear(buf_output.get(), 0);
     }
 
     float * output_base = (float *) ggml_backend_buffer_get_base(buf_output.get());