git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
llama-quant : correct `n_attention_wv` usage (#20357)
author    ddh0 <redacted>
          Tue, 10 Mar 2026 19:43:29 +0000 (14:43 -0500)
committer GitHub <redacted>
          Tue, 10 Mar 2026 19:43:29 +0000 (21:43 +0200)
commit 10e5b148b061569aaee8ae0cf72a703129df0eab
tree   b8191c60c7fde02e9f0580596bc9fda512e139bf
parent 90b2731894ecd07cb24360231eeec106336e1727
llama-quant : correct `n_attention_wv` usage (#20357)

* llama-quant : correct `n_attention_wv` usage

In #19770, I introduced a regression in how the `quantize_state_impl`
counter values were initialized: I was incrementing `n_attention_wv` and
reading it within the same loop, when its value should already be fixed
by the time we decide tensor types in `llama_tensor_get_type_impl` (for
`use_more_bits`).
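
To illustrate the pattern (this is a minimal sketch, not the actual
llama.cpp code): a counter that a per-tensor decision reads as a *total*
must be finalized in a separate pass before the loop that consumes it.
The names `Tensor`, `is_attention_wv`, and `pick_type` below are
hypothetical stand-ins; only `n_attention_wv` and the overall structure
come from this commit message.

```cpp
// Sketch of the bug pattern, with hypothetical types and helpers.
// Assumption: pick_type() needs the final count of attn_v tensors
// (e.g. to give "more bits" to the first fraction of them), so the
// count must be complete before the per-tensor loop starts.
#include <cstdio>
#include <string>
#include <vector>

struct Tensor { std::string name; };

static bool is_attention_wv(const Tensor & t) {
    // hypothetical predicate standing in for the real name check
    return t.name.find("attn_v") != std::string::npos;
}

// hypothetical stand-in for the use_more_bits-style decision
static const char * pick_type(int i_attention_wv, int n_attention_wv) {
    // e.g. quantize the first half of the wv tensors at higher precision
    return (2 * i_attention_wv < n_attention_wv) ? "Q6_K" : "Q4_K";
}

int main() {
    std::vector<Tensor> tensors = {
        {"blk.0.attn_v.weight"}, {"blk.0.ffn_up.weight"},
        {"blk.1.attn_v.weight"}, {"blk.1.ffn_up.weight"},
    };

    // Correct: finalize the count in a pre-pass ...
    int n_attention_wv = 0;
    for (const auto & t : tensors) {
        if (is_attention_wv(t)) {
            n_attention_wv++;
        }
    }

    // ... then consume it in the per-tensor loop. Incrementing the
    // counter inside this same loop (the regression) would make
    // pick_type() see a partial count that depends on tensor order.
    int i_attention_wv = 0;
    for (const auto & t : tensors) {
        if (is_attention_wv(t)) {
            printf("%s -> %s\n", t.name.c_str(),
                   pick_type(i_attention_wv, n_attention_wv));
            i_attention_wv++;
        }
    }
    return 0;
}
```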

I never observed a difference in any of
[my tests](https://github.com/ggml-org/llama.cpp/pull/19770#issuecomment-4000424712);
it was only after @bartowski kindly pointed this out that I realized it
was incorrect. (Thanks!)

* simplify
src/llama-quant.cpp