git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Daniel Bevenius <redacted>
	Sat, 6 Dec 2025 11:26:20 +0000 (12:26 +0100)
committer	GitHub <redacted>
	Sat, 6 Dec 2025 11:26:20 +0000 (12:26 +0100)
commit	444f00b0ec814a071ce1b9dc0de5ea4b4850bd1b
tree	c7431429b80f6254792eb3546227a60ad4a9b93d
parent	2960eb2975da26f0d0b1e30cb6e09f25fe3dac0e

llama : remove quantization sanity check (#17788)

* llama : remove quantization sanity check

This commit removes the quantization sanity check for attention layers.

The motivation for this is that there are hybrid models that have
recurrent layers, expert layers, and attention layers. For these models
the current check fails because the expert layers are not taken into
account. After consideration, it was decided that this check is not
strictly necessary and can be removed to allow for more flexible model
architectures.

* llama : remove unused pruned_attention_w and is_clip_model vars