git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	ZeroV0LT <redacted>
	Fri, 13 Mar 2026 18:53:42 +0000 (19:53 +0100)
committer	GitHub <redacted>
	Fri, 13 Mar 2026 18:53:42 +0000 (20:53 +0200)
commit	f17b3be63fd9dfa3d12ad3555c977a6430661769
tree	000ad2dc05407828f71abd9215542760237a4321
parent	d7ba99c4850bd687621f13329490dc28f28f17c9

llama : fix pooling assertion crash in chunked GDN detection path (#20468)

* llama : fix pooling assertion crash in chunked GDN detection path

The chunked fused Gated Delta Net detection in sched_reserve() calls
graph_reserve(16*n_seqs, n_seqs, n_outputs, ...) where n_outputs = n_seqs.
This creates a dimension mismatch in build_pooling() for embedding models
with mean/rank pooling: build_inp_mean() creates a tensor with shape
[n_tokens=16*n_seqs, ...] while t_embd is reduced to [n_outputs=n_seqs, ...]
via out_ids, causing ggml_mul_mat to assert on ggml_can_mul_mat(a, b).
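
The failing shape check can be modelled in a few lines. This is a simplified Python sketch, not llama.cpp code. It assumes ggml's convention that ne[0] is the innermost dimension. It mirrors the condition in ggml_can_mul_mat(): equal ne[0], with a broadcastable over dims 2 and 3. It also assumes that mean pooling multiplies the transposed t_embd, shaped [n_outputs, n_embd], by inp_mean, shaped [n_tokens, n_seqs]. Shape, can_mul_mat and mean_pooling_shapes_ok are hypothetical names, and n_embd = 384 is an arbitrary example value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    """Simplified stand-in for a ggml tensor shape: ne = (ne0, ne1, ne2, ne3)."""
    ne: tuple


def can_mul_mat(a: Shape, b: Shape) -> bool:
    """Mirrors ggml_can_mul_mat: same ne[0], and a broadcastable over dims 2 and 3."""
    return (a.ne[0] == b.ne[0]
            and b.ne[2] % a.ne[2] == 0
            and b.ne[3] % a.ne[3] == 0)


def mean_pooling_shapes_ok(n_embd: int, n_tokens: int, n_outputs: int, n_seqs: int) -> bool:
    # Assumed operands: transpose(t_embd) is [n_outputs, n_embd],
    # inp_mean is [n_tokens, n_seqs] (one row per token, one column per sequence).
    t_embd_t = Shape((n_outputs, n_embd, 1, 1))
    inp_mean = Shape((n_tokens, n_seqs, 1, 1))
    return can_mul_mat(t_embd_t, inp_mean)


n_seqs = 4
n_tokens = 16 * n_seqs  # the chunked GDN detection reserves 16*n_seqs tokens

# Before the fix: n_outputs = n_seqs, so ne[0] is 4 vs 64 and the check fails.
print(mean_pooling_shapes_ok(384, n_tokens, n_seqs, n_seqs))    # False
# After the fix: n_outputs = n_tokens, so ne[0] matches.
print(mean_pooling_shapes_ok(384, n_tokens, n_tokens, n_seqs))  # True
```

In the real graph the mismatch surfaces as an assertion inside ggml_mul_mat rather than a returned boolean. The sketch only isolates the arithmetic of the shapes.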

Fix: pass n_tokens as n_outputs in the chunked GDN graph reservation,
matching the pattern used by the pp/tg worst-case reservations.
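
The sketch below illustrates why a pooled graph must keep every token row. It is a minimal pure-Python version of mean pooling written as a product with an inp_mean-style matrix. It assumes, as build_inp_mean() is understood to do, one row per token and one column per sequence, with each nonzero entry equal to 1/(number of tokens in that sequence). build_inp_mean_matrix and mean_pool are hypothetical helpers, not llama.cpp API. Because the product sums over all token rows, passing only one row per sequence (n_outputs = n_seqs) cannot be pooled.

```python
def build_inp_mean_matrix(seq_ids):
    """Hypothetical helper: [n_tokens][n_seqs] matrix with 1/len(seq s) where token i is in seq s."""
    n_seqs = max(seq_ids) + 1
    counts = [seq_ids.count(s) for s in range(n_seqs)]
    # 1.0 / counts[s] is only evaluated when token i belongs to s, so counts[s] > 0.
    return [[1.0 / counts[s] if seq_ids[i] == s else 0.0 for s in range(n_seqs)]
            for i in range(len(seq_ids))]


def mean_pool(t_embd, seq_ids):
    """t_embd: one embedding per row; returns one mean-pooled vector per sequence."""
    inp_mean = build_inp_mean_matrix(seq_ids)
    # The product needs exactly one embedding row per token row of inp_mean.
    if len(t_embd) != len(inp_mean):
        raise ValueError("shape mismatch: %d embedding rows vs %d tokens"
                         % (len(t_embd), len(inp_mean)))
    n_embd = len(t_embd[0])
    n_seqs = len(inp_mean[0])
    return [[sum(t_embd[i][d] * inp_mean[i][s] for i in range(len(t_embd)))
             for d in range(n_embd)]
            for s in range(n_seqs)]


embd = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [3.0, 3.0], [6.0, 9.0]]
seq_ids = [0, 0, 1, 1, 1]
pooled = mean_pool(embd, seq_ids)
print([[round(x, 6) for x in v] for v in pooled])  # [[2.0, 3.0], [3.0, 4.0]]

# Keeping only one row per sequence (n_outputs = n_seqs) cannot be pooled.
try:
    mean_pool(embd[:2], seq_ids)
except ValueError as err:
    print(err)  # shape mismatch: 2 embedding rows vs 5 tokens
```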

Regression introduced by #20340 (d28961d).
Same class of bug as #12517, fixed by #12545.

* server : add mean pooling tests to embedding test suite

Add test_embedding_pooling_mean and test_embedding_pooling_mean_multiple
to cover the --pooling mean codepath, which was previously untested.

These tests would have caught the regression introduced by #20340 where
build_pooling() crashes with a ggml_mul_mat assertion due to mismatched
dimensions in the chunked GDN detection path.
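
The test bodies are not part of this page. As a hedged illustration only, the hypothetical helper below checks model-independent invariants that a mean-pooling test might assert on the returned vectors. The invariants are one vector per input, a shared dimension greater than 1, and, optionally, unit L2 norm. The normalization check assumes the server L2-normalizes pooled embeddings by default, which should be verified before relying on it. The helper does not talk to the server, and check_pooled_embeddings is an illustrative name.

```python
def check_pooled_embeddings(embeddings, n_inputs, expect_normalized=True, eps=1e-6):
    """Hypothetical helper: asserts invariants a mean-pooling embedding test might check."""
    # One pooled vector per input string, not one per token.
    assert len(embeddings) == n_inputs, f"expected {n_inputs} vectors, got {len(embeddings)}"
    # All pooled vectors share the model's embedding width.
    dims = {len(e) for e in embeddings}
    assert len(dims) == 1, f"inconsistent embedding sizes: {sorted(dims)}"
    (dim,) = dims
    assert dim > 1, "embedding dimension should be greater than 1"
    if expect_normalized:
        # Assumption: output is L2-normalized, so each squared norm is close to 1.
        for e in embeddings:
            assert abs(sum(x * x for x in e) - 1.0) < eps, "vector is not L2-normalized"


# Two fake unit vectors standing in for the vectors returned for two inputs.
check_pooled_embeddings([[0.6, 0.8], [1.0, 0.0]], n_inputs=2)
```

Checking invariants rather than exact values keeps such a test independent of the model weights. A crash like the one fixed here would fail it before any assertion runs.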

---------

Co-authored-by: Domenico Crupi <redacted>

src/llama-context.cpp
tools/server/tests/unit/test_embedding.py