llama : add support for EmbeddingGemma 300m (#15798)
author    Daniel Bevenius <redacted>
          Thu, 4 Sep 2025 16:10:29 +0000 (18:10 +0200)
committer GitHub <redacted>
          Thu, 4 Sep 2025 16:10:29 +0000 (18:10 +0200)
commit    fb15d649ed14ab447eeab911e0c9d21e35fb243e
tree      348eed15f0ba308e7fef092922bc117ef08096fc
parent    856ed0947f27b4ec3ad269fceda0402fbab263d3
llama : add support for EmbeddingGemma 300m (#15798)

This commit adds support for EmbeddingGemma 300m. This model supports
sliding window attention (SWA), and a new swa_type is introduced to
support symmetric SWA masking.
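
To illustrate, here is a minimal sketch of the difference between the two
masking modes. The function names and the exact window convention (whether
n_swa spans one side or both) are assumptions for illustration, not the
code from this commit:
```cpp
#include <cstdint>
#include <cstdlib>

// Returns true when position p0 must be masked out while attending
// from position p1, given a window of n_swa positions.

// Standard (causal) SWA: only a fixed window of past tokens is visible.
static bool is_masked_standard(int32_t p0, int32_t p1, int32_t n_swa) {
    return p1 - p0 >= n_swa; // too far in the past
}

// Symmetric SWA: the window extends in both directions, so tokens too
// far in the future are masked as well, which suits a bidirectional
// embedding model.
static bool is_masked_symmetric(int32_t p0, int32_t p1, int32_t n_swa) {
    return std::abs(p1 - p0) >= n_swa; // too far in either direction
}
```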

This commit also extracts the masking logic into the function
llama_is_masked_swa in llama-impl.h, so that it can be shared by both
llm_graph_input_attn_no_cache::set_input and
llama_kv_cache::set_input_kq_mask.
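
As a rough sketch of the refactor's shape, both mask builders can call the
shared predicate instead of duplicating the window test. The signature,
stand-in body, and loop below are assumptions based on the commit message,
not the actual llama.cpp declarations:
```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the shared predicate in llama-impl.h; a real
// implementation would dispatch on the model's swa_type and window size.
static bool llama_is_masked_swa(int32_t p0, int32_t p1) {
    const int32_t n_swa = 512; // illustrative window size
    return std::abs(p1 - p0) >= n_swa;
}

// Both llm_graph_input_attn_no_cache::set_input (embeddings, no KV cache)
// and llama_kv_cache::set_input_kq_mask (decoding, with a KV cache) build
// a KQ mask along these lines, writing -INFINITY for masked pairs.
// mask must hold n_tokens * n_tokens entries.
static void build_kq_mask(std::vector<float> & mask, int32_t n_tokens) {
    for (int32_t i = 0; i < n_tokens; ++i) {
        for (int32_t j = 0; j < n_tokens; ++j) {
            mask[i*n_tokens + j] = llama_is_masked_swa(j, i) ? -INFINITY : 0.0f;
        }
    }
}
```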

With this commit the EmbeddingGemma 300m model can be converted to GGUF
and used with llama.cpp.
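
For example, assuming the original weights have been downloaded from
Hugging Face to a local directory (the paths and output name below are
illustrative):
```console
python convert_hf_to_gguf.py path/to/embeddinggemma-300m --outfile embeddinggemma-300m-f16.gguf --outtype f16
```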

Once the model has been uploaded to Hugging Face, it can be used like
this:
```console
./build/bin/llama-cli -hf ggml-org/embeddinggemma-300m-GGUF:Q8_0
```
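
Since this is an embedding model, the dedicated embedding example is the
more natural entry point (a hedged companion example using standard
llama.cpp flags):
```console
./build/bin/llama-embedding -hf ggml-org/embeddinggemma-300m-GGUF:Q8_0 -p "Hello world"
```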
15 files changed:
convert_hf_to_gguf.py
gguf-py/gguf/constants.py
gguf-py/gguf/tensor_mapping.py
src/llama-arch.cpp
src/llama-arch.h
src/llama-graph.cpp
src/llama-graph.h
src/llama-hparams.cpp
src/llama-hparams.h
src/llama-kv-cache-iswa.cpp
src/llama-kv-cache.cpp
src/llama-kv-cache.h
src/llama-memory-hybrid.cpp
src/llama-memory-hybrid.h
src/llama-model.cpp