llama : add support for EmbeddingGemma 300m (#15798)
author    Daniel Bevenius <redacted>
          Thu, 4 Sep 2025 16:10:29 +0000 (18:10 +0200)
committer GitHub <redacted>
          Thu, 4 Sep 2025 16:10:29 +0000 (18:10 +0200)
commit    fb15d649ed14ab447eeab911e0c9d21e35fb243e
tree      348eed15f0ba308e7fef092922bc117ef08096fc
parent    856ed0947f27b4ec3ad269fceda0402fbab263d3
llama : add support for EmbeddingGemma 300m (#15798)

This commit adds support for EmbeddingGemma 300m. This model supports
sliding window attention (SWA), and a new swa_type is introduced to
support symmetric SWA masking.
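
To illustrate, here is a minimal sketch of the difference between the two
masking modes. The function names and the exact window convention (whether
n_swa spans one side or both) are assumptions for illustration, not the
code from this commit:
```cpp
#include <cstdint>
#include <cstdlib>

// Returns true when position p0 must be masked out while attending
// from position p1, given a window of n_swa positions.

// Standard (causal) SWA: only a fixed window of past tokens is visible.
static bool is_masked_standard(int32_t p0, int32_t p1, int32_t n_swa) {
    return p1 - p0 >= n_swa; // too far in the past
}

// Symmetric SWA: the window extends in both directions, so tokens too
// far in the future are masked as well, which suits a bidirectional
// embedding model.
static bool is_masked_symmetric(int32_t p0, int32_t p1, int32_t n_swa) {
    return std::abs(p1 - p0) >= n_swa; // too far in either direction
}
```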

This commit also extracts the masking logic into the function
llama_is_masked_swa in llama-impl.h, so that it can be shared by both
llm_graph_input_attn_no_cache::set_input and
llama_kv_cache::set_input_kq_mask.
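
As a rough sketch of the refactor's shape, both mask builders can call the
shared predicate instead of duplicating the window test. The signature,
stand-in body, and loop below are assumptions based on the commit message,
not the actual llama.cpp declarations:
```cpp
#include <cmath>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for the shared predicate in llama-impl.h; a real
// implementation would dispatch on the model's swa_type and window size.
static bool llama_is_masked_swa(int32_t p0, int32_t p1) {
    const int32_t n_swa = 512; // illustrative window size
    return std::abs(p1 - p0) >= n_swa;
}

// Both llm_graph_input_attn_no_cache::set_input (embeddings, no KV cache)
// and llama_kv_cache::set_input_kq_mask (decoding, with a KV cache) build
// a KQ mask along these lines, writing -INFINITY for masked pairs.
// mask must hold n_tokens * n_tokens entries.
static void build_kq_mask(std::vector<float> & mask, int32_t n_tokens) {
    for (int32_t i = 0; i < n_tokens; ++i) {
        for (int32_t j = 0; j < n_tokens; ++j) {
            mask[i*n_tokens + j] = llama_is_masked_swa(j, i) ? -INFINITY : 0.0f;
        }
    }
}
```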

With this commit the EmbeddingGemma 300m model can be converted to GGUF
and used with llama.cpp.
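
For example, assuming the original weights have been downloaded from
Hugging Face to a local directory (the paths and output name below are
illustrative):
```console
python convert_hf_to_gguf.py path/to/embeddinggemma-300m --outfile embeddinggemma-300m-f16.gguf --outtype f16
```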

Once the model has been uploaded to Hugging Face, it can be used like
this:
```console
./build/bin/llama-cli -hf ggml-org/embeddinggemma-300m-GGUF:Q8_0
```
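
Since this is an embedding model, the dedicated embedding example is the
more natural entry point (a hedged companion example using standard
llama.cpp flags):
```console
./build/bin/llama-embedding -hf ggml-org/embeddinggemma-300m-GGUF:Q8_0 -p "Hello world"
```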
15 files changed:
convert_hf_to_gguf.py
gguf-py/gguf/constants.py
gguf-py/gguf/tensor_mapping.py
src/llama-arch.cpp
src/llama-arch.h
src/llama-graph.cpp
src/llama-graph.h
src/llama-hparams.cpp
src/llama-hparams.h
src/llama-kv-cache-iswa.cpp
src/llama-kv-cache.cpp
src/llama-kv-cache.h
src/llama-memory-hybrid.cpp
src/llama-memory-hybrid.h
src/llama-model.cpp