llama : DeepSeek V2/V3 MLA implementation (#12801)
author     Juk Armstrong <redacted>
           Tue, 15 Apr 2025 06:49:57 +0000 (07:49 +0100)
committer  GitHub <redacted>
           Tue, 15 Apr 2025 06:49:57 +0000 (09:49 +0300)
commit     daa422881a0ec7944771bcc8ff8de34d11f5bd3b
tree       a3dff0819a99c851fc4a6d878df489d5b0f54f7f
parent     eccc7a1602f0752507de4aaad1008b9618a282c8
llama : DeepSeek V2/V3 MLA implementation (#12801)

* Merged using squash to remove all noise commit messages

* Forced flash attention off for `LLM_ARCH_DEEPSEEK2` - the resulting embedding is too large (see the guard sketch below)

* Removed 3 conts (2x RoPE and 1x RMS-norm)

* Changed to use `<cmath>` instead of `<math.h>` (see the sketch below)

* Reverted removal of the 3 conts

* Used `reshape` in `llm_graph_context::build_attn_mha()` (see the reshape sketch below)

* Used `k_pe = ggml_reshape`

* Removed the 3 conts again

* Removed the 3D views of `wk_b` and `wv_b`, and just saved them as 3D in GGUF (see the 3D-weight sketch below)

* Removed the MQA optimisation from `build_attn_mha()` as it no longer gives any gains

* Simplified `is_mla` branch in `llm_build_deepseek2()`

* Removed `build_attn_mla` and added `nullptr` to all `build_attn` calls (see the signature sketch below)

* Fixed call to `build_attn` in `llm_build_t5_enc`
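
On the flash-attention bullet: a minimal standalone sketch of what such an architecture override can look like, assuming a `cparams.flash_attn` flag and the `LLM_ARCH_DEEPSEEK2` enum value (both stubbed below); the warning text and placement are illustrative only, not the literal diff.

```cpp
// Standalone sketch of the kind of guard the bullet describes - NOT the
// literal diff. The enum/struct names mirror llama.cpp but are stubbed here.
#include <cstdio>

enum llm_arch { LLM_ARCH_LLAMA, LLM_ARCH_DEEPSEEK2 };

struct cparams_t { bool flash_attn; };

static void apply_arch_overrides(llm_arch arch, cparams_t & cparams) {
    // With MLA the cached K/V head sizes no longer match what the
    // flash-attention kernels support, so the flag is forced off.
    if (cparams.flash_attn && arch == LLM_ARCH_DEEPSEEK2) {
        std::fprintf(stderr, "flash_attn is not compatible with DeepSeek2 - forcing off\n");
        cparams.flash_attn = false;
    }
}

int main() {
    cparams_t cparams = { /*.flash_attn =*/ true };
    apply_arch_overrides(LLM_ARCH_DEEPSEEK2, cparams);
    std::printf("flash_attn: %s\n", cparams.flash_attn ? "on" : "off");
    return 0;
}
```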
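
On the `<cmath>` bullet: in C++ the `<cmath>` header puts the math functions in namespace `std` and adds the `float`/`long double` overloads, which is the usual reason for the swap. A short illustration:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // std::sqrt has a float overload, so the computation stays in float;
    // C's sqrt() from <math.h> would round-trip through double
    std::printf("%f\n", std::sqrt(2.0f));
    return 0;
}
```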
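
On the reshape bullets: `ggml_reshape_3d` returns a zero-copy view of a contiguous tensor, so where the source is already contiguous it can stand in for a `ggml_cont` + view pair without adding a copy to the graph. A standalone sketch against the public ggml API, with made-up shapes standing in for `k_pe`:

```cpp
#include <cstdio>
#include "ggml.h"

int main() {
    struct ggml_init_params params = {
        /*.mem_size   =*/ 16*1024*1024,
        /*.mem_buffer =*/ NULL,
        /*.no_alloc   =*/ false,
    };
    struct ggml_context * ctx = ggml_init(params);

    // stand-in for k_pe: [n_embd_head_qk_rope, n_tokens] = [64, 8] (assumed)
    struct ggml_tensor * k_pe = ggml_new_tensor_2d(ctx, GGML_TYPE_F32, 64, 8);

    // ggml_reshape_3d requires a contiguous source, but unlike ggml_cont
    // it copies nothing - the result is a free view at graph-build time
    struct ggml_tensor * k_pe_3d = ggml_reshape_3d(ctx, k_pe, 64, 1, 8);

    printf("k_pe_3d: [%lld, %lld, %lld]\n",
           (long long) k_pe_3d->ne[0],
           (long long) k_pe_3d->ne[1],
           (long long) k_pe_3d->ne[2]);

    ggml_free(ctx);
    return 0;
}
```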
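
On the `wk_b`/`wv_b` bullet: once the converter writes these tensors as 3D, the graph can feed them to `ggml_mul_mat` directly (it batches over the head dimension) instead of carving per-head 3D views out of a flattened 2D tensor at build time. A sketch with assumed shapes; the real dimensions come from the model's hparams:

```cpp
#include <cstdio>
#include "ggml.h"

int main() {
    struct ggml_init_params params = { 64*1024*1024, NULL, false };
    struct ggml_context * ctx = ggml_init(params);

    // assumed shapes, for illustration only:
    // wk_b : [kv_lora_rank, n_embd_head_qk_nope, n_head] = [512, 128, 4]
    // q    : [kv_lora_rank, n_tokens,            n_head] = [512,   8, 4]
    struct ggml_tensor * wk_b = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, 512, 128, 4);
    struct ggml_tensor * q    = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, 512,   8, 4);

    // with a 3D weight straight from GGUF, ggml_mul_mat runs one matmul per
    // head slice - no ggml_view_3d bookkeeping over a flattened 2D tensor
    struct ggml_tensor * kq = ggml_mul_mat(ctx, wk_b, q);

    printf("kq: [%lld, %lld, %lld]\n",
           (long long) kq->ne[0], (long long) kq->ne[1], (long long) kq->ne[2]);

    ggml_free(ctx);
    return 0;
}
```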
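
On the `build_attn`/`nullptr` bullet: the pattern is to fold the MLA-only overload into the common `build_attn` by adding one extra tensor parameter that non-MLA call sites fill with `nullptr`. The signature below is hypothetical, purely to show the shape of the refactor; the real one lives in `src/llama-graph.h`:

```cpp
#include <cstdio>

struct ggml_tensor; // opaque stand-in, no real ggml needed for this sketch

// hypothetical signature: one extra optional tensor replaces a dedicated
// build_attn_mla() overload
static void build_attn(ggml_tensor * /*q*/, ggml_tensor * /*k*/, ggml_tensor * /*v*/,
                       ggml_tensor * v_mla) {
    if (v_mla != nullptr) {
        std::puts("MLA path: apply v_mla to decompress V");
    } else {
        std::puts("standard attention path");
    }
}

int main() {
    // every non-MLA caller (e.g. llm_build_t5_enc) now passes nullptr
    build_attn(nullptr, nullptr, nullptr, /*v_mla =*/ nullptr);
    return 0;
}
```
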
13 files changed:
convert_hf_to_gguf.py
gguf-py/gguf/constants.py
gguf-py/gguf/gguf_writer.py
gguf-py/gguf/tensor_mapping.py
src/llama-arch.cpp
src/llama-arch.h
src/llama-context.cpp
src/llama-graph.cpp
src/llama-graph.h
src/llama-hparams.h
src/llama-kv-cache.cpp
src/llama-model.cpp
src/llama-model.h