git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	fairydreaming <redacted>
	Tue, 28 May 2024 15:07:05 +0000 (17:07 +0200)
committer	GitHub <redacted>
	Tue, 28 May 2024 15:07:05 +0000 (17:07 +0200)
commit	ee3dff6b8e39bb8c1cdea1782a7b95ef0118f970
tree	28eaea501c6c929f98442cc451eb14b380c02de6
parent	edc29433fa08b4e5aeb67649a29fc7713af13d04

Add support for DeepseekV2ForCausalLM (#7519)

* common : increase max number of experts to 160

* common : add tensors ATTN_Q_A, ATTN_Q_A_NORM, ATTN_Q_B, ATTN_KV_A_MQA, ATTN_KV_A_NORM, ATTN_KV_B needed by DeepSeek-V2 MLA (multi-head latent attention) architecture

* common : add model header parameters: leading_dense_block_count, expert_feed_forward_length, expert_shared_count, expert_weights_scale, attention.q_lora_rank, attention.kv_lora_rank, rope.scaling.yarn_log_multiplier

* convert-hf : add model conversion support for DeepseekV2ForCausalLM

* llama : add model types for DeepSeek-V2 and DeepSeek-V2-Lite models

* llama : add two new llm_build_moe_ffn() arguments: scale_w (whether to scale weights of selected MoE experts) and w_scale (numerical value of the scaling factor)

* llama : add inference support for LLM_ARCH_DEEPSEEK2
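
For illustration only, the effect of the scale_w and w_scale arguments described above can be sketched in Python. This is not the code from llama.cpp: the function name route_experts, the use of softmax probabilities as expert weights, and the sample numbers are all assumptions made for this sketch.

```python
import math


def route_experts(logits, n_used, scale_w=False, w_scale=1.0):
    """Hypothetical helper: pick n_used experts and return (index, weight) pairs.

    Assumption: expert weights are the softmax probabilities of the selected
    experts, optionally multiplied by w_scale when scale_w is set.
    """
    # Numerically stable softmax over all expert logits.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Indices of the n_used most probable experts, best first.
    selected = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:n_used]
    weights = [probs[i] for i in selected]

    # scale_w toggles the scaling; w_scale is the numerical factor.
    if scale_w:
        weights = [w * w_scale for w in weights]

    return list(zip(selected, weights))


if __name__ == "__main__":
    logits = [0.0, 1.0, 2.0, 3.0]  # made-up router outputs for four experts
    print(route_experts(logits, n_used=2))
    print(route_experts(logits, n_used=2, scale_w=True, w_scale=2.0))
```

With scale_w left at False the weights are returned unchanged, so callers that do not need scaling behave as before in this sketch.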

---------

Co-authored-by: Stanisław Szymczyk <redacted>

convert-hf-to-gguf.py
gguf-py/gguf/constants.py
gguf-py/gguf/gguf_writer.py
gguf-py/gguf/tensor_mapping.py
llama.cpp