git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

ggml : mul_mat_id use the same tensor for all the experts (#6387)

* ggml : update mul_mat_id to use the same tensor for all the experts

* update cuda

* minor

* update metal

* update test-backend-ops

* fix cuda

* Update ggml-metal.m

Co-authored-by: Georgi Gerganov <redacted>

* update convert.py

* update convert-hf-to-gguf.py

* update convert.py for mixtral hf models

* Update convert-hf-to-gguf.py

Co-authored-by: Georgi Gerganov <redacted>

* cuda : support non-pow-2 number of experts

* allow quantize to work for split and merged experts models in the same way

* cleanup + disable mmap automatically with split tensors models

* update imatrix

* test-backend-ops : test qwen argsort

* update grok model loading

* llama : add merged experts tensors to the grok tensor map

* minor

* gguf : bump version

* fix quantizing of merged experts

* convert-hf-to-gguf.py : update grok (untested)

* make linter happy

* cuda/argsort : use shared memory instead of pool memory

* convert : fix grok tensor names

* metal : add support for non-pow-2 argsort

* llama : more loader cleanup, better error checking

* cuda : fix warning

* llama : still use mmap for loading old models, but copy the data to a host buffer

* add review note

* llama : remove ffn tensor counting + add sanity check

ggml-ci

* convert : fix handling of n_experts == None

ggml-ci

* imatrix : fix ncall counters

* llama : produce error if imatrix size does not match

* quantize : terminate on errors + trace logs

ggml-ci

* metal : pad shared memory to 16 bytes
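The headline change is that `mul_mat_id` now takes one tensor holding the weights of all experts, instead of a separate tensor per expert. The Python sketch below illustrates only that idea. The function `mul_mat_id_sketch`, its argument layout (weights as [n_expert][n_out][n_in], expert ids per token) and the list-based matrices are hypothetical; they do not reproduce the real `ggml_mul_mat_id` signature or ggml's dimension order, which are defined in `ggml.h`.

```python
from typing import List

Matrix = List[List[float]]


def matvec(w: Matrix, x: List[float]) -> List[float]:
    """Multiply a [rows][cols] matrix by a vector of length cols."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def mul_mat_id_sketch(experts: List[Matrix],
                      x: List[List[float]],
                      ids: List[List[int]]) -> List[List[List[float]]]:
    """Hypothetical expert-indexed matmul over a single merged tensor.

    experts: merged weights, shape [n_expert][n_out][n_in]
    x:       activations, shape [n_tokens][n_in]
    ids:     selected expert ids per token, shape [n_tokens][n_used]
    returns: shape [n_tokens][n_used][n_out]
    """
    out = []
    for token_x, token_ids in zip(x, ids):
        row = []
        for e in token_ids:
            if not 0 <= e < len(experts):
                raise IndexError(f"expert id {e} out of range")
            # Each expert is a slice of the same tensor, picked by its index.
            row.append(matvec(experts[e], token_x))
        out.append(row)
    return out


experts = [
    [[1.0, 0.0], [0.0, 1.0]],  # expert 0: identity
    [[2.0, 0.0], [0.0, 2.0]],  # expert 1: scale by 2
    [[0.0, 1.0], [1.0, 0.0]],  # expert 2: swap components
]
print(mul_mat_id_sketch(experts, [[1.0, 2.0]], [[2, 1]]))  # [[[2.0, 1.0], [2.0, 4.0]]]
```

With the experts merged, choosing an expert is just indexing the first dimension of one tensor, so a single weight tensor can serve every expert in a layer.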

---------

Co-authored-by: Georgi Gerganov <redacted>
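Several converter changes (`update convert.py for mixtral hf models`, `convert : fix handling of n_experts == None`) come down to collecting the per-expert weight matrices of a layer and stacking them into one 3D tensor. The following is a hedged pure-Python sketch of that step. The helper name `merge_experts`, the Mixtral-style source tensor names and the choice to return `None` for dense models are assumptions for illustration, not the converters' actual code.

```python
from typing import Dict, List, Optional

Matrix = List[List[float]]


def merge_experts(tensors: Dict[str, Matrix], layer: int, proj: str,
                  n_experts: Optional[int]) -> Optional[List[Matrix]]:
    """Hypothetical helper: stack per-expert 2D weights into [n_expert][rows][cols].

    Consumed source tensors are removed from `tensors`. Returns None when
    n_experts is None, i.e. for a dense (non-MoE) model.
    """
    if n_experts is None:
        return None
    if n_experts < 1:
        raise ValueError("n_experts must be at least 1")
    merged: List[Matrix] = []
    for xid in range(n_experts):
        # Assumed Mixtral-style Hugging Face naming; other models differ.
        name = f"model.layers.{layer}.block_sparse_moe.experts.{xid}.{proj}.weight"
        if name not in tensors:
            raise KeyError(f"missing expert tensor: {name}")
        merged.append(tensors.pop(name))
    # All experts must share one shape to live in a single 3D tensor.
    shape = (len(merged[0]), len(merged[0][0]))
    for m in merged:
        if (len(m), len(m[0])) != shape:
            raise ValueError("expert tensors have mismatched shapes")
    return merged


weights = {
    f"model.layers.0.block_sparse_moe.experts.{i}.w1.weight": [[float(i), 0.0]]
    for i in range(3)
}
gate = merge_experts(weights, 0, "w1", 3)
print(len(gate), weights)  # 3 {}
```

Checking for `None` before touching any tensor names keeps dense models on the ordinary path, which is the kind of case the `n_experts == None` fix is about.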

author	slaren <redacted>
	Wed, 3 Apr 2024 13:07:05 +0000 (15:07 +0200)
committer	GitHub <redacted>
	Wed, 3 Apr 2024 13:07:05 +0000 (16:07 +0300)
commit	08a0c0206075556e82aca0feafad530dcc5f1426
tree	3937cd263076c548ba25348253dcec6d355b8fef
parent	52604860f93063ef98863921da697576af1c7665

convert-hf-to-gguf.py
convert.py
examples/imatrix/imatrix.cpp
examples/quantize/quantize.cpp
ggml-cuda.cu
ggml-cuda/argsort.cu
ggml-metal.m
ggml-metal.metal
ggml.c
ggml.h
gguf-py/gguf/constants.py
gguf-py/gguf/tensor_mapping.py
gguf-py/pyproject.toml
llama.cpp
tests/test-backend-ops.cpp
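The argsort changes (`cuda : support non-pow-2 number of experts`, `cuda/argsort : use shared memory instead of pool memory`, `metal : add support for non-pow-2 argsort`) concern sorting expert scores when their count is not a power of two, which a bitonic sorting network does not handle directly. A common workaround, sketched sequentially in Python below, is to sort a padded index array of power-of-two length and treat padding indices as larger than every real element so they end up last. This illustrates the technique under that assumption and is not a port of the CUDA or Metal kernels. `pad_to` shows the round-up-to-16-bytes arithmetic behind `metal : pad shared memory to 16 bytes`; which buffer the commit pads is not stated here, so the int32 index buffer in the example is only an assumption.

```python
from typing import List


def next_power_of_2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    p = 1
    while p < n:
        p *= 2
    return p


def pad_to(n: int, align: int) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def bitonic_argsort(x: List[float], ascending: bool = True) -> List[int]:
    """Return indices that sort x, for any len(x), via a bitonic network.

    The network runs over next_power_of_2(len(x)) slots; slots holding an
    index >= len(x) are padding and always compare as "last". On a GPU,
    each i of the inner loop would be handled by one thread.
    """
    ncols = len(x)
    if ncols == 0:
        return []
    ncols_pad = next_power_of_2(ncols)
    idx = list(range(ncols_pad))

    def goes_after(a: int, b: int) -> bool:
        # Padding goes after every real element; two paddings compare equal.
        if a >= ncols:
            return b < ncols
        if b >= ncols:
            return False
        return x[a] > x[b] if ascending else x[a] < x[b]

    k = 2
    while k <= ncols_pad:
        j = k // 2
        while j > 0:
            for i in range(ncols_pad):
                ixj = i ^ j
                if ixj > i:
                    if (i & k) == 0:
                        # Blocks with bit k clear are sorted in the target order.
                        if goes_after(idx[i], idx[ixj]):
                            idx[i], idx[ixj] = idx[ixj], idx[i]
                    elif goes_after(idx[ixj], idx[i]):
                        # Blocks with bit k set are sorted in reverse order.
                        idx[i], idx[ixj] = idx[ixj], idx[i]
            j //= 2
        k *= 2
    # Drop the padding indices, which the network has moved to the end.
    return [i for i in idx if i < ncols]


scores = [0.1, 0.7, 0.05, 0.9, 0.3]  # 5 experts: not a power of two
print(bitonic_argsort(scores, ascending=False))  # [3, 1, 4, 0, 2]
print(pad_to(3 * 4, 16))  # 16: three int32 values rounded up to 16 bytes
```

Padding keeps the network's compare-and-swap pattern fixed, so the kernel logic stays the same for any expert count; the cost is sorting up to twice as many slots as there are real elements.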