git.djapps.eu Git - pkg/ggml/sources/ggml/commit

author	Kawrakow <redacted>
	Thu, 21 Mar 2024 07:27:57 +0000 (08:27 +0100)
committer	Georgi Gerganov <redacted>
	Wed, 27 Mar 2024 11:20:00 +0000 (13:20 +0200)
commit	36143de86ba4ae32e81252df41367d6b37c0cf57
tree	7905a88afd1f0dfea540f518b625badb0a6e64a1
parent	5860eb04a2c995c267e11110f42b3c99073d8187

Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (llama/6183)

* k_cache: be able to use Q5_0

* k_cache: be able to use Q5_1 on CUDA

* k_cache: be able to use Q5_0 on Metal

* k_cache: be able to use Q5_1 on Metal

* k_cache: be able to use IQ4_NL - just CUDA for now

* k_cache: be able to use IQ4_NL on Metal

* k_cache: add newly added supported types to llama-bench and CUDA supports_op

---------

Co-authored-by: Iwan Kawrakow <redacted>
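
For context on why these K cache types matter, the following is a rough sketch of the memory each one would need compared with an F16 cache. It is not part of the commit. The per-block byte counts are assumptions based on the usual ggml block layouts (32 values per block, an fp16 scale, an extra fp16 minimum for Q5_1, and 4 bytes of high bits for the 5-bit types). The helper names and the model dimensions in the usage note are hypothetical.

```python
# Sketch only: estimate K cache size for the types named in this commit.
# Block layouts below are assumptions about ggml's formats, not taken from the diff.

BLOCK_SIZE = 32  # values per quantization block (assumed for all types listed here)

# Bytes needed to store one block of 32 values.
BLOCK_BYTES = {
    "f16": 2 * BLOCK_SIZE,      # unquantized baseline: 2 bytes per value
    "q5_0": 2 + 4 + 16,         # fp16 scale + 32 high bits + 32 packed 4-bit lows
    "q5_1": 2 + 2 + 4 + 16,     # fp16 scale + fp16 min + high bits + packed lows
    "iq4_nl": 2 + 16,           # fp16 scale + 32 packed 4-bit lookup indices
}


def bits_per_value(type_name: str) -> float:
    """Average storage cost of one cached value, in bits."""
    return BLOCK_BYTES[type_name] * 8 / BLOCK_SIZE


def k_cache_bytes(type_name: str, n_ctx: int, n_layer: int,
                  n_head_kv: int, head_dim: int) -> int:
    """Total bytes for the K cache, assuming each row is a whole number of blocks."""
    row = n_head_kv * head_dim
    if row % BLOCK_SIZE != 0:
        raise ValueError("row length must be a multiple of the block size")
    n_values = n_ctx * n_layer * row
    return (n_values // BLOCK_SIZE) * BLOCK_BYTES[type_name]


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    shape = dict(n_ctx=4096, n_layer=32, n_head_kv=32, head_dim=128)
    for name in BLOCK_BYTES:
        mib = k_cache_bytes(name, **shape) / (1024 * 1024)
        print(f"{name:7s} {bits_per_value(name):4.1f} bits/value  {mib:8.1f} MiB")
```

Under these assumptions, Q5_0 costs 5.5 bits per value, Q5_1 costs 6.0, and IQ4_NL costs 4.5, against 16 for F16. The estimate covers the K cache only and ignores the V cache and any padding.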

src/ggml-cuda.cu
src/ggml-metal.m
src/ggml-metal.metal

Packaging of ggml-org/ggml