git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit


author	Kawrakow <redacted>
	Thu, 21 Mar 2024 07:27:57 +0000 (08:27 +0100)
committer	GitHub <redacted>
	Thu, 21 Mar 2024 07:27:57 +0000 (08:27 +0100)
commit	76aa30a26353f597e4fbe3cf776772ae812af89a
tree	35654d27aa0f3fd656aa5cab1125999c13ae5201
parent	c5b8595e3f4f4ed319ef71c9c9d868d1b7a27626

Add ability to use Q5_0, Q5_1, and IQ4_NL for quantized K cache (#6183)

* k_cache: be able to use Q5_0

* k_cache: be able to use Q5_1 on CUDA

* k_cache: be able to use Q5_0 on Metal

* k_cache: be able to use Q5_1 on Metal

* k_cache: be able to use IQ4_NL - just CUDA for now

* k_cache: be able to use IQ4_NL on Metal

* k_cache: add newly added supported types to llama-bench and CUDA supports_op

---------

Co-authored-by: Iwan Kawrakow <redacted>
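
For a rough sense of what the newly supported K cache types save in memory, here is a minimal sketch, not code from this commit. It assumes ggml's usual 32-value block layouts (Q5_0: fp16 scale + 4 bytes of high bits + 16 bytes of nibbles; Q5_1: the same plus an fp16 minimum; IQ4_NL: fp16 scale + 16 bytes of nibbles). The helper names and the model dimensions in the usage lines are hypothetical.

```python
# Sketch: estimate K cache size per storage type.
# Assumption: block sizes follow ggml's 32-value block layouts.

BLOCK_SIZE = 32  # values per quantization block

# Bytes occupied by one 32-value block of each type.
BLOCK_BYTES = {
    "f16": 64,     # 32 values * 2 bytes, unquantized baseline
    "q5_1": 24,    # fp16 scale + fp16 min + 4 bytes high bits + 16 bytes nibbles
    "q5_0": 22,    # fp16 scale + 4 bytes high bits + 16 bytes nibbles
    "iq4_nl": 18,  # fp16 scale + 16 bytes nibbles (non-linear 4-bit codebook)
}


def bits_per_value(type_name: str) -> float:
    """Average storage cost of one cached value, in bits."""
    return BLOCK_BYTES[type_name] * 8 / BLOCK_SIZE


def k_cache_bytes(type_name: str, n_ctx: int, n_layer: int, n_embd_k: int) -> int:
    """Total K cache size in bytes (hypothetical helper, K tensor only)."""
    n_values = n_ctx * n_layer * n_embd_k
    if n_values % BLOCK_SIZE != 0:
        raise ValueError("value count must be a multiple of the block size")
    return n_values // BLOCK_SIZE * BLOCK_BYTES[type_name]


if __name__ == "__main__":
    # Hypothetical model: 32 layers, K width 4096, context of 4096 tokens.
    for name in BLOCK_BYTES:
        size_mib = k_cache_bytes(name, 4096, 32, 4096) / (1 << 20)
        print(f"{name:7s} {bits_per_value(name):4.1f} bits/value {size_mib:7.1f} MiB")
```

Under these assumptions, Q5_0, Q5_1, and IQ4_NL store the K cache at 5.5, 6.0, and 4.5 bits per value respectively, against 16 for F16.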

common/common.cpp
examples/llama-bench/llama-bench.cpp
ggml-cuda.cu
ggml-metal.m
ggml-metal.metal

Packaging of ggml-org/llama.cpp