git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit


author	Kawrakow <redacted>
	Thu, 8 Jun 2023 19:28:21 +0000 (22:28 +0300)
committer	GitHub <redacted>
	Thu, 8 Jun 2023 19:28:21 +0000 (22:28 +0300)
commit	72ff5282bf0388c60821f504c4c8cc2b1f491aa6
tree	19d6971bdd6934b72a000694f2b1791dadd9f7dc
parent	0bf7cf1b296fc9fca05411b37afdf08a531487d2

metal : add Q2_K implementation (#1762)

* metal : add Q2_K implementation

27.1 ms/token on an M2 Max with a 30-core GPU, so about the
same speed as Q4_0. Memory throughput is ~156 GB/s.

Using the access pattern from the Q2_K CUDA implementation
instead resulted in significantly lower performance
(~31 ms/token).

* Fixing merge conflicts

---------

Co-authored-by: Iwan Kawrakow <redacted>
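
The timings quoted in the message can be cross-checked with a little arithmetic. The sketch below is not part of the commit; the helper names are hypothetical, GB is assumed to mean 10^9 bytes, and the "bytes read per token" figure is only what the two quoted numbers imply together, not a measured model size.

```python
# Back-of-the-envelope check of the numbers quoted in the commit message.
# Helper names are illustrative and do not exist in llama.cpp.

def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to tokens per second."""
    return 1000.0 / ms_per_token


def bytes_read_per_token(ms_per_token: float, throughput_gb_s: float) -> float:
    """Bytes moved per token implied by a latency and a memory throughput.

    Assumes GB means 1e9 bytes (an assumption; the commit does not say).
    """
    return throughput_gb_s * 1e9 * (ms_per_token / 1000.0)


def relative_slowdown(fast_ms: float, slow_ms: float) -> float:
    """Fractional extra time per token of the slower variant."""
    return slow_ms / fast_ms - 1.0


if __name__ == "__main__":
    metal_ms = 27.1        # Q2_K Metal kernel, from the commit message
    cuda_pattern_ms = 31.0  # CUDA-style access pattern, approximate
    throughput = 156.0      # GB/s, approximate

    print(f"{tokens_per_second(metal_ms):.1f} tokens/s (Metal access pattern)")
    print(f"{tokens_per_second(cuda_pattern_ms):.1f} tokens/s (CUDA-style access pattern)")
    print(f"{relative_slowdown(metal_ms, cuda_pattern_ms) * 100:.0f}% more time per token")
    print(f"{bytes_read_per_token(metal_ms, throughput) / 1e9:.2f} GB read per token (implied)")
```

With the quoted figures this works out to roughly 36.9 tokens/s for the Metal kernel, about 32.3 tokens/s for the CUDA-style access pattern (around 14% more time per token), and about 4.2 GB of memory traffic per token.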

ggml-metal.m
ggml-metal.metal

Packaging of ggml-org/llama.cpp