git.djapps.eu Git - pkg/ggml/sources/whisper.cpp/commit
llama: add support for QRWKV6 model architecture (llama/11001)
author Molly Sophia <redacted>
Fri, 10 Jan 2025 01:58:08 +0000 (09:58 +0800)
committer Georgi Gerganov <redacted>
Tue, 14 Jan 2025 08:38:01 +0000 (10:38 +0200)
commit 06209f6683c51b0d8489f46f1d2422b22baf0a6e
tree 4740cf6597e7c2b3828988011804beeb6e4cf5ab
parent c3235bd81e3f5c91660faae0f62306e8920ef1a3

* WIP: Add support for RWKV6Qwen2

Signed-off-by: Molly Sophia <redacted>
* RWKV: Some graph simplification

Signed-off-by: Molly Sophia <redacted>
* Add support for RWKV6Qwen2 with cpu and cuda GLA

Signed-off-by: Molly Sophia <redacted>
* RWKV6[QWEN2]: Concat lerp weights together to reduce cpu overhead

Signed-off-by: Molly Sophia <redacted>
* Fix some typos

Signed-off-by: Molly Sophia <redacted>
* code format changes

Signed-off-by: Molly Sophia <redacted>
* Fix wkv test & add gla test

Signed-off-by: Molly Sophia <redacted>
* Fix cuda warning

Signed-off-by: Molly Sophia <redacted>
* Update README.md

Signed-off-by: Molly Sophia <redacted>
* Update ggml/src/ggml-cuda/gla.cu

Co-authored-by: Georgi Gerganov <redacted>
* Fix fused lerp weights loading with RWKV6

Signed-off-by: Molly Sophia <redacted>
* better sanity check skipping for QRWKV6 in llama-quant

thanks @compilade

Signed-off-by: Molly Sophia <redacted>
Co-authored-by: compilade <redacted>
---------

Signed-off-by: Molly Sophia <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: compilade <redacted>
ggml/include/ggml.h
ggml/src/ggml-cpu/ggml-cpu.c
ggml/src/ggml-cuda/ggml-cuda.cu
ggml/src/ggml-cuda/gla.cu [new file with mode: 0644]
ggml/src/ggml-cuda/gla.cuh [new file with mode: 0644]
ggml/src/ggml-cuda/wkv6.cu
ggml/src/ggml-sycl/wkv6.cpp
ggml/src/ggml-vulkan/ggml-vulkan.cpp
ggml/src/ggml.c