llama : rotate activations for better quantization (#21038)
* llama : rotate activations for better quantization
* cont : rotate V more + refactor
* cont : rotate caches separately + support non-power-of-2 head sizes
* cont : simplify
* cont : add reference for V rotation
* cont : refactor
* cont : support context shift
* cont : consolidate
* cont : dedup + allow different types for the rotation matrix
* cont : add env variable to disable rotation
* cont : simplify attn rot kv cache logic + rename env
* cont : pre-compute the Hadamard matrices
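The commit message only names the technique. The following is a minimal Python sketch (standard library only) of the general idea, under stated assumptions. A normalized Sylvester Hadamard matrix H is orthonormal and symmetric. Rotating both queries and keys by H therefore leaves every q·k score unchanged. Quantizing the rotated vector spreads a single outlier channel across its block, which shrinks the absmax scale and preserves precision for the other channels. Several parts are illustrative assumptions and not taken from the commit:

- `hadamard`, `rotate` and `quant_dequant` are hypothetical names.
- The block-diagonal handling of a non-power-of-2 head size (96 rotated as three 32-wide blocks) is one possible approach, not necessarily the one llama.cpp uses.
- The 4-bit absmax quantizer is a simplified stand-in, not a ggml type.
- The `lru_cache` only echoes the idea of pre-computing the matrices.

```python
import functools
import math
import random


@functools.lru_cache(maxsize=None)
def hadamard(n):
    """Normalized Sylvester Hadamard matrix of order n (n a power of 2).

    Cached so each size is built once. Rows are orthonormal and the matrix
    is symmetric, so H is its own inverse (H @ H = I).
    """
    assert n > 0 and n & (n - 1) == 0, "n must be a power of 2"
    h = [[1.0]]
    while len(h) < n:
        # Sylvester step: H_2m = [[H_m, H_m], [H_m, -H_m]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    s = 1.0 / math.sqrt(n)
    return tuple(tuple(x * s for x in row) for row in h)


def rotate(x, n_rot=None):
    """Hypothetical block-diagonal Hadamard rotation for any head size d.

    Block size defaults to the largest power of 2 dividing d. A block-diagonal
    matrix of orthonormal blocks is orthonormal, so norms and dot products are
    preserved; since each block is symmetric, rotate(rotate(x)) == x.
    """
    d = len(x)
    b = n_rot or (d & -d)
    assert d % b == 0, "block size must divide the head size"
    h = hadamard(b)
    out = []
    for i in range(0, d, b):
        chunk = x[i:i + b]
        out.extend(sum(h[r][c] * chunk[c] for c in range(b)) for r in range(b))
    return out


def quant_dequant(x, bits=4):
    """Symmetric absmax round-to-nearest quantization, returned dequantized.

    Simplified stand-in for illustration only, not a ggml quantization type.
    """
    qmax = 2 ** (bits - 1) - 1
    amax = max(abs(v) for v in x)
    if amax == 0.0:
        return list(x)
    scale = amax / qmax
    return [round(v / scale) * scale for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
d = 96  # not a power of 2: rotated as three 32-wide blocks
q = [rng.gauss(0.0, 1.0) for _ in range(d)]
k = [rng.gauss(0.0, 1.0) for _ in range(d)]
k[5] = 50.0  # one large outlier channel

# 1) Rotating both q and k leaves the attention score unchanged.
score_diff = abs(dot(rotate(q), rotate(k)) - dot(q, k))

# 2) Quantize directly vs. in the rotated basis (then rotate back).
err_plain = mse(k, quant_dequant(k))
err_rot = mse(k, rotate(quant_dequant(rotate(k))))

# 3) Attention output is sum_i p_i * v_i, so rotating every v_i rotates the
#    output by the same matrix; one more rotate() undoes it.
p = [0.2, 0.5, 0.3]
vs = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in p]
vs_rot = [rotate(v) for v in vs]
out = [sum(pi * v[j] for pi, v in zip(p, vs)) for j in range(d)]
out_rot = [sum(pi * v[j] for pi, v in zip(p, vs_rot)) for j in range(d)]
v_diff = max(abs(a - b) for a, b in zip(out, rotate(out_rot)))

print(f"score diff {score_diff:.2e}, mse plain {err_plain:.4f}, "
      f"mse rotated {err_rot:.4f}, v diff {v_diff:.2e}")
```

Because the rotation of V passes through the weighted sum unchanged, in this sketch the inverse rotation can be applied once to the attention output rather than to every cached value. How the commit actually places the inverse rotation is not stated in the message.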