git.djapps.eu Git - pkg/ggml/sources/llama.cpp/shortlog
2026-03-09 Paul Flynn metal : extend mul_mv_ext to BF16, Q2_K, Q3_K (#20250)
2026-03-09 Georgi Gerganov server : fix checkpoints n_tokens calculation (#20287)
2026-03-09 Georgi Gerganov metal : add upscale (#20284)
2026-03-09 Georgi Gerganov server : warn swa-full is not supported for non-SWA...
2026-03-09 Georgi Gerganov server : fix off-by-1 in server_tokens::size_up_to_pos...
2026-03-09 Piotr Wilkin... common: map developer role to system (#20215)
2026-03-09 Georgi Gerganov models : fix assert in mamba2 graph (#20270)
2026-03-09 Georgi Gerganov server : add kill switch when server is stuck (#20277)
2026-03-09 Aman Gupta ggml-cuda: disable gdn for musa (#20278)
2026-03-09 ddh0 llama-quant : left-align tensor names in output (#20117)
2026-03-09 Aman Gupta contributing: limit open PRs for new contributors to...
2026-03-09 Bertay Eren ggml-vulkan: add SGN operator, auto-generate Vulkan...
2026-03-09 Ruben Ortlam vulkan: skip zero size tensors in backend copies (...
2026-03-09 Michael Huang cuda : display total and free VRAM capacity during...
2026-03-09 Aaron Teo llama-bench: introduce `-hf` and `-hff` flags & use...
2026-03-09 Piotr Wilkin... PEG parser for LFM2 (#20251)
2026-03-08 Georgi Gerganov server : do not create checkpoints right after mtmd...
2026-03-08 Sigbjørn Skjæret graph : remove redundant scale_w parameter (#20235)
2026-03-08 Aldehir Rojas common : gracefully handle incomplete output (#20191)
2026-03-08 Piotr Wilkin... Fix compile bug (#20203)
2026-03-08 Piotr Wilkin... Fix structured outputs (#20223)
2026-03-08 GiantPrince ggml-vulkan: Add ELU op support (#20183)
2026-03-08 Jeff Bolz vulkan: Fix data races in coopmat1 mul_mat(_id) (#20084)
2026-03-08 Johannes Gäßler llama: end-to-end tests (#19802)
2026-03-08 Christopher... readme : update infra list (#20212)
2026-03-08 Piotr Wilkin... Revert to OAI-compatible args (#20213)
2026-03-08 decahedron1 server : correct index on finish in OAI completion...
2026-03-08 Neo Zhang [SYCL] supprt Flash Attention for fp32/fp16/Q4/Q5/Q8...
2026-03-07 Aman Gupta ggml: add GATED_DELTA_NET op (#19504)
2026-03-07 lhez opencl: add l2_norm (#20160)
2026-03-07 Piotr Wilkin... Autoparser: True streaming (#20177)
2026-03-06 Piotr Wilkin... Autoparser: add optional argument reshuffle capability...
2026-03-06 Bartowski quants : Add memsets and other fixes for IQ quants...
2026-03-06 Piotr Wilkin... Add @pwilkin to CODEOWNERS for autoparser code (#20174)
2026-03-06 Piotr Wilkin... Autoparser - complete refactoring of parser architectur...
2026-03-06 Todor Boinovski hexagon: add f32 ssm_conv op (#20122)
2026-03-06 Tom Vaucourt server : preserve anthropic thinking blocks in conversi...
2026-03-06 Max Krasnyansky cpu: skip redudant ROPE cache updates (#20149)
2026-03-06 Aman Gupta ggml-cuda: add mem check for fusion (#19916)
2026-03-06 Aaron Teo ggml: update comments for backends which have no memory...
2026-03-06 shalinib-ibm ggml-cpu: Fix gcc 15 ICE on ppc64le (#20083) (#20130)
2026-03-06 Aman Gupta CUDA: use shared mem for ssm_conv (#20128)
2026-03-06 Tim Neumann context: ignore zero scale LoRAs when checking sameness...
2026-03-06 Piotr Wilkin... Checkpoint every n tokens: squash (#20087)
2026-03-06 Aleksander... webui: Agentic Loop + MCP Client with support for Tools...
2026-03-06 Johannes Gäßler ggml-cpu: fix data race for debug asserts (#20148)
2026-03-06 Georgi Gerganov kv-cache : fix M-RoPE checkpoints (#20132)
2026-03-06 Roj234 cli : Don't clear system prompt when using '/clear...
2026-03-06 lhez opencl: add neg, exp and diag (#20127)
2026-03-06 YardenTal44 hexagon: add fp16 support for binary ops: add,sub,mul...
2026-03-05 ymcki models : kda chunk size = 16 (#19827)
2026-03-05 Andreas Kieslinger CUDA: Improve performance via less synchronizations...
2026-03-05 Eric Zhang model : update Qwen3.5 model type detection (#20126)
2026-03-05 Sigbjørn Skjæret cli : add command and file auto-completion (#19985)
2026-03-05 Sigbjørn Skjæret convert : register Qwen 3.5 ForCausalLM for text only...
2026-03-05 Aleksander... webui: Improvements for Models Selector UI (#20066)
2026-03-05 Marcel Petrick chore : correct typos [no ci] (#20041)
2026-03-05 Max Krasnyansky hexagon: Flash Attention optimizations (dma, mpyacc...
2026-03-05 lhez opencl: add `SET`, support i32 for `CPY`, minor refacto...
2026-03-04 Todor Boinovski hexagon: add llama-completion runner script (#20095)
2026-03-04 Nikhil Jain [WebGPU] Fix wait logic for inflight jobs (#20096)
2026-03-04 Masashi Yoshimura Add concat op to webgpu. (#20068)
2026-03-04 Sigbjørn Skjæret tools : add missing clocale include in mtmd-cli [no...
2026-03-04 Johannes Gäßler ggml: fix ggml_is_contiguous_n for ne == 1 (#20092)
2026-03-04 Adrien Gallouët ggml : use a simple std::thread in AMX without OpenMP...
2026-03-04 ddh0 impl : use 6 digits for tensor dims (#20094)
2026-03-04 SamareshSingh Fix locale-dependent float printing in GGUF metadata...
2026-03-04 standby24x7 completion : Fix a typo in warning message (#20082)
2026-03-03 Mickael Desgranges docs: Fix intel documentation link (#20040)
2026-03-03 Charles Xu kleidiai : add sme fp16 compute path for q4_0 gemm...
2026-03-03 shaofeiqi opencl: add optimized q4_1 mm kernel for adreno (#19840)
2026-03-03 Abhijit Ramesh ggml webgpu: fix workgroup dispatch limit for large...
2026-03-02 Nikhil Jain ggml webgpu: Clean up per-thread parameter buffer pool...
2026-03-02 Masashi Yoshimura ggml-webgpu: Support non-contiguous `src0` and overlapp...
2026-03-02 Ruben Ortlam vulkan: tune MMVQ for Intel Windows (#19988)
2026-03-02 Adrien Gallouët scripts : improve get-wikitext-2.sh (#19952)
2026-03-02 Aaron Teo ggml-cpu: optimise s390x multiply extend instructions...
2026-03-01 Ruben Ortlam vulkan: improve partial offloading performance on AMD...
2026-03-01 oobabooga cuda: cap grid.y at 65535 in non-contiguous dequantize...
2026-02-28 Dmitry Atamanov vendors : update miniaudio library to 0.11.24 (#19914)
2026-02-28 Adrien Gallouët vendor : update cpp-httplib to 0.35.0 (#19969)
2026-02-28 Bartowski tests : model metadata loading from huggingface (#19796)
2026-02-27 Jayant Lohia CUDA: add CDNA3 MFMA support for flash attention MMA...
2026-02-27 Roj234 server: Add pragma once to server-context.h (#19944)
2026-02-27 Sami Kama server: Mirroring /v1/responses to /responses to match...
2026-02-27 Daniel Bevenius ci : use ubuntu-latest for gguf-publish workflow (...
2026-02-27 Aman Gupta ggml-cpu: add repack for mxfp4 (#19738)
2026-02-27 Daniel Bevenius gguf-py : dump version to 0.18.0 (#19950) gguf-v0.18.0
2026-02-27 Pascal server : support multiple model aliases via comma-separ...
2026-02-27 Jan Patrick... tests : enable test-chat out of tree build (#19558)
2026-02-27 Neo Zhang replace the magic nunber 768 by max work group size...
2026-02-27 Vishal Singh ggml-zendnn: update code for latest ZenDNN API (#19923)
2026-02-26 Adrien Gallouët ggml : fix AMX and add batched support (#19925)
2026-02-26 Ruben Ortlam vulkan: fix fp16 Flash Attention on Windows AMD RDNA2...
2026-02-26 Georgi Gerganov mtmd : fix padding of n_tokens (#19930)
2026-02-26 Georgi Gerganov server : fix ctx checkpoint restore logic (#19924)
2026-02-26 Georgi Gerganov kv-cache : fix can_shift() check to take into account...
2026-02-26 Aman Gupta llama: Add option to merge gate and exp weights (#19139)
2026-02-26 Kevin Pouget ggml-virtgpu: improve the reliability of the code ...
2026-02-26 drrros server: fix load-on-startup not respected in ini file...