git.djapps.eu Git - pkg/ggml/sources/llama.cpp/shortlog
pkg/ggml/sources/llama.cpp
2026-01-30  Daniel Bevenius  memory : clarify comments for r_l and s_l tensors ...
2026-01-30  Georgi Gerganov  tests : add GQA=20 FA test (#19095)
2026-01-30  Daniel Bevenius  convert : add missing return statement for GraniteMoeMo...
2026-01-30  Daniel Bevenius  memory : remove unused tmp_buf (#19199)
2026-01-30  Antonis Makropoulos  docs: Add LlamaLib to UI projects (#19181)
2026-01-30  bssrdf  add tensor type checking as part of cuda graph properti...
2026-01-30  s8322  sycl: implement GGML_UNARY_OP_SOFTPLUS (#19114)
2026-01-30  RachelMantel  sycl: implement GGML_OP_TRI (#19089)
2026-01-30  DDXDB  Fix typos in SYCL documentation (#19162)
2026-01-29  Zheyuan Chen  ggml-webgpu: improve flastAttention performance by...
2026-01-29  Todor Boinovski  hexagon: enable offloading to Hexagon on Windows on...
2026-01-29  Georgi Gerganov  cuda : fix nkvo, offload and cuda graph node properties...
2026-01-29  Aldehir Rojas  chat : add parsing for solar-open-100b (#18540)
2026-01-29  Andrew Marshall  webui: Update Svelte to fix effect_update_depth_exceede...
2026-01-29  Sigbjørn Skjæret  jinja : do not pass empty tools and add some none filte...
2026-01-29  yulo  HIP: add mmf for CDNA (#18896)
2026-01-29  Georgi Gerganov  arg : add -kvu to llama-batched-bench (#19172)
2026-01-29  Vishal Singh  ggml-zendnn : resolve ZenDNN backend cross-module symbo...
2026-01-29  Aman Gupta  CUDA: refactor topk-moe to enable more models (GLM...
2026-01-29  Neo Zhang  sycl: fix norm kernels: l2_norm, group_norm, rms_norm...
2026-01-28  Sigbjørn Skjæret  ci : find latest release with asset for winget (#19161)
2026-01-28  Ruben Ortlam  Vulkan Flash Attention Coopmat1 Refactor (#19075)
2026-01-28  Sascha Rogmann  spec : add self‑speculative decoding (no draft model...
2026-01-28  Daniel Bevenius  convert : yield Mamba2Model/GraniteMoeModel modify_tens...
2026-01-28  Patryk Kaminski  ggml-sycl: remove unused syclcompat header (#19140)
2026-01-28  Sigbjørn Skjæret  jinja : undefined should be treated as sequence/iterabl...
2026-01-28  Oleksandr Kuvshynov  vulkan: handle device dedup on MacOS + Vega II Duo...
2026-01-28  Ben Chen  doc: add build instruction to use Vulkan backend on...
2026-01-28  Kevin Pouget  ggml: new backend for Virglrenderer API Remoting accele...
2026-01-28  Alberto Cabrera...  ggml-cpu: arm64: Q4_K scale unroll and vectorization...
2026-01-28  Georgi Gerganov  cuda : fix "V is K view" check for non-unified KV cache...
2026-01-28  Georgi Gerganov  CUDA: tune GLM 4.7 Flash FA kernel selection logic...
2026-01-28  Georgi Gerganov  server : adjust spec tests to generate up to 16 tokens...
2026-01-28  Georgi Gerganov  llama : disable Direct IO by default (#19109)
2026-01-28  Daniel Bevenius  sampling : remove sampling branching in output_reserve...
2026-01-28  Nikhil Jain  ggml webgpu: Split shared state (webgpu_context) into...
2026-01-27  Vishal Singh  ggml-zendnn : update ZenDNN git tag to main branch...
2026-01-27  Sigbjørn Skjæret  jinja : implement mixed type object keys (#18955)
2026-01-27  David Lima  docs: Remove duplicated word on CUDA build section...
2026-01-27  Johannes Gäßler  CUDA: tune GLM 4.7 Flash FA kernel selection logic...
2026-01-27  Sigbjørn Skjæret  ci : revert slim runner for winget (#19129)
2026-01-27  Alberto Cabrera...  ggml-cpu: aarm64: q6_K repack gemm and gemv (and generi...
2026-01-27  Gaurav Garg  [CUDA] Reduce CPU-side stalls due to the CUDA command...
2026-01-27  Daniel Bevenius  common : clarify HTTPS build options in error message...
2026-01-27  shalinib-ibm  ggml-cpu: Enable FP16 MMA kernels on PPC (#19060)
2026-01-27  lhez  opencl: add flattened q6_K mv (#19054)
2026-01-26  Johannes Gäßler  CUDA: fix padding of GQA to power of 2 in FA (#19115)
2026-01-26  Georgi Gerganov  graph : fix nkvo offload with FA (#19105)
2026-01-26  Sigbjørn Skjæret  ci : use new 1vCPU runner for lightweight jobs (#19107)
2026-01-26  Georgi Gerganov  model : add correct type for GLM 4.7 Flash (#19106)
2026-01-25  Johannes Gäßler  CUDA: faster FA for GQA > 1 but not power of 2 (#19092)
2026-01-25  ccbinn  metal : fix recommendedMaxWorkingSetSize availability...
2026-01-25  Sigbjørn Skjæret  convert : yield Gemma3N custom_map tensors directly...
2026-01-25  Aman Gupta  ggml-cpu: Use tiled FA for prompt-processing (#19012)
2026-01-25  Georgi Gerganov  kv-cache : support V-less cache (#19067)
2026-01-25  Sigbjørn Skjæret  convert : fix Gemma3N, GraniteMoe and Ernie4.5Moe ...
2026-01-25  Georgi Gerganov  completion : fix prompt cache for recurrent models...
2026-01-25  Molly Sophia  readme: update RWKV7 model links (#19061)
2026-01-25  Jakkala Mahesh  llama: fix integer type consistency in split helpers...
2026-01-25  Daniel Bevenius  common : use two decimal places for float arg help...
2026-01-25  Bartowski  convert : fix conversion for inheriting models that...
2026-01-24  Johannes Gäßler  llama-fit-params: keep explicit --ctx-size 0 (#19070)
2026-01-24  Johannes Gäßler  GGUF: check that tensor size is representable (#19072)
2026-01-24  Xuan-Son Nguyen  chat: fix language input for translategemma (#19052)
2026-01-24  Johannes Gäßler  CUDA: re-use MLA K data for V in MMA FA (#19057)
2026-01-24  Aman Gupta  ggml-cuda: enable cuda-graphs for `n-cpu-moe` (#18934)
2026-01-24  nullname  ggml-hexagon: flash-attn opt (#19025)
2026-01-23  Georgi Gerganov  graph : utilize `ggml_build_forward_select()` to avoid...
2026-01-23  Neo Zhang  [SYCL] use malloc to support both iGPU and dGPU in...
2026-01-23  Xuan-Son Nguyen  chat : fix translategemma crash on common_chat_format_e...
2026-01-23  Daniel Bevenius  model-conversion : use BUILD_DIR variable in all script...
2026-01-23  Alberto Cabrera...  ggml-cpu: aarm64: q5_K repack gemm and gemv (and generi...
2026-01-23  Aldehir Rojas  cli : load parser definition (#19031)
2026-01-22  Xuan-Son Nguyen  server : support preserving reasoning_content in assist...
2026-01-22  Georgi Gerganov  mla : make the V tensor a view of K (#18986)
2026-01-22  Johannes Gäßler  CUDA: fix alignment check for FA (#19023)
2026-01-22  Aman Gupta  convert_hf_to_gguf.py: refactor modify_tensors to call...
2026-01-22  lhez  opencl: enable the general fp mm for non-cont input...
2026-01-22  Xuan-Son Nguyen  server: do not log certain endpoints (avoid log spam...
2026-01-22  Georgi Gerganov  quant : manual overrides of tensor types take precedenc...
2026-01-22  Aaron Teo  release: update github api (#19022)
2026-01-22  Xuan-Son Nguyen  mtmd : update docs to use llama_model_n_embd_inp (...
2026-01-22  손희준  server: Reorder methods in `server-task.cpp` (#19016)
2026-01-22  Aman Gupta  CUDA: add gqa_ratio 4 for GLM 4.7 flash (#18953)
2026-01-22  shaofeiqi  opencl: add TRI op support (#18979)
2026-01-22  Aleksei Nikiforov  ggml-zdnn : mark zDNN buffers as non-host (#18967)
2026-01-21  Pádraic Slattery  ci : update GitHub Actions versions [no ci] (#18935)
2026-01-21  Mariusz Woloszyn  convert : add Devstral-2 (Ministral3ForCausalLM) arch...
2026-01-21  Piotr Wilkin...  jinja: support none|string (#18995)
2026-01-21  Hendrik Erz  fix: Use `tabular-nums` for chat message statistics...
2026-01-21  Daniel Bevenius  llama : clarify nemotron-h.cpp comment about RoPE ...
2026-01-21  Jeff Bolz  vulkan: Remove transfer_ctx, do everything in compute_c...
2026-01-21  Adrien Gallouët  common : improve error message when HTTPS is missing...
2026-01-21  손희준  server: /v1/responses (partial) (#18486)
2026-01-21  Jeff Bolz  vulkan: support flash attention GQA/split_k with small...
2026-01-21  Masato Nakasaka  Revert "vulkan: force full subgroups for flash attentio...
2026-01-21  Jeff Bolz  vulkan: Use mul_mat_vec_id for small values of n (...
2026-01-21  Tarek Dakhran  memory : add llama_memory_hybrid_iswa (#18601)
2026-01-21  Piotr Wilkin...  Fix GLM 4.7 Lite MoE gating func (#18980)
2026-01-21  Matthieu Coudron  gguf: display strerrno when cant load a model (#18884)