| 2026-02-07 | Georgi Gerganov | metal : add diag (llama/19330) |
| 2026-02-07 | Oleksandr Kuvshynov | vulkan: fix GPU deduplication logic. (llama/19222) |
| 2026-02-07 | Jeff Bolz | vulkan: Set k_load_shmem to false when K is too large... |
| 2026-02-07 | Jeff Bolz | vulkan: fix non-contig rope (llama/19299) |
| 2026-02-07 | will-lms | metal : add missing includes (llama/19348) |
| 2026-02-07 | Georgi Gerganov | tests : add non-cont, inplace rope tests (llama/19296) |
| 2026-02-07 | Kevin Pouget | ggml-virtgpu: make the code thread safe (llama/19204) |
| 2026-02-07 | Aman Gupta | ggml-cpu: use LUT for converting e8->f32 scales on... |
| 2026-02-07 | Georgi Gerganov | metal : add solve_tri (llama/19302) |
| 2026-02-07 | Ruben Ortlam | vulkan: disable coopmat1 fa on Nvidia Turing (llama... |
| 2026-02-07 | Aman Gupta | CUDA: use mmvq for mul-mat-id for small batch sizes... |
| 2026-02-07 | Georgi Gerganov | metal : minor cleanup (llama/19251) |
| 2026-02-07 | Oliver Simons | CUDA: Fix loop unrolling for BW in mul_mat_q_stream_k_f... |
| 2026-02-07 | George | ggml: added cleanups in ggml_quantize_free (llama/19278) |
| 2026-02-07 | Gaurav Garg | cuda : revert CUDA_SCALE_LAUNCH_QUEUES override until... |
| 2026-02-07 | lhez | opencl: refactor some ops, concat, repeat, tanh and... |
| 2026-02-07 | Aman Gupta | ggml-cpu: FA split across kv for faster TG (llama/19209) |
| 2026-02-07 | Neo Zhang | Remove support for Nvidia & AMD GPU, because the oneAPI... |
| 2026-02-07 | Tamar | sycl: implement GGML_OP_TOP_K (llama/19242) |
| 2026-02-07 | Georgi Gerganov | metal : support virtual devices (llama/18919) |
| 2026-02-07 | Johannes Gäßler | ggml-backend: fix async set/get fallback sync (llama... |
| 2026-02-07 | Christian Kastner | docs : Minor cleanups (llama/19252) |
| 2026-02-07 | Nikhil Jain | Remove pipeline cache mutexes (llama/19195) |
| 2026-02-07 | Max Krasnyansky | Bump cmake max version (needed for Windows on Snapdrago... |
| 2026-02-07 | nullname | ggml-hexagon: flash-attention and reduce-sum optimizati... |
| 2026-02-07 | shaofeiqi | opencl: add optimized q8_0 mm kernel for adreno (llama... |
| 2026-02-07 | Simon Redman | Correctly fetch q8_1 quantize pipeline in test as neede... |
| 2026-02-07 | Georgi Gerganov | tests : add GQA=20 FA test (llama/19095) |
| 2026-02-07 | Georgi Gerganov | ci : remove "Release" word from the title of the release |
| 2026-02-07 | Georgi Gerganov | ggml : bump version to 0.9.6 (#1423) v0.9.6 |
| 2026-01-30 | Georgi Gerganov | cmake : remove unused file (#1419) |
| 2026-01-30 | Georgi Gerganov | sync : whisper.cpp |
| 2026-01-30 | Georgi Gerganov | cuda : fix compile warnings (whisper/0) |
| 2026-01-30 | Georgi Gerganov | sync : llama.cpp |
| 2026-01-30 | bssrdf | add tensor type checking as part of cuda graph properti... |
| 2026-01-30 | s8322 | sycl: implement GGML_UNARY_OP_SOFTPLUS (llama/19114) |
| 2026-01-30 | RachelMantel | sycl: implement GGML_OP_TRI (llama/19089) |
| 2026-01-30 | Zheyuan Chen | ggml-webgpu: improve flastAttention performance by... |
| 2026-01-30 | Todor Boinovski | hexagon: enable offloading to Hexagon on Windows on... |
| 2026-01-30 | Georgi Gerganov | cuda : fix nkvo, offload and cuda graph node properties... |
| 2026-01-30 | yulo | HIP: add mmf for CDNA (llama/18896) |
| 2026-01-30 | Vishal Singh | ggml-zendnn : resolve ZenDNN backend cross-module symbo... |
| 2026-01-30 | Aman Gupta | CUDA: refactor topk-moe to enable more models (GLM... |
| 2026-01-30 | Neo Zhang | sycl: fix norm kernels: l2_norm, group_norm, rms_norm... |
| 2026-01-30 | Ruben Ortlam | Vulkan Flash Attention Coopmat1 Refactor (llama/19075) |
| 2026-01-30 | Patryk Kaminski | ggml-sycl: remove unused syclcompat header (llama/19140) |
| 2026-01-30 | Oleksandr Kuvshynov | vulkan: handle device dedup on MacOS + Vega II Duo... |
| 2026-01-30 | Kevin Pouget | ggml: new backend for Virglrenderer API Remoting accele... |
| 2026-01-30 | Alberto Cabrera... | ggml-cpu: arm64: Q4_K scale unroll and vectorization... |
| 2026-01-30 | Georgi Gerganov | cuda : fix "V is K view" check for non-unified KV cache... |
| 2026-01-30 | Georgi Gerganov | CUDA: tune GLM 4.7 Flash FA kernel selection logic... |
| 2026-01-30 | Nikhil Jain | ggml webgpu: Split shared state (webgpu_context) into... |
| 2026-01-30 | Vishal Singh | ggml-zendnn : update ZenDNN git tag to main branch... |
| 2026-01-30 | Johannes Gäßler | CUDA: tune GLM 4.7 Flash FA kernel selection logic... |
| 2026-01-30 | Alberto Cabrera... | ggml-cpu: aarm64: q6_K repack gemm and gemv (and generi... |
| 2026-01-30 | Gaurav Garg | Reduce CPU-side stalls due to the CUDA command buffer... |
| 2026-01-30 | shalinib-ibm | ggml-cpu: Enable FP16 MMA kernels on PPC (llama/19060) |
| 2026-01-30 | lhez | opencl: add flattened q6_K mv (llama/19054) |
| 2026-01-30 | Georgi Gerganov | sync : llama.cpp |
| 2026-01-30 | Johannes Gäßler | CUDA: fix padding of GQA to power of 2 in FA (llama... |
| 2026-01-30 | Johannes Gäßler | CUDA: faster FA for GQA > 1 but not power of 2 (llama... |
| 2026-01-30 | ccbinn | metal : fix recommendedMaxWorkingSetSize availability... |
| 2026-01-30 | Aman Gupta | ggml-cpu: Use tiled FA for prompt-processing (llama... |
| 2026-01-30 | Georgi Gerganov | kv-cache : support V-less cache (llama/19067) |
| 2026-01-30 | Johannes Gäßler | CUDA: re-use MLA K data for V in MMA FA (llama/19057) |
| 2026-01-30 | Aman Gupta | ggml-cuda: enable cuda-graphs for `n-cpu-moe` (llama... |
| 2026-01-30 | nullname | ggml-hexagon: flash-attn opt (llama/19025) |
| 2026-01-30 | Neo Zhang | use malloc to support both iGPU and dGPU in same time... |
| 2026-01-30 | Alberto Cabrera... | ggml-cpu: aarm64: q5_K repack gemm and gemv (and generi... |
| 2026-01-30 | Georgi Gerganov | mla : make the V tensor a view of K (llama/18986) |
| 2026-01-30 | Johannes Gäßler | CUDA: fix alignment check for FA (llama/19023) |
| 2026-01-30 | lhez | opencl: enable the general fp mm for non-cont input... |
| 2026-01-30 | Aman Gupta | CUDA: add gqa_ratio 4 for GLM 4.7 flash (llama/18953) |
| 2026-01-30 | shaofeiqi | opencl: add TRI op support (llama/18979) |
| 2026-01-30 | Aleksei Nikiforov | ggml-zdnn : mark zDNN buffers as non-host (llama/18967) |
| 2026-01-30 | Jeff Bolz | vulkan: Remove transfer_ctx, do everything in compute_c... |
| 2026-01-30 | Jeff Bolz | vulkan: support flash attention GQA/split_k with small... |
| 2026-01-30 | Masato Nakasaka | Revert "vulkan: force full subgroups for flash attentio... |
| 2026-01-30 | Jeff Bolz | vulkan: Use mul_mat_vec_id for small values of n (llama... |
| 2026-01-30 | Oliver Simons | CUDA: Fix builds for older CCCL versions by ifdefing... |
| 2026-01-30 | Oliver Simons | CUDA: Replace init_offsets kernel with iterators in... |
| 2026-01-30 | Adrien Gallouët | ggml : cleanup path_str() (llama/18928) |
| 2026-01-30 | Georgi Gerganov | metal : enable FA for MLA heads (llama/18950) |
| 2026-01-30 | Georgi Gerganov | ggml : add ggml_build_forward_select (llama/18550) |
| 2026-01-30 | lhez | opencl: fix q6_K mv for m=1 (llama/18893) |
| 2026-01-30 | Georgi Gerganov | sync : llama.cpp |
| 2026-01-30 | Reese Levine | ggml webgpu: support for backend sampling (llama/18880) |
| 2026-01-30 | Georgi Gerganov | sync : llama.cpp |
| 2026-01-30 | Thore Koritzius | ggml : extend ggml_pool_1d + metal (llama/16429) |
| 2026-01-30 | Perry Naseck | ggml-blas: hide warnings from included BLAS headers... |
| 2026-01-30 | Raul Torres | CANN: Remove unused `ggml_cann_get_device` function... |
| 2026-01-30 | Chenguang Li | CANN: fix an issue where get_env was not fully renamed... |
| 2026-01-30 | hipudding | CANN: support gated linear attn (llama/18653) |
| 2026-01-30 | shaofeiqi | OpenCL: add SOLVE_TRI op support (llama/18846) |
| 2026-01-30 | Georgi Gerganov | cuda : print less debug logs when disabling cuda graphs... |
| 2026-01-30 | Johannes Gäßler | CUDA: fix allignment on register spill for FA (llama... |
| 2026-01-30 | shalinib-ibm | ggml-cpu: optimize ggml_vec_dot_bf16 for Power9 (llama... |
| 2026-01-30 | Georgi Gerganov | sync : llama.cpp |
| 2026-01-30 | Max Krasnyansky | hexagon: support for OP_CPY, host buffers now optional... |
| 2026-01-30 | Georgi Gerganov | sync : llama.cpp |