| 2025-11-01 |
Aman Gupta | CUDA: topk-moe: add optional parameter for gpt-oss... |
| 2025-11-01 |
Johannes Gäßler | CUDA: better error for FA kernel with 0 occupancy ... |
| 2025-10-29 |
Jeff Bolz | Rewrite simple-backend to use sched and ggml_backend_lo... |
| 2025-10-22 |
Georgi Gerganov | sync : whisper.cpp |
| 2025-10-21 |
Georgi Gerganov | sync : llama.cpp |
| 2025-10-21 |
Aman Gupta | ggml: add ggml_can_fuse_subgraph (llama/16662) |
| 2025-10-21 |
lhez | opencl: fix warnings and clean up profiling (llama... |
| 2025-10-21 |
Jeff Bolz | vulkan: Handle FA with all -inf mask values (llama... |
| 2025-10-21 |
YehuditE | sycl : add PAD_REFLECT_D1 operator support (llama/16145) |
| 2025-10-21 |
Diego Devesa | ggml-alloc : fix leak when reusing a tensor with a... |
| 2025-10-21 |
safranowith | SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary... |
| 2025-10-21 |
Aaron Teo | ci : fix binaries release failure for s390x (binaries... |
| 2025-10-21 |
Johannes Gäßler | HIP: fix GPU_TARGETS (llama/16642) |
| 2025-10-21 |
Jeff Bolz | vulkan: Implement topk_moe fused shader, ported from... |
| 2025-10-21 |
Aman Gupta | CUDA: use registers instead of smem in topk-moe (llama... |
| 2025-10-21 |
Shawn Gu | opencl: transposed gemm/gemv moe kernel with mxfp4... |
| 2025-10-21 |
Radoslav Gerganov | rpc : report actual free memory (llama/16616) |
| 2025-10-21 |
Giuseppe Scrivano | vulkan: Add State Space Model (SSM) Operations Support... |
| 2025-10-21 |
muggle-stack | ggml : fix SpaceMit IME array out-of-bounds in task... |
| 2025-10-21 |
Jeff Bolz | vulkan: fix debug build (add_rms_len/data not found... |
| 2025-10-21 |
Ilia Ilmer | metal : add `CONV_TRANSPOSE_2D` (llama/16542) |
| 2025-10-21 |
GittyBurstein | SYCL SET operator optimized for F32 tensors (llama... |
| 2025-10-21 |
GittyBurstein | sycl : add ARANGE operator (llama/16362) |
| 2025-10-21 |
Chenguang Li | CANN: format code using .clang-format (llama/15863) |
| 2025-10-21 |
takuya kodama | ggml-cpu: replace putenv with setenv for const-correctn... |
| 2025-10-21 |
yael-works | SYCL: Add GGML_OP_MEAN operator support (llama/16009) |
| 2025-10-21 |
safranowith | cpu : add FLOOR, CEIL, ROUND and TRUNC unary operators... |
| 2025-10-21 |
lhez | opencl: add q8_0 mm support (llama/16469) |
| 2025-10-21 |
lhez | opencl: fix FA for f32 (llama/16584) |
| 2025-10-21 |
Sam/Samuel | metal: optimise `GGML_OP_SUM` (llama/16559) |
| 2025-10-21 |
Julius Tischbein | CUDA: Changing the CUDA scheduling strategy to spin... |
| 2025-10-21 |
Georgi Gerganov | metal : avoid using Metal's gpuAddress property (llama... |
| 2025-10-14 |
Georgi Gerganov | sync : llama.cpp upstream/latest upstream/0.9.4.58 |
| 2025-10-14 |
SavicStefan | vulkan: Add ACC_TYPE_VEC2 implementation (llama/16203) |
| 2025-10-14 |
Aman Gupta | CUDA + openCL: fix bug in accessing rms_norm->src while... |
| 2025-10-14 |
Jeff Bolz | vulkan: Support FA with K/V in F32 (llama/16543) |
| 2025-10-14 |
Jeff Bolz | vulkan: Improve build time for MSVC (llama/16545) |
| 2025-10-14 |
Johannes Gäßler | CUDA: enable FA for FP32 KV cache (llama/16546) |
| 2025-10-14 |
Aman Gupta | CUDA: use fastdiv + ggml_cuda_mad for mmvf (llama/16557) |
| 2025-10-14 |
Aman Gupta | CUDA: add fp kernel for larger batch size MoE (llama... |
| 2025-10-14 |
Anav Prasad | cuda : remove legacy copy-op pointer indirection code... |
| 2025-10-14 |
Georgi Gerganov | metal : FA support F32 K and V and head size = 32 ... |
| 2025-10-14 |
lhez | opencl: fix build targeting CL 2 (llama/16554) |
| 2025-10-14 |
Johannes Gäßler | CUDA: fix numerical issues in tile FA kernel (llama... |
| 2025-10-14 |
Jie Fu (傅杰) | ggml : fix build broken with -march=armv9-a on MacOS... |
| 2025-10-14 |
Chenguang Li | CANN: fix CPU memory leak in CANN backend (llama/16549) |
| 2025-10-14 |
Sam/Samuel | metal: add support for opt_step_sgd (llama/16539) |
| 2025-10-14 |
Georgi Gerganov | ggml : fix scalar path for computing norm (llama/16558) |
| 2025-10-14 |
hipudding | CANN: Update several operators to support FP16 data... |
| 2025-10-14 |
Sam/Samuel | metal : add opt_step_adamw and op_sum (llama/16529) |
| 2025-10-14 |
Neo Zhang Jianyu | fix UT fault cases: count-equal, argsort, pad OPs ... |
| 2025-10-14 |
sirus20x6 | ggml : Fix FP16 ELU positive branch (llama/16519) |
| 2025-10-14 |
sirus20x6 | ggml: Correct SVE implementation in ggml_vec_dot_f16_un... |
| 2025-10-14 |
Johannes Gäßler | CUDA: faster tile FA, add oob checks, more HSs (llama... |
| 2025-10-12 |
Georgi Gerganov | sync : llama.cpp |
| 2025-10-12 |
Georgi Gerganov | metal : fix mul-mm condition + fix mul-mv permuted... |
| 2025-10-12 |
Diego Devesa | cuda : avoid initializing unused devices (llama/16510) |
| 2025-10-12 |
Prajwal B Mehendarkar | cmake : Dont define XOPENSOURCE on AIX (llama/16481) |
| 2025-10-12 |
duduta | cpu : optimize the ggml NORM operation (llama/15953) |
| 2025-10-12 |
Chenguang Li | CANN: Improve ACL graph matching (llama/16166) |
| 2025-10-12 |
Charles Xu | kleidiai: kernel interface refactoring (llama/16460) |
| 2025-10-12 |
Neo Zhang Jianyu | refactor soft_max, add soft_max_back (llama/16472) |
| 2025-10-12 |
ai-fonsi | Disable CUDA host buffers on integrated GPUs (llama... |
| 2025-10-12 |
Georgi Gerganov | metal : mark FA blocks (llama/16372) |
| 2025-10-12 |
Reese Levine | ggml webgpu: profiling, CI updates, reworking of comman... |
| 2025-10-12 |
Georgi Gerganov | metal : add support for non-padded FA KV (llama/16148) |
| 2025-10-12 |
Georgi Gerganov | tests : add -INF blocks to the KQ mask in the FA tests... |
| 2025-10-12 |
Georgi Gerganov | metal : various optimizations + refactoring (llama... |
| 2025-10-12 |
Georgi Gerganov | ggml : fix unaligned access in AMX code (llama/16315) |
| 2025-10-12 |
Daniel Bevenius | ggml-cpu : fix leftover handling in ggml_vec_scale_f32... |
| 2025-10-12 |
Reese Levine | ggml webgpu: actually add softmax, fix rms_norm offset... |
| 2025-10-12 |
Eve | vulkan: use a more appropriate amount of threads when... |
| 2025-10-12 |
Radoslav Gerganov | rpc : check src buffer when copying tensor (llama/16421) |
| 2025-10-12 |
Radoslav Gerganov | rpc : add support for multiple devices (llama/16276) |
| 2025-10-12 |
Georgi Gerganov | sync : llama.cpp |
| 2025-10-12 |
Acly | vulkan : incremental shader builds (llama/16341) |
| 2025-10-12 |
Georgi Gerganov | sync : llama.cpp |
| 2025-10-12 |
Georgi Gerganov | metal : fix loop bound in ggml_mem_ranges (llama/16412) |
| 2025-10-12 |
Acly | ggml : fix graph reallocation with multiple chunks... |
| 2025-10-12 |
Jeff Bolz | vulkan: Replace uses of maxMemoryAllocationSize and... |
| 2025-10-12 |
Jeff Bolz | vulkan: Fix FA coopmat1 invalid array indexing (llama... |
| 2025-10-12 |
Jeff Bolz | vulkan: in flash attention, bounds check against nem1... |
| 2025-10-12 |
Reese Levine | ggml webgpu: add support for soft_max, optimize rms_nor... |
| 2025-10-12 |
Piotr Wilkin... | model : Apertus model implementation (llama/15852) |
| 2025-10-12 |
R0CKSTAR | musa: update compile flags (llama/16265) |
| 2025-10-12 |
uvos | HIP: Disable ROCWMMA fattn on CDNA when compiled agains... |
| 2025-10-12 |
Eve | vulkan: make ggml_vk_default_dispatcher support older... |
| 2025-10-12 |
lhez | opencl: support pad_ext (llama/15888) |
| 2025-10-12 |
Reese Levine | ggml webgpu: support for rope,div,sub,glu,scale,cont... |
| 2025-10-12 |
lhez | opencl: support ne3 in get_rows (llama/15866) |
| 2025-09-30 |
Georgi Gerganov | ggml : bump version to 0.9.4 (#1363) upstream/0.9.4 v0.9.4 |
| 2025-09-30 |
Georgi Gerganov | sync : whisper.cpp [no ci] |
| 2025-09-30 |
Georgi Gerganov | sync : llama.cpp |
| 2025-09-30 |
anavp-nvidia | cuda : Enable CUDA Graph usage for Nemotron Nano v2... |
| 2025-09-30 |
Georgi Gerganov | metal : dynamic simdgroups for MV kernels (llama/16340) |
| 2025-09-30 |
Charles Xu | kleidiai : fix work size and threads sync for fp16... |
| 2025-09-30 |
Jeff Bolz | tests: override test_set_rows::max_nmse_err to allow... |
| 2025-09-29 |
Georgi Gerganov | sync : llama.cpp |
| 2025-09-29 |
alex-spacemit | ggml: riscv: add riscv spacemit backend (llama/15288) |
| 2025-09-29 |
Rafal Lewczuk | ggml-backend : add root cause in error message if loadi... |