| 2025-11-01 |
YaelLogic | sycl: add RMS_NORM_BACK operation support (llama/16808) |
| 2025-11-01 |
YaelGitAccount | cuda: add SET operation support (llama/16804) |
| 2025-11-01 |
l3utterfly | initialise buffer.device in ggml_hexagon_session (llama... |
| 2025-11-01 |
Chenguang Li | CANN: Improve device ID handling and aclnnArange checks... |
| 2025-11-01 |
Aman Gupta | CUDA: add unused vars to mmvf and mmvq (llama/16807) |
| 2025-11-01 |
tamarPal | sycl: add SSM_CONV operation support (llama/16800) |
| 2025-11-01 |
Acly | ggml : fix interpolate with align-corners and ne=1... |
| 2025-11-01 |
Johannes Gäßler | HIP: fix AMDGPU_TARGETS, update documentation (llama... |
| 2025-11-01 |
Aman Gupta | test-backend-ops: print failed tests at the end (llama... |
| 2025-11-01 |
tamarPal | sycl: add ROLL operation support (llama/16665) |
| 2025-11-01 |
shani-f | sycl: add REPEAT_BACK operation support (llama/16734) |
| 2025-11-01 |
Aman Gupta | CUDA: support for weight clamp in top-k norm (llama... |
| 2025-11-01 |
Acly | ggml-alloc : make gallocr prefer chunks that allow... |
| 2025-11-01 |
Sigbjørn Skjæret | cuda : use fast copy when src and dst are of different... |
| 2025-11-01 |
leejet | ggml: fix cuda kernel launch configuration for k_comput... |
| 2025-11-01 |
Aman Gupta | CUDA: General GEMV fusion (llama/16715) |
| 2025-11-01 |
Gilad S. | vulkan: deduplicate Microsoft Direct3D12 devices (llama... |
| 2025-11-01 |
Giuseppe Scrivano | vulkan: delete dead code (llama/16732) |
| 2025-11-01 |
Jeff Bolz | vulkan: Optimize SSM_SCAN (llama/16645) |
| 2025-11-01 |
leejet | ggml: fix CUDA grid launch condition for large block_nu... |
| 2025-11-01 |
Aman Gupta | CUDA: use CUB for arbitary size argsort (llama/16754) |
| 2025-11-01 |
Aman Gupta | ggml-cuda: use passed ops instead of hardcoded ops... |
| 2025-11-01 |
Matthew Michel | sycl: use async memory allocation to fix crashes during... |
| 2025-11-01 |
Max Krasnyansky | Add experimental ggml-hexagon backend for the Hexagon... |
| 2025-11-01 |
Diego Devesa | Revert "ggml : Leverage the existing GGML_F32_VEC helpe... |
| 2025-11-01 |
sirus20x6 | ggml : Leverage the existing GGML_F32_VEC helpers to... |
| 2025-11-01 |
Aman Gupta | CUDA: fix bug in topk-moe softmax (llama/16711) |
| 2025-11-01 |
Aman Gupta | CUDA: topk-moe: add optional parameter for gpt-oss... |
| 2025-11-01 |
Johannes Gäßler | CUDA: better error for FA kernel with 0 occupancy ... |
| 2025-10-29 |
Jeff Bolz | Rewrite simple-backend to use sched and ggml_backend_lo... |
| 2025-10-22 |
Georgi Gerganov | sync : whisper.cpp |
| 2025-10-21 |
Georgi Gerganov | sync : llama.cpp |
| 2025-10-21 |
Aman Gupta | ggml: add ggml_can_fuse_subgraph (llama/16662) |
| 2025-10-21 |
lhez | opencl: fix warnings and clean up profiling (llama... |
| 2025-10-21 |
Jeff Bolz | vulkan: Handle FA with all -inf mask values (llama... |
| 2025-10-21 |
YehuditE | sycl : add PAD_REFLECT_D1 operator support (llama/16145) |
| 2025-10-21 |
Diego Devesa | ggml-alloc : fix leak when reusing a tensor with a... |
| 2025-10-21 |
safranowith | SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary... |
| 2025-10-21 |
Aaron Teo | ci : fix binaries release failure for s390x (binaries... |
| 2025-10-21 |
Johannes Gäßler | HIP: fix GPU_TARGETS (llama/16642) |
| 2025-10-21 |
Jeff Bolz | vulkan: Implement topk_moe fused shader, ported from... |
| 2025-10-21 |
Aman Gupta | CUDA: use registers instead of smem in topk-moe (llama... |
| 2025-10-21 |
Shawn Gu | opencl: transposed gemm/gemv moe kernel with mxfp4... |
| 2025-10-21 |
Radoslav Gerganov | rpc : report actual free memory (llama/16616) |
| 2025-10-21 |
Giuseppe Scrivano | vulkan: Add State Space Model (SSM) Operations Support... |
| 2025-10-21 |
muggle-stack | ggml : fix SpaceMit IME array out-of-bounds in task... |
| 2025-10-21 |
Jeff Bolz | vulkan: fix debug build (add_rms_len/data not found... |
| 2025-10-21 |
Ilia Ilmer | metal : add `CONV_TRANSPOSE_2D` (llama/16542) |
| 2025-10-21 |
GittyBurstein | SYCL SET operator optimized for F32 tensors (llama... |
| 2025-10-21 |
GittyBurstein | sycl : add ARANGE operator (llama/16362) |
| 2025-10-21 |
Chenguang Li | CANN: format code using .clang-format (llama/15863) |
| 2025-10-21 |
takuya kodama | ggml-cpu: replace putenv with setenv for const-correctn... |
| 2025-10-21 |
yael-works | SYCL: Add GGML_OP_MEAN operator support (llama/16009) |
| 2025-10-21 |
safranowith | cpu : add FLOOR, CEIL, ROUND and TRUNC unary operators... |
| 2025-10-21 |
lhez | opencl: add q8_0 mm support (llama/16469) |
| 2025-10-21 |
lhez | opencl: fix FA for f32 (llama/16584) |
| 2025-10-21 |
Sam/Samuel | metal: optimise `GGML_OP_SUM` (llama/16559) |
| 2025-10-21 |
Julius Tischbein | CUDA: Changing the CUDA scheduling strategy to spin... |
| 2025-10-21 |
Georgi Gerganov | metal : avoid using Metal's gpuAddress property (llama... |
| 2025-10-14 |
Georgi Gerganov | sync : llama.cpp (refs: upstream/latest, upstream/0.9.4.58) |
| 2025-10-14 |
SavicStefan | vulkan: Add ACC_TYPE_VEC2 implementation (llama/16203) |
| 2025-10-14 |
Aman Gupta | CUDA + openCL: fix bug in accessing rms_norm->src while... |
| 2025-10-14 |
Jeff Bolz | vulkan: Support FA with K/V in F32 (llama/16543) |
| 2025-10-14 |
Jeff Bolz | vulkan: Improve build time for MSVC (llama/16545) |
| 2025-10-14 |
Johannes Gäßler | CUDA: enable FA for FP32 KV cache (llama/16546) |
| 2025-10-14 |
Aman Gupta | CUDA: use fastdiv + ggml_cuda_mad for mmvf (llama/16557) |
| 2025-10-14 |
Aman Gupta | CUDA: add fp kernel for larger batch size MoE (llama... |
| 2025-10-14 |
Anav Prasad | cuda : remove legacy copy-op pointer indirection code... |
| 2025-10-14 |
Georgi Gerganov | metal : FA support F32 K and V and head size = 32 ... |
| 2025-10-14 |
lhez | opencl: fix build targeting CL 2 (llama/16554) |
| 2025-10-14 |
Johannes Gäßler | CUDA: fix numerical issues in tile FA kernel (llama... |
| 2025-10-14 |
Jie Fu (傅杰) | ggml : fix build broken with -march=armv9-a on MacOS... |
| 2025-10-14 |
Chenguang Li | CANN: fix CPU memory leak in CANN backend (llama/16549) |
| 2025-10-14 |
Sam/Samuel | metal: add support for opt_step_sgd (llama/16539) |
| 2025-10-14 |
Georgi Gerganov | ggml : fix scalar path for computing norm (llama/16558) |
| 2025-10-14 |
hipudding | CANN: Update several operators to support FP16 data... |
| 2025-10-14 |
Sam/Samuel | metal : add opt_step_adamw and op_sum (llama/16529) |
| 2025-10-14 |
Neo Zhang Jianyu | fix UT fault cases: count-equal, argsort, pad OPs ... |
| 2025-10-14 |
sirus20x6 | ggml : Fix FP16 ELU positive branch (llama/16519) |
| 2025-10-14 |
sirus20x6 | ggml: Correct SVE implementation in ggml_vec_dot_f16_un... |
| 2025-10-14 |
Johannes Gäßler | CUDA: faster tile FA, add oob checks, more HSs (llama... |
| 2025-10-12 |
Georgi Gerganov | sync : llama.cpp |
| 2025-10-12 |
Georgi Gerganov | metal : fix mul-mm condition + fix mul-mv permuted... |
| 2025-10-12 |
Diego Devesa | cuda : avoid initializing unused devices (llama/16510) |
| 2025-10-12 |
Prajwal B Mehendarkar | cmake : Dont define XOPENSOURCE on AIX (llama/16481) |
| 2025-10-12 |
duduta | cpu : optimize the ggml NORM operation (llama/15953) |
| 2025-10-12 |
Chenguang Li | CANN: Improve ACL graph matching (llama/16166) |
| 2025-10-12 |
Charles Xu | kleidiai: kernel interface refactoring (llama/16460) |
| 2025-10-12 |
Neo Zhang Jianyu | refactor soft_max, add soft_max_back (llama/16472) |
| 2025-10-12 |
ai-fonsi | Disable CUDA host buffers on integrated GPUs (llama... |
| 2025-10-12 |
Georgi Gerganov | metal : mark FA blocks (llama/16372) |
| 2025-10-12 |
Reese Levine | ggml webgpu: profiling, CI updates, reworking of comman... |
| 2025-10-12 |
Georgi Gerganov | metal : add support for non-padded FA KV (llama/16148) |
| 2025-10-12 |
Georgi Gerganov | tests : add -INF blocks to the KQ mask in the FA tests... |
| 2025-10-12 |
Georgi Gerganov | metal : various optimizations + refactoring (llama... |
| 2025-10-12 |
Georgi Gerganov | ggml : fix unaligned access in AMX code (llama/16315) |
| 2025-10-12 |
Daniel Bevenius | ggml-cpu : fix leftover handling in ggml_vec_scale_f32... |
| 2025-10-12 |
Reese Levine | ggml webgpu: actually add softmax, fix rms_norm offset... |
| 2025-10-12 |
Eve | vulkan: use a more appropriate amount of threads when... |
| 2025-10-12 |
Radoslav Gerganov | rpc : check src buffer when copying tensor (llama/16421) |