git.djapps.eu Git - pkg/ggml/sources/ggml/shortlog
2025-11-09  Georgi Gerganov  sync : llama.cpp
2025-11-09  Ruben Ortlam  vulkan: iGPU memory reporting fix (llama/17110)
2025-11-09  Ruben Ortlam  vulkan: fix mmq out of bounds reads (llama/17108)
2025-11-09  Jeff Bolz  vulkan: fuse mul_mat_id + mul (llama/17095)
2025-11-09  Georgi Gerganov  metal : retain src and dst buffers during async ops...
2025-11-09  Jeff Bolz  vulkan: Use spec constants for conv2d s/d/p and kernel...
2025-11-09  Aman Gupta  Revert "CUDA: add expert reduce kernel (#16857)" (llama...
2025-11-09  Aman Gupta  CUDA: skip fusion for repeating adds in bias (llama...
2025-11-09  SavicStefan  vulkan: Increase BK to 32; use BK/4 for non-CM mul_mm...
2025-11-09  Aleksei Nikiforov  ggml: disable vxe for cross-compilation by default...
2025-11-09  Jeff Bolz  vulkan: fuse rms_norm + mul + rope (+ view + set_rows...
2025-11-09  Jeff Bolz  vulkan: Fix test-thread-safety crashes (llama/17024)
2025-11-09  Johannes Gäßler  CUDA: fix MMQ stream-k fixup ne1 indices (llama/17089)
2025-11-09  Reese Levine  ggml webgpu: faster matrix multiplication/matrix-vector...
2025-11-09  bssrdf  CUDA: properly handle nb00=nb02 case for cpy (llama...
2025-11-09  Acly  vulkan : refactor buffer handling in vk_op_f32 (llama...
2025-11-09  Johannes Gäßler  CUDA: fix should_use_mmvf for ne11 == 1 (llama/17085)
2025-11-09  Adrien Gallouët  Revert "ggml-cpu: detect correct cpu flags for arm64...
2025-11-09  iron  ggml-cpu: detect correct cpu flags for arm64 (#16229...
2025-11-09  xctan  ggml-cpu : optimize RVV q2_k and q3_k kernels (llama...
2025-11-09  Johannes Gäßler  CUDA: fix crash on uneven context without FA (llama...
2025-11-09  Georgi Gerganov  metal : initial Metal4 tensor API support (llama/16634)
2025-11-09  YehuditE  sycl: add CONCAT operator support (llama/16047)
2025-11-09  l3utterfly  ggml-hexagon: graceful fallback for older socs where...
2025-11-09  bssrdf  improve CUDA cpy memory bandwidth when copying transpos...
2025-11-09  Jeff Bolz  vulkan: Fix GGML_VULKAN_CHECK_RESULTS to better handle...
2025-11-09  Georgi Gerganov  sync : llama.cpp
2025-11-09  Reese Levine  ggml webgpu: minor set rows optimization (llama/16810)
2025-11-09  Georgi Gerganov  sync : llama.cpp
2025-11-09  nullname  refactor: replace sprintf with snprintf for safer strin...
2025-11-09  Jeff Bolz  vulkan: remove the need for the dryrun (llama/16826)
2025-11-09  Acly  ggml-cpu : bicubic interpolation (llama/16891)
2025-11-09  Noah  Fix garbled output with REPACK at high thread counts...
2025-11-09  Aman Gupta  CUDA: avoid mul + bias fusion when doing fusion (llama...
2025-11-09  lhez  opencl: support imrope (llama/16914)
2025-11-09  theo77186  ggml: CUDA: add head size 72 for flash-attn (llama...
2025-11-09  Jinyang He  ggml : LoongArch fixes (llama/16958)
2025-11-09  shani-f  SYCL: optimized repeat_back kernel (3× fewer asm instru...
2025-11-09  Shagun Bera  test-backend-ops : fix segfault in moe-expert-reduce...
2025-11-09  Georgi Gerganov  clip : use FA (llama/16837)
2025-11-09  mnehete32  CUDA: add FLOOR, CEIL, ROUND, TRUNC unary ops (llama...
2025-11-09  Aaron Teo  ggml: add s390x cpu-feats (llama/16774)
2025-11-09  Jeff Bolz  vulkan: Fix multi_add invalid descriptor usage (llama...
2025-11-09  Jeff Bolz  vulkan: fuse mul_mat+add and mul_mat_id+add_id (llama...
2025-11-09  Oliver Simons  CUDA: Remove unneded bias/gate dims in fused mmvq ...
2025-11-09  Johannes Gäßler  CUDA: Volta tensor core support for MMF (llama/16843)
2025-11-04  Georgi Gerganov  ggml : fix conv2d_dw SVE path (#1380)
2025-11-01  Georgi Gerganov  sync : llama.cpp
2025-11-01  Aman Gupta  CUDA: add expert reduce kernel (llama/16857)
2025-11-01  Jeff Bolz  vulkan: disable spirv-opt for rope shaders (llama/16872)
2025-11-01  Masato Nakasaka  vulkan: Fix crash when FP16 mul_mat accumulation is...
2025-11-01  Ruben Ortlam  vulkan: fix shmem overrun in mmq id shader (llama/16873)
2025-11-01  l3utterfly  ggml-hexagon: respect input size when getting/setting...
2025-11-01  lhez  opencl: fix boundary handling for mul_mm (llama/16875)
2025-11-01  Max Krasnyansky  cpu: introduce chunking for repack matmuls and enable...
2025-11-01  JJJYmmm  model: add support for qwen3vl series (llama/16780)
2025-11-01  Max Krasnyansky  cpu: introduce chunking for flash attention (llama...
2025-11-01  Sigbjørn Skjæret  cuda : fix argsort with 64k+ rows (llama/16849)
2025-11-01  Jeff Bolz  vulkan: Handle argsort with a large number of rows...
2025-11-01  Oliver Simons  Hide latency of bias and gate-loading (llama/16847)
2025-11-01  Jeff Bolz  vulkan: Fuse rope+set_rows (llama/16769)
2025-11-01  Jeff Bolz  vulkan: Update topk_moe fusion to handle gpt's late...
2025-11-01  Ruben Ortlam  Vulkan MMQ Integer Dot Refactor and K-Quant support...
2025-11-01  Max Krasnyansky  Hexagon Op queue & dispatch optimizations (llama/16820)
2025-11-01  Aman Gupta  CUDA: use fastdiv in set-rows (llama/16834)
2025-11-01  Jeff Bolz  vulkan: Call ggml_vk_buffer_write_2d from ggml_vk_buffe...
2025-11-01  Aman Gupta  CUDA: Fix bug in topk-moe for gpt-oss (llama/16821)
2025-11-01  YaelLogic  sycl: add RMS_NORM_BACK operation support (llama/16808)
2025-11-01  YaelGitAccount  cuda: add SET operation support (llama/16804)
2025-11-01  l3utterfly  initialise buffer.device in ggml_hexagon_session (llama...
2025-11-01  Chenguang Li  CANN: Improve device ID handling and aclnnArange checks...
2025-11-01  Aman Gupta  CUDA: add unused vars to mmvf and mmvq (llama/16807)
2025-11-01  tamarPal  sycl: add SSM_CONV operation support (llama/16800)
2025-11-01  Acly  ggml : fix interpolate with align-corners and ne=1...
2025-11-01  Johannes Gäßler  HIP: fix AMDGPU_TARGETS, update documentation (llama...
2025-11-01  Aman Gupta  test-backend-ops: print failed tests at the end (llama...
2025-11-01  tamarPal  sycl: add ROLL operation support (llama/16665)
2025-11-01  shani-f  sycl: add REPEAT_BACK operation support (llama/16734)
2025-11-01  Aman Gupta  CUDA: support for weight clamp in top-k norm (llama...
2025-11-01  Acly  ggml-alloc : make gallocr prefer chunks that allow...
2025-11-01  Sigbjørn Skjæret  cuda : use fast copy when src and dst are of different...
2025-11-01  leejet  ggml: fix cuda kernel launch configuration for k_comput...
2025-11-01  Aman Gupta  CUDA: General GEMV fusion (llama/16715)
2025-11-01  Gilad S.  vulkan: deduplicate Microsoft Direct3D12 devices (llama...
2025-11-01  Giuseppe Scrivano  vulkan: delete dead code (llama/16732)
2025-11-01  Jeff Bolz  vulkan: Optimize SSM_SCAN (llama/16645)
2025-11-01  leejet  ggml: fix CUDA grid launch condition for large block_nu...
2025-11-01  Aman Gupta  CUDA: use CUB for arbitary size argsort (llama/16754)
2025-11-01  Aman Gupta  ggml-cuda: use passed ops instead of hardcoded ops...
2025-11-01  Matthew Michel  sycl: use async memory allocation to fix crashes during...
2025-11-01  Max Krasnyansky  Add experimental ggml-hexagon backend for the Hexagon...
2025-11-01  Diego Devesa  Revert "ggml : Leverage the existing GGML_F32_VEC helpe...
2025-11-01  sirus20x6  ggml : Leverage the existing GGML_F32_VEC helpers to...
2025-11-01  Aman Gupta  CUDA: fix bug in topk-moe softmax (llama/16711)
2025-11-01  Aman Gupta  CUDA: topk-moe: add optional parameter for gpt-oss...
2025-11-01  Johannes Gäßler  CUDA: better error for FA kernel with 0 occupancy ...
2025-10-29  Jeff Bolz  Rewrite simple-backend to use sched and ggml_backend_lo...
2025-10-22  Georgi Gerganov  sync : whisper.cpp
2025-10-21  Georgi Gerganov  sync : llama.cpp
2025-10-21  Aman Gupta  ggml: add ggml_can_fuse_subgraph (llama/16662)