git.djapps.eu Git - pkg/ggml/sources/ggml/shortlog
2025-11-01  Jeff Bolz  vulkan: Handle argsort with a large number of rows...
2025-11-01  Oliver Simons  Hide latency of bias and gate-loading (llama/16847)
2025-11-01  Jeff Bolz  vulkan: Fuse rope+set_rows (llama/16769)
2025-11-01  Jeff Bolz  vulkan: Update topk_moe fusion to handle gpt's late...
2025-11-01  Ruben Ortlam  Vulkan MMQ Integer Dot Refactor and K-Quant support...
2025-11-01  Max Krasnyansky  Hexagon Op queue & dispatch optimizations (llama/16820)
2025-11-01  Aman Gupta  CUDA: use fastdiv in set-rows (llama/16834)
2025-11-01  Jeff Bolz  vulkan: Call ggml_vk_buffer_write_2d from ggml_vk_buffe...
2025-11-01  Aman Gupta  CUDA: Fix bug in topk-moe for gpt-oss (llama/16821)
2025-11-01  YaelLogic  sycl: add RMS_NORM_BACK operation support (llama/16808)
2025-11-01  YaelGitAccount  cuda: add SET operation support (llama/16804)
2025-11-01  l3utterfly  initialise buffer.device in ggml_hexagon_session (llama...
2025-11-01  Chenguang Li  CANN: Improve device ID handling and aclnnArange checks...
2025-11-01  Aman Gupta  CUDA: add unused vars to mmvf and mmvq (llama/16807)
2025-11-01  tamarPal  sycl: add SSM_CONV operation support (llama/16800)
2025-11-01  Acly  ggml : fix interpolate with align-corners and ne=1...
2025-11-01  Johannes Gäßler  HIP: fix AMDGPU_TARGETS, update documentation (llama...
2025-11-01  Aman Gupta  test-backend-ops: print failed tests at the end (llama...
2025-11-01  tamarPal  sycl: add ROLL operation support (llama/16665)
2025-11-01  shani-f  sycl: add REPEAT_BACK operation support (llama/16734)
2025-11-01  Aman Gupta  CUDA: support for weight clamp in top-k norm (llama...
2025-11-01  Acly  ggml-alloc : make gallocr prefer chunks that allow...
2025-11-01  Sigbjørn Skjæret  cuda : use fast copy when src and dst are of different...
2025-11-01  leejet  ggml: fix cuda kernel launch configuration for k_comput...
2025-11-01  Aman Gupta  CUDA: General GEMV fusion (llama/16715)
2025-11-01  Gilad S.  vulkan: deduplicate Microsoft Direct3D12 devices (llama...
2025-11-01  Giuseppe Scrivano  vulkan: delete dead code (llama/16732)
2025-11-01  Jeff Bolz  vulkan: Optimize SSM_SCAN (llama/16645)
2025-11-01  leejet  ggml: fix CUDA grid launch condition for large block_nu...
2025-11-01  Aman Gupta  CUDA: use CUB for arbitary size argsort (llama/16754)
2025-11-01  Aman Gupta  ggml-cuda: use passed ops instead of hardcoded ops...
2025-11-01  Matthew Michel  sycl: use async memory allocation to fix crashes during...
2025-11-01  Max Krasnyansky  Add experimental ggml-hexagon backend for the Hexagon...
2025-11-01  Diego Devesa  Revert "ggml : Leverage the existing GGML_F32_VEC helpe...
2025-11-01  sirus20x6  ggml : Leverage the existing GGML_F32_VEC helpers to...
2025-11-01  Aman Gupta  CUDA: fix bug in topk-moe softmax (llama/16711)
2025-11-01  Aman Gupta  CUDA: topk-moe: add optional parameter for gpt-oss...
2025-11-01  Johannes Gäßler  CUDA: better error for FA kernel with 0 occupancy ...
2025-10-29  Jeff Bolz  Rewrite simple-backend to use sched and ggml_backend_lo...
2025-10-22  Georgi Gerganov  sync : whisper.cpp
2025-10-21  Georgi Gerganov  sync : llama.cpp
2025-10-21  Aman Gupta  ggml: add ggml_can_fuse_subgraph (llama/16662)
2025-10-21  lhez  opencl: fix warnings and clean up profiling (llama...
2025-10-21  Jeff Bolz  vulkan: Handle FA with all -inf mask values (llama...
2025-10-21  YehuditE  sycl : add PAD_REFLECT_D1 operator support (llama/16145)
2025-10-21  Diego Devesa  ggml-alloc : fix leak when reusing a tensor with a...
2025-10-21  safranowith  SYCL: Add support for FLOOR,CEIL,ROUND and TRUNC unary...
2025-10-21  Aaron Teo  ci : fix binaries release failure for s390x (binaries...
2025-10-21  Johannes Gäßler  HIP: fix GPU_TARGETS (llama/16642)
2025-10-21  Jeff Bolz  vulkan: Implement topk_moe fused shader, ported from...
2025-10-21  Aman Gupta  CUDA: use registers instead of smem in topk-moe (llama...
2025-10-21  Shawn Gu  opencl: transposed gemm/gemv moe kernel with mxfp4...
2025-10-21  Radoslav Gerganov  rpc : report actual free memory (llama/16616)
2025-10-21  Giuseppe Scrivano  vulkan: Add State Space Model (SSM) Operations Support...
2025-10-21  muggle-stack  ggml : fix SpaceMit IME array out-of-bounds in task...
2025-10-21  Jeff Bolz  vulkan: fix debug build (add_rms_len/data not found...
2025-10-21  Ilia Ilmer  metal : add `CONV_TRANSPOSE_2D` (llama/16542)
2025-10-21  GittyBurstein  SYCL SET operator optimized for F32 tensors (llama...
2025-10-21  GittyBurstein  sycl : add ARANGE operator (llama/16362)
2025-10-21  Chenguang Li  CANN: format code using .clang-format (llama/15863)
2025-10-21  takuya kodama  ggml-cpu: replace putenv with setenv for const-correctn...
2025-10-21  yael-works  SYCL: Add GGML_OP_MEAN operator support (llama/16009)
2025-10-21  safranowith  cpu : add FLOOR, CEIL, ROUND and TRUNC unary operators...
2025-10-21  lhez  opencl: add q8_0 mm support (llama/16469)
2025-10-21  lhez  opencl: fix FA for f32 (llama/16584)
2025-10-21  Sam/Samuel  metal: optimise `GGML_OP_SUM` (llama/16559)
2025-10-21  Julius Tischbein  CUDA: Changing the CUDA scheduling strategy to spin...
2025-10-21  Georgi Gerganov  metal : avoid using Metal's gpuAddress property (llama...
2025-10-14  Georgi Gerganov  sync : llama.cpp [upstream/latest] [upstream/0.9.4.58]
2025-10-14  SavicStefan  vulkan: Add ACC_TYPE_VEC2 implementation (llama/16203)
2025-10-14  Aman Gupta  CUDA + openCL: fix bug in accessing rms_norm->src while...
2025-10-14  Jeff Bolz  vulkan: Support FA with K/V in F32 (llama/16543)
2025-10-14  Jeff Bolz  vulkan: Improve build time for MSVC (llama/16545)
2025-10-14  Johannes Gäßler  CUDA: enable FA for FP32 KV cache (llama/16546)
2025-10-14  Aman Gupta  CUDA: use fastdiv + ggml_cuda_mad for mmvf (llama/16557)
2025-10-14  Aman Gupta  CUDA: add fp kernel for larger batch size MoE (llama...
2025-10-14  Anav Prasad  cuda : remove legacy copy-op pointer indirection code...
2025-10-14  Georgi Gerganov  metal : FA support F32 K and V and head size = 32 ...
2025-10-14  lhez  opencl: fix build targeting CL 2 (llama/16554)
2025-10-14  Johannes Gäßler  CUDA: fix numerical issues in tile FA kernel (llama...
2025-10-14  Jie Fu (傅杰)  ggml : fix build broken with -march=armv9-a on MacOS...
2025-10-14  Chenguang Li  CANN: fix CPU memory leak in CANN backend (llama/16549)
2025-10-14  Sam/Samuel  metal: add support for opt_step_sgd (llama/16539)
2025-10-14  Georgi Gerganov  ggml : fix scalar path for computing norm (llama/16558)
2025-10-14  hipudding  CANN: Update several operators to support FP16 data...
2025-10-14  Sam/Samuel  metal : add opt_step_adamw and op_sum (llama/16529)
2025-10-14  Neo Zhang Jianyu  fix UT fault cases: count-equal, argsort, pad OPs ...
2025-10-14  sirus20x6  ggml : Fix FP16 ELU positive branch (llama/16519)
2025-10-14  sirus20x6  ggml: Correct SVE implementation in ggml_vec_dot_f16_un...
2025-10-14  Johannes Gäßler  CUDA: faster tile FA, add oob checks, more HSs (llama...
2025-10-12  Georgi Gerganov  sync : llama.cpp
2025-10-12  Georgi Gerganov  metal : fix mul-mm condition + fix mul-mv permuted...
2025-10-12  Diego Devesa  cuda : avoid initializing unused devices (llama/16510)
2025-10-12  Prajwal B Mehendarkar  cmake : Dont define XOPENSOURCE on AIX (llama/16481)
2025-10-12  duduta  cpu : optimize the ggml NORM operation (llama/15953)
2025-10-12  Chenguang Li  CANN: Improve ACL graph matching (llama/16166)
2025-10-12  Charles Xu  kleidiai: kernel interface refactoring (llama/16460)
2025-10-12  Neo Zhang Jianyu  refactor soft_max, add soft_max_back (llama/16472)
2025-10-12  ai-fonsi  Disable CUDA host buffers on integrated GPUs (llama...
2025-10-12  Georgi Gerganov  metal : mark FA blocks (llama/16372)