git.djapps.eu Git - pkg/ggml/sources/ggml/shortlog
2026-03-15  Johannes Gäßler  CUDA: limit number of FA stream-k CUDA blocks (llama...
2026-03-15  Pascal  ggml: avoid creating CUDA context during device init...
2026-03-15  MoonShadow  ggml/hip: fix APU compatibility - soft error handling...
2026-03-15  Bartowski  ggml : guard against sumq2 being 0 in IQ4_NL (llama...
2026-03-15  PikaPikachu  cuda : add RDNA4-specific MMVQ parameter table for...
2026-03-15  Ruben Ortlam  vulkan: use graphics queue on AMD (llama/20551)
2026-03-15  Georgi Gerganov  metal : add FA specialization for HSK = 320, HSV =...
2026-03-15  Max Krasnyansky  hexagon: Q4_0 and MXFP4 repack fixes (llama/20527)
2026-03-15  Neo Zhang  add op gated_delta_net (llama/20455)
2026-03-15  Adrien Gallouët  ggml : add native AVX512-FP16 support for F16 operation...
2026-03-15  Wallentri  Use fp32 in cuBLAS V100 to avoid overflows, env variabl...
2026-03-15  Zijun Yu  ggml : add OpenVINO backend (llama/15307)
2026-03-15  Rail Chabdarov  Fix data race in CUDA's "cpy" kernel (influences GGML...
2026-03-15  lhez  opencl: fix l2_norm (llama/20480)
2026-03-15  Georgi Gerganov  graph : remove redundant GDN state transposes (llama...
2026-03-15  rehan-10xengineer  ggml-cpu: add RVV vec dot kernels for quantization...
2026-03-15  Adrien Gallouët  ggml : fix typo gmml (llama/20512)
2026-03-15  Georgi Gerganov  metal : fix l2 norm scale (llama/20493)
2026-03-15  Georgi Gerganov  llama : disable graph reuse with pipeline parallelism...
2026-03-15  Ruben Ortlam  test-backend-ops: allow loading tests from file and...
2026-03-15  ProgenyAlpha  vulkan: add GATED_DELTA_NET op support (llama/20334)
2026-03-15  ProgenyAlpha  vulkan: fix SSM_CONV PP scaling with large ubatch sizes...
2026-03-15  Georgi Gerganov  sync : llama.cpp
2026-03-15  Georgi Gerganov  metal : avoid divisions in bin kernel (llama/20426)
2026-03-15  Georgi Gerganov  sync : llama.cpp
2026-03-15  Jeff Bolz  vulkan: fix l2_norm epsilon handling (llama/20350)
2026-03-15  Jeff Bolz  vulkan: fix OOB check in flash_attn_mask_opt (llama...
2026-03-15  Masato Nakasaka  vulkan: Fix ErrorOutOfHostMemory on Intel GPU when...
2026-03-15  lhez  opencl: use larger workgroup size for get_rows (llama...
2026-03-15  shaofeiqi  opencl: add cumsum op (llama/18981)
2026-03-15  uvos  hip: compile debug builds with -O2 on hip to avoid...
2026-03-15  Masashi Yoshimura  ggml-webgpu: Add supports for `GGML_OP_REPEAT` (llama...
2026-03-15  Georgi Gerganov  llama : enable chunked fused GDN path (llama/20340)
2026-03-15  Richard Davison  ggml : add NVFP4 quantization type support (llama/19769)
2026-03-15  Daniel Bevenius  llama : add support for Nemotron 3 Super (llama/20411)
2026-03-15  Georgi Gerganov  metal : fix capture_compute counter logic (llama/20410)
2026-03-15  Georgi Gerganov  metal : fix q5_k mul_mv register spill (llama/20399)
2026-03-15  Georgi Gerganov  metal : add env var to trigger graph capture (llama...
2026-03-15  uvos  ggml-cuda: gdn use shared mem for HIP (llama/20366)
2026-03-15  uvos  cuda/hip: fix loop unrolling in ssm-conv (llama/20369)
2026-03-15  Neo Zhang  fix op rope, add rope_back (llama/20293)
2026-03-15  Neo Zhang  fix for failed UT case: ACC, L2_NORM, UPSCALE, fused_gl...
2026-03-15  Georgi Gerganov  ggml : bump RPC version (llama/20330)
2026-03-15  Reese Levine  ggml webgpu: faster normal quant and some k-quant matri...
2026-03-15  Charles Xu  kleidiai : support for concurrent sme and neon kernel...
2026-03-15  Taimur Ahmad  ggml-cpu: add RVV repack GEMM and GEMV for quantization...
2026-03-15  Julian Pscheid  metal: handle command buffer failures gracefully in...
2026-03-15  Paul Flynn  metal : extend mul_mv_ext to BF16, Q2_K, Q3_K (llama...
2026-03-15  Georgi Gerganov  metal : add upscale (llama/20284)
2026-03-15  Aman Gupta  ggml-cuda: disable gdn for musa (llama/20278)
2026-03-15  Bertay Eren  ggml-vulkan: add SGN operator, auto-generate Vulkan...
2026-03-15  Ruben Ortlam  vulkan: skip zero size tensors in backend copies (llama...
2026-03-15  Michael Huang  cuda : display total and free VRAM capacity during...
2026-03-15  GiantPrince  ggml-vulkan: Add ELU op support (llama/20183)
2026-03-15  Jeff Bolz  vulkan: Fix data races in coopmat1 mul_mat(_id) (llama...
2026-03-15  Neo Zhang  supprt Flash Attention for fp32/fp16/Q4/Q5/Q8 (llama...
2026-03-15  Aman Gupta  ggml: add GATED_DELTA_NET op (llama/19504)
2026-03-15  lhez  opencl: add l2_norm (llama/20160)
2026-03-15  Bartowski  quants : Add memsets and other fixes for IQ quants...
2026-03-15  Piotr Wilkin...  Autoparser - complete refactoring of parser architectur...
2026-03-15  Todor Boinovski  hexagon: add f32 ssm_conv op (llama/20122)
2026-03-15  Max Krasnyansky  cpu: skip redudant ROPE cache updates (llama/20149)
2026-03-15  Aman Gupta  ggml-cuda: add mem check for fusion (llama/19916)
2026-03-15  Aaron Teo  ggml: update comments for backends which have no memory...
2026-03-15  shalinib-ibm  ggml-cpu: Fix gcc 15 ICE on ppc64le (#20083) (llama...
2026-03-15  Aman Gupta  CUDA: use shared mem for ssm_conv (llama/20128)
2026-03-15  Johannes Gäßler  ggml-cpu: fix data race for debug asserts (llama/20148)
2026-03-15  lhez  opencl: add neg, exp and diag (llama/20127)
2026-03-15  YardenTal44  hexagon: add fp16 support for binary ops: add,sub,mul...
2026-03-15  Andreas Kieslinger  CUDA: Improve performance via less synchronizations...
2026-03-15  Marcel Petrick  chore : correct typos [no ci] (llama/20041)
2026-03-15  Max Krasnyansky  hexagon: Flash Attention optimizations (dma, mpyacc...
2026-03-15  lhez  opencl: add `SET`, support i32 for `CPY`, minor refacto...
2026-03-15  Nikhil Jain  Fix wait logic for inflight jobs (llama/20096)
2026-03-15  Masashi Yoshimura  Add concat op to webgpu. (llama/20068)
2026-03-15  Johannes Gäßler  ggml: fix ggml_is_contiguous_n for ne == 1 (llama/20092)
2026-03-15  Adrien Gallouët  ggml : use a simple std::thread in AMX without OpenMP...
2026-03-15  Charles Xu  kleidiai : add sme fp16 compute path for q4_0 gemm...
2026-03-15  shaofeiqi  opencl: add optimized q4_1 mm kernel for adreno (llama...
2026-03-15  Abhijit Ramesh  ggml webgpu: fix workgroup dispatch limit for large...
2026-03-15  Nikhil Jain  ggml webgpu: Clean up per-thread parameter buffer pool...
2026-03-15  Masashi Yoshimura  ggml-webgpu: Support non-contiguous `src0` and overlapp...
2026-03-15  Ruben Ortlam  vulkan: tune MMVQ for Intel Windows (llama/19988)
2026-03-15  Aaron Teo  ggml-cpu: optimise s390x multiply extend instructions...
2026-03-15  Ruben Ortlam  vulkan: improve partial offloading performance on AMD...
2026-03-15  oobabooga  cuda: cap grid.y at 65535 in non-contiguous dequantize...
2026-03-15  Jayant Lohia  CUDA: add CDNA3 MFMA support for flash attention MMA...
2026-03-15  Aman Gupta  ggml-cpu: add repack for mxfp4 (llama/19738)
2026-03-15  David366AI  examples/yolo: fix load_model memory leak (#1432)
2026-02-27  Georgi Gerganov  gguf : sync (llama/0)
2026-02-27  Georgi Gerganov  scripts : sync gguf code
2026-02-27  Georgi Gerganov  sync : llama.cpp
2026-02-27  Neo Zhang  replace the magic nunber 768 by max work group size...
2026-02-27  Vishal Singh  ggml-zendnn: update code for latest ZenDNN API (llama...
2026-02-27  Adrien Gallouët  ggml : fix AMX and add batched support (llama/19925)
2026-02-27  Ruben Ortlam  vulkan: fix fp16 Flash Attention on Windows AMD RDNA2...
2026-02-27  Kevin Pouget  ggml-virtgpu: improve the reliability of the code ...
2026-02-27  Neo Zhang  support permuted, remove check s0/s10 (llama/19889)
2026-02-27  Jeff Bolz  vulkan: check for memory overlap before doing fusion...
2026-02-25  Georgi Gerganov  sync : llama.cpp