git.djapps.eu Git - pkg/ggml/sources/ggml/shortlog
2026-03-15 GiantPrince ggml-vulkan: Add ELU op support (llama/20183)
2026-03-15 Jeff Bolz vulkan: Fix data races in coopmat1 mul_mat(_id) (llama...
2026-03-15 Neo Zhang supprt Flash Attention for fp32/fp16/Q4/Q5/Q8 (llama...
2026-03-15 Aman Gupta ggml: add GATED_DELTA_NET op (llama/19504)
2026-03-15 lhez opencl: add l2_norm (llama/20160)
2026-03-15 Bartowski quants : Add memsets and other fixes for IQ quants...
2026-03-15 Piotr Wilkin... Autoparser - complete refactoring of parser architectur...
2026-03-15 Todor Boinovski hexagon: add f32 ssm_conv op (llama/20122)
2026-03-15 Max Krasnyansky cpu: skip redudant ROPE cache updates (llama/20149)
2026-03-15 Aman Gupta ggml-cuda: add mem check for fusion (llama/19916)
2026-03-15 Aaron Teo ggml: update comments for backends which have no memory...
2026-03-15 shalinib-ibm ggml-cpu: Fix gcc 15 ICE on ppc64le (#20083) (llama...
2026-03-15 Aman Gupta CUDA: use shared mem for ssm_conv (llama/20128)
2026-03-15 Johannes Gäßler ggml-cpu: fix data race for debug asserts (llama/20148)
2026-03-15 lhez opencl: add neg, exp and diag (llama/20127)
2026-03-15 YardenTal44 hexagon: add fp16 support for binary ops: add,sub,mul...
2026-03-15 Andreas Kieslinger CUDA: Improve performance via less synchronizations...
2026-03-15 Marcel Petrick chore : correct typos [no ci] (llama/20041)
2026-03-15 Max Krasnyansky hexagon: Flash Attention optimizations (dma, mpyacc...
2026-03-15 lhez opencl: add `SET`, support i32 for `CPY`, minor refacto...
2026-03-15 Nikhil Jain Fix wait logic for inflight jobs (llama/20096)
2026-03-15 Masashi Yoshimura Add concat op to webgpu. (llama/20068)
2026-03-15 Johannes Gäßler ggml: fix ggml_is_contiguous_n for ne == 1 (llama/20092)
2026-03-15 Adrien Gallouët ggml : use a simple std::thread in AMX without OpenMP...
2026-03-15 Charles Xu kleidiai : add sme fp16 compute path for q4_0 gemm...
2026-03-15 shaofeiqi opencl: add optimized q4_1 mm kernel for adreno (llama...
2026-03-15 Abhijit Ramesh ggml webgpu: fix workgroup dispatch limit for large...
2026-03-15 Nikhil Jain ggml webgpu: Clean up per-thread parameter buffer pool...
2026-03-15 Masashi Yoshimura ggml-webgpu: Support non-contiguous `src0` and overlapp...
2026-03-15 Ruben Ortlam vulkan: tune MMVQ for Intel Windows (llama/19988)
2026-03-15 Aaron Teo ggml-cpu: optimise s390x multiply extend instructions...
2026-03-15 Ruben Ortlam vulkan: improve partial offloading performance on AMD...
2026-03-15 oobabooga cuda: cap grid.y at 65535 in non-contiguous dequantize...
2026-03-15 Jayant Lohia CUDA: add CDNA3 MFMA support for flash attention MMA...
2026-03-15 Aman Gupta ggml-cpu: add repack for mxfp4 (llama/19738)
2026-03-15 David366AI examples/yolo: fix load_model memory leak (#1432)
2026-02-27 Georgi Gerganov gguf : sync (llama/0)
2026-02-27 Georgi Gerganov scripts : sync gguf code
2026-02-27 Georgi Gerganov sync : llama.cpp
2026-02-27 Neo Zhang replace the magic nunber 768 by max work group size...
2026-02-27 Vishal Singh ggml-zendnn: update code for latest ZenDNN API (llama...
2026-02-27 Adrien Gallouët ggml : fix AMX and add batched support (llama/19925)
2026-02-27 Ruben Ortlam vulkan: fix fp16 Flash Attention on Windows AMD RDNA2...
2026-02-27 Kevin Pouget ggml-virtgpu: improve the reliability of the code ...
2026-02-27 Neo Zhang support permuted, remove check s0/s10 (llama/19889)
2026-02-27 Jeff Bolz vulkan: check for memory overlap before doing fusion...
2026-02-25 Georgi Gerganov sync : llama.cpp
2026-02-25 Georgi Gerganov ggml/gguf : prevent integer overflows (llama/19856)
2026-02-25 Ruben Ortlam Vulkan Scalar Flash Attention Refactor (llama/19625)
2026-02-25 Jeff Bolz vulkan: fix coopmat1 without bf16 support (llama/19793)
2026-02-25 Jeff Bolz vulkan: fix data race in mul_mat_id shader (llama/19790)
2026-02-25 Max Krasnyansky hexagon refactor all Ops to use local context struct...
2026-02-25 Alberto Cabrera... ggml-cpu: arm64: q5_K repack gemm and gemv (and generic...
2026-02-25 Gaurav Garg Improve CUDA graph capture (llama/19754)
2026-02-25 Taimur Ahmad ggml-cpu: add RVV vec dot kernels for quantization...
2026-02-25 Jeff Bolz test: mul_mat tests with huge batch size (llama/19519)
2026-02-25 Masashi Yoshimura ggml-webgpu: Add unary op (SQR, SQRT, SIN, COS) support...
2026-02-25 Ruben Ortlam vulkan: fix MMQ shader push constants and multi-dispatc...
2026-02-25 Johannes Gäßler CUDA: fix kernel selection logic for tile FA (llama...
2026-02-25 shalinib-ibm llamafile: powerpc: add FP16 MMA path for Q4/Q8 matmul...
2026-02-25 Reese Levine ggml webgpu: Fix bug in dispatching large matrix-vector...
2026-02-25 Georgi Gerganov sync : llama.cpp
2026-02-25 Reese Levine ggml webgpu: shader library organization (llama/19530)
2026-02-25 Georgi Gerganov sync : llama.cpp
2026-02-25 Jeff Bolz vulkan: split mul_mat into multiple dispatches to avoid...
2026-02-25 shaofeiqi opencl: refactor expm1 and softplus (llama/19404)
2026-02-25 shaofeiqi opencl: optimize mean and sum_row kernels (llama/19614)
2026-02-25 Talha Can Havadar ggml: ggml-cpu: force-no-lto-for-cpu-feats (llama/19609)
2026-02-25 Georgi Gerganov cuda : enable CUDA graphs for MMID 1 <= BS <= 4 (llama...
2026-02-25 Judd ggml : make `ggml_is_view` as API (llama/19539)
2026-02-25 Mario Limonciello Adjust workaround for ROCWMMA_FATTN/GFX9 to only newer...
2026-02-25 abhijain1204fujitsu ggml: aarch64: Implement SVE in Gemm q4_k 8x8 q8_k...
2026-02-25 David Friehs cuda: optimize iq2xxs/iq2xs/iq3xxs dequantization ...
2026-02-25 Daniel Bevenius cmake : check if KleidiAI API has been fetched (llama...
2026-02-25 Georgi Gerganov ggml : avoid UB in gemm ukernel (llama/19642)
2026-02-25 Aaron Teo ggml-cpu: optimize ggml_vec_dot_bf16 for s390x (llama...
2026-02-25 Aman Gupta ggml-cpu: FA add GEMM microkernel (llama/19422)
2026-02-25 SamareshSingh cmake : fix KleidiAI install target failure with EXCLUD...
2026-02-25 Salman Chishti ci : Upgrade GitHub Actions for Node 24 compatibility...
2026-02-15 Georgi Gerganov ggml : bump version to 0.9.7 (#1425) upstream/0.9.7 v0.9.7
2026-02-15 Georgi Gerganov sync : whisper.cpp
2026-02-14 Georgi Gerganov sync : llama.cpp
2026-02-14 Georgi Gerganov models : optimize qwen3next graph (llama/19375)
2026-02-14 Adrien Gallouët ggml : fix GGML_DEBUG with OpenMP (llama/19599)
2026-02-14 Georgi Gerganov metal : fix ACC op (llama/19427)
2026-02-14 Jeff Bolz vulkan: support L2_NORM with contiguous rows (llama...
2026-02-14 Jeff Bolz vulkan: support GGML_OP_SET (llama/19584)
2026-02-14 Sophon vulkan: Add vendor id for Qualcomm drivers (llama/19569)
2026-02-14 Max Krasnyansky hexagon: further optimizations and refactoring for...
2026-02-14 Jeff Bolz vulkan: restore -inf check in FA shaders (llama/19582)
2026-02-14 Alberto Cabrera... Fix wrong memcpy length for block_interleave == 4 ...
2026-02-14 ymcki fix vulkan ggml_acc only works in 3d but not 4d (llama...
2026-02-14 Aman Gupta CUDA: loop over ne2*ne3 in case it overflows (llama...
2026-02-14 Oliver Simons CUDA: Do not mutate cgraph for fused ADDs (llama/19566)
2026-02-14 Georgi Gerganov metal : improve concurrency (llama/19555)
2026-02-14 Georgi Gerganov metal : support GGML_OP_SET (llama/19548)
2026-02-14 Shupei Fan hexagon: fix typo in vtcm_needs_release (llama/19545)
2026-02-14 lhez opencl: add basic support for q4_1 (llama/19534)
2026-02-14 Georgi Gerganov metal : update sum_rows kernel to support float4 (llama...
2026-02-14 Mario Limonciello Add a workaround for compilation with ROCWMMA_FATTN...