git.djapps.eu Git - pkg/ggml/sources/ggml/shortlog
2026-01-13 Jeff Bolz vulkan: change memory_logger to be controlled by an...
2026-01-13 Jeff Bolz vulkan: Use VK_EXT_shader_64bit_indexing to handle...
2026-01-13 Ruben Ortlam vulkan: Disable large coopmat matmul configuration...
2026-01-13 Ruben Ortlam Vulkan: Optimize Matmul parameters for AMD GPUs with...
2026-01-11 Georgi Gerganov sync : llama.cpp
2026-01-11 shaofeiqi opencl: add SOFTPLUS op support (llama/18726)
2026-01-11 Aman Gupta test-backend-ops: fix mxfp4 tests on blackwell (llama...
2026-01-11 Johannes Gäßler HIP: adjust RDNA3.5 MMQ kernel selection logic (llama...
2026-01-11 Perry Naseck cmake : update blas logic (llama/18205)
2026-01-11 Michael Wand Corrected: changed s13 = src1->nb[3] instead of nb...
2026-01-11 shaofeiqi opencl: add EXPM1 op (llama/18704)
2026-01-11 Reese Levine Updates to webgpu get_memory (llama/18707)
2026-01-11 Georgi Gerganov sync : llama.cpp
2026-01-11 Aaron Teo llama: use host memory if device reports 0 memory ...
2026-01-11 Masashi Yoshimura ggml-webgpu: Fix GGML_MEM_ALIGN to 8 for emscripten...
2026-01-11 Reese Levine ggml webgpu: initial flashattention implementation...
2026-01-11 Jeff Bolz vulkan: fix push constant size for quantize_q8_1 (llama...
2026-01-11 Jeff Bolz vulkan: optimize ssm_scan (llama/18630)
2026-01-11 도로로도로또 metal : add MoE kernel specialization for ne20=5 (llama...
2026-01-11 Doctor Shotgun ggml: add env var GGML_OP_OFFLOAD_MIN_BATCH (llama...
2026-01-11 shaofeiqi opencl: add FILL op support (llama/18682)
2026-01-11 Oliver Walsh cuda : fix build on cuda 12.8 (llama/18672)
2026-01-11 Jeff Bolz vulkan: reject ops when a tensor is too large to alloca...
2026-01-11 virajwad vulkan: Warptile tuning for Intel Xe2/Xe3 (llama/18178)
2026-01-11 Eve vulkan: more mul mat optimizations (llama/18533)
2026-01-11 hipudding CANN: Fix rename for get_env (llama/18652)
2026-01-11 Raul Torres CANN: Rename `get_env` to `get_env_as_lowercase` (llama...
2026-01-11 Max Krasnyansky Hexagon add support for f16/f32 flash attention, scale...
2026-01-11 Aadeshveer... ggml : optimize cuda ssm_scan using warp-level reductio...
2026-01-11 Jeff Bolz vulkan: support buffer_from_host_ptr (llama/18467)
2026-01-11 Aman Gupta ggml-cuda: refactor cuda graph usage (llama/18637)
2026-01-11 Beinsezii mmq.cu: tune mmq/rocblas switching for RDNA (llama...
2026-01-11 Adrien Gallouët ggml : fix avx512bf16 build (llama/18623)
2026-01-11 Raul Torres CANN: Make `valid_values` variable `static const` ...
2026-01-11 nwyin ggml webgpu: add CEIL operation support (llama/18605)
2026-01-11 Johannes Gäßler CUDA: fix FA FP16 accumulator overflow for Granite...
2026-01-11 Aman Gupta ggml-cuda: check for srcs outside the cgraph (llama...
2026-01-11 Jeff Bolz vulkan: fix topk_moe_sigmoid_norm_bias failures in...
2026-01-11 Jeff Bolz vulkan: handle quantize_q8_1 overflowing the max workgr...
2026-01-11 Chenguang Li CANN: add operator fusion support for ADD + RMS_NORM...
2026-01-11 Daniel Bevenius sampling : add support for backend sampling (llama...
2026-01-11 Aman Gupta CUDA: disable cuda graph when using n-cpu-moe (llama...
2026-01-11 Aman Gupta ggml-cuda: remove unused params in ggml_cuda_graph...
2026-01-11 Aman Gupta ggml-cuda: fixes for concurrent streams (llama/18496)
2026-01-11 Johannes Gäßler CUDA: only allocate FA tmp buffer if needed (llama...
2026-01-11 pl752 (Bugfix, ggml-cuda) Pool alloc count fix + small size...
2026-01-11 Shouyu ggml-hexagon: optimize activation function (llama/18393)
2026-01-11 Jeff Bolz vulkan: Optimize GGML_OP_CUMSUM (llama/18417)
2026-01-11 Jeff Bolz vulkan: Implement mmvq for iq1_s/iq1_m (llama/18450)
2026-01-11 Georgi Gerganov metal : adjust extra size for FA buffer to avoid reallo...
2026-01-11 Chris Rohlf rpc : use unordered_map::reserve and emplace (llama...
2026-01-11 MeeMin cuda : fix copy of large tensors (ggml_nbytes <= INT_MA...
2026-01-11 Aman Gupta ggml-cuda: remove unnecessary prints on ggml_cuda_init...
2026-01-11 Jeff Bolz vulkan: extend topk_moe to handle sigmoid w/exp_probs_b...
2025-12-31 Georgi Gerganov ggml : bump version to 0.9.5 (#1410) upstream/0.9.5 v0.9.5
2025-12-31 Georgi Gerganov sync : whisper.cpp
2025-12-31 Georgi Gerganov sync : llama.cpp
2025-12-31 gatbontonpc metal : add count_equal op (llama/18314)
2025-12-31 Johannes Gäßler CUDA: fix KQ max calculation (llama/18487)
2025-12-31 Georgi Gerganov metal : remove BF16 x F16 kernels (llama/18456)
2025-12-31 Georgi Gerganov sync : llama.cpp
2025-12-31 Aman Gupta sycl: add newline at the end of CMakeLists.txt (llama...
2025-12-31 Rahul Sathe Work around broken IntelSYCLConfig.cmake in Intel oneAP...
2025-12-31 Charles Xu kleidiai: add and integrate SVE 256-bit vector-length...
2025-12-31 Aman Gupta CUDA: add log line when mxfp4 acceleration is used...
2025-12-31 Johannes Gäßler CUDA: fix replacement of bad archs in CMake (llama/18457)
2025-12-31 Johannes Gäßler CUDA: Blackwell features for non-native builds (llama...
2025-12-31 Aman Gupta cuda: fix race condition in cumsum (llama/18448)
2025-12-31 uvos HIP: Use mmq on MFMA devices for MUL_MAT_ID in cases...
2025-12-31 Aman Gupta Revert "ggml-cuda: use CMAKE_CUDA_ARCHITECTURES if...
2025-12-31 o7si rpc: fix segfault on invalid endpoint format (llama...
2025-12-31 Boian Berberov cmake: Added more x86_64 CPU backends when building...
2025-12-31 QDelta ggml-cuda: use CMAKE_CUDA_ARCHITECTURES if set when...
2025-12-31 lhez opencl: allow resizing transpose buffers (llama/18384)
2025-12-31 Aman Gupta ggml-cuda: Use same regex for GGML_NATIVE=OFF (llama...
2025-12-31 Jeff Bolz vulkan: preprocess mul_mat_id experts and discard workg...
2025-12-31 Jeff Bolz vulkan: optimize decodeFuncB in coopmat2 mul_mat_id...
2025-12-31 Jeff Bolz vulkan: Use BK=32 for coopmat2 mul_mat_id (llama/18332)
2025-12-31 Eve vulkan: small dequantization improvements (llama/18380)
2025-12-31 Jeff Bolz vulkan: Support UPSCALE w/antialias (llama/18327)
2025-12-31 Jeff Bolz vulkan: handle rope with large number of rows (llama...
2025-12-31 0Marble CANN: implement the SSM_CONV operator (llama/17737)
2025-12-31 Aman Gupta ggml-cuda: fix regex for arch list (llama/18371)
2025-12-31 Aman Gupta cuda: optimize cumsum cub path (llama/18362)
2025-12-31 Aman Gupta ggml-cuda: fix blackwell native builds (llama/18361)
2025-12-31 Penglin Cai CANN: Add support for CONV_TRANSPOSE_1D when kernel...
2025-12-31 Aadeshveer... ggml : optimize cuda cumsum fallback kernel (llama...
2025-12-31 Aman Gupta CUDA: experimental native mxfp4 support for blackwell...
2025-12-31 Jeff Bolz vulkan: fix command buffer corruption in ggml_backend_v...
2025-12-31 Wang Weixuan CANN : refactor ACL graph cache (llama/17752)
2025-12-31 Ruben Ortlam vulkan: use fewer FA rows for small cache runs (llama...
2025-12-31 TianHao324 CANN: Uses yarn_ramp cache in ROPE (llama/17725)
2025-12-31 Chris Rohlf rpc : add check for rpc buffer type (llama/18242)
2025-12-31 nullname ggml-hexagon: create generalized functions for cpu...
2025-12-31 Shouyu ggml-hexagon: gelu optimization (llama/18151)
2025-12-31 Taimur Ahmad llamafile: add rvv support for sgemm kernels (llama...
2025-12-31 lhez opencl: unpack q4_0 for adreno in get_tensor (llama...
2025-12-31 Jeff Bolz vulkan: Extend rope fusions to allow mrope (llama/18264)
2025-12-31 Jeff Bolz vulkan: Implement set_tensor_async and the event interf...
2025-12-31 Johannes Gäßler llama: fix RPC for -fit on (llama/18233)