git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
Svetlozar Georgiev [Fri, 13 Jun 2025 16:32:56 +0000 (17:32 +0100)]
sycl: fix docker image (#14144)
Guy Goldenberg [Fri, 13 Jun 2025 16:20:25 +0000 (19:20 +0300)]
Merge commit from fork
* vocab : prevent integer overflow during load
* Add static cast and GGML_ABORT
---------
Co-authored-by: Georgi Gerganov <redacted>
Georgi Gerganov [Fri, 13 Jun 2025 15:35:00 +0000 (18:35 +0300)]
batch : add LLAMA_BATCH_DEBUG environment variable (#14172)
* batch : add LLAMA_BATCH_DEBUG environment variable
ggml-ci
* cont : improve seq_id display
ddpasa [Fri, 13 Jun 2025 13:17:53 +0000 (15:17 +0200)]
docs : Update multimodal.md (#14122)
* Update multimodal.md
* Update multimodal.md
Georgi Gerganov [Fri, 13 Jun 2025 10:47:55 +0000 (13:47 +0300)]
batch : rework llama_batch_allocr (#14153)
* batch : rework llama_batch_allocr
ggml-ci
* cont : move validation inside class
ggml-ci
* cont : move output counting to class
ggml-ci
* cont : minor
ggml-ci
* batch : add TODOs
ggml-ci
Georgi Gerganov [Fri, 13 Jun 2025 08:55:44 +0000 (11:55 +0300)]
readme : remove survey link (#14168)
Christian Kastner [Fri, 13 Jun 2025 08:38:52 +0000 (08:38 +0000)]
cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)
* cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT
* cmake: Pass on LLAMA_BUILD_* to GGML_BUILD_*
Đinh Trọng Huy [Fri, 13 Jun 2025 08:34:08 +0000 (17:34 +0900)]
pooling : make cls_b and cls_out_b optional (#14165)
Co-authored-by: dinhhuy <redacted>
Georgi Gerganov [Fri, 13 Jun 2025 08:18:25 +0000 (11:18 +0300)]
server : fix SWA condition for full context reprocess (#14163)
ggml-ci
Anton Mitkov [Fri, 13 Jun 2025 07:51:39 +0000 (08:51 +0100)]
sycl: Adding additional cpy dbg print output (#14034)
Ewan Crawford [Fri, 13 Jun 2025 07:45:37 +0000 (08:45 +0100)]
SYCL: Bump oneMath commit (#14152)
Update oneMath commit to merged PR https://github.com/uxlfoundation/oneMath/pull/669
which adds SYCL-Graph support for recording CUDA BLAS commands.
With this change the `MUL_MAT` tests now pass on DPC++ CUDA backends with SYCL-Graph
enabled. Prior to this change, an error would be thrown.
```
$ GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 -o MUL_MAT -p type_a=f16,type_b=f32,m=16,n=1,k=256,bs=\\[1,1\\],nr=\\[2
UR CUDA ERROR:
Value: 700
Name: CUDA_ERROR_ILLEGAL_ADDRESS
Description: an illegal memory access was encountered
Function: operator()
Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154
Native API failed. Native API returns:
2147483646 (UR_RESULT_ERROR_UNKNOWN)
Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator()
SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code!
in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598
$HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
```
Christian Kastner [Fri, 13 Jun 2025 06:51:34 +0000 (06:51 +0000)]
cmake : Improve build-info.cpp generation (#14156)
* cmake: Simplify build-info.cpp generation
The rebuild of build-info.cpp still gets triggered when .git/index
changes.
* cmake: generate build-info.cpp in build dir
Georgi Gerganov [Fri, 13 Jun 2025 05:03:54 +0000 (08:03 +0300)]
vocab : prevent heap overflow when vocab is too small (#14145)
ggml-ci
Anton Mitkov [Thu, 12 Jun 2025 13:15:11 +0000 (14:15 +0100)]
sycl: Remove not needed copy f16->f32 for dnnl mul mat (#14125)
Georgi Gerganov [Thu, 12 Jun 2025 11:43:09 +0000 (14:43 +0300)]
readme : remove project status link (#14149)
Georgi Gerganov [Thu, 12 Jun 2025 08:51:38 +0000 (11:51 +0300)]
server : re-enable SWA speculative decoding (#14131)
ggml-ci
Georgi Gerganov [Thu, 12 Jun 2025 08:50:01 +0000 (11:50 +0300)]
context : simplify output counting logic during decode (#14142)
* batch : remove logits_all flag
ggml-ci
* context : simplify output counting logic during decode
ggml-ci
* cont : fix comments
Georgi Gerganov [Thu, 12 Jun 2025 08:49:26 +0000 (11:49 +0300)]
batch : remove logits_all flag (#14141)
ggml-ci
Georgi Gerganov [Thu, 12 Jun 2025 07:14:24 +0000 (10:14 +0300)]
cmake : handle whitespaces in path during metal build (#14126)
* cmake : handle whitespaces in path during metal build
ggml-ci
* cont : proper fix
ggml-ci
---------
Co-authored-by: Daniel Bevenius <redacted>
Georgi Gerganov [Thu, 12 Jun 2025 07:02:15 +0000 (10:02 +0300)]
kv-cache : fix split_equal handling in unified implementation (#14130)
ggml-ci
compilade [Thu, 12 Jun 2025 06:56:04 +0000 (02:56 -0400)]
context : round n_tokens to next multiple of n_seqs when reserving (#14140)
This fixes RWKV inference which otherwise failed
when the worst case ubatch.n_seq_tokens rounded to 0.
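A minimal sketch of the rounding described above, written in Python for brevity rather than taken from the llama.cpp sources. The function name and the sample values are assumptions chosen for illustration.

```python
def round_up_to_multiple(n_tokens: int, n_seqs: int) -> int:
    """Round n_tokens up to the nearest multiple of n_seqs."""
    if n_seqs <= 0:
        raise ValueError("n_seqs must be positive")
    return ((n_tokens + n_seqs - 1) // n_seqs) * n_seqs


# Hypothetical worst case: fewer tokens than sequences.
n_tokens, n_seqs = 3, 8

# Without rounding, the per-sequence token count truncates to zero.
print(n_tokens // n_seqs)  # prints 0

# After rounding up, every sequence gets at least one token.
rounded = round_up_to_multiple(n_tokens, n_seqs)
print(rounded // n_seqs)  # prints 1
```

With the token count rounded up first, the integer division can no longer produce 0 for any positive n_tokens.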
bandoti [Wed, 11 Jun 2025 20:19:44 +0000 (17:19 -0300)]
common: fix issue with regex_escape routine on windows (#14133)
Christian Kastner [Wed, 11 Jun 2025 19:07:44 +0000 (19:07 +0000)]
Implement GGML_CPU_ALL_VARIANTS for ARM (#14080)
* ggml-cpu: Factor out feature detection build from x86
* ggml-cpu: Add ARM feature detection and scoring
This is analogous to cpu-feats-x86.cpp. However, to detect compile-time
activation of features, we rely on GGML_USE_<FEAT>, which needs to be set
in cmake, instead of GGML_<FEAT> that users would set for x86.
This is because on ARM, users specify features with GGML_CPU_ARM_ARCH,
rather than with individual flags.
* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM
This works like x86; however, to pass arch flags around within cmake, we use
GGML_INTERNAL_<FEAT> as we don't have GGML_<FEAT>.
Some features are optional, so we may need to build multiple backends
per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring
function sort out which one can be used.
* ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now
The other platforms will need their own specific variants.
This also fixes the bug that the variant-building branch was always
being executed as the else-branch of GGML_NATIVE=OFF. The branch is
moved to an elseif-branch which restores the previous behavior.
Sigbjørn Skjæret [Wed, 11 Jun 2025 17:04:23 +0000 (19:04 +0200)]
chore : clean up relative source dir paths (#14128)
Sigbjørn Skjæret [Wed, 11 Jun 2025 15:16:32 +0000 (17:16 +0200)]
tests : add test-tokenizers-repo (#14017)
Jeff Bolz [Wed, 11 Jun 2025 14:48:52 +0000 (09:48 -0500)]
vulkan: Better thread-safety for command pools/buffers (#14116)
This change moves the command pool/buffer tracking into a vk_command_pool
structure. There are two instances per context (for compute+transfer) and
two instances per device for operations that don't go through a context.
This should prevent separate contexts from stomping on each other.
Aman [Wed, 11 Jun 2025 14:42:25 +0000 (22:42 +0800)]
webui: Wrap long numbers instead of infinite horizontal scroll (#14062)
* webui: Wrap long numbers instead of infinite horizontal scroll
* Use tailwind class
* update index.html.gz
Georgi Gerganov [Wed, 11 Jun 2025 13:48:45 +0000 (16:48 +0300)]
kv-cache : relax SWA masking condition (#14119)
ggml-ci
Taylor [Wed, 11 Jun 2025 10:43:43 +0000 (06:43 -0400)]
server : pass default --keep argument (#14120)
Georgi Gerganov [Wed, 11 Jun 2025 09:52:45 +0000 (12:52 +0300)]
kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121)
Jeff Bolz [Wed, 11 Jun 2025 05:19:25 +0000 (00:19 -0500)]
vulkan: Track descriptor pools/sets per-context (#14109)
Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8)
and move it to the vk_device. Move all the descriptor pool and set tracking to
the context - none of it is specific to pipelines anymore. It has a single vector
of pools and vector of sets, and a single counter to track requests and a single
counter to track use.
lhez [Tue, 10 Jun 2025 23:55:58 +0000 (16:55 -0700)]
opencl: add `mul_mv_id_q4_0_f32_8x_flat` (#14003)
compilade [Tue, 10 Jun 2025 22:20:14 +0000 (18:20 -0400)]
kv-cache : avoid modifying recurrent cells when setting inputs (#13834)
* kv-cache : avoid modifying recurrent cells when setting inputs
* kv-cache : remove inp_s_mask
It was replaced with equivalent and simpler functionality
with rs_z (the first zeroed state) and the already-existing inp_s_copy.
* kv-cache : fix non-consecutive token pos warning for recurrent models
The problem was apparently caused by how the tail cells were swapped.
* graph : simplify logic for recurrent state copies
* kv-cache : use cell without src refs for rs_z in recurrent cache
* llama-graph : fix recurrent state copy
The `state_copy` shuffle assumes everything is moved at once,
which is not true when `states_extra` is copied back to the cache
before copying the range of states between `head` and `head + n_seqs`.
This is only a problem if any of the cells in [`head`, `head + n_seqs`)
have an `src` in [`head + n_seqs`, `head + n_kv`),
which does happen when `n_ubatch > 1` in the `llama-parallel` example.
Changing the order of the operations avoids the potential overwrite
before use, although when copies are avoided (like with Mamba2),
this will require further changes.
* llama-graph : rename n_state to state_size in build_recurrent_state
This naming should reduce confusion between the state size
and the number of states.
Sigbjørn Skjæret [Tue, 10 Jun 2025 21:29:52 +0000 (23:29 +0200)]
convert : fix duplicate key DeepSeek-R1 conversion error (#14103)
Sigbjørn Skjæret [Tue, 10 Jun 2025 16:02:08 +0000 (18:02 +0200)]
llama : support GEGLU for jina-bert-v2 (#14090)
Jeff Bolz [Tue, 10 Jun 2025 15:53:47 +0000 (10:53 -0500)]
vulkan: force device 0 in CI (#14106)
Juk Armstrong [Tue, 10 Jun 2025 15:48:07 +0000 (16:48 +0100)]
Fixed spec timings to: accepted/tested instead of accepted/drafted (#14104)
Georgi Gerganov [Tue, 10 Jun 2025 14:37:45 +0000 (17:37 +0300)]
sync : ggml
ggml-ci
Georgi Gerganov [Tue, 10 Jun 2025 08:34:10 +0000 (11:34 +0300)]
ggml : fix weak alias win32 (whisper/0)
ggml-ci
0cc4m [Tue, 10 Jun 2025 12:01:33 +0000 (14:01 +0200)]
Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (#14099)
Isaac McFadyen [Tue, 10 Jun 2025 06:41:01 +0000 (02:41 -0400)]
rpc : nicer error messages for RPC server crash (#14076)
Georgi Gerganov [Tue, 10 Jun 2025 06:20:51 +0000 (09:20 +0300)]
sync : ggml
ggml-ci
Kai Pastor [Tue, 3 Jun 2025 10:33:28 +0000 (12:33 +0200)]
Add in-build ggml::ggml ALIAS library (ggml/1260)
Enable uniform linking with subproject and with find_package.
Georgi Gerganov [Mon, 9 Jun 2025 20:05:02 +0000 (23:05 +0300)]
metal : use less stack memory in FA kernel (#14088)
* metal : use less stack memory in FA kernel
ggml-ci
* cont : fix BF16 variant
Georgi Gerganov [Mon, 9 Jun 2025 20:04:35 +0000 (23:04 +0300)]
kv-cache : fix shift and defrag logic (#14081)
* kv-cache : fix shift
ggml-ci
* cont : reset shift[i]
ggml-ci
* cont : fix defrag erasing cells that didn't move
ggml-ci
Diego Devesa [Mon, 9 Jun 2025 18:03:09 +0000 (11:03 -0700)]
llama : allow building all tests on windows when not using shared libs (#13980)
* llama : allow building all tests on windows when not using shared libraries
* add static windows build to ci
* tests : enable debug logs for test-chat
---------
Co-authored-by: Georgi Gerganov <redacted>
xctan [Mon, 9 Jun 2025 14:47:13 +0000 (22:47 +0800)]
ggml-cpu : split arch-specific implementations (#13892)
* move ggml-cpu-aarch64 to repack
* split quantize_row_q8_0/1
* split helper functions
* split ggml_vec_dot_q4_0_q8_0
* split ggml_vec_dot_q4_1_q8_1
* split ggml_vec_dot_q5_0_q8_0
* split ggml_vec_dot_q5_1_q8_1
* split ggml_vec_dot_q8_0_q8_0
* split ggml_vec_dot_tq1_0_q8_K
* split ggml_vec_dot_tq2_0_q8_K
* split ggml_vec_dot_q2_K_q8_K
* split ggml_vec_dot_q3_K_q8_K
* split ggml_vec_dot_q4_K_q8_K
* split ggml_vec_dot_q5_K_q8_K
* split ggml_vec_dot_q6_K_q8_K
* split ggml_vec_dot_iq2_xxs_q8_K
* split ggml_vec_dot_iq2_xs_q8_K
* split ggml_vec_dot_iq2_s_q8_K
* split ggml_vec_dot_iq3_xxs_q8_K
* split ggml_vec_dot_iq3_s_q8_K
* split ggml_vec_dot_iq1_s_q8_K
* split ggml_vec_dot_iq1_m_q8_K
* split ggml_vec_dot_iq4_nl_q8_0
* split ggml_vec_dot_iq4_xs_q8_K
* fix typos
* fix missing prototypes
* rename ggml-cpu-quants.c
* rename ggml-cpu-traits
* rename arm folder
* move cpu-feats-x86.cpp
* rename ggml-cpu-hbm
* update arm detection macro in quants.c
* move iq quant tables
* split ggml_quantize_mat_q8_0/K
* split ggml_gemv_*
* split ggml_gemm_*
* rename namespace aarch64 to repack
* use weak aliases to replace test macros
* rename GGML_CPU_AARCH64 to GGML_CPU_REPACK
* rename more aarch64 to repack
* clean up rebase leftover
* fix compilation errors
* remove trailing spaces
* try to fix clang compilation errors
* try to fix clang compilation errors again
* try to fix clang compilation errors, 3rd attempt
* try to fix clang compilation errors, 4th attempt
* try to fix clang compilation errors, 5th attempt
* try to fix clang compilation errors, 6th attempt
* try to fix clang compilation errors, 7th attempt
* try to fix clang compilation errors, 8th attempt
* try to fix clang compilation errors, 9th attempt
* more cleanup
* fix compilation errors
* fix apple targets
* fix a typo in arm version of ggml_vec_dot_q4_K_q8_K
---------
Co-authored-by: Georgi Gerganov <redacted>
Diego Devesa [Mon, 9 Jun 2025 14:36:26 +0000 (07:36 -0700)]
cuda : fix device sync on buffer clear (#14033)
Georgi Gerganov [Mon, 9 Jun 2025 14:17:31 +0000 (17:17 +0300)]
graph : fix geglu (#14077)
ggml-ci
Xinpeng Dou [Mon, 9 Jun 2025 11:47:39 +0000 (19:47 +0800)]
CANN: Simplify the environment variable setting (#13104)
* Simplify the environment variable setting to specify the memory pool type.
* Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options.
* update
* fix CI
* update
* delete whitespace
* fix according to review
* update CANN.md
* update CANN.md
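A minimal sketch of case-insensitive flag parsing along the lines of the GGML_CANN_ASYNC_MODE change listed above. This is not the CANN backend code: the helper name and the choice to treat every other value as disabled are assumptions.

```python
import os

# Values the commit message lists as valid for enabling the option.
TRUTHY_VALUES = {"yes", "enable", "1", "on"}


def env_flag_enabled(name: str, environ=os.environ) -> bool:
    """Return True if the variable is set to one of the accepted values.

    The comparison ignores case; unset or unrecognized values count as off
    (an assumption, the commit message does not say how they are handled).
    """
    value = environ.get(name)
    return value is not None and value.lower() in TRUTHY_VALUES


# Usage with an explicit mapping instead of the real process environment.
print(env_flag_enabled("GGML_CANN_ASYNC_MODE", {"GGML_CANN_ASYNC_MODE": "On"}))
```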
R0CKSTAR [Mon, 9 Jun 2025 10:01:17 +0000 (18:01 +0800)]
webui: fix sidebar being covered by main content (#14082)
* webui: fix sidebar being covered by main content
Signed-off-by: Xiaodong Ye <redacted>
* webui: update index.html.gz
Signed-off-by: Xiaodong Ye <redacted>
---------
Signed-off-by: Xiaodong Ye <redacted>
Georgi Gerganov [Mon, 9 Jun 2025 09:57:58 +0000 (12:57 +0300)]
server : fix LRU check (#14079)
ggml-ci
Nicolò Scipione [Mon, 9 Jun 2025 09:47:07 +0000 (11:47 +0200)]
sycl: Add reorder to Q6_K mmvq implementation (#13885)
* Add Reorder to Q6_K mmvq implementation
* Address PR comments: clean up comments
* Remove unused parameter after refactoring q4_k
* Adding inline to function and removing unnecessary reference to int
---------
Signed-off-by: nscipione <redacted>
Đinh Trọng Huy [Mon, 9 Jun 2025 04:15:31 +0000 (13:15 +0900)]
add geglu activation function (#14074)
Co-authored-by: dinhhuy <redacted>
Yuanhao Ji [Mon, 9 Jun 2025 03:20:06 +0000 (11:20 +0800)]
CANN: Enable labeler for Ascend NPU (#13914)
Diego Devesa [Sun, 8 Jun 2025 18:39:56 +0000 (11:39 -0700)]
cuda : fix buffer type check with integrated GPUs (#14069)
吴小白 [Sat, 7 Jun 2025 13:39:11 +0000 (21:39 +0800)]
ci: add LoongArch cross-compile build (#13944)
Akarshan Biswas [Sat, 7 Jun 2025 13:28:20 +0000 (18:58 +0530)]
SYCL: Implement few same quantized type copy kernels (#13739)
* SYCL: Implement few same quantized type copy kernels
* Use memcpy for copying contiguous tensors
ggml-ci
* feat(sycl): add contiguous tensor copy support and device checks
Adds a memcpy path for contiguous tensors of the same type to optimize data transfer. Updates device support checks to recognize contiguous tensor operations, improving compatibility and performance.
* refactor: replace specific block copy functions with template
The changes replace multiple redundant block copy functions (e.g., cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function cpy_blck_q_q. This reduces code duplication by using a generic template that works for any block type, improving maintainability while preserving the same functionality. The template is instantiated with specific block types (e.g., block_q8_0) where needed.
* Exclude BF16 support for COPY tensors for now
ggml-ci
* perf: adjust SYCL copy kernel block sizes for efficiency
Use ceil_div to ensure full element coverage and update nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization in copy operations.
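A minimal sketch of why ceiling division matters when sizing a launch, written in Python rather than SYCL. The element count and block size are hypothetical, and the helper is an illustration, not the ceil_div from the ggml-sycl sources.

```python
def ceil_div(n: int, d: int) -> int:
    """Integer division rounded up, for positive d."""
    return (n + d - 1) // d


num_elements = 1000  # hypothetical tensor size
block_size = 256     # hypothetical work-group size

# Floor division launches too few blocks and leaves a tail uncovered.
floor_blocks = num_elements // block_size
print(floor_blocks * block_size)  # prints 768

# Ceiling division covers every element; the kernel must bounds-check the tail.
blocks = ceil_div(num_elements, block_size)
print(blocks * block_size)  # prints 1024
```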
Sigbjørn Skjæret [Sat, 7 Jun 2025 12:13:12 +0000 (14:13 +0200)]
llama : fix llama_model_chat_template with template name (LLM_KV with suffix) (#14050)
Georgi Gerganov [Fri, 6 Jun 2025 11:11:15 +0000 (14:11 +0300)]
llama : deprecate llama_kv_self_ API (#14030)
* llama : deprecate llama_kv_self_ API
ggml-ci
* llama : allow llama_memory_(nullptr)
ggml-ci
* memory : add flag for optional data clear in llama_memory_clear
ggml-ci
Georgi Gerganov [Fri, 6 Jun 2025 10:29:18 +0000 (13:29 +0300)]
context : fix SWA-related warning for multiple sequences (#14045)
Sigbjørn Skjæret [Fri, 6 Jun 2025 07:03:25 +0000 (09:03 +0200)]
llama : support multiple classifier outputs and labels (#13940)
Sigbjørn Skjæret [Thu, 5 Jun 2025 15:42:31 +0000 (17:42 +0200)]
gguf-py : add add_classifier_output_labels method to writer (#14031)
* add add_classifier_output_labels
* use add_classifier_output_labels
Masato Nakasaka [Thu, 5 Jun 2025 14:00:29 +0000 (23:00 +0900)]
vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001)
* allowing B580 and U9-288V
* experimenting code to detect Xe2
* allowing coopmat only for Xe2 GPUs
* fixed comment wording
* fixed comment wording
* removed unnecessary driver check
pockers21 [Thu, 5 Jun 2025 13:25:29 +0000 (06:25 -0700)]
ci: fix CUDA build failure on autodl cloud machines (#14005)
Replace CMAKE_CUDA_ARCHITECTURES=native with nvidia-smi detection
as 'native' fails on autodl cloud environments.
Co-authored-by: pockers21 <redacted>
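A minimal sketch of deriving a CMAKE_CUDA_ARCHITECTURES value from nvidia-smi instead of relying on 'native'. The CI script itself is not shown in this log, so the function names and the exact query below are assumptions; the query uses the `compute_cap` field, which requires a reasonably recent NVIDIA driver.

```python
import subprocess


def compute_cap_to_arch(cap: str) -> str:
    """Turn a compute capability such as '8.6' into the arch string '86'."""
    major, minor = cap.strip().split(".")
    return f"{int(major)}{int(minor)}"


def detect_cuda_architectures() -> str:
    """Query all visible GPUs and return a ';'-separated arch list."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    # Deduplicate, since several GPUs may share one compute capability.
    archs = {compute_cap_to_arch(line) for line in out.splitlines() if line.strip()}
    return ";".join(sorted(archs, key=int))
```

The result of `detect_cuda_architectures()` could then be passed as `-DCMAKE_CUDA_ARCHITECTURES=...` on the cmake command line.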
Georgi Gerganov [Thu, 5 Jun 2025 12:29:22 +0000 (15:29 +0300)]
memory : migrate from llama_kv_cache to more generic llama_memory (#14006)
* memory : merge llama_kv_cache into llama_memory + new `llama_memory` API
ggml-ci
* context : fix casts
ggml-ci
Diego Devesa [Thu, 5 Jun 2025 09:57:42 +0000 (02:57 -0700)]
llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013)
Olexandr88 [Thu, 5 Jun 2025 07:50:55 +0000 (10:50 +0300)]
readme : add badge (#13938)
Sigbjørn Skjæret [Thu, 5 Jun 2025 07:29:18 +0000 (09:29 +0200)]
vocab : warn about missing mask token (#14022)
Georgi Gerganov [Thu, 5 Jun 2025 06:06:29 +0000 (09:06 +0300)]
context : fix pos_min initialization upon error decode (#14008)
ggml-ci
Jeff Bolz [Thu, 5 Jun 2025 05:17:58 +0000 (00:17 -0500)]
vulkan: automatically deduce size of push constants (#13936)
Ervin Áron Tasnádi [Wed, 4 Jun 2025 20:02:00 +0000 (22:02 +0200)]
ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813)
* ggml-vulkan: adds op CONV_TRANSPOSE_1D
* test-backend-ops: adds more sophisticated tests for CONV_TRANSPOSE_1D
* Missing barrier added to shader.
Number of additional tests reduced to 108.
* Fixes typo in variable name.
* Removes extra whitespaces.
* Adds int64->int32 casts to prevent possible warnings.
* Problem size reduced in tests to pass tests with llvmpipe.
* supports_op condition moved from unintended position
Georgi Gerganov [Wed, 4 Jun 2025 15:58:20 +0000 (18:58 +0300)]
kv-cache : refactor the update/defrag mechanism (#13988)
* kv-cache : refactor update mechanism
ggml-ci
* memory : improve status handling
* defrag : reset head + add comments
ggml-ci
* cont : minor fixes
ggml-ci
Diego Devesa [Wed, 4 Jun 2025 13:37:40 +0000 (06:37 -0700)]
ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997)
Diego Devesa [Wed, 4 Jun 2025 11:15:54 +0000 (04:15 -0700)]
releases : use dl backend for linux release, remove arm64 linux release (#13996)
Xuan-Son Nguyen [Wed, 4 Jun 2025 08:11:26 +0000 (10:11 +0200)]
llama-graph : use ggml_repeat_4d (#13998)
Johannes Gäßler [Wed, 4 Jun 2025 06:57:05 +0000 (08:57 +0200)]
CUDA: fix FTZ in FA for Gemma 3 (#13991)
Georgi Gerganov [Wed, 4 Jun 2025 06:50:32 +0000 (09:50 +0300)]
kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985)
ggml-ci
Jeff Bolz [Tue, 3 Jun 2025 18:30:22 +0000 (13:30 -0500)]
vulkan: fix warnings in perf logger querypool code (#13937)
Xuan-Son Nguyen [Tue, 3 Jun 2025 11:09:36 +0000 (13:09 +0200)]
docs : add "Quick start" section for new users (#13862)
* docs : add "Quick start" section for non-technical users
* rm flox
* Update README.md
lhez [Mon, 2 Jun 2025 23:54:58 +0000 (16:54 -0700)]
opencl: add `backend_synchronize` (#13939)
* This is not needed by the normal use where the result is read
using `tensor_get`, but it allows perf mode of `test-backend-ops`
to properly measure performance.
rmatif [Mon, 2 Jun 2025 23:53:36 +0000 (23:53 +0000)]
OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840)
* add concat, pad, repeat, tsembd, tanh, upscale
* small fixes
Georgi Gerganov [Mon, 2 Jun 2025 18:34:40 +0000 (21:34 +0300)]
server : disable speculative decoding for SWA models (#13970)
* server : use swa-full for draft context
ggml-ci
* server : disable speculative decoding for SWA models
Georgi Gerganov [Mon, 2 Jun 2025 18:33:40 +0000 (21:33 +0300)]
metal : use F32 accumulators in FA kernels (#13975)
ggml-ci
Georgi Gerganov [Mon, 2 Jun 2025 17:54:26 +0000 (20:54 +0300)]
gemma : more consistent attention scaling for v2 and v3 (#13951)
* gemma : fix attn scale for 27B
* cont : apply scale before attn
* cont : consistent attention scaling
Olivier Chafik [Mon, 2 Jun 2025 17:15:44 +0000 (10:15 -0700)]
`server`: update deepseek reasoning format (pass reasoning_content as diffs) (#13933)
* server: update deepseek reasoning format (now in reasoning_content diffs), add legacy option for compat
* update unit/test_tool_call.py::test_thoughts
Xuan-Son Nguyen [Mon, 2 Jun 2025 14:29:28 +0000 (16:29 +0200)]
mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961)
* mtmd : fix memory in mtmd_helper_eval_chunk_single
* mtmd-cli : fix mem leak
* Update tools/mtmd/mtmd-cli.cpp
Co-authored-by: Georgi Gerganov <redacted>
---------
Co-authored-by: Georgi Gerganov <redacted>
shalinib-ibm [Mon, 2 Jun 2025 12:18:36 +0000 (17:48 +0530)]
cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13966)
Some systems report the CPU implementation as "Power11" instead of "POWER11".
The existing CMake logic uses a case-sensitive regular expression to extract
the CPU generation, which fails when the casing doesn't exactly match "POWER".
This patch provides a fix by first converting the string to uppercase before applying the regex.
Signed-off-by: root <redacted>
Co-authored-by: root <redacted>
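A minimal sketch of the uppercase-then-match approach described in this commit, in Python rather than CMake. The regular expression and function name are hypothetical; the actual pattern used by the build scripts is not shown in this log.

```python
import re


def power_generation(cpu_string: str):
    """Extract the POWER generation number, ignoring the input's casing.

    Uppercasing first lets one case-sensitive pattern match both
    'POWER11' and 'Power11'. Returns None when nothing matches.
    """
    match = re.search(r"POWER(\d+)", cpu_string.upper())
    return int(match.group(1)) if match else None


print(power_generation("Power11"))  # prints 11
print(power_generation("POWER10"))  # prints 10
```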
Atharva Dubey [Mon, 2 Jun 2025 09:12:20 +0000 (10:12 +0100)]
sycl: quantize and reorder the input to q8_1 when reorder is enabled (#13826)
* [WIP]: fuse q8 quantization and reorder
* wip2: fuse q8 quantization and reorder
* working q8 reorder commit
* restored common.hpp
* remove debug prints
* remove unnecessary headers and remove trailing whitespace
* Update ggml/src/ggml-sycl/ggml-sycl.cpp
Co-authored-by: Alberto Cabrera Pérez <redacted>
---------
Co-authored-by: Alberto Cabrera Pérez <redacted>
Johannes Gäßler [Sun, 1 Jun 2025 16:08:05 +0000 (18:08 +0200)]
gguf: fix failure on version == 0 (#13956)
Sigbjørn Skjæret [Sun, 1 Jun 2025 16:07:21 +0000 (18:07 +0200)]
convert : fix nomic-bert-moe mask token (#13757)
Sigbjørn Skjæret [Sun, 1 Jun 2025 15:23:11 +0000 (17:23 +0200)]
convert : fix vocab padding code for bert models (#13954)
Aaron Teo [Sun, 1 Jun 2025 14:53:57 +0000 (22:53 +0800)]
ggml: check if non-native endian model is being loaded (#13943)
* gguf: prevent non-native endian models from being loaded
Signed-off-by: Aaron Teo <redacted>
* gguf: update error message
Signed-off-by: Aaron Teo <redacted>
* gguf: make the non-native endian check more verbose
Signed-off-by: Aaron Teo <redacted>
* ggml: move ggml_assert location
Signed-off-by: Aaron Teo <redacted>
* ggml: reword the endianness check error message
Signed-off-by: Aaron Teo <redacted>
---------
Signed-off-by: Aaron Teo <redacted>
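A minimal sketch of one way such an endianness check can work, assuming the loader reads the 32-bit GGUF version field in host byte order. A small version number read with the wrong byte order ends up with its low 16 bits all zero. The heuristic and the helper name are illustrative assumptions, not a copy of the ggml code.

```python
import struct


def looks_byte_swapped(version_bytes: bytes) -> bool:
    """Guess whether a 4-byte version field was written in the other byte order."""
    (version,) = struct.unpack("=I", version_bytes)  # "=" means native order
    return version != 0 and (version & 0xFFFF) == 0


native = struct.pack("=I", 3)  # version 3 as written on this host
swapped = native[::-1]         # the same field from an opposite-endian host

print(looks_byte_swapped(native))   # prints False
print(looks_byte_swapped(swapped))  # prints True
```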
Georgi Gerganov [Sun, 1 Jun 2025 09:23:14 +0000 (12:23 +0300)]
sync : ggml
ggml-ci
Kai Pastor [Sat, 31 May 2025 10:49:55 +0000 (12:49 +0200)]
vulkan : Remove unexpected ; (ggml/1253)
Kai Pastor [Sat, 31 May 2025 10:39:19 +0000 (12:39 +0200)]
cmake : Fix broken CMake error messages (ggml/1252)
Radoslav Gerganov [Fri, 30 May 2025 06:11:09 +0000 (09:11 +0300)]
ggml : remove ggml_graph_import and ggml_graph_export declarations (ggml/1247)
The implementation is already deleted with commit 9d0762e.
closes: #1235
Georgi Gerganov [Thu, 29 May 2025 10:29:50 +0000 (13:29 +0300)]
sync : whisper.cpp (ggml/1250)
* ggml : Fix backtrace breaking Windows build (whisper/3203)
* sync : whisper.cpp
ggml-ci
---------
Co-authored-by: Daniel Tang <redacted>
Radoslav Gerganov [Thu, 29 May 2025 05:34:46 +0000 (08:34 +0300)]
ggml : install dynamic backends (ggml/1240)
* ggml : install dynamic backends
Make sure dynamic backends are installed in $CMAKE_INSTALL_BINDIR
Daniel Tang [Wed, 28 May 2025 00:58:46 +0000 (20:58 -0400)]
ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)
The goal is to have what users call "full logs" contain the backtrace.
This is registered upon ggml_init. Also fixes a minor fd leak on Linux.