git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
2 weeks ago vocab : fix build (#14175)
Georgi Gerganov [Fri, 13 Jun 2025 17:03:05 +0000 (20:03 +0300)]
vocab : fix build (#14175)

ggml-ci

2 weeks ago sycl: fix docker image (#14144)
Svetlozar Georgiev [Fri, 13 Jun 2025 16:32:56 +0000 (17:32 +0100)]
sycl: fix docker image (#14144)

2 weeks ago Merge commit from fork
Guy Goldenberg [Fri, 13 Jun 2025 16:20:25 +0000 (19:20 +0300)]
Merge commit from fork

* vocab : prevent integer overflow during load

* Add static cast and GGML_ABORT

---------

Co-authored-by: Georgi Gerganov <redacted>
2 weeks ago batch : add LLAMA_BATCH_DEBUG environment variable (#14172)
Georgi Gerganov [Fri, 13 Jun 2025 15:35:00 +0000 (18:35 +0300)]
batch : add LLAMA_BATCH_DEBUG environment variable (#14172)

* batch : add LLAMA_BATCH_DEBUG environment variable

ggml-ci

* cont : improve seq_id display

2 weeks ago docs : Update multimodal.md (#14122)
ddpasa [Fri, 13 Jun 2025 13:17:53 +0000 (15:17 +0200)]
docs : Update multimodal.md (#14122)

* Update multimodal.md

* Update multimodal.md

2 weeks ago batch : rework llama_batch_allocr (#14153)
Georgi Gerganov [Fri, 13 Jun 2025 10:47:55 +0000 (13:47 +0300)]
batch : rework llama_batch_allocr (#14153)

* batch : rework llama_batch_allocr

ggml-ci

* cont : move validation inside class

ggml-ci

* cont : move output counting to class

ggml-ci

* cont : minor

ggml-ci

* batch : add TODOs

ggml-ci

2 weeks ago readme : remove survey link (#14168)
Georgi Gerganov [Fri, 13 Jun 2025 08:55:44 +0000 (11:55 +0300)]
readme : remove survey link (#14168)

2 weeks ago cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)
Christian Kastner [Fri, 13 Jun 2025 08:38:52 +0000 (08:38 +0000)]
cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT (#14167)

* cmake: Add ability to pass in LLAMA_BUILD_NUMBER/COMMIT

* cmake: Pass on LLAMA_BUILD_* to GGML_BUILD_*

2 weeks ago pooling : make cls_b and cls_out_b optional (#14165)
Đinh Trọng Huy [Fri, 13 Jun 2025 08:34:08 +0000 (17:34 +0900)]
pooling : make cls_b and cls_out_b optional (#14165)

Co-authored-by: dinhhuy <redacted>
2 weeks ago server : fix SWA condition for full context reprocess (#14163)
Georgi Gerganov [Fri, 13 Jun 2025 08:18:25 +0000 (11:18 +0300)]
server : fix SWA condition for full context reprocess (#14163)

ggml-ci

2 weeks ago sycl: Adding additional cpy dbg print output (#14034)
Anton Mitkov [Fri, 13 Jun 2025 07:51:39 +0000 (08:51 +0100)]
sycl: Adding additional cpy dbg print output (#14034)

2 weeks ago SYCL: Bump oneMath commit (#14152)
Ewan Crawford [Fri, 13 Jun 2025 07:45:37 +0000 (08:45 +0100)]
SYCL: Bump oneMath commit (#14152)

Update oneMath commit to merged PR https://github.com/uxlfoundation/oneMath/pull/669
which adds SYCL-Graph support for recording CUDA BLAS commands.

With this change the `MUL_MAT` tests now pass on DPC++ CUDA backends with SYCL-Graph
enabled. Prior to this change, an error would be thrown.

```
$ GGML_SYCL_DISABLE_GRAPH=0 ./bin/test-backend-ops -b SYCL0 -o MUL_MAT -p type_a=f16,type_b=f32,m=16,n=1,k=256,bs=\\[1,1\\],nr=\\[2

UR CUDA ERROR:
        Value:           700
        Name:            CUDA_ERROR_ILLEGAL_ADDRESS
        Description:     an illegal memory access was encountered
        Function:        operator()
        Source Location: $HOME/dpcpp/unified-runtime/source/adapters/cuda/queue.cpp:154

Native API failed. Native API returns: 2147483646 (UR_RESULT_ERROR_UNKNOWN)
Exception caught at file:$HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3598, func:operator()
SYCL error: CHECK_TRY_ERROR((stream)->wait()): Meet error in this line code!
  in function ggml_backend_sycl_synchronize at $HOME/llama.cpp/ggml/src/ggml-sycl/ggml-sycl.cpp:3598
$HOME/llama.cpp/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:118: SYCL error
Could not attach to process.  If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user.  For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
```

2 weeks ago cmake : Improve build-info.cpp generation (#14156)
Christian Kastner [Fri, 13 Jun 2025 06:51:34 +0000 (06:51 +0000)]
cmake : Improve build-info.cpp generation (#14156)

* cmake: Simplify build-info.cpp generation

The rebuild of build-info.cpp still gets triggered when .git/index
changes.

* cmake: generate build-info.cpp in build dir

2 weeks ago vocab : prevent heap overflow when vocab is too small (#14145)
Georgi Gerganov [Fri, 13 Jun 2025 05:03:54 +0000 (08:03 +0300)]
vocab : prevent heap overflow when vocab is too small (#14145)

ggml-ci

2 weeks ago sycl: Remove not needed copy f16->f32 for dnnl mul mat (#14125)
Anton Mitkov [Thu, 12 Jun 2025 13:15:11 +0000 (14:15 +0100)]
sycl: Remove not needed copy f16->f32 for dnnl mul mat (#14125)

2 weeks ago readme : remove project status link (#14149)
Georgi Gerganov [Thu, 12 Jun 2025 11:43:09 +0000 (14:43 +0300)]
readme : remove project status link (#14149)

2 weeks ago server : re-enable SWA speculative decoding (#14131)
Georgi Gerganov [Thu, 12 Jun 2025 08:51:38 +0000 (11:51 +0300)]
server : re-enable SWA speculative decoding (#14131)

ggml-ci

2 weeks ago context : simplify output counting logic during decode (#14142)
Georgi Gerganov [Thu, 12 Jun 2025 08:50:01 +0000 (11:50 +0300)]
context : simplify output counting logic during decode (#14142)

* batch : remove logits_all flag

ggml-ci

* context : simplify output counting logic during decode

ggml-ci

* cont : fix comments

2 weeks ago batch : remove logits_all flag (#14141)
Georgi Gerganov [Thu, 12 Jun 2025 08:49:26 +0000 (11:49 +0300)]
batch : remove logits_all flag (#14141)

ggml-ci

2 weeks ago cmake : handle whitespaces in path during metal build (#14126)
Georgi Gerganov [Thu, 12 Jun 2025 07:14:24 +0000 (10:14 +0300)]
cmake : handle whitespaces in path during metal build (#14126)

* cmake : handle whitespaces in path during metal build

ggml-ci

* cont : proper fix

ggml-ci

---------

Co-authored-by: Daniel Bevenius <redacted>
2 weeks ago kv-cache : fix split_equal handling in unified implementation (#14130)
Georgi Gerganov [Thu, 12 Jun 2025 07:02:15 +0000 (10:02 +0300)]
kv-cache : fix split_equal handling in unified implementation (#14130)

ggml-ci

2 weeks ago context : round n_tokens to next multiple of n_seqs when reserving (#14140)
compilade [Thu, 12 Jun 2025 06:56:04 +0000 (02:56 -0400)]
context : round n_tokens to next multiple of n_seqs when reserving (#14140)

This fixes RWKV inference which otherwise failed
when the worst case ubatch.n_seq_tokens rounded to 0.
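
The rounding described above can be illustrated with a small sketch. It is not the llama.cpp code: the helper name `round_up_to_multiple` is hypothetical, and the assumption is that the per-sequence token count is obtained by integer division of `n_tokens` by `n_seqs`.

```python
def round_up_to_multiple(n_tokens: int, n_seqs: int) -> int:
    """Round n_tokens up to the next multiple of n_seqs (hypothetical helper)."""
    if n_seqs <= 0:
        raise ValueError("n_seqs must be positive")
    return ((n_tokens + n_seqs - 1) // n_seqs) * n_seqs


def tokens_per_seq(n_tokens: int, n_seqs: int) -> int:
    """Assumed model of n_seq_tokens: integer division, which truncates."""
    return n_tokens // n_seqs


# With fewer tokens than sequences, plain division truncates to 0 ...
truncated = tokens_per_seq(3, 8)
# ... while rounding n_tokens up first guarantees at least one token per sequence.
rounded = tokens_per_seq(round_up_to_multiple(3, 8), 8)
```

Under these assumptions, `truncated` is 0 and `rounded` is 1, which is the difference the commit message attributes the RWKV failure to.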

2 weeks ago common: fix issue with regex_escape routine on windows (#14133)
bandoti [Wed, 11 Jun 2025 20:19:44 +0000 (17:19 -0300)]
common: fix issue with regex_escape routine on windows (#14133)

2 weeks ago Implement GGML_CPU_ALL_VARIANTS for ARM (#14080)
Christian Kastner [Wed, 11 Jun 2025 19:07:44 +0000 (19:07 +0000)]
Implement GGML_CPU_ALL_VARIANTS for ARM (#14080)

* ggml-cpu: Factor out feature detection build from x86

* ggml-cpu: Add ARM feature detection and scoring

This is analogous to cpu-feats-x86.cpp. However, to detect compile-time
activation of features, we rely on GGML_USE_<FEAT>, which needs to be set
in cmake, instead of the GGML_<FEAT> that users would set for x86.

This is because on ARM, users specify features with GGML_CPU_ARM_ARCH,
rather than with individual flags.

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM

As on x86. However, to pass arch flags around within cmake, we use
GGML_INTERNAL_<FEAT>, as we don't have GGML_<FEAT>.

Some features are optional, so we may need to build multiple backends
per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring
function sort out which one can be used.

* ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now

The other platforms will need their own specific variants.

This also fixes the bug that the variant-building branch was always
being executed as the else-branch of GGML_NATIVE=OFF. The branch is
moved to an elseif-branch, which restores the previous behavior.
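
The idea of building several variants per arch version and letting a scoring function choose among them can be sketched as follows. This is only an illustration, not the ggml implementation: `score_variant`, `pick_variant`, the feature names, the `armv8.0_1` entry, and the scoring rule (more required features means a higher score) are all assumptions made for the example.

```python
from typing import Iterable, Optional, Set, Tuple


def score_variant(required: Set[str], host: Set[str]) -> int:
    """Return 0 if the host lacks any required feature, else a positive score."""
    if not required <= host:
        return 0  # variant cannot run on this CPU
    return 1 + len(required)  # assumed rule: prefer the most specialised variant


def pick_variant(variants: Iterable[Tuple[str, Set[str]]],
                 host: Set[str]) -> Optional[str]:
    """Pick the usable variant with the highest score, or None if none is usable."""
    best_name, best_score = None, 0
    for name, required in variants:
        score = score_variant(required, host)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Illustrative variant table; the feature sets are invented for the example.
VARIANTS = [
    ("armv8.0_1", set()),
    ("armv8.2_1", {"dotprod"}),
    ("armv8.2_2", {"dotprod", "fp16"}),
]
```

A host reporting only `dotprod` would get `armv8.2_1` here, while a host with both `dotprod` and `fp16` would get `armv8.2_2`.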

2 weeks ago chore : clean up relative source dir paths (#14128)
Sigbjørn Skjæret [Wed, 11 Jun 2025 17:04:23 +0000 (19:04 +0200)]
chore : clean up relative source dir paths (#14128)

2 weeks ago tests : add test-tokenizers-repo (#14017)
Sigbjørn Skjæret [Wed, 11 Jun 2025 15:16:32 +0000 (17:16 +0200)]
tests : add test-tokenizers-repo (#14017)

2 weeks ago vulkan: Better thread-safety for command pools/buffers (#14116)
Jeff Bolz [Wed, 11 Jun 2025 14:48:52 +0000 (09:48 -0500)]
vulkan: Better thread-safety for command pools/buffers (#14116)

This change moves the command pool/buffer tracking into a vk_command_pool
structure. There are two instances per context (for compute+transfer) and
two instances per device for operations that don't go through a context.
This should prevent separate contexts from stomping on each other.

2 weeks ago webui: Wrap long numbers instead of infinite horizontal scroll (#14062)
Aman [Wed, 11 Jun 2025 14:42:25 +0000 (22:42 +0800)]
webui: Wrap long numbers instead of infinite horizontal scroll (#14062)

* webui: Wrap long numbers instead of infinite horizontal scroll

* Use tailwind class

* update index.html.gz

2 weeks ago kv-cache : relax SWA masking condition (#14119)
Georgi Gerganov [Wed, 11 Jun 2025 13:48:45 +0000 (16:48 +0300)]
kv-cache : relax SWA masking condition (#14119)

ggml-ci

2 weeks ago server : pass default --keep argument (#14120)
Taylor [Wed, 11 Jun 2025 10:43:43 +0000 (06:43 -0400)]
server : pass default --keep argument (#14120)

2 weeks ago kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121)
Georgi Gerganov [Wed, 11 Jun 2025 09:52:45 +0000 (12:52 +0300)]
kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121)

2 weeks ago vulkan: Track descriptor pools/sets per-context (#14109)
Jeff Bolz [Wed, 11 Jun 2025 05:19:25 +0000 (00:19 -0500)]
vulkan: Track descriptor pools/sets per-context (#14109)

Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8)
and move it to the vk_device. Move all the descriptor pool and set tracking to
the context - none of it is specific to pipelines anymore. It has a single vector
of pools and vector of sets, and a single counter to track requests and a single
counter to track use.

2 weeks ago opencl: add `mul_mv_id_q4_0_f32_8x_flat` (#14003)
lhez [Tue, 10 Jun 2025 23:55:58 +0000 (16:55 -0700)]
opencl: add `mul_mv_id_q4_0_f32_8x_flat` (#14003)

2 weeks ago kv-cache : avoid modifying recurrent cells when setting inputs (#13834)
compilade [Tue, 10 Jun 2025 22:20:14 +0000 (18:20 -0400)]
kv-cache : avoid modifying recurrent cells when setting inputs (#13834)

* kv-cache : avoid modifying recurrent cells when setting inputs

* kv-cache : remove inp_s_mask

It was replaced with equivalent and simpler functionality
with rs_z (the first zeroed state) and the already-existing inp_s_copy.

* kv-cache : fix non-consecutive token pos warning for recurrent models

The problem was apparently caused by how the tail cells were swapped.

* graph : simplify logic for recurrent state copies

* kv-cache : use cell without src refs for rs_z in recurrent cache

* llama-graph : fix recurrent state copy

The `state_copy` shuffle assumes everything is moved at once,
which is not true when `states_extra` is copied back to the cache
before copying the range of states between `head` and `head + n_seqs`.
This is only a problem if any of the cells in [`head`, `head + n_seqs`)
have an `src` in [`head + n_seqs`, `head + n_kv`),
which does happen when `n_ubatch > 1` in the `llama-parallel` example.

Changing the order of the operations avoids the potential overwrite
before use, although when copies are avoided (like with Mamba2),
this will require further changes.

* llama-graph : rename n_state to state_size in build_recurrent_state

This naming should reduce confusion between the state size
and the number of states.

2 weeks ago convert : fix duplicate key DeepSeek-R1 conversion error (#14103)
Sigbjørn Skjæret [Tue, 10 Jun 2025 21:29:52 +0000 (23:29 +0200)]
convert : fix duplicate key DeepSeek-R1 conversion error (#14103)

2 weeks ago llama : support GEGLU for jina-bert-v2 (#14090)
Sigbjørn Skjæret [Tue, 10 Jun 2025 16:02:08 +0000 (18:02 +0200)]
llama : support GEGLU for jina-bert-v2 (#14090)

2 weeks ago vulkan: force device 0 in CI (#14106)
Jeff Bolz [Tue, 10 Jun 2025 15:53:47 +0000 (10:53 -0500)]
vulkan: force device 0 in CI (#14106)

2 weeks ago Fixed spec timings to: accepted/tested instead of accepted/drafted (#14104)
Juk Armstrong [Tue, 10 Jun 2025 15:48:07 +0000 (16:48 +0100)]
Fixed spec timings to: accepted/tested instead of accepted/drafted (#14104)

2 weeks ago sync : ggml
Georgi Gerganov [Tue, 10 Jun 2025 14:37:45 +0000 (17:37 +0300)]
sync : ggml

ggml-ci

2 weeks ago ggml : fix weak alias win32 (whisper/0)
Georgi Gerganov [Tue, 10 Jun 2025 08:34:10 +0000 (11:34 +0300)]
ggml : fix weak alias win32 (whisper/0)

ggml-ci

2 weeks ago Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is avail...
0cc4m [Tue, 10 Jun 2025 12:01:33 +0000 (14:01 +0200)]
Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (#14099)

2 weeks ago rpc : nicer error messages for RPC server crash (#14076)
Isaac McFadyen [Tue, 10 Jun 2025 06:41:01 +0000 (02:41 -0400)]
rpc : nicer error messages for RPC server crash (#14076)

2 weeks ago sync : ggml
Georgi Gerganov [Tue, 10 Jun 2025 06:20:51 +0000 (09:20 +0300)]
sync : ggml

ggml-ci

2 weeks ago Add in-build ggml::ggml ALIAS library (ggml/1260)
Kai Pastor [Tue, 3 Jun 2025 10:33:28 +0000 (12:33 +0200)]
Add in-build ggml::ggml ALIAS library (ggml/1260)

Enable uniform linking with subproject and with find_package.

2 weeks ago metal : use less stack memory in FA kernel (#14088)
Georgi Gerganov [Mon, 9 Jun 2025 20:05:02 +0000 (23:05 +0300)]
metal : use less stack memory in FA kernel (#14088)

* metal : use less stack memory in FA kernel

ggml-ci

* cont : fix BF16 variant

2 weeks ago kv-cache : fix shift and defrag logic (#14081)
Georgi Gerganov [Mon, 9 Jun 2025 20:04:35 +0000 (23:04 +0300)]
kv-cache : fix shift and defrag logic (#14081)

* kv-cache : fix shift

ggml-ci

* cont : reset shift[i]

ggml-ci

* cont : fix defrag erasing cells that didn't move

ggml-ci

2 weeks ago llama : allow building all tests on windows when not using shared libs (#13980)
Diego Devesa [Mon, 9 Jun 2025 18:03:09 +0000 (11:03 -0700)]
llama : allow building all tests on windows when not using shared libs (#13980)

* llama : allow building all tests on windows when not using shared libraries

* add static windows build to ci

* tests : enable debug logs for test-chat

---------

Co-authored-by: Georgi Gerganov <redacted>
2 weeks ago ggml-cpu : split arch-specific implementations (#13892)
xctan [Mon, 9 Jun 2025 14:47:13 +0000 (22:47 +0800)]
ggml-cpu : split arch-specific implementations (#13892)

* move ggml-cpu-aarch64 to repack

* split quantize_row_q8_0/1

* split helper functions

* split ggml_vec_dot_q4_0_q8_0

* split ggml_vec_dot_q4_1_q8_1

* split ggml_vec_dot_q5_0_q8_0

* split ggml_vec_dot_q5_1_q8_1

* split ggml_vec_dot_q8_0_q8_0

* split ggml_vec_dot_tq1_0_q8_K

* split ggml_vec_dot_tq2_0_q8_K

* split ggml_vec_dot_q2_K_q8_K

* split ggml_vec_dot_q3_K_q8_K

* split ggml_vec_dot_q4_K_q8_K

* split ggml_vec_dot_q5_K_q8_K

* split ggml_vec_dot_q6_K_q8_K

* split ggml_vec_dot_iq2_xxs_q8_K

* split ggml_vec_dot_iq2_xs_q8_K

* split ggml_vec_dot_iq2_s_q8_K

* split ggml_vec_dot_iq3_xxs_q8_K

* split ggml_vec_dot_iq3_s_q8_K

* split ggml_vec_dot_iq1_s_q8_K

* split ggml_vec_dot_iq1_m_q8_K

* split ggml_vec_dot_iq4_nl_q8_0

* split ggml_vec_dot_iq4_xs_q8_K

* fix typos

* fix missing prototypes

* rename ggml-cpu-quants.c

* rename ggml-cpu-traits

* rename arm folder

* move cpu-feats-x86.cpp

* rename ggml-cpu-hbm

* update arm detection macro in quants.c

* move iq quant tables

* split ggml_quantize_mat_q8_0/K

* split ggml_gemv_*

* split ggml_gemm_*

* rename namespace aarch64 to repack

* use weak aliases to replace test macros

* rename GGML_CPU_AARCH64 to GGML_CPU_REPACK

* rename more aarch64 to repack

* clean up rebase leftover

* fix compilation errors

* remove trailing spaces

* try to fix clang compilation errors

* try to fix clang compilation errors again

* try to fix clang compilation errors, 3rd attempt

* try to fix clang compilation errors, 4th attempt

* try to fix clang compilation errors, 5th attempt

* try to fix clang compilation errors, 6th attempt

* try to fix clang compilation errors, 7th attempt

* try to fix clang compilation errors, 8th attempt

* try to fix clang compilation errors, 9th attempt

* more cleanup

* fix compilation errors

* fix apple targets

* fix a typo in arm version of ggml_vec_dot_q4_K_q8_K

Co-authored-by: Georgi Gerganov <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>
2 weeks ago cuda : fix device sync on buffer clear (#14033)
Diego Devesa [Mon, 9 Jun 2025 14:36:26 +0000 (07:36 -0700)]
cuda : fix device sync on buffer clear (#14033)

2 weeks ago graph : fix geglu (#14077)
Georgi Gerganov [Mon, 9 Jun 2025 14:17:31 +0000 (17:17 +0300)]
graph : fix geglu (#14077)

ggml-ci

2 weeks ago CANN: Simplify the environment variable setting (#13104)
Xinpeng Dou [Mon, 9 Jun 2025 11:47:39 +0000 (19:47 +0800)]
CANN: Simplify the environment variable setting (#13104)

* Simplify the environment variable setting to specify the memory pool type.

* Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options.

* update

* fix CI

* update

* delete whitespace

* fix according to review

* update CANN.md

* update CANN.md
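
The case-insensitive handling of `GGML_CANN_ASYNC_MODE` mentioned in the list above could look roughly like the sketch below. It is written in Python for brevity and is not the CANN backend code: the helper name `env_flag` and the choice to treat every other value as disabled are assumptions.

```python
import os
from typing import Mapping, Optional

# Accepted "on" spellings, taken from the commit message; matching ignores case.
_TRUTHY = {"yes", "enable", "1", "on"}


def env_flag(name: str, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the variable is set to one of the accepted truthy values."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if value is None:
        return False  # assumption: an unset variable means disabled
    return value.lower() in _TRUTHY
```

Passing a plain dict as `environ` keeps the helper easy to exercise without touching the real process environment, for example `env_flag("GGML_CANN_ASYNC_MODE", {"GGML_CANN_ASYNC_MODE": "ON"})`.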

2 weeks ago webui: fix sidebar being covered by main content (#14082)
R0CKSTAR [Mon, 9 Jun 2025 10:01:17 +0000 (18:01 +0800)]
webui: fix sidebar being covered by main content (#14082)

* webui: fix sidebar being covered by main content

Signed-off-by: Xiaodong Ye <redacted>
* webui: update index.html.gz

Signed-off-by: Xiaodong Ye <redacted>
---------

Signed-off-by: Xiaodong Ye <redacted>
2 weeks ago server : fix LRU check (#14079)
Georgi Gerganov [Mon, 9 Jun 2025 09:57:58 +0000 (12:57 +0300)]
server : fix LRU check (#14079)

ggml-ci

2 weeks ago sycl: Add reorder to Q6_K mmvq implementation (#13885)
Nicolò Scipione [Mon, 9 Jun 2025 09:47:07 +0000 (11:47 +0200)]
sycl: Add reorder to Q6_K mmvq implementation (#13885)

* Add Reorder to Q6_K mmvq implementation

* Address PR comments: clean up comments

* Remove unused parameter after refactoring q4_k

* Adding inline to function and removing unnecessary reference to int

---------

Signed-off-by: nscipione <redacted>
2 weeks ago add geglu activation function (#14074)
Đinh Trọng Huy [Mon, 9 Jun 2025 04:15:31 +0000 (13:15 +0900)]
add geglu activation function (#14074)

Co-authored-by: dinhhuy <redacted>
2 weeks ago CANN: Enable labeler for Ascend NPU (#13914)
Yuanhao Ji [Mon, 9 Jun 2025 03:20:06 +0000 (11:20 +0800)]
CANN: Enable labeler for Ascend NPU (#13914)

2 weeks ago cuda : fix buffer type check with integrated GPUs (#14069)
Diego Devesa [Sun, 8 Jun 2025 18:39:56 +0000 (11:39 -0700)]
cuda : fix buffer type check with integrated GPUs (#14069)

3 weeks ago ci: add LoongArch cross-compile build (#13944)
吴小白 [Sat, 7 Jun 2025 13:39:11 +0000 (21:39 +0800)]
ci: add LoongArch cross-compile build (#13944)

3 weeks ago SYCL: Implement few same quantized type copy kernels (#13739)
Akarshan Biswas [Sat, 7 Jun 2025 13:28:20 +0000 (18:58 +0530)]
SYCL: Implement few same quantized type copy kernels (#13739)

* SYCL: Implement few same quantized type copy kernels

* Use memcpy for copying contiguous tensors

ggml-ci

* feat(sycl): add contiguous tensor copy support and device checks

Adds a memcpy path for contiguous tensors of the same type to optimize data transfer. Updates device support checks to recognize contiguous tensor operations, improving compatibility and performance.

* refactor: replace specific block copy functions with template

The changes replace multiple redundant block copy functions (e.g., cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function cpy_blck_q_q. This reduces code duplication by using a generic template that works for any block type, improving maintainability while preserving the same functionality. The template is instantiated with specific block types (e.g., block_q8_0) where needed.

* Exclude BF16 support for COPY tensors for now
ggml-ci

* perf: adjust SYCL copy kernel block sizes for efficiency

Use ceil_div to ensure full element coverage and update nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization in copy operations.
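
The role of `ceil_div` in covering every element can be shown with a short arithmetic sketch. It is written in Python rather than SYCL, and `launch_size` is a hypothetical helper, not a function from the backend.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up, for non-negative a and positive b."""
    return (a + b - 1) // b


def launch_size(n_elements: int, block_size: int) -> tuple:
    """Return (number of blocks, total work-items) covering all elements."""
    num_blocks = ceil_div(n_elements, block_size)
    return num_blocks, num_blocks * block_size


# Truncating division would launch too few blocks and skip the tail elements.
blocks_truncated = 1000 // 256          # 3 blocks cover only 768 elements
blocks_rounded_up = ceil_div(1000, 256)  # 4 blocks cover all 1000 elements
```

Because the rounded-up range can exceed the element count (1024 work-items for 1000 elements here), a kernel launched this way generally needs a bounds check on the global index.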

3 weeks ago llama : fix llama_model_chat_template with template name (LLM_KV with suffix) (#14050)
Sigbjørn Skjæret [Sat, 7 Jun 2025 12:13:12 +0000 (14:13 +0200)]
llama : fix llama_model_chat_template with template name (LLM_KV with suffix) (#14050)

3 weeks ago llama : deprecate llama_kv_self_ API (#14030)
Georgi Gerganov [Fri, 6 Jun 2025 11:11:15 +0000 (14:11 +0300)]
llama : deprecate llama_kv_self_ API (#14030)

* llama : deprecate llama_kv_self_ API

ggml-ci

* llama : allow llama_memory_(nullptr)

ggml-ci

* memory : add flag for optional data clear in llama_memory_clear

ggml-ci

3 weeks ago context : fix SWA-related warning for multiple sequences (#14045)
Georgi Gerganov [Fri, 6 Jun 2025 10:29:18 +0000 (13:29 +0300)]
context : fix SWA-related warning for multiple sequences (#14045)

3 weeks ago llama : support multiple classifier outputs and labels (#13940)
Sigbjørn Skjæret [Fri, 6 Jun 2025 07:03:25 +0000 (09:03 +0200)]
llama : support multiple classifier outputs and labels (#13940)

3 weeks ago gguf-py : add add_classifier_output_labels method to writer (#14031)
Sigbjørn Skjæret [Thu, 5 Jun 2025 15:42:31 +0000 (17:42 +0200)]
gguf-py : add add_classifier_output_labels method to writer (#14031)

* add add_classifier_output_labels

* use add_classifier_output_labels

3 weeks ago vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001)
Masato Nakasaka [Thu, 5 Jun 2025 14:00:29 +0000 (23:00 +0900)]
vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001)

* allowing B580 and U9-288V

* experimenting code to detect Xe2

* allowing coopmat only for Xe2 GPUs

* fixed comment wording

* fixed comment wording

* removed unnecessary driver check

3 weeks ago ci: fix CUDA build failure on autodl cloud machines (#14005)
pockers21 [Thu, 5 Jun 2025 13:25:29 +0000 (06:25 -0700)]
ci: fix CUDA build failure on autodl cloud machines (#14005)

Replace CMAKE_CUDA_ARCHITECTURES=native with nvidia-smi detection
as 'native' fails on autodl cloud environments.

Co-authored-by: pockers21 <redacted>
3 weeks ago memory : migrate from llama_kv_cache to more generic llama_memory (#14006)
Georgi Gerganov [Thu, 5 Jun 2025 12:29:22 +0000 (15:29 +0300)]
memory : migrate from llama_kv_cache to more generic llama_memory (#14006)

* memory : merge llama_kv_cache into llama_memory + new `llama_memory` API

ggml-ci

* context : fix casts

ggml-ci

3 weeks ago llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama...
Diego Devesa [Thu, 5 Jun 2025 09:57:42 +0000 (02:57 -0700)]
llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013)

3 weeks ago readme : add badge (#13938)
Olexandr88 [Thu, 5 Jun 2025 07:50:55 +0000 (10:50 +0300)]
readme : add badge (#13938)

3 weeks ago vocab : warn about missing mask token (#14022)
Sigbjørn Skjæret [Thu, 5 Jun 2025 07:29:18 +0000 (09:29 +0200)]
vocab : warn about missing mask token (#14022)

3 weeks ago context : fix pos_min initialization upon error decode (#14008)
Georgi Gerganov [Thu, 5 Jun 2025 06:06:29 +0000 (09:06 +0300)]
context : fix pos_min initialization upon error decode (#14008)

ggml-ci

3 weeks ago vulkan: automatically deduce size of push constants (#13936)
Jeff Bolz [Thu, 5 Jun 2025 05:17:58 +0000 (00:17 -0500)]
vulkan: automatically deduce size of push constants (#13936)

3 weeks ago ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813)
Ervin Áron Tasnádi [Wed, 4 Jun 2025 20:02:00 +0000 (22:02 +0200)]
ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813)

* * ggml-vulkan: adds op CONV_TRANSPOSE_1D

* test-backend-ops: adds more sophisticated tests for CONV_TRANSPOSE_1D

* Missing barrier added to shader.
Number of additional tests reduced to 108.

* * Fixes typo in variable name.

* Removes extra whitespaces.

* Adds int64->int32 casts to prevent possible warnings.

* Problem size reduced in tests to pass tests with llvmpipe.

* supports_op condition moved from unintended position

3 weeks ago kv-cache : refactor the update/defrag mechanism (#13988)
Georgi Gerganov [Wed, 4 Jun 2025 15:58:20 +0000 (18:58 +0300)]
kv-cache : refactor the update/defrag mechanism (#13988)

* kv-cache : refactor update mechanism

ggml-ci

* memory : improve status handling

* defrag : reset head + add comments

ggml-ci

* cont : minor fixes

ggml-ci

3 weeks ago ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997)
Diego Devesa [Wed, 4 Jun 2025 13:37:40 +0000 (06:37 -0700)]
ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997)

3 weeks ago releases : use dl backend for linux release, remove arm64 linux release (#13996)
Diego Devesa [Wed, 4 Jun 2025 11:15:54 +0000 (04:15 -0700)]
releases : use dl backend for linux release, remove arm64 linux release (#13996)

3 weeks ago llama-graph : use ggml_repeat_4d (#13998)
Xuan-Son Nguyen [Wed, 4 Jun 2025 08:11:26 +0000 (10:11 +0200)]
llama-graph : use ggml_repeat_4d (#13998)

3 weeks ago CUDA: fix FTZ in FA for Gemma 3 (#13991)
Johannes Gäßler [Wed, 4 Jun 2025 06:57:05 +0000 (08:57 +0200)]
CUDA: fix FTZ in FA for Gemma 3 (#13991)

3 weeks ago kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985)
Georgi Gerganov [Wed, 4 Jun 2025 06:50:32 +0000 (09:50 +0300)]
kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985)

ggml-ci

3 weeks ago vulkan: fix warnings in perf logger querypool code (#13937)
Jeff Bolz [Tue, 3 Jun 2025 18:30:22 +0000 (13:30 -0500)]
vulkan: fix warnings in perf logger querypool code (#13937)

3 weeks ago docs : add "Quick start" section for new users (#13862)
Xuan-Son Nguyen [Tue, 3 Jun 2025 11:09:36 +0000 (13:09 +0200)]
docs : add "Quick start" section for new users (#13862)

* docs : add "Quick start" section for non-technical users

* rm flox

* Update README.md

3 weeks ago opencl: add `backend_synchronize` (#13939)
lhez [Mon, 2 Jun 2025 23:54:58 +0000 (16:54 -0700)]
opencl: add `backend_synchronize` (#13939)

* This is not needed by the normal use where the result is read
  using `tensor_get`, but it allows perf mode of `test-backend-ops`
  to properly measure performance.

3 weeks ago OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840)
rmatif [Mon, 2 Jun 2025 23:53:36 +0000 (23:53 +0000)]
OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840)

* add concat, pad, repeat, tsembd, tanh, upscale

* small fixes

3 weeks ago server : disable speculative decoding for SWA models (#13970)
Georgi Gerganov [Mon, 2 Jun 2025 18:34:40 +0000 (21:34 +0300)]
server : disable speculative decoding for SWA models (#13970)

* server : use swa-full for draft context

ggml-ci

* server : disable speculative decoding for SWA models

3 weeks ago metal : use F32 accumulators in FA kernels (#13975)
Georgi Gerganov [Mon, 2 Jun 2025 18:33:40 +0000 (21:33 +0300)]
metal : use F32 accumulators in FA kernels (#13975)

ggml-ci

3 weeks ago gemma : more consistent attention scaling for v2 and v3 (#13951)
Georgi Gerganov [Mon, 2 Jun 2025 17:54:26 +0000 (20:54 +0300)]
gemma : more consistent attention scaling for v2 and v3 (#13951)

* gemma : fix attn scale for 27B

* cont : apply scale before attn

* cont : consistent attention scaling

3 weeks ago `server`: update deepseek reasoning format (pass reasoning_content as diffs) (#13933)
Olivier Chafik [Mon, 2 Jun 2025 17:15:44 +0000 (10:15 -0700)]
`server`: update deepseek reasoning format (pass reasoning_content as diffs) (#13933)

* server: update deepseek reasoning format (now in reasoning_content diffs), add legacy option for compat
* update unit/test_tool_call.py::test_thoughts

3 weeks ago mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961)
Xuan-Son Nguyen [Mon, 2 Jun 2025 14:29:28 +0000 (16:29 +0200)]
mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961)

* mtmd : fix memory leak in mtmd_helper_eval_chunk_single

* mtmd-cli : fix mem leak

* Update tools/mtmd/mtmd-cli.cpp

Co-authored-by: Georgi Gerganov <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>
3 weeks ago cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13966)
shalinib-ibm [Mon, 2 Jun 2025 12:18:36 +0000 (17:48 +0530)]
cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13966)

Some systems report the CPU implementation as "Power11" instead of "POWER11".
The existing CMake logic uses a case-sensitive regular expression to extract
the CPU generation, which fails when the casing doesn't exactly match "POWER".

This patch provides a fix by first converting the string to uppercase before applying the regex.
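
The effect of uppercasing before matching can be demonstrated with a small sketch. It is written in Python rather than CMake, and the pattern `POWER(\d+)` and the helper name `power_generation` are assumptions for illustration; the actual CMake regular expression may differ.

```python
import re
from typing import Optional

# Assumed pattern: the literal "POWER" followed by the generation number.
_POWER_RE = re.compile(r"POWER(\d+)")


def power_generation(cpu_string: str) -> Optional[int]:
    """Extract the POWER generation, tolerating mixed-case input."""
    match = _POWER_RE.search(cpu_string.upper())  # normalise case first
    return int(match.group(1)) if match else None


# Without the uppercase step, the case-sensitive pattern misses "Power11".
case_sensitive_miss = _POWER_RE.search("Power11") is None
```

With the normalisation in place, both `"Power11"` and `"POWER11"` yield generation 11 in this sketch.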

Signed-off-by: root <redacted>
Co-authored-by: root <redacted>
3 weeks ago sycl: quantize and reorder the input to q8_1 when reorder is enabled (#13826)
Atharva Dubey [Mon, 2 Jun 2025 09:12:20 +0000 (10:12 +0100)]
sycl: quantize and reorder the input to q8_1 when reorder is enabled (#13826)

* [WIP]: fuse q8 quantization and reorder

* wip2: fuse q8 quantization and reorder

* working q8 reorder commit

* restored common.hpp

* remove debug prints

* remove unnecessary headers and remove trailing whitespace

* Update ggml/src/ggml-sycl/ggml-sycl.cpp

Co-authored-by: Alberto Cabrera Pérez <redacted>
---------

Co-authored-by: Alberto Cabrera Pérez <redacted>
3 weeks ago gguf: fix failure on version == 0 (#13956)
Johannes Gäßler [Sun, 1 Jun 2025 16:08:05 +0000 (18:08 +0200)]
gguf: fix failure on version == 0 (#13956)

3 weeks ago convert : fix nomic-bert-moe mask token (#13757)
Sigbjørn Skjæret [Sun, 1 Jun 2025 16:07:21 +0000 (18:07 +0200)]
convert : fix nomic-bert-moe mask token (#13757)

3 weeks ago convert : fix vocab padding code for bert models (#13954)
Sigbjørn Skjæret [Sun, 1 Jun 2025 15:23:11 +0000 (17:23 +0200)]
convert : fix vocab padding code for bert models (#13954)

3 weeks ago ggml: check if non-native endian model is being loaded (#13943)
Aaron Teo [Sun, 1 Jun 2025 14:53:57 +0000 (22:53 +0800)]
ggml: check if non-native endian model is being loaded (#13943)

* gguf: prevent non-native endian models from being loaded

Signed-off-by: Aaron Teo <redacted>
* gguf: update error message

Signed-off-by: Aaron Teo <redacted>
* gguf: make the non-native endian check more verbose

Signed-off-by: Aaron Teo <redacted>
* ggml: move ggml_assert location

Signed-off-by: Aaron Teo <redacted>
* ggml: reword the endianness check error message

Signed-off-by: Aaron Teo <redacted>
---------

Signed-off-by: Aaron Teo <redacted>
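
One plausible way to detect a non-native endian file is to look at a small header field such as the GGUF version: if a small version number is read with the wrong byte order, its low-order bytes come out as zero. The sketch below illustrates that heuristic only. Whether the commit implements the check exactly this way is an assumption, and `looks_byte_swapped` is a hypothetical name.

```python
import struct


def looks_byte_swapped(version: int) -> bool:
    """Heuristic: a small 32-bit version read with swapped bytes has zero low 16 bits."""
    return (version & 0x0000FFFF) == 0


# A version of 3 written big-endian but read as little-endian becomes 0x03000000.
raw = struct.pack(">I", 3)
misread_version = struct.unpack("<I", raw)[0]
native_version = struct.unpack(">I", raw)[0]
```

Here `misread_version` is flagged by the heuristic while `native_version` is not. Note that a version of 0 would also be flagged, so a real loader would need to handle that case separately.
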
3 weeks ago sync : ggml
Georgi Gerganov [Sun, 1 Jun 2025 09:23:14 +0000 (12:23 +0300)]
sync : ggml

ggml-ci

3 weeks ago vulkan : Remove unexpected ; (ggml/1253)
Kai Pastor [Sat, 31 May 2025 10:49:55 +0000 (12:49 +0200)]
vulkan : Remove unexpected ; (ggml/1253)

3 weeks ago cmake : Fix broken CMake error messages (ggml/1252)
Kai Pastor [Sat, 31 May 2025 10:39:19 +0000 (12:39 +0200)]
cmake : Fix broken CMake error messages (ggml/1252)

3 weeks ago ggml : remove ggml_graph_import and ggml_graph_export declarations (ggml/1247)
Radoslav Gerganov [Fri, 30 May 2025 06:11:09 +0000 (09:11 +0300)]
ggml : remove ggml_graph_import and ggml_graph_export declarations (ggml/1247)

The implementation was already deleted in commit 9d0762e.

closes: #1235

3 weeks ago sync : whisper.cpp (ggml/1250)
Georgi Gerganov [Thu, 29 May 2025 10:29:50 +0000 (13:29 +0300)]
sync : whisper.cpp (ggml/1250)

* ggml : Fix backtrace breaking Windows build (whisper/3203)

* sync : whisper.cpp

ggml-ci

---------

Co-authored-by: Daniel Tang <redacted>
3 weeks ago ggml : install dynamic backends (ggml/1240)
Radoslav Gerganov [Thu, 29 May 2025 05:34:46 +0000 (08:34 +0300)]
ggml : install dynamic backends (ggml/1240)

* ggml : install dynamic backends

Make sure dynamic backends are installed in $CMAKE_INSTALL_BINDIR