git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log

]> git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log

overview / pkg / ggml / sources / llama.cpp / log

compilade [Thu, 12 Jun 2025 06:56:04 +0000 (02:56 -0400)]

context : round n_tokens to next multiple of n_seqs when reserving (#14140)

This fixes RWKV inference which otherwise failed
when the worst case ubatch.n_seq_tokens rounded to 0.

commit | commitdiff | tree

bandoti [Wed, 11 Jun 2025 20:19:44 +0000 (17:19 -0300)]

common: fix issue with regex_escape routine on windows (#14133)

commit | commitdiff | tree

Christian Kastner [Wed, 11 Jun 2025 19:07:44 +0000 (19:07 +0000)]

Implement GGML_CPU_ALL_VARIANTS for ARM (#14080)

* ggml-cpu: Factor out feature detection build from x86

* ggml-cpu: Add ARM feature detection and scoring

This is analogous to cpu-feats-x86.cpp. However, to detect compile-time
activation of features, we rely on GGML_USE_<FEAT> which need to be set
in cmake, instead of GGML_<FEAT> that users would set for x86.

This is because on ARM, users specify features with GGML_CPU_ARM_ARCH,
rather than with individual flags.

* ggml-cpu: Implement GGML_CPU_ALL_VARIANTS for ARM

Like x86, however to pass around arch flags within cmake, we use
GGML_INTERNAL_<FEAT> as we don't have GGML_<FEAT>.

Some features are optional, so we may need to build multiple backends
per arch version (armv8.2_1, armv8.2_2, ...), and let the scoring
function sort out which one can be used.

* ggml-cpu: Limit ARM GGML_CPU_ALL_VARIANTS to Linux for now

The other platforms will need their own specific variants.

This also fixes the bug that the the variant-building branch was always
being executed as the else-branch of GGML_NATIVE=OFF. The branch is
moved to an elseif-branch which restores the previous behavior.

commit | commitdiff | tree

Sigbjørn Skjæret [Wed, 11 Jun 2025 17:04:23 +0000 (19:04 +0200)]

chore : clean up relative source dir paths (#14128)

commit | commitdiff | tree

Sigbjørn Skjæret [Wed, 11 Jun 2025 15:16:32 +0000 (17:16 +0200)]

tests : add test-tokenizers-repo (#14017)

commit | commitdiff | tree

Jeff Bolz [Wed, 11 Jun 2025 14:48:52 +0000 (09:48 -0500)]

vulkan: Better thread-safety for command pools/buffers (#14116)

This change moves the command pool/buffer tracking into a vk_command_pool
structure. There are two instances per context (for compute+transfer) and
two instances per device for operations that don't go through a context.
This should prevent separate contexts from stomping on each other.

commit | commitdiff | tree

Aman [Wed, 11 Jun 2025 14:42:25 +0000 (22:42 +0800)]

webui: Wrap long numbers instead of infinite horizontal scroll (#14062)

* webui: Wrap long numbers instead of infinite horizontal scroll

* Use tailwind class

* update index.html.gz

commit | commitdiff | tree

Georgi Gerganov [Wed, 11 Jun 2025 13:48:45 +0000 (16:48 +0300)]

kv-cache : relax SWA masking condition (#14119)

ggml-ci

commit | commitdiff | tree

Taylor [Wed, 11 Jun 2025 10:43:43 +0000 (06:43 -0400)]

server : pass default --keep argument (#14120)

commit | commitdiff | tree

Georgi Gerganov [Wed, 11 Jun 2025 09:52:45 +0000 (12:52 +0300)]

kv-cache : add LLAMA_KV_CACHE_DEBUG environment variable (#14121)

commit | commitdiff | tree

Jeff Bolz [Wed, 11 Jun 2025 05:19:25 +0000 (00:19 -0500)]

vulkan: Track descriptor pools/sets per-context (#14109)

Use the same descriptor set layout for all pipelines (MAX_PARAMETER_COUNT == 8)
and move it to the vk_device. Move all the descriptor pool and set tracking to
the context - none of it is specific to pipelines anymore. It has a single vector
of pools and vector of sets, and a single counter to track requests and a single
counter to track use.

commit | commitdiff | tree

lhez [Tue, 10 Jun 2025 23:55:58 +0000 (16:55 -0700)]

opencl: add `mul_mv_id_q4_0_f32_8x_flat` (#14003)

commit | commitdiff | tree

compilade [Tue, 10 Jun 2025 22:20:14 +0000 (18:20 -0400)]

kv-cache : avoid modifying recurrent cells when setting inputs (#13834)

* kv-cache : avoid modifying recurrent cells when setting inputs

* kv-cache : remove inp_s_mask

It was replaced with equivalent and simpler functionality
with rs_z (the first zeroed state) and the already-existing inp_s_copy.

* kv-cache : fix non-consecutive token pos warning for recurrent models

The problem was apparently caused by how the tail cells were swapped.

* graph : simplify logic for recurrent state copies

* kv-cache : use cell without src refs for rs_z in recurrent cache

* llama-graph : fix recurrent state copy

The `state_copy` shuffle assumes everything is moved at once,
which is not true when `states_extra` is copied back to the cache
before copying the range of states between `head` and `head + n_seqs`.
This is only a problem if any of the cells in [`head`, `head + n_seqs`)
have an `src` in [`head + n_seqs`, `head + n_kv`),
which does happen when `n_ubatch > 1` in the `llama-parallel` example.

Changing the order of the operations avoids the potential overwrite
before use, although when copies are avoided (like with Mamba2),
this will require further changes.

* llama-graph : rename n_state to state_size in build_recurrent_state

This naming should reduce confusion between the state size
and the number of states.

commit | commitdiff | tree

Sigbjørn Skjæret [Tue, 10 Jun 2025 21:29:52 +0000 (23:29 +0200)]

convert : fix duplicate key DeepSeek-R1 conversion error (#14103)

commit | commitdiff | tree

Sigbjørn Skjæret [Tue, 10 Jun 2025 16:02:08 +0000 (18:02 +0200)]

llama : support GEGLU for jina-bert-v2 (#14090)

commit | commitdiff | tree

Jeff Bolz [Tue, 10 Jun 2025 15:53:47 +0000 (10:53 -0500)]

vulkan: force device 0 in CI (#14106)

commit | commitdiff | tree

Juk Armstrong [Tue, 10 Jun 2025 15:48:07 +0000 (16:48 +0100)]

Fixed spec timings to: accepted/tested instead of accepted/drafted (#14104)

commit | commitdiff | tree

Georgi Gerganov [Tue, 10 Jun 2025 14:37:45 +0000 (17:37 +0300)]

sync : ggml

ggml-ci

commit | commitdiff | tree

Georgi Gerganov [Tue, 10 Jun 2025 08:34:10 +0000 (11:34 +0300)]

ggml : fix weak alias win32 (whisper/0)

ggml-ci

commit | commitdiff | tree

0cc4m [Tue, 10 Jun 2025 12:01:33 +0000 (14:01 +0200)]

Vulkan: Don't default to CPU device (like llvmpipe), even if no other device is available, to allow fallback to CPU backend (#14099)

commit | commitdiff | tree

Isaac McFadyen [Tue, 10 Jun 2025 06:41:01 +0000 (02:41 -0400)]

rpc : nicer error messages for RPC server crash (#14076)

commit | commitdiff | tree

Georgi Gerganov [Tue, 10 Jun 2025 06:20:51 +0000 (09:20 +0300)]

sync : ggml

ggml-ci

commit | commitdiff | tree

Kai Pastor [Tue, 3 Jun 2025 10:33:28 +0000 (12:33 +0200)]

Add in-build ggml::ggml ALIAS library (ggml/1260)

Enable uniform linking with subproject and with find_package.

commit | commitdiff | tree

Georgi Gerganov [Mon, 9 Jun 2025 20:05:02 +0000 (23:05 +0300)]

metal : use less stack memory in FA kernel (#14088)

* metal : use less stack memory in FA kernel

ggml-ci

* cont : fix BF16 variant

commit | commitdiff | tree

Georgi Gerganov [Mon, 9 Jun 2025 20:04:35 +0000 (23:04 +0300)]

kv-cache : fix shift and defrag logic (#14081)

* kv-cache : fix shift

ggml-ci

* cont : reset shift[i]

ggml-ci

* cont : fix defrag erasing cells that didn't move

ggml-ci

commit | commitdiff | tree

Diego Devesa [Mon, 9 Jun 2025 18:03:09 +0000 (11:03 -0700)]

llama : allow building all tests on windows when not using shared libs (#13980)

* llama : allow building all tests on windows when not using shared libraries

* add static windows build to ci

* tests : enable debug logs for test-chat

---------

Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

xctan [Mon, 9 Jun 2025 14:47:13 +0000 (22:47 +0800)]

ggml-cpu : split arch-specific implementations (#13892)

* move ggml-cpu-aarch64 to repack

* split quantize_row_q8_0/1

* split helper functions

* split ggml_vec_dot_q4_0_q8_0

* split ggml_vec_dot_q4_1_q8_1

* split ggml_vec_dot_q5_0_q8_0

* split ggml_vec_dot_q5_1_q8_1

* split ggml_vec_dot_q8_0_q8_0

* split ggml_vec_dot_tq1_0_q8_K

* split ggml_vec_dot_tq2_0_q8_K

* split ggml_vec_dot_q2_K_q8_K

* split ggml_vec_dot_q3_K_q8_K

* split ggml_vec_dot_q4_K_q8_K

* split ggml_vec_dot_q5_K_q8_K

* split ggml_vec_dot_q6_K_q8_K

* split ggml_vec_dot_iq2_xxs_q8_K

* split ggml_vec_dot_iq2_xs_q8_K

* split ggml_vec_dot_iq2_s_q8_K

* split ggml_vec_dot_iq3_xxs_q8_K

* split ggml_vec_dot_iq3_s_q8_K

* split ggml_vec_dot_iq1_s_q8_K

* split ggml_vec_dot_iq1_m_q8_K

* split ggml_vec_dot_iq4_nl_q8_0

* split ggml_vec_dot_iq4_xs_q8_K

* fix typos

* fix missing prototypes

* rename ggml-cpu-quants.c

* rename ggml-cpu-traits

* rename arm folder

* move cpu-feats-x86.cpp

* rename ggml-cpu-hbm

* update arm detection macro in quants.c

* move iq quant tables

* split ggml_quantize_mat_q8_0/K

* split ggml_gemv_*

* split ggml_gemm_*

* rename namespace aarch64 to repack

* use weak aliases to replace test macros

* rename GGML_CPU_AARCH64 to GGML_CPU_REPACK

* rename more aarch64 to repack

* clean up rebase leftover

* fix compilation errors

* remove trailing spaces

* try to fix clang compilation errors

* try to fix clang compilation errors again

* try to fix clang compilation errors, 3rd attempt

* try to fix clang compilation errors, 4th attempt

* try to fix clang compilation errors, 5th attempt

* try to fix clang compilation errors, 6th attempt

* try to fix clang compilation errors, 7th attempt

* try to fix clang compilation errors, 8th attempt

* try to fix clang compilation errors, 9th attempt

* more cleanup

* fix compilation errors

* fix apple targets

* fix a typo in arm version of ggml_vec_dot_q4_K_q8_K

Co-authored-by: Georgi Gerganov <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

Diego Devesa [Mon, 9 Jun 2025 14:36:26 +0000 (07:36 -0700)]

cuda : fix device sync on buffer clear (#14033)

commit | commitdiff | tree

Georgi Gerganov [Mon, 9 Jun 2025 14:17:31 +0000 (17:17 +0300)]

graph : fix geglu (#14077)

ggml-ci

commit | commitdiff | tree

Xinpeng Dou [Mon, 9 Jun 2025 11:47:39 +0000 (19:47 +0800)]

CANN: Simplify the environment variable setting(#13104)

* Simplify the environment variable setting to specify the memory pool type.

* Adjust the GGML_CANN_ASYNC_MODE setting to accept yes, enable, 1, or on (case-insensitive) as valid options.

* update

* fix CI

* update

* delete whitespace

* fix according to review

* update CANN.md

* update CANN.md

commit | commitdiff | tree

R0CKSTAR [Mon, 9 Jun 2025 10:01:17 +0000 (18:01 +0800)]

webui: fix sidebar being covered by main content (#14082)

* webui: fix sidebar being covered by main content

Signed-off-by: Xiaodong Ye <redacted>
* webui: update index.html.gz

Signed-off-by: Xiaodong Ye <redacted>
---------

Signed-off-by: Xiaodong Ye <redacted>

commit | commitdiff | tree

Georgi Gerganov [Mon, 9 Jun 2025 09:57:58 +0000 (12:57 +0300)]

server : fix LRU check (#14079)

ggml-ci

commit | commitdiff | tree

Nicolò Scipione [Mon, 9 Jun 2025 09:47:07 +0000 (11:47 +0200)]

sycl: Add reorder to Q6_K mmvq implementation (#13885)

* Add Reorder to Q6_K mmvq implementation

* Address PR comments: clean up comments

* Remove unused parameter after refactoring q4_k

* Adding inline to function and removing unnecessary reference to int

---------

Signed-off-by: nscipione <redacted>

commit | commitdiff | tree

Đinh Trọng Huy [Mon, 9 Jun 2025 04:15:31 +0000 (13:15 +0900)]

add geglu activation function (#14074)

Co-authored-by: dinhhuy <redacted>

commit | commitdiff | tree

Yuanhao Ji [Mon, 9 Jun 2025 03:20:06 +0000 (11:20 +0800)]

CANN: Enable labeler for Ascend NPU (#13914)

commit | commitdiff | tree

Diego Devesa [Sun, 8 Jun 2025 18:39:56 +0000 (11:39 -0700)]

cuda : fix buffer type check with integrated GPUs (#14069)

commit | commitdiff | tree

吴小白 [Sat, 7 Jun 2025 13:39:11 +0000 (21:39 +0800)]

ci: add LoongArch cross-compile build (#13944)

commit | commitdiff | tree

Akarshan Biswas [Sat, 7 Jun 2025 13:28:20 +0000 (18:58 +0530)]

SYCL: Implement few same quantized type copy kernels (#13739)

* SYCL: Implement few same quantized type copy kernels

* Use memcpy for copying contiguous tensors

ggml-ci

* feat(sycl): add contiguous tensor copy support and device checks

Adds a memcpy path for contiguous tensors of the same type to optimize data transfer. Updates device support checks to recognize contiguous tensor operations, improving compatibility and performance.

* refactor: replace specific block copy functions with template

The changes replace multiple redundant block copy functions (e.g., cpy_block_q8_0_q8_0, cpy_block_q5_0_q5_0) with a single templated function cpy_blck_q_q. This reduces code duplication by using a generic template that works for any block type, improving maintainability while preserving the same functionality. The template is instantiated with specific block types (e.g., block_q8_0) where needed.

* Exclude BF16 support for COPY tensors for now
ggml-ci

* perf: adjust SYCL copy kernel block sizes for efficiency

Use ceil_div to ensure full element coverage and update nd_range parameters to better align with SYCL block sizes, improving parallelism and device utilization in copy operations.

commit | commitdiff | tree

Sigbjørn Skjæret [Sat, 7 Jun 2025 12:13:12 +0000 (14:13 +0200)]

llama : fix llama_model_chat_template with template name (LLM_KV with suffix) (#14050)

commit | commitdiff | tree

Georgi Gerganov [Fri, 6 Jun 2025 11:11:15 +0000 (14:11 +0300)]

llama : deprecate llama_kv_self_ API (#14030)

* llama : deprecate llama_kv_self_ API

ggml-ci

* llama : allow llama_memory_(nullptr)

ggml-ci

* memory : add flag for optional data clear in llama_memory_clear

ggml-ci

commit | commitdiff | tree

Georgi Gerganov [Fri, 6 Jun 2025 10:29:18 +0000 (13:29 +0300)]

context : fix SWA-related warning for multiple sequences (#14045)

commit | commitdiff | tree

Sigbjørn Skjæret [Fri, 6 Jun 2025 07:03:25 +0000 (09:03 +0200)]

llama : support multiple classifier outputs and labels (#13940)

commit | commitdiff | tree

Sigbjørn Skjæret [Thu, 5 Jun 2025 15:42:31 +0000 (17:42 +0200)]

gguf-py : add add_classifier_output_labels method to writer (#14031)

* add add_classifier_output_labels

* use add_classifier_output_labels

commit | commitdiff | tree

Masato Nakasaka [Thu, 5 Jun 2025 14:00:29 +0000 (23:00 +0900)]

vulkan: Enable VK_KHR_cooperative_matrix extension for Intel Xe2 GPUs (#14001)

* allowing B580 and U9-288V

* experimenting code to detect Xe2

* allowing coopmat only for Xe2 GPUs

* fixed comment wording

* fixed comment wording

* removed unnecessary driver check

commit | commitdiff | tree

pockers21 [Thu, 5 Jun 2025 13:25:29 +0000 (06:25 -0700)]

ci: fix CUDA build failure on autodl cloud machines (#14005)

Replace CMAKE_CUDA_ARCHITECTURES=native with nvidia-smi detection
as 'native' fails on autodl cloud environments.

Co-authored-by: pockers21 <redacted>

commit | commitdiff | tree

Georgi Gerganov [Thu, 5 Jun 2025 12:29:22 +0000 (15:29 +0300)]

memory : migrate from llama_kv_cache to more generic llama_memory (#14006)

* memory : merge llama_kv_cache into llama_memory + new `llama_memory` API

ggml-ci

* context : fix casts

ggml-ci

commit | commitdiff | tree

Diego Devesa [Thu, 5 Jun 2025 09:57:42 +0000 (02:57 -0700)]

llama : allow using mmap without PrefetchVirtualMemory, apply GGML_WIN_VER to llama.cpp sources (#14013)

commit | commitdiff | tree

Olexandr88 [Thu, 5 Jun 2025 07:50:55 +0000 (10:50 +0300)]

readme : add badge (#13938)

commit | commitdiff | tree

Sigbjørn Skjæret [Thu, 5 Jun 2025 07:29:18 +0000 (09:29 +0200)]

vocab : warn about missing mask token (#14022)

commit | commitdiff | tree

Georgi Gerganov [Thu, 5 Jun 2025 06:06:29 +0000 (09:06 +0300)]

context : fix pos_min initialization upon error decode (#14008)

ggml-ci

commit | commitdiff | tree

Jeff Bolz [Thu, 5 Jun 2025 05:17:58 +0000 (00:17 -0500)]

vulkan: automatically deduce size of push constants (#13936)

commit | commitdiff | tree

Ervin Áron Tasnádi [Wed, 4 Jun 2025 20:02:00 +0000 (22:02 +0200)]

ggml-vulkan: adds support for op CONV_TRANSPOSE_1D (#13813)

* * ggml-vulkan: adds op CONV_TRANSPOSE_1D

* test-backend-ops: adds more spohisticated tests for CONV_TRANSPOSE_1D

* Missing barrier added to shader.
Number of additional tests reduced to 108.

* * Fixes typo in variable name.

* Removes extra whitespaces.

* Adds int64->int32 casts to prevent possible warnings.

* Problem size reduced in tests to pass tests with llvmpipe.

* supports_op condition moved from unintended position

commit | commitdiff | tree

Georgi Gerganov [Wed, 4 Jun 2025 15:58:20 +0000 (18:58 +0300)]

kv-cache : refactor the update/defrag mechanism (#13988)

* kv-cache : refactor update mechanism

ggml-ci

* memory : improve status handling

* defrag : reset head + add comments

ggml-ci

* cont : minor fixes

ggml-ci

commit | commitdiff | tree

Diego Devesa [Wed, 4 Jun 2025 13:37:40 +0000 (06:37 -0700)]

ci : remove cuda 11.7 releases, switch runner to windows 2022 (#13997)

commit | commitdiff | tree

Diego Devesa [Wed, 4 Jun 2025 11:15:54 +0000 (04:15 -0700)]

releases : use dl backend for linux release, remove arm64 linux release (#13996)

commit | commitdiff | tree

Xuan-Son Nguyen [Wed, 4 Jun 2025 08:11:26 +0000 (10:11 +0200)]

llama-graph : use ggml_repeat_4d (#13998)

commit | commitdiff | tree

Johannes Gäßler [Wed, 4 Jun 2025 06:57:05 +0000 (08:57 +0200)]

CUDA: fix FTZ in FA for Gemma 3 (#13991)

commit | commitdiff | tree

Georgi Gerganov [Wed, 4 Jun 2025 06:50:32 +0000 (09:50 +0300)]

kv-cache : fix unified::seq_rm to work with seq_id < 0 (#13985)

ggml-ci

commit | commitdiff | tree

Jeff Bolz [Tue, 3 Jun 2025 18:30:22 +0000 (13:30 -0500)]

vulkan: fix warnings in perf logger querypool code (#13937)

commit | commitdiff | tree

Xuan-Son Nguyen [Tue, 3 Jun 2025 11:09:36 +0000 (13:09 +0200)]

docs : add "Quick start" section for new users (#13862)

* docs : add "Quick start" section for non-technical users

* rm flox

* Update README.md

commit | commitdiff | tree

lhez [Mon, 2 Jun 2025 23:54:58 +0000 (16:54 -0700)]

opencl: add `backend_synchronize` (#13939)

* This is not needed by the normal use where the result is read
using `tensor_get`, but it allows perf mode of `test-backend-ops`
to properly measure performance.

commit | commitdiff | tree

rmatif [Mon, 2 Jun 2025 23:53:36 +0000 (23:53 +0000)]

OpenCL: Add concat, tsembd, upscale, tanh, pad and repeat (#13840)

* add concat, pad, repeat, tsembd, tanh, upscale

* small fixes

commit | commitdiff | tree

Georgi Gerganov [Mon, 2 Jun 2025 18:34:40 +0000 (21:34 +0300)]

server : disable speculative decoding for SWA models (#13970)

* server : use swa-full fo draft context

ggml-ci

* server : disable speculative decoding for SWA models

commit | commitdiff | tree

Georgi Gerganov [Mon, 2 Jun 2025 18:33:40 +0000 (21:33 +0300)]

metal : use F32 accumulators in FA kernels (#13975)

ggml-ci

commit | commitdiff | tree

Georgi Gerganov [Mon, 2 Jun 2025 17:54:26 +0000 (20:54 +0300)]

gemma : more consistent attention scaling for v2 and v3 (#13951)

* gemma : fix attn scale for 27B

* cont : apply scale before attn

* cont : consistent attention scaling

commit | commitdiff | tree

Olivier Chafik [Mon, 2 Jun 2025 17:15:44 +0000 (10:15 -0700)]

`server`: update deepseek reasoning format (pass reasoning_content as diffs) (#13933)

* server: update deepseek reasoning format (now in reasoning_content diffs), add legacy option for compat
* update unit/test_tool_call.py::test_thoughts

commit | commitdiff | tree

Xuan-Son Nguyen [Mon, 2 Jun 2025 14:29:28 +0000 (16:29 +0200)]

mtmd : fix memory leak in mtmd_helper_eval_chunk_single (#13961)

* mtmd : fix memory in mtmd_helper_eval_chunk_single

* mtmd-cli : fix mem leak

* Update tools/mtmd/mtmd-cli.cpp

Co-authored-by: Georgi Gerganov <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>

commit | commitdiff | tree

shalinib-ibm [Mon, 2 Jun 2025 12:18:36 +0000 (17:48 +0530)]

cmake : Handle mixed-case 'Power' strings in POWER CPU detection (#13966)

Some systems report the CPU implementation as "Power11" instead of "POWER11".
The existing CMake logic uses a case-sensitive regular expression to extract
the CPU generation, which fails when the casing doesn't exactly match "POWER".

This patch provides a fix by first converting the string to uppercase before applying the regex.

Signed-off-by: root <redacted>
Co-authored-by: root <redacted>

commit | commitdiff | tree

Atharva Dubey [Mon, 2 Jun 2025 09:12:20 +0000 (10:12 +0100)]

sycl: quantize and reorder the input to q8_1 when reorder is enabled (#13826)

* [WIP]: fuse q8 quantization and reorder

* wip2: fuse q8 quantization and reorder

* working q8 reorder commit

* restored common.hpp

* remove debug prints

* remove unnecessary headers and remove trailing whitespace

* Update ggml/src/ggml-sycl/ggml-sycl.cpp

Co-authored-by: Alberto Cabrera Pérez <redacted>
---------

Co-authored-by: Alberto Cabrera Pérez <redacted>

commit | commitdiff | tree

Johannes Gäßler [Sun, 1 Jun 2025 16:08:05 +0000 (18:08 +0200)]

gguf: fix failure on version == 0 (#13956)

commit | commitdiff | tree

Sigbjørn Skjæret [Sun, 1 Jun 2025 16:07:21 +0000 (18:07 +0200)]

convert : fix nomic-bert-moe mask token (#13757)

commit | commitdiff | tree

Sigbjørn Skjæret [Sun, 1 Jun 2025 15:23:11 +0000 (17:23 +0200)]

convert : fix vocab padding code for bert models (#13954)

commit | commitdiff | tree

Aaron Teo [Sun, 1 Jun 2025 14:53:57 +0000 (22:53 +0800)]

ggml: check if non-native endian model is being loaded (#13943)

* gguf: prevent non-native endian models from being loaded

Signed-off-by: Aaron Teo <redacted>
* gguf: update error message

Signed-off-by: Aaron Teo <redacted>
* gguf: make the non-native endian check more verbose

Signed-off-by: Aaron Teo <redacted>
* ggml: move ggml_assert location

Signed-off-by: Aaron Teo <redacted>
* ggml: reword the endianness check error message

Signed-off-by: Aaron Teo <redacted>
---------

Signed-off-by: Aaron Teo <redacted>

commit | commitdiff | tree

Georgi Gerganov [Sun, 1 Jun 2025 09:23:14 +0000 (12:23 +0300)]

sync : ggml

ggml-ci

commit | commitdiff | tree

Kai Pastor [Sat, 31 May 2025 10:49:55 +0000 (12:49 +0200)]

vulkan : Remove unexpected ; (ggml/1253)

commit | commitdiff | tree

Kai Pastor [Sat, 31 May 2025 10:39:19 +0000 (12:39 +0200)]

cmake : Fix broken CMake error messages (ggml/1252)

commit | commitdiff | tree

Radoslav Gerganov [Fri, 30 May 2025 06:11:09 +0000 (09:11 +0300)]

ggml : remove ggml_graph_import and ggml_graph_export declarations (ggml/1247)

The implementation is already deleted with commit 9d0762e.

closes: #1235

commit | commitdiff | tree

Georgi Gerganov [Thu, 29 May 2025 10:29:50 +0000 (13:29 +0300)]

sync : whisper.cpp (ggml/1250)

* ggml : Fix backtrace breaking Windows build (whisper/3203)

* sync : whisper.cpp

ggml-ci

---------

Co-authored-by: Daniel Tang <redacted>

commit | commitdiff | tree

Radoslav Gerganov [Thu, 29 May 2025 05:34:46 +0000 (08:34 +0300)]

ggml : install dynamic backends (ggml/1240)

* ggml : install dynamic backends

Make sure dynamic backends are installed in $CMAKE_INSTALL_BINDIR

commit | commitdiff | tree

Daniel Tang [Wed, 28 May 2025 00:58:46 +0000 (20:58 -0400)]

ggml : Print backtrace on uncaught C++ exceptions (ggml/1232)

The goal is to have what users call "full logs" contain the backtrace.

This is registered upon ggml_init. Also fixes a minor fd leak on Linux.

commit | commitdiff | tree

ddh0 [Sun, 1 Jun 2025 08:44:30 +0000 (03:44 -0500)]

readme : update bindings (#13950)

commit | commitdiff | tree

Georgi Gerganov [Sun, 1 Jun 2025 08:42:16 +0000 (11:42 +0300)]

parallel : fix n_junk == 0 (#13952)

commit | commitdiff | tree

Georgi Gerganov [Sun, 1 Jun 2025 08:39:27 +0000 (11:39 +0300)]

kv-cache : split implementation in separate sources (#13920)

ggml-ci

commit | commitdiff | tree

Max Krasnyansky [Sat, 31 May 2025 22:39:19 +0000 (15:39 -0700)]

threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling (#12995)

* threading: support for GGML_SCHED_PRIO_LOW, update thread info on Windows to avoid throttling

We talked about adding LOW priority for GGML threads in the original threadpool PR.
It might be useful for some cases to avoid contention.

Latest Windows ARM64 releases started parking (offlining) the CPU cores
more aggresively which results in suboptimal performance with n_threads > 4.
To deal with that we now disable Power Throttling for our threads for the NORMAL
and higher priorities.

Co-authored-by: Diego Devesa <redacted>
* threading: disable SetThreadInfo() calls for older Windows versions

* Update tools/llama-bench/llama-bench.cpp

Co-authored-by: Diego Devesa <redacted>
---------

Co-authored-by: Diego Devesa <redacted>

commit | commitdiff | tree

Jiří Podivín [Sat, 31 May 2025 16:58:35 +0000 (18:58 +0200)]

docs : Note about necessity of having libcurl installed for standard build. (#13945)

Signed-off-by: Jiri Podivin <redacted>

commit | commitdiff | tree

Olivier Chafik [Sat, 31 May 2025 15:26:10 +0000 (08:26 -0700)]

server: allow unclosed thinking tags (#13931)

commit | commitdiff | tree

Georgi Gerganov [Sat, 31 May 2025 12:58:33 +0000 (15:58 +0300)]

llama : deprecate explicit kv_self defrag/update calls (#13921)

ggml-ci

commit | commitdiff | tree

Georgi Gerganov [Sat, 31 May 2025 12:57:44 +0000 (15:57 +0300)]

llama : use n_swa + n_ubatch cells for SWA cache (#13833)

* llama : use n_swa + n_ubatch cells for SWA cache

ggml-ci

* llama : add warning about multi-sqeuence SWA contexts

commit | commitdiff | tree

igardev [Sat, 31 May 2025 09:56:08 +0000 (12:56 +0300)]

webui : Replace alert and confirm with custom modals. (#13711)

* Replace alert and confirm with custom modals. This is needed as Webview in VS Code doesn't permit alert and confirm for security reasons.

* use Modal Provider to simplify the use of confirm and alert modals.

* Increase the z index of the modal dialogs.

* Update index.html.gz

* also add showPrompt

* rebuild

---------

Co-authored-by: igardev <redacted>
Co-authored-by: Xuan Son Nguyen <redacted>

commit | commitdiff | tree

Georgi Gerganov [Sat, 31 May 2025 09:55:57 +0000 (12:55 +0300)]

llama : auto-batch preparation (#13845)

* llama : auto-batch

ggml-ci

* context : simplify if branching

commit | commitdiff | tree

Xuan-Son Nguyen [Sat, 31 May 2025 08:14:29 +0000 (10:14 +0200)]

mtmd : drop `_shared` from `libmtmd` name, merge helpers into libmtmd (⚠️ breaking change) (#13917)

* mtmd : fix missing public header

* no object

* apply suggestion from Georgi

* rm mtmd-helper, merge it to mtmd

* missing vendor include dir

commit | commitdiff | tree

Georgi Gerganov [Sat, 31 May 2025 07:24:04 +0000 (10:24 +0300)]

kv-cache : refactor + add llama_memory_state_i (#13746)

* kv-cache : simplify the "struct llama_kv_cache" interface

ggml-ci

* kv-cache : revert the (n_swa + n_ubatch) change (for next PR)

ggml-ci

* kv-cache : some comments

ggml-ci

* context : fix graph reserve for multiple sequences

ggml-ci

* kv-cache : fix typo [no ci]

* kv-cache : fix find_slot() logic for free slots

ggml-ci

* llama : add TODO for deprecating the defrag API in the future

* kv-cache : improve find_slot() using min/max seq pos info

ggml-ci

* llama : handle aborts and compute errors

ggml-ci

* memory : extract state into llama_memory_state

ggml-ci

* kv-cache : add comments

ggml-ci

* server : update batching logic to reset n_batch on successful decode

* server : upon full re-processing, remove the sequence from the cache

* kv-cache : add TODO for doing split_equal when split_simple fails

ggml-ci

commit | commitdiff | tree

Shawn yang [Sat, 31 May 2025 06:48:04 +0000 (14:48 +0800)]

CUDA: add a prop in ggml_cuda_device_infor for distinguish iGPU or dGPU in cuda (#13856) (#13895)

* 1. add "integrated" in ggml_cuda_device_info for distinguish whether it is Intergrate_gpu or discrete_gpu
2. Adjust the func:"ggml_backend_cuda_device_supports_buft" for this new feature

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Adjusted code indentation

Co-authored-by: Johannes Gäßler <redacted>
* Update ggml/src/ggml-cuda/ggml-cuda.cu

Fixed incorrect setting of variable types

Co-authored-by: Johannes Gäßler <redacted>
* Update ggml/src/ggml-cuda/ggml-cuda.cu

Adjusted the judgment logic

Co-authored-by: Johannes Gäßler <redacted>
* add a host_buft assert in case of integrated_cuda_device with func:'evaluate_and_capture_cuda_graph()'

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Add a defensive security assert

Co-authored-by: Johannes Gäßler <redacted>
* Update ggml/src/ggml-cuda/ggml-cuda.cu

Adjusted the support judgment logic.

Co-authored-by: Johannes Gäßler <redacted>
* revoke the suggest commit changes due to it's not applicable in jetson_device

* Update ggml/src/ggml-cuda/ggml-cuda.cu

Add parentheses to enforce operator precedence

Co-authored-by: Diego Devesa <redacted>
* Update ggml/src/ggml-cuda/ggml-cuda.cu

Fix ci bug: add a spaces

Co-authored-by: Johannes Gäßler <redacted>
---------

Co-authored-by: yangxiao <redacted>
Co-authored-by: Johannes Gäßler <redacted>
Co-authored-by: yangxiao <redacted>
Co-authored-by: Diego Devesa <redacted>

commit | commitdiff | tree

Johannes Gäßler [Fri, 30 May 2025 19:22:03 +0000 (21:22 +0200)]

CUDA: fix typo in FlashAttention code (#13926)

commit | commitdiff | tree

Diego Devesa [Fri, 30 May 2025 16:56:19 +0000 (09:56 -0700)]

sched : avoid changing cur_copy when a graph is already allocated (#13922)

commit | commitdiff | tree

Georgi Gerganov [Fri, 30 May 2025 16:38:07 +0000 (19:38 +0300)]

parallel : increase the variability of the prompt lengths (#13927)

ggml-ci

commit | commitdiff | tree

Diego Devesa [Fri, 30 May 2025 14:37:18 +0000 (07:37 -0700)]

cuda : prevent using split buffers with 3d/4d matrices (#13919)

commit | commitdiff | tree

Akarshan Biswas [Fri, 30 May 2025 14:10:57 +0000 (19:40 +0530)]

SYCL: Add mrope kernel (#13755)

* SYCL: Add mrope kernel

* feat: Optimize rope operations with vectorization

Uses `sycl::vec` to load and store two elements at a time,
significantly improving performance in `rope_norm`,
`rope_neox`, and `rope_multi`. This reduces the number of memory
accesses and leverages SIMD instructions for faster execution.

* Use ceil_div

commit | commitdiff | tree

Georgi Gerganov [Fri, 30 May 2025 13:25:45 +0000 (16:25 +0300)]

sync : vendor (#13901)

* sync : vendor

ggml-ci

* cont : fix httplib version

ggml-ci

* cont : fix lint

* cont : fix lint

* vendor : move to common folder /vendor

ggml-ci

* cont : fix lint

* cont : move httplib to /vendor + use json_fwd.hpp

ggml-ci

* cont : fix server build

ggml-ci

* cont : add missing headers

ggml-ci

* cont : header clean-up

ggml-ci

commit | commitdiff | tree

Sigbjørn Skjæret [Fri, 30 May 2025 12:50:43 +0000 (14:50 +0200)]

convert : fix rwkv bos/eos token (#13844)

Packaging of ggml-org/llama.cpp

RSS Atom