git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
Anthony Umfer [Sun, 11 May 2025 15:08:26 +0000 (11:08 -0400)]
tools : fix uninitialized llama_batch in server (#13436)
* add constructor to initialize server_context::batch, preventing destructor's call to llama_batch_free from causing an invalid free()
* Update tools/server/server.cpp
Co-authored-by: Xuan-Son Nguyen <redacted>
* use C++11 initializer syntax
* switch from Copy-list-initialization to Direct-list-initialization
---------
Co-authored-by: Xuan-Son Nguyen <redacted>
Sigbjørn Skjæret [Sun, 11 May 2025 14:20:39 +0000 (16:20 +0200)]
scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451)
Johannes Gäßler [Sun, 11 May 2025 14:09:33 +0000 (16:09 +0200)]
CUDA: fix crash with partial offloading of MoE (#13439)
David Huang [Sun, 11 May 2025 12:18:39 +0000 (20:18 +0800)]
Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386)
City [Sun, 11 May 2025 09:35:52 +0000 (11:35 +0200)]
mtmd : support InternVL 3 38B and 78B mmproj (#13443)
* Support InternVL 3 38B and 78B mmproj
* Swap norms in clip.cpp
* Group variables together
Xuan-Son Nguyen [Sun, 11 May 2025 09:34:23 +0000 (11:34 +0200)]
mtmd : move helpers to dedicated file (#13442)
* mtmd : move helpers to dedicated file
* fix windows build
* rm redundant include
Thomas Germer [Sat, 10 May 2025 20:26:46 +0000 (22:26 +0200)]
docs : Fix typo in InternVL3 model name (#13440)
Johannes Gäßler [Sat, 10 May 2025 20:22:48 +0000 (22:22 +0200)]
CUDA: fix race conditions FlashAttention kernels (#13438)
Sigbjørn Skjæret [Sat, 10 May 2025 20:08:07 +0000 (22:08 +0200)]
vocab : add ByteDance-Seed/Seed-Coder (#13423)
Xuan-Son Nguyen [Sat, 10 May 2025 17:57:54 +0000 (19:57 +0200)]
mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434)
* mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl
* fix typo
Xuan-Son Nguyen [Sat, 10 May 2025 16:44:49 +0000 (18:44 +0200)]
server : update docs (#13432)
Sigbjørn Skjæret [Sat, 10 May 2025 15:19:52 +0000 (17:19 +0200)]
llguidance : set tokenizer slices to default (#13424)
Thammachart Chinvarapon [Sat, 10 May 2025 14:34:48 +0000 (21:34 +0700)]
ci: free_disk_space flag enabled for intel variant (#13426)
before cleanup: 20G
after cleanup: 44G
after all built and pushed: 24G
https://github.com/Thammachart/llama.cpp/actions/runs/14945093573/job/41987371245
Xuan-Son Nguyen [Sat, 10 May 2025 14:26:42 +0000 (16:26 +0200)]
mtmd : support InternVL 2.5 and 3 (#13422)
* convert : internvl support
* InternVL3-1B working
* fix regression
* rm mobilevlm from test
* fix conversion
* add test for internvl
* add to list of pre-quant
* restore boi/eoi check
* add clarify comment for norm eps
Johannes Gäßler [Sat, 10 May 2025 07:16:52 +0000 (09:16 +0200)]
CUDA: fix FlashAttention on Turing (#13415)
Xuan-Son Nguyen [Sat, 10 May 2025 06:16:29 +0000 (08:16 +0200)]
arg : add env var to control mmproj (#13416)
* arg : add env var to control mmproj
* small note about -hf --mmproj
Jeff Bolz [Sat, 10 May 2025 06:07:07 +0000 (23:07 -0700)]
vulkan: scalar flash attention implementation (#13324)
* vulkan: scalar flash attention implementation
* vulkan: always use fp32 for scalar flash attention
* vulkan: use vector loads in scalar flash attention shader
* vulkan: remove PV matrix, helps with register usage
* vulkan: reduce register usage in scalar FA, but perf may be slightly worse
* vulkan: load each Q value once. optimize O reduction. more tuning
* vulkan: support q4_0/q8_0 KV in scalar FA
* CI: increase timeout to accommodate newly-supported tests
* vulkan: for scalar FA, select between 1 and 8 rows
* vulkan: avoid using Float16 capability in scalar FA
Helton Reis [Fri, 9 May 2025 20:15:39 +0000 (17:15 -0300)]
chore(llguidance): use tagged version that does not break the build (#13413)
Xuan-Son Nguyen [Fri, 9 May 2025 17:29:37 +0000 (19:29 +0200)]
server : vision support via libmtmd (#12898)
* server : (experimental) vision support via libmtmd
* mtmd : add more api around mtmd_image_tokens
* mtmd : add more api around mtmd_image_tokens
* mtmd : ability to calc image hash
* shared_ptr for mtmd_image_tokens
* move hash to user-define ID (fixed)
* abstract out the batch management
* small fix
* refactor logic adding tokens to batch
* implement hashing image
* use FNV hash, now hash bitmap instead of file data
* allow decoding image embedding to be split into batches
* rm whitespace
* disable some features when mtmd is on
* fix --no-mmproj-offload
* mtmd_context_params no timings
* refactor server_inp to server_tokens
* fix the failing test case
* init
* wip
* working version
* add mtmd::bitmaps
* add test target
* rm redundant define
* test: mtmd_input_chunks_free
* rm outdated comment
* fix merging issue
* explicitly create mtmd::input_chunks
* mtmd_input_chunk_copy
* add clone()
* improve server_input struct
* clip : fix confused naming ffn_up and ffn_down
* rm ffn_i/o/g naming
* rename n_embd, n_ff
* small fix
* no check n_ff
* fix detokenize
* add const to various places
* add warning about breaking changes
* add c api
* helper: use mtmd_image_tokens_get_n_pos
* fix ctx_shift
* fix name shadowing
* more strict condition
* support remote image_url
* remote image_url log
* add CI test
* do not log base64
* add "has_multimodal" to /props
* remove dangling image
* speculative: use slot.cache_tokens.insert
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <redacted>
* rm can_be_detokenized
* on prompt processing done, assert cache_tokens.size
* handle_completions_impl returns void
* adapt the new web ui
* update docs and hot topics
* rm assert
* small fix (2)
---------
Co-authored-by: Georgi Gerganov <redacted>
Alberto Cabrera Pérez [Fri, 9 May 2025 15:34:08 +0000 (16:34 +0100)]
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858)
* sycl : Implemented reorder Q4_0 mmvq
Signed-off-by: Alberto Cabrera <redacted>
* sycl : Fixed mmvq being called when reorder is disabled
* sycl : Improved comments in the quants header
Signed-off-by: Alberto Cabrera <redacted>
* Use static_assert
* safe_div -> ceil_div
* Clarify qi comment
* change the reorder tensor from init to execute OP
* dbg
* Undo changes to test-backend-ops
* Refactor changes on top of q4_0 reorder fix
* Missing Reverts
* Refactored opt_for_reorder logic to simplify code path
* Explicit inlining and unroll
* Renamed mul_mat_algo enum for consistency
---------
Signed-off-by: Alberto Cabrera <redacted>
Co-authored-by: romain.biessy <redacted>
Georgi Gerganov [Fri, 9 May 2025 12:14:56 +0000 (15:14 +0300)]
metal : optimize MoE for large batches (#13388)
ggml-ci
Johannes Gäßler [Fri, 9 May 2025 11:34:58 +0000 (13:34 +0200)]
CUDA: FA support for Deepseek (Ampere or newer) (#13306)
* CUDA: FA support for Deepseek (Ampere or newer)
* do loop unrolling via C++ template
Diego Devesa [Fri, 9 May 2025 11:02:07 +0000 (13:02 +0200)]
llama : do not crash if there is no CPU backend (#13395)
* llama : do not crash if there is no CPU backend
* add checks to examples
Johannes Gäßler [Fri, 9 May 2025 10:14:04 +0000 (12:14 +0200)]
CUDA: fix crash on large batch size for MoE models (#13384)
Bartowski [Fri, 9 May 2025 09:53:58 +0000 (05:53 -0400)]
imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (#13389)
* Add --parse-special for enabling parsing of special tokens in imatrix calculation
* whitespace
R0CKSTAR [Fri, 9 May 2025 09:25:50 +0000 (17:25 +0800)]
llama-run: add support for downloading models from ModelScope (#13370)
Signed-off-by: Xiaodong Ye <redacted>
Xuan-Son Nguyen [Fri, 9 May 2025 09:18:02 +0000 (11:18 +0200)]
mtmd : fix batch_view for m-rope (#13397)
* mtmd : fix batch_view for m-rope
* nits : fix comment
Xuan-Son Nguyen [Fri, 9 May 2025 09:17:51 +0000 (11:17 +0200)]
llama : one-off chat template fix for Mistral-Small-2503 (#13398)
* llama : one-off chat template fix for Mistral-Small-2503
* update readme
* add mistral-v7-tekken
Radoslav Gerganov [Fri, 9 May 2025 07:31:07 +0000 (10:31 +0300)]
rpc : add rpc_msg_set_tensor_hash_req (#13353)
* rpc : add rpc_msg_set_tensor_hash_req
Use a dedicated struct for the request of RPC_CMD_SET_TENSOR_HASH, which
makes the code cleaner.
* fix
Jeff Bolz [Fri, 9 May 2025 07:23:41 +0000 (02:23 -0500)]
vulkan: Allow up to 4096 elements for mul_mat_id row_ids (#13326)
This assert fired running Qwen_Qwen3-30B-A3B-Q2_K.gguf:
GGML_ASSERT(nei0 * nei1 <= 3072);
The tensor is 8 x 512 (4096 elements), which exceeds the 3072 limit; the array size is increased to accommodate it.
Xuan-Son Nguyen [Fri, 9 May 2025 07:06:37 +0000 (09:06 +0200)]
server : (webui) rename has_multimodal --> modalities (#13393)
* server : (webui) rename has_multimodal --> modalities
* allow converting SVG to PNG
* less complicated code
Diego Devesa [Thu, 8 May 2025 21:45:22 +0000 (23:45 +0200)]
ci : limit write permission to only the release step + fixes (#13392)
* ci : limit write permission to only the release step
* fix win cuda file name
* fix license file copy on multi-config generators
Matt Clayton [Thu, 8 May 2025 18:25:39 +0000 (14:25 -0400)]
mtmd : Expose helper_decode_image_chunk (#13366)
* mtmd: Expose helper_decode_image, output_embd_copy, image_tokens_copy/free
* Slim down
* Cleanups
Xuan-Son Nguyen [Thu, 8 May 2025 16:51:45 +0000 (18:51 +0200)]
server : (webui) fix a very small misalignment (#13387)
* server : (webui) fix a very small misalignment
* restore font-bold
Xuan-Son Nguyen [Thu, 8 May 2025 13:37:29 +0000 (15:37 +0200)]
server : (webui) revamp the input area, plus many small UI improvements (#13365)
* rework the input area
* process selected file
* change all icons to heroicons
* fix thought process collapse
* move conversation more menu to sidebar
* sun icon --> moon icon
* rm default system message
* stricter upload file check, only allow image if server has mtmd
* build it
* add renaming
* better autoscroll
* build
* add conversation group
* fix scroll
* extra context first, then user input in the end
* fix <hr> tag
* clean up a bit
* build
* add mb-3 for <pre>
* throttle adjustTextareaHeight to make it less laggy
* (nits) missing padding in sidebar
* rm stray console log
Sigbjørn Skjæret [Thu, 8 May 2025 13:34:29 +0000 (15:34 +0200)]
convert : support rope_scaling type and rope_type (#13349)
welix [Thu, 8 May 2025 13:03:53 +0000 (22:03 +0900)]
mtmd : fix the calculation of n_tokens for smolvlm (#13381)
Co-authored-by: Taichi Nishimura <redacted>
Georgi Gerganov [Thu, 8 May 2025 11:28:33 +0000 (14:28 +0300)]
context : allow cache-less context for embeddings (#13108)
* context : allow cache-less context for embeddings
ggml-ci
* context : enable reranking with encode()
ggml-ci
* context : encode() clears embd_seq
ggml-ci
* examples : use llama_encode() when appropriate
ggml-ci
* models : nomic bert moe does not require KV cache
* llama : update comments for llama_decode/llama_encode
ggml-ci
* context : update warning log [no ci]
Georgi Gerganov [Thu, 8 May 2025 11:26:50 +0000 (14:26 +0300)]
context : remove logits_all flag (#13284)
* context : remove logits_all flag
ggml-ci
* llama : remove logits_all flag + reorder llama_context_params
ggml-ci
Diego Devesa [Thu, 8 May 2025 11:15:28 +0000 (13:15 +0200)]
ci : move release workflow to a separate file (#13362)
Diego Devesa [Thu, 8 May 2025 11:15:15 +0000 (13:15 +0200)]
llama : print size and type of overridden tensors (#13364)
Alberto Cabrera Pérez [Thu, 8 May 2025 09:08:01 +0000 (10:08 +0100)]
sycl: addressing non-contiguous src1 mul_mats (nc and batched) (#13343)
* sycl: fixed non-contiguous src1 mul_mats (nc and batched)
* Fixed wrong static_cast inside kernel
Diego Devesa [Wed, 7 May 2025 14:36:33 +0000 (16:36 +0200)]
docker : disable arm64 and intel images (#13356)
Georgi Gerganov [Wed, 7 May 2025 13:39:36 +0000 (16:39 +0300)]
sync : ggml
ggml-ci
Daniel Bevenius [Mon, 5 May 2025 11:09:35 +0000 (13:09 +0200)]
whisper: remove MSVC warnings pragmas (whisper/3090)
* ggml : remove MSVC warnings pragmas
This commit removes the MSVC-specific pragmas as these are now handled
in ggml/CMakeLists.txt.
* whisper : remove MSVC warning pragmas
This commit removes the MSVC-specific pragmas. These are now handled in
the ggml/CMakeLists.txt file.
Jared Tweed [Fri, 2 May 2025 09:41:35 +0000 (02:41 -0700)]
cmake : removed stdc++fs (whisper/3097)
* removed stdc++fs
* kept line, but removed stdc++fs
Sigbjørn Skjæret [Wed, 7 May 2025 10:49:27 +0000 (12:49 +0200)]
llama : deci : support ffn-free with attention (#13296)
Ycros [Wed, 7 May 2025 08:23:28 +0000 (18:23 +1000)]
common : Add a warning when we can't match samplers from a string or char. (#13330)
R0CKSTAR [Wed, 7 May 2025 07:48:23 +0000 (15:48 +0800)]
cuda : remove nrows_x in mul_mat_q_process_tile (#13325)
Signed-off-by: Xiaodong Ye <redacted>
Georgi Gerganov [Wed, 7 May 2025 07:28:02 +0000 (10:28 +0300)]
examples : remove infill (#13283)
ggml-ci
piDack [Wed, 7 May 2025 07:23:11 +0000 (15:23 +0800)]
llama : support tie embedding for chatglm models (#13328)
Johannes Gäßler [Tue, 6 May 2025 21:35:51 +0000 (23:35 +0200)]
CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (#13135)
Xuan-Son Nguyen [Tue, 6 May 2025 20:40:24 +0000 (22:40 +0200)]
clip : refactor graph builder (#13321)
* mtmd : refactor graph builder
* fix qwen2vl
* clean up siglip cgraph
* pixtral migrated
* move minicpmv to a dedicated build function
* move max_feature_layer to build_llava
* use build_attn for minicpm resampler
* fix windows build
* add comment for batch_size
* also support tinygemma3 test model
* qwen2vl does not use RMS norm
* fix qwen2vl norm (2)
DocShotgun [Tue, 6 May 2025 20:36:24 +0000 (13:36 -0700)]
sampling : make top_n_sigma no-op at <=0 or a single candidate (#13345)
oobabooga [Tue, 6 May 2025 18:24:15 +0000 (15:24 -0300)]
sampling : don't consider -infinity values in top_n_sigma (#13344)
Diego Devesa [Tue, 6 May 2025 18:15:31 +0000 (20:15 +0200)]
cmake : remove arm64 msvc presets (#13342)
Akarshan Biswas [Tue, 6 May 2025 14:57:06 +0000 (20:27 +0530)]
SYCL: Disable reorder optimize by default and stop setting tensor extras when optimize is disabled (#13254)
* SYCL: Do not set tensor extras when reorder optimize is disabled
* SYCL: Disable reorder optimize by default
Xuan-Son Nguyen [Tue, 6 May 2025 12:25:40 +0000 (14:25 +0200)]
llama : fix build_ffn without gate (#13336)
* llama : fix build_ffn without gate
* fix build on windows
* Revert "fix build on windows"
This reverts commit fc420d3c7eef3481d3d2f313fef2757cb33a7c56.
Johannes Gäßler [Tue, 6 May 2025 11:58:51 +0000 (13:58 +0200)]
CUDA: fix bad asserts for partial offload (#13337)
Sigbjørn Skjæret [Tue, 6 May 2025 09:12:06 +0000 (11:12 +0200)]
convert : qwen2/3moe : set yarn metadata if present (#13331)
* set yarn metadata if present
* add comment about enabling YaRN
Co-authored-by: Xuan-Son Nguyen <redacted>
---------
Co-authored-by: Xuan-Son Nguyen <redacted>
Johannes Gäßler [Tue, 6 May 2025 06:36:46 +0000 (08:36 +0200)]
CUDA: fix --split-mode row for MMQ (#13323)
compilade [Tue, 6 May 2025 02:27:31 +0000 (22:27 -0400)]
gguf-py : avoid requiring pyside6 for other scripts (#13036)
- gguf-py : remove gguf-py/gguf/scripts/__init__.py because it's not needed
Implicit namespaces are supported since Python 3.3 (https://peps.python.org/pep-0420/),
and the entrypoints in pyproject.toml can directly refer to the main functions.
Johannes Gäßler [Mon, 5 May 2025 20:32:13 +0000 (22:32 +0200)]
CUDA: fix logic for clearing padding with -ngl 0 (#13320)
oobabooga [Mon, 5 May 2025 20:12:19 +0000 (17:12 -0300)]
sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (#13264)
* sampling: add Top-nσ sampler to `llama-server` and sampler ordering
* revert: sampler ordering
* revert: VS' crappy auto-formatting
* revert: VS' crappy auto-formatting pt.2
* revert: my crappy eye sight...
* sampling: add XTC to Top-nσ sampler chain
* sampling: add Dyna. Temp. to Top-nσ sampler chain
* sampling: actually remove Top-nσ from sampler(oops)
* Integrate top_n_sigma into main sampler chain
* Define COMMON_SAMPLER_TYPE_TOP_N_SIGMA
* Formatting
* Lint
* Exit early in the sampler if nsigma < 0
---------
Co-authored-by: CasualAutopsy <redacted>
igardev [Mon, 5 May 2025 14:03:31 +0000 (17:03 +0300)]
server : Webui - change setText command from parent window to also send the message. (#13309)
* setText command from parent window for llama-vscode now sends the message automatically.
* Upgrade packages versions to fix vulnerabilities with "npm audit fix" command.
* Fix code formatting.
* Add index.html.gz changes.
* Revert "Upgrade packages versions to fix vulnerabilities with "npm audit fix" command."
This reverts commit 67687b7fda8a293724ba92ea30bb151677406bc8.
* easier approach
* add setTimeout
---------
Co-authored-by: igardev <redacted>
Co-authored-by: Xuan Son Nguyen <redacted>
Xuan-Son Nguyen [Mon, 5 May 2025 14:02:55 +0000 (16:02 +0200)]
mtmd : rename llava directory to mtmd (#13311)
* mv llava to mtmd
* change ref everywhere
Xuan-Son Nguyen [Mon, 5 May 2025 10:54:44 +0000 (12:54 +0200)]
clip : fix confused naming ffn_up and ffn_down (#13290)
* clip : fix confused naming ffn_up and ffn_down
* rm ffn_i/o/g naming
* rename n_embd, n_ff
* small fix
* no check n_ff
Sigbjørn Skjæret [Mon, 5 May 2025 10:34:26 +0000 (12:34 +0200)]
convert : bailingmoe : set yarn metadata if present (#13312)
Akarshan Biswas [Mon, 5 May 2025 08:09:10 +0000 (13:39 +0530)]
SYCL: Disable mul_mat kernels for noncontiguous tensor b (#13308)
ggml-ci
Xuan-Son Nguyen [Sun, 4 May 2025 21:43:42 +0000 (23:43 +0200)]
mtmd : add C public API (#13184)
* init
* wip
* working version
* add mtmd::bitmaps
* add test target
* rm redundant define
* test: mtmd_input_chunks_free
* rm outdated comment
* fix merging issue
* explicitly create mtmd::input_chunks
* mtmd_input_chunk_copy
* add clone()
* add const to various places
* add warning about breaking changes
* helper: use mtmd_image_tokens_get_n_pos
Diego Devesa [Sun, 4 May 2025 19:25:43 +0000 (21:25 +0200)]
rpc : use backend registry, support dl backends (#13304)
Aaron Teo [Sun, 4 May 2025 17:49:12 +0000 (01:49 +0800)]
ggml : activate s390x simd for Q3_K (#13301)
Signed-off-by: Aaron Teo <redacted>
Diego Devesa [Sun, 4 May 2025 15:05:20 +0000 (17:05 +0200)]
llava/mtmd : fixes to fully support dl backends (#13303)
Diego Devesa [Sun, 4 May 2025 12:20:49 +0000 (14:20 +0200)]
llama : build windows releases with dl backends (#13220)
Johannes Gäßler [Sun, 4 May 2025 12:16:39 +0000 (14:16 +0200)]
CUDA: fix race condition in MMQ stream-k fixup (#13299)
Johannes Gäßler [Sun, 4 May 2025 11:58:38 +0000 (13:58 +0200)]
CUDA: fix race condition in MMQ ids_dst (#13294)
Jeff Bolz [Sun, 4 May 2025 05:17:16 +0000 (00:17 -0500)]
vulkan: Additional type support for unary, binary, and copy (#13266)
Support f16->f32 copy.
Support f16->f16 and f32->f32 unary ops.
Support all combinations of f16/f32 for src0/src1/dst for add/sub/mul/div.
Johannes Gäßler [Sat, 3 May 2025 22:50:37 +0000 (00:50 +0200)]
imatrix: fix oob writes if src1 is not contiguous (#13286)
Xuan-Son Nguyen [Sat, 3 May 2025 18:07:54 +0000 (20:07 +0200)]
clip : revert the change of BOI/EOI token for GLM-edge (⚠️ breaking change) (#13259)
ymcki [Sat, 3 May 2025 15:39:51 +0000 (23:39 +0800)]
llama : Llama-3_1-Nemotron-Ultra-253B-v1 support (#12843)
Diego Devesa [Fri, 2 May 2025 18:27:13 +0000 (20:27 +0200)]
llama : move end-user examples to tools directory (#13249)
* llama : move end-user examples to tools directory
---------
Co-authored-by: Xuan Son Nguyen <redacted>
Georgi Gerganov [Fri, 2 May 2025 17:54:30 +0000 (20:54 +0300)]
sync : ggml (#13268)
* vulkan : kernels for depthwise 2D convolution (CONV_2D_DW) (ggml/1204)
* vulkan : add kernels for depthwise 2d convolution (OP_CONV_2D_DW)
* review: remove src_x/y < 0 checks; add performance tests
* sync : ggml
ggml-ci
* vulkan : fix lint (#0)
---------
Co-authored-by: Acly <redacted>
Georgi Gerganov [Fri, 2 May 2025 17:54:13 +0000 (20:54 +0300)]
context : fix reorder logic (#13267)
ggml-ci
shalinib-ibm [Fri, 2 May 2025 16:53:12 +0000 (22:23 +0530)]
ggml : Enable MMA for BF16 in llamafile_sgemm (#13148)
This patch upstreams llamafile's CPU matrix multiplication kernels for ppc64le, using MMA builtins for the BF16 data type.
The change yields 9x - 40x gains in total speed S t/s (i.e. all tokens / total time) across the various batch sizes tested with the llama-batched-bench benchmark.
The patch was tested with Meta-Llama-3-8B
and Mistral-7B models (BF16 models generated with llama-quantize from the corresponding FP32 models) on an IBM POWER10 machine.
Signed-off-by: Shalini Salomi Bodapati <redacted>
Jared Van Bortel [Fri, 2 May 2025 15:42:30 +0000 (11:42 -0400)]
llama-model : support Qwen2 embedding models and pooling_mode_lasttoken (#13245)
Jared Van Bortel [Fri, 2 May 2025 15:41:54 +0000 (11:41 -0400)]
convert : use correct context length for nomic-embed-text-v2 (#13216)
Xuan-Son Nguyen [Fri, 2 May 2025 15:17:15 +0000 (17:17 +0200)]
convert : converting mmproj for Qwen2/2.5VL from convert_hf_to_gguf (#13209)
* wip
* qwen2.5vl ok
* vision: fix models missing "text_config"
* add test
* fix test repo name
* fix 32B model
* Revert "fix 32B model"
This reverts commit 651752f1ae25fe8a01c1e57c18cf2eca80b2774e.
* clarify about 32B
* rm qwen surgery script
* update llava/readme
* move V_ENC_EMBD_PATCH handling to Qwen2VLVisionModel
Georgi Gerganov [Fri, 2 May 2025 14:48:36 +0000 (17:48 +0300)]
kv-cache : separate recurrent vs non-recurrent impl (#12799)
* kv-cache : separate recurrent vs non-recurrent impl (wip)
ggml-ci
* kv-cache : init -> constructor + add llama_memory_params
ggml-ci
* kv-cache : fix callback reference
ggml-ci
* context : llama_kv_cache -> llama_memory_i
ggml-ci
* context : move memory creation logic to model
ggml-ci
* llama : remove reference of memory during encode
ggml-ci
* kv-cache : hide padding details in the implementation
ggml-ci
* kv-cache : add ubatch_next()
ggml-ci
* context : simplify sbatch logic
ggml-ci
* kv-cache : hide defrag logic in the implementation
ggml-ci
* context : hide kv cache details in implementation
ggml-ci
* build : fix
ggml-ci
* cont : another fix
ggml-ci
* kv-cache : simplify interface (wip)
ggml-ci
* kv-cache : use separate KV cell structs for unified/recurrent
ggml-ci
* kv-cache : clean-up
ggml-ci
* model : better llama_model::create_model() signature
ggml-ci
* kv-cache : fix recurrent seq_rm()
ggml-ci
* kv-cache : replace `struct callbacks` with `llama_model &`
ggml-ci
* kv-cache : replace `struct graph_params` with `llama_context &`
ggml-ci
* kv-cache : fix offload check
ggml-ci
* context : avoid passing unique_ptr
ggml-ci
* kv-cache : avoid using the backends from the llama_context
ref #13113
ggml-ci
* kv-cache : more consistent debug logs [no ci]
* kv-cache : do not pass the full llama_context for kv graphs
ggml-ci
* kv-cache : remove comment
* kv-cache : ggml_rope_ext_inplace -> ggml_rope_ext
ggml-ci
* kv-cache : fix recurrent multi-user case
ggml-ci
* memory : remove comments [no ci]
Sigbjørn Skjæret [Fri, 2 May 2025 10:44:24 +0000 (12:44 +0200)]
llama : orion rope type is neox (#13261)
Sigbjørn Skjæret [Fri, 2 May 2025 10:40:56 +0000 (12:40 +0200)]
llama : plamo rope type is neox (#13260)
piDack [Fri, 2 May 2025 09:06:09 +0000 (17:06 +0800)]
llama-chat : reset glmedge chat template (#13253)
* reset glmedge chat template
* fix glmedge chat template
Shakil Ahmed [Fri, 2 May 2025 08:20:27 +0000 (14:20 +0600)]
mtmd-cli : fix out_of_range when input image path is empty (#13244)
* fix out_of_range error to keep the chat loop running
* Update examples/llava/mtmd-cli.cpp
Co-authored-by: Sigbjørn Skjæret <redacted>
* mtmd-cli : load image right away
* add a new line for readability
* rm printf
* Update examples/llava/mtmd-cli.cpp
Co-authored-by: Sigbjørn Skjæret <redacted>
* Update examples/llava/mtmd-cli.cpp
---------
Co-authored-by: Sigbjørn Skjæret <redacted>
Co-authored-by: Xuan Son Nguyen <redacted>
Co-authored-by: Xuan-Son Nguyen <redacted>
Georgi Gerganov [Fri, 2 May 2025 06:48:31 +0000 (09:48 +0300)]
server : add cache reuse card link to help (#13230)
* server : add cache reuse card link to help
* args : use short url
Xuan-Son Nguyen [Fri, 2 May 2025 06:45:10 +0000 (08:45 +0200)]
convert : explicitly disable trust_remote_code for AutoConfig (#13246)
bandoti [Thu, 1 May 2025 22:06:39 +0000 (19:06 -0300)]
ci: fix cross-compile sync issues (#12804)
Justin Santa Barbara [Thu, 1 May 2025 21:32:11 +0000 (17:32 -0400)]
rpc : avoid uninitialized memory in serialize_tensor (#13210)
Zero out the name and padding buffers.
Jesse Gross [Thu, 1 May 2025 20:46:10 +0000 (13:46 -0700)]
ggml: Don't assert fail when tensor data changes (#13222)
The following scenario will cause an assertion failure in the graph
allocator:
- Build and allocate a graph containing a tensor with a non-NULL data
pointer
- Build and allocate a new graph where that data is NULL
Result:
ggml-alloc.c:819: GGML_ASSERT(talloc->buffer_id >= 0) failed
This happens during revalidation because we think that memory should
have been previously allocated based on the current graph but in
reality the previous graph was different. In this situation, we
should do a full reallocation pass.
Diego Devesa [Thu, 1 May 2025 19:48:08 +0000 (21:48 +0200)]
build : fix build info on windows (#13239)
* build : fix build info on windows
* fix cuda host compiler msg
Loïc Carrère [Thu, 1 May 2025 19:32:21 +0000 (21:32 +0200)]
clip : (minicpmv) Re-enable upscaling of images smaller than the CLIP image size (#13237)
matteo [Thu, 1 May 2025 19:16:38 +0000 (21:16 +0200)]
llama-chat : update GLM4 chat template (#13238)
* update GLM4 chat template
* Update chat template
Co-authored-by: Xuan-Son Nguyen <redacted>
---------
Co-authored-by: Xuan-Son Nguyen <redacted>