git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
Xuan-Son Nguyen [Wed, 14 May 2025 08:26:12 +0000 (10:26 +0200)]
webui : use fflate for more deterministic gzip compress (#13525)
* webui : use pako for more deterministic gzip compress
* simpler code
* use fflate instead of pako
Luca Stefani [Wed, 14 May 2025 08:07:31 +0000 (10:07 +0200)]
webui: Allow pasting file from clipboard (#13526)
* server: Allow pasting file from clipboard
* server: Prevent default action on file paste
* update build
* format then build combined
---------
Co-authored-by: Xuan Son Nguyen <redacted>
ddpasa [Wed, 14 May 2025 07:59:12 +0000 (09:59 +0200)]
docs: Update link to ggml-org in multimodal.md (#13513)
* Update multimodal.md
Minor change to include the huggingface link
* Update docs/multimodal.md
---------
Co-authored-by: Xuan-Son Nguyen <redacted>
Sigbjørn Skjæret [Wed, 14 May 2025 06:41:01 +0000 (08:41 +0200)]
scripts : fix compare-llama-bench.py show parameter (#13514)
Jeff Bolz [Wed, 14 May 2025 04:15:50 +0000 (13:15 +0900)]
vulkan: workaround FA compile failures on macos (#13517)
Ed Addario [Tue, 13 May 2025 17:12:31 +0000 (18:12 +0100)]
quantize : improve tensor-type pattern matching (#13033)
Xuan-Son Nguyen [Tue, 13 May 2025 15:07:21 +0000 (17:07 +0200)]
clip : clip.h become private API (⚠️ breaking change) (#13510)
Georgi Gerganov [Tue, 13 May 2025 15:04:39 +0000 (18:04 +0300)]
metal : use FA-vec kernel up to batch size 20 (#13496)
* batched-bench : fix pp batch contents
* metal : optimize multi-sequence FA vec kernel
ggml-ci
* metal : use FA-vec kernel up to batch size 20
ggml-ci
Georgi Gerganov [Tue, 13 May 2025 15:04:00 +0000 (18:04 +0300)]
metal : optimize multi-sequence FA vec kernel (#13493)
* batched-bench : fix pp batch contents
* metal : optimize multi-sequence FA vec kernel
ggml-ci
Dan Johansson [Tue, 13 May 2025 15:02:28 +0000 (17:02 +0200)]
ggml-cpu: Update KleidiAI to v1.6 and fix include directives (#13509)
Signed-off-by: Dan Johansson <redacted>
Georgi Gerganov [Tue, 13 May 2025 15:01:53 +0000 (18:01 +0300)]
batched-bench : fix pp batch contents (#13492)
Xuan-Son Nguyen [Tue, 13 May 2025 13:33:58 +0000 (15:33 +0200)]
mtmd : remove libllava, remove clip-quantize-cli (⚠️ breaking change) (#13460)
* mtmd : remove libllava, remove clip-quantize-cli
* rm clip_model_quantize
Sigbjørn Skjæret [Tue, 13 May 2025 13:31:12 +0000 (15:31 +0200)]
scripts : support arbitrary input file formats in compare-llama-bench.py (#13455)
Gabe Goodhart [Tue, 13 May 2025 13:12:01 +0000 (07:12 -0600)]
model : Granite MoE shared (#13269)
* feat: Add GGUF conversion for granitemoeshared
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <redacted>
* feat: hparam and arch plumbing for granitemoeshared
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <redacted>
* fix: Split MoE fused tensors for shared experts in conversion
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <redacted>
* feat: First WIP cut at model arch in cpp
The hparam and architecture plumbing should be correct, but the
implementation of the shared experts seems to still be broken.
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <redacted>
* fix: Cleaner (maybe more correct?) splitting for gate/up
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <redacted>
* fix: Fix the input to the shared experts
I had misread that the shared experts take the inputs _before_ the standard
MoE layer and was feeding the output of the MoE to the shared experts.
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <redacted>
* fix: Avoid architecture-specific checks for Granite MoE Shared
This is a cleaner way that will allow more flexibility in architecture
strings going forward.
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <redacted>
* refactor: Split granite architectures out of llm_build_llama
This helps de-clutter the llama-family graph construction and allows
granite to diverge further (in preparation for Granite 4).
NOTE: I removed the granite scale factors from llm_build_deci because they
appear to only be there as copy-paste from llm_build_llama. The HF config
does not seem to set those values:
https://huggingface.co/Deci/DeciLM-7B/blob/main/config.json
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <redacted>
* fix: Fix compiler warning about uninitialized inp_pos
This should not have been reachable, but it warns on some compilers
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <redacted>
* fix: Consolidate GraniteMoEShared into GraniteMoE for conversion
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <redacted>
* fix: Consolidate GraniteMoEShared into GraniteMoE on the c++ side
Branch: GraniteMoEShared
Signed-off-by: Gabe Goodhart <redacted>
---------
Signed-off-by: Gabe Goodhart <redacted>
Georgi Gerganov [Tue, 13 May 2025 11:01:45 +0000 (14:01 +0300)]
sync : ggml
Diego Devesa [Mon, 12 May 2025 22:31:37 +0000 (15:31 -0700)]
llama-bench : add defrag-thold, check for invalid ranges (#13487)
lhez [Mon, 12 May 2025 20:13:49 +0000 (13:13 -0700)]
opencl: remove unnecessary assert for `add` (#13257)
Xuan-Son Nguyen [Mon, 12 May 2025 13:06:51 +0000 (15:06 +0200)]
clip : cap max image size 1024 for qwen vl model (#13478)
Johannes Gäßler [Mon, 12 May 2025 12:44:49 +0000 (14:44 +0200)]
llama/ggml: add LLM training support (#10544)
* llama/ggml: add LLM training support
more compact progress bar
llama_save_model_to_file
llama_opt_param_filter
ggml_graph_dup force_grads
refactor ggml_opt, fix test-opt
* remove logits_all
* refactor CUDA implementation for ACC
* reset graph at beginning of opt period
Georgi Gerganov [Mon, 12 May 2025 12:12:27 +0000 (15:12 +0300)]
context : fix state io for memory-less contexts (#13470)
ggml-ci
Anudit Nagar [Mon, 12 May 2025 11:56:42 +0000 (18:56 +0700)]
server : allow content to be null in oaicompat_completion_params_parse (#13477)
Diego Devesa [Mon, 12 May 2025 11:08:22 +0000 (13:08 +0200)]
llama-bench : accept ranges for integer parameters (#13410)
Dan Johansson [Mon, 12 May 2025 11:06:19 +0000 (13:06 +0200)]
ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel (#13053)
* ggml-cpu: Integrate fp32=bf16xbf16 SME KleidiAI kernel
Signed-off-by: Dan Johansson <redacted>
* code review fixes
Signed-off-by: Dan Johansson <redacted>
* adds a comment that clarifies barrier usage
Signed-off-by: Dan Johansson <redacted>
---------
Signed-off-by: Dan Johansson <redacted>
Co-authored-by: Charles Xu <redacted>
Johannes Gäßler [Mon, 12 May 2025 08:51:21 +0000 (10:51 +0200)]
CUDA: fix misaligned synchronization in FA (#13469)
Xuan-Son Nguyen [Mon, 12 May 2025 08:29:13 +0000 (10:29 +0200)]
ggml : add mrope kernel for metal (#13457)
Atharva Dubey [Mon, 12 May 2025 05:15:32 +0000 (06:15 +0100)]
enable dpcpp nightly builds with libraries (#13406)
City [Sun, 11 May 2025 22:39:06 +0000 (00:39 +0200)]
mtmd : Use RMS norm for InternVL 3 38B and 78B mmproj (#13459)
Anthony Umfer [Sun, 11 May 2025 15:08:26 +0000 (11:08 -0400)]
tools : fix uninitialized llama_batch in server (#13436)
* add constructor to initialize server_context::batch, preventing destructor's call to llama_batch_free from causing an invalid free()
* Update tools/server/server.cpp
Co-authored-by: Xuan-Son Nguyen <redacted>
* use C++11 initializer syntax
* switch from Copy-list-initialization to Direct-list-initialization
---------
Co-authored-by: Xuan-Son Nguyen <redacted>
Sigbjørn Skjæret [Sun, 11 May 2025 14:20:39 +0000 (16:20 +0200)]
scripts : exit compare-llama-bench.py gracefully when there's nothing to compare (#13451)
Johannes Gäßler [Sun, 11 May 2025 14:09:33 +0000 (16:09 +0200)]
CUDA: fix crash with partial offloading of MoE (#13439)
David Huang [Sun, 11 May 2025 12:18:39 +0000 (20:18 +0800)]
Add `--no-op-offload` to improve `-ot` pp perf in MoE models like llama4 400B (#13386)
City [Sun, 11 May 2025 09:35:52 +0000 (11:35 +0200)]
mtmd : support InternVL 3 38B and 78B mmproj (#13443)
* Support InternVL 3 38B and 78B mmproj
* Swap norms in clip.cpp
* Group variables together
Xuan-Son Nguyen [Sun, 11 May 2025 09:34:23 +0000 (11:34 +0200)]
mtmd : move helpers to dedicated file (#13442)
* mtmd : move helpers to dedicated file
* fix windows build
* rm redundant include
Thomas Germer [Sat, 10 May 2025 20:26:46 +0000 (22:26 +0200)]
docs : Fix typo in InternVL3 model name (#13440)
Johannes Gäßler [Sat, 10 May 2025 20:22:48 +0000 (22:22 +0200)]
CUDA: fix race conditions FlashAttention kernels (#13438)
Sigbjørn Skjæret [Sat, 10 May 2025 20:08:07 +0000 (22:08 +0200)]
vocab : add ByteDance-Seed/Seed-Coder (#13423)
Xuan-Son Nguyen [Sat, 10 May 2025 17:57:54 +0000 (19:57 +0200)]
mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl (#13434)
* mtmd : add hard limit on image resolution for qwen2vl / qwen2.5vl
* fix typo
Xuan-Son Nguyen [Sat, 10 May 2025 16:44:49 +0000 (18:44 +0200)]
server : update docs (#13432)
Sigbjørn Skjæret [Sat, 10 May 2025 15:19:52 +0000 (17:19 +0200)]
llguidance : set tokenizer slices to default (#13424)
Thammachart Chinvarapon [Sat, 10 May 2025 14:34:48 +0000 (21:34 +0700)]
ci: free_disk_space flag enabled for intel variant (#13426)
before cleanup: 20G
after cleanup: 44G
after all built and pushed: 24G
https://github.com/Thammachart/llama.cpp/actions/runs/14945093573/job/41987371245
Xuan-Son Nguyen [Sat, 10 May 2025 14:26:42 +0000 (16:26 +0200)]
mtmd : support InternVL 2.5 and 3 (#13422)
* convert : internvl support
* InternVL3-1B working
* fix regression
* rm mobilevlm from test
* fix conversion
* add test for internvl
* add to list of pre-quant
* restore boi/eoi check
* add clarify comment for norm eps
Johannes Gäßler [Sat, 10 May 2025 07:16:52 +0000 (09:16 +0200)]
CUDA: fix FlashAttention on Turing (#13415)
Xuan-Son Nguyen [Sat, 10 May 2025 06:16:29 +0000 (08:16 +0200)]
arg : add env var to control mmproj (#13416)
* arg : add env var to control mmproj
* small note about -hf --mmproj
Jeff Bolz [Sat, 10 May 2025 06:07:07 +0000 (23:07 -0700)]
vulkan: scalar flash attention implementation (#13324)
* vulkan: scalar flash attention implementation
* vulkan: always use fp32 for scalar flash attention
* vulkan: use vector loads in scalar flash attention shader
* vulkan: remove PV matrix, helps with register usage
* vulkan: reduce register usage in scalar FA, but perf may be slightly worse
* vulkan: load each Q value once. optimize O reduction. more tuning
* vulkan: support q4_0/q8_0 KV in scalar FA
* CI: increase timeout to accommodate newly-supported tests
* vulkan: for scalar FA, select between 1 and 8 rows
* vulkan: avoid using Float16 capability in scalar FA
Helton Reis [Fri, 9 May 2025 20:15:39 +0000 (17:15 -0300)]
chore(llguidance): use tagged version that does not break the build (#13413)
Xuan-Son Nguyen [Fri, 9 May 2025 17:29:37 +0000 (19:29 +0200)]
server : vision support via libmtmd (#12898)
* server : (experimental) vision support via libmtmd
* mtmd : add more api around mtmd_image_tokens
* mtmd : add more api around mtmd_image_tokens
* mtmd : ability to calc image hash
* shared_ptr for mtmd_image_tokens
* move hash to user-defined ID (fixed)
* abstract out the batch management
* small fix
* refactor logic adding tokens to batch
* implement hashing image
* use FNV hash, now hash bitmap instead of file data
* allow decoding image embedding to be split into batches
* rm whitespace
* disable some features when mtmd is on
* fix --no-mmproj-offload
* mtmd_context_params no timings
* refactor server_inp to server_tokens
* fix the failing test case
* init
* wip
* working version
* add mtmd::bitmaps
* add test target
* rm redundant define
* test: mtmd_input_chunks_free
* rm outdated comment
* fix merging issue
* explicitly create mtmd::input_chunks
* mtmd_input_chunk_copy
* add clone()
* improve server_input struct
* clip : fix confused naming ffn_up and ffn_down
* rm ffn_i/o/g naming
* rename n_embd, n_ff
* small fix
* no check n_ff
* fix detokenize
* add const to various places
* add warning about breaking changes
* add c api
* helper: use mtmd_image_tokens_get_n_pos
* fix ctx_shift
* fix name shadowing
* more strict condition
* support remote image_url
* remote image_url log
* add CI test
* do not log base64
* add "has_multimodal" to /props
* remove dangling image
* speculative: use slot.cache_tokens.insert
* Apply suggestions from code review
Co-authored-by: Georgi Gerganov <redacted>
* rm can_be_detokenized
* on prompt processing done, assert cache_tokens.size
* handle_completions_impl returns void
* adapt the new web ui
* update docs and hot topics
* rm assert
* small fix (2)
---------
Co-authored-by: Georgi Gerganov <redacted>
Alberto Cabrera Pérez [Fri, 9 May 2025 15:34:08 +0000 (16:34 +0100)]
sycl : implementation of reordered Q4_0 MMVQ for Intel GPUs (#12858)
* sycl : Implemented reorder Q4_0 mmvq
Signed-off-by: Alberto Cabrera <redacted>
* sycl : Fixed mmvq being called when reorder is disabled
* sycl : Improved comments in the quants header
Signed-off-by: Alberto Cabrera <redacted>
* Use static_assert
* safe_div -> ceil_div
* Clarify qi comment
* change the reorder tensor from init to execute OP
* dbg
* Undo changes to test-backend-ops
* Refactor changes on top of q4_0 reorder fix
* Missing Reverts
* Refactored opt_for_reorder logic to simplify code path
* Explicit inlining and unroll
* Renamed mul_mat_algo enum for consistency
---------
Signed-off-by: Alberto Cabrera <redacted>
Co-authored-by: romain.biessy <redacted>
Georgi Gerganov [Fri, 9 May 2025 12:14:56 +0000 (15:14 +0300)]
metal : optimize MoE for large batches (#13388)
ggml-ci
Johannes Gäßler [Fri, 9 May 2025 11:34:58 +0000 (13:34 +0200)]
CUDA: FA support for Deepseek (Ampere or newer) (#13306)
* CUDA: FA support for Deepseek (Ampere or newer)
* do loop unrolling via C++ template
Diego Devesa [Fri, 9 May 2025 11:02:07 +0000 (13:02 +0200)]
llama : do not crash if there is no CPU backend (#13395)
* llama : do not crash if there is no CPU backend
* add checks to examples
Johannes Gäßler [Fri, 9 May 2025 10:14:04 +0000 (12:14 +0200)]
CUDA: fix crash on large batch size for MoE models (#13384)
Bartowski [Fri, 9 May 2025 09:53:58 +0000 (05:53 -0400)]
imatrix : Add --parse-special for enabling parsing of special tokens in imatrix calculation (#13389)
* Add --parse-special for enabling parsing of special tokens in imatrix calculation
* whitespace
R0CKSTAR [Fri, 9 May 2025 09:25:50 +0000 (17:25 +0800)]
llama-run: add support for downloading models from ModelScope (#13370)
Signed-off-by: Xiaodong Ye <redacted>
Xuan-Son Nguyen [Fri, 9 May 2025 09:18:02 +0000 (11:18 +0200)]
mtmd : fix batch_view for m-rope (#13397)
* mtmd : fix batch_view for m-rope
* nits : fix comment
Xuan-Son Nguyen [Fri, 9 May 2025 09:17:51 +0000 (11:17 +0200)]
llama : one-off chat template fix for Mistral-Small-2503 (#13398)
* llama : one-off chat template fix for Mistral-Small-2503
* update readme
* add mistral-v7-tekken
Radoslav Gerganov [Fri, 9 May 2025 07:31:07 +0000 (10:31 +0300)]
rpc : add rpc_msg_set_tensor_hash_req (#13353)
* rpc : add rpc_msg_set_tensor_hash_req
Use a dedicated struct for the request of RPC_CMD_SET_TENSOR_HASH which
makes the code cleaner.
* fix
Jeff Bolz [Fri, 9 May 2025 07:23:41 +0000 (02:23 -0500)]
vulkan: Allow up to 4096 elements for mul_mat_id row_ids (#13326)
This assert fired running Qwen_Qwen3-30B-A3B-Q2_K.gguf:
GGML_ASSERT(nei0 * nei1 <= 3072);
The tensor is 8 x 512. Increase this array size to accommodate.
Xuan-Son Nguyen [Fri, 9 May 2025 07:06:37 +0000 (09:06 +0200)]
server : (webui) rename has_multimodal --> modalities (#13393)
* server : (webui) rename has_multimodal --> modalities
* allow converting SVG to PNG
* less complicated code
Diego Devesa [Thu, 8 May 2025 21:45:22 +0000 (23:45 +0200)]
ci : limit write permission to only the release step + fixes (#13392)
* ci : limit write permission to only the release step
* fix win cuda file name
* fix license file copy on multi-config generators
Matt Clayton [Thu, 8 May 2025 18:25:39 +0000 (14:25 -0400)]
mtmd : Expose helper_decode_image_chunk (#13366)
* mtmd: Expose helper_decode_image, output_embd_copy, image_tokens_copy/free
* Slim down
* Cleanups
Xuan-Son Nguyen [Thu, 8 May 2025 16:51:45 +0000 (18:51 +0200)]
server : (webui) fix a very small misalignment (#13387)
* server : (webui) fix a very small misalignment
* restore font-bold
Xuan-Son Nguyen [Thu, 8 May 2025 13:37:29 +0000 (15:37 +0200)]
server : (webui) revamp the input area, plus many small UI improvements (#13365)
* rework the input area
* process selected file
* change all icons to heroicons
* fix thought process collapse
* move conversation more menu to sidebar
* sun icon --> moon icon
* rm default system message
* stricter upload file check, only allow image if server has mtmd
* build it
* add renaming
* better autoscroll
* build
* add conversation group
* fix scroll
* extra context first, then user input in the end
* fix <hr> tag
* clean up a bit
* build
* add mb-3 for <pre>
* throttle adjustTextareaHeight to make it less laggy
* (nits) missing padding in sidebar
* rm stray console log
Sigbjørn Skjæret [Thu, 8 May 2025 13:34:29 +0000 (15:34 +0200)]
convert : support rope_scaling type and rope_type (#13349)
welix [Thu, 8 May 2025 13:03:53 +0000 (22:03 +0900)]
mtmd : fix the calculation of n_tokens for smolvlm (#13381)
Co-authored-by: Taichi Nishimura <redacted>
Georgi Gerganov [Thu, 8 May 2025 11:28:33 +0000 (14:28 +0300)]
context : allow cache-less context for embeddings (#13108)
* context : allow cache-less context for embeddings
ggml-ci
* context : enable reranking with encode()
ggml-ci
* context : encode() clears embd_seq
ggml-ci
* examples : use llama_encode() when appropriate
ggml-ci
* models : nomic bert moe does not require KV cache
* llama : update comments for llama_decode/llama_encode
ggml-ci
* context : update warning log [no ci]
Georgi Gerganov [Thu, 8 May 2025 11:26:50 +0000 (14:26 +0300)]
context : remove logits_all flag (#13284)
* context : remove logits_all flag
ggml-ci
* llama : remove logits_all flag + reorder llama_context_params
ggml-ci
Diego Devesa [Thu, 8 May 2025 11:15:28 +0000 (13:15 +0200)]
ci : move release workflow to a separate file (#13362)
Diego Devesa [Thu, 8 May 2025 11:15:15 +0000 (13:15 +0200)]
llama : print size and type of overridden tensors (#13364)
Alberto Cabrera Pérez [Thu, 8 May 2025 09:08:01 +0000 (10:08 +0100)]
sycl: addressing non-contiguous src1 mul_mats (nc and batched) (#13343)
* sycl: fixed non-contiguous src1 mul_mats (nc and batched)
* Fixed wrong static_cast inside kernel
Diego Devesa [Wed, 7 May 2025 14:36:33 +0000 (16:36 +0200)]
docker : disable arm64 and intel images (#13356)
Georgi Gerganov [Wed, 7 May 2025 13:39:36 +0000 (16:39 +0300)]
sync : ggml
ggml-ci
Daniel Bevenius [Mon, 5 May 2025 11:09:35 +0000 (13:09 +0200)]
whisper: remove MSVC warnings pragmas (whisper/3090)
* ggml : remove MSVC warnings pragmas
This commit removes the MSVC-specific pragmas as these are now handled
in ggml/CMakeLists.txt.
* whisper : remove MSVC warning pragmas
This commit removes the MSVC-specific pragmas. These are now handled in
the ggml/CMakeLists.txt file.
Jared Tweed [Fri, 2 May 2025 09:41:35 +0000 (02:41 -0700)]
cmake : removed stdc++fs (whisper/3097)
* removed stdc++fs
* kept line, but removed stdc++fs
Sigbjørn Skjæret [Wed, 7 May 2025 10:49:27 +0000 (12:49 +0200)]
llama : deci : support ffn-free with attention (#13296)
Ycros [Wed, 7 May 2025 08:23:28 +0000 (18:23 +1000)]
common : Add a warning when we can't match samplers from a string or char. (#13330)
R0CKSTAR [Wed, 7 May 2025 07:48:23 +0000 (15:48 +0800)]
cuda : remove nrows_x in mul_mat_q_process_tile (#13325)
Signed-off-by: Xiaodong Ye <redacted>
Georgi Gerganov [Wed, 7 May 2025 07:28:02 +0000 (10:28 +0300)]
examples : remove infill (#13283)
ggml-ci
piDack [Wed, 7 May 2025 07:23:11 +0000 (15:23 +0800)]
llama : support tie embedding for chatglm models (#13328)
Johannes Gäßler [Tue, 6 May 2025 21:35:51 +0000 (23:35 +0200)]
CUDA: mix virt/real CUDA archs for GGML_NATIVE=OFF (#13135)
Xuan-Son Nguyen [Tue, 6 May 2025 20:40:24 +0000 (22:40 +0200)]
clip : refactor graph builder (#13321)
* mtmd : refactor graph builder
* fix qwen2vl
* clean up siglip cgraph
* pixtral migrated
* move minicpmv to a dedicated build function
* move max_feature_layer to build_llava
* use build_attn for minicpm resampler
* fix windows build
* add comment for batch_size
* also support tinygemma3 test model
* qwen2vl does not use RMS norm
* fix qwen2vl norm (2)
DocShotgun [Tue, 6 May 2025 20:36:24 +0000 (13:36 -0700)]
sampling : make top_n_sigma no-op at <=0 or a single candidate (#13345)
oobabooga [Tue, 6 May 2025 18:24:15 +0000 (15:24 -0300)]
sampling : don't consider -infinity values in top_n_sigma (#13344)
Diego Devesa [Tue, 6 May 2025 18:15:31 +0000 (20:15 +0200)]
cmake : remove arm64 msvc presets (#13342)
Akarshan Biswas [Tue, 6 May 2025 14:57:06 +0000 (20:27 +0530)]
SYCL: Disable reorder optimize by default and stop setting tensor extras when optimize is disabled (#13254)
* SYCL: Do not set tensor extras when reorder optimize is disabled
* SYCL: Disable reorder optimize by default
Xuan-Son Nguyen [Tue, 6 May 2025 12:25:40 +0000 (14:25 +0200)]
llama : fix build_ffn without gate (#13336)
* llama : fix build_ffn without gate
* fix build on windows
* Revert "fix build on windows"
This reverts commit fc420d3c7eef3481d3d2f313fef2757cb33a7c56.
Johannes Gäßler [Tue, 6 May 2025 11:58:51 +0000 (13:58 +0200)]
CUDA: fix bad asserts for partial offload (#13337)
Sigbjørn Skjæret [Tue, 6 May 2025 09:12:06 +0000 (11:12 +0200)]
convert : qwen2/3moe : set yarn metadata if present (#13331)
* set yarn metadata if present
* add comment about enabling YaRN
Co-authored-by: Xuan-Son Nguyen <redacted>
---------
Co-authored-by: Xuan-Son Nguyen <redacted>
Johannes Gäßler [Tue, 6 May 2025 06:36:46 +0000 (08:36 +0200)]
CUDA: fix --split-mode row for MMQ (#13323)
compilade [Tue, 6 May 2025 02:27:31 +0000 (22:27 -0400)]
gguf-py : avoid requiring pyside6 for other scripts (#13036)
- gguf-py : remove gguf-py/gguf/scripts/__init__.py because it's not needed
Implicit namespaces are supported since Python 3.3 (https://peps.python.org/pep-0420/),
and the entrypoints in pyproject.toml can directly refer to the main functions.
Johannes Gäßler [Mon, 5 May 2025 20:32:13 +0000 (22:32 +0200)]
CUDA: fix logic for clearing padding with -ngl 0 (#13320)
oobabooga [Mon, 5 May 2025 20:12:19 +0000 (17:12 -0300)]
sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (#13264)
* sampling: add Top-nσ sampler to `llama-server` and sampler ordering
* revert: sampler ordering
* revert: VS' crappy auto-formatting
* revert: VS' crappy auto-formatting pt.2
* revert: my crappy eye sight...
* sampling: add XTC to Top-nσ sampler chain
* sampling: add Dyna. Temp. to Top-nσ sampler chain
* sampling: actually remove Top-nσ from sampler(oops)
* Integrate top_n_sigma into main sampler chain
* Define COMMON_SAMPLER_TYPE_TOP_N_SIGMA
* Formatting
* Lint
* Exit early in the sampler if nsigma < 0
---------
Co-authored-by: CasualAutopsy <redacted>
igardev [Mon, 5 May 2025 14:03:31 +0000 (17:03 +0300)]
server : Webui - change setText command from parent window to also send the message. (#13309)
* setText command from parent window for llama-vscode now sends the message automatically.
* Upgrade packages versions to fix vulnerabilities with "npm audit fix" command.
* Fix code formatting.
* Add index.html.gz changes.
* Revert "Upgrade packages versions to fix vulnerabilities with "npm audit fix" command."
This reverts commit 67687b7fda8a293724ba92ea30bb151677406bc8.
* easier approach
* add setTimeout
---------
Co-authored-by: igardev <redacted>
Co-authored-by: Xuan Son Nguyen <redacted>
Xuan-Son Nguyen [Mon, 5 May 2025 14:02:55 +0000 (16:02 +0200)]
mtmd : rename llava directory to mtmd (#13311)
* mv llava to mtmd
* change ref everywhere
Xuan-Son Nguyen [Mon, 5 May 2025 10:54:44 +0000 (12:54 +0200)]
clip : fix confused naming ffn_up and ffn_down (#13290)
* clip : fix confused naming ffn_up and ffn_down
* rm ffn_i/o/g naming
* rename n_embd, n_ff
* small fix
* no check n_ff
Sigbjørn Skjæret [Mon, 5 May 2025 10:34:26 +0000 (12:34 +0200)]
convert : bailingmoe : set yarn metadata if present (#13312)
Akarshan Biswas [Mon, 5 May 2025 08:09:10 +0000 (13:39 +0530)]
SYCL: Disable mul_mat kernels for noncontiguous tensor b (#13308)
ggml-ci
Xuan-Son Nguyen [Sun, 4 May 2025 21:43:42 +0000 (23:43 +0200)]
mtmd : add C public API (#13184)
* init
* wip
* working version
* add mtmd::bitmaps
* add test target
* rm redundant define
* test: mtmd_input_chunks_free
* rm outdated comment
* fix merging issue
* explicitly create mtmd::input_chunks
* mtmd_input_chunk_copy
* add clone()
* add const to various places
* add warning about breaking changes
* helper: use mtmd_image_tokens_get_n_pos
Diego Devesa [Sun, 4 May 2025 19:25:43 +0000 (21:25 +0200)]
rpc : use backend registry, support dl backends (#13304)
Aaron Teo [Sun, 4 May 2025 17:49:12 +0000 (01:49 +0800)]
ggml : activate s390x simd for Q3_K (#13301)
Signed-off-by: Aaron Teo <redacted>
Diego Devesa [Sun, 4 May 2025 15:05:20 +0000 (17:05 +0200)]
llava/mtmd : fixes to fully support dl backends (#13303)