Daniel Bevenius [Tue, 16 Dec 2025 10:17:40 +0000 (11:17 +0100)]
model-conversion : add note about verifying previous models (#18082)
This commit adds a note to the README in the model-conversion
examples, advising developers to verify that previous versions of models
pass logits verification before adding new models from the same family.
Daniel Bevenius [Tue, 16 Dec 2025 10:17:20 +0000 (11:17 +0100)]
model-conversion : use CONVERTED_EMBEDDING_MODEL for embedding_verify_logits (#18079)
This commit updates the embedding model verification script to use the
CONVERTED_EMBEDDING_MODEL environment variable instead of using the
EMBEDDING_MODEL_PATH (the original embedding model path) as the basis
for the converted model file name.
The motivation for this that currently if the converted embedding model
file name differs from the original embedding model directory/name the
verification script will look for the wrong .bin files that were
generating when running the models.
ssweens [Mon, 15 Dec 2025 17:28:35 +0000 (09:28 -0800)]
kv-cache: Fix state restore fragmented cache (#17982)
* kv-cache : fix state restore with fragmented cache (#17527)
Change find_slot to allow non-contiguous allocation during state restore. Fixes 'failed to find available cells in kv cache' error when restoring state to fragmented cache.
* tests : update logic
* cleanup: tightened state_read_meta sig, added is_contiguous case
Jeremy Demeule [Mon, 15 Dec 2025 14:14:49 +0000 (15:14 +0100)]
metal: use shared buffers on eGPU (#17866)
* metal: use shared buffers on eGPU
With #15906, I noticed on important regression when using metal backend on eGPU.
This commit restore the previous behavior and add an option to force its activation.
Thomas Jarosch [Mon, 15 Dec 2025 10:29:29 +0000 (11:29 +0100)]
webui: add "delete all conversations" button to import/export tab (#17444)
* webui: add "delete all conversations" button to import/export tab
- Add 'Delete all conversations' functionality with confirmation dialog
- Add Trash icon and destructive styling for clear visual indication
- Redirects to "?new_chat=true#/" by using conversationsStore.deleteAll()
Congcong Cai [Fri, 12 Dec 2025 14:37:38 +0000 (22:37 +0800)]
cmake : set `CMAKE_RUNTIME_OUTPUT_DIRECTORY` for non standalone build (ggml/1394)
Some backend depends on CMAKE_RUNTIME_OUTPUT_DIRECTORY to create temporary file like metal backened.
Missing CMAKE_RUNTIME_OUTPUT_DIRECTORY will cause some cmake error like permission denied (try to copy file to root).
This PR wants to setup a default path for CMAKE_RUNTIME_OUTPUT_DIRECTORY when it does not exist.
Jeff Bolz [Sat, 13 Dec 2025 09:04:29 +0000 (03:04 -0600)]
vulkan: Multi-pass softmax for large number of cols (#17892)
When the number of cols is large, split each row across multiple workgroups.
There are three phases that communicate partial results through temp buffers:
(1) compute max partials
(2) take max of partials, compute sum(exp(x-max)) partials
(3) sum partials, compute scaled result
Daniel Bevenius [Sat, 13 Dec 2025 07:34:26 +0000 (08:34 +0100)]
model-conversion : use CONVERTED_MODEL value for converted model [no ci] (#17984)
* model-conversion : use CONVERTED_MODEL value for converted model [no ci]
This commit updates the model verification scripts to use the
CONVERTED_MODEL environment variable instead of using the MODEL_PATH
(the original model path) as the basis for the converted model file
name.
The motivation for this that currently if the converted model file name
differs from the original model directory/name the verification scripts
will look for the wrong .bin files that were generating when running the
models.
For example, the following steps were not possible:
```console
(venv) $ huggingface-cli download google/gemma-3-270m-it --local-dir ggml-org/gemma-3-270m
(venv) $ python3 convert_hf_to_gguf.py ggml-org/gemma-3-270m --outfile test-bf16.gguf --outtype bf16
(venv) $ cd examples/model-conversion/
(venv) $ export MODEL_PATH=../../ggml-org/gemma-3-270m
(venv) $ export CONVERTED_MODEL=../../test-bf16.gguf
(venv) $ make causal-verify-logits
...
Data saved to data/llamacpp-test-bf16.bin
Data saved to data/llamacpp-test-bf16.txt
Error: llama.cpp logits file not found: data/llamacpp-gemma-3-270m.bin
Please run scripts/run-converted-model.sh first to generate this file.
make: *** [Makefile:62: causal-verify-logits] Error 1
```
With the changes in this commit, the above steps will now work as
expected.
Daniel Bevenius [Fri, 12 Dec 2025 12:25:16 +0000 (13:25 +0100)]
model-conversion : remove max diff check in compare-logits [no ci] (#17954)
This commit removes the maximum difference check from the
compare-logits.py which would stop early if the difference between
the logits exceeded a threshold.
The motivation for removing this is that it can be useful to be able to
get the complete log for debugging/reporting purposes.
* feat: Use Popover component + improve interactions
* fix: Fetching props for only loaded models in ROUTER mode
* webui: prevent models selector popover from overflowing viewport
Use Floating UI's auto-positioning with 50dvh height limit and proper
collision detection instead of forcing top positioning. Fixes overflow
on desktop and mobile keyboard issues
* webui: keep search field near trigger in models selector
Place search at the 'near end' (closest to trigger) by swapping layout
with CSS flexbox order based on popover direction. Prevents input from
moving during typing as list shrinks
Pascal [Wed, 10 Dec 2025 21:18:21 +0000 (22:18 +0100)]
server: add presets (config) when using multiple models (#17859)
* llama-server: recursive GGUF loading
Replace flat directory scan with recursive traversal using
std::filesystem::recursive_directory_iterator. Support for
nested vendor/model layouts (e.g. vendor/model/*.gguf).
Model name now reflects the relative path within --models-dir
instead of just the filename. Aggregate files by parent
directory via std::map before constructing local_model
* server : router config POC (INI-based per-model settings)
* server: address review feedback from @aldehir and @ngxson
PEG parser usage improvements:
- Simplify parser instantiation (remove arena indirection)
- Optimize grammar usage (ws instead of zero_or_more, remove optional wrapping)
- Fix last line without newline bug (+ operator instead of <<)
- Remove redundant end position check
Feature scope:
- Remove auto-reload feature (will be separate PR per @ngxson)
- Keep config.ini auto-creation and template generation
- Preserve per-model customization logic
Complete rewrite of INI parser grammar and visitor:
- Use p.chars(), p.negate(), p.any() instead of p.until()
- Support end-of-line comments (key=value # comment)
- Handle EOF without trailing newline correctly
- Strict identifier validation ([a-zA-Z_][a-zA-Z0-9_.-]*)
- Simplified visitor (no pending state, no trim needed)
- Grammar handles whitespace natively via eol rule
Business validation preserved:
- Reject section names starting with LLAMA_ARG_*
- Accept only keys starting with LLAMA_ARG_*
- Require explicit section before key-value pairs
Children now receive minimal CLI args (executable, model, port, alias)
instead of inheriting all router args. Global settings pass through
LLAMA_ARG_* environment variables only, eliminating duplicate config
warnings.
Fixes: Router args like -ngl, -fa were passed both via CLI and env,
causing 'will be overwritten' warnings on every child spawn
* add common/preset.cpp
* fix compile
* cont
* allow custom-path models
* add falsey check
* server: fix router model discovery and child process spawning
- Sanitize model names: replace / and \ with _ for display
- Recursive directory scan with relative path storage
- Convert relative paths to absolute when spawning children
- Filter router control args from child processes
- Refresh args after port assignment for correct port value
- Fallback preset lookup for compatibility
- Fix missing argv[0]: store server binary path before base_args parsing
* Revert "server: fix router model discovery and child process spawning"