git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit

author	Pascal <redacted>
	Wed, 3 Dec 2025 14:10:37 +0000 (15:10 +0100)
committer	GitHub <redacted>
	Wed, 3 Dec 2025 14:10:37 +0000 (15:10 +0100)
commit	e7c2cf1356c8127140915a5f313e02dff4b07be8
tree	7cc044ca95b798429eb2cab1ec179c34e0d813eb
parent	1257491047aed0f56b81f532a5a4865add918821

server: add router multi-model tests (#17704) (#17722)

* llama-server: add router multi-model tests (#17704)

Add 4 test cases for the model router:
- test_router_unload_model: explicit model unloading
- test_router_models_max_evicts_lru: LRU eviction with --models-max
- test_router_no_models_autoload: --no-models-autoload flag behavior
- test_router_api_key_required: API key authentication
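Three of these behaviours (explicit unloading, autoload on or off, and API-key checks) can be modelled with a small in-memory stand-in, which helps when reasoning about what each test asserts. This is a hypothetical sketch, not llama-server's router: the class `ToyRouter`, its methods, the exception types, and the assumption that `--no-models-autoload` makes requests to an unloaded model fail instead of triggering a load are all illustrative. LRU eviction is left out here.

```python
class ModelNotLoaded(Exception):
    """Raised when a request targets an unloaded model and autoload is off."""


class Unauthorized(Exception):
    """Raised when the supplied API key does not match the configured one."""


class ToyRouter:
    # Hypothetical stand-in for a model router; NOT the llama-server implementation.
    def __init__(self, available, autoload=True, api_key=None):
        self.available = set(available)  # models the router is able to load
        self.loaded = set()              # models currently loaded
        self.autoload = autoload         # False mimics --no-models-autoload (assumed semantics)
        self.api_key = api_key           # None means no authentication is required

    def _check_key(self, key):
        if self.api_key is not None and key != self.api_key:
            raise Unauthorized("invalid or missing API key")

    def load(self, model, key=None):
        self._check_key(key)
        if model not in self.available:
            raise KeyError(model)
        self.loaded.add(model)

    def unload(self, model, key=None):
        self._check_key(key)
        self.loaded.discard(model)  # explicit unloading; no-op if not loaded

    def complete(self, model, key=None):
        self._check_key(key)
        if model not in self.loaded:
            if not self.autoload:
                raise ModelNotLoaded(model)
            self.load(model, key)  # autoload on first use
        return f"response from {model}"
```

With `ToyRouter(["m1"], autoload=False, api_key="k")`, calling `complete("m1", key="k")` raises `ModelNotLoaded` until `load("m1", key="k")` is called, and any call without the key raises `Unauthorized`.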

Tests use asynchronous model loading with polling, and skip gracefully
when too few models are available for the eviction test.
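Because loading is asynchronous, a test has to poll until the model reports ready, and it should skip rather than fail when the environment lacks enough models. A minimal sketch, assuming a caller-supplied `get_status` callable and the status string `"loaded"` (both hypothetical); the stdlib `unittest.SkipTest` stands in for `pytest.skip` to keep the example dependency-free.

```python
import time
import unittest


def wait_for_status(get_status, model, want="loaded", timeout=30.0, interval=0.5):
    """Poll get_status(model) until it returns `want` or `timeout` seconds pass.

    `get_status` and the status strings are hypothetical placeholders.
    Checks at least once; returns True on success, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if get_status(model) == want:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def require_models(available, needed):
    """Skip (rather than fail) when too few models exist for the scenario."""
    if len(available) < needed:
        raise unittest.SkipTest(f"need {needed} models, found {len(available)}")
    return available[:needed]
```

Using `time.monotonic()` keeps the deadline immune to wall-clock adjustments during a long test run.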

utils.py changes:
- Add models_max, models_dir, no_models_autoload attributes to ServerProcess
- Handle JSONDecodeError for non-JSON error responses (fallback to text)
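The two `utils.py` changes can be sketched as follows: optional attributes become command-line flags only when they are set, and a response body that is not valid JSON falls back to raw text. This is a simplified stand-in, not the actual `utils.py` code; the class name `ServerArgs`, the helper `decode_body`, and the `--models-dir` flag spelling are assumptions (only `--models-max` and `--no-models-autoload` are named in this commit).

```python
import json
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class ServerArgs:
    # Hypothetical subset of ServerProcess attributes; None/False means "not set".
    models_max: Optional[int] = None
    models_dir: Optional[str] = None
    no_models_autoload: bool = False

    def to_cli(self) -> List[str]:
        """Translate set attributes into server command-line arguments."""
        args: List[str] = []
        if self.models_max is not None:
            args += ["--models-max", str(self.models_max)]
        if self.models_dir is not None:
            args += ["--models-dir", self.models_dir]  # flag name is an assumption
        if self.no_models_autoload:
            args.append("--no-models-autoload")
        return args


def decode_body(raw: bytes) -> Any:
    """Return parsed JSON, or the decoded text when the body is not JSON."""
    text = raw.decode("utf-8", errors="replace")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text  # e.g. a plain-text error message
```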

* llama-server: update test models to new HF repos

* add offline

* llama-server: fix router LRU eviction test and add preloading

Fix eviction test: load 2 models first, verify their state, then load
a 3rd to trigger eviction. The previous logic loaded all 3 at once,
so the first model was evicted before it could be verified.
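The ordering problem is easiest to see with a plain LRU cache capped at `models_max`. A minimal sketch, assuming the router evicts the least recently used model once the cap is exceeded (the behaviour `test_router_models_max_evicts_lru` targets); `LRUModels` and the model names are hypothetical. With a cap of 2, loading three models back to back evicts the first before any check runs, whereas loading two, verifying, and then loading a third makes the eviction observable.

```python
from collections import OrderedDict


class LRUModels:
    """Hypothetical LRU set of loaded models, capped at models_max."""

    def __init__(self, models_max):
        self.models_max = models_max
        self._loaded = OrderedDict()  # least recently used first

    def load(self, name):
        """Load or touch `name`; return the evicted model name, if any."""
        if name in self._loaded:
            self._loaded.move_to_end(name)  # mark as most recently used
            return None
        self._loaded[name] = True
        if len(self._loaded) > self.models_max:
            evicted, _ = self._loaded.popitem(last=False)  # drop the oldest
            return evicted
        return None

    def loaded(self):
        return list(self._loaded)


# Old test order: all three at once, so "a" is gone before it can be checked.
old = LRUModels(models_max=2)
for m in ("a", "b", "c"):
    old.load(m)

# Fixed order: load two, verify, then load the third to trigger eviction.
new = LRUModels(models_max=2)
new.load("a")
new.load("b")
assert new.loaded() == ["a", "b"]  # state verified before eviction
evicted = new.load("c")
```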

Add a module fixture that preloads models via ServerPreset.load_all(),
and mark test presets as offline so they use cached models.

* llama-server: fix split model download on Windows

---------

Co-authored-by: Xuan-Son Nguyen <redacted>

tools/server/tests/unit/test_basic.py
tools/server/tests/unit/test_router.py
tools/server/tests/utils.py