server : allow using LoRA adapters per-request (#10994)
author     Xuan Son Nguyen <redacted>
           Thu, 2 Jan 2025 14:05:18 +0000 (15:05 +0100)
committer  GitHub <redacted>
           Thu, 2 Jan 2025 14:05:18 +0000 (15:05 +0100)
commit     0da5d860266c6928b8c9408efbd264ae59fedda6
tree       edd1e4e9d3897381ba9b006480a1d9f525ccbf98
parent     a45433ba209ee0b33d02c7dc4c31f29894ad83a6
server : allow using LoRA adapters per-request (#10994)

* slot.can_batch_with

* lora per request

* test: force disable cache prompt

* move can_batch_with check

* fix condition

* add slow test with llama 8b

* update docs

* move lora change task to queue

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <redacted>
* lora_base

* remove redundant check

---------

Co-authored-by: Georgi Gerganov <redacted>
Files changed:
examples/server/README.md
examples/server/server.cpp
examples/server/tests/README.md
examples/server/tests/requirements.txt
examples/server/tests/unit/test_lora.py
examples/server/tests/unit/test_speculative.py
examples/server/tests/utils.py
examples/server/utils.hpp
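
For context, the feature added by this commit lets a client scale LoRA adapters on a per-request basis instead of only server-wide. The sketch below shows what such a request might look like against a running llama-server started with one or more --lora adapters. The /completion endpoint path and the "lora"/"id"/"scale" field names follow the updated examples/server/README.md as I understand it; the host, port, prompt, and helper function are assumptions for illustration and may differ in other server versions.

# Sketch only: per-request LoRA scaling against a local llama-server.
# Assumed startup (hypothetical paths):
#   llama-server -m base-model.gguf --lora my-adapter.gguf
import requests

SERVER_URL = "http://127.0.0.1:8080"  # assumed default host/port

def complete(prompt: str, lora_config=None) -> str:
    """Send a completion request, optionally setting LoRA scales for this request only."""
    payload = {
        "prompt": prompt,
        "n_predict": 32,
        # the commit's tests disable prompt caching when switching adapters between requests
        "cache_prompt": False,
    }
    if lora_config is not None:
        # lora_config is a list like [{"id": 0, "scale": 1.0}], where "id" is the
        # index of the adapter as given on the server command line
        payload["lora"] = lora_config
    res = requests.post(f"{SERVER_URL}/completion", json=payload, timeout=60)
    res.raise_for_status()
    return res.json()["content"]

if __name__ == "__main__":
    # adapter present but scaled to 0.0: behaves like the base model
    print(complete("Look in thy glass", lora_config=[{"id": 0, "scale": 0.0}]))
    # same request with the adapter fully applied
    print(complete("Look in thy glass", lora_config=[{"id": 0, "scale": 1.0}]))

Carrying the adapter scales in the request also explains the batching-related bullets above: slots whose requests use different LoRA configurations cannot be decoded in the same batch, which appears to be what the slot.can_batch_with check in the commit message refers to.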