llama.cpp : split llama_context_params into model and context params (#3301)
author    slaren <redacted>
          Thu, 28 Sep 2023 19:42:38 +0000 (21:42 +0200)
committer GitHub <redacted>
          Thu, 28 Sep 2023 19:42:38 +0000 (22:42 +0300)
commit    16bc66d9479edd5ee12ec734973554d4493c5dfa
tree      4cca787ebd86dd55fd176d27112117c74e9b34c6
parent    0512d66670de3f650c579519833c085014b0f200

* llama.cpp : split llama_context_params into model and context params

ggml-ci

* fix metal build

* fix freq_base/scale default to model value

* llama-bench : keep the same model between tests when possible

* move n_threads to llama_context_params, add n_threads_batch

* fix mpi build

* remove kv_size(), cuda scratch fixes

* remove low-vram option

* add n_threads_batch to system info, refactor to get_system_info()

* add documentation about --threads-batch to the READMEs

* llama-bench fix

* main : fix rope freq/scale warning

* llama.cpp : add llama_get_model
common : add llama_tokenize from model

* remove duplicated ctx/model functions

ggml-ci

* cuda : print total VRAM used
27 files changed:
common/common.cpp
common/common.h
common/train.cpp
examples/batched/batched.cpp
examples/beam-search/beam-search.cpp
examples/embd-input/embd-input-lib.cpp
examples/embd-input/embd-input-test.cpp
examples/embedding/embedding.cpp
examples/finetune/finetune.cpp
examples/llama-bench/llama-bench.cpp
examples/main/README.md
examples/main/main.cpp
examples/parallel/parallel.cpp
examples/perplexity/perplexity.cpp
examples/quantize-stats/quantize-stats.cpp
examples/save-load-state/save-load-state.cpp
examples/server/README.md
examples/server/server.cpp
examples/simple/simple.cpp
examples/speculative/speculative.cpp
examples/train-text-from-scratch/train-text-from-scratch.cpp
ggml-cuda.cu
llama.cpp
llama.h
tests/test-tokenizer-0-falcon.cpp
tests/test-tokenizer-0-llama.cpp
tests/test-tokenizer-1-llama.cpp