git.djapps.eu Git - pkg/ggml/sources/llama.cpp/commit
llama : llama_perf + option to disable timings during decode (#9355)
author Georgi Gerganov <redacted>
Fri, 13 Sep 2024 06:53:38 +0000 (09:53 +0300)
committer GitHub <redacted>
Fri, 13 Sep 2024 06:53:38 +0000 (09:53 +0300)
commit 0abc6a2c25272d5cf01384dda8ee8bfec4ba8745
tree ca075a9182e60fab558d7e5ca0d6dc0609426db0
parent bd35cb0ae357185c173345f10dc89a4ff925fc25

* llama : llama_perf + option to disable timings during decode

ggml-ci

* common : add llama_arg

* Update src/llama.cpp

Co-authored-by: Xuan Son Nguyen <redacted>

* perf : separate functions in the API

ggml-ci

* perf : safer pointer handling + naming update

ggml-ci

* minor : better local var name

* perf : abort on invalid sampler pointer

ggml-ci

---------

Co-authored-by: Xuan Son Nguyen <redacted>
23 files changed:
common/arg.cpp
common/common.cpp
common/common.h
common/sampling.cpp
examples/batched-bench/batched-bench.cpp
examples/batched.swift/Sources/main.swift
examples/batched/batched.cpp
examples/embedding/embedding.cpp
examples/eval-callback/eval-callback.cpp
examples/imatrix/imatrix.cpp
examples/llama-bench/llama-bench.cpp
examples/llava/llava-cli.cpp
examples/llava/minicpmv-cli.cpp
examples/lookup/lookup.cpp
examples/parallel/parallel.cpp
examples/passkey/passkey.cpp
examples/perplexity/perplexity.cpp
examples/retrieval/retrieval.cpp
examples/simple/simple.cpp
examples/speculative/speculative.cpp
include/llama.h
src/llama-sampling.cpp
src/llama.cpp