git.djapps.eu Git - pkg/ggml/sources/llama.cpp/log
4 months ago HIP: force max threads per block to be 1024 (#11621)
fxzjshm [Tue, 4 Feb 2025 18:18:38 +0000 (02:18 +0800)]
HIP: force max threads per block to be 1024 (#11621)

Some old/vendor-forked versions of LLVM still use 256. Explicitly set it to 1024 to align with upstream LLVM.

Signed-off-by: fxzjshm <redacted>
4 months ago server : add try..catch to places not covered by set_exception_handler (#11620)
Xuan-Son Nguyen [Tue, 4 Feb 2025 17:25:42 +0000 (18:25 +0100)]
server : add try..catch to places not covered by set_exception_handler (#11620)

* server : add try..catch to places not covered by set_exception_handler

* log_server_request: rm try catch, add reminder
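
A minimal sketch of the guard pattern this commit applies, with hypothetical `response`/`handler_t` types standing in for the server's real ones: wrap each route handler so an uncaught exception becomes a 500 response instead of terminating the process.

```cpp
#include <cstdio>
#include <functional>
#include <stdexcept>
#include <string>

struct response { int status = 200; std::string body; };
using handler_t = std::function<void(response &)>;

// wrap a handler so any thrown exception is converted into a 500 error
static handler_t with_exception_guard(handler_t inner) {
    return [inner](response & res) {
        try {
            inner(res);
        } catch (const std::exception & e) {
            res.status = 500;
            res.body   = std::string("{\"error\":\"") + e.what() + "\"}";
        }
    };
}

int main() {
    handler_t h = with_exception_guard([](response &) {
        throw std::runtime_error("boom");
    });
    response res;
    h(res);
    std::printf("%d %s\n", res.status, res.body.c_str()); // 500 {"error":"boom"}
}
```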

4 months ago arg : list RPC devices first when using --list-devices (#11655)
Radoslav Gerganov [Tue, 4 Feb 2025 16:16:20 +0000 (18:16 +0200)]
arg : list RPC devices first when using --list-devices (#11655)

List devices in the same order as they appear when evaluating the model
and splitting tensors across devices, i.e. RPC devices come first in the
list.

ref #11435
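
The reordering itself amounts to a stable partition. A self-contained sketch (the `device` struct is a made-up stand-in for the backend registry):

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

struct device { std::string name; bool is_rpc; };

int main() {
    std::vector<device> devs = {
        {"CUDA0", false}, {"RPC[host:50052]", true}, {"CPU", false},
    };
    // keep the relative order, but move RPC devices to the front
    std::stable_partition(devs.begin(), devs.end(),
                          [](const device & d) { return d.is_rpc; });
    for (const auto & d : devs) {
        std::printf("%s\n", d.name.c_str()); // RPC[host:50052], CUDA0, CPU
    }
}
```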

4 months ago `tool-call`: command r7b fix for normal responses (#11608)
Olivier Chafik [Tue, 4 Feb 2025 15:48:53 +0000 (15:48 +0000)]
`tool-call`: command r7b fix for normal responses (#11608)

* fix command r7b normal response regex + add to server test

* test multiline non-tool-call responses in test-chat

4 months ago readme : add llm_client Rust crate to readme bindings (#11628)
Shelby Jenkins [Tue, 4 Feb 2025 11:20:55 +0000 (05:20 -0600)]
readme : add llm_client Rust crate to readme bindings (#11628)

[This crate](https://github.com/ShelbyJenkins/llm_client) has been in a usable state for quite a while, so I figured now is a fair time to add it.

It installs from crates.io, and automatically downloads the llama.cpp repo and builds it for the target platform - with the goal being the easiest user experience possible.

It also integrates model presets and chooses the largest quant that fits the target's available VRAM. So a user just has to specify one of the presets (I manually add the most popular models), and it will download from Hugging Face.

So, it's like a Rust Ollama, but it's not really for chatting. It makes heavy use of llama.cpp's grammar system to do structured output for decision making and control flow tasks.

4 months ago swift : fix llama-vocab api usage (#11645)
Jhen-Jie Hong [Tue, 4 Feb 2025 11:15:24 +0000 (19:15 +0800)]
swift : fix llama-vocab api usage (#11645)

* swiftui : fix vocab api usage

* batched.swift : fix vocab api usage

4 months ago metal : use residency set for other platforms (#11648)
Jhen-Jie Hong [Tue, 4 Feb 2025 11:07:18 +0000 (19:07 +0800)]
metal : use residency set for other platforms (#11648)

4 months ago authors : update
Georgi Gerganov [Tue, 4 Feb 2025 11:04:10 +0000 (13:04 +0200)]
authors : update

4 months ago sync : ggml upstream/0.0.4631
Georgi Gerganov [Tue, 4 Feb 2025 10:59:21 +0000 (12:59 +0200)]
sync : ggml

4 months ago cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)
Christian Kastner [Mon, 3 Feb 2025 23:17:15 +0000 (00:17 +0100)]
cmake: Add ability to pass in GGML_BUILD_NUMBER (ggml/1096)

This makes git as a dependency optional, and is useful in the case where
ggml is built not from git, but from a tarball, or a distribution source
package.

This conditional also affects GGML_BUILD_COMMIT. Nothing seems to be
using it, though, so there doesn't seem to be much value in factoring
it out, or even requiring it.
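
On the consuming side, such a build number is just a compile definition with a fallback. A hedged sketch (the default of 0 for git-less builds is an assumption):

```cpp
#include <cstdio>

// normally supplied by the build system, e.g. -DGGML_BUILD_NUMBER=1096
#ifndef GGML_BUILD_NUMBER
#define GGML_BUILD_NUMBER 0 // assumed fallback for tarball/distro builds without git
#endif

int main() {
    std::printf("build number: %d\n", GGML_BUILD_NUMBER);
}
```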

4 months ago ci : do not stale-close roadmap issues
Georgi Gerganov [Tue, 4 Feb 2025 07:30:42 +0000 (09:30 +0200)]
ci : do not stale-close roadmap issues

4 months ago `tool-call`: allow `--chat-template chatml` w/ `--jinja`, default to chatml upon...
Olivier Chafik [Mon, 3 Feb 2025 23:49:27 +0000 (23:49 +0000)]
`tool-call`: allow `--chat-template chatml` w/ `--jinja`, default to chatml upon parsing issue, avoid double bos (#11616)

* tool-call: allow `--jinja --chat-template chatml`

* fix double bos issue (drop bos/eos tokens from jinja template)

* add missing try catch around jinja parsing to default to chatml

* Simplify default chatml logic
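
A sketch of the fallback logic described above, assuming template parsing throws on bad input; `parse_template` and the chatml string are illustrative stand-ins for the real minja calls:

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// hypothetical stand-in for the real Jinja parser; throws on bad syntax
static void parse_template(const std::string & tmpl) {
    if (tmpl.find("{%") == std::string::npos) {
        throw std::runtime_error("not a usable template");
    }
}

static const char * CHATML_TEMPLATE =
    "{% for m in messages %}<|im_start|>{{ m.role }}\n"
    "{{ m.content }}<|im_end|>\n{% endfor %}";

static std::string resolve_template(const std::string & model_template) {
    try {
        parse_template(model_template);
        return model_template;
    } catch (const std::exception &) {
        return CHATML_TEMPLATE; // default to chatml upon parsing issue
    }
}

int main() {
    std::printf("%s\n", resolve_template("garbage").c_str());
}
```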

4 months ago server : (webui) revert hacky solution from #11626 (#11634)
Xuan-Son Nguyen [Mon, 3 Feb 2025 23:10:52 +0000 (00:10 +0100)]
server : (webui) revert hacky solution from #11626 (#11634)

4 months ago server : (webui) allow typing and submitting during llm response (#11626)
Woof Dog [Mon, 3 Feb 2025 22:16:27 +0000 (22:16 +0000)]
server : (webui) allow typing and submitting during llm response (#11626)

4 months ago server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622)
Daniel Bevenius [Mon, 3 Feb 2025 15:45:38 +0000 (16:45 +0100)]
server : remove CPPHTTPLIB_NO_EXCEPTIONS define (#11622)

This commit removes the CPPHTTPLIB_NO_EXCEPTIONS define from the server
code.

The motivation for this is that when using a debug build the server
would crash when an exception was thrown and terminate the server
process, as it was unhandled. When CPPHTTPLIB_NO_EXCEPTIONS is set,
cpp-httplib will not call the exception handler, which would normally
return a 500 error to the client. This caused tests to fail when using
a debug build.

Fixes: https://github.com/ggerganov/llama.cpp/issues/11613
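
With the define removed, the server can lean on cpp-httplib's own exception handler. A minimal sketch (handler signature as in recent cpp-httplib releases; worth verifying against the vendored header):

```cpp
#include <exception>
#include <stdexcept>

#include "httplib.h"

int main() {
    httplib::Server svr;
    svr.Get("/boom", [](const httplib::Request &, httplib::Response &) {
        throw std::runtime_error("unhandled");
    });
    // without CPPHTTPLIB_NO_EXCEPTIONS, httplib catches the throw and calls
    // this handler, so the client gets a 500 instead of a dead process
    svr.set_exception_handler([](const httplib::Request &, httplib::Response & res,
                                 std::exception_ptr ep) {
        try {
            if (ep) std::rethrow_exception(ep);
        } catch (const std::exception & e) {
            res.set_content(e.what(), "text/plain");
        }
        res.status = 500;
    });
    svr.listen("127.0.0.1", 8080);
}
```
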
4 months ago sync : ggml
Georgi Gerganov [Mon, 3 Feb 2025 12:57:08 +0000 (14:57 +0200)]
sync : ggml

4 months ago CUDA: fix Volta FlashAttention logic (#11615)
Johannes Gäßler [Mon, 3 Feb 2025 12:25:56 +0000 (13:25 +0100)]
CUDA: fix Volta FlashAttention logic (#11615)

4 months ago server : (webui) Fix Shift+Enter handling (#11609)
mashdragon [Mon, 3 Feb 2025 09:42:55 +0000 (09:42 +0000)]
server : (webui) Fix Shift+Enter handling (#11609)

* Fix Shift+Enter handling

`exact` on the Enter handler means the message is not sent when Shift+Enter is pressed anyway

* build index.html.gz

---------

Co-authored-by: Xuan Son Nguyen <redacted>
4 months ago HIP: fix flash_attn_stream_k_fixup warning (#11604)
Johannes Gäßler [Sun, 2 Feb 2025 22:48:29 +0000 (23:48 +0100)]
HIP: fix flash_attn_stream_k_fixup warning (#11604)

4 months ago CUDA/HIP: add support for selectable warp size to mmv (#11519)
uvos [Sun, 2 Feb 2025 21:40:09 +0000 (22:40 +0100)]
CUDA/HIP: add support for selectable warp size to mmv (#11519)

CUDA/HIP: add support for selectable warp size to mmv

4 months ago HIP: add GGML_CUDA_CC_IS_* for AMD families as increasing CC architectures for AMD...
uvos [Sun, 2 Feb 2025 21:08:05 +0000 (22:08 +0100)]
HIP: add GGML_CUDA_CC_IS_* for AMD families as increasing CC architectures for AMD GPUs are not supersets of each other (#11601)

This fixes a bug where RDNA1 GPUs other than gfx1010 were not handled correctly
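
The shape of the fix, as a simplified sketch: per-family range checks replace ordered ">= arch" comparisons. The constants below are illustrative placeholders, not ggml's exact values.

```cpp
#include <cstdio>

#define CC_RDNA1 1010 // gfx101x (illustrative)
#define CC_RDNA2 1030 // gfx103x (illustrative)
#define CC_RDNA3 1100 // gfx110x (illustrative)

// a family is a range, not everything above a threshold
#define CC_IS_RDNA1(cc) ((cc) >= CC_RDNA1 && (cc) < CC_RDNA2)
#define CC_IS_RDNA2(cc) ((cc) >= CC_RDNA2 && (cc) < CC_RDNA3)

int main() {
    // gfx1012 is RDNA1 even though it is not gfx1010
    std::printf("gfx1012 is RDNA1: %d\n", CC_IS_RDNA1(1012));
}
```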

4 months ago nit: more informative crash when grammar sampler fails (#11593)
Olivier Chafik [Sun, 2 Feb 2025 19:58:34 +0000 (19:58 +0000)]
nit: more informative crash when grammar sampler fails (#11593)

4 months ago CUDA: use mma PTX instructions for FlashAttention (#11583)
Johannes Gäßler [Sun, 2 Feb 2025 18:31:09 +0000 (19:31 +0100)]
CUDA: use mma PTX instructions for FlashAttention (#11583)

* CUDA: use mma PTX instructions for FlashAttention

* __shfl_sync workaround for movmatrix

* add __shfl_sync to HIP

Co-authored-by: Diego Devesa <redacted>
4 months ago Name colors (#11573)
Eric Curtin [Sun, 2 Feb 2025 15:14:48 +0000 (16:14 +0100)]
Name colors (#11573)

It's more descriptive, and using #define's lets us rely on compile-time
concatenation.

Signed-off-by: Eric Curtin <redacted>
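
The compile-time concatenation refers to the compiler merging adjacent string literals, which works for `#define`d strings but not for `const std::string`. A small sketch with assumed color names:

```cpp
#include <cstdio>

#define COLOR_RED   "\x1b[31m"
#define COLOR_GREEN "\x1b[32m"
#define COLOR_RESET "\x1b[0m"

int main() {
    // adjacent literals are merged at compile time into one constant string,
    // so there is no runtime concatenation here
    std::printf(COLOR_GREEN "ok" COLOR_RESET ", " COLOR_RED "fail" COLOR_RESET "\n");
}
```
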
4 months ago `tool-call`: support Command R7B (+ return tool_plan "thoughts" in API) (#11585)
Olivier Chafik [Sun, 2 Feb 2025 09:25:38 +0000 (09:25 +0000)]
`tool-call`: support Command R7B (+ return tool_plan "thoughts" in API) (#11585)

* `tool-call`: support Command R7B (w/ tool_plan return)

* `tool-call`: cleaner preservation of tokens + warn when likely bad chat template override

* `tool-call`: test cleanup / handle lazy grammar triggers

4 months ago Fix exotic ci env that lacks ostringstream::str (#11581)
Olivier Chafik [Sun, 2 Feb 2025 09:10:15 +0000 (09:10 +0000)]
Fix exotic ci env that lacks ostringstream::str (#11581)

4 months ago sampling : support for llguidance grammars (#10224)
Michał Moskal [Sun, 2 Feb 2025 07:55:32 +0000 (23:55 -0800)]
sampling : support for llguidance grammars (#10224)

* initial porting of previous LLG patch

* update for new APIs

* build: integrate llguidance as an external project

* use '%llguidance' as marker to enable llg lark syntax

* add some docs

* clarify docs

* code style fixes

* remove llguidance.h from .gitignore

* fix tests when llg is enabled

* pass vocab not model to llama_sampler_init_llg()

* copy test-grammar-integration.cpp to test-llguidance.cpp

* clang fmt

* fix ref-count bug

* build and run test

* gbnf -> lark syntax

* conditionally include llguidance test based on LLAMA_LLGUIDANCE flag

* rename llguidance test file to test-grammar-llguidance.cpp

* add gh action for llg test

* align tests with LLG grammar syntax and JSON Schema spec

* llama_tokenizer() in fact requires valid utf8

* update llg

* format file

* add $LLGUIDANCE_LOG_LEVEL support

* fix whitespace

* fix warning

* include <cmath> for INFINITY

* add final newline

* fail llama_sampler_init_llg() at runtime

* Link gbnf_to_lark.py script; fix links; refer to llg docs for lexemes

* simplify #includes

* improve doc string for LLAMA_LLGUIDANCE

* typo in merge

* bump llguidance to 0.6.12
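
One bullet above introduces `%llguidance` as the marker that selects the llguidance engine. A self-contained sketch of that dispatch (the enum and function names are illustrative):

```cpp
#include <cstdio>
#include <string>

enum class grammar_kind { gbnf, llguidance };

// a grammar string starting with the marker takes the llguidance (lark) path
static grammar_kind detect_grammar(const std::string & text) {
    return text.rfind("%llguidance", 0) == 0 ? grammar_kind::llguidance
                                             : grammar_kind::gbnf;
}

int main() {
    const bool llg = detect_grammar("%llguidance\nstart: /[0-9]+/") ==
                     grammar_kind::llguidance;
    std::printf("llguidance: %d\n", llg);
}
```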

4 months ago llama : add support for GLM-Edge and GLM-Edge-V series models (#10573)
piDack [Sun, 2 Feb 2025 07:48:46 +0000 (15:48 +0800)]
llama : add support for GLM-Edge and GLM-Edge-V series models (#10573)

* add glm edge chat model

* use config partial_rotary_factor as rope ratio

* support for glm edge model

* vision model support

* remove debug info

* fix format

* llava.cpp trailing whitespace

* remove unused AutoTokenizer

* Update src/llama.cpp to not contain <|end|> or </s>

Co-authored-by: Xuan Son Nguyen <redacted>
* add edge template

* fix chat template

* fix conflict

* fix conflict

* fix ci err

* fix format err

* fix template err

* 9b hf chat support

* format

* format clip.cpp

* fix format

* Apply suggestions from code review

* Apply suggestions from code review

* Update examples/llava/clip.cpp

* fix format

* minor : style

---------

Co-authored-by: liyuhang <redacted>
Co-authored-by: piDack <redacted>
Co-authored-by: Xuan Son Nguyen <redacted>
Co-authored-by: liyuhang <redacted>
Co-authored-by: Georgi Gerganov <redacted>
4 months ago ci: use sccache on windows HIP jobs (#11553)
Olivier Chafik [Sat, 1 Feb 2025 18:22:38 +0000 (18:22 +0000)]
ci: use sccache on windows HIP jobs (#11553)

4 months ago `sync`: minja (https://github.com/google/minja/commit/418a2364b56dc9be4ed9a1a2b0fb16f...
Olivier Chafik [Sat, 1 Feb 2025 12:24:51 +0000 (12:24 +0000)]
`sync`: minja (https://github.com/google/minja/commit/418a2364b56dc9be4ed9a1a2b0fb16fb53a7a22e) (#11574)

4 months ago Implement s3:// protocol (#11511)
Eric Curtin [Sat, 1 Feb 2025 10:30:54 +0000 (11:30 +0100)]
Implement s3:// protocol (#11511)

For those that want to pull from s3

Signed-off-by: Eric Curtin <redacted>
4 months ago ci: simplify cmake build commands (#11548)
Olivier Chafik [Sat, 1 Feb 2025 00:01:20 +0000 (00:01 +0000)]
ci: simplify cmake build commands (#11548)

4 months ago `ci`: use sccache on windows instead of ccache (#11545)
Olivier Chafik [Fri, 31 Jan 2025 17:12:40 +0000 (17:12 +0000)]
`ci`: use sccache on windows instead of ccache (#11545)

* Use sccache on ci for windows

* Detect sccache in cmake

4 months ago `tool-call`: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package...
Olivier Chafik [Fri, 31 Jan 2025 14:15:25 +0000 (14:15 +0000)]
`tool-call`: fix llama 3.x and functionary 3.2, play nice w/ pydantic_ai package, update readme (#11539)

* An empty tool_call_id is better than none!

* sync: minja (tool call name optional https://github.com/google/minja/pull/36)

* Force-disable parallel_tool_calls if template doesn't support it

* More debug logs

* Llama 3.x tools: accept / trigger on more varied spaced outputs

* Fix empty content for functionary v3.2 tool call

* Add proper tool call docs to server README

* readme: function calling *is* supported now

* Apply suggestions from code review

Co-authored-by: Georgi Gerganov <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>
4 months ago fix stop regression (#11543)
Olivier Chafik [Fri, 31 Jan 2025 13:48:31 +0000 (13:48 +0000)]
fix stop regression (#11543)

4 months ago Fix chatml fallback for unsupported builtin templates (when --jinja not enabled)...
Olivier Chafik [Fri, 31 Jan 2025 08:24:29 +0000 (08:24 +0000)]
Fix chatml fallback for unsupported builtin templates (when --jinja not enabled) (#11533)

4 months ago server : fix --jinja when there's no tools or schema (typo was forcing JSON) (#11531)
Olivier Chafik [Fri, 31 Jan 2025 08:12:40 +0000 (08:12 +0000)]
server : fix --jinja when there's no tools or schema (typo was forcing JSON) (#11531)

4 months ago common: Add missing va_end (#11529)
Steve Grubb [Fri, 31 Jan 2025 05:58:55 +0000 (00:58 -0500)]
common: Add missing va_end (#11529)

The va_copy man page states that va_end must be called to revert
whatever the copy did. For some implementations, not calling va_end
has no consequences. For others it could leak memory.
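
A minimal sketch of the pairing rule the commit enforces, with an illustrative formatting helper: every va_copy is matched by a va_end before returning.

```cpp
#include <cstdarg>
#include <cstdio>
#include <vector>

static std::vector<char> vformat(const char * fmt, va_list args) {
    va_list args_copy;
    va_copy(args_copy, args); // the sizing pass below consumes `args`
    const int len = vsnprintf(nullptr, 0, fmt, args);
    std::vector<char> buf(len + 1);
    vsnprintf(buf.data(), buf.size(), fmt, args_copy);
    va_end(args_copy);        // required: reverts whatever va_copy did
    return buf;
}

static void log_msg(const char * fmt, ...) {
    va_list args;
    va_start(args, fmt);
    const auto buf = vformat(fmt, args);
    va_end(args);
    std::printf("%s\n", buf.data());
}

int main() {
    log_msg("n_ctx = %d", 4096);
}
```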

4 months ago server : update help metrics processing/deferred (#11512)
Daniel Bevenius [Fri, 31 Jan 2025 05:04:53 +0000 (06:04 +0100)]
server : update help metrics processing/deferred (#11512)

This commit updates the help text for the metrics `requests_processing`
and `requests_deferred` to be more grammatically correct.

Currently the returned metrics look like this:
```console
# HELP llamacpp:requests_processing Number of request processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of request deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
```

With this commit, the metrics will look like this:
```console
# HELP llamacpp:requests_processing Number of requests processing.
# TYPE llamacpp:requests_processing gauge
llamacpp:requests_processing 0
# HELP llamacpp:requests_deferred Number of requests deferred.
# TYPE llamacpp:requests_deferred gauge
llamacpp:requests_deferred 0
```
This is also consistent with the description of the metrics in the
server examples [README.md](https://github.com/ggerganov/llama.cpp/tree/master/examples/server#get-metrics-prometheus-compatible-metrics-exporter).

4 months ago `ci`: ccache for all github workflows (#11516)
Olivier Chafik [Thu, 30 Jan 2025 22:01:06 +0000 (22:01 +0000)]
`ci`: ccache for all github workflows (#11516)

4 months ago Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunc...
Olivier Chafik [Thu, 30 Jan 2025 19:13:58 +0000 (19:13 +0000)]
Tool call support (generic + native for Llama, Functionary, Hermes, Mistral, Firefunction, DeepSeek) w/ lazy grammars (#9639)

---------

Co-authored-by: Xuan Son Nguyen <redacted>
Co-authored-by: Georgi Gerganov <redacted>
Co-authored-by: Xuan Son Nguyen <redacted>
4 months ago HIP: require at least HIP 5.5
uvos [Wed, 29 Jan 2025 18:36:00 +0000 (19:36 +0100)]
HIP: require at least HIP 5.5

4 months ago HIP: Prepare reduction operators for wave 64
uvos [Wed, 29 Jan 2025 18:12:42 +0000 (19:12 +0100)]
HIP: Prepare reduction operators for wave 64

4 months ago CUDA/HIP: add warp_size to cuda_device_info
uvos [Wed, 29 Jan 2025 16:46:23 +0000 (17:46 +0100)]
CUDA/HIP: add warp_size to cuda_device_info

4 months ago sync: minja (#11499)
Olivier Chafik [Thu, 30 Jan 2025 10:30:27 +0000 (10:30 +0000)]
sync: minja (#11499)

4 months ago vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496)
mgroeber9110 [Thu, 30 Jan 2025 10:10:59 +0000 (11:10 +0100)]
vocab : correctly identify LF token for GPT-2 style BPE tokenizer (#11496)

4 months ago server : use lambda instead of std::bind (#11507)
Daniel Bevenius [Thu, 30 Jan 2025 10:05:00 +0000 (11:05 +0100)]
server : use lambda instead of std::bind (#11507)

This commit replaces the two usages of `std::bind` with lambdas for
the callback functions `callback_new_task` and `callback_update_slots`.

The motivation for this change is consistency with the rest of the code
in server.cpp (lambdas are used for all other callbacks/handlers).
Lambdas are also more readable (perhaps this is subjective), and they
are recommended over `std::bind` in modern C++.

Ref: https://github.com/LithoCoders/dailycpp/blob/master/EffectiveModernC%2B%2B/chapter6/Item34_Prefer_lambdas_to_std::bind.md
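
Side by side, the two styles look like this; `server_queue` and `server_context` are simplified stand-ins for the real server types:

```cpp
#include <cstdio>
#include <functional>

struct server_context {
    void process_new_task(int id) { std::printf("task %d\n", id); }
};

struct server_queue {
    std::function<void(int)> callback_new_task;
};

int main() {
    server_context ctx;
    server_queue queue;

    // before: std::bind with a placeholder
    queue.callback_new_task =
        std::bind(&server_context::process_new_task, &ctx, std::placeholders::_1);

    // after: an equivalent lambda that states the call explicitly
    queue.callback_new_task = [&ctx](int id) { ctx.process_new_task(id); };

    queue.callback_new_task(42);
}
```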

4 months ago server : (docs) added response format for /apply-template [no ci] (#11503)
Isaac McFadyen [Thu, 30 Jan 2025 09:11:53 +0000 (04:11 -0500)]
server : (docs) added response format for /apply-template [no ci] (#11503)

4 months ago readme : reference examples relative links (#11505)
Guspan Tanadi [Thu, 30 Jan 2025 05:58:02 +0000 (12:58 +0700)]
readme : reference examples relative links (#11505)

4 months ago server : update json snippets in README.md [no ci] (#11492)
Daniel Bevenius [Thu, 30 Jan 2025 04:48:14 +0000 (05:48 +0100)]
server : update json snippets in README.md [no ci] (#11492)

This commit updates some of the JSON snippets in the README.md file and
removes the `json` language tag from the code blocks.

The motivation for this change is that invalid JSON in a code snippet
is highlighted in red, which can make it somewhat difficult to read and
can be a little distracting.

4 months ago server : add /apply-template endpoint for additional use cases of Minja functionality...
Nigel Bosch [Wed, 29 Jan 2025 18:45:44 +0000 (12:45 -0600)]
server : add /apply-template endpoint for additional use cases of Minja functionality (#11489)

* add /apply-template endpoint to server

* remove unnecessary line

* add /apply-template documentation

* return only "prompt" field in /apply-template

* use suggested idea instead of my overly verbose way

4 months ago vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)
Rémy Oudompheng [Wed, 29 Jan 2025 17:29:39 +0000 (18:29 +0100)]
vulkan: implement initial support for IQ2 and IQ3 quantizations (#11360)

* vulkan: initial support for IQ3_S

* vulkan: initial support for IQ3_XXS

* vulkan: initial support for IQ2_XXS

* vulkan: initial support for IQ2_XS

* vulkan: optimize Q3_K by removing branches

* vulkan: implement dequantize variants for coopmat2

* vulkan: initial support for IQ2_S

* vulkan: vertically realign code

* port failing dequant callbacks from mul_mm

* Fix array length mismatches

* vulkan: avoid using workgroup size before it is referenced

* tests: increase timeout for Vulkan llvmpipe backend

---------

Co-authored-by: Jeff Bolz <redacted>
4 months ago server : update auto gen files comments [no ci] (#11484)
Daniel Bevenius [Wed, 29 Jan 2025 15:34:18 +0000 (16:34 +0100)]
server : update auto gen files comments [no ci] (#11484)

* server : update auto gen files comments

This commit updates the 'auto generated files' comments in server.cpp
and removes `deps.sh` from the comment.

The motivation for this change is that `deps.sh` was removed in
Commit 91c36c269bca75b2d08119c653512cd20b4ea2ba ("server : (web ui)
Various improvements, now use vite as bundler (#10599)").

* squash! server : update auto gen files comments [no ci]

Move comments about file generation to README.md.

* squash! server : update auto gen files comments [no ci]

Remove the comments in server.cpp that mention that information
can be found in the README.md file.

4 months ago vulkan: Catch pipeline creation failure and print an error message (#11436)
Jeff Bolz [Wed, 29 Jan 2025 15:26:50 +0000 (09:26 -0600)]
vulkan: Catch pipeline creation failure and print an error message (#11436)

* vulkan: Catch pipeline creation failure and print an error message

Also, fix some warnings from my on-demand compile change.

* vulkan: fix pipeline creation logging

4 months ago Parse https://ollama.com/library/ syntax (#11480)
Eric Curtin [Wed, 29 Jan 2025 11:23:10 +0000 (12:23 +0100)]
Parse https://ollama.com/library/ syntax (#11480)

People search for ollama models using the web UI; this change allows
one to copy the URL from the browser and have it be compatible with
llama-run.

Signed-off-by: Eric Curtin <redacted>
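
A hedged sketch of the normalization this enables; the exact mapping lives in llama-run, and the `ollama://` form below is an assumption about its internal scheme:

```cpp
#include <cstdio>
#include <string>

static std::string parse_ollama_url(std::string url) {
    const std::string prefix = "https://ollama.com/library/";
    if (url.rfind(prefix, 0) == 0) {
        url = url.substr(prefix.size()); // e.g. "smollm:135m"
    }
    return "ollama://" + url;
}

int main() {
    std::printf("%s\n",
        parse_ollama_url("https://ollama.com/library/smollm:135m").c_str());
}
```
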
4 months ago sync : ggml
Georgi Gerganov [Wed, 29 Jan 2025 09:25:29 +0000 (11:25 +0200)]
sync : ggml

4 months ago ggml : add option to not print stack on abort (ggml/1081)
William Tambellini [Thu, 23 Jan 2025 19:59:08 +0000 (11:59 -0800)]
ggml : add option to not print stack on abort (ggml/1081)

* Add option to not print stack on abort

Add option/envvar to disable stack printing on abort.
Also link some unittests with Threads to fix link errors on
ubuntu/g++11.

* Update ggml/src/ggml.c

---------

Co-authored-by: Diego Devesa <redacted>
4 months ago ggml-cpu : fix ggml_graph_compute_thread not terminating on abort (ggml/1065)
issixx [Fri, 17 Jan 2025 12:29:08 +0000 (21:29 +0900)]
ggml-cpu : fix ggml_graph_compute_thread not terminating on abort (ggml/1065)

Some threads kept looping and failed to terminate properly after an abort during CPU execution.

Co-authored-by: issi <redacted>
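
The shape of the fix, sketched with std::thread instead of ggml's actual thread pool: workers poll a shared abort flag so they can leave the compute loop instead of spinning forever.

```cpp
#include <atomic>
#include <thread>
#include <vector>

int main() {
    std::atomic<bool> abort_requested{false};
    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i) {
        workers.emplace_back([&abort_requested] {
            while (!abort_requested.load(std::memory_order_relaxed)) {
                // ... compute one graph node ...
            } // without this check, a thread could keep looping after an abort
        });
    }
    abort_requested.store(true); // set when an abort is requested
    for (auto & w : workers) {
        w.join(); // now guaranteed to return
    }
}
```
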
4 months ago embedding : enable --no-warmup option (#11475)
Daniel Bevenius [Wed, 29 Jan 2025 08:38:54 +0000 (09:38 +0100)]
embedding : enable --no-warmup option (#11475)

This commit enables the `--no-warmup` option for llama-embedding.

The motivation for this change is to allow the user to disable the
warmup when running the program.

4 months ago llama: fix missing k_cache store for rwkv6qwen2 (#11445)
Molly Sophia [Wed, 29 Jan 2025 04:07:21 +0000 (12:07 +0800)]
llama: fix missing k_cache store for rwkv6qwen2 (#11445)

Signed-off-by: Molly Sophia <redacted>
4 months ago cmake: add hints for locating ggml on Windows using Llama find-package (#11466)
Emreerdog [Tue, 28 Jan 2025 23:22:06 +0000 (02:22 +0300)]
cmake: add hints for locating ggml on Windows using Llama find-package (#11466)

4 months ago server : Fixed wrong function name in llamacpp server unit test (#11473)
peidaqi [Tue, 28 Jan 2025 23:03:42 +0000 (16:03 -0700)]
server : Fixed wrong function name in llamacpp server unit test (#11473)

The test_completion_stream_with_openai_library() function was actually running with stream=False by default, and test_completion_with_openai_library() with stream=True.

4 months ago ci : fix build CPU arm64 (#11472)
Xuan-Son Nguyen [Tue, 28 Jan 2025 23:02:56 +0000 (00:02 +0100)]
ci : fix build CPU arm64 (#11472)

* ci : fix build CPU arm64

* failed, trying ubuntu 22

* vulkan: ubuntu 24

* vulkan : jammy --> noble

4 months ago HIP: Suppress transformation warning in softmax.cu
uvos [Tue, 28 Jan 2025 22:06:32 +0000 (23:06 +0100)]
HIP: Suppress transformation warning in softmax.cu

Loops with bounds not known at compile time cannot be unrolled.
When ncols_template == 0, the bounds of the loop are not constexpr, so LLVM cannot unroll the loops here.
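
A HIP/CUDA-flavored sketch of the ncols_template pattern behind the warning: a nonzero template argument makes the bound constexpr and unrollable, while ncols_template == 0 falls back to a runtime bound.

```cpp
template <int ncols_template>
static float row_sum(const float * x, int ncols_runtime) {
    const int ncols = ncols_template == 0 ? ncols_runtime : ncols_template;
    float sum = 0.0f;
#pragma unroll // honored only when ncols is a compile-time constant
    for (int i = 0; i < ncols; ++i) {
        sum += x[i];
    }
    return sum;
}

int main() {
    const float x[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    return (int) (row_sum<4>(x, 4) - row_sum<0>(x, 4)); // 0
}
```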

4 months ago HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation...
Nikita Sarychev [Tue, 28 Jan 2025 15:42:20 +0000 (07:42 -0800)]
HIP: Only call rocblas_initialize on rocblas versions with the multiple instantiation bug (#11080)

This disables the workaround on fixed rocblas versions (>= 4.0.0) to eliminate the runtime cost and unnecessary VRAM allocation of loading all tensile objects.

4 months ago Add github protocol pulling and http:// (#11465)
Eric Curtin [Tue, 28 Jan 2025 14:45:41 +0000 (15:45 +0100)]
Add github protocol pulling and http:// (#11465)

These are added as pulling protocols to llama-run.

Signed-off-by: Eric Curtin <redacted>
4 months ago docker: allow installing pip packages system-wide (#11437)
Nuno [Tue, 28 Jan 2025 14:17:25 +0000 (15:17 +0100)]
docker: allow installing pip packages system-wide (#11437)

Signed-off-by: rare-magma <redacted>
4 months ago cmake : don't fail on `GGML_CPU=OFF` (#11457)
someone13574 [Tue, 28 Jan 2025 14:15:34 +0000 (09:15 -0500)]
cmake : don't fail on `GGML_CPU=OFF` (#11457)

4 months ago docker: add perplexity and bench commands to full image (#11438)
Nuno [Tue, 28 Jan 2025 10:42:32 +0000 (11:42 +0100)]
docker: add perplexity and bench commands to full image (#11438)

Signed-off-by: rare-magma <redacted>
4 months ago SYCL : SOFTMAX F16 mask support and other fixes (#11261)
Akarshan Biswas [Tue, 28 Jan 2025 09:56:58 +0000 (15:26 +0530)]
SYCL : SOFTMAX F16 mask support and other fixes (#11261)

Implemented ggml_sycl_op_soft_max() F16 src1 (mask) support, for which a pragma deprecation warning was added during #5021.
To do this, it had to be decoupled from ggml_sycl_op_flatten, which always considered src1 to be of fp32 type (many OP functions depend on it).

* SYCL: SOFTMAX F16 mask support and other fixes

* test-backend-ops: Add F16 mask test cases

4 months ago Handle missing model in CLI parameters for llama-run (#11399)
Michael Engel [Tue, 28 Jan 2025 08:32:40 +0000 (09:32 +0100)]
Handle missing model in CLI parameters for llama-run (#11399)

The HTTP client in llama-run only prints an error if the download of
a resource fails. If the model name in the CLI parameter list is missing,
this causes the application to crash.
In order to prevent this, a check for the required model parameter has been
added, and errors for resource downloads get propagated to the caller.

Signed-off-by: Michael Engel <redacted>
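
The model-parameter guard amounts to simple argument validation before any download is attempted; a minimal sketch (the usage text is illustrative, not llama-run's actual help):

```cpp
#include <cstdio>

int main(int argc, char ** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "error: missing required model parameter\n"
                             "usage: llama-run MODEL [PROMPT]\n");
        return 1; // fail early instead of crashing later
    }
    std::printf("model: %s\n", argv[1]);
    return 0;
}
```
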
4 months ago Add new hf protocol for ollama (#11449)
Eric Curtin [Mon, 27 Jan 2025 18:36:10 +0000 (19:36 +0100)]
Add new hf protocol for ollama (#11449)

https://huggingface.co/docs/hub/en/ollama

Signed-off-by: Eric Curtin <redacted>
5 months ago AMD: parse the architecture as supplied by gcnArchName (#11244)
Haus1 [Mon, 27 Jan 2025 13:58:17 +0000 (08:58 -0500)]
AMD: parse the architecture as supplied by gcnArchName (#11244)

The value provided by `minor` doesn't include the stepping for AMD; parse the value returned by gcnArchName instead to retrieve an accurate ID.
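
A hedged sketch of pulling an ID out of a gcnArchName-style string such as "gfx1010:sramecc+:xnack-"; gfx names encode the ID in hex-ish digits, and the real parsing in the HIP backend may differ:

```cpp
#include <cctype>
#include <cstdio>
#include <string>

static int parse_gfx_id(const std::string & arch_name) {
    if (arch_name.rfind("gfx", 0) != 0) {
        return -1; // not a gcnArchName-style string
    }
    size_t end = 3;
    while (end < arch_name.size() && std::isxdigit((unsigned char) arch_name[end])) {
        end++;
    }
    if (end == 3) {
        return -1; // "gfx" with no ID digits
    }
    // gfx IDs read naturally as hex: gfx90a -> 0x90a, gfx1010 -> 0x1010
    return (int) std::stol(arch_name.substr(3, end - 3), nullptr, 16);
}

int main() {
    std::printf("0x%x\n", parse_gfx_id("gfx1010:sramecc+:xnack-")); // 0x1010
}
```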

5 months ago llama : minor fixes to speed up llama model loading (#11448)
lexasub [Mon, 27 Jan 2025 13:42:09 +0000 (17:42 +0400)]
llama : minor fixes to speed up llama model loading (#11448)

* impl::load: change the bpe_ranks map to an unordered map, reducing impl::load time by 30%

* llama_model_loader::init_mapping: replace `new llama_mmap` with `std::make_unique<llama_mmap>` for cleaner code, halving the running time of init_mappings

* Update src/llama-vocab.cpp

---------

Co-authored-by: lexasub <redacted>
Co-authored-by: Diego Devesa <redacted>
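
The first bullet's data-structure change, sketched: bpe_ranks lookups move from a sorted map (O(log n), with costly pair comparisons) to a hash map. The pair hash below is an illustrative stand-in for whatever the real code uses.

```cpp
#include <cstdio>
#include <string>
#include <unordered_map>
#include <utility>

using bpe_pair = std::pair<std::string, std::string>;

// std::unordered_map needs a hash for pair keys; this combiner is illustrative
struct pair_hash {
    size_t operator()(const bpe_pair & p) const {
        return std::hash<std::string>{}(p.first) ^
               (std::hash<std::string>{}(p.second) << 1);
    }
};

// before: std::map<bpe_pair, int> bpe_ranks;           // O(log n) lookups
std::unordered_map<bpe_pair, int, pair_hash> bpe_ranks; // amortized O(1)

int main() {
    bpe_ranks[{"th", "e"}] = 0;
    std::printf("rank: %d\n", bpe_ranks.at({"th", "e"}));
}
```
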
5 months ago llama: refactor llama_decode_impl (#11381)
Johannes Gäßler [Mon, 27 Jan 2025 11:07:12 +0000 (12:07 +0100)]
llama: refactor llama_decode_impl (#11381)

5 months ago metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441)
Ihar Hrachyshka [Mon, 27 Jan 2025 07:41:59 +0000 (02:41 -0500)]
metal: Handle null returned from MTLCreateSystemDefaultDevice() (#11441)

This fixes a segmentation fault when running tests and no Metal
devices are available (for example, when not linked with the Core
Graphics framework).

5 months ago docker : fix ARM build and Vulkan build (#11434)
Xuan Son Nguyen [Sun, 26 Jan 2025 21:45:32 +0000 (22:45 +0100)]
docker : fix ARM build and Vulkan build (#11434)

* ci : do not fail-fast for docker

* build arm64/amd64 separately

* fix pip

* no fast fail

* vulkan: try jammy

5 months ago metal : use residency sets (#11427)
Georgi Gerganov [Sun, 26 Jan 2025 18:06:16 +0000 (20:06 +0200)]
metal : use residency sets (#11427)

* metal : use residency sets

ggml-ci

* metal : restore commandBufferWithUnretainedReferences calls [no ci]

* metal : release descriptors

ggml-ci

* metal : check env GGML_METAL_NO_RESIDENCY

ggml-ci

* metal : fix build + clean-up

ggml-ci

5 months ago docker: add missing vulkan library to base layer and update to 24.04 (#11422)
Nuno [Sun, 26 Jan 2025 17:22:43 +0000 (18:22 +0100)]
docker: add missing vulkan library to base layer and update to 24.04 (#11422)

Signed-off-by: rare-magma <redacted>
5 months ago cmake: add ggml find package (#11369)
bandoti [Sun, 26 Jan 2025 16:07:48 +0000 (12:07 -0400)]
cmake: add ggml find package (#11369)

* Add initial ggml cmake package

* Add build numbers to ggml find-package

* Expand variables with GGML_ prefix

* Guard against adding to cache variable twice

* Add git to msys2 workflow

* Handle ggml-cpu-* variants

* Link ggml/ggml-base libraries to their targets

* Replace main-cmake-pkg with simple-cmake-pkg

* Interface features require c_std_90

* Fix typo

* Removed unnecessary bracket from status message

* Update examples/simple-cmake-pkg/README.md

Co-authored-by: Georgi Gerganov <redacted>
* Update examples/simple-cmake-pkg/README.md

Co-authored-by: Georgi Gerganov <redacted>
---------

Co-authored-by: Georgi Gerganov <redacted>
5 months ago rpc: fix register position (#11424)
Frank Mai [Sun, 26 Jan 2025 15:20:34 +0000 (23:20 +0800)]
rpc: fix register position (#11424)

Signed-off-by: thxCode <redacted>
5 months ago readme : update hot topics
Georgi Gerganov [Sun, 26 Jan 2025 12:30:15 +0000 (14:30 +0200)]
readme : update hot topics

5 months ago build: apply MSVC /bigobj option to c/cpp files only (#11423)
Jeff Bolz [Sun, 26 Jan 2025 02:10:03 +0000 (20:10 -0600)]
build: apply MSVC /bigobj option to c/cpp files only (#11423)

5 months ago vulkan: compile shaders on-demand (#11406)
Jeff Bolz [Sat, 25 Jan 2025 21:29:57 +0000 (15:29 -0600)]
vulkan: compile shaders on-demand (#11406)

Reduce first-run startup time and memory consumption.

Should fix #11339.
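
The on-demand idea in miniature, with a toy `pipeline` type instead of real Vulkan objects: the compile cost is paid on first use and cached afterwards.

```cpp
#include <cstdio>
#include <memory>
#include <string>

struct pipeline { std::string shader; };

struct lazy_pipeline {
    std::string shader;
    std::unique_ptr<pipeline> compiled;

    pipeline & get() {
        if (!compiled) { // first use pays the (expensive) compile cost
            std::printf("compiling %s\n", shader.c_str());
            compiled = std::make_unique<pipeline>(pipeline{shader});
        }
        return *compiled; // later uses hit the cache
    }
};

int main() {
    lazy_pipeline p{"matmul_f16", nullptr};
    p.get(); // compiles here
    p.get(); // cached
}
```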

5 months ago Hip: disable VMM on hip as it seems that it doesn't work in some configurations (...
uvos [Sat, 25 Jan 2025 20:01:12 +0000 (21:01 +0100)]
Hip: disable VMM on hip as it seems that it doesn't work in some configurations (#11420)

5 months ago build: add /bigobj to MSVC build (#11407)
Jeff Bolz [Sat, 25 Jan 2025 17:26:37 +0000 (11:26 -0600)]
build: add /bigobj to MSVC build (#11407)

5 months ago docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for (#11419)
Diego Devesa [Sat, 25 Jan 2025 16:22:41 +0000 (17:22 +0100)]
docker : add GGML_CPU_ARM_ARCH arg to select ARM architecture to build for (#11419)

5 months ago server : fix cleaning up stream task (#11418)
Xuan Son Nguyen [Sat, 25 Jan 2025 15:36:44 +0000 (16:36 +0100)]
server : fix cleaning up stream task (#11418)

* server : fix cleaning up stream task

* one more spot

5 months ago docker : fix CPU ARM build (#11403)
Diego Devesa [Sat, 25 Jan 2025 14:22:29 +0000 (15:22 +0100)]
docker : fix CPU ARM build (#11403)

* docker : fix CPU ARM build

* add CURL to other builds

5 months ago ci : fix line breaks on windows builds (#11409)
Georgi Gerganov [Sat, 25 Jan 2025 11:36:48 +0000 (13:36 +0200)]
ci : fix line breaks on windows builds (#11409)

* ci : fix line breaks on windows builds

* cont : another try

* ci : fix powershell line breaks

5 months ago CANN: Add Ascend CANN build ci (#10217)
jiahao su [Fri, 24 Jan 2025 23:26:01 +0000 (07:26 +0800)]
CANN: Add Ascend CANN build ci (#10217)

* CANN: Add Ascend CANN build ci

* Update build.yml

* Modify cann image version

* Update build.yml

* Change to run on x86 system

* Update build.yml

* Update build.yml

* Modify format error

* Update build.yml

* Add 'Ascend NPU' label restrictions

* Exclude non PR event

Co-authored-by: Yuanhao Ji <redacted>
* Update build.yml

---------

Co-authored-by: Yuanhao Ji <redacted>
5 months ago hip : Add hipGraph and VMM support to ROCM (#11362)
uvos [Fri, 24 Jan 2025 23:02:23 +0000 (00:02 +0100)]
hip : Add hipGraph and VMM support to ROCM (#11362)

* Add hipGraph support

* Enable VMM on rocm

5 months ago CUDA: fix FP16 cuBLAS GEMM (#11396)
Johannes Gäßler [Fri, 24 Jan 2025 20:02:43 +0000 (21:02 +0100)]
CUDA: fix FP16 cuBLAS GEMM (#11396)

5 months ago rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (#11356)
uvos [Fri, 24 Jan 2025 16:50:49 +0000 (17:50 +0100)]
rocBLAS: Avoid fp32->fp16->fp32 conversion on cdna (#11356)

5 months ago release : pack /lib in the packages (#11392)
Georgi Gerganov [Fri, 24 Jan 2025 16:41:30 +0000 (18:41 +0200)]
release : pack /lib in the packages (#11392)

* release : pack /lib and /include in the packages

* cmake : put libs in /bin

* TMP : push artifacts

* Revert "TMP : push artifacts"

This reverts commit 4decf2c4dfc5cdf5d96ea44c03c8f9801ab41262.

* ci : fix HIP cmake compiler options to be on first line

* ci : restore the original HIP commands

* ci : change ubuntu build from latest to 20.04

* ci : try to fix macos build rpaths

* ci : remove obsolete MacOS build

* TMP : push artifacts

* ci : change back to ubuntu latest

* ci : macos set build rpath to "@loader_path"

* ci : fix typo

* ci : change ubuntu package to 22.04

* Revert "TMP : push artifacts"

This reverts commit 537b09e70ffc604c414ee78acf3acb4c940ec597.

5 months ago docs : Update readme to build targets for local docker build (#11368)
Jafar Uruç [Fri, 24 Jan 2025 13:30:13 +0000 (13:30 +0000)]
docs : Update readme to build targets for local docker build (#11368)

5 months ago CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380)
Johannes Gäßler [Fri, 24 Jan 2025 11:38:31 +0000 (12:38 +0100)]
CPU/CUDA: fix (GQA) mul mat back, add CUDA support (#11380)

5 months ago cmake : avoid -march=native when reproducible build is wanted (#11366)
Bernhard M. Wiedemann [Fri, 24 Jan 2025 11:21:35 +0000 (12:21 +0100)]
cmake : avoid -march=native when reproducible build is wanted (#11366)

See https://reproducible-builds.org/ for why this is good
and https://reproducible-builds.org/specs/source-date-epoch/
for the definition of this variable.

Without this patch, compiling on different machines produced different binaries, which made verification of results difficult.

Fixes: #11317
This patch was done while working on reproducible builds for openSUSE.

5 months ago Update llama-run README.md (#11386)
Eric Curtin [Fri, 24 Jan 2025 09:39:24 +0000 (09:39 +0000)]
Update llama-run README.md (#11386)

For consistency

Signed-off-by: Eric Curtin <redacted>
5 months ago server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364)
stduhpf [Fri, 24 Jan 2025 08:02:38 +0000 (09:02 +0100)]
server : (webui) put DeepSeek R1 CoT in a collapsible <details> element (#11364)

* webui : put DeepSeek R1 CoT in a collapsible <details> element

* webui: refactor split

* webui: don't use regex to split cot and response

* webui: format+qol

* webui: no loading icon if the model isn't generating

* ui fix, add configs

* add jsdoc types

* only filter </think> for assistant msg

* build

* update build

---------

Co-authored-by: Xuan Son Nguyen <redacted>